An email that shows where HBase stands today and where it is headed
- Posted at 2007/12/04 23:53
- Filed under project/lucene_hadoop
The following message was posted to the hadoop mailing list.
I'd say that the current state of Hbase is more suited to offline processing than to online serving duties, but I do envision that the roadmap for Hbase could extend to cover those capabilities. Currently, however, Michael and Jim are spending most of their time stabilizing the core of the system and working on basic performance bottlenecks, especially as several large scale Hbase installations are starting to pop up and file issues.
Here are some of the things that I think would move Hbase in the right direction for online serving:
1. Atomic appends for a single writer (HADOOP-1700): We have to have atomic appends for the commit log or durability is not guaranteed. This is a pressing issue in any case for any offline processing use case that requires a 100% guarantee on durability.
2. Real-time master failover: Need to make sure there is zero downtime on failure of the HDFS master and the Hbase master. Perhaps the Zookeeper project will provide the key part of the solution although I don't have much visibility into where Zookeeper stands and what its roadmap looks like. Can anyone say anything more?
3. More performance work: Michael did some performance measurements a while back that seemed to indicate a lot of time spent back-and-forth in RPC. We're exploring Thrift as a lighter-weight RPC mechanism, but there are probably other things to be done to reduce this cost. More analysis and measurement would be helpful.
4. Tighter integration between HDFS and Hbase: Preference for running the region server on the same node as one of the replicas of the underlying tables would lower latency.
5. Memory caching: Instead of pinning a whole Hbase table in RAM, I'd recommend the use of memcached in front of Hbase to provide cached read access.
Once these things are in place, Hbase could provide a reasonably performant large-scale online serving system. The main advantages of such a system would be its flexible schema, automatic repartitioning, and centralized administration, especially when compared with a system based around many separate MySQL instances with memcached in front of them. It would not have full ACID properties but there are many interesting applications that don't require strong guarantees in those areas.
Anyone who'd like to start tackling any of the above items should feel free to chime in here or jump on the Hbase IRC - more contributors always welcome!
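The author is listed as Chad Walters. Judging from the mail, he seems to work at PowerSet, the same company where the main HBase developers work. It looks like most of the HBase development is being done at PowerSet....
In short, HBase is still hard to use for online serving, and the top priority right now is stabilizing the core modules. The team is also focusing on making it run stably on large-scale clusters.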
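Item 5 of the mail (memcached in front of Hbase for cached reads) is essentially the cache-aside pattern. Below is a minimal sketch of that idea, not anything taken from the mail or from HBase itself: the class name `ReadThroughCache`, the `load_row` callback, and the TTL are all hypothetical, and a plain in-process dict stands in for memcached so the example runs without any server.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ReadThroughCache:
    """Cache-aside reads: look in the cache first, fall back to the backing store.

    Hypothetical sketch: the dict stands in for memcached, and `load_row`
    stands in for a read against the backing table (e.g. an Hbase lookup).
    """

    def __init__(
        self,
        load_row: Callable[[str], Optional[bytes]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_row = load_row
        self._ttl = ttl_seconds
        self._clock = clock
        # row key -> (expiry time, cached value)
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, row_key: str) -> Optional[bytes]:
        now = self._clock()
        entry = self._entries.get(row_key)
        if entry is not None and entry[0] > now:
            # Fresh entry: serve from the cache without touching the store.
            self.hits += 1
            return entry[1]
        # Missing or expired: read from the store and repopulate the cache.
        self.misses += 1
        value = self._load_row(row_key)
        if value is not None:
            self._entries[row_key] = (now + self._ttl, value)
        return value

    def invalidate(self, row_key: str) -> None:
        """Drop a cached row, e.g. right after writing it to the store."""
        self._entries.pop(row_key, None)


if __name__ == "__main__":
    table = {"row1": b"value1"}  # stand-in for the backing table
    cache = ReadThroughCache(table.get, ttl_seconds=30.0)
    print(cache.get("row1"))  # first read goes to the store
    print(cache.get("row1"))  # second read is served from the cache
    print(cache.hits, cache.misses)
```

With this layout only cache misses reach the store, which is the point of the suggestion. The cost is that reads can be stale until the TTL expires or the writer invalidates the key.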
Posted by 김형준
Comments List
- As expected, it's still not ready for real-time use.