HBase is not designed to replace MySQL entirely, and in my opinion it will not move in that direction in the future either. Here's why:

1. HBase is designed to handle large volumes of data and a large number of concurrent clients. It really shines when you operate it at scale. If an application's storage requirements do not call for a distributed database, HBase is not the right choice. There is no benefit in using an HBase cluster to store a few hundred GB of data; a single-node database can handle that easily. You can replicate that database for stronger durability and fault tolerance, but that is an entirely different discussion.

2. To achieve scalability, HBase makes trade-offs on certain properties. For instance, HBase does not provide cross-row transactions; its atomicity guarantees apply only to operations within a single row.

3. Scalability in terms of volume was one requirement that informed the design of HBase. Another was a flexible data model. HBase does not have the same data model as traditional database systems like MySQL: there are no predefined columns or types. Instead, HBase has column families (think of them as groups of columns), which are declared when the table is created. Each column family can hold an arbitrary number of columns with no strict type definition, because values are stored as raw bytes. The column names (qualifiers) are supplied at write time, not when the table is created. This allows flexibility in the kind of data that is stored.

4. HBase does not have a query language like SQL. You access data through the API it provides: get, put, and scan. It's simple but restrictive.

5. In my opinion, HBase gives you more granular control over the I/O performance you can extract from it than MySQL does. You can design your row keys and column families so that data which is accessed together is stored together on disk, which makes the disk seeks for reading that data predictable. The downside is that you have to be familiar with the internal workings of HBase to take advantage of these properties.
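To make points 3, 4, and 5 more concrete, here is a toy, in-memory model of the HBase data model written in Java (the language of HBase's native client). It is a sketch under stated assumptions, not the HBase client API: the class `ToyHTable` and its methods are hypothetical, values are `String` rather than byte arrays, and the timestamped cell versions that real HBase keeps are omitted. It shows column families fixed at table creation, qualifiers chosen at write time, access through `get`/`put`/`scan` only, and rows kept sorted by key so that a range scan reads related rows together.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical toy model of HBase's data model (NOT the real client API).
 * Layout: row key -> column family -> column qualifier -> value.
 * Simplifications: String instead of byte[], no timestamps/versions.
 */
public class ToyHTable {
    // Column families are declared up front, when the table is created.
    private final Set<String> families;
    // Rows are kept sorted by row key, as HBase keeps them.
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    public ToyHTable(String... families) {
        this.families = new HashSet<>(Arrays.asList(families));
    }

    /** Writes one cell; the qualifier (column name) is chosen here, at write time. */
    public void put(String row, String family, String qualifier, String value) {
        if (!families.contains(family)) {
            // The toy rejects writes to a column family the table did not declare.
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    /** Reads one cell, or returns null if it does not exist. */
    public String get(String row, String family, String qualifier) {
        TreeMap<String, TreeMap<String, String>> fams = rows.get(row);
        if (fams == null) return null;
        TreeMap<String, String> cols = fams.get(family);
        return cols == null ? null : cols.get(qualifier);
    }

    /** Returns the rows in [startRow, stopRow): start inclusive, stop exclusive. */
    public NavigableMap<String, TreeMap<String, TreeMap<String, String>>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        ToyHTable table = new ToyHTable("info", "stats");
        // Rows need not share columns; each write names its own qualifier.
        table.put("user#001", "info", "name", "alice");
        table.put("user#001", "info", "email", "alice@example.com");
        table.put("user#002", "info", "name", "bob");
        table.put("user#002", "stats", "logins", "17");
        table.put("order#900", "info", "total", "42.50");

        System.out.println(table.get("user#002", "stats", "logins")); // 17
        // '$' sorts immediately after '#', so this range covers exactly the "user#" prefix.
        System.out.println(table.scan("user#", "user$").keySet()); // [user#001, user#002]
    }
}
```

The `user#` prefix scan works only because rows are sorted by key; choosing row keys so that related data shares a prefix is the kind of table design point 5 refers to.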

Through Hive you can run much of the regular SQL you would run in MySQL, but the real benefit is in large-scale, non-OLTP queries such as sorting and statistical analysis, which Hive runs on Hadoop MapReduce for parallel processing. HBase is not meant to be used as a conventional database at all.
