Thursday, November 11, 2010

10 rules for choosing a scalable DBMS suitable for your OLTP application

I've just read "Ten Rules for Scalable Performance in Simple Operation Datastores" by Michael Stonebraker and Rick Cattell. We had just been discussing scalability at work, so the article came at exactly the right time.
The authors state 10 rules for scalable systems that perform simple operations on data (reading or writing a few items).
In short, the rules are:

  1. Only shared-nothing systems can scale well (given adequate table partitioning and replication of read-mostly data). Shared-disk systems suffer from synchronization overhead.
  2. High-level languages can be fast. Compare the slow relational DBMSs with the fast navigational ones (network or hierarchical): which of them won most of the market share? At first SQL DBMSs were considered extremely slow, but by now the optimizer is much smarter than most programmers. The API matters too. However, stored procedures are necessary to achieve adequate performance, because they eliminate communication and query compilation overhead.
  3. Your application (or DBMS) should leverage huge memory sizes (with 64 GB on each of 16 hosts, a terabyte of main memory is not a dream). However, existing DBMSs have a lot of overhead in query processing: locking, logging, buffer pool maintenance and multi-threading (latching) overhead.
  4. HA and automatic recovery are necessary for a scalable application. HA includes automatic data repartitioning and normal operation during DoS attacks, hardware failures, lost network connectivity, application errors, DBMS errors and so on. It sounds like a dream and is not fully achievable. But at least we should avoid outages in most situations, and part of the recovery should be done automatically (e.g. data repartitioning after a node failure).
  5. On-line everything. All operations should be done on-line: adding nodes, performing updates, changing the data schema and so on.
  6. Avoid multi-node operations. Choose the partitioning key carefully to avoid multi-node operations and synchronization. Such operations just can't be fast.
  7. Don't build ACID yourself. If you need it (and in most cases you do), use available solutions (DBMSs). The CAP theorem is not always the main concern: ACID is more valuable than tolerance of cluster partitioning. Modern LAN technologies let you build a redundant network, and the loss of a WAN-connected office is not a great problem; transparently using WAN-connected nodes in your cluster is madness anyway because of network delays. If the connection breaks, replication can be restarted later.
  8. The system must not have a lot of knobs. It should be easy to administer and able to tune itself. I wish my DBMS were so clever :)
  9. Node performance should be considered. 20 monster nodes are much better than 1000 pieces of half-broken, outdated junk. And supporting them is usually much cheaper.
  10. Open source makes you free. No comment. I'd like to get rid of our outdated Oracle installation. We can't upgrade it because that's too expensive, and we can't rewrite our system from scratch...
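
To make rules 1 and 6 a bit more concrete, here is a minimal sketch of hash partitioning by key. It is my own illustration, not something from the article: the node names, the `node_for` and `nodes_touched` helpers and the "customer id as partitioning key" layout are all assumptions. The point is only that when everything an operation needs shares one partitioning key, it lands on a single node, while an operation over several keys may have to touch several nodes.

```python
import hashlib

# Hypothetical cluster of shared-nothing nodes (names are made up).
NODES = ["node0", "node1", "node2", "node3"]


def node_for(key: str, nodes=NODES) -> str:
    """Map a partitioning key to exactly one node using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take it modulo
    # the number of nodes. The same key always maps to the same node.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def nodes_touched(keys, nodes=NODES) -> set:
    """Return the set of nodes an operation over the given keys must visit."""
    return {node_for(key, nodes) for key in keys}


if __name__ == "__main__":
    # An order and its line items, all partitioned by the same customer id:
    # the whole operation stays on one node.
    print(nodes_touched(["customer-42", "customer-42", "customer-42"]))

    # An operation involving many different customers may span several nodes
    # and then needs cross-node coordination, which is what rule 6 warns about.
    print(nodes_touched(f"customer-{i}" for i in range(100)))
```

Plain modulo hashing like this reshuffles most keys when a node is added, so a real system would more likely use something like consistent hashing or range partitioning to keep on-line repartitioning (rules 4 and 5) cheap.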

I can't help but agree with them...