High Performance Database Load Balancing Between Data Centers
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. That's why, according to its website, it is used by 40% of the Fortune 100 companies.
Technically, Apache Cassandra is a free and open-source, distributed, wide-column store NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
This week I attended a webinar, "Best Practices of Cassandra in Production", because we use a similar stack and I wanted to find out more about it. The speaker was Adam Zegelin, co-founder of Instaclustr, a managed service provider for open technologies such as Apache Cassandra, Kafka, Elasticsearch and Redis in the cloud and on-premises.
I learned a lot about Cassandra best practices from the architecture, design and operations perspectives, but one thing I found particularly interesting was the removal of the Java driver's support for multi-region load balancing. In my previous practice, I normally tried to push load balancing features as far down the stack as possible. In the webinar, Adam told the audience that the Cassandra Java driver has dropped multi-region load balancing since V4 because the Cassandra team would like to push load balancing across data centers to the infrastructure layer, which moves the load balancing feature higher up the stack.
After diving deeper into the Cassandra Java driver documentation at https://docs.datastax.com/en/developer/java-driver/4.8/manual/core/load_balancing/#default-policy, I have a better understanding of this change.
It's all about simplicity, and about the more robust infrastructure that comes from simplifying the stack design. According to the doc, "In previous versions, the driver provided a wide variety of built-in load balancing policies; in addition, they could be nested into each other, yielding an even higher number of choices. In our experience, this has proven to be too complicated: it’s not obvious which policy(ies) to choose for a given use case, and nested policies can sometimes affect each other’s effects in subtle and hard to predict ways."
The Cassandra team has taken a more opinionated approach starting from version 4. Since V4, the driver provides a default load balancing policy intended as the best choice for most cases, while still allowing a custom implementation for special use cases.
The default policy is local-only: it allows an application to connect to a single datacenter only.
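To make the local-only idea concrete, here is a minimal sketch in plain Java. It is not the driver's implementation: the `LocalDcRoundRobin` and `Node` types, the addresses and the datacenter names are all made up for illustration. It only shows the principle that nodes outside the configured local datacenter are never selected. In the 4.x driver itself, the application names its local datacenter when building the session, for example with `CqlSession.builder().withLocalDatacenter(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: hypothetical types, not the driver's actual implementation.
class LocalDcRoundRobin {

    // A node is described by its address and the datacenter it belongs to.
    static final class Node {
        final String address;
        final String datacenter;

        Node(String address, String datacenter) {
            this.address = address;
            this.datacenter = datacenter;
        }
    }

    private final List<Node> localNodes = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    LocalDcRoundRobin(List<Node> allNodes, String localDatacenter) {
        // Nodes outside the local datacenter are dropped up front and never queried.
        for (Node node : allNodes) {
            if (node.datacenter.equals(localDatacenter)) {
                localNodes.add(node);
            }
        }
        if (localNodes.isEmpty()) {
            throw new IllegalArgumentException("No nodes in datacenter " + localDatacenter);
        }
    }

    // Returns the next local node in round-robin order.
    Node pick() {
        int index = Math.floorMod(counter.getAndIncrement(), localNodes.size());
        return localNodes.get(index);
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("10.0.1.1", "DC1"),
                new Node("10.0.2.1", "DC2"),
                new Node("10.0.1.2", "DC1"));
        LocalDcRoundRobin policy = new LocalDcRoundRobin(cluster, "DC1");
        for (int i = 0; i < 4; i++) {
            System.out.println(policy.pick().address);
        }
    }
}
```

Running the example prints only DC1 addresses, alternating between the two local nodes. The DC2 node is never picked, which is the behavior the local-only default is meant to guarantee.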
In previous versions, you could have App1 load balance or fail over its requests to Cassandra DC2 when Cassandra DC1 stopped functioning, while App2 could load balance or fail over its requests to Cassandra DC1 when Cassandra DC2 stopped functioning. That sounds great, but it adds unnecessary complexity and you rarely need it. If a highly redundant and highly available Cassandra DC1 stops functioning, that normally means there is a catastrophic issue in Region1, and you should most likely push all client traffic to Region2 as well.
The philosophy behind this Java driver change closely matches our own infrastructure experience and practice. When we designed and implemented what were once the most widely used data centers for banks and government agencies, we always had redundant tech stacks in all data centers and leveraged APM (application performance management) integration to instruct the infrastructure load balancers to switch traffic between data centers. In fact, we had multiple layers of monitoring to make sure the switchover was automatic, with few or no false positives.
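The switchover logic described above can be sketched roughly as follows. This is a hypothetical illustration rather than our production setup or any real APM or load balancer API: the `DatacenterSwitch` class, the threshold value and the rule itself are assumptions. Here, traffic moves only when every monitoring layer reports the active data center as unhealthy for several consecutive checks, which is one simple way to keep false positives low.

```java
// Illustration only: a hypothetical failover decision, not a real APM or load balancer API.
class DatacenterSwitch {
    private final String primary;
    private final String standby;
    private final int failureThreshold;
    private String active;
    private int consecutiveFailures;

    DatacenterSwitch(String primary, String standby, int failureThreshold) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be at least 1");
        }
        this.primary = primary;
        this.standby = standby;
        this.failureThreshold = failureThreshold;
        this.active = primary;
    }

    // Takes one health result per monitoring layer for the active data center
    // and returns the data center that should receive traffic afterwards.
    String report(boolean... layerHealthy) {
        if (layerHealthy.length == 0) {
            throw new IllegalArgumentException("at least one monitoring layer is required");
        }
        boolean allLayersDown = true;
        for (boolean healthy : layerHealthy) {
            if (healthy) {
                allLayersDown = false;
                break;
            }
        }
        if (!allLayersDown) {
            // Any healthy layer counts as a disagreement and resets the streak.
            consecutiveFailures = 0;
            return active;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            // Enough consecutive unanimous failures: move traffic to the other data center.
            active = active.equals(primary) ? standby : primary;
            consecutiveFailures = 0;
        }
        return active;
    }

    public static void main(String[] args) {
        DatacenterSwitch sw = new DatacenterSwitch("DC1", "DC2", 3);
        System.out.println(sw.report(false, true));  // layers disagree: stay on DC1
        System.out.println(sw.report(false, false)); // first unanimous failure
        System.out.println(sw.report(false, false)); // second
        System.out.println(sw.report(false, false)); // third: switch to DC2
    }
}
```

The important design choice is that the decision sits outside the database driver: the application keeps talking to its local data center, and the infrastructure decides where client traffic goes.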
From theory to practice, and from practice back to theory: that's how our industry evolves. Thanks to the open technology providers for enabling businesses to build more resilient and performant applications for ever more demanding business needs.
More Secure solutions are built on top of industry-proven open technologies, which we believe bring our customers the most affordable, effective and secure business solutions. Please contact us to find out how your business can benefit from that too.