Fat clients: Are they useful?

In Cassandra’s distributed architecture, every node can act as a coordinator for client queries. The client-side driver picks coordinator nodes according to the configured load balancing policy and thus distributes queries across all nodes. There is some discussion in the community about the benefits of whitelisting a subset of nodes as coordinators (aka “fat clients”). This design…
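The coordinator-selection idea can be sketched as a toy Python model. It is not the API of any Cassandra driver: `RoundRobinPolicy` and `WhitelistPolicy` are hypothetical names, and real drivers also weigh data-center locality, token awareness and node health. The sketch only shows how restricting a round-robin rotation to a whitelist concentrates all coordinator work on a few nodes.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinPolicy:
    """Hypothetical round-robin coordinator picker (not a real driver class)."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("need at least one candidate coordinator")
        self._ring = cycle(self._nodes)

    def next_coordinator(self) -> str:
        # Each call hands the next query to the next node in the rotation.
        return next(self._ring)


class WhitelistPolicy(RoundRobinPolicy):
    """Round-robin restricted to a whitelisted subset of nodes."""

    def __init__(self, nodes: Iterable[str], whitelist: Iterable[str]) -> None:
        allowed = set(whitelist)
        # Keep cluster order, but drop every node that is not whitelisted.
        super().__init__(n for n in nodes if n in allowed)


cluster = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
rr = RoundRobinPolicy(cluster)
wl = WhitelistPolicy(cluster, whitelist=["10.0.0.1", "10.0.0.2"])

print([rr.next_coordinator() for _ in range(4)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4']
print([wl.next_coordinator() for _ in range(4)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2']
```

In this sketch, the whitelisted nodes absorb all coordination work: that isolates coordination load from the remaining nodes, but it also makes those few nodes a potential hotspot.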

Migrating a Cassandra cluster to Azure Premium Storage (at scale!)

Background: Office 365 uses Cassandra to learn in depth about its users. Our cluster holds 150 TB of data and runs on 300 nodes. We use Azure D14 VMs running Ubuntu, and the data is stored on locally attached SSDs. We use Spark jobs running in Azure HDInsight (HDI) to compute insights on the “big data” stored in the Cassandra cluster…

Real-World Repairs

Background: In Cassandra, anti-entropy is maintained through the repair operation. Repair is vital, yet not well supported in the Cassandra world, so making it work at scale calls for some innovative approaches. The following tools are available for running repairs on 2.1 clusters. Netflix Tickler: https://github.com/ckalantzis/cassTickler (Read at CL.ALL via CQL…
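The Tickler approach relies on read repair: a read at CL.ALL consults every replica, so any replica that disagrees can be brought up to date. The toy, in-memory Python simulation below illustrates that idea under loudly stated assumptions: replicas are plain dicts, conflicts are resolved purely by write timestamp (last write wins), and tombstones, digests and token ranges are ignored. `read_all` and `tickle` are hypothetical names; this is not the cassTickler code or any driver API.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]      # (write timestamp, value)
Replica = Dict[str, Cell]   # key -> cell held by this replica


def read_all(replicas: List[Replica], key: str) -> Optional[str]:
    """Simulated read at consistency ALL with last-write-wins reconciliation.

    Every replica is consulted; the cell with the highest timestamp wins and is
    written back to any replica that was missing it or held an older version.
    Timestamp ties are not handled specially in this sketch.
    """
    cells = [r[key] for r in replicas if key in r]
    if not cells:
        return None
    winner = max(cells, key=lambda c: c[0])
    for r in replicas:
        if r.get(key) != winner:
            r[key] = winner  # repair the stale or missing replica
    return winner[1]


def tickle(replicas: List[Replica]) -> int:
    """Read every key known to any replica at ALL; return the number of keys read."""
    keys = set().union(*replicas)
    for key in sorted(keys):
        read_all(replicas, key)
    return len(keys)


replicas = [
    {"user:1": (10, "alice")},
    {"user:1": (12, "alice-v2")},
    {},  # this replica missed the write entirely
]
tickle(replicas)
print(replicas)  # every replica now holds {'user:1': (12, 'alice-v2')}
```

The cost is visible even in the toy: every key is read from every replica, which is why such a sweep is slow and why it only helps for keys that some replica still knows about.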

Data Modeling Guidelines

Think about your access patterns before you design your schema. Unlike traditional SQL technologies, C* works best when tables are designed for the specific queries you want to run against them. This gives good in-depth coverage of everything that cqlsh syntax provides. At times, you may want to pre-aggregate or precompute the data before…
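To make the query-first idea concrete, here is a toy in-memory Python model of a table designed around one access pattern, “latest N events for a user”, plus a pre-aggregated per-day count maintained at write time. It is hypothetical: it is not CQL and not how Cassandra stores data on disk, though it loosely mirrors a table with `user_id` as the partition key and a descending timestamp clustering column. Timestamps are assumed to be Unix seconds, and the class and method names are illustrative only.

```python
from bisect import insort
from collections import defaultdict
from typing import DefaultDict, List, Tuple

SECONDS_PER_DAY = 86_400  # assumption: timestamps are Unix seconds


class EventsByUser:
    """Hypothetical table shaped for the query "latest N events for a user"."""

    def __init__(self) -> None:
        # Partition key user_id -> rows sorted by (-ts, event), i.e. newest first.
        self._partitions: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)
        # Pre-aggregated counts per (user_id, day), updated on every write.
        self._daily_counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def write(self, user_id: str, ts: int, event: str) -> None:
        insort(self._partitions[user_id], (-ts, event))
        self._daily_counts[(user_id, ts // SECONDS_PER_DAY)] += 1

    def latest(self, user_id: str, limit: int) -> List[str]:
        # Single-partition read: no join and no scan across other users.
        return [event for _, event in self._partitions.get(user_id, [])[:limit]]

    def count_on_day(self, user_id: str, day: int) -> int:
        # Answered from the pre-aggregated counts, without touching the events.
        return self._daily_counts.get((user_id, day), 0)


t = EventsByUser()
t.write("u1", 100, "login")
t.write("u1", 200, "click")
t.write("u1", 90_000, "logout")
print(t.latest("u1", 2))        # ['logout', 'click']
print(t.count_on_day("u1", 0))  # 2
```

The design choice the sketch illustrates: work moves from read time to write time, so each supported query becomes a single lookup, at the price of a separate structure per query.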

Debugging a Multi-Node Cassandra Cluster on Windows

First: run a single-node cluster and set breakpoints. Download the latest version of Cassandra from https://github.com/apache/cassandra. Click the “Clone in Desktop” button on the right side of the page for the quickest download. You can also set up “EGit” to get source control in Eclipse (similar to the Team window in VS). Follow: Eclipse…

Test your Cassandra Knowledge

Our team is responsible for running multiple Cassandra clusters reliably at scale on Azure PaaS. As a result, I have come to appreciate the technology and to empathize with its steep learning curve. The following quiz was made to help ease the onboarding pain for newcomers. Most of the questions are not a simple “How to…