Cassandra supports various strategies to partition data across the nodes in a cluster. Users can choose among the available strategies as required by an a...| distributeddatastore.blogspot.com
Scaling services is a hard problem, but operating them at scale is even harder. You need to consider the operational aspects of running a service at scale at design time. Service downtime can cause huge business losses and leads to a poor customer experience.| Key Concepts
RocksDB is developed in C/C++ and has a Java binding, which provides a set of Java classes to access RocksDB. In a Java application, memory is managed by the JVM: objects are allocated on the JVM heap, and the GC manages that heap. RocksDB, however, allocates its C/C++ objects using the OS malloc library. When RocksDB is used in a Java application, we need to be extra careful about the allocations made by RocksDB. | Key Concepts
RocksDB is a popular open source embedded key-value database.| Key Concepts
The RocksDB Put operation creates a new record in the specified DB, or overwrites the value if the key already exists. | Key Concepts
The RocksDB Get operation retrieves the record value for a given key. The RocksDB code is very flexible and has several levels of abstraction. If you plan to study the code path, you can start reading from the functions specified in this post.| Key Concepts
RocksDB and LevelDB have an abstraction called Env (environment), which provides an interface to operating-system-specific functions. This abstraction also exists in the traditional BerkeleyDB. It cleanly separates the database code from OS-specific functionality by encapsulating it.| Key Concepts
RocksDB is a popular open source embedded key-value database used by several other popular systems. It is a derivative of LevelDB, which was developed by Google. More and more open source as well as commercial systems have started using RocksDB because of its high performance, flexibility, and tuning features.| Key Concepts
Cassandra Out Of Memory (OOM) exceptions can degrade performance and reduce availability. The OOM exception is reported when Cassandra is unable to allocate heap space. Sometimes the application workload alone can cause Cassandra to quickly use all the available heap space and throw OOM exceptions. More commonly, the application workload combined with background jobs such as repair or compaction increases heap usage and eventually leads to OOM exceptions. Some of the component...| Key Concepts
Cassandra compaction is a background process that merges multiple SSTables of a Column Family into one or more SSTables to improve read performance and to reclaim the space occupied by deleted data. Cassandra supports different compaction strategies to accommodate different workloads. This blog post describes the size-tiered strategy.| Key Concepts
The Cassandra Commit Log is an append-only log that tracks the mutations made to Column Families and provides durability for those mutations. Every mutation is appended to the Commit Log before it is applied to an in-memory write-back cache called the Memtable. If a Memtable has not been flushed to disk to create an SSTable, the mutations stored in it can be lost if Cassandra shuts down suddenly or crashes. At startup, Cassandra replays the Commit Log to recover the data that was not stored to dis...| Key Concepts
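The log-then-memtable ordering and the replay at startup can be sketched in a few lines of Python. This is only a toy model, not Cassandra's implementation: the `TinyStore` class, the JSON line format, and the fsync on every write are all assumptions made for illustration.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy write path: append to a commit log, then update an in-memory memtable."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}
        self._replay()  # recover mutations that were never flushed to an SSTable

    def _replay(self):
        # Re-apply every logged mutation in the order it was written.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                mutation = json.loads(line)
                self.memtable[mutation["key"]] = mutation["value"]

    def put(self, key, value):
        # Durability first: the mutation reaches the log before the memtable.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # hypothetical policy: sync on every write
        self.memtable[key] = value


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commit.log")
        store = TinyStore(path)
        store.put("k1", "v1")
        store.put("k2", "v2")
        del store  # simulate a crash: the memtable is gone, the log survives
        recovered = TinyStore(path)
        return recovered.memtable
```

Calling `demo()` builds a store, drops it without flushing, and rebuilds the memtable purely from the log file.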
Memtables store the current mutations applied to Column Families. They function as a write-back cache and provide faster writes and faster reads of recently written data. Mutations are kept in sorted order in the Memtable using a skip list data structure. Each Column Family is associated with its own Memtable. This blog post describes the on-heap Memtable type. | Key Concepts
Recently we had a strange issue with Cassandra repair. A table is replicated across three | Key Concepts
The Cassandra SSTable storage format changed in 3.0 to support higher-level CQL structures directly at the storage engine level. The older SSTable format was designed to support a very simple model of storing basic key/value pairs, which was adequate to support the Thrift API. In 2.* and earlier versions of Cassandra, Thrift was predominantly used to define the database schema. But Thrift's column family definition is not sufficient to create tables to store structured applic...| Key Concepts
In Cassandra 2.*, a CQL table is stored using the same storage format that is used for Thrift-based column families. Cassandra stores extra information as part of the column names to identify CQL clustering keys and other CQL columns. | Key Concepts
In an eventually consistent system like Cassandra, information about deleted keys must be stored to avoid reading deleted data. When a row or column is deleted, this information is stored as a tombstone. Tombstones are retained until the gc grace period associated with the column family has elapsed. Only a major compaction removes tombstones that are older than the gc grace period. Tombstones are spread to all replicas when repair is performed. It is important to run repair regularly to eliminate resurre...| Key Concepts
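To see why a tombstone has to outlive the delete itself, here is a minimal Python sketch. It is a simplified model under stated assumptions: cells are `(value, timestamp)` pairs, replicas reconcile by last-write-wins, and the helper names (`compact`, `reconcile`, `read`) are hypothetical. The 10-day constant is used only as an illustrative gc grace period.

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # illustrative gc grace period (10 days)

TOMBSTONE = object()  # marker value recorded in place of deleted data


def compact(cells, now):
    """Drop tombstones older than the gc grace period; keep everything else.

    cells maps key -> (value_or_TOMBSTONE, write_timestamp_in_seconds).
    """
    kept = {}
    for key, (value, ts) in cells.items():
        if value is TOMBSTONE and now - ts > GC_GRACE_SECONDS:
            continue  # tombstone is past gc grace, so it can be purged
        kept[key] = (value, ts)
    return kept


def reconcile(a, b):
    """Last-write-wins merge of the cells held by two replicas."""
    merged = dict(a)
    for key, (value, ts) in b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged


def read(cells, key):
    """Return the live value for key, or None if it is absent or deleted."""
    entry = cells.get(key)
    if entry is None or entry[0] is TOMBSTONE:
        return None
    return entry[0]
```

If one replica holds `{"k": (TOMBSTONE, 100)}` and another still holds `{"k": ("old", 50)}`, reconciling them hides the old value. If the tombstone is purged before the second replica has received it, reconciling returns `"old"` again, which is the situation regular repair is meant to prevent.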
The Paxos consensus algorithm is used to achieve consensus across a set of unreliable processes that run independently. Consensus is required to agree on a data value proposed by one of the processes. Reaching consensus becomes difficult when processes fail or messages are lost or delayed. An asynchronous distributed system is an example of this kind of environment.| Key Concepts
Shamir's secret sharing method secures data by decomposing it into multiple blocks and storing them on different machines. It transforms the data into n blocks such that access to even k - 1 blocks reveals no information about the original data, while any k blocks are enough to reconstruct it. This technique is called a (k, n) threshold scheme, and it can be used to secure data without using encryption keys. | Key Concepts
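A (k, n) threshold scheme of this kind can be sketched with a random polynomial of degree k - 1 over a prime field, where the secret is the constant term and each block is one point on the polynomial. The code below is a minimal sketch, assuming the secret is an integer smaller than the chosen prime; the function names `split` and `recover` and the choice of prime are illustrative and not taken from the post. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split(secret, k, n):
    """Split an integer secret into n shares; any k of them recover it."""
    if not (1 <= k <= n < PRIME) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= k <= n and 0 <= secret < PRIME")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def recover(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, `split(secret, 3, 5)` produces five shares, and passing any three of them to `recover` returns the original integer.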
Cassandra nodetool provides a command to run repair on a specified endpoint. The repair command uses the Anti-Entropy service, which detects inconsistencies and repairs | Key Concepts
In a distributed system like Cassandra, data replication enables high availability and durability. Cassandra replicates the rows of a column family onto multiple endpoints based on the replication strategy associated with its keyspace. The endpoints that store a row are called replicas or natural endpoints for that row. The number of replicas and their locations are determined by the replication factor and the replication strategy. The replication strategy controls how the replicas are chosen and replication fact...| Key Concepts
Cassandra is a distributed hash table (DHT) based storage system in which data is dynamically partitioned across a set of storage nodes in a cluster using a data partitioning scheme. It provides two data partitioning schemes, RandomPartitioner and ByteOrderedPartitioner. This blog article provides an overview of RandomPartitioner, which is the commonly used partitioning scheme.| Key Concepts
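The general idea of hash-based partitioning on a token ring can be sketched as follows. This is a simplified illustration, not Cassandra's code: the `Ring` class, the way the MD5 digest is folded into a token, and the node names are assumptions. Each node is assigned a token, and a key belongs to the first node whose token is greater than or equal to the key's token, wrapping around at the end of the ring.

```python
import bisect
import hashlib

TOKEN_SPACE = 2**127  # assumed token range for this sketch


def token_for(key: bytes) -> int:
    # Assumption: fold the 128-bit MD5 digest into [0, 2**127).
    return int.from_bytes(hashlib.md5(key).digest(), "big") % TOKEN_SPACE


class Ring:
    """Maps tokens to nodes on a ring of sorted node tokens."""

    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token assigned to that node
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner_of_token(self, token):
        # First node whose token is >= the given token, wrapping to the start.
        idx = bisect.bisect_left(self._tokens, token)
        if idx == len(self._tokens):
            idx = 0
        return self._entries[idx][1]

    def owner(self, key: bytes):
        return self.owner_of_token(token_for(key))


# Four hypothetical nodes with evenly spaced tokens.
ring = Ring({f"node-{i}": i * TOKEN_SPACE // 4 for i in range(4)})
```

Because the hash spreads keys roughly uniformly over the token space, evenly spaced node tokens give each node a similar share of the keys.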
SSTable is an abbreviation for Sorted Strings Table. It is the fundamental storage building block in several modern Log-Structured Merge tree (LSM) based distributed database systems and key-value stores, including Cassandra and BigTable. An SSTable stores a set of immutable row fragments or partitions in sorted order. | Key Concepts
Cassandra's AntiEntropy service uses Merkle trees to detect inconsistencies in data between replicas. A Merkle tree is a hash tree in which the leaves contain hashes of individual data blocks and the parent nodes contain hashes of their respective children. It provides an efficient way to find differences in the data blocks stored on replicas and reduces the amount of data transferred to compare them. | Key Concepts
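A generic hash tree and the top-down comparison it enables can be sketched like this. It is a minimal illustration rather than Cassandra's AntiEntropy code: SHA-256, the duplicate-last-node padding for odd levels, and the `build_tree` and `diff` names are all assumptions, and both trees are assumed to cover the same number of blocks.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [_h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by repeating the last hash
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff(a_levels, b_levels):
    """Indices of the leaf blocks that differ between two equally sized trees."""
    top = len(a_levels) - 1
    if a_levels[top][0] == b_levels[top][0]:
        return []  # equal roots: no block needs to be compared
    suspects = [0]
    # Walk down from the root, following only the subtrees whose hashes differ.
    for depth in range(top, 0, -1):
        below_a, below_b = a_levels[depth - 1], b_levels[depth - 1]
        children = []
        for idx in suspects:
            for child in (2 * idx, 2 * idx + 1):
                if child < len(below_a) and below_a[child] != below_b[child]:
                    children.append(child)
        suspects = children
    return suspects
```

When the roots match, the comparison stops after a single hash. When they differ, only the mismatching subtrees are descended, so the work is proportional to the number of differing blocks rather than to the total data size.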