What is RocksDB?
RocksDB is an embedded key-value store developed by Facebook (now Meta), originally forked from Google's LevelDB. It is, or has been, the storage engine inside many large-scale systems: Kafka Streams, CockroachDB (before its switch to Pebble), Cassandra (in some setups), MyRocks, TiKV, and countless service-internal stores.
This page is a tour of the engine's mental model. For the canonical reference, see the RocksDB Wiki and the RocksDB Overview page.
Key properties
- Embedded. Runs in-process, no separate server. Your application is the database.
- Persistent key-value store. Both keys and values are arbitrary byte strings.
- Ordered by key. Keys are stored in sorted order (byte-wise lexicographic by default). This makes range scans first-class.
- Log-Structured Merge tree. Writes append to an in-memory `MemTable` and a Write-Ahead Log (WAL), then flush to immutable on-disk SST files that are merged by background compaction. See the LSM-tree wiki entry and the Compaction overview for the gory details.
- Optimised for flash and RAM. Sequential I/O, configurable block cache, compressible SST files.
- Tunable. Read amplification, write amplification, and space amplification can be traded off via options. See the Tuning Guide.
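To make "ordered by key" concrete, here is a minimal sketch in plain Python. It is a toy model, not RocksDB or RocksDB-Sharp code: a sorted list of byte strings stands in for the key space, and the `range_scan` helper is a hypothetical name introduced only for this illustration.

```python
import bisect

# Toy stand-in for an ordered key space: a sorted list of byte-string keys.
keys = sorted([b"user:10", b"user:2", b"user:9", b"order:1", b"user:1"])

def range_scan(sorted_keys, start, end):
    """Return keys k with start <= k < end, using binary search on the sorted list."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]

# All keys with the prefix b"user:"; b";" is the byte right after b":".
user_keys = range_scan(keys, b"user:", b"user;")
print(user_keys)  # [b'user:1', b'user:10', b'user:2', b'user:9']
```

Note that `user:10` sorts before `user:2`: the comparison is on bytes, not on numbers, which is why numeric key components are usually stored in a fixed-width encoding.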
How a write moves through the engine
- The WAL is what makes a write durable across crashes: on restart, RocksDB replays it to rebuild MemTable contents that were never flushed. The WAL wiki covers its format and recovery semantics.
- The MemTable is an in-memory sorted structure (skiplist by default) that absorbs new writes.
- SST files ("Sorted String Table") are the on-disk immutable files. Compaction merges them and discards obsolete versions. See Creating and Ingesting SST files.
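The three pieces above can be sketched as a toy model in plain Python. This is an illustration of the idea only, under heavy simplifying assumptions: everything lives in memory, there is no concurrency and no deletion, and the `ToyLsm` class with its `memtable_limit` parameter is hypothetical, not part of RocksDB or RocksDB-Sharp.

```python
import bisect

class ToyLsm:
    """Toy model of the write path: WAL append, MemTable insert, flush to immutable files."""

    def __init__(self, memtable_limit=2):
        self.wal = []         # stand-in for the on-disk write-ahead log
        self.memtable = {}    # stand-in for the sorted in-memory structure
        self.sst_files = []   # oldest first; each is an immutable sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))   # 1. log first, so the write can be replayed after a crash
        self.memtable[key] = value      # 2. then apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. write the MemTable out as one sorted, immutable "SST file"
        self.sst_files.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.wal = []                   # flushed data no longer needs the log

    def get(self, key):
        if key in self.memtable:        # newest data first
            return self.memtable[key]
        for sst in reversed(self.sst_files):   # then newest file to oldest
            i = bisect.bisect_left(sst, (key,))
            if i < len(sst) and sst[i][0] == key:
                return sst[i][1]
        return None

    def compact(self):
        # 4. merge all files, keeping only the newest version of each key
        merged = {}
        for sst in self.sst_files:      # oldest to newest, so later values win
            merged.update(sst)
        self.sst_files = [sorted(merged.items())]

db = ToyLsm(memtable_limit=2)
db.put(b"a", b"1")
db.put(b"b", b"2")   # second write fills the MemTable and triggers a flush
db.put(b"a", b"3")
db.put(b"c", b"4")   # second flush; two files now hold versions of b"a"
```

After these four writes the toy store holds two files, and `get(b"a")` finds the newer value first. Calling `compact()` collapses them into one file and drops the obsolete version of `b"a"`.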
What RocksDB-Sharp wraps
RocksDB exposes a stable C API (include/rocksdb/c.h) on top of its C++ core. RocksDB-Sharp is a P/Invoke binding over that C API, plus an idiomatic C# class hierarchy on top.
| Native | RocksDB-Sharp |
|---|---|
| `rocksdb_t*` | `RocksDb` |
| `rocksdb_options_t*` | `DbOptions` |
| `rocksdb_readoptions_t*` | `ReadOptions` |
| `rocksdb_writeoptions_t*` | `WriteOptions` |
| `rocksdb_iterator_t*` | `Iterator` |
| `rocksdb_writebatch_t*` | `WriteBatch` |
| `rocksdb_writebatch_wi_t*` | `WriteBatchWithIndex` |
| `rocksdb_snapshot_t*` | `Snapshot` |
| `rocksdb_column_family_handle_t*` | `ColumnFamilyHandle` |
| `rocksdb_checkpoint_t*` | `Checkpoint` |
| `rocksdb_sstfilewriter_t*` | `SstFileWriter` |
| `rocksdb_mergeoperator_t*` | `MergeOperator` |
| `rocksdb_transaction_log_iterator_t*` | `TransactionLogIterator` |
Features you get for free
The binding exposes essentially the full RocksDB C API. A non-exhaustive list:
- Column families — multiple logical key spaces in one physical DB.
- Merge operator — read-modify-write without the read.
- Snapshots — consistent point-in-time reads.
- Iterators — ordered scans with bounds, prefix mode, and tailing mode.
- Checkpoints — hard-link-based fast backups.
- SST file ingestion — bulk-load pre-built files.
- TTL databases — automatic expiry of old keys.
- Read-only and secondary instances — multi-process read access and live followers.
- Block cache, Bloom filters, prefix extractors, and other tuning knobs.
- WAL replication via `GetUpdatesSince`.
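The merge operator entry in the list above ("read-modify-write without the read") is the least obvious one, so here is a minimal sketch of the idea in plain Python. It is a toy model under stated assumptions, not the RocksDB-Sharp `MergeOperator` API: the `ToyMergeStore` class and the `add_counter` function are hypothetical names, and operands are folded only at read time, whereas a real engine may also fold them during compaction.

```python
class ToyMergeStore:
    """Toy model of a merge operator: writers append operands, readers fold them."""

    def __init__(self, merge_fn):
        self.merge_fn = merge_fn   # (existing_value_or_None, operand) -> new value
        self.base = {}             # key -> last fully written value
        self.operands = {}         # key -> pending merge operands, oldest first

    def put(self, key, value):
        self.base[key] = value
        self.operands.pop(key, None)   # a full write supersedes pending operands

    def merge(self, key, operand):
        # No read happens here: the operand is simply recorded.
        self.operands.setdefault(key, []).append(operand)

    def get(self, key):
        value = self.base.get(key)
        for operand in self.operands.get(key, []):
            value = self.merge_fn(value, operand)
        return value


def add_counter(existing, operand):
    # Hypothetical merge function: values are integers, operands are deltas.
    return (existing or 0) + operand

store = ToyMergeStore(add_counter)
store.merge(b"hits", 1)   # neither call reads the current value
store.merge(b"hits", 4)
total = store.get(b"hits")   # operands are folded here: 0 + 1 + 4
```

The writer never pays for a lookup; the cost of combining operands moves to the reader.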
What RocksDB is not
- Not a SQL database. No query planner, no joins, no schemas. You design key encodings yourself.
- Not distributed. RocksDB runs on one machine, in one process. Distribution is layered on top by the consumer (e.g. TiKV shards RocksDB instances behind Raft).
- Not a network service. Reads and writes are in-process method calls. To expose RocksDB over the network you build the service yourself, and you will almost certainly want to follow an approach like the one in the Replication guide.
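As an illustration of what "you design key encodings yourself" can mean in practice, the sketch below packs a two-part key so that byte order matches logical order. It is plain Python and only one possible scheme: the `(tenant_id, seq)` layout and the helper names `encode_key`, `decode_key`, and `tenant_bounds` are assumptions made up for this example.

```python
import struct

def encode_key(tenant_id: int, seq: int) -> bytes:
    """Pack (tenant_id, seq) as big-endian unsigned integers (4 + 8 bytes)."""
    return struct.pack(">IQ", tenant_id, seq)

def decode_key(key: bytes):
    """Inverse of encode_key; returns a (tenant_id, seq) tuple."""
    return struct.unpack(">IQ", key)

def tenant_bounds(tenant_id: int):
    """Half-open [lower, upper) key range covering every seq of one tenant.

    Assumes tenant_id is below the 32-bit maximum, so tenant_id + 1 still fits.
    """
    return struct.pack(">I", tenant_id), struct.pack(">I", tenant_id + 1)

rows = [(2, 1), (1, 300), (1, 2), (1, 10)]
ordered = sorted(encode_key(t, s) for t, s in rows)   # byte-wise sort, as the engine would do
decoded = [decode_key(k) for k in ordered]

lower, upper = tenant_bounds(1)
tenant_one = [decode_key(k) for k in ordered if lower <= k < upper]
```

Because the integers are fixed-width and big-endian, sorting the encoded bytes yields `(1, 2), (1, 10), (1, 300), (2, 1)`, and all rows of one tenant form a contiguous range that a bounded iterator could scan.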
Next: opening a database
Now that you know what's inside, jump into Opening a database to see the different Open overloads — read-write, read-only, secondary, TTL, and column-family-aware.