
Performance

The full shape of one run, so you can judge whether it is honest. Not a single hero number: connections, throughput, the whole latency spread.

The headline

Connections: 36,000
Durable writes / sec: 325,000
p50: 94 ms
p95: 111 ms
p99: 201 ms

Every latency is end-to-end, including replication and both fsyncs, over mTLS on the client and replication paths. That is encrypted, durable, replicated throughput, not a page-cache number you cannot trust.
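
As a rough consistency check on the headline figures, Little's law (L = λW) ties in-flight requests, throughput, and mean latency together. The sketch below assumes a closed-loop load with 36,000 writes in flight, as the Method section states, and treats 325,000 writes a second as the steady-state rate. The function name is illustrative and not part of any Celeriant tooling.

```python
def implied_mean_latency_ms(in_flight: int, writes_per_sec: float) -> float:
    """Little's law: mean latency W = L / lambda, returned in milliseconds."""
    if writes_per_sec <= 0:
        raise ValueError("writes_per_sec must be positive")
    return in_flight / writes_per_sec * 1000.0


# Headline figures from this page.
mean_ms = implied_mean_latency_ms(in_flight=36_000, writes_per_sec=325_000)
print(f"implied mean latency: {mean_ms:.1f} ms")  # prints "implied mean latency: 110.8 ms"
```

The implied mean of about 111 ms lands inside the reported spread (p50 94 ms, p99 201 ms). A value far outside that range would suggest the connection count, throughput, and latencies did not come from the same run.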

Method

Concurrency: 36,000 durable writes in flight at once, across four load-generating clients. A saturation number well past Postgres's connection wall, not a single-threaded ping.
Payload: one "Hello World" event per acknowledged write
Hardware: two AWS i4i.16xlarge data nodes: 64 vCPU, four local NVMe drives striped RAID0
Network: ap-southeast-2, single availability zone
Security: mTLS on client connections and on cluster replication
Batching: the server amortises fsync and replication across concurrent writes; each client write is still acknowledged on its own
Write path: every write is fdatasync'd to disk on both nodes through Direct I/O, replicated to the follower, and acknowledged only after both succeed
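
The batching row describes what is often called group commit: many concurrent writes share one sync, yet each caller is acknowledged individually. The following is a minimal single-node sketch of that idea, not Celeriant's implementation. It uses ordinary buffered file writes rather than Direct I/O, leaves out replication and error propagation, and the class and method names (`GroupCommitLog`, `append`, `sync_calls`) are hypothetical.

```python
import os
import queue
import threading

# fdatasync is missing on some platforms (e.g. macOS); fall back to fsync there.
_sync = getattr(os, "fdatasync", os.fsync)


def _write_all(fd: int, data: bytes) -> None:
    """Loop until every byte is written; os.write may write only part."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


class GroupCommitLog:
    """Append-only log where one sync covers every write queued behind it."""

    def __init__(self, path: str) -> None:
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._pending: queue.Queue = queue.Queue()
        self.sync_calls = 0  # exposed so the amortisation is observable
        self._flusher = threading.Thread(target=self._run, daemon=True)
        self._flusher.start()

    def append(self, payload: bytes) -> None:
        """Block until this payload has been synced; each caller gets its own ack."""
        done = threading.Event()
        self._pending.put((payload, done))
        done.wait()

    def close(self) -> None:
        self._pending.put(None)  # sentinel: tell the flusher to stop
        self._flusher.join()
        os.close(self._fd)

    def _run(self) -> None:
        while True:
            item = self._pending.get()  # wait for at least one write
            if item is None:
                return
            batch, stop = [item], False
            while True:  # drain whatever else arrived in the meantime
                try:
                    nxt = self._pending.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            _write_all(self._fd, b"".join(payload for payload, _ in batch))
            _sync(self._fd)  # one sync for the whole batch
            self.sync_calls += 1
            for _, done in batch:
                done.set()  # acknowledge every write individually
            if stop:
                return
```

With many threads calling `append` at once, `sync_calls` typically ends up well below the number of writes, which is the amortisation the Batching row refers to. A replicated version would also wait for the follower's acknowledgement before setting each event.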

Cost

The two i4i.16xlarge data nodes run $13.16 an hour on-demand in ap-southeast-2, about $9,600 a month, before reserved or spot discounts.

It scales down hard. Two i4i.large cost about $300 a month and still hold 30,000 durable writes a second at p99 158 ms. Same architecture, same write path, smaller box.
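
The cost figures above can be turned into a cost per write with simple arithmetic. This sketch assumes a 730-hour month (8,760 hours a year divided by 12) and, unrealistically, that the cluster runs at its benchmarked rate all month, so treat the result as a lower bound on cost per write. The helper names are illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def monthly_cost(hourly_usd: float) -> float:
    """On-demand monthly cost for a given hourly rate."""
    return hourly_usd * HOURS_PER_MONTH


def usd_per_million_writes(monthly_usd: float, writes_per_sec: float) -> float:
    """Cost per million writes if the cluster ran at this rate all month."""
    writes_per_month = writes_per_sec * 3600 * HOURS_PER_MONTH
    return monthly_usd / (writes_per_month / 1_000_000)


large_monthly = monthly_cost(13.16)  # two i4i.16xlarge, hourly figure from this page
large_unit = usd_per_million_writes(large_monthly, 325_000)
small_unit = usd_per_million_writes(300, 30_000)  # two i4i.large, ~$300 a month

print(f"two i4i.16xlarge: ${large_monthly:,.0f} a month, ${large_unit:.4f} per million writes")
print(f"two i4i.large:    $300 a month, ${small_unit:.4f} per million writes")
```

Under those assumptions the large pair works out to roughly 1.1 cents per million durable writes and the small pair to roughly 0.4 cents, so the smaller boxes are cheaper per write as well as per month.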

Why it is this fast

It is not clever code; it is architectural alignment. An i4i.16xlarge pairs 64 vCPUs with local NVMe drives, and modern Linux adds io_uring on top; many databases were designed before that combination existed and leave it idle. Celeriant is built backward from it: Direct I/O, thread-per-core, batched fsync and replication, kernel TLS (kTLS) offload. See Durability and safety for the mechanism.

Reproduce it

The benchmark is meant to be re-run, not taken on faith. One sweep on AWS, reproducible for a few dollars: stand up the two nodes and a load generator and check the number yourself. Tested in a single availability zone; expect higher latency and lower throughput across availability zones.
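
When you re-run the benchmark, you will need to reduce your recorded latencies to the same p50, p95, and p99 summary. Here is a minimal sketch using the nearest-rank method. The page does not say which percentile method the published figures use, so small differences are possible, and how the samples are collected is left to your load generator. The function names are hypothetical.

```python
import math


def percentile(samples_ms, pct: int):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples_ms):
    """Return the three percentiles reported in the headline."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

Calling `summarize` on the full list of end-to-end latencies, with no warm-up samples silently dropped, gives figures that can be compared against the headline.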

Pre-release

These figures are from the current pre-1.0 build on the configuration above. This is a small-payload, write-rate-bound test; large payloads become bandwidth-bound. Your workload, payload size, and hardware will move the number; the method is what lets you predict which way.
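
One rough way to predict which way the number moves with payload size is to take the lower of two ceilings: the write-rate cap measured here and the rate the network link can carry at a given payload size. The sketch below assumes the link is the only bandwidth limit and ignores TLS, framing, and replication overhead. The 10 Gbit/s figure is an arbitrary placeholder, not the bandwidth of the tested instances; substitute the figure for your own hardware.

```python
def write_rate_ceiling(payload_bytes: float, rate_cap_per_sec: float,
                       link_bytes_per_sec: float) -> float:
    """Upper bound on writes/sec: the lower of the write-rate cap and the bandwidth cap."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return min(rate_cap_per_sec, link_bytes_per_sec / payload_bytes)


def crossover_payload_bytes(rate_cap_per_sec: float, link_bytes_per_sec: float) -> float:
    """Payload size above which bandwidth, not write rate, becomes the limit."""
    return link_bytes_per_sec / rate_cap_per_sec


RATE_CAP = 325_000            # durable writes/sec, from this page
LINK_BYTES = 10e9 / 8         # placeholder: 10 Gbit/s, NOT the tested instance's bandwidth

for size in (11, 1_024, 65_536):
    ceiling = write_rate_ceiling(size, RATE_CAP, LINK_BYTES)
    print(f"{size:>6} B payload -> at most {ceiling:,.0f} writes/sec")
```

Below the crossover size the benchmarked write rate is the limit; above it, the ceiling falls in proportion to payload size.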