Scaling

PutFS is single-node by design. Scale vertically first – a well-configured server with ZFS goes a long way.

Horizontal scaling (more nodes) is possible as long as a single dataset fits on a single node. Splitting one dataset across multiple nodes would require distributed consensus or a routing layer – exactly the complexity PutFS wants to avoid. Instead, route entire datasets to different servers at the load balancer level.

Vertical scaling

Resource           How
Storage            zpool add – no rebalancing, no downtime
Read throughput    More RAM → larger ZFS ARC
Write throughput   NVMe SLOG absorbs sync-write latency
Small-file IOPS    ZFS special vdev on NVMe/SSD
Concurrency        More granian workers (--workers)
Network            Bond NICs or upgrade to 10G/25G

A machine with 64 GB of RAM serves most read workloads entirely from the ZFS ARC. Adding an NVMe special vdev moves small-file I/O and metadata off spinning disks, and an NVMe SLOG absorbs sync-write latency. At that point the bottleneck is the network, not storage.

Typical progression:
1. Start with what you have (any disk, any filesystem)
2. Add RAM (ZFS ARC serves hot reads from memory)
3. Add SSD/NVMe as ZFS special vdev (small files + metadata on flash)
4. Add NVMe SLOG (write latency → sub-millisecond)
5. Add SSD L2ARC (warm reads served from flash)
6. Add more disks to the pool (zpool add, online, no downtime)
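Assuming a pool named tank, steps 3–6 each map to a single online zpool command (device names are illustrative):

```shell
# Step 3 – special vdev for metadata + small files.
# Mirror it: losing a special vdev loses the pool.
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

# Step 4 – SLOG to absorb sync-write latency
zpool add tank log /dev/nvme2n1

# Step 5 – L2ARC read cache (safe to lose, no mirror needed)
zpool add tank cache /dev/sdb

# Step 6 – grow the pool with another mirror, no downtime
zpool add tank mirror /dev/sdc /dev/sdd
```

All of these take effect immediately; none of them require exporting the pool or stopping PutFS.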

No data migration, no rebalancing, no cluster coordination at any step.

Disk full

PutFS returns HTTP 507 Insufficient Storage when the filesystem is full. Writes fail cleanly – no silent data loss, no corruption.
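A client script should treat 507 differently from transient server errors – retrying cannot help until the pool is grown. A minimal sketch (the curl invocation in the comment uses a hypothetical host and bucket):

```shell
# Decide what to do with the HTTP status from a PUT to PutFS
classify_put_status() {
    case "$1" in
        507) echo "storage-full"  ;;  # pool is full – grow it (zpool add), don't retry
        2*)  echo "ok"            ;;
        5*)  echo "retry-later"   ;;  # other 5xx errors may be transient
        *)   echo "client-error"  ;;
    esac
}

# Typical use (host and path are illustrative):
#   status=$(curl -s -o /dev/null -w '%{http_code}' -X PUT \
#            --data-binary @backup.tar http://putfs-host:8000/bucket-a/backup.tar)
#   classify_put_status "$status"
```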

Horizontal scaling

Read replicas

Add secondaries with replication and put a load balancer in front. Reads round-robin across all nodes; writes go to the primary. See Replication for the full nginx/HAProxy configs.
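The read/write split can be sketched in nginx with a map on the request method (server names are placeholders; the Replication page has the complete config):

```nginx
# In the http {} context
upstream putfs_read  { server primary:8000; server replica1:8000; server replica2:8000; }
upstream putfs_write { server primary:8000; }

# GET/HEAD round-robin across replicas; everything else hits the primary
map $request_method $putfs_backend {
    default  putfs_write;
    GET      putfs_read;
    HEAD     putfs_read;
}

server {
    listen 80;
    location / {
        proxy_pass http://$putfs_backend;
    }
}
```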

Bucket routing

Route different datasets to different servers at the load balancer:

# In the http {} context
map $uri $backend {
    ~^/bucket-a/  server1:8000;
    ~^/bucket-b/  server2:8000;
    default       server1:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://$backend;
    }
}

New tenants or datasets get assigned to the server with capacity. Existing data stays where it is – no rebalancing. Each server runs its own PutFS + nginx + ZFS pool independently.

Further reading