# Scaling
PutFS is single-node by design. Scale vertically first – a well-configured server with ZFS goes a long way.
Horizontal scaling (more nodes) is possible as long as a single dataset fits on a single node. Splitting one dataset across multiple nodes would require distributed consensus or a routing layer – exactly the complexity PutFS wants to avoid. Instead, route entire datasets to different servers at the load balancer level.
## Vertical scaling
| Resource | How |
|---|---|
| Storage | `zpool add` – no rebalancing, no downtime |
| Read throughput | More RAM → larger ZFS ARC |
| Write throughput | NVMe SLOG absorbs sync-write latency |
| Small-file IOPS | ZFS special vdev on NVMe/SSD |
| Concurrency | More granian workers (`--workers`) |
| Network | Bond NICs or upgrade to 10G/25G |
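Before buying hardware, confirm which resource is actually saturated. A sketch using standard OpenZFS tooling (the pool name `tank` is an assumption; output formats vary by OpenZFS version):

```shell
# Per-vdev bandwidth and IOPS, refreshed every 5 seconds:
# sustained high disk utilization points at storage, idle disks point at CPU or network
zpool iostat -v tank 5

# ARC size and hit ratio; a low hit ratio suggests more RAM will help reads
arc_summary
```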
With 64 GB of RAM, the ZFS ARC serves most read workloads entirely from memory. An NVMe special vdev moves small-file I/O and metadata off spinning disks, and an NVMe SLOG absorbs sync-write latency. At that point the bottleneck is the network, not storage.
Typical progression:
1. Start with what you have (any disk, any filesystem)
2. Add RAM (ZFS ARC serves hot reads from memory)
3. Add SSD/NVMe as ZFS special vdev (small files + metadata on flash)
4. Add NVMe SLOG (write latency → sub-millisecond)
5. Add SSD L2ARC (warm reads served from flash)
6. Add more disks to the pool (zpool add, online, no downtime)
No data migration, no rebalancing, no cluster coordination at any step.
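Steps 3–6 above each map to a single `zpool` command. A sketch, assuming a pool named `tank` and illustrative device names:

```shell
# 3. Special vdev: metadata + small blocks land on flash.
#    Mirror it - losing the special vdev loses the pool.
zpool add tank special mirror nvme0n1 nvme1n1

# 4. SLOG: sync writes acknowledge from NVMe instead of spinning disk
zpool add tank log nvme2n1

# 5. L2ARC: warm reads served from flash (safe to run unmirrored)
zpool add tank cache sda

# 6. Grow the pool online, no downtime
zpool add tank mirror sdb sdc
```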
## Disk full
PutFS returns `507 Insufficient Storage` when the filesystem is full: the upload fails cleanly, with no silent data loss and no corruption.
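Clients should treat 507 as a capacity signal rather than a transient error: retrying is pointless until an operator grows the pool. A minimal sketch (the helper name and messages are illustrative, not part of PutFS):

```shell
# Interpret the HTTP status of a PutFS upload.
check_put_status() {
  case "$1" in
    2??) echo "ok" ;;
    507) echo "disk full" ;;       # alert the operator; do not retry
    *)   echo "error $1" ;;
  esac
}

# Typical use with curl:
#   status=$(curl -s -o /dev/null -w '%{http_code}' -T file.bin http://putfs:8000/bucket-a/file.bin)
#   check_put_status "$status"
```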
## Horizontal scaling
### Read replicas
Add secondaries with replication and a load balancer. Reads round-robin, writes go to primary. See Replication for nginx/HAProxy config.
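The shape of that split, as an nginx sketch (host names are assumptions; the Replication page has the authoritative config):

```nginx
upstream putfs_read  {
    server primary:8000;
    server replica1:8000;
    server replica2:8000;
}
upstream putfs_write { server primary:8000; }

map $request_method $putfs_backend {
    default putfs_read;    # GET/HEAD round-robin across all nodes
    PUT     putfs_write;   # writes only ever hit the primary
    POST    putfs_write;
    DELETE  putfs_write;
}

server {
    location / {
        proxy_pass http://$putfs_backend;
    }
}
```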
### Bucket routing
Route different datasets to different servers at the load balancer:
```nginx
# proxy_pass with a variable resolves the target at runtime, so give each
# backend an upstream block rather than a bare host:port.
upstream bucket_a { server server1:8000; }
upstream bucket_b { server server2:8000; }

map $uri $backend {
    ~^/bucket-a/  bucket_a;
    ~^/bucket-b/  bucket_b;
    default       bucket_a;
}

server {
    location / {
        proxy_pass http://$backend;
    }
}
```
New tenants or datasets get assigned to the server with capacity. Existing data stays where it is – no rebalancing. Each server runs its own PutFS + nginx + ZFS pool independently.