Why PutFS?

Whilst this whole thing might look like an esoteric experiment that would never make it into production, here is a write-up of the reasoning behind it, why we built (and will use) it, and what we discovered along the way.

The starting point

We, the Data and Research Center, are the organization behind the open source software OpenAleph. That's a solid project with a wide community that has been used for almost a decade now in hundreds of investigative newsrooms and research teams around the world. Obviously, this requires reliable and scalable storage, and for our managed services we ran our own MinIO cluster.

With MinIO shutting down its community edition and becoming AIStor – targeting enterprises that can afford license fees exceeding our annual turnover – it was time to revisit the storage requirements for OpenAleph and rethink the deployment model entirely. After thorough research, it turned out there wasn't a solution that fit what we actually needed – even though the requirements seemed straightforward.

The actual requirements

MinIO (and similar solutions) are oversized for most projects. What a web application like OpenAleph actually needs from a storage backend is pretty simple:

  • File storage and retrieval via HTTP, fast
  • Random access and directory listing, fast
  • Incremental backups and failover replication
  • Self-hosted, full control and independence from vendors
  • If possible, S3 compatibility (but we could rewrite the storage adapter)
  • If possible, stay on HDD for budget but still get SSD-like acceleration

Hello, ZFS

That last point is where ZFS came in: with ARC, L2ARC, SLOG, and a special vdev, ZFS can be configured to behave almost like an SSD while the bulk of storage still sits on cheap spinning disks.
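A hypothetical pool layout along those lines (device names and sizes are placeholders, not our actual hardware) could look like this:

```shell
# Bulk data on an HDD mirror; metadata and small blocks on an SSD
# special vdev; optional SSD read cache (L2ARC) and sync-write log (SLOG).
zpool create tank \
    mirror /dev/sda /dev/sdb \
    special mirror /dev/nvme0n1 /dev/nvme1n1

zpool add tank cache /dev/nvme2n1                      # L2ARC read cache
zpool add tank log mirror /dev/nvme3n1 /dev/nvme4n1    # SLOG for sync writes

# Route files up to 64K entirely onto the SSD special vdev.
zfs set special_small_blocks=64K tank
```

With `special_small_blocks` set, small files and all metadata live on flash, so directory listings and small-file reads behave SSD-like while large blobs stay on the cheap HDD mirror.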

And the more we looked at it, the more we realized: all the features we need can be provided by basic UNIX tools and services, the filesystem, or the OS. File serving? nginx. Auth? nginx. Replication? rsync or zrepl. Encryption? LUKS or ZFS native encryption. Versioning? ZFS, LVM, or btrfs snapshots. Quotas? zfs set quota. Pre-signed URLs? nginx, again. Storage tiering? Guess what: ZFS. None of this requires a distributed object store with its own metadata layer.
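To illustrate how little is needed, here is a minimal nginx sketch for the read path – a hypothetical config, with placeholder paths, not our production setup:

```nginx
# Serving, listing, and auth delegated entirely to nginx.
server {
    listen 443 ssl;
    root /tank/openaleph;                 # ZFS dataset mountpoint (assumed)

    location /files/ {
        sendfile   on;                    # kernel zero-copy: fd -> socket
        tcp_nopush on;
        autoindex  on;                    # directory listing
        auth_basic "storage";
        auth_basic_user_file /etc/nginx/htpasswd;
    }
}
```

Everything in this block is stock nginx; there is no application code in the read path at all.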

What was actually missing

The conclusion was: with the filesystem handling snapshots, replication, SSD acceleration, tenant isolation via pools/datasets – we actually just want a well-tuned ZFS pool behind a well-tuned nginx. The fastest way to serve files is nginx's sendfile (kernel zero-copy from file descriptor to socket).

The only thing missing was a PUT endpoint – in other words, the write path through nginx (of course nginx has always shipped a built-in WebDAV module, but that just feels weird).1

So on a rainy afternoon in the Berlin winter, we wrote that PUT call. It's about 100 lines of async Python.
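The core of such a write path fits in a few lines. This is a sketch under our own assumptions – the function name and interface are illustrative, not PutFS's actual code: stream the request body to a temp file in the target directory, fsync, then rename atomically so readers never see a half-written object.

```python
import os
import tempfile


async def put_object(chunks, dest_path):
    """Stream an async iterator of bytes chunks to dest_path atomically.

    The temp file is created in the destination directory so the final
    os.rename() stays within one filesystem and is therefore atomic.
    """
    dest_dir = os.path.dirname(dest_path) or "."
    fd, tmp = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            async for chunk in chunks:
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())          # data is on disk before we publish it
        os.rename(tmp, dest_path)         # atomic publish: all-or-nothing
    except BaseException:
        os.unlink(tmp)                    # never leave partial objects behind
        raise
```

A failed upload leaves nothing at `dest_path`; a successful one appears in a single atomic step. That is the whole "storage format".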

And then we benchmarked it.

What we found

The benchmarks confirmed our assumption that this approach would outperform MinIO. And it does. On a dedicated server with a ZFS HDD mirror + SSD special vdev, PutFS outperformed MinIO (running on an xfs-formatted SSD) by:

  • 3–4x on concurrent small file reads
  • 2–6x on single-file writes across all sizes
  • 4–7x on directory listing
  • 3x less memory
  • 40% less CPU at peak load

This wasn't because we wrote clever code. It's because we didn't write code. MinIO (and now RustFS and similar products) maintains its own metadata store, manages its own caching, builds custom storage formats on top of raw block devices. Every read and write passes through layers of abstraction. PutFS skips all of that – reads go from disk to socket via sendfile, writes go from socket to file. There is no metadata database, no custom format, no distributed consensus.

A quick side note: if an application doesn't require much code but delegates most logic to basic UNIX tools and the operating system, it doesn't need to be written in Go or Rust to be marketed as "fast" ;-)

What we got for free

Beyond raw performance, the simplicity of PutFS turned out to have practical benefits we had wished for anyway:

Processing on the storage itself

OpenAleph processes millions of documents – OCR, text extraction, entity recognition. Previously this meant syncing data from the MinIO cluster to a processing server, doing the work, and pushing results back. Now the processing jobs can run directly on the ZFS server that holds the data. No network transfer, no intermediate storage.

Better presigned URLs

OpenAleph's UI shows document previews via presigned download links. With MinIO, these were S3 presigned URLs – time-limited but not IP-bound, not method-bound, and validated by MinIO's IAM on every request. With PutFS, we use nginx secure_link which binds the token to the client's IP address, the HTTP method, the exact file path, and a 30-second expiry. The token is verified inline by nginx (nanoseconds, no round-trip to an auth service), and the file is served via sendfile – zero Python, zero S3 overhead. A leaked link is useless: wrong IP → 403, wrong method → 403, expired → 410.
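The client side of such a token can be generated in a few lines of Python. This is a hedged sketch: the variable order in the hash input is an assumption and must match whatever `secure_link_md5` expression the nginx config actually uses; the secret is a placeholder.

```python
import base64
import hashlib


def secure_link_token(expires, uri, remote_addr, method, secret):
    """Token for nginx's secure_link module.

    Must mirror the server-side expression, e.g. (assumed):
      secure_link_md5 "$secure_link_expires$uri$remote_addr$request_method $secret";
    """
    data = f"{expires}{uri}{remote_addr}{method} {secret}".encode()
    digest = hashlib.md5(data).digest()
    # nginx compares against base64url without '=' padding
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
```

Because the client IP and HTTP method are part of the hash input, changing either invalidates the token – which is exactly the "leaked link is useless" property described above.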

Real backups

With MinIO and related systems, the backup strategy is usually the built-in site replication – backups depend on the same software, the same storage format, and the same vendor. If MinIO has a bug that corrupts data, our replicated backup has the same corruption. With PutFS, the data is plain files. Back up with restic to a completely different server running a different filesystem. Or rsync to an ext4 box. Or zfs send to an off-site ZFS pool. Or all three. The backup chain is independent of PutFS, independent of ZFS, and independent of any single tool. If ZFS has a bug, our restic backup on ext4 is unaffected. If PutFS disappears, the rsync cron job still works. That kind of supply chain isolation is impossible when the backup is "replicate the same proprietary system to another node."
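The three independent chains could be sketched like this – hostnames, repository paths, and snapshot names are placeholders:

```shell
# Chain 1: restic to a different server, different filesystem, different tool
restic -r sftp:backup1:/srv/restic backup /tank/openaleph

# Chain 2: plain rsync to an ext4 box – survives even if ZFS itself has a bug
rsync -a --delete /tank/openaleph/ backup2:/srv/openaleph/

# Chain 3: incremental ZFS replication to an off-site pool
zfs snapshot tank/openaleph@nightly
zfs send -i tank/openaleph@last tank/openaleph@nightly \
    | ssh backup3 zfs recv tank/openaleph
```

None of these commands knows PutFS exists; each restores to a plain directory tree.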

Higher security

MinIO and RustFS ship admin UIs, gRPC management APIs, IAM policy engines, custom storage formats, and built-in replication – each one a potential vulnerability. RustFS alone had path traversal, hardcoded auth tokens, and stored XSS – all within months. PutFS has none of that, by design. The entire codebase we need to audit is ~100 lines of Python. Auth is nginx (audited by millions of deployments). File serving is sendfile (kernel). Cache management, file permissions, and encryption are handled by the OS, kernel, or filesystem layer itself. The less custom code between the internet and our files, the less there is to exploit.

Tenant isolation

Tenant isolation is another area where delegating to the OS wins. MinIO handles multi-tenancy in its own IAM layer – a bug there (like the authorization bypass in RustFS) can expose one tenant's data to another. With PutFS, each tenant is a separate deployment pointing at a separate ZFS dataset. Isolation is enforced by Unix file permissions and ZFS dataset boundaries at the kernel level – not by application code. One tenant's PutFS process literally cannot read another tenant's files, even if the application has a bug, because the OS won't allow it. Multiple deployments can share the same ZFS pool while being fully isolated via dataset permissions, encryption keys, and quotas. This is the same isolation model that hosting providers have trusted for decades (hopefully).
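Provisioning a tenant in this model is a handful of commands – a hypothetical sketch with placeholder names, quotas, and paths:

```shell
# Per-tenant dataset: own quota, own encryption key, own Unix owner.
zfs create -o quota=500G \
           -o encryption=on -o keyformat=passphrase \
           tank/tenant-a

chown tenant-a:tenant-a /tank/tenant-a
chmod 700 /tank/tenant-a   # other tenants' processes cannot even stat it
```

Each tenant's PutFS instance then runs as its own Unix user against its own dataset; the kernel, not the application, enforces the boundary.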

Composability

Need WebDAV for desktop users? Add an nginx location. Need SFTP for developers? It's already a filesystem. Need object locking? chattr +i. Each feature is a single-purpose tool that does one thing well, not a checkbox in a monolithic admin UI that just gets removed when VC demands it.

No vendor lock-in

Not even PutFS lock-in. The data is a directory tree. ls works. rsync works. If we decide tomorrow that PutFS was a mistake, we tar the data and move on. No export tool, no migration path, no format conversion.

The trade-offs

PutFS is not for everyone. The honest gaps:

  • No distributed storage. Single-node by design, or one primary write node and N secondary read-only (replica) nodes. If you need data spread across multiple machines with active-active replication, use Ceph or Garage.
  • No full S3 API. We implement the core subset to experiment around (GET, PUT, DELETE, HEAD, ListObjectsV2). S3 Select, Lambda triggers, and cross-account IAM federation are out of scope.
  • No admin UI. If you need a web interface to browse buckets and manage users, build it on top or use a file manager via WebDAV.

For most self-hosted deployments that don't need active-active multi-site or a full AWS-compatible API surface, these trade-offs are worth it. We plan to run PutFS in production for OpenAleph, and the infrastructure is simpler, faster, and cheaper than the MinIO cluster it replaces.


  1. After building PutFS we found out that pretix.eu relies on a similar idea