Monitoring

PutFS has no built-in metrics endpoint. Monitor the components it delegates to: nginx and the filesystem.

nginx

stub_status

location /nginx_status {
    stub_status;
    allow 127.0.0.1;
    deny all;
}

Exposes active connections, accepts, handled, requests. Useful for saturation alerts.
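If you poll stub_status directly rather than through an exporter, its four-line output is trivial to parse. A minimal Python sketch (field layout per the standard stub_status response):

```python
import re

def parse_stub_status(text: str) -> dict:
    """Parse nginx stub_status output into a flat metrics dict."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The three counters appear on the line after "server accepts handled requests"
    counters = re.search(r"\s*(\d+)\s+(\d+)\s+(\d+)", text.split("requests")[1])
    states = re.search(r"Reading:\s*(\d+)\s*Writing:\s*(\d+)\s*Waiting:\s*(\d+)", text)
    return {
        "active": int(active.group(1)),
        "accepts": int(counters.group(1)),
        "handled": int(counters.group(2)),
        "requests": int(counters.group(3)),
        "reading": int(states.group(1)),
        "writing": int(states.group(2)),
        "waiting": int(states.group(3)),
    }
```

A growing gap between accepts and handled indicates dropped connections, which is worth an alert on its own.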

Prometheus

nginx-prometheus-exporter scrapes stub_status and exposes standard metrics:

nginx-prometheus-exporter -nginx.scrape-uri=http://127.0.0.1/nginx_status

Access logs as event stream

With access_log off (recommended for performance), there's no request-level telemetry. If you need it, use buffered file logging:

access_log /var/log/nginx/putfs.log buffer=64k flush=5s;
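Note that nginx's default combined format carries no timing data; capturing response time requires a custom log_format with $request_time. A sketch (the putfs format name is arbitrary; the variables are standard nginx ones):

```nginx
log_format putfs '$remote_addr [$time_iso8601] "$request" '
                 '$status $body_bytes_sent $request_time';
access_log /var/log/nginx/putfs.log putfs buffer=64k flush=5s;
```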

Then ship to your pipeline:

nginx log → Vector/Fluentd → filter PUT/DELETE → webhook/Loki/S3

Each log line contains method, path, status, response time — enough to derive per-dataset write rates, error rates, and latency percentiles.
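Deriving those write-side numbers from the shipped lines is straightforward. A hypothetical Python sketch, assuming your pipeline has already reduced each event to the fields `method path status request_time` (adapt the split to your actual log_format):

```python
import statistics

def write_stats(lines):
    """Summarize write traffic from pre-split log fields.

    Assumes each line is 'METHOD path status request_time',
    e.g. 'PUT /data/a 201 0.012'.
    """
    latencies, errors, total = [], 0, 0
    for line in lines:
        method, path, status, rt = line.split()
        if method not in ("PUT", "DELETE"):
            continue  # only count writes
        total += 1
        latencies.append(float(rt))
        if status.startswith("5"):
            errors += 1
    return {
        "writes": total,
        "error_rate": errors / total if total else 0.0,
        # 95th percentile (19 cut points at n=20); needs >= 2 samples
        "p95_s": statistics.quantiles(latencies, n=20)[-1] if len(latencies) > 1 else None,
    }
```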

Warning

Logging to stdout (access_log /dev/stdout) causes a 27x throughput penalty. Always log to a buffered file if you need access logs.

ZFS

Pool I/O

zpool iostat -v tank 5       # per-vdev I/O every 5 seconds
zpool iostat -l tank 5       # per-vdev latency statistics (-w for histograms)

ARC

arcstat 1                    # live ARC hit rate, size, evictions
arc_summary                  # detailed ARC breakdown

Key metrics:

  • ARC hit rate – above 90% is good, above 98% means most reads come from RAM
  • Demand metadata hits – directory lookups / stat calls. Drops here degrade listing performance
  • L2ARC hit rate – if low, working set exceeds SSD + RAM
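For alerting, the counters arcstat reads are exposed in /proc/spl/kstat/zfs/arcstats on Linux OpenZFS. A sketch that derives the overall and demand-metadata hit rates from that file's contents (the counters are cumulative since boot, so alert on deltas between scrapes, not the lifetime ratio):

```python
def arc_hit_rates(kstat_text: str) -> dict:
    """Compute ARC hit rates from /proc/spl/kstat/zfs/arcstats content.

    kstat data lines are 'name  type  data'; header lines are skipped
    because their third field is not a plain integer.
    """
    stats = {}
    for line in kstat_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])

    def rate(hit_key, miss_key):
        hits, misses = stats.get(hit_key, 0), stats.get(miss_key, 0)
        total = hits + misses
        return hits / total if total else None

    return {
        "overall": rate("hits", "misses"),
        "demand_metadata": rate("demand_metadata_hits", "demand_metadata_misses"),
    }
```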

Prometheus

zfs_exporter exposes pool, dataset, and ARC metrics:

zfs_exporter --collector.dataset-snapshot=false

Alerting suggestions

Alert               Condition                                   Why
Disk full           zpool list -Hp capacity > 85%               PutFS returns 507 when full
ARC hit rate drop   arcstat demand_hits / demand_total < 0.9    Reads falling through to disk
Replication lag     zrepl status last snapshot > 15 min         Gap exposure increasing on failure
nginx errors        5xx rate > threshold                        API or filesystem issues
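The disk-full check can be scripted without an exporter. A sketch that parses the output of zpool list -Hp -o name,capacity (tab-separated name/capacity pairs; the trailing % is stripped defensively, since how -p renders capacity varies by version):

```python
def pools_over_capacity(zpool_output: str, threshold: int = 85):
    """Return (name, capacity) for pools above the threshold.

    Expects the output of `zpool list -Hp -o name,capacity`:
    one tab-separated 'name<TAB>capacity' line per pool.
    """
    over = []
    for line in zpool_output.strip().splitlines():
        name, cap = line.split("\t")
        pct = int(cap.rstrip("%"))
        if pct > threshold:
            over.append((name, pct))
    return over
```

Run it from cron against the live command output and page when the returned list is non-empty.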