Monitoring

PutFS has no built-in metrics endpoint. Monitor the components it delegates to: nginx and the filesystem.

nginx

stub_status

location /nginx_status {
    stub_status;
    allow 127.0.0.1;
    deny all;
}

Exposes active connections, accepts, handled, requests. Useful for saturation alerts.
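If you poll stub_status directly rather than through an exporter, its four-line output is trivial to parse. A minimal Python sketch (field layout per the standard stub_status response):

```python
import re

def parse_stub_status(text: str) -> dict:
    """Parse nginx stub_status output into a flat metrics dict."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The three counters appear on the line after "server accepts handled requests"
    counters = re.search(r"\s*(\d+)\s+(\d+)\s+(\d+)", text.split("requests")[1])
    states = re.search(r"Reading:\s*(\d+)\s*Writing:\s*(\d+)\s*Waiting:\s*(\d+)", text)
    return {
        "active": int(active.group(1)),
        "accepts": int(counters.group(1)),
        "handled": int(counters.group(2)),
        "requests": int(counters.group(3)),
        "reading": int(states.group(1)),
        "writing": int(states.group(2)),
        "waiting": int(states.group(3)),
    }
```

A growing gap between accepts and handled indicates dropped connections, which is worth an alert on its own.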

Prometheus

nginx-prometheus-exporter scrapes stub_status and exposes standard metrics:

nginx-prometheus-exporter -nginx.scrape-uri=http://127.0.0.1/nginx_status

Access logs as event stream

With access_log off (recommended for performance), there's no request-level telemetry. If you need it, use buffered file logging:

access_log /var/log/nginx/putfs.log buffer=64k flush=5s;
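Note that nginx's default combined format carries no timing data; capturing response time requires a custom log_format with $request_time. A sketch (the putfs format name is arbitrary; the variables are standard nginx ones):

```nginx
log_format putfs '$remote_addr [$time_iso8601] "$request" '
                 '$status $body_bytes_sent $request_time';
access_log /var/log/nginx/putfs.log putfs buffer=64k flush=5s;
```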

Then ship to your pipeline:

nginx log → Vector/Fluentd → filter PUT/DELETE → webhook/Loki/S3

Each log line contains method, path, status, response time — enough to derive per-dataset write rates, error rates, and latency percentiles.
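Deriving those write-side numbers from the shipped lines is straightforward. A hypothetical Python sketch, assuming your pipeline has already reduced each event to the fields `method path status request_time` (adapt the split to your actual log_format):

```python
import statistics

def write_stats(lines):
    """Summarize write traffic from pre-split log fields.

    Assumes each line is 'METHOD path status request_time',
    e.g. 'PUT /data/a 201 0.012'.
    """
    latencies, errors, total = [], 0, 0
    for line in lines:
        method, path, status, rt = line.split()
        if method not in ("PUT", "DELETE"):
            continue  # only count writes
        total += 1
        latencies.append(float(rt))
        if status.startswith("5"):
            errors += 1
    return {
        "writes": total,
        "error_rate": errors / total if total else 0.0,
        # 95th percentile (19 cut points at n=20); needs >= 2 samples
        "p95_s": statistics.quantiles(latencies, n=20)[-1] if len(latencies) > 1 else None,
    }
```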

Warning

Logging to stdout (access_log /dev/stdout) causes a 27x throughput penalty. Always log to a buffered file if you need access logs.

ZFS

Pool I/O

zpool iostat -v tank 5       # per-vdev I/O every 5 seconds
zpool iostat -l tank 5       # per-vdev latency statistics (-w for histograms)

ARC

arcstat 1                    # live ARC hit rate, size, evictions
arc_summary                  # detailed ARC breakdown

Key metrics:

  • ARC hit rate – above 90% is good, above 98% means most reads come from RAM
  • Demand metadata hits – directory lookups / stat calls. Drops here degrade listing performance
  • L2ARC hit rate – if low, working set exceeds SSD + RAM
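For alerting, the counters arcstat reads are exposed in /proc/spl/kstat/zfs/arcstats on Linux OpenZFS. A sketch that derives the overall and demand-metadata hit rates from that file's contents (the counters are cumulative since boot, so alert on deltas between scrapes, not the lifetime ratio):

```python
def arc_hit_rates(kstat_text: str) -> dict:
    """Compute ARC hit rates from /proc/spl/kstat/zfs/arcstats content.

    kstat data lines are 'name  type  data'; header lines are skipped
    because their third field is not a plain integer.
    """
    stats = {}
    for line in kstat_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])

    def rate(hit_key, miss_key):
        hits, misses = stats.get(hit_key, 0), stats.get(miss_key, 0)
        total = hits + misses
        return hits / total if total else None

    return {
        "overall": rate("hits", "misses"),
        "demand_metadata": rate("demand_metadata_hits", "demand_metadata_misses"),
    }
```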

Prometheus

zfs_exporter exposes pool, dataset, and ARC metrics:

zfs_exporter --collector.dataset-snapshot=false

Alerting suggestions

Alert               Condition                                   Why
Disk full           zpool list -Hp capacity > 85%               PutFS returns 507 when full
ARC hit rate drop   arcstat demand_hits / demand_total < 0.9    Reads falling through to disk
Replication lag     zrepl status last snapshot > 15 min         Gap exposure increasing on failure
nginx errors        5xx rate > threshold                        API or filesystem issues
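The disk-full check can be scripted without an exporter. A sketch that parses the output of zpool list -Hp -o name,capacity (tab-separated name/capacity pairs; the trailing % is stripped defensively, since how -p renders capacity varies by version):

```python
def pools_over_capacity(zpool_output: str, threshold: int = 85):
    """Return (name, capacity) for pools above the threshold.

    Expects the output of `zpool list -Hp -o name,capacity`:
    one tab-separated 'name<TAB>capacity' line per pool.
    """
    over = []
    for line in zpool_output.strip().splitlines():
        name, cap = line.split("\t")
        pct = int(cap.rstrip("%"))
        if pct > threshold:
            over.append((name, pct))
    return over
```

Run it from cron against the live command output and page when the returned list is non-empty.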