Silo — Requirements and Limits

What hardware Silo needs, what it expects from its host, and what to set if you’re tuning for scale.

Hardware requirements

Silo is deliberately light: the JVM heap sets the memory floor, and all cache data lives on disk. It runs comfortably on a small VM or a shared CI node.

Resource	Minimum	Recommended
CPU	1 vCPU	2–4 vCPU — request handling is virtual-thread based and scales with cores
RAM	512 MB	1 GB — RSS stays < 200 MB idle and < 500 MB under 100 rps mixed traffic
JVM	21+	21+ (virtual threads, current Ktor; no native image needed)
Disk	`max-bytes` + reserve	Local SSD/NVMe on ext4/xfs — not NFS (see below)
Inodes	`max-entries` + reserve	One inode per cache entry; size the filesystem for the entry cap
Network	100 Mbps	1 Gbps+ for large CI fleets pushing big task outputs

Sizing the disk. Budget silo.storage.max-bytes for the cache itself plus silo.storage.reserved-free-bytes (default 5 GB) of headroom. Silo starts rejecting PUTs with 503 once a write would drop free space below the reserve, so a 100 GB cache needs at least 105 GB of volume; ~110 GB leaves some margin. At high entry counts inodes run out before bytes do; size the filesystem accordingly (see Disk / inode exhaustion).
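The budget above can be worked through as a short calculation. This is a minimal sketch, not Silo code: the class and method names are hypothetical, GB is taken as 10^9 bytes, and the directory count assumes the two-level cas/{ab}/{cd} sharding described under Filesystem with every shard directory already created.

```java
// Hypothetical sizing helper; names are illustrative and not part of Silo.
final class SiloSizing {
    static final long GB = 1_000_000_000L; // assumption: decimal gigabytes

    // Minimum volume size: the cache cap plus the free-space reserve.
    static long minVolumeBytes(long maxBytes, long reservedFreeBytes) {
        return Math.addExact(maxBytes, reservedFreeBytes);
    }

    // Minimum inode count: one per cache entry, plus the inode reserve,
    // plus the shard directories themselves.
    // Assumes 256 first-level and 65,536 second-level directories.
    static long minInodes(long maxEntries, long reservedFreeInodes) {
        long shardDirs = 256L + 256L * 256L;
        return Math.addExact(Math.addExact(maxEntries, reservedFreeInodes), shardDirs);
    }
}
```

For a 100 GB cap with the default 5 GB reserve, minVolumeBytes gives 105 GB. For 1M entries with the default 100,000-inode reserve, minInodes gives 1,165,792.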

Memory. The Docker image launches with -XX:+UseG1GC -XX:MaxRAMPercentage=75 -XX:+ExitOnOutOfMemoryError, so the heap tracks the container’s memory limit — give the container ≥ 768 MB under load and the JVM sizes itself. PUT/GET bodies are streamed, never buffered, so peak memory stays flat regardless of artifact size.

Recommended setup at a glance

Item	Recommendation
OS	Linux (any modern distro) or macOS for development
Filesystem	xfs (production) or ext4 with `dir_index` (default since 2008)
Mount options	`defaults,noatime` is fine — Silo tracks access in SQLite
`ulimit -n`	≥ 65,536
Disk free	At least `silo.storage.reserved-free-bytes` (default 5 GB) over the configured cap
Inode free	At least `silo.storage.reserved-free-inodes` (default 100,000)
Network	Behind a reverse proxy for TLS (see `docs/tls.md`)

Filesystem

ext4

Without dir_index (htree), directory lookup is O(n) and degrades sharply above ~10K entries.
With dir_index (enabled by default at mkfs since e2fsprogs 1.41), lookups are O(log n) and scale to millions per directory — but the index pages still need to be cached for that to be fast.
Silo shards cas/{ab}/{cd}/{key} to 65,536 leaf directories. At 1M entries that’s ~15 per leaf. The total file count is what matters for inode budget.
Verify dir_index is on: dumpe2fs -h /dev/sdX | grep features should list dir_index.
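To make the shard arithmetic concrete, here is a minimal sketch of the layout in Java. It assumes {ab} and {cd} are the first and second pairs of hex characters of the key; the source only gives the cas/{ab}/{cd}/{key} pattern, so that mapping and the class name are illustrative.

```java
import java.nio.file.Path;

// Hypothetical illustration of the cas/{ab}/{cd}/{key} layout.
final class ShardLayout {
    // Maps a hex key to <root>/cas/{ab}/{cd}/{key}.
    // Assumption: the shard names are the first two pairs of hex characters.
    static Path blobPath(Path root, String key) {
        if (key.length() < 4) {
            throw new IllegalArgumentException("key needs at least 4 characters");
        }
        return root.resolve("cas")
                .resolve(key.substring(0, 2))
                .resolve(key.substring(2, 4))
                .resolve(key);
    }

    // Two levels of 256 directories each give 65,536 leaves.
    static long leafDirectories() {
        return 256L * 256L;
    }

    // Average entries per leaf, assuming keys are uniformly distributed.
    static double entriesPerLeaf(long entries) {
        return (double) entries / leafDirectories();
    }
}
```

With 1M entries, entriesPerLeaf returns roughly 15.3, which matches the ~15 per leaf figure above.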

xfs

Recommended for production. Allocation groups give parallel I/O; large-file performance is excellent; fragmentation is low on write-once content-addressed workloads.
No tuning needed for Silo’s usage pattern.

APFS (macOS)

Case-insensitive by default. Hex cache keys do not collide in practice, but if you’re running test suites that hand-craft uppercase/lowercase variants you’ll want a case-sensitive volume (diskutil apfs addVolume ... -caseSensitive).
Otherwise fine for development.

NTFS (Windows)

MAX_PATH is 260 characters by default. Silo’s sharded layout keeps paths short, but the storage root path must leave headroom for cas/ab/cd/<128-char-key> (≈ 138 chars), so keep the root itself under roughly 120 characters.
Long-path opt-in via \\?\ prefix or system policy is on the roadmap for v0.2 if a Windows user needs deep roots.

overlayfs (Docker default writable layer)

rename(2) is atomic within the upper layer. Silo writes the temp file in the same shard dir as the final, so renames never cross layers.
If you bind-mount /data to a host volume (recommended), the data lives on the host filesystem and overlayfs is bypassed entirely.
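The same-directory temp file pattern can be sketched as follows. This is an illustration under stated assumptions rather than Silo's actual write path: the class name, the temp file naming, and the absence of an fsync step are all choices made for brevity, and the target is assumed to be an absolute path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: stream to a temp file beside the target, then rename.
final class AtomicBlobWriter {
    // Returns the number of bytes written.
    static long write(InputStream body, Path target) throws IOException {
        Path dir = target.getParent();
        Files.createDirectories(dir);
        // The temp file lives in the same directory, so the rename below
        // never crosses a filesystem or overlay layer boundary.
        Path tmp = Files.createTempFile(dir, ".tmp-", ".part");
        try {
            // Files.copy streams through a small buffer; the body is never
            // held in memory as a whole.
            long written = Files.copy(body, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            return written;
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because readers only ever see the final name, they observe either the complete blob or nothing.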

NFS — not supported

fsync semantics on NFS are weak; rename(2) is not atomic over RPC; locking is unreliable.
Silo refuses to start if the storage root is on nfs / nfs3 / nfs4, detected by walking /proc/self/mountinfo (Linux) and matching the deepest mount point containing the storage root.
On macOS and Windows the detection is a no-op + WARN — there is no /proc equivalent, so we cannot reliably tell. Run with care.
Override (not recommended): silo.storage.allow-unsupported-fs = true downgrades the abort to a WARN. Use only if you are absolutely sure your filesystem is not actually NFS (e.g. a containerised test harness that lies to mountinfo).
Need network-shared storage? Wait for the S3 backend (v0.2) or run Silo locally on each node.
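The deepest-mount-point match described above might look roughly like this. It is a sketch, not Silo's detector: the class and method names are hypothetical, the lines are passed in so the logic can be exercised without /proc, and octal escapes in mount points (such as \040 for a space) are not decoded.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of NFS detection from /proc/self/mountinfo lines.
final class MountInfo {
    static final Set<String> NFS_TYPES = Set.of("nfs", "nfs3", "nfs4");

    // Returns the filesystem type of the deepest mount point containing storageRoot.
    static Optional<String> fsTypeFor(List<String> mountinfoLines, Path storageRoot) {
        String bestType = null;
        int bestDepth = -1;
        for (String line : mountinfoLines) {
            String[] f = line.split(" ");
            // Field 5 is the mount point; the fs type follows the "-" separator,
            // which comes after a variable number of optional fields.
            int sep = -1;
            for (int i = 6; i < f.length - 1; i++) {
                if (f[i].equals("-")) { sep = i; break; }
            }
            if (sep < 0) continue; // malformed line, skip it
            Path mountPoint = Path.of(f[4]);
            int depth = mountPoint.getNameCount();
            // ">=" lets a later mount on the same point shadow an earlier one.
            if (storageRoot.startsWith(mountPoint) && depth >= bestDepth) {
                bestDepth = depth;
                bestType = f[sep + 1];
            }
        }
        return Optional.ofNullable(bestType);
    }

    static boolean isNfs(List<String> mountinfoLines, Path storageRoot) {
        return fsTypeFor(mountinfoLines, storageRoot).map(NFS_TYPES::contains).orElse(false);
    }
}
```

On Linux the input would come from something like Files.readAllLines(Path.of("/proc/self/mountinfo")), with the storage root resolved to an absolute path first.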

tmpfs / ramdisk

Allowed but flagged with a startup WARN. Use only for ephemeral CI scratch — there is no durability and the host will OOM as the cache fills.

Disk / inode exhaustion

Silo checks free space and inode count before accepting a PUT. A write that would drop free space below silo.storage.reserved-free-bytes or free inodes below silo.storage.reserved-free-inodes is rejected with HTTP 503 before the body is read.
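A minimal sketch of the byte-side pre-check, assuming the incoming size is known up front (for example from Content-Length). The names are hypothetical. The inode check is left out because java.nio's FileStore exposes no portable free-inode count, so it would need a platform-specific call.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the free-space pre-check before a PUT.
final class SpaceGuard {
    // True when writing incomingBytes would leave less than the reserve free.
    static boolean wouldBreachReserve(long usableBytes, long incomingBytes,
                                      long reservedFreeBytes) {
        return usableBytes - incomingBytes < reservedFreeBytes;
    }

    // Reads usable space on the filesystem holding storageRoot and applies the check.
    // A caller would answer 503 when this returns true, before reading the body.
    static boolean shouldReject(Path storageRoot, long incomingBytes,
                                long reservedFreeBytes) throws IOException {
        FileStore store = Files.getFileStore(storageRoot);
        return wouldBreachReserve(store.getUsableSpace(), incomingBytes, reservedFreeBytes);
    }
}
```

Landing exactly on the reserve is accepted in this sketch; only dropping below it is rejected.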

If the kernel returns ENOSPC (no blocks) or EDQUOT (quota) anyway, Silo:

Unreserves the in-flight allocation
Logs ERROR with key prefix and bytes
Returns 503 to the client

Cross-filesystem renames (storage root spans two mounts) trigger AtomicMoveNotSupportedException. Silo catches it, falls back to copy+delete with a WARN log, and tells you to move the root onto a single filesystem.
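The fallback can be sketched with the standard java.nio.file calls. The class name, logger name, and log wording are illustrative assumptions; only the exception type and the copy+delete behavior come from the description above.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.logging.Logger;

// Hypothetical sketch of an atomic rename with a copy+delete fallback.
final class MoveWithFallback {
    private static final Logger LOG = Logger.getLogger("silo.sketch");

    // Returns true if the move was atomic, false if it fell back to copy+delete.
    static boolean move(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            // Source and target are on different filesystems.
            LOG.warning("atomic move not supported; falling back to copy+delete. "
                    + "Move the storage root onto a single filesystem.");
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(source);
            return false;
        }
    }
}
```

The fallback path is not atomic: a reader can briefly observe a partially copied target, which is why a single-filesystem root is the real fix.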

Process / OS

File descriptor limit. Default ulimit -n 1024 is inadequate. Set LimitNOFILE=65536 in systemd, or --ulimit nofile=65536 on Docker. Silo logs the limit at startup and WARNs if it’s < 4096.
Single process per data root. Enforced by FileChannel.tryLock() on .silo.lock plus SQLite’s own WAL lock. On conflict, Silo refuses to start with ERR: storage root locked by PID N.
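TIME_WAIT exhaustion. Netty enables SO_REUSEADDR by default. For very high-churn CI clusters set net.ipv4.tcp_tw_reuse=1 on the hosts that open the connections (usually the CI agents), since the setting only affects outbound connections.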
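The single-process guard can be sketched with FileChannel.tryLock() as below. This is an assumption-laden illustration: the class and method names are hypothetical, and how the holder's PID is recorded for the error message is not shown because the source does not describe it.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Optional;

// Hypothetical sketch of the .silo.lock guard.
final class RootLock {
    // Tries to take an exclusive lock on <storageRoot>/.silo.lock.
    // An empty result means another holder already has the lock.
    static Optional<FileLock> tryAcquire(Path storageRoot) throws IOException {
        FileChannel ch = FileChannel.open(storageRoot.resolve(".silo.lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = ch.tryLock(); // null if another process holds it
            if (lock == null) {
                ch.close();
            }
            return Optional.ofNullable(lock);
        } catch (OverlappingFileLockException e) {
            // The lock is already held elsewhere in this same JVM.
            ch.close();
            return Optional.empty();
        }
    }
}
```

The lock lasts only as long as its channel stays open, so a caller would hold the returned FileLock for the life of the process and refuse to start when the result is empty.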

Ktor / Netty (3.5)

Setting	Value	Why
`SO_BACKLOG`	512	Bursty CI traffic overruns the default 128
`requestReadTimeout`	60s	Slowloris mitigation
PUT body handling	`call.receiveChannel()`	Never `receive<ByteArray>` — would buffer the whole body
`Expect: 100-continue`	Honored out of the box	Auth/size pre-check responds `100` or rejects without reading the body
HTTP/1.1 keep-alive	Unlimited per connection	Ktor default; CI clients reuse heavily

The PUT-streaming and Expect: 100-continue behavior above is in force today. The SO_BACKLOG / requestReadTimeout socket tunings are intended hardening targets — the shipped build starts Netty with engine defaults; tightening these is tracked for a follow-up.

Filesystem watchers — not used

Silo deliberately does not watch the storage tree with WatchService / inotify:

inotify drops events silently under load (above ~8K events/sec on default tuning).
macOS FSEvents and Linux inotify behave differently; cross-platform parity is fragile.
The on-demand reconciliation sweep (POST /api/storage/reconcile) plus the on-read ENOENT fallback gives stronger self-healing for far less complexity.

Drift is detected lazily on every GET (a missing blob purges its row and returns 404). To force a full pass, trigger the reconcile endpoint — there is no automatic periodic sweep.
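The lazy check on GET can be sketched as below. A plain Map stands in for the SQLite index, and Files.exists stands in for catching ENOENT when opening the blob; both substitutions, like the names, are assumptions made to keep the example self-contained.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of drift detection on read.
final class LazyDriftCheck {
    // Looks up key in the index. If the row exists but the blob file is gone,
    // the row is purged and the lookup reports a miss (the caller returns 404).
    static Optional<Path> resolve(Map<String, Path> index, String key) {
        Path blob = index.get(key);
        if (blob == null) {
            return Optional.empty(); // never cached
        }
        if (!Files.exists(blob)) {
            index.remove(key);       // drift: row without a blob
            return Optional.empty();
        }
        return Optional.of(blob);
    }
}
```

Entries that are never read keep their stale rows until a reconcile pass runs, which is the gap the reconcile endpoint exists to close.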

Performance budget

CI (bench.yml) gates these on every nightly run. A regression of more than 10% fails the build.

GET hit p99 < 50 ms for 1 MB on commodity SSD
PUT p99 < 100 ms for 1 MB
RSS < 200 MB idle, < 500 MB under 100 rps mixed
Sustained 500 MB PUT/GET must hold streaming throughput without heap growth
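One way to read the 10% gate is as a relative comparison against a stored baseline. The sketch below assumes exactly that, for metrics where lower is better (latency, RSS); how bench.yml actually picks its baseline is not stated here, and the names are hypothetical.

```java
// Hypothetical sketch of a relative regression gate.
final class BenchGate {
    // Relative slowdown of current against baseline, as a fraction (0.10 means 10%).
    static double regression(double baseline, double current) {
        return (current - baseline) / baseline;
    }

    // True when the slowdown exceeds the allowed fraction.
    static boolean fails(double baseline, double current, double maxRegression) {
        return regression(baseline, current) > maxRegression;
    }
}
```

With a 40 ms baseline, a 45 ms p99 is a 12.5% regression and fails the gate, while 43 ms (7.5%) passes.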