Silo — Requirements and Limits
What hardware Silo needs, what it expects from its host, and what to set if you’re tuning for scale.
Hardware requirements
Silo is deliberately light — the JVM heap is the floor and everything else is disk. It runs comfortably on a small VM or a shared CI node.
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 1 vCPU | 2–4 vCPU — request handling is virtual-thread based and scales with cores |
| RAM | 512 MB | 1 GB — RSS stays < 200 MB idle and < 500 MB under 100 rps mixed traffic |
| JVM | 21+ | 21+ (virtual threads, current Ktor; no native image needed) |
| Disk | max-bytes + reserve | Local SSD/NVMe on ext4/xfs, not NFS (see below) |
| Inodes | max-entries + reserve | One inode per cache entry; size the filesystem for the entry cap |
| Network | 100 Mbps | 1 Gbps+ for large CI fleets pushing big task outputs |
Sizing the disk. Budget silo.storage.max-bytes for the cache itself plus
silo.storage.reserved-free-bytes (default 5 GB) of headroom. Silo starts
returning 503 once free space drops below the reserve, so a 100 GB cache
needs at least 105 GB of volume; ~110 GB leaves a margin. With many small
entries, inodes can run out before bytes do; size the filesystem accordingly
(see Disk / inode exhaustion).
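
The sizing rule above is plain addition, sketched here so the numbers can be checked for other cache sizes. The class and method names are hypothetical, and treating "GB" as decimal gigabytes is an assumption; Silo's own units may differ.

```java
/** Volume sizing sketch: cache cap plus the reserved-free headroom. */
final class VolumeSizing {
    // Decimal gigabytes; an assumption, the docs do not say GB vs GiB.
    static final long GB = 1_000_000_000L;

    /** Smallest volume (bytes) that holds a full cache and still keeps the reserve free. */
    static long minVolumeBytes(long maxBytes, long reservedFreeBytes) {
        return Math.addExact(maxBytes, reservedFreeBytes);
    }

    /** Smallest inode budget: one inode per entry plus the inode reserve. */
    static long minInodes(long maxEntries, long reservedFreeInodes) {
        return Math.addExact(maxEntries, reservedFreeInodes);
    }

    public static void main(String[] args) {
        // 100 GB cache with the default 5 GB reserve.
        System.out.println(minVolumeBytes(100 * GB, 5 * GB) / GB + " GB minimum");
        // 1M entries with the default 100,000 inode reserve.
        System.out.println(minInodes(1_000_000L, 100_000L) + " inodes minimum");
    }
}
```

Directories in the shard tree also consume inodes, so treat the inode figure as a lower bound.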
Memory. The Docker image launches with
-XX:+UseG1GC -XX:MaxRAMPercentage=75 -XX:+ExitOnOutOfMemoryError, so the heap
tracks the container’s memory limit — give the container ≥ 768 MB under load and
the JVM sizes itself. PUT/GET bodies are streamed, never buffered, so peak
memory stays flat regardless of artifact size.
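
To illustrate why streaming keeps memory flat, here is a minimal sketch of a fixed-buffer copy. It uses plain java.io rather than Silo's Ktor channels, and the class name and buffer size are illustrative assumptions; the point is only that the working set is one buffer, whatever the body size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamingCopy {
    /** Copies in to out through a single fixed-size buffer; returns the bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize]; // the only allocation, independent of body size
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }
}
```

Reading the whole body into a byte array first would instead allocate memory proportional to the artifact.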
Recommended setup at a glance
| Item | Recommendation |
|---|---|
| OS | Linux (any modern distro) or macOS for development |
| Filesystem | xfs (production) or ext4 with dir_index (default since 2008) |
| Mount options | defaults,noatime is fine — Silo tracks access in SQLite |
| ulimit -n | ≥ 65,536 |
| Disk free | At least silo.storage.reserved-free-bytes (default 5 GB) over the configured cap |
| Inode free | At least silo.storage.reserved-free-inodes (default 100,000) |
| Network | Behind a reverse proxy for TLS (see docs/tls.md) |
Filesystem
ext4
- Without dir_index (htree), directory lookup is O(n) and degrades sharply above ~10K entries.
- With dir_index (enabled by default at mkfs since e2fsprogs 1.41), lookups are O(log n) and scale to millions per directory, but the index pages still need to be cached for that to be fast.
- Silo shards cas/{ab}/{cd}/{key} to 65,536 leaf directories. At 1M entries that’s ~15 per leaf. The total file count is what matters for inode budget.
- Verify dir_index is on: dumpe2fs -h /dev/sdX | grep features should list dir_index.
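
The sharding arithmetic above can be sketched as follows. This assumes {ab} and {cd} are the first four hex characters of the key, which the layout suggests but this page does not state; the class and method names are hypothetical.

```java
final class ShardLayout {
    // Two directory levels of two hex characters each: 256 * 256 leaves.
    static final int LEAF_DIRS = 256 * 256;

    /** Relative path for a key, assuming the shards are the key's first four characters. */
    static String shardPath(String key) {
        if (key.length() < 4) {
            throw new IllegalArgumentException("key too short: " + key);
        }
        return "cas/" + key.substring(0, 2) + "/" + key.substring(2, 4) + "/" + key;
    }

    /** Average entries per leaf directory, rounded down. */
    static long entriesPerLeaf(long totalEntries) {
        return totalEntries / LEAF_DIRS;
    }
}
```

With 1,000,000 entries, entriesPerLeaf returns 15, matching the figure quoted above.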
xfs
- Recommended for production. Allocation groups give parallel I/O; large-file performance is excellent; fragmentation is low on write-once content-addressed workloads.
- No tuning needed for Silo’s usage pattern.
APFS (macOS)
- Case-insensitive by default. Hex cache keys do not collide in practice, but if you’re running test suites that hand-craft uppercase/lowercase variants you’ll want a case-sensitive volume (diskutil apfs addVolume ... -caseSensitive).
- Otherwise fine for development.
NTFS (Windows)
- MAX_PATH = 260 characters by default. Silo’s sharded layout keeps paths short; the storage root path must leave headroom for cas/ab/cd/<128-char-key> (≈ 138 chars).
- Long-path opt-in via \\?\ prefix or system policy is on the roadmap for v0.2 if a Windows user needs deep roots.
overlayfs (Docker default writable layer)
- rename(2) is atomic within the upper layer. Silo writes the temp file in the same shard dir as the final, so renames never cross layers.
- If you bind-mount /data to a host volume (recommended), the data lives on the host filesystem and overlayfs is bypassed entirely.
NFS — not supported
- fsync semantics on NFS are weak; rename(2) is not atomic over RPC; locking is unreliable.
- Silo refuses to start if the storage root is on nfs/nfs3/nfs4, detected by walking /proc/self/mountinfo (Linux) and matching the deepest mount point containing the storage root.
- On macOS and Windows the detection is a no-op + WARN: there is no /proc equivalent, so we cannot reliably tell. Run with care.
- Override (not recommended): silo.storage.allow-unsupported-fs = true downgrades the abort to a WARN. Use only if you are absolutely sure your filesystem is not actually NFS (e.g. a containerised test harness that lies to mountinfo).
- Need network-shared storage? Wait for the S3 backend (v0.2) or run Silo locally on each node.
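
The deepest-mount-point match described above might look roughly like the sketch below. It is not Silo's actual detector: the class and method names are hypothetical, and it takes mountinfo lines as input rather than reading /proc itself so it can run anywhere. It relies on the documented mountinfo layout, where the mount point is the fifth field and the filesystem type is the first field after the " - " separator.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class MountInfoCheck {
    static final Set<String> NFS_TYPES = Set.of("nfs", "nfs3", "nfs4");

    /** Filesystem type of the deepest mount point that contains root, if any. */
    static Optional<String> fsTypeFor(List<String> mountinfoLines, String root) {
        String bestMount = null;
        String bestType = null;
        for (String line : mountinfoLines) {
            int sep = line.indexOf(" - "); // separates per-mount fields from fs fields
            if (sep < 0) continue;
            String[] left = line.substring(0, sep).split(" ");
            String[] right = line.substring(sep + 3).split(" ");
            if (left.length < 5 || right[0].isEmpty()) continue;
            String mountPoint = left[4]; // spaces appear escaped as \040; not decoded here
            boolean contains = mountPoint.equals("/")
                    || root.equals(mountPoint)
                    || root.startsWith(mountPoint + "/");
            // ">=" lets a later mount over the same point win.
            if (contains && (bestMount == null || mountPoint.length() >= bestMount.length())) {
                bestMount = mountPoint;
                bestType = right[0];
            }
        }
        return Optional.ofNullable(bestType);
    }

    static boolean isNfs(List<String> mountinfoLines, String root) {
        return fsTypeFor(mountinfoLines, root).map(NFS_TYPES::contains).orElse(false);
    }
}
```

On a Linux host the input could come from Files.readAllLines(Path.of("/proc/self/mountinfo")). Matching on mountPoint + "/" keeps a root such as /database from being attributed to a mount at /data.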
tmpfs / ramdisk
- Allowed but flagged with a startup WARN. Use only for ephemeral CI scratch — there is no durability and the host will OOM as the cache fills.
Disk / inode exhaustion
Silo checks free space and inode count before accepting a PUT. A write that would drop free space below silo.storage.reserved-free-bytes or free inodes below silo.storage.reserved-free-inodes is rejected with HTTP 503 before the body is read.
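
As a rough sketch of that pre-check, under the assumption that the incoming size is known up front (for example from Content-Length) and that each entry costs one inode. The class and method names are hypothetical.

```java
final class AdmissionCheck {
    /**
     * True if accepting the write keeps both reserves intact.
     * A false result corresponds to the 503 described above.
     */
    static boolean admits(long freeBytes, long freeInodes, long incomingBytes,
                          long reservedFreeBytes, long reservedFreeInodes) {
        boolean bytesOk = freeBytes - incomingBytes >= reservedFreeBytes;
        boolean inodesOk = freeInodes - 1 >= reservedFreeInodes; // one inode per entry
        return bytesOk && inodesOk;
    }
}
```

For example, with 5.2 GB free and the default 5 GB reserve, a 500 MB PUT is refused, while the same PUT is admitted with 6 GB free.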
If the kernel returns ENOSPC (no blocks) or EDQUOT (quota) anyway, Silo:
- Unreserves the in-flight allocation
- Logs ERROR with key prefix and bytes
- Returns 503 to the client
Cross-filesystem renames (storage root spans two mounts) trigger AtomicMoveNotSupportedException. Silo catches it, falls back to copy+delete with a WARN log, and tells you to move the root onto a single filesystem.
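
The fallback can be sketched with the standard java.nio.file API. This is an illustration of the pattern, not Silo's code: the class name, method name, and log wording are made up.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class PublishBlob {
    /** Moves tmp to target; returns false if it had to fall back to copy+delete. */
    static boolean publish(Path tmp, Path target) throws IOException {
        try {
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            // Typically means tmp and target sit on different filesystems.
            System.err.println("WARN: atomic move unsupported, copying instead: " + e.getMessage());
            Files.copy(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(tmp);
            return false;
        }
    }
}
```

The copy+delete path is not atomic: a crash between the two steps can leave both files behind, which is why a single-filesystem root is preferable.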
Process / OS
- File descriptor limit. Default ulimit -n 1024 is inadequate. Set LimitNOFILE=65536 in systemd, or --ulimit nofile=65536 on Docker. Silo logs the limit at startup and WARNs if it’s < 4096.
- Single process per data root. Enforced by FileChannel.tryLock() on .silo.lock plus SQLite’s own WAL lock. Silo refuses to start with ERR: storage root locked by PID N on conflict.
- TIME_WAIT exhaustion. Netty enables SO_REUSEADDR by default. For very high-churn CI clusters set net.ipv4.tcp_tw_reuse=1 on the host.
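
A minimal sketch of the single-process guard, using FileChannel.tryLock() on .silo.lock as named above. The class and method names are hypothetical, and the SQLite WAL lock and PID reporting are left out.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Optional;

final class RootLock {
    /** Returns the held lock, or empty if the storage root is already locked. */
    static Optional<FileLock> tryAcquire(Path storageRoot) throws IOException {
        FileChannel ch = FileChannel.open(storageRoot.resolve(".silo.lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock;
        try {
            lock = ch.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            lock = null;         // already held elsewhere in this JVM
        } catch (IOException e) {
            ch.close();
            throw e;
        }
        if (lock == null) {
            ch.close();
        }
        return Optional.ofNullable(lock);
    }
}
```

The lock lasts only while its channel stays open, so a caller would keep the returned FileLock for the life of the process and exit when the result is empty.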
Ktor / Netty (3.5)
| Setting | Value | Why |
|---|---|---|
| SO_BACKLOG | 512 | Bursty CI traffic overruns the default 128 |
| requestReadTimeout | 60s | Slowloris mitigation |
| PUT body handling | call.receiveChannel() | Never receive<ByteArray>, which would buffer the whole body |
| Expect: 100-continue | Honored out of the box | Auth/size pre-check responds 100 or rejects without reading the body |
| HTTP/1.1 keep-alive | Unlimited per connection | Ktor default; CI clients reuse heavily |
The PUT-streaming and Expect: 100-continue behavior above is in force today. The SO_BACKLOG / requestReadTimeout socket tunings are intended hardening targets — the shipped build starts Netty with engine defaults; tightening these is tracked for a follow-up.
Filesystem watchers — not used
Silo deliberately does not watch the storage tree with WatchService / inotify:
- inotify drops events silently under load (above ~8K events/sec on default tuning).
- macOS FSEvents and Linux inotify behave differently; cross-platform parity is fragile.
- The on-demand reconciliation sweep (POST /api/storage/reconcile) plus the on-read ENOENT fallback gives stronger self-healing for far less complexity.
Drift is detected lazily on every GET (a missing blob purges its row and returns 404). To force a full pass, trigger the reconcile endpoint — there is no automatic periodic sweep.
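
The lazy detection can be pictured with the sketch below. It is an illustration only: an in-memory Map stands in for Silo's SQLite index, the names are hypothetical, and a real handler would stream the opened blob instead of just returning a status.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Map;

final class LazyDrift {
    /** Returns 200 if the blob opens; on a missing blob, purges the row and returns 404. */
    static int statusForGet(Map<String, Path> index, String key) {
        Path blob = index.get(key);
        if (blob == null) {
            return 404; // no row for this key
        }
        try (InputStream in = Files.newInputStream(blob)) {
            return 200; // a real handler would stream `in` to the client here
        } catch (NoSuchFileException e) {
            index.remove(key); // blob vanished from disk: drop the stale row
            return 404;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Opening the file and handling the failure, rather than checking existence first, avoids a race between the check and the read.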
Performance budget
CI (bench.yml) gates these on every nightly run. Regression > 10% fails the build.
- GET hit p99 < 50 ms for 1 MB on commodity SSD
- PUT p99 < 100 ms for 1 MB
- RSS < 200 MB idle, < 500 MB under 100 rps mixed
- Sustained 500 MB PUT/GET must hold streaming throughput without heap growth