Smelt compiles a small model, a prompt, and a JSON schema into one sealed artifact that runs entirely on your machine. It starts in ~100 ms with no daemon, its output parses by construction, and once installed it works offline, forever — no API keys, no telemetry, no cloud.
Local-first by construction: stdin → stdout is the entire data path. Apple Silicon · no Python.
A registry is a Hugging Face repo. Agents are ~46 KB — the base model downloads once and is shared by every agent built on it. Every install command below works today, from the live registry.
Run an agent and this is the whole story. Output goes to stdout, diagnostics to stderr, and every cost is measured — promises you can check, not claims you have to trust.
Every agent is built on an open model that runs entirely on your Mac. "Supported" is a contract here, not a vibe: each family ships with structure, output-parity, and throughput gates that every release must pass. Decode speeds are measured on an M2 Max.
The catalog agents aren't special — each one is a persona file and a JSON schema sealed into a model. The same four commands take a model you choose to an agent anyone can install by name.
Build a 4-bit quant straight from the original Hugging Face weights — or import a GGUF you already have, the format most local checkpoints ship in. Q4-family imports keep their original quantization.
Seal in a persona, prefilled into the model's state so it costs nothing at run time, and a JSON schema, compiled into the decoder so the output parses by construction.
stdin in, schema-valid JSON out, ~100 ms cold with no daemon. Pipes, xargs, and cron work on day one.
Publishing writes a static registry — any Hugging Face repo can host it. The agent itself is tens of KB; installs share the base model already on the machine.
Small models are fast enough to be programs.
Smelt treats them like programs.
Pipes are composition, xargs is parallelism, cron is scheduling. Fifty years of shell tooling works on day one — because as far as the shell can tell, an agent is just another process.
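As a rough illustration of the "just another process" idea, the sketch below drives a child process purely through stdin and stdout and fans out over several inputs at once. The stand-in command is a hypothetical placeholder, not a Smelt agent or Smelt CLI syntax; any executable that reads stdin and prints one JSON object would slot in the same way.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an installed agent: any executable that reads
# stdin and writes one JSON object to stdout. This is NOT a Smelt command.
STAND_IN_AGENT = [
    sys.executable,
    "-c",
    "import sys, json; print(json.dumps({'chars': len(sys.stdin.read())}))",
]


def run_agent(argv, text):
    """Feed text on stdin and parse stdout as JSON; stderr is captured separately."""
    proc = subprocess.run(
        argv, input=text, capture_output=True, text=True, check=True
    )
    return json.loads(proc.stdout)


def run_many(argv, inputs, workers=4):
    """Rough analogue of `xargs -P`: one process per input, run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: run_agent(argv, t), inputs))


if __name__ == "__main__":
    # Results come back in input order because pool.map preserves ordering.
    print(run_many(STAND_IN_AGENT, ["hello", "local-first"]))
```

Because the only contract is stdin in and JSON out, the caller needs no client library and no knowledge of what runs inside the child.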
Everything expensive — shaders, tokenizer, grammar, prompt — compiles at build time into the artifact. The runtime maps the file and goes. Warm repeats hit ~50 ms with --linger.
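The build-once, map-at-run-time pattern can be sketched in a few lines. The file layout here (a 4-byte magic, a length field, then a payload) is invented for illustration and is not Smelt's artifact format; the point is only that the run-time side does no parsing or compilation beyond reading a fixed header.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout for illustration only (not Smelt's real format):
# 4-byte magic, 4-byte little-endian payload length, then the payload.
MAGIC = b"DEMO"
HEADER_SIZE = 8


def build_artifact(path, payload):
    """Build step: do the expensive work once and write the result to disk."""
    with open(path, "wb") as f:
        f.write(MAGIC + struct.pack("<I", len(payload)) + payload)


def load_payload(path):
    """Run step: map the file read-only and slice out the precomputed bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            if mapped[:4] != MAGIC:
                raise ValueError("not a demo artifact")
            (length,) = struct.unpack_from("<I", mapped, 4)
            # Slicing copies only the requested range; the OS pages it in lazily.
            return mapped[HEADER_SIZE:HEADER_SIZE + length]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "agent.bin")
    build_artifact(path, b"precomputed state")
    print(load_payload(path))
```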
The schema is compiled into the package as an llguidance token trie; invalid tokens are masked during decode. jq downstream is never handed output that fails to parse, and the masking costs 2.5% overhead rather than a retry loop.
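To make the masking step concrete, here is a toy sketch of trie-guided decoding. It is a deliberate simplification and does not use llguidance's API: the trie enumerates a finite set of allowed outputs, whereas a real schema compiler handles an unbounded grammar, and the vocabulary and logits are invented stand-ins for a tokenizer and a model.

```python
import json
import math

END = ""  # sentinel key marking "a complete output ends here"


def build_trie(allowed):
    """Compile the allowed outputs into a character trie."""
    root = {}
    for text in allowed:
        node = root
        for ch in text:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def walk(node, token):
    """Follow a token's characters; return None if it leaves the trie."""
    for ch in token:
        node = node.get(ch)
        if node is None:
            return None
    return node


def masked_decode(trie, vocab, logits, max_steps=32):
    """Greedy decode in which tokens that would break the output are masked."""
    node, out = trie, ""
    for _ in range(max_steps):
        if END in node:
            return out  # reached a complete allowed output
        best_score, best_tok, best_node = -math.inf, None, None
        for tok_id, tok in enumerate(vocab):
            nxt = walk(node, tok)
            if nxt is None:
                continue  # masked: this token cannot extend a valid output
            if logits[tok_id] > best_score:
                best_score, best_tok, best_node = logits[tok_id], tok, nxt
        if best_tok is None:
            raise RuntimeError("no valid token; vocabulary cannot finish the output")
        out, node = out + best_tok, best_node
    raise RuntimeError("did not finish within max_steps")


if __name__ == "__main__":
    trie = build_trie(['{"label":"spam"}', '{"label":"ham"}'])
    vocab = ['{"label":"', "spam", "ham", '"}', "Sure! ", "maybe"]
    # Invented scores: the fake model "prefers" chatty tokens, which get masked.
    logits = [0.1, 0.3, 0.2, 0.0, 5.0, 4.0]
    result = masked_decode(trie, vocab, logits)
    print(result)  # {"label":"spam"}
    print(json.loads(result))
```

Even though the highest-scoring tokens are "Sure! " and "maybe", neither can ever be emitted, so the caller gets parseable JSON on the first pass instead of validating and retrying.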
Weights, grammar, and prompt live in one artifact on your disk. It works on a plane, survives the registry disappearing, and never phones home — which is why agents like redact are credible locally and self-defeating in the cloud.