Install agents like programs, not services.

Smelt compiles a small model, a prompt, and a JSON schema into one sealed artifact that runs entirely on your machine. It starts in ~100 ms with no daemon, its output parses by construction, and once installed it works offline, forever — no API keys, no telemetry, no cloud.

Local-first by construction: stdin → stdout is the entire data path. Apple Silicon · no Python.

zsh

Agents, installed by name

A registry is a Hugging Face repo. Agents are ~46 KB — the base model downloads once and is shared by every agent built on it. Every install command below works today, from the live registry.

tr
triage
smelt install triage@judstephenson/agents
Reads a log line, returns severity, component, summary. tail -f | triage | jq
98mscold start → first token
cm
commit
smelt install commit@judstephenson/agents
Writes the commit message from your staged diff. Works on a plane.
46KBagent download size
rd
redact
smelt install redact@judstephenson/agents
Strips PII before text leaves your machine. A cloud redactor can't make that promise.
0bytessent anywhere, ever
ex
extract
smelt install extract@judstephenson/agents
Invoices, receipts, contacts → typed records. Schema-valid, so the pipe never breaks.
77msinstall, base model cached

Every run is accountable

Run an agent and this is the whole story. Output goes to stdout, diagnostics to stderr, and every cost is measured — promises you can check, not claims you have to trust.

$ tail -1 error.log | smelt run triage | jq .severity exit 0 · 0.41 s total
What that one command did
measured on an M2 Max, release build — reproducible from the repo
Produced valid JSON— the schema is compiled in; invalid output is impossible, no retriescost: 2.5% speed
Answered immediately— no daemon was running; everything was precompiled into the artifactfirst token in 98 ms
Already knew its job— its instructions were baked in at build time, not re-read on every callsaved 180 ms
Shared its model— weights are deduplicated, so 2 installed agents ≈ the size of 1535 MB for 2 agents
Sent nothing anywhere— no API, no telemetry; there is nothing to call home to0 connections

Three model families, compiled to the GPU

Every agent is built on an open model that runs entirely on your Mac. Supported is a contract here, not a vibe: each family ships with structure, output-parity, and throughput gates that every release must pass. Decode speeds measured on an M2 Max.

qw
Qwen 3.5
0.8B · 2B · 4B
The catalog default. Hybrid linear-attention models — the 0.8B is what makes agents feel instant.
~300tok/s0.8B decode
g4
Gemma 4
E2B · E4B
Google's efficient small-model family. More headroom for harder jobs, still comfortably local.
~154tok/sE2B decode
ll
Llama 3.2
1B · 3B Instruct
Meta's instruct-tuned small models — the baseline everything else gets compared to.
≥80tok/s1B release gate floor

From a model to an installed agent

The catalog agents aren't special — each one is a persona file and a JSON schema sealed into a model. The same four commands take a model you choose to an agent anyone can install by name.

01

Get a model

Build a 4-bit quant straight from the original Hugging Face weights — or import a GGUF you already have, the format most local checkpoints ship in. Q4-family imports keep their original quantization.

smelt import Qwen3.5-2B-Q4_K_M.gguf
02

Seal in the job

A persona, prefilled into the model's state so it costs nothing at run time, and a JSON schema, compiled into the decoder so the output parses by construction.

smelt bake model.smeltpkg \ --system-file triage.txt \ --json-schema triage.json
03

Run it like a program

stdin in, schema-valid JSON out, ~100 ms cold with no daemon. Pipes, xargs, and cron work on day one.

tail -1 error.log | smelt run triage | jq .severity
04

Publish it

Publishing writes a static registry — any Hugging Face repo can host it. The agent itself is tens of KB; installs share the base model already on the machine.

smelt publish model.smeltpkg --name triage smelt install triage@you/agents

Small models are fast enough to be programs.
Smelt treats them like programs.

Pipes are composition, xargs is parallelism, cron is scheduling. Fifty years of shell tooling works on day one — because as far as the shell can tell, an agent is just another process.

LATENCY

No daemon. A process that forks in 100 ms.

Everything expensive — shaders, tokenizer, grammar, prompt — compiles at build time into the artifact. The runtime maps the file and goes. Warm repeats hit ~50 ms with --linger.

STRUCTURE

JSON that parses by construction.

The schema is compiled into the package as an llguidance token trie; invalid tokens are masked during decode. jq downstream can't be handed garbage — and it costs 2.5%, not a retry loop.

LOCAL-FIRST

Yours after the install, forever.

Weights, grammar, and prompt live in one artifact on your disk. It works on a plane, survives the registry disappearing, and never phones home — which is why agents like redact are credible locally and self-defeating in the cloud.