Unix Reimagined | toast
LinuxToaster is a toolkit of opinionated CLI tools that save you time
toast — composable AI. jam — the AI-native shell.
toasted — local inference. basket — a network for agents.
Install on Mac or Linux
curl -sSL linuxtoaster.com/install | sh
$20 drop-in — includes $20 in inference credits. Top off any time. BYOK and local inference are FREE
Slices: leverage built-in personas like Coder, Sys, and Writer, or create your own.
BYOK support: OpenAI · Anthropic · Google · Mistral · Groq · Cerebras · Perplexity · xAI · OpenRouter · Together
Local support: Ollama · MLX · LM Studio · KoboldCpp · llama.cpp · vLLM · LocalAI · Jan
toast — AI in your terminal
Pipe text in, get intelligence out. Works with every Unix tool you already use.
Get the command you need
Describe what you want in plain English. Get the exact command.
Understand anything
Legacy code. Config files. Cryptic logs. Get explanations.
Diagnose your system
Not sure which tab is burning the CPU? Ask.
PID 75517 — Safari WebContent, 45.3% CPU. Kill it: kill 75517
Terminal Chat
When you need a back-and-forth. Pull files into context with @.
> @models.py explain this
sure, the file contains...
> what does function...
iMessage & Telegram
Build your own AI assistant in one line of code. Answers texts, maintains your calendar, keeps notes.
Edit & transform files
toast reads files, writes patches, and works with any format.
Slices — The name is the interface
Specialized AI personas, kaizened for specific tasks. No prompt engineering. Just type the name.
# Subscribe to a slice
$ toast --add Coder
# Now use it by name — pipe, redirect, chat. It all works.
$ cat api.py | Coder "add error handling"
$ git diff | Reviewer
$ Sys "why is my disk full"
$ cat error.log | Debug
Coder
Expert programmer. Writes clean, idiomatic code. Reviews, refactors, explains.
Sys
Unix/Linux systems expert. Shell commands, configs, debugging, performance tuning.
Writer
Technical writer. Documentation, READMEs, comments, commit messages.
Reviewer
Strict code reviewer. Finds bugs, security issues, style problems. Doesn't sugarcoat.
Debug
Error whisperer. Analyzes stack traces, logs, error messages. Finds root causes.
Security
Security analyst. Finds vulnerabilities, reviews auth flows, suggests hardening.
Plus SQL, Git, Test, Refactor, Explain, API, and more. Or create your own — drop a .persona file in any project directory.
jam — The AI shell that doesn't fight you
No quoting nightmares. No expansion. No $ surprises. What you type is what you get. Type something that isn't a command, and the AI answers.
# Strings just work. No escaping.
🍞 echo "The price is $100"
The price is $100
# Environment variables. Explicit words, not sigils.
🍞 set API_KEY sk-abc123
🍞 get API_KEY
sk-abc123
# Built-in RPN for math. No bc, no expr.
🍞 100 2 / 3 *
150
# Not a command? AI answers instead of "command not found".
🍞 what processes are using port 8080
lsof -i :8080
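jam's built-in RPN math can be pictured as a plain stack machine: numbers push, operators pop two and push the result. A minimal sketch in Python, not jam's actual implementation:

```python
def rpn(expr):
    """Evaluate a whitespace-separated RPN expression with a stack."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in expr.split():
        if tok in ops:
            b, a = stack.pop(), stack.pop()   # right operand is on top
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

print(rpn("100 2 / 3 *"))   # 150.0
```

Postfix notation needs no parentheses and no precedence rules, which is why it drops into a shell grammar so cleanly.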
Loops in plain English
Gradient descent for documents. Bounded or unbounded. The AI can decide when it's done. Add a cap for safety.
AI fallback chain
Builtin → RPN → PATH → AI. Type eixt and the AI tells you it's exit. The shell understands intent, not just syntax.
Did you mean: exit
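The fallback chain can be sketched as a dispatch function that tries each resolver in order. This is an illustrative sketch under assumed builtin names, not jam's source:

```python
import shutil

# Assumed builtin set for illustration
BUILTINS = {"set", "get", "send", "listen", "exit", "while", "times"}

def is_rpn(line):
    """True if every token is a number or a +-*/ operator."""
    toks = line.split()
    return len(toks) > 1 and all(
        t.lstrip("-").replace(".", "", 1).isdigit() or t in "+-*/"
        for t in toks)

def dispatch(line, ask_ai=lambda q: "AI: " + q):
    """Resolve input in the order described above: builtin, RPN, PATH, AI."""
    word = line.split()[0]
    if word in BUILTINS:
        return ("builtin", word)
    if is_rpn(line):
        return ("rpn", line)
    if shutil.which(word):            # executable on PATH?
        return ("path", word)
    return ("ai", ask_ai(line))       # everything else goes to the model
```

Because the AI sits at the end of the chain, a typo like `eixt` falls all the way through and becomes a question instead of an error.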
Per-project context
.history walks up from cwd. Different project, different history, different AI behavior. Zero config.
toasted — A local brain for your laptop
A from-scratch inference daemon for Apple Silicon. ~1,800 lines of C++, no Python. A 30B-parameter model running at ~100 tok/s generation, ~400 tok/s prompt reading. Zero cost per token. Zero data exposure. 128 GB RAM supports 8-bit, 6-bit, and 4-bit quantization. 64 GB supports 4-bit.
# Start the daemon. Model loads once, stays hot in GPU memory.
$ toasted start
# toast auto-detects local inference. Same interface as cloud.
$ toast "explain quicksort"
# Pipe chains and chat work locally.
$ cat auth.py | Security "audit this"
$ git diff | Reviewer
~100 tok/s generation
Mixture-of-experts routes through 8 of 512 experts per token — the knowledge of all 512 at the cost of 8. Speeds typically associated with a 7B dense model, from a 30B-parameter brain.
~400 tok/s prompt reading
Chunked batch prefill processes context in 32-token chunks. A 17K-token prompt prefills in ~44 seconds instead of 7 minutes. That's 56× faster than our first implementation.
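The chunking itself is simple to picture: cover the prompt in fixed-size spans and feed them in batches. A sketch of the span arithmetic (not toasted's C++), which also checks the numbers above: 17,000 tokens at ~400 tok/s is roughly 42.5 s, consistent with the ~44 s figure.

```python
def prefill_spans(n_tokens, chunk=32):
    """Cover an n_tokens prompt in fixed-size chunks (last may be short)."""
    return [(i, min(i + chunk, n_tokens)) for i in range(0, n_tokens, chunk)]

spans = prefill_spans(17_000)
print(len(spans))   # 532 chunks for a 17K-token prompt
```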
Session cache — 0.6s to first word
Only the last message is new. toasted hashes prior conversation, restores cached state, prefills just the delta. A 125× improvement in time-to-first-token.
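The idea behind the session cache can be sketched as prefix hashing: key the saved model state by a hash of the conversation so far, then on the next turn reuse the longest cached prefix and prefill only what follows. A toy model, not toasted's implementation:

```python
import hashlib

class SessionCache:
    """Toy prefix cache: hash the conversation prefix, store opaque state."""
    def __init__(self):
        self.states = {}          # prefix-hash -> cached model state

    @staticmethod
    def key(messages):
        h = hashlib.sha256()
        for m in messages:
            h.update(m.encode())
            h.update(b"\x00")     # delimiter so ["ab"] != ["a", "b"]
        return h.hexdigest()

    def to_prefill(self, messages):
        """Messages still needing prefill, after the longest cached prefix."""
        for cut in range(len(messages), 0, -1):
            if self.key(messages[:cut]) in self.states:
                return messages[cut:]
        return messages           # nothing cached: prefill everything

    def store(self, messages, state):
        self.states[self.key(messages)] = state
```

With the whole prior conversation cached, only the newest message hits the prefill path, which is where the 125× time-to-first-token win comes from.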
Written in C++, not Python
Built against Apple's MLX C++ API with a hand-tuned Metal kernel for DeltaNet. No Python startup, no fragile environments. The model is a single file.
True privacy
Air-gapped environments, regulated industries, security-conscious teams. Your code never leaves the machine. No API keys. No internet required.
Zero marginal cost
The daemon loads the model once into unified memory. Metal shaders stay compiled. Cache stays warm. Every subsequent request is free — just electricity.
Requires Apple Silicon Mac. 128 GB unified memory supports 8-bit, 6-bit, and 4-bit quantization. 64 GB supports 4-bit. When toasted is running, toast automatically defaults to local inference. Cloud models still available with -p provider.
The Basket — AIgents that see each other on the network
UDP multicast. Every jam instance on the subnet hears it. No broker. No server. No configuration. This is the nervous system.
# Send a message to every machine on the network
🍞 send status deploying
# Listen for a specific key — blocks until match
🍞 listen status
web3:status deploying
# An AI agent that monitors and summarizes the network
🍞 while listen | toast "summarize this event"
# Wait for 3 nodes to report ready
🍞 3 times listen ready
Three linuxtoaster boxes running jam are three islands — unless they can talk to each other. send and listen turn them into a fleet. No etcd. No consul. No Kubernetes. Just multicast.
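The mechanism is one UDP datagram per event, multicast to the subnet. A sketch of what a sender and the `host:key value` wire shape (inferred from the `web3:status deploying` example above) could look like; the group, port, and exact format are assumptions, not basket's real ones:

```python
import socket

GROUP, PORT = "239.255.77.77", 7777   # assumed multicast group and port

def encode(host, key, value):
    """One datagram per event, in the 'host:key value' shape shown above."""
    return f"{host}:{key} {value}".encode()

def parse(datagram):
    head, _, value = datagram.decode().partition(" ")
    host, _, key = head.partition(":")
    return host, key, value

def sender():
    """A UDP socket that multicasts to the local subnet: no broker, no server."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # subnet only
    return s

# sender().sendto(encode("web3", "status", "deploying"), (GROUP, PORT))
```

Every listener joined to the group receives every datagram, so there is no registration step and no single point of failure.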
Four scopes. Each is a word. Each composes with pipes.
Power Users
Simple for beginners. Deep for experts. The toaster grows with you.
Custom Slices
Drop a .persona file in any project. Your own AI specialist, zero config.
Pipe chains
Compose like Unix. Chain multiple transforms.
Project context
Drop a .crumbs file. AI knows your stack.
Edit a book
Iterative refinement. Each pass reads, learns, decides, refines. Gradient descent for prose.
Edit a book until done.
Let the AI decide when it's done. Loops until the command signals completion. Add a cap for safety.
@file injection
In chat mode, pull files into context on the fly. Multi-file supported.
Any model
One interface, many providers. Compare models without changing your workflow.
Local Inference
Start toasted and toast will use it for local inference. You can also use Ollama, MLX, LM Studio, KoboldCpp, llama.cpp, vLLM, LocalAI, or Jan as your inference provider. Full privacy, no internet required.
Usage stats
Token counts and latency per provider. Tracked locally via mmap, zero overhead.
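The mmap approach can be pictured as a small file of fixed-width counters mapped into memory: updates are plain stores, and the kernel writes the page back lazily. A sketch under an assumed file path and layout, not toast's actual stats format:

```python
import mmap, os, struct

PATH = "/tmp/toast_stats.bin"   # hypothetical stats file location

def open_counters(n=4):
    """Map a fixed-size file of n 64-bit counters into memory."""
    fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o600)
    os.ftruncate(fd, n * 8)
    return mmap.mmap(fd, n * 8)

def bump(m, slot, amount=1):
    """Update a counter in place; no syscall on the hot path, the kernel
    flushes the dirty page in the background."""
    off = slot * 8
    val, = struct.unpack_from("<Q", m, off)
    struct.pack_into("<Q", m, off, val + amount)

m = open_counters()
bump(m, 0, 128)   # e.g. record 128 tokens against provider slot 0
```

Because the file outlives the process, counts persist across invocations without a database or a background writer.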
Git hooks, log monitoring, CI/CD
# Pre-commit code review
git diff --cached | Reviewer || exit 1
# Real-time error diagnosis
tail -f app.log | grep ERROR | toast "diagnose"
# Auto-generate docs
find . -name "*.py" | xargs cat | toast "generate API docs" > API.md
Pricing
$20 to get started. $49/mo for the full stack. $2,995/year for teams.
Get Started
one-time
- toast
- $20 in AI credits included
- All Slices & custom Slices
- BYOK & local models free
- All updates
- Community Support
Expense it. Try it. Sell your boss on it.
Pro
- Everything in Get Started
- toasted local inference daemon
- 30B-parameter model included
- ~100 tok/s — zero cost per token
- Session caching & model updates
- jam shell
- basket networking
- Priority Support
Apple Silicon Mac · 64 GB (4-bit) or 128 GB (4/6/8-bit)
Team Software License
- Everything in Pro
- Software License for your Team
- AI inference via BYOK, local, or credits
- Local network AI agent coordination
- All software updates for the year
- Priority Support
- Consulting & seminars available
- On-premise option
Enterprise Software License
- Everything in Team
- Software License for your Organization
- Unified inference billing available
- Multi network AI agent coordination
- Dedicated support
- On-premise seminar option
- Forward Deployed Engineers option
FAQ
How does it work?
Lightweight toast talks to local toastd, which keeps an HTTP/2 connection pool to linuxtoaster.com. Written in C to minimize latency. With BYOK, toastd connects directly to your provider—your traffic never touches our servers.
What's BYOK?
Got a PROVIDER_API_KEY set for Anthropic, Cerebras, Google Gemini, Groq, OpenAI, OpenRouter, Together, Mistral, Perplexity, and/or xAI? Use toast -p provider. Zero config.
What's the difference between Personal and Team?
Personal is $20, self-serve. Team is $2,995/yr — Software license only. AI inference via BYOK or credits. Consulting billed separately. Priority support. On-premise options available. Talk to us.
Can I run it fully offline?
Yes. Use any local backend—Ollama, MLX, LM Studio, KoboldCpp, llama.cpp, vLLM, LocalAI, or Jan. No internet, no API keys, full privacy.
What's a Slice?
A specialized AI persona, a slice through the latent space, a perspective. Coder knows code. Sys knows Unix. Writer writes docs. Or create your own with a .persona file.
What's jam?
A shell rebuilt for AI. No quoting, no expansion, no $ syntax. Strings just work. Unrecognized input goes to the AI. Includes set/get for env vars, while/times for loops, RPN math, and a UDP multicast basket for multi-machine coordination.
What's toasted?
A from-scratch local inference daemon for Apple Silicon. Written in C++ against Apple's MLX API. Loads a 30B-parameter model once, serves requests via Unix socket. ~100 tok/s generation, ~400 tok/s prefill, 0.6s time-to-first-token with session caching. 128 GB supports 8/6/4-bit quantization, 64 GB supports 4-bit.
Where's my data stored?
Locally. Context in .crumbs, conversations in .chat. Version them, grep them, delete them. Your machine, your files.
macOS? Windows?
macOS and Linux today. Windows works via WSL.
What about consulting?
Consulting is available for teams that want hands-on help with deployment, integration, or training. Enterprise accounts have a Forward Deployed Engineering option.
How does billing work?
You pay a single software license for the quickly growing LinuxToaster system of software tools. AI inference for Slices or unified billing is charged based on use. BYOK and local inference carry no cost. You may also choose to pay for consulting, or the monthly cost of a Forward Deployed Engineer.