Unix Reimagined | toast

LinuxToaster is a toolkit of opinionated CLI tools that save you time.
toast — composable AI. jam — AI native shell.
toasted — local inference. basket — network for agents.

Terminal
$ ls -al | toast "roast my directory"
Oh wow, 47 node_modules folders? Your disk called, it's crying. And .DS_Store everywhere—you know those do nothing, right?
$

Install on Mac or Linux

curl -sSL linuxtoaster.com/install | sh

$20 drop-in — includes $20 in inference credits. Top off any time. BYOK and local inference are FREE.

Slices: leverage the built-in personas Coder, Sys, Writer, or create your own.

BYOK support: OpenAI · Anthropic · Google · Mistral · Groq · Cerebras · Perplexity · xAI · OpenRouter · Together

Local support: Ollama · MLX · LM Studio · KoboldCpp · llama.cpp · vLLM · LocalAI · Jan

Composable AI

toast — AI in your terminal

Pipe text in, get intelligence out. Works with every Unix tool you already use.

Get the command you need

Describe what you want in plain English. Get the exact command.

toast "how do I delete all .log files older than 7 days"
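The exact answer depends on the model, but for this prompt it should be the standard find incantation; a plain-Unix sketch of the expected output:

```shell
# expected shape of the answer: delete *.log files
# whose modification time is more than 7 days old
find . -name "*.log" -mtime +7 -delete
```

Swap -delete for -print to preview the matches before committing.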

Understand anything

Legacy code. Config files. Cryptic logs. Get explanations.

cat /etc/nginx/nginx.conf | toast "explain in detail"

Diagnose your system

Not sure which tab is burning the CPU? Ask.

ps aux | toast "which tab is burning the cpu"
PID 75517 — Safari WebContent, 45.3% CPU. Kill it: kill 75517

Terminal Chat

When you need a back-and-forth. Pull files into context with @.

toast
> @models.py explain this
sure, the file contains...
> what does function...

iMessage & Telegram

Build your own AI assistant in one line of code. Answers texts, maintains your calendar, keeps notes.

imessage -e 'imessage | toast | imessage'
toast --telegram

Edit & transform files

toast reads files, writes patches, and works with any format.

find . -name "*.py" -exec toast {} "add type hints" \;
Slices · Personas

Slices — The name is the interface

Specialized AI personas, kaizened for specific tasks. No prompt engineering. Just type the name.

# Subscribe to a slice
$ toast --add Coder

# Now use it by name — pipe, redirect, chat. It all works.
$ cat api.py | Coder "add error handling"
$ git diff | Reviewer
$ Sys "why is my disk full"
$ cat error.log | Debug

Coder

Expert programmer. Writes clean, idiomatic code. Reviews, refactors, explains.

cat utils.py | Coder "add type hints"

Sys

Unix/Linux systems expert. Shell commands, configs, debugging, performance tuning.

Sys "why is port 8080 in use"

Writer

Technical writer. Documentation, READMEs, comments, commit messages.

cat lib.py | Writer "write docstrings"

Reviewer

Strict code reviewer. Finds bugs, security issues, style problems. Doesn't sugarcoat.

git diff | Reviewer

Debug

Error whisperer. Analyzes stack traces, logs, error messages. Finds root causes.

cat error.log | Debug

Security

Security analyst. Finds vulnerabilities, reviews auth flows, suggests hardening.

cat auth.py | Security "audit this"

Plus SQL, Git, Test, Refactor, Explain, API, and more. Or create your own — drop a .persona file in any project directory.

Read more about Slices →
AI native Shell

jam — The AI shell that doesn't fight you

No quoting nightmares. No expansion. No $ surprises. What you type is what you get. Type something that isn't a command, and the AI answers.

# Strings just work. No escaping.
🍞 echo "The price is $100"
The price is $100

# Environment variables. Explicit words, not sigils.
🍞 set API_KEY sk-abc123
🍞 get API_KEY
sk-abc123

# Built-in RPN for math. No bc, no expr.
🍞 100 2 / 3 *
150

# Not a command? AI answers instead of "command not found".
🍞 what processes are using port 8080
lsof -i :8080

Loops in plain English

Gradient descent for documents. Bounded or unbounded. The AI can decide when it's done. Add a cap for safety.

5 times echo hello
while Editor draft.md "kaizen until done"
7 while Editor draft.md "polish for publishing"

AI fallback chain

Builtin → RPN → PATH → AI. Type eixt and the AI tells you it's exit. The shell understands intent, not just syntax.

🍞 eixt
Did you mean: exit

Per-project context

.history walks up from cwd. Different project, different history, different AI behavior. Zero config.

# AI knows your last 50 commands
# per project directory
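The walk-up works like git's repository discovery: start at the current directory and climb toward / until a .history file appears. A minimal sketch of that lookup (illustrative script, not jam's actual implementation):

```shell
#!/bin/sh
# hypothetical sketch: find the nearest .history from cwd upward
dir=$PWD
while :; do
  if [ -f "$dir/.history" ]; then
    echo "found: $dir/.history"
    break
  fi
  if [ "$dir" = "/" ]; then
    echo "no .history found"
    break
  fi
  dir=$(dirname "$dir")
done
```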
Read the full jam story →
Local Inference · Zero Cost

toasted — A local brain for your laptop

A from-scratch inference daemon for Apple Silicon. ~1,800 lines of C++, no Python. A 30B-parameter model running at ~100 tok/s generation, ~400 tok/s prompt reading. Zero cost per token. Zero data exposure. 128 GB RAM supports 8-bit, 6-bit, and 4-bit quantization. 64 GB supports 4-bit.

# Start the daemon. Model loads once, stays hot in GPU memory.
$ toasted start

# toast auto-detects local inference. Same interface as cloud.
$ toast "explain quicksort"

# Pipe chains and chat work locally.
$ cat auth.py | Security "audit this"
$ git diff | Reviewer

~100 tok/s generation

Mixture-of-experts routes through 8 of 512 experts per token — the knowledge of all 512 at the cost of 8. Speeds typically associated with a 7B dense model, from a 30B-parameter brain.

~70–80 English words per second

~400 tok/s prompt reading

Chunked batch prefill processes context in 32-token chunks. A 17K-token prompt prefills in ~44 seconds instead of ~40 minutes. 56× faster than our first implementation.

7 tok/s → 394 tok/s
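The wall-clock numbers fall straight out of the two throughput figures; shell arithmetic (integer division) makes the comparison concrete:

```shell
# a 17,000-token prompt at the measured prefill rates
echo "first implementation: $((17000 / 7)) s"    # ~40 minutes
echo "chunked prefill:      $((17000 / 394)) s"  # ~44 s
echo "speedup:              $((394 / 7))x"       # 56x
```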

Session cache — 0.6s to first word

Only the last message is new. toasted hashes prior conversation, restores cached state, prefills just the delta. A 125× improvement in time-to-first-token.

75 seconds → 600 milliseconds

Written in C++, not Python

Built against Apple's MLX C++ API with a hand-tuned Metal kernel for DeltaNet. No Python startup, no fragile environments. The model is a single file.

~1,800 lines · compiled step functions · zero dependencies

True privacy

Air-gapped environments, regulated industries, security-conscious teams. Your code never leaves the machine. No API keys. No internet required.

toast "review this classified document"

Zero marginal cost

The daemon loads the model once into unified memory. Metal shaders stay compiled. Cache stays warm. Every subsequent request is free — just electricity.

toast --stats # track local vs cloud usage

Requires Apple Silicon Mac. 128 GB unified memory supports 8-bit, 6-bit, and 4-bit quantization. 64 GB supports 4-bit. When toasted is running, toast automatically defaults to local inference. Cloud models still available with -p provider.

Read the full engineering story →
Networking for AI

The Basket — AIgents that see each other on the network

UDP multicast. Every jam instance on the subnet hears it. No broker. No server. No configuration. This is the nervous system.

# Send a message to every machine on the network
🍞 send status deploying

# Listen for a specific key — blocks until match
🍞 listen status
web3:status deploying

# An AI agent that monitors and summarizes the network
🍞 while listen | toast "summarize this event"

# Wait for 3 nodes to report ready
🍞 3 times listen ready

Three linuxtoaster boxes running jam are three islands — unless they can talk to each other. send and listen turn them into a fleet. No etcd. No consul. No Kubernetes. Just multicast.

Same shell · set / get · Environment variables
Same project · .history · Shared AI context
Same machine · pipes · stdin → stdout
Same network · send / listen · UDP multicast

Four scopes. Each is a word. Each composes with pipes.

Power Users

Simple for beginners. Deep for experts. The toaster grows with you.

Custom Slices

Drop a .persona file in any project. Your own AI specialist, zero config.

echo "You are a Django expert" > .persona

Pipe chains

Compose like Unix. Chain multiple transforms.

curl site.com | toast "summarize" | toast "translate to Spanish"

Project context

Drop a .crumbs file. AI knows your stack.

echo "Python 3.11, FastAPI" > .crumbs

Edit a book

Iterative refinement. Each pass reads, learns, decides, refines. Gradient descent for prose.

20 times Editor draft.md "tighten prose, cut filler"

Edit a book until done.

Let the AI decide when it's done. Loops until the command signals completion. Add a cap for safety.

while Editor draft.md "tighten prose"
20 while Editor draft.md "tighten prose"

@file injection

In chat mode, pull files into context on the fly. Multi-file supported.

> @schema.sql @models.py are these in sync?

Any model

One interface, many providers. Compare models without changing your workflow.

toast -p anthropic -m claude-opus-4-5 "explain"

Local Inference

Start toasted and toast will use it for local inference. You can also use Ollama, MLX, LM Studio, KoboldCpp, llama.cpp, vLLM, LocalAI, or Jan as your inference provider. Full privacy, no internet required.

toast -p mlx -m llama3 "tell me a joke"

Usage stats

Token counts and latency per provider. Tracked locally via mmap, zero overhead.

toast --stats

Git hooks, log monitoring, CI/CD

# Pre-commit code review
git diff --cached | Reviewer || exit 1

# Real-time error diagnosis
tail -f app.log | grep ERROR | toast "diagnose"

# Auto-generate docs
find . -name "*.py" | xargs cat | toast "generate API docs" > API.md
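To make the review gate permanent, the first one-liner can be installed as an ordinary git pre-commit hook (a sketch; assumes the Reviewer slice is on your PATH):

```shell
# install the review gate as a pre-commit hook
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# block the commit if the Reviewer slice exits nonzero
git diff --cached | Reviewer || exit 1
EOF
chmod +x .git/hooks/pre-commit
```

Git runs the hook before every commit; a nonzero exit aborts it.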

Pricing

$20 to get started. $49/mo for the full stack. $2,995/year for teams.

Get Started

$20

one-time

  • toast
  • $20 in AI credits included
  • All Slices & custom Slices
  • BYOK & local models free
  • All updates
  • Community Support

Expense it. Try it. Sell your boss on it.

Teams (<70 seats)

Team Software License

$2,995/yr
  • Everything in Pro
  • Software License for your Team
  • AI inference via BYOK, local, or credits
  • Local network AI agent coordination
  • All software updates for the year
  • Priority Support
  • Consulting & seminars available
  • On-premise option
Contact Team Sales
Enterprise (>70 seats)

Enterprise Software License

$10/seat/mo
  • Everything in Team
  • Software License for your Organization
  • Unified inference billing available
  • Multi network AI agent coordination
  • Dedicated support
  • On-premise seminar option
  • Forward Deployed Engineers option
Contact Enterprise Sales

FAQ

How does it work?

Lightweight toast talks to local toastd, which keeps an HTTP/2 connection pool to linuxtoaster.com. Written in C to minimize latency. With BYOK, toastd connects directly to your provider—your traffic never touches our servers.

What's BYOK?

Got a PROVIDER_API_KEY set for Anthropic, Cerebras, Google Gemini, Groq, OpenAI, OpenRouter, Together, Mistral, Perplexity, and/or xAI? Use toast -p provider. Zero config.

What's the difference between Personal and Team?

Personal is $20, self-serve. Team is $2,995/yr — Software license only. AI inference via BYOK or credits. Consulting billed separately. Priority support. On-premise options available. Talk to us.

Can I run it fully offline?

Yes. Use any local backend—Ollama, MLX, LM Studio, KoboldCpp, llama.cpp, vLLM, LocalAI, or Jan. No internet, no API keys, full privacy.

What's a Slice?

A specialized AI persona, a slice through the latent space, a perspective. Coder knows code. Sys knows Unix. Writer writes docs. Or create your own with a .persona file.

What's jam?

A shell rebuilt for AI. No quoting, no expansion, no $ syntax. Strings just work. Unrecognized input goes to the AI. Includes set/get for env vars, while/times for loops, RPN math, and a UDP multicast basket for multi-machine coordination.

What's toasted?

A from-scratch local inference daemon for Apple Silicon. Written in C++ against Apple's MLX API. Loads a 30B-parameter model once, serves requests via Unix socket. ~100 tok/s generation, ~400 tok/s prefill, 0.6s time-to-first-token with session caching. 128 GB supports 8/6/4-bit quantization, 64 GB supports 4-bit.

Where's my data stored?

Locally. Context in .crumbs, conversations in .chat. Version them, grep them, delete them. Your machine, your files.

macOS? Windows?

macOS and Linux today. Windows WSL works.

What about consulting?

Consulting is available for teams that want hands-on help with deployment, integration, or training. Enterprise accounts have a Forward Deployed Engineering option.

How does billing work?

You pay a single software license fee to use the quickly growing LinuxToaster system of software tools. AI inference for Slices or unified billing is charged based on use. BYOK and local inference carry no cost. You may choose to pay for consulting, or for the monthly cost of an FDE.