Gradient Descent for Planning

LinuxToaster · March 2026 · linuxtoaster.com

The previous posts in this series — biopharma, finance, gov contracting, aviation — all had a structural advantage: the loss function was published by a regulator. The FDA, the SEC, the FAA, the contracting officer — someone external defined what "better" meant.

Planning doesn't have that. There's no 14 CFR for a business plan. No Section M for a product roadmap. No ICH guideline for a marketing strategy. The loss function is whatever you decide it is.

That turns out to be fine. You just have to write it down.

The Shift

In regulated domains, the persona references an external standard: "review against ICH E6(R3)" or "verify per NIST 800-171." The standard is published. The AI checks the document against it. The loss function is objective.

In planning, you define the standard. The persona becomes the strategy. The prompt becomes the criteria. You're not checking against a regulation — you're stress-testing against your own definition of good.

This sounds weaker. It's not. Most plans fail because nobody wrote down what the plan was trying to achieve, who it was for, and what would make it bad. The act of writing the persona — defining the loss function — forces the clarity that most planning processes skip.

# Regulated domain — external loss function
🍞 8 times toast protocol.md "improve — ICH E6(R3) compliance"

# Planning domain — you ARE the loss function
🍞 8 times toast strategy.md "improve — achievable in 6 months, team of 4, no new hires"

The mechanism is identical. The only difference is where the criteria come from.

Business Plans

A business plan doesn't need to comply with a regulation. It needs to be honest, specific, and survivable. Those are criteria you can write down.

# .persona
You are a skeptical advisor who's seen 200 startups fail.
You review business plans for: unsupported assumptions,
vague go-to-market language, unrealistic timelines,
missing unit economics, and any sentence that could
apply to any company. You pressure-test, not polish.
One weakness per round. Read .crumbs. DONE when every
claim is either substantiated or removed.

# .tools
ito
cat
wc
🍞 10 times toast business-plan.md "stress-test — assumptions, unit economics, specificity"
a2f3b4c  "large and growing market" — replace with: $4.2B TAM (source: Gartner 2025), 12% CAGR, SAM is the $340M segment of mid-market logistics
b4d5e6f  CAC stated as $150 but based on paid channels only — add organic, referral, and blended CAC with channel mix assumptions
c6e7f8a  "we will achieve product-market fit by Q3" — this isn't a plan, it's a hope. Replace with: 3 design partners signed, 2 LOIs, PMF criteria defined as X% retention at 90 days
d8f9a0b  revenue projection shows $2M ARR in month 18 — work backwards: that's 133 customers at $15K ACV, current pipeline has 12 leads. Add pipeline build assumptions
e0a1b2c  team section says "experienced founders" — replace with specific relevant experience: "CEO ran logistics ops at Flexport, CTO built the routing engine at Convoy"
f2b3c4d  competitive section says "no direct competitors" — there are always competitors. Add: incumbent workarounds (spreadsheets, manual process), adjacent tools, and why they don't solve this
a4c5d6e  "capital efficient" with a $3M seed raise and 18-month runway at $167K/mo burn — that's 4 engineers and an office. State the team size and what gets built in 18 months
b6d7e8f  exit strategy section is speculative — remove or replace with: comparable acquisitions in the space with multiples (company X acquired for 8x ARR)
DONE after 8 rounds

The persona isn't a regulator. It's the investor who's going to read this and ask the hard questions. The loss function isn't compliance — it's "would I fund this?" Each round removes one weakness before the pitch meeting.

Product Roadmaps

Roadmaps fail for three reasons: they promise too much, they don't connect to outcomes, or they're not honest about constraints. All three are testable.

# .persona
You are a product leader who ships. You review roadmaps
for: scope realism given stated team size and timeline,
outcome clarity (what changes for the user?), dependency
identification, and sequencing logic. You cut scope
mercilessly. If a feature doesn't connect to a measurable
outcome, it doesn't belong. One improvement per round.
Read .crumbs. DONE when the roadmap is shippable as
written with no unstated assumptions.

# .tools
ito
cat
wc
🍞 8 times toast roadmap-h2.md "sharpen — scope realism, outcomes, constraints"
a1c2d3e  Q3 milestone lists 6 features but team is 3 engineers — at historical velocity of 2 features/quarter, cut to 3 and prioritize
b3d4e5f  "improve onboarding" has no success metric — add: reduce time-to-first-value from 14 days to 3 days, measured by activation event in analytics
c5e6f7a  feature 4 depends on new API from partner — no timeline from partner confirmed. Mark as blocked, add fallback plan, remove from critical path
d7f8a9b  "AI-powered recommendations" listed as 2-week task — that's a 2-month project minimum. Either scope to a rule-based MVP or move to Q4
e9a0b1c  three items marked P0 — if everything is P0 nothing is. Stack rank: auth migration is the only true P0 (blocks revenue), others are P1
f1b2c3d  no mention of tech debt or infrastructure work — at current pace the deploy pipeline is the bottleneck. Add: CI/CD improvements in week 1-2
a3c4d5e  outcome for search feature says "users can search" — that's a capability, not an outcome. Rewrite: "reduce support tickets about finding items by 40%"
DONE after 7 rounds

The loss function here is "can we actually ship this?" That's a question most roadmap processes don't ask until it's too late. The persona encodes the constraints: team size, timeline, historical velocity. Each round removes one piece of fiction.

Marketing Strategy

Marketing plans are where vague language goes to hide. "Build brand awareness." "Leverage social media." "Create thought leadership." These aren't strategies — they're word clouds.

# .persona
You are a performance marketer who only cares about
measurable outcomes. You review marketing plans for:
channel-specific tactics with expected CAC, conversion
assumptions with sources, content strategy tied to
pipeline stages, and budget allocation justified by
historical or benchmark data. Cut any sentence that
couldn't be turned into a dashboard. One improvement
per round. Read .crumbs. DONE when every tactic has
a number attached and every number has a source.

# .tools
ito
cat
wc
🍞 8 times toast marketing-plan.md "sharpen — measurable, channel-specific, budget-justified"
a2d3e4f  "increase brand awareness" — not measurable. Replace with: grow organic search impressions from 12K to 30K/mo by targeting 15 bottom-funnel keywords
b4e5f6a  social media budget of $40K/quarter with no channel breakdown — split: $25K LinkedIn (ICP is enterprise buyers), $10K retargeting, $5K testing new channels
c6f7a8b  content strategy says "publish regularly" — specify: 2 case studies/month (sales enablement), 1 technical blog/week (SEO), quarterly benchmark report (lead gen)
d8a9b0c  email nurture sequence described but no conversion assumptions — add: 20% open rate, 3% CTR, 0.5% MQL conversion, based on industry benchmarks for B2B SaaS
e0b1c2d  "attend 4 trade shows" with $60K budget — add expected pipeline per show. If $15K/show and historical pipeline is $40K, that's a 2.7x return. State the assumption
f2c3d4e  no mention of attribution model — add: first-touch for awareness channels, multi-touch for ABM, and state what tool tracks this
a4d5e6f  partnership marketing section says "explore strategic partnerships" — name 3 target partners, the co-marketing tactic for each, and the shared audience size
b6e7f8a  budget totals $200K but expected pipeline not stated — add: target $1.2M influenced pipeline at blended 6x return, broken down by channel
DONE after 8 rounds

"Explore strategic partnerships" is not a plan. "Co-publish a benchmark report with Datadog targeting their 14K enterprise customers in our ICP" is a plan. The persona's loss function is simple: if it can't go on a dashboard, it doesn't belong in the document.

Hiring Plans

Hiring plans live at the intersection of budget, timeline, and organizational design. Most are wish lists. Good ones are capacity models.

# .persona
You are a VP of People who's scaled three companies from
20 to 200. You review hiring plans for: role justification
tied to business outcomes, realistic timeline given market
conditions, leveling consistency, compensation market data,
and organizational design coherence. Every headcount must
answer "what breaks if we don't hire this person?" One
improvement per round. Read .crumbs. DONE when every
role is justified and the plan is executable.

# .tools
ito
cat
wc
🍞 8 times toast hiring-plan-h2.md "stress-test — justification, timeline, org design"
a3d4e5f  "senior engineer — backend" lacks business justification — add: API response times degrading, current team at 110% capacity, this role unblocks the Q3 platform migration
b5e6f7a  plan shows 3 hires starting in July but assumes 30-day time-to-fill — market average for senior engineers is 67 days, adjust start dates or begin sourcing now
c7f8a9b  two PM roles at different levels reporting to same director — one is Staff, one is Senior. Clarify: is Senior intended to grow into Staff, or are these parallel? Org chart unclear
d9a0b1c  data scientist role budgeted at $140K but market comp for the requirements listed (ML + 5 years) is $180-210K. Either adjust budget or reduce requirements
e1b2c3d  "marketing generalist" is too vague for sourcing — split into two specific outcomes: demand gen specialist (pipeline) and content marketer (SEO/thought leadership)
f3c4d5e  5 hires in engineering, 0 in infrastructure — at this ratio, deploy velocity will bottleneck by Q4. Add SRE or platform engineer
a5d6e7f  no mention of onboarding capacity — current eng manager has 8 directs, adding 3 more makes 11. Either hire a second manager first or stagger start dates
DONE after 7 rounds

The manager with 11 direct reports. The comp budget that won't land the candidate described. The timeline that assumes a market that doesn't exist. Each round catches one gap between what the plan says and what will actually happen.

Project Plans

Project plans fail when they assume everything goes right. Good plans identify where things will go wrong and account for it.

# .persona
You are a project manager who's delivered under pressure.
You review project plans for: critical path identification,
dependency mapping, resource conflicts, buffer adequacy,
risk identification with mitigation, and milestone
definitions that are binary (done or not done, no
percentage complete). One improvement per round.
Read .crumbs. DONE when the plan would survive first
contact with reality.

# .tools
ito
cat
grep
wc
🍞 8 times toast migration-plan.md "harden — dependencies, risks, realistic buffers"
a2e3f4a  data migration task shows 2 weeks but no mention of validation — add 1 week for data reconciliation, this is always where migrations stall
b4f5a6b  testing phase allocated 3 days for integration testing across 4 systems — historically this takes 2 weeks minimum. Extend or reduce scope of integration
c6a7b8c  go-live date has no rollback plan — add: rollback decision criteria (>5% error rate on core flows), rollback procedure (DNS cutover, max 15 minutes), point of no return defined
d8b9c0d  three tasks assigned to same engineer in same sprint — resource conflict. Move API refactor to sprint 3 or assign second engineer
e0c1d2e  vendor integration milestone says "API available" — but vendor hasn't confirmed date. Add: fallback if API delayed (mock endpoints for dev, slip go-live by 2 weeks)
f2d3e4f  milestone "backend complete" is ambiguous — redefine: all endpoints passing CI, load tested at 2x expected traffic, staging environment green
a4e5f6a  no communication plan for the cutover weekend — add: stakeholder notification 2 weeks prior, war room channel, status updates every 2 hours during migration
b6f7a8b  risk register lists "data loss" but mitigation is "backups" — specify: full backup 1 hour before cutover, verified restore test on staging, backup retention 30 days
DONE after 8 rounds

"Backend complete" is not a milestone. "All endpoints passing CI, load tested at 2x traffic, staging green" is a milestone — it's binary, observable, and unambiguous. Each round replaces one assumption with one plan for when that assumption fails.

OKRs and Goal Setting

OKRs are supposed to be measurable by definition. Most aren't. The gradient descent loop is fast here because the loss function is simple: is this measurable or not?

# .persona
You are an OKR coach who's seen every failure mode. You
review objectives and key results for: measurability,
ambition calibration (is this a sandbagged target or a
moonshot?), outcome vs output confusion, alignment between
team OKRs and company OKRs, and whether the KRs actually
indicate progress toward the objective. One fix per round.
Read .crumbs. DONE when every KR is measurable, every
objective is an outcome, and the set is coherent.

# .tools
ito
cat
🍞 6 times toast okrs-q3.md "fix — measurable, outcomes not outputs, calibrated"
a1c2d3e  KR: "improve customer satisfaction" — not measurable. Replace: increase NPS from 34 to 42 by end of Q3
b3d4e5f  KR: "launch new onboarding flow" — that's an output, not an outcome. Replace: reduce time-to-activation from 14 days to 5 days via new onboarding flow
c5e6f7a  Objective: "be the best platform in our category" — not falsifiable. Replace: "become the default tool for mid-market logistics teams"
d7f8a9b  KR targets are all sandbagged — "grow revenue 5%" when last quarter grew 12%. Recalibrate: 15% is ambitious but achievable, 5% is a baseline not a goal
e9a0b1c  team OKR on "API reliability" doesn't connect to company objective on "customer expansion" — add the link: reliable API enables self-serve integration, which drives expansion revenue
DONE after 5 rounds

Writing Your Own Loss Function

The pattern across all of these is the same: put the criteria in the persona. The more specific the criteria, the better the loop works.

Weak loss functions produce weak results:

# Too vague — toast doesn't know what "better" means
🍞 8 times toast plan.md "improve this"

# Slightly better but still ambiguous
🍞 8 times toast plan.md "make this more strategic"

Strong loss functions produce strong results:

# Specific criteria, specific constraints
🍞 8 times toast plan.md "improve — achievable with 4 engineers in 6 months, every milestone binary, every dependency named"

# Multi-dimensional with priorities
🍞 8 times toast plan.md "stress-test — realistic timeline, honest about risks, no hand-waving on the hard parts"

# Persona-driven with a point of view
🍞 8 times toast plan.md "review as a Series A investor — would you fund this? find the weakest assumptions"

The loss function is the prompt. The better you define "better," the better the loop performs. In regulated domains, someone else did this work for you. In planning, you do it yourself. The upside is that you can define criteria that no regulator would think of — criteria specific to your company, your market, your constraints, your definition of success.

Composing Loss Functions

You can run multiple passes with different personas. Each pass optimizes for a different dimension. The file accumulates improvements across all of them.

# First pass: financial viability
🍞 5 times toast plan.md "stress-test — unit economics, burn rate, runway math"

# Second pass: execution realism
🍞 5 times toast plan.md "stress-test — team capacity, timeline, dependencies"

# Third pass: market honesty
🍞 5 times toast plan.md "stress-test — competitive positioning, TAM specificity, customer evidence"

# Fourth pass: communication clarity
🍞 5 times toast plan.md "edit — tighten prose, kill jargon, every paragraph earns its place"

After four passes, ito history shows the full arc: financials tightened, execution gaps closed, market claims substantiated, prose cleaned up. Each pass had a different loss function. The file absorbed all of them. The trail shows which pass found what.
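A sketch of what that trail might look like, with hypothetical hashes and messages (the real output depends on what each pass actually found):

$ ito history
d4e5f6a  pass 4: cut two paragraphs of jargon from the executive summary
c3d4e5f  pass 3: replaced "no direct competitors" with incumbent-workaround analysis
b2c3d4e  pass 2: moved one engineer's double-booked work to the next sprint
a1b2c3d  pass 1: recomputed runway from stated burn, corrected the month-18 ARR math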

The Difference

In regulated domains, the loss function is external and convergent. There's a right answer — the document either complies or it doesn't. The loop converges on compliance.

In planning, the loss function is internal and directional. There's no "compliant" — there's only "more honest," "more specific," "more achievable." The loop doesn't converge on a fixed point. It converges on the best version of the plan given the criteria you defined.

This means two things. First: the persona matters more. In a regulated domain, a mediocre persona still works because the external standard does the heavy lifting. In planning, the persona is the standard. Write a weak persona, get weak improvements. Second: you'll want to review the trail more actively. In compliance work, you can trust most changes because there's an external reference. In planning, the AI is applying your judgment — you should check that it applied it well.
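That review can stay lightweight. A minimal sketch, using only commands this series has already shown:

# After a loop finishes: read every change, keep what's right, reverse what isn't
$ ito history        # the trail: one entry per round, what changed and why
$ cat plan.md        # read the current state end to end
$ ito undo           # reverses the last change if it misapplied your judgment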

But the mechanism is the same. File is the state. Prompt is the loss function. .crumbs is the gradient history. ito makes every step reversible. The content — plans, strategies, roadmaps, OKRs — doesn't matter to the loop. It's all text being iteratively refined toward a goal you defined.
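Concretely, one working directory holds all four pieces. The file names come from this series; the layout itself is just a sketch:

planning/
├── plan.md     # the state: the document being refined
├── .persona    # the standard: who judges, and by what criteria
├── .tools      # the commands the loop is allowed to use
└── .crumbs     # the gradient history: one entry per round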

Getting Started

# Step 1: write the loss function
# Ask yourself: what would make this plan bad?
# Write that down. That's your persona.

# .persona
You are a [skeptical advisor | experienced operator |
demanding investor]. You review [document type] for:
[specific criteria]. You [cut scope mercilessly | demand
evidence | flag unrealistic timelines]. One improvement
per round. Read .crumbs. DONE when [threshold].

# .tools
ito
cat
wc
# Step 2: run the loop
$ cd planning && ito init
$ toast
> help me draft a product roadmap for H2

# Step 3: let it refine
🍞 8 times toast roadmap-h2.md "stress-test — shippable with current team, no fiction"
$ ito history

Read the trail. If the improvements are weak, the loss function is weak — tighten the persona. If the improvements are good but you disagree, ito undo. If they found something you missed, the loop is working.

Regulated domains had an advantage: someone else defined "better." Planning has a different advantage: you can define "better" to mean exactly what you need it to mean. No one knows your constraints better than you. Write them down. Make them the loss function. Run the loop.
