There's a line between using AI to write and using AI to revise. On one side: "write me a paper about the French Revolution." That's not learning. That's outsourcing. On the other: "here's my paper about the French Revolution, find the weakest argument." That's what a writing tutor does. That's what office hours are for. That's what a good peer review session would catch — if your peer reviewer actually read the paper instead of saying "looks good."
Gradient descent sits on the revision side. You write the draft. The AI reads it and makes one improvement per round — tightens the thesis, flags an unsupported claim, cuts a paragraph that restates the introduction. Every change is logged. Every change is reversible. You review the trail and decide what to keep.
The draft is yours. The revision history is transparent. The learning happens when you read the trail and realize your thesis was hiding in paragraph four.
The loss function for academic writing is the rubric. Most students ignore it. Most professors wish they wouldn't.
# .persona
You are a demanding writing professor. You've read 500
undergraduate essays. You review for: thesis clarity and
placement (first paragraph, stated explicitly), evidence
quality (primary sources > secondary, specific > general),
argument structure (each paragraph advances one claim),
transitions between paragraphs, and conclusion strength
(does it do more than restate the intro?). You are
allergic to: throat-clearing, passive voice, "throughout
history," and paragraphs that exist without advancing the
argument. One improvement per round. Read .crumbs. DONE
when this paper would earn an A from a tough grader.
# .tools
ito
cat
wc
🍞 8 times toast essay.md "improve — thesis, evidence, argument structure"
a2f3b4c thesis appears in paragraph 4 — move to last sentence of paragraph 1. The reader shouldn't have to find your argument
b4d5e6f paragraph 2 opens with "Throughout history, many scholars have debated..." — cut. Start with your claim: "The Treaty of Versailles created the conditions for, but did not cause, the rise of fascism"
c6e7f8a paragraph 5 cites "experts say" — which experts? Replace with specific citation: Kershaw (2001) argues that economic instability was a necessary precondition
d8f9a0b paragraphs 3 and 4 make the same point with different examples — merge into one paragraph with both examples supporting a single claim
e0a1b2c counterargument mentioned in one sentence then dismissed — strengthen: give the counterargument a full paragraph, then refute it with your strongest evidence
f2b3c4d conclusion restates the thesis word for word — rewrite: return to the thesis but with the added weight of the evidence. Show what changed from intro to conclusion
a4c5d6e 5 of 8 paragraphs use passive voice ("it was argued", "the policy was enacted") — convert to active: "Keynes argued", "the government enacted"
b6d7e8f essay is 2,400 words, assignment asks for 2,000 — cut the background section in paragraph 2. The professor knows the background. You're not writing a textbook
DONE after 8 rounds
"Throughout history, many scholars have debated" has been written 10 million times by undergraduates. It says nothing. It wastes the reader's first sentence — the most valuable real estate in the paragraph. Each round replaces one generic writing habit with one specific craft improvement.
The trail is the learning. When you see "thesis appears in paragraph 4 — move to paragraph 1" three papers in a row, you stop burying your thesis. The revision loop teaches by showing you what it fixes.
Lab reports have the most explicit loss function in college: the format is prescribed, the sections are defined, and the grading criteria are usually published. IMRaD — Introduction, Methods, Results, and Discussion. Most students write the Results section fine and butcher the Discussion.
# .persona
You are a science professor who grades lab reports against
IMRaD structure. You check for: introduction that states
the hypothesis and its basis, methods section reproducible
by a peer, results presented without interpretation, and
a discussion that interprets results in context of the
hypothesis and existing literature. Common failures you
flag: results in the discussion, interpretation in the
results, missing error analysis, and conclusions that
overstate the data. One improvement per round.
Read .crumbs. DONE when the report meets publication-
quality IMRaD standards.
# .tools
ito
cat
wc
🍞 8 times toast lab-report.md "fix — IMRaD structure, error analysis, don't overstate"
a1c2d3e results section says "this proves the hypothesis" — results don't prove, they support or fail to support. Rephrase: "the data are consistent with the hypothesis (p = 0.03)"
b3d4e5f methods section says "we heated the solution" — not reproducible. Add: "heated to 85°C ± 2°C in a water bath for 15 minutes"
c5e6f7a discussion section contains raw data — move Table 3 to results, discuss the trend it shows here instead
d7f8a9b error analysis missing entirely — add: systematic error from balance precision (±0.01g), random error from the standard deviation across three trials, propagated through the final calculation
e9a0b1c introduction states hypothesis but not its basis — add: "based on Le Chatelier's principle, increasing temperature should shift equilibrium toward the endothermic product"
f1b2c3d conclusion says "the experiment was a success" — not scientific language. Replace: "the observed 23% increase in yield at elevated temperature is consistent with the predicted endothermic shift, within the 95% confidence interval"
a3c4d5e three significant figures reported for a measurement taken with a ruler marked in millimeters — reduce to appropriate sig figs, add note on measurement precision
DONE after 7 rounds
"The experiment was a success" is not a scientific conclusion. It's a feeling. "The observed 23% increase is consistent with the predicted shift, within the 95% confidence interval" is a conclusion. Each round teaches the difference between writing about science and writing science.
Undergrad research papers and senior theses are where most students encounter real academic writing for the first time. The jump from essay to research paper is the jump from opinion to evidence, from argument to contribution.
# .persona
You are a thesis advisor reviewing undergraduate research
papers. You check for: clear research question stated in
the introduction, literature review that identifies a gap
(not just summarizes), methodology justified against
alternatives, results presented with appropriate statistical
rigor, discussion that connects findings to existing
literature, and limitations honestly stated. You flag:
cherry-picked citations, methods chosen without
justification, and conclusions that go beyond the data.
One improvement per round. Read .crumbs. DONE when
the paper is defensible at a thesis presentation.
# .tools
ito
cat
grep
wc
🍞 10 times toast thesis.md "improve — research question, methodology, honest limitations"
a2d3e4f research question is implicit — state explicitly at end of introduction: "This study examines whether X correlates with Y in the context of Z"
b4e5f6a literature review summarizes 12 papers but doesn't identify a gap — add: "While existing studies examine X in populations A and B, no study has examined X in population C, which differs because..."
c6f7a8b methodology section says "a survey was used" — justify: why survey and not interview? Add: "Survey methodology was selected to enable statistical analysis across a sample of n=120, which interview-based approaches could not achieve at this scale"
d8a9b0c survey instrument not described — add: "The survey consisted of 18 Likert-scale items adapted from the validated XYZ Scale (Smith, 2019), with a Cronbach's alpha of 0.84 in this sample"
e0b1c2d results report p-values but not effect sizes — add Cohen's d or r-squared. Statistical significance without effect size is incomplete: "r = 0.31, p = 0.004, small-to-medium effect"
f2c3d4e discussion claims "this study proves that X causes Y" — correlation study cannot establish causation. Rewrite: "the significant positive correlation suggests an association, but causal claims require experimental design"
a4d5e6f limitations section says "small sample size" and nothing else — add: self-selection bias in recruitment, single-institution sample limits generalizability, cross-sectional design precludes temporal inference
b6e7f8a conclusion introduces policy recommendations not supported by the data — either add the supporting evidence or remove. Conclusions must follow from results, not from enthusiasm
DONE after 8 rounds
"This study proves that X causes Y" from a correlation study will get you questioned hard at a thesis defense. "The significant positive correlation suggests an association" shows you understand the limits of your method. Each round installs one piece of academic rigor that applies to every paper you'll write after this one.
Personal statements and application essays have a hidden loss function: the reader has 200 more to read after yours. They decide in 90 seconds whether to care. Most applicants write a chronological autobiography. The ones who get in write an argument for why they belong.
# .persona
You are an admissions committee reader who reads 40
personal statements a day. You check for: a specific
opening that earns the second paragraph (not "I have
always been passionate about..."), evidence of genuine
engagement with the field (not resume recitation),
self-awareness about growth and limitations, clear
articulation of why THIS program/role/scholarship,
and a voice that sounds like a person, not a template.
One improvement per round. Read .crumbs. DONE when
this statement would make you advocate for this
applicant in committee.
# .tools
ito
cat
wc
🍞 8 times toast personal-statement.md "sharpen — specific opening, genuine voice, why here"
a3d4e5f opens with "I have always been passionate about computer science" — 600 other applicants wrote this sentence. Replace with a specific moment: the first time you debugged something meaningful, the project that changed your direction
b5e6f7a paragraph 2 lists four internships with dates and titles — the resume already does this. Pick one experience and go deep: what you built, what went wrong, what you learned that changed how you think
c7f8a9b "I want to attend Stanford because of its world-class faculty" — every school has faculty. Specify: "Professor Chen's work on X directly extends my undergraduate research on Y, and I want to explore Z under her supervision"
d9a0b1c no mention of failure or challenge — add one dimension of self-awareness: the project that didn't work, the class that was harder than expected, what you did about it
e1b2c3d conclusion says "I would be honored to contribute to your program" — generic. Replace with a specific contribution: "My experience building accessible interfaces for visually impaired users directly complements the lab's current work on adaptive UI"
f3c4d5e statement is 847 words, limit is 750 — cut the background paragraph about the field. The committee knows the field. They want to know you
a5d6e7f entire statement uses formal academic tone — this is a personal statement, not a journal article. Let one sentence sound like you actually talk: natural voice builds trust
DONE after 7 rounds
"I have always been passionate about computer science" is the most common opening sentence in CS grad applications. It communicates nothing except that you didn't know how to start. A specific moment — the 3 AM debugging session where you realized you wanted to do this forever — communicates everything. Each round replaces one piece of template language with one piece of genuine voice.
Undergrad presentations are where bad habits form. Walls of text on slides. Reading from the slides. 30 slides for a 10-minute talk. No so-what. These habits follow people into careers. Fix them now.
# .persona
You are a communications professor who coaches TED-style
presentations. You review slide decks for: one idea per
slide, visual over textual (if the audience is reading
they aren't listening), clear narrative arc (setup,
tension, resolution), specific examples over abstractions,
and a strong close that's not "any questions?" Every
slide must pass the 3-second test: can the audience
grasp the point in 3 seconds? One improvement per round.
Read .crumbs. DONE when this deck could be delivered
without notes and land with an audience.
# .tools
ito
cat
wc
🍞 8 times toast presentation.md "fix — one idea per slide, narrative arc, 3-second test"
a2e3f4a slide 3 has 6 bullet points with 15 words each — that's a document, not a slide. Split into 3 slides with one claim and one visual each
b4f5a6b slide 1 is a title slide with no hook — add a question, a surprising stat, or a provocative image. The audience decides in the first slide whether to pay attention
c6a7b8c slides 4-7 present data in paragraph form — replace with one chart each. If you can't chart it, you're explaining it wrong for a visual medium
d8b9c0d no narrative arc — slides go: background, background, data, data, data, conclusion. Restructure: hook (why should you care?), problem (what's broken?), evidence (3 slides), solution (what now?)
e0c1d2e slide 9 says "In conclusion, we have shown that..." — don't announce the conclusion. State it: "Urban tree canopy reduces surface temperature by 4.2°C — enough to eliminate 30% of heat-related ER visits in this zip code"
f2d3e4f 22 slides for a 10-minute presentation — maximum 10. Cut slides 5-8 (background the audience doesn't need), merge slides 14-16 (they repeat the same point)
a4e5f6a final slide says "Thank you! Questions?" — wasted real estate. End with your strongest finding or a call to action. Leave the thank-you to your voice
DONE after 7 rounds
A slide with 6 bullet points is an email, not a presentation. If you're reading it to the audience, you don't need to be there — you could have sent the doc. Each round teaches one principle of visual communication that applies to every presentation for the rest of your life.
CS students write code that works. The gradient descent loop makes code that's readable, maintainable, and correct in the cases you didn't think of.
# .persona
You are a senior software engineer doing code review on
a student's assignment. You check for: edge case handling,
meaningful variable names, function decomposition (no
function over 20 lines), error handling, comments that
explain WHY not WHAT, and algorithmic efficiency where
it matters. You don't rewrite — you improve one thing
per round. Read .crumbs. DONE when this code would
pass a professional code review.
# .tools
ito
cat
grep
🍞 8 times toast assignment3.py "improve — edge cases, naming, decomposition"
a1c2d3e function process_data() is 87 lines — decompose: parse_input(), validate_records(), compute_statistics(), format_output()
b3d4e5f variable named "x" holds a student's GPA — rename to student_gpa. Variable named "temp" is the semester average — rename to semester_average
c5e6f7a no handling for empty input file — add: if not records, return empty result with a clear message instead of crashing on line 34
d7f8a9b comment says "# loop through the list" — the code already says that. Replace with WHY: "# skip records from before 2020 — data format changed and older records have inconsistent fields"
e9a0b1c sorting uses bubble sort on a list of 10,000 elements — replace with the built-in sorted(), which uses Timsort (O(n log n)) instead of O(n²)
f1b2c3d division on line 52 doesn't check for zero denominator — add guard: if total_students == 0, return 0.0 with a warning instead of raising ZeroDivisionError
a3c4d5e function returns different types (sometimes int, sometimes string "N/A") — return consistent type: Optional[float] with None for missing values
DONE after 7 rounds
An 87-line function works. It also can't be tested, can't be read, and can't be maintained. Four functions of 20 lines each do the same thing and teach you how professionals structure code. The trail doesn't just fix your assignment — it teaches the habits that separate a student who codes from a developer.
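Three of the crumbs above — the empty-input guard, the descriptive naming, and the consistent Optional[float] return — fit in one small function. A sketch of what the post-review code might look like; the function name is illustrative, not from the actual assignment:

```python
from typing import Optional

def compute_average_gpa(student_gpas: list[float]) -> Optional[float]:
    """Return the mean GPA, or None when there are no records.

    One consistent return type (Optional[float]) instead of mixing
    floats with the string "N/A"; the empty-input guard replaces a
    ZeroDivisionError crash with a value the caller can check for.
    """
    if not student_gpas:  # empty file / no records: signal "missing", don't crash
        return None
    return sum(student_gpas) / len(student_gpas)
```

The caller writes `if avg is None:` once, instead of defending against a surprise string everywhere the value is used.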
The best study tool is a document you wrote yourself. The gradient descent loop doesn't write your notes — it pressure-tests them. What did you miss? What did you misunderstand? Where are you going to get tripped up on the exam?
# .persona
You are a professor writing the final exam. You review
a student's study notes and look for: gaps in coverage
(topics in the syllabus not in the notes), misconceptions
(concepts stated incorrectly), shallow understanding
(definitions without implications), and missing connections
between topics. You're trying to find what this student
will get wrong on the test. One gap per round. Read .crumbs.
DONE when these notes cover the exam.
# .tools
ito
cat
🍞 8 times toast econ101-notes.md "find gaps — what will I get wrong on the exam?"
a2c3d4e supply and demand covered but no mention of elasticity — add: price elasticity of demand, factors affecting elasticity, relationship to total revenue
b4d5e6f comparative advantage defined correctly but the example is wrong — Ricardo's model uses labor hours, not dollars. Fix the numerical example
c6e7f8a notes say "GDP measures economic wellbeing" — common misconception. GDP measures output, not wellbeing. Add: what GDP excludes (household labor, environmental costs, inequality)
d8f9a0b monetary policy section covers interest rates but not the transmission mechanism — add: how rate changes affect investment, consumption, and exchange rates
e0a1b2c fiscal policy and monetary policy treated as separate topics — add a section on interaction: fiscal expansion with tight monetary policy, and vice versa. This is an exam question
f2b3c4d notes on market failure list externalities and public goods but miss asymmetric information — add: adverse selection, moral hazard, examples of each
a4c5d6e no practice problems — add one calculation for each major concept: elasticity calculation, GDP accounting, money multiplier
DONE after 7 rounds
The professor writing the exam knows where students get confused. "GDP measures economic wellbeing" is the most common wrong answer in intro econ. The loop finds the misconceptions before the exam does.
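The "add one calculation per concept" crumb is easy to act on. Here is a sketch of the midpoint (arc) elasticity formula — the version intro courses usually test, because it gives the same answer whether price rises or falls; the argument names and numbers are illustrative:

```python
def price_elasticity(p0, p1, q0, q1):
    """Midpoint (arc) price elasticity of demand:
    percent change in quantity over percent change in price,
    each measured against the average of the two endpoints."""
    pct_q = (q1 - q0) / ((q0 + q1) / 2)
    pct_p = (p1 - p0) / ((p0 + p1) / 2)
    return pct_q / pct_p

# Price rises from $4 to $6, quantity demanded falls from 120 to 80:
e = price_elasticity(4, 6, 120, 80)
# e = -1.0 — unit elastic, so total revenue is unchanged at the margin
```

Working one number through the formula before the exam is worth more than rereading the definition a third time.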
Different assignments need different reviewers. Run them in sequence for the best results.
# Research paper — three passes
# Pass 1: argument structure
🍞 5 times toast paper.md "review — thesis, evidence, logical flow"
# Pass 2: academic rigor
🍞 5 times toast paper.md "review — citations, methodology, don't overstate"
# Pass 3: writing quality
🍞 3 times toast paper.md "edit — cut filler, active voice, tighter sentences"
# Application essay — two passes
# Pass 1: content and voice
🍞 5 times toast statement.md "review — specific, genuine, why THIS program"
# Pass 2: word count and polish
🍞 3 times toast statement.md "edit — under 750 words, every sentence earns its place"
# Exam prep — two perspectives
# Pass 1: coverage
🍞 5 times toast notes.md "find gaps — what's missing from the syllabus?"
# Pass 2: depth
🍞 5 times toast notes.md "find misconceptions — what do I think I know but don't?"
After multiple passes, ito history shows the full revision arc. The first pass fixed the argument. The second added rigor. The third cut the word count. Each is individually reversible — if the rigor pass made the writing worse, ito undo.
The thing most students miss about revision: the learning happens in the trail, not in the final draft.
"Thesis was in paragraph 4." "Conclusion restated the intro." "Passive voice in 5 of 8 paragraphs." These patterns repeat across papers. Notice them and they stop.
Not every AI suggestion is right. Learning to evaluate revision advice — keep this, reject that — is the skill that transfers. ito undo makes this safe.
Writing the persona forces you to articulate what good looks like. "What would make this an A?" If you can answer that, you're halfway there before the loop even runs.
Compare this to the current state: you finish the draft at 2 AM, reread it once, fix the typos, and submit. You get it back with "needs stronger thesis" in the margin and no explanation of what that means. The gradient descent trail doesn't just tell you the thesis is weak — it moves it to paragraph 1 so you can see where it belongs. Next time, you put it there yourself.
This isn't a way to avoid writing. You write the draft. The AI revises the draft. If you feed it an empty file, it has nothing to improve. If you feed it a first draft full of your ideas and your evidence and your argument, it has something to work with.
This isn't a way to cheat. The draft is yours. The revision trail is transparent — every change is logged with intent. If your professor asks "how did this paper improve between drafts?" you can show them the history. It looks exactly like what it is: iterative revision with feedback. That's what professors want you to do.
This is a revision tool that runs at midnight when the writing center is closed, your roommate isn't reading your paper, and your professor doesn't have office hours until after the deadline. It's the reviewer you don't have access to.
# .persona — write the reviewer you need
You are a [tough grader | thesis advisor | code reviewer |
admissions reader]. You review [document type] for:
[specific criteria from the rubric]. One improvement
per round. Read .crumbs. DONE when [grading standard].
# .tools
ito
cat
wc
# Start with whatever you're working on
$ cd coursework && ito init
$ cp ~/drafts/essay-v1.md .
# Pair mode — get feedback first
$ toast
> what's the weakest part of this essay?
# Then let it run
🍞 8 times toast essay-v1.md "improve — thesis, evidence, argument structure"
$ ito history
Read the trail. Not just the final draft — the trail. That's where the learning lives. If the same feedback shows up three papers in a row, you've found your pattern. Fix the pattern and you don't need the loop anymore.
The goal isn't to run gradient descent on every assignment forever. The goal is to internalize the revision instincts it teaches until you catch the problems yourself. The best outcome is that you stop needing it.