Have you ever opened a legacy service, followed one method call, then another, then another, and suddenly found yourself reading a 900-line class that handles billing, emails, analytics, and one mysterious flag nobody wants to remove?
That's exactly where AI feels tempting. You want to paste the file into an agent and say, "Please make this sane." I get it. I've had that feeling too.
But legacy code is not just bad code. Legacy code is code with history. Some of that history is ugly. Some of it is business-critical. AI can help, but only if you make it respect the ghosts in the machine.

Legacy Code Is A Museum Of Business Rules
Legacy code often looks messy because it survived real customers, real production incidents, and real deadlines.
A weird if statement might represent a payment gateway edge case. A duplicated query might support an old reporting screen. A strange default value might exist because a mobile app version from three years ago still sends incomplete payloads.
Treat legacy code like an old city. The roads may look irrational until you learn where the rivers, hills, and old walls used to be.
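To make the "strange default value" case concrete, here's a rough sketch. Everything in it is made up for illustration (the function names, the `currency` field, the `USD` default). It only shows how a cleanup that looks reasonable can quietly drop a rule that old clients depend on.

```php
<?php
declare(strict_types=1);

// Hypothetical example: none of these names come from a real codebase.
// Old mobile clients omit "currency", so the legacy code quietly defaults it.
function legacyNormalizeOrder(array $payload): array
{
    return [
        'amount'   => (int) ($payload['amount'] ?? 0),
        // Looks arbitrary, but removing it would reject every old-client order.
        'currency' => $payload['currency'] ?? 'USD',
    ];
}

// A "cleaned up" version that treats the missing field as a bug.
function strictNormalizeOrder(array $payload): array
{
    if (!isset($payload['currency'])) {
        throw new InvalidArgumentException('currency is required');
    }

    return [
        'amount'   => (int) ($payload['amount'] ?? 0),
        'currency' => $payload['currency'],
    ];
}

$oldClientPayload = ['amount' => 1200]; // no currency field

$order = legacyNormalizeOrder($oldClientPayload);
echo $order['currency'], PHP_EOL; // USD

try {
    strictNormalizeOrder($oldClientPayload);
} catch (InvalidArgumentException $e) {
    echo 'strict version rejects it: ', $e->getMessage(), PHP_EOL;
}
```

Neither version is "right" in the abstract. The point is that the second one changes behavior for a client nobody was thinking about.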
First, Ask AI To Explain
Before asking for changes, ask for a map:
Read this service and explain:
1. The main responsibility of the class.
2. The external systems it depends on.
3. The business rules you can infer.
4. The riskiest parts to change.
Do not suggest code changes yet.
That one instruction changes the whole interaction. You're not asking the AI to be a hero. You're asking it to be a careful analyst.
A good AI response should mention uncertainty. If it says everything is obvious, be suspicious. Legacy code is rarely obvious.
Tests First, Always
In legacy work, tests are not paperwork. Tests are a seatbelt.
Before refactoring, you need to capture current behavior. Not ideal behavior. Not "what the code should have done." Current behavior. That's the contract every later change gets checked against.
Characterization Tests
Characterization tests describe what the existing system does today. They're useful when nobody fully trusts the code but everyone depends on it.
Here's a small example:
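public function test_it_does_not_retry_hard_declines(): void
{
    // Arrange: a payment that was declined for a reason we must never retry.
    $payment = Payment::factory()->declined('stolen_card')->create();

    // Act: ask the existing service, untouched, what it would do today.
    $result = app(PaymentRetryService::class)->shouldRetry($payment);

    // Assert: pin the current behavior.
    $this->assertFalse($result);
}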
This test doesn't refactor anything. It freezes one important behavior so the AI can't "clean it up" accidentally.
Once you have tests around the risky behavior, AI becomes much safer. Not safe. Safer. Big difference.
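If one test per rule feels slow, a snapshot-style check can cover a whole grid of inputs at once. Here's a minimal sketch in plain PHP with no framework. `legacyShouldRetry`, the decline reasons, and the attempt limits are all hypothetical stand-ins; in a real project you'd call your existing service and keep the recorded snapshot in a committed file.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the legacy decision logic.
function legacyShouldRetry(string $reason, int $attempts): bool
{
    if ($reason === 'stolen_card' || $reason === 'fraud_suspected') {
        return false;
    }
    // Odd-looking cap that nobody remembers adding. We record it, not judge it.
    if ($reason === 'insufficient_funds' && $attempts >= 3) {
        return false;
    }
    return $attempts < 5;
}

// Run the subject over a grid of inputs and record whatever it returns.
function characterize(callable $subject, array $reasons, array $attemptCounts): array
{
    $observed = [];
    foreach ($reasons as $reason) {
        foreach ($attemptCounts as $attempts) {
            $observed["$reason/$attempts"] = $subject($reason, $attempts);
        }
    }
    return $observed;
}

// Recorded once from the untouched code, before any refactoring.
$snapshot = [
    'stolen_card/0'        => false,
    'stolen_card/3'        => false,
    'stolen_card/5'        => false,
    'insufficient_funds/0' => true,
    'insufficient_funds/3' => false,
    'insufficient_funds/5' => false,
    'network_error/0'      => true,
    'network_error/3'      => true,
    'network_error/5'      => false,
];

// After each refactor step, re-run and compare against the snapshot.
$current = characterize(
    'legacyShouldRetry',
    ['stolen_card', 'insufficient_funds', 'network_error'],
    [0, 3, 5]
);

$changed = [];
foreach ($snapshot as $case => $expected) {
    if (($current[$case] ?? null) !== $expected) {
        $changed[] = $case;
    }
}

echo ($changed === [] ? 'behavior unchanged' : 'changed: ' . implode(', ', $changed)), PHP_EOL;
```

The snapshot doesn't say the behavior is correct. It only says the behavior didn't move, which is exactly what you want during a refactor.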
Keep Diffs Small Enough To Review
AI is very good at creating big diffs. Unfortunately, big diffs are where legacy systems go to hide bugs.
A giant refactor can look elegant while changing behavior in five places. That's dangerous because reviewers get tired. The larger the diff, the easier it is for one tiny behavior change to sneak through.
Think of legacy refactoring like defusing wires. You don't cut all of them because the bundle looks messy. You isolate one wire, understand it, test it, then move to the next.

A Safer Refactor Sequence
1. Add characterization tests. Lock down what the code currently does.
2. Extract pure logic. Move calculation or decision logic into small methods.
3. Reduce duplication. Only after the tests confirm behavior hasn't changed.
4. Improve naming. Names are cheap and often high-value.
5. Change behavior last. Bug fixes should be explicit, reviewed, and tested.
Here's a small extraction that AI can usually handle well:
private function isHardDecline(string $reason): bool
{
    return in_array($reason, [
        'stolen_card',
        'do_not_honor',
        'fraud_suspected',
    ], true);
}
The value here is not the code itself. The value is that the rule now has a name, and a named rule is easier to test, review, and discuss.
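One way to check an extraction like this is to keep the old inline version around for a moment and compare the two over every input you care about. The sketch below assumes a surrounding retry decision (`shouldRetryBefore` / `shouldRetryAfter`) and an attempt limit of 3. Both are invented for illustration and are not taken from any real service.

```php
<?php
declare(strict_types=1);

// Hypothetical "before": the rule is buried inline in a larger decision.
function shouldRetryBefore(string $reason, int $attempts): bool
{
    if ($reason === 'stolen_card' || $reason === 'do_not_honor' || $reason === 'fraud_suspected') {
        return false;
    }
    return $attempts < 3; // assumed limit, for illustration only
}

// "After": the same rule, now with a name.
function isHardDecline(string $reason): bool
{
    return in_array($reason, ['stolen_card', 'do_not_honor', 'fraud_suspected'], true);
}

function shouldRetryAfter(string $reason, int $attempts): bool
{
    if (isHardDecline($reason)) {
        return false;
    }
    return $attempts < 3;
}

// Compare both versions over a grid of reasons and attempt counts.
$mismatches = [];
foreach (['stolen_card', 'do_not_honor', 'fraud_suspected', 'insufficient_funds', ''] as $reason) {
    foreach ([0, 2, 3] as $attempts) {
        if (shouldRetryBefore($reason, $attempts) !== shouldRetryAfter($reason, $attempts)) {
            $mismatches[] = "$reason/$attempts";
        }
    }
}

echo count($mismatches) === 0 ? "extraction preserved behavior\n" : "mismatches found\n";
```

Once the comparison passes and the characterization tests are green, the old inline version can go.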
Make The AI Show Its Work
When AI changes legacy code, don't accept "I fixed it" as an answer.
Ask for the reasoning, the affected behavior, the tests run, and the remaining risks. This is not about making the model sound smart. It's about forcing reviewable output.
A Useful Review Prompt
Before I review the diff, summarize:
- Which behavior is intended to stay the same.
- Which behavior intentionally changed.
- Which tests prove that.
- Which areas still feel risky.
- Any assumptions you made.
This kind of summary is like a PR description from a careful engineer. It doesn't replace review, but it helps you focus your review.
Also, compare the summary against the diff. AI summaries can be incomplete. The diff is the source of truth.
Don't Let AI Rewrite Architecture In One Pass
One of the funniest and most dangerous AI habits is architectural enthusiasm.
You ask it to fix a bug. It discovers a service locator, old static calls, missing interfaces, and a controller doing too much. Five seconds later, it wants to introduce a new module boundary, repository layer, DTO structure, and event system.
I respect the ambition. I do not merge it.
Common Legacy AI Mistakes
- Inventing abstractions too early. The agent adds interfaces before the team understands the domain.
- Normalizing weird behavior away. It removes edge cases because they look accidental.
- Changing dependency lifetimes. It turns lazy work into eager work or vice versa.
- Breaking old integrations. It assumes current tests cover all external clients.
- Mixing refactor and feature work. That makes review much harder.
A better instruction is:
Do not introduce new architecture.
Make the smallest change that fixes the tested behavior.
If you see larger design issues, list them separately.
That last sentence is powerful. It lets the AI be helpful without turning your bug fix into a surprise rewrite.
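Of the mistakes listed above, the lifetime change is the one I find hardest to spot in review, so here's a small sketch of it. The class names and the connection counter are hypothetical, and the code assumes PHP 8.0+ for constructor promotion. It shows how moving a dependency into the constructor can turn lazy work into eager work without any visible change in the method body.

```php
<?php
declare(strict_types=1);

// Hypothetical client whose constructor has a visible side effect
// (think: opening a connection or loading credentials).
final class ReportingClient
{
    public static int $connections = 0;

    public function __construct()
    {
        self::$connections++;
    }

    public function send(string $event): void
    {
        // Would talk to the external system here.
    }
}

// Legacy shape: the client is only built when a report is actually sent.
final class LazyBillingService
{
    private ?ReportingClient $client = null;

    public function charge(int $amountCents, bool $report): void
    {
        if ($report) {
            $this->client ??= new ReportingClient();
            $this->client->send("charged:$amountCents");
        }
    }
}

// "Tidied" shape: constructor injection looks cleaner, but the client
// now gets built up front, even when nothing is reported.
final class EagerBillingService
{
    public function __construct(private ReportingClient $client)
    {
    }

    public function charge(int $amountCents, bool $report): void
    {
        if ($report) {
            $this->client->send("charged:$amountCents");
        }
    }
}

(new LazyBillingService())->charge(500, false);
echo ReportingClient::$connections, PHP_EOL; // 0

(new EagerBillingService(new ReportingClient()))->charge(500, false);
echo ReportingClient::$connections, PHP_EOL; // 1
```

Both `charge` methods read almost the same in a diff. The difference only shows up in production, as extra connections on requests that never report anything.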
Final Tips
The safest AI workflow I've found for legacy code is boring: understand first, test second, change third, review always. It doesn't feel as flashy as "agent refactors entire module," but it keeps you out of trouble.
My opinion: legacy codebases are where senior engineers will get the most value from AI, because seniors know what not to touch. That judgment is the real accelerator.
Good luck with your next legacy cleanup. Move slowly enough to stay fast 👊






