Agentic Coding Workflows

A coding agent doesn’t just suggest lines — it takes a task and runs the agent loop: explore the repo, plan, edit files, run tests, fix what broke. This unlocks real delegation. It also demands a different workflow — your job becomes specifying and reviewing, not typing.

  1. You: specify the task and context
  2. Agent: explore the codebase
  3. Agent: propose a plan  ◄ review here
  4. Agent: edit files, run tests, fix failures (iterate)
  5. You: review the full diff  ◄ the real gate

The two human checkpoints — review the plan and review the diff — are where quality is won or lost. Skip them and you’ve automated the writing of code nobody understands.

Agents succeed or fail on the task you hand them. A good agentic task is:

  • Well-specified — clear definition of done, expected behavior, constraints.
  • Verifiable — there’s a concrete way to confirm success, ideally tests.
  • Bounded — a coherent unit of work, not “build the whole feature.”
  • Low-ambiguity — few unstated decisions; you’ve made the design calls.
Poor: "Improve the checkout flow."
Good: "In services/checkout.py, add a retry (3 attempts, exponential
backoff) around the payment-gateway call. Only retry network
and 5xx errors — never on a declined card. Add unit tests for
both paths. Match the retry helper already in lib/http.py."

The good version made the design decisions for the agent. Agents execute well; they decide poorly. Keep the judgment yours.
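To make the "Good" task concrete, here is a minimal sketch of the behavior it asks for. The exception classes, `is_retryable`, and `charge_with_retry` are hypothetical names invented for illustration. The real helper in lib/http.py is not shown in this article, and an actual change should match that helper rather than this sketch.

```python
import time


class NetworkError(Exception):
    """Hypothetical: connection failure or timeout reaching the gateway."""


class GatewayError(Exception):
    """Hypothetical: the gateway answered with an HTTP error status."""

    def __init__(self, status):
        super().__init__(f"gateway returned {status}")
        self.status = status


class CardDeclined(Exception):
    """Hypothetical: the gateway rejected the card. Never retried."""


def is_retryable(exc):
    """Retry only network failures and 5xx responses."""
    if isinstance(exc, NetworkError):
        return True
    return isinstance(exc, GatewayError) and 500 <= exc.status < 600


def charge_with_retry(charge, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call charge() up to `attempts` times (assumed >= 1), backing off exponentially."""
    for attempt in range(attempts):
        try:
            return charge()
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            # Give up on the final attempt, or immediately for errors
            # that a retry cannot fix (such as a declined card).
            if last_attempt or not is_retryable(exc):
                raise
            # Delay doubles each time: base_delay, then 2 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter is a design choice for testability: the two paths the task names (retried and not retried) can be unit-tested without real delays.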

Most agents will outline an approach first. This is the cheapest place to catch a mistake — redirecting a plan costs a sentence; redirecting a finished 500-line diff costs a rewrite. Read the plan: right files? sound approach? missing an edge case? Correct it now.

An agent’s output is a pull request, and it gets the same scrutiny as a colleague’s — see Working Effectively. Read every file. Run it. Check the edge cases and security. The author being an agent lowers the bar for nothing — and arguably raises it, since the agent has no stake in the result.

Keep diffs reviewable: prefer several small, focused agent tasks over one sprawling one. A 200-line diff gets a real review; a 2,000-line diff gets a rubber stamp.
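One way to hold yourself to this is a small size check before review. The sketch below parses the output of `git diff --numstat` (tab-separated added, deleted, and path columns, with `-` for binary files). The function names are hypothetical, and the 200-line default simply reuses the figure above as an illustration, not a rule.

```python
def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # not an "added<TAB>deleted<TAB>path" row
        added, deleted = parts[0], parts[1]
        # Binary files are reported as "-" in both columns; skip them.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def is_reviewable(numstat: str, limit: int = 200) -> bool:
    """True when the diff is small enough to get a real review."""
    return changed_lines(numstat) <= limit
```

A diff that fails the check is a signal to split the task into smaller ones, not to skim faster.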

Agents are far more effective when they can verify their own work and iterate. Give them that loop:

  • Point them at the test command so they run tests and fix failures.
  • Ensure linters and type checks run, so mistakes are caught automatically.
  • Give each task a clear pass/fail signal (tests, a build) so the agent can self-correct before you ever see the result.

This is why a strong test suite is now a force multiplier for AI-assisted development: it’s the agent’s feedback loop and your safety net.
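A single entry point that runs every check and reports one pass/fail result gives the agent that loop in one command. The sketch below is one possible shape. `run_checks` and `CHECKS` are hypothetical names, and the listed tools (pytest, ruff, mypy) are assumptions about the project, so substitute whatever your repository actually uses.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each command; return (all_passed, [(name, passed, output_tail)])."""
    results = []
    for name, argv in commands:
        proc = subprocess.run(argv, capture_output=True, text=True)
        output = (proc.stdout + proc.stderr).strip()
        # Keep only the last lines so a failure report stays short.
        tail = "\n".join(output.splitlines()[-20:])
        results.append((name, proc.returncode == 0, tail))
    return all(passed for _, passed, _ in results), results


# Assumed tooling for illustration only; not every project uses these.
CHECKS = [
    ("tests", [sys.executable, "-m", "pytest", "-q"]),
    ("lint", [sys.executable, "-m", "ruff", "check", "."]),
    ("types", [sys.executable, "-m", "mypy", "."]),
]
```

Calling `run_checks(CHECKS)` from a script that exits non-zero on failure gives both the agent and your CI the same signal.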

Pitfall | Why it happens | Counter
--- | --- | ---
Over-scoping | Vague, sprawling task | Small, bounded, well-specified tasks
Rubber-stamp review | Diff too large to absorb | Keep diffs small; review every line
Plausible-but-wrong | Output reads well, logic is off | Run it; test edges; understand it
Context gaps | Agent didn’t see a key file | Name the relevant files and conventions
Lost-thread looping | Agent flails on a hard task | Stop it; re-specify; or do it yourself
Skill erosion | Delegating the thinking, not the typing | Keep owning design and judgment

A coding agent runs the explore–plan–edit–test loop, shifting your role to specifying and reviewing. Hand it tasks that are well-specified, verifiable, bounded, and low-ambiguity — make the design decisions yourself. Review the plan before code (the cheapest fix) and the diff like any PR (the real gate). Keep diffs small. Give the agent a test command so it can self-correct. You remain fully accountable for every line that merges.