
Bug reports that actually help fix bugs — capture, replay, share.

A product of Softech Infra.

© 2026 BugMojo. All rights reserved.
Bug reporting

Steps to Reproduce: The Skill That Separates Good Bug Reports From Ignored Ones

Steps to reproduce is the one bug-report skill worth drilling: the right altitude, an explicit known state, expected-vs-actual at the failing step, and, for intermittent bugs, a captured replay that a human or an agent can act on.

Manvi · Jun 5, 2026 · 10 min read
Guides
[Image: thin lime line-art browser window dissolving into a replay timeline, with three numbered repro-step nodes snapping toward a glowing error marker]
TL;DR
  • Steps to reproduce is the single bug-report skill worth drilling — it is the field most often missing when a bug dies as 'works for me'.
  • Four rules: correct altitude, an explicit known state, expected vs actual at the failing step, and the right granularity (4-8 steps).
  • Written steps capture what you did, not the state you were in — which is why complete-looking steps still fail to reproduce.
  • For intermittent bugs, stop reconstructing and start capturing: a replay carries the DOM, console, and network so a human or an AI agent can act on the exact failing session.

Every other field on a bug report is negotiable. A vague title gets renamed in triage; a missing screenshot gets requested in a comment. The steps to reproduce are different — when they are wrong, the bug does not get fixed, it gets closed. A developer who cannot make the bug happen on their machine has two moves: ask you for more information, or mark it 'works for me' and move on. Both cost days.

This guide is about that one skill in isolation. Not the whole report template (that lives in the six-field bug report companion), and not how replay serialization works under the hood (covered in session replay for debugging). Just the writing: how to pitch the altitude, anchor the starting state, mark the exact failure, and — the part most guides skip — what to do for the bugs that refuse to reproduce on command.

What makes steps to reproduce good?

Good steps to reproduce let someone with zero context reproduce the bug on the first try. They start from an explicit known state, use 4-8 numbered steps with unambiguous click targets, and state expected-versus-actual at the exact step that fails. The measure is reproducibility, not word count. When steps alone cannot encode the environment, a captured replay carries it instead.

That definition is doing more work than it looks. 'Zero context' rules out shorthand only you understand. 'First try' rules out steps that work two times in five. 'Explicit known state' is the rule everyone forgets, and it is the one that breaks reproduction most often. We will take each apart below — but first, the evidence that the repro gap is real and expensive.

The repro gap is the most common reason bugs die

The strongest data on this is academic, not vendor marketing. A 2022 empirical study in Empirical Software Engineering examined 576 non-reproducible bug reports — 250 from Mozilla Firefox Core and 326 from Eclipse JDT — and identified 11 distinct factors behind non-reproducibility. The two most common were insufficient information in the report and inter-environmental differences. When a bug landed as non-reproducible, developers responded one of two ways: close it outright, or solicit more information through long, counter-productive manual searches.

Read that back. The single most common thing missing when a bug dies as 'works for me' is the thing this page is about — a usable reproduction. Not a cleverer fix, not a better stack trace. The steps.

The artifact

A 2026 field write-up, 'The bug report that took four days to fix,' documents a representative case: four days, three engineers, five back-and-forth exchanges for a bug whose actual fix took thirty minutes — and an average of 2.3 rounds of clarification per bug across the sprint. The gap between communication time and fix time is the entire argument for getting the repro right the first time.

That field artifact is one team's anecdote, not a survey — treat it as illustrative. But it names the failure mode precisely: the fix was cheap; the round-trips to reach a reproducible state were not. Tricentis's 2025 Quality Transformation Report (2,700+ software-delivery practitioners) frames the same loop at industry scale — 33% of organizations point specifically to poor communication and weak feedback loops between developers and testers as a top quality problem, while 40% say poor software quality costs them $1M or more annually. The dev/tester feedback loop is exactly where steps to reproduce live.

Where the four-day bug actually went (one documented case)

  • Clarification + back-and-forth: 230
  • Waiting / reassignment: 100
  • Actual fix: 30

Values are minutes-equivalent, derived from the documented case (2.3 average clarification rounds, a thirty-minute fix). The shape is the point: the fix is a sliver; the repro gap is the bar. Source: QA meets AI, March 2026.
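To make the "sliver versus bar" shape concrete, here is a small sketch that turns the three chart values into shares of the total. It is only arithmetic on the figures as given (minutes-equivalent); the variable names are illustrative.

```python
# Minutes-equivalent values taken from the chart above.
breakdown = {
    "Clarification + back-and-forth": 230,
    "Waiting / reassignment": 100,
    "Actual fix": 30,
}

total = sum(breakdown.values())

# Share of the total for each bucket, as a percentage rounded to one decimal.
shares = {
    label: round(100 * minutes / total, 1)
    for label, minutes in breakdown.items()
}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

On these numbers the fix is under a tenth of the total, and clarification alone is well over half.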

The four rules of a reproducible repro

1. Get the altitude right

Altitude is the level of detail. Fly too high — 'checkout broke' — and the developer has no path to start. Fly too low — 'I was on the 14:32 train, also playing music, and I think I'd had this tab open since Monday' — and you have buried the one relevant detail in ten irrelevant ones. The non-reproducibility study found over-described and under-described reports fail for the same reason: neither lets a stranger retrace the path.

The right altitude is the level at which each step is an action another person can perform identically. 'Add an item to the cart' is the right altitude. 'Broke' is too high. 'Move the mouse to coordinate 412, 290' is too low.

2. Anchor an explicit known state

This is the rule that separates steps that reproduce from steps that look complete and do not. Most repros silently assume the state the reporter happened to be in: logged in as an admin, a cart with two items already in it, a feature flag flipped, a stale cache. The reader starts from a different state and the bug never fires.

Begin from a state anyone can recreate. Logged out, or logged in as a named test role. A fresh tab. A specific URL, not 'the dashboard'. If the bug needs data, link the seed script or the test account rather than describing it. The empirical record is blunt here: inter-environmental differences were among the top causes of non-reproducibility, and an unstated starting state is the cheapest one to eliminate.

good-repro.txt
Environment: Chrome 137, macOS 14.5, viewport 1440x900, build a1b2c3d

1. Sign in as the seeded free-tier user (login: qa+free@example.com / see 1Password)
2. Open https://app.example.com/projects/new   ← fresh tab, not from the dashboard
3. Type "Test" in the Name field
4. Click Save
   → Expected: project is created, redirect to /projects/<id>
   → Actual:   Save button spins indefinitely; no network request fires

Notes: only reproduces on free-tier; paid accounts redirect correctly.

Notice what the block encodes that prose usually drops: the explicit account and tier, the fresh-tab caveat inline at step 2, and a boundary condition (free-tier only) that tells the triager where not to look. That last line alone can save a clarification round.
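If your team files reports from a script or a form, the same layout can be held as data and rendered to text. The following is a minimal sketch, not a BugMojo format: the `Step` and `Repro` classes and their field names are hypothetical, chosen only to mirror the block above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    action: str
    expected: Optional[str] = None  # set only on the failing step
    actual: Optional[str] = None    # set only on the failing step


@dataclass
class Repro:
    environment: str
    steps: List[Step]
    notes: str = ""

    def render(self) -> str:
        """Render the repro in the numbered plain-text layout shown above."""
        lines = [f"Environment: {self.environment}", ""]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"{number}. {step.action}")
            if step.expected is not None:
                lines.append(f"   → Expected: {step.expected}")
            if step.actual is not None:
                lines.append(f"   → Actual:   {step.actual}")
        if self.notes:
            lines += ["", f"Notes: {self.notes}"]
        return "\n".join(lines)


report = Repro(
    environment="Chrome 137, macOS 14.5, viewport 1440x900, build a1b2c3d",
    steps=[
        Step("Sign in as the seeded free-tier user"),
        Step("Open https://app.example.com/projects/new in a fresh tab"),
        Step('Type "Test" in the Name field'),
        Step(
            "Click Save",
            expected="project is created, redirect to /projects/<id>",
            actual="Save button spins indefinitely; no network request fires",
        ),
    ],
    notes="only reproduces on free-tier; paid accounts redirect correctly.",
)

print(report.render())
```

Keeping expected and actual as fields on the step, rather than on the report, is what forces them to sit next to the action that fails.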

3. State expected vs actual at the failing step

Put the expectation where the failure happens, not in a separate paragraph at the bottom. The two-line → Expected / → Actual pattern, attached to the exact step that breaks, does two jobs. It pins the failure to one action instead of a vague 'somewhere in this flow'. And it forces you to articulate the feature's intent, which is how 'this is actually working as designed' disagreements get caught before they become ticket ping-pong.

4. Use 4-8 steps, no more

Granularity is a forcing function. If your repro runs to fifteen steps, either it is at too low an altitude (collapse them) or it is actually two bugs (split them). Four to eight numbered steps is the band where a repro stays scannable and still starts from a known state. One ticket, one bug, one lifecycle.
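The band is easy to enforce mechanically. Here is a small sketch of a step-count hint; the function name is hypothetical, and the "too few" message is an assumption on our part, since the rule above only spells out what to do when a repro runs long.

```python
def granularity_advice(step_count: int, low: int = 4, high: int = 8) -> str:
    """Return a short hint based on how many numbered steps a repro has."""
    if step_count < low:
        # Assumption: very short repros often skip the starting state.
        return "too few: check that the starting state and failing step are both spelled out"
    if step_count > high:
        return "too many: collapse low-altitude steps, or split into two tickets"
    return "ok"


for count in (2, 6, 15):
    print(count, "->", granularity_advice(count))
```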

The five repro mistakes triagers see most
  • Implicit starting state. 'Go to the dashboard' — as which user, with what data, from where?
  • 'Sometimes it fails.' Not actionable as steps. Capture it instead (see below).
  • Expected/actual buried at the bottom instead of pinned to the failing step.
  • One ticket, multiple bugs. Each needs its own reproduction and lifecycle.
  • 'DM me your credentials and I'll show you.' Never. Use a seeded account or a shareable replay.

When steps to reproduce are not enough: capture, don't reconstruct

Here is the limit of the skill. Written steps record what you did. They cannot record the state you were in: the browser build, the viewport, the feature-flag matrix, the cache, the account permissions, or a race between two requests that only goes wrong on a slow connection. Environmental differences of this kind were among the most common causes of 'works for me' closures in the non-reproducibility study, and no amount of careful prose fixes them, because the missing information was never in your fingers to begin with.

For that class, stop reconstructing and start capturing. A session replay records the DOM, console, and network of the actual failing session and lines them up on one timeline, so the environment travels with the report. Steps tell the developer where to click; the replay shows them the click, the failed render, and the stack trace at the same timestamp. They are complementary — lead with a short numbered repro for human scanning, attach the replay as the ground truth for when the steps alone fall short.
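"Lines them up on one timeline" can be pictured as a merge of separately recorded event streams by timestamp. The sketch below illustrates that idea only; it is not how any particular replay tool stores its data, and the event tuples and sample messages are made up for the example.

```python
import heapq
from typing import Iterable, List, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, source, message)


def merge_timeline(*streams: Iterable[Event]) -> List[Event]:
    """Merge event streams, each already sorted by time, into one ordered list."""
    return list(heapq.merge(*streams, key=lambda event: event[0]))


# Hypothetical captured streams for one failing session.
dom = [(0.0, "dom", "page loaded"), (2.1, "dom", "click: Save")]
console = [(2.3, "console", "TypeError: cannot read properties of undefined")]
network = [(0.4, "network", "GET /api/session 200")]

for timestamp, source, message in merge_timeline(dom, console, network):
    print(f"{timestamp:6.2f}s  {source:8}  {message}")
```

Read top to bottom, the merged list shows the click, then the console error a fraction of a second later, with no network request in between.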

Feature                                    Written steps   Captured replay
Scannable in five seconds                  ✓               partial
Works for the human triager                ✓               ✓
Encodes the exact environment              ✗               ✓
Survives an intermittent / race bug        ✗               ✓
Carries console + network at the failure   ✗               ✓
Readable by an AI agent over MCP           text only       ✓

Neither column wins outright, which is the honest takeaway. Written steps are unbeatable for a human scanning a queue — a developer reads four numbered lines faster than they scrub a recording. The replay wins everywhere the environment matters and everywhere a machine is the reader. The strongest reports carry both.

Steps to reproduce are becoming an interface an agent executes

Until recently, 'steps to reproduce' had exactly one consumer: a human reading prose. That assumption is breaking. Chrome DevTools MCP, released September 23, 2025, lets an AI coding agent 'navigate, fill out forms, and click buttons to reproduce bugs and test complex user flows — all while inspecting the runtime environment,' reading console logs and network requests on a live page. The repro stopped being only prose a human reads; it became an action an agent performs.

Prose, though, is a weak interface for a machine. An agent can fumble through ambiguous English steps, but it acts reliably on structured context: the replay, the console, the network, and the environment exposed over the Model Context Protocol. Anthropic introduced MCP in November 2024 as an open standard precisely so agents could read external tools and data this way. Feed an agent like Claude Code or Cursor the exact failing session as structured context and it can reproduce and draft a fix without a human re-tracing a single step.
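To show what "structured context" might look like next to prose steps, here is a hedged sketch of a captured repro serialized as JSON. Every field name is hypothetical: this is not the MCP wire format and not BugMojo's schema, just one plausible shape an agent could read without parsing English.

```python
import json

# Hypothetical structure; field names are illustrative only.
captured_repro = {
    "environment": {
        "browser": "Chrome 137",
        "os": "macOS 14.5",
        "viewport": [1440, 900],
        "build": "a1b2c3d",
    },
    "steps": [
        "Sign in as the seeded free-tier user",
        "Open https://app.example.com/projects/new in a fresh tab",
        'Type "Test" in the Name field',
        "Click Save",
    ],
    "failure": {
        "step": 4,  # 1-based index into "steps"
        "expected": "project is created, redirect to /projects/<id>",
        "actual": "Save button spins indefinitely; no network request fires",
    },
    "console": [],  # captured console entries would go here
    "network": [],  # captured requests would go here
}

payload = json.dumps(captured_repro, indent=2)

# A reader of the payload can locate the failing action without guessing.
failing_step = captured_repro["steps"][captured_repro["failure"]["step"] - 1]
print(failing_step)
```

The design point is that the failing step is addressed by index, so nothing depends on how the step happens to be worded.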

Why this matters more every quarter

GitClear's analysis of the 2024-2025 Google DORA data found AI coding tools move work through the pipeline faster but correlate with more instability — higher change-failure rates, more rework — and that code churn (lines rewritten or reverted within two weeks) roughly doubled, from ~3.3% toward 5.7-7.1%, as AI adoption spread. More machine-written code means more defects that need a fast, machine-readable reproduction. Prose steps do not scale to that; captured, structured repro does.

This is the BugMojo wedge, and it is worth stating plainly because no prose-only repro guide can claim it. BugMojo's browser extension captures the rrweb session replay, console logs, network requests, and screenshot at the moment a bug is reported, and its MCP server exposes that captured reproduction to AI agents as structured context. The steps you write stay useful for the human in the queue; the captured artifact is what an agent reads to act. To be straight about the boundary: BugMojo is a capture-and-repro layer, not a mature production error-monitoring suite — if you need deep release-health dashboards or aggregate crash analytics, that is a Sentry-class job, not this one.

Write the steps for the human in the queue. Capture the repro for the machine that fixes it. The first costs you two minutes; the second costs you a browser click.
BugMojo engineering

A repro checklist you can paste into your template

Before you submit, run the list. If any line is a 'no', the bug is a clarification round away from stalling.

  • Known state — does step 1 start from a state a stranger can recreate (logged out, or a named seeded account)?
  • Altitude — is each step an action another person performs identically, no higher, no lower?
  • 4-8 steps — if longer, is this secretly two bugs?
  • Expected vs actual — stated inline at the exact failing step?
  • Environment — browser, OS, viewport, build (or autocaptured)?
  • Boundary — any 'only on X' condition noted, to tell the triager where not to look?
  • Capture — for anything intermittent or visual, is a replay attached as the ground-truth backup?
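Part of the checklist can be automated in a report form or a CI hook. The sketch below assumes a report held as a plain dictionary with hypothetical keys (`start_state`, `environment`, `steps`, `intermittent`, `replay_url`). Altitude and Boundary are left out because they need a human judgment call.

```python
from typing import Dict, List


def run_checklist(report: Dict) -> List[str]:
    """Return the names of the mechanically checkable items the report fails."""
    steps = report.get("steps", [])
    # Steps that carry both an expectation and an observed result.
    failing = [s for s in steps if s.get("expected") and s.get("actual")]
    checks = {
        "Known state": bool(report.get("start_state")),
        "4-8 steps": 4 <= len(steps) <= 8,
        "Expected vs actual": len(failing) == 1,
        "Environment": bool(report.get("environment")),
        # A replay is only required when the bug is flagged intermittent.
        "Capture": bool(report.get("replay_url")) or not report.get("intermittent", False),
    }
    return [name for name, passed in checks.items() if not passed]


report = {
    "start_state": "logged in as the seeded free-tier user",
    "environment": "Chrome 137, macOS 14.5, viewport 1440x900, build a1b2c3d",
    "steps": [
        {"action": "Sign in"},
        {"action": "Open /projects/new in a fresh tab"},
        {"action": 'Type "Test" in the Name field'},
        {"action": "Click Save", "expected": "redirect", "actual": "spinner"},
    ],
    "intermittent": True,
}

print(run_checklist(report))
```

The sample report is marked intermittent but has no replay attached, so the only item it fails is Capture.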


Sources

  1. Works for Me! Cannot Reproduce — A Large Scale Empirical Study of Non-reproducible Bugs — Empirical Software Engineering (Rahman, Khomh, Castelluccio) (2022)
  2. Chrome DevTools (MCP) for your AI agent — reproduce bugs, inspect console + network — Chrome for Developers (Google) (2025-09-23)
  3. 2025 Quality Transformation Report — key findings — Tricentis (2025-05)
  4. Google DORA 2024 — AI impact summary (throughput vs rework/instability) — GitClear (Google DORA data) (2024-2025)
  5. The bug report that took four days to fix — Ali El-Shayeb, QA meets AI (Medium) (2026-03-12)
  6. Introducing the Model Context Protocol — Anthropic (2024-11)
Manvi · QA Tester

Manvi is a Quality Assurance Tester with three years of experience. For her, quality is not just about finding bugs — it is about ensuring the best possible experience for every user.

