How detailed should reproduction steps be?

Detailed enough that a developer with zero context reproduces the bug on the first try, and no more. That usually means a known entry state (logged out, fresh tab, a specific URL), then unambiguous actions naming the exact element and value, with expected versus actual stated at the failure point. Mozilla advises minimizing the steps to the shortest sequence that still triggers the bug and including any special setup. Over-described reports bury the signal; under-described ones bounce back as 'need more info'.

Why can't developers reproduce some bugs even with steps?

Because steps capture actions, not state. The same clicks can pass or fail depending on the API response, feature flags, timing, cached data, viewport, or account. An empirical study of Firefox and Eclipse found non-reproducible reports make up about 17% of all bug reports and stay open roughly three months longer than reproducible ones. A follow-up data-fusion study found that roughly 14% of Eclipse's non-reproducible reports simply lacked the information needed to reproduce. The fix is to attach the surrounding state: console output, the exact network request, and a session replay of what actually happened.

What is the difference between reproduction steps and a test case?

Reproduction steps are written for a human to follow by hand and describe a path that currently produces wrong behavior. A test case is a codified, automatable assertion of correct behavior that a machine runs on every commit. Good repro steps are the raw material a test case is built from: once you can reliably reproduce a bug, you can encode that path as a failing regression test, fix the code, and watch the test go green. AI agents increasingly do this conversion directly, turning a reproduction into a failing test before patching.
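As a minimal sketch of that conversion, assume a hypothetical bug in which a cart accepts a negative quantity. The names `cart_total_buggy`, `cart_total_fixed`, and `rejects_negative_quantity` are illustrative assumptions, not taken from any real codebase. The same check replays the repro path against both versions: it fails before the fix and passes after it.

```python
from typing import Callable


def cart_total_buggy(unit_price: float, quantity: int) -> float:
    """Hypothetical pre-fix code: silently accepts a negative quantity."""
    return unit_price * quantity


def cart_total_fixed(unit_price: float, quantity: int) -> float:
    """Hypothetical post-fix code: refuses quantities below 1."""
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return unit_price * quantity


def rejects_negative_quantity(total_fn: Callable[[float, int], float]) -> bool:
    """Replay the repro path (quantity = -1) and report whether it is refused."""
    try:
        total_fn(9.99, -1)
    except ValueError:
        return True   # correct behavior: the invalid input is rejected
    return False      # bug reproduced: the negative quantity went through


if __name__ == "__main__":
    print(rejects_negative_quantity(cart_total_buggy))  # red: bug still present
    print(rejects_negative_quantity(cart_total_fixed))  # green: guard added
```

In a real project the check would more likely live in a test runner as a regression test; the plain function here just keeps the sketch self-contained.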

Can an AI coding agent reproduce a bug from the steps?

Increasingly, yes, but only when the steps come with state. Cursor shipped a /debug command in April 2026 aimed at bugs that are 'hard to reproduce or understand', where the agent forms hypotheses, adds log statements, and uses runtime information to localize the fault. The limit is input quality: an agent reading prose steps alone still guesses at the triggering state. Feeding it the console, the network request, and a session replay alongside the steps turns 'reproduce this for me' into a deterministic task instead of a hunt.

Glossary

What Are Reproduction Steps? How to Write Steps That Actually Repro

Q: What are reproduction steps in a bug report?

Reproduction steps (often shortened to 'repro steps' or 'steps to reproduce', STR) are the ordered, numbered actions that reliably trigger a bug, written from a known starting state so anyone can recreate the failure. They are paired with the expected result and the actual result. Mozilla's bug-writing guidelines call them the single most important part of any report, because a bug a developer can reproduce is a bug that is very likely to get fixed.

Reproduction steps are the ordered actions that reliably trigger a bug from a known starting state. Here is how to write steps that actually repro, and why about 17% of bug reports still end up non-reproducible.

Manvi · Jun 5, 2026 · 5 min read

Definition

Reproduction steps are the ordered, numbered actions that reliably trigger a bug, written from a known starting state so anyone can recreate the failure. They are paired with the expected result and the actual result observed at the point things break.

You will also see them as repro steps, steps to reproduce, or the acronym STR. The triad is always the same: a numbered path of actions, what you expected to happen, and what actually happened. GitHub's issue-form templates carry the same three fields ('Steps to reproduce', 'Expected behavior', 'Actual behavior'), making the structure the de facto standard enforced at filing time across millions of repositories. This page defines STR and the reasons it fails; for the full six-field bug report it lives inside, see the bug report template guide.

Why it matters

Mozilla's Bug Writing Guidelines are blunt about it: 'Steps to reproduce are the most important part of any bug report. If a developer is able to reproduce the bug, the bug is very likely to be fixed.' The same page warns of the inverse — 'if the steps are unclear, it might not even be possible to know whether the bug has been fixed.' Reproduction is the gate. Everything downstream, from triage priority to the regression test, depends on a developer being able to make the failure happen on demand.

Good steps share three traits. They start from a known entry state (logged out, a fresh tab, a specific URL) so the reader is not guessing at preconditions. They use unambiguous actions that name the exact element and value — 'enter -1 in the Quantity field', not 'add some items'. And they state expected versus actual at the failure point, because ambiguous expectations are not a cosmetic problem: the data-fusion study of 576 non-reproducible reports found ambiguous or outdated expected behavior drove about 8% of non-reproducibility on its own. Length is a trap in both directions — Mozilla advises minimizing to the shortest sequence that still triggers the bug. Over-described reports bury the signal; under-described ones bounce back as 'need more info'.
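The three traits can be sketched as a small data structure. Everything below is an assumption made for illustration: the `ReproSteps` class, its field names, and the `VAGUE_WORDS` list are hypothetical, and the vagueness check is a crude heuristic rather than an established rule.

```python
from dataclasses import dataclass
from typing import List

# Assumed word list for illustration; a real team would tune its own.
VAGUE_WORDS = {"some", "something", "stuff", "various", "etc"}


@dataclass
class ReproSteps:
    entry_state: str     # known starting point, e.g. "logged out, fresh tab"
    actions: List[str]   # ordered, unambiguous actions
    expected: str        # what should happen at the failure point
    actual: str          # what happened instead

    def render(self) -> str:
        """Format the report as numbered plain text."""
        lines = [f"Entry state: {self.entry_state}", "Steps to reproduce:"]
        lines += [f"{i}. {action}" for i, action in enumerate(self.actions, start=1)]
        lines += [f"Expected: {self.expected}", f"Actual: {self.actual}"]
        return "\n".join(lines)


def vague_actions(actions: List[str]) -> List[str]:
    """Return the actions that contain a word from VAGUE_WORDS."""
    flagged = []
    for action in actions:
        words = {word.strip(".,;:'\"").lower() for word in action.split()}
        if words & VAGUE_WORDS:
            flagged.append(action)
    return flagged
```

Under this heuristic, 'Add some items' is flagged while 'Enter -1 in the Quantity field' passes, which mirrors the contrast drawn above.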

Figure: A reproduction recipe. Five numbered step-nodes lead to a failure burst, then into a state cluster of console, network, and replay icons that an AI agent reads over an MCP connector. Steps 1-5 are the actions; the failure burst is where expected and actual diverge; the state cluster (console, network, replay) is what turns the recipe from 'works for me' into a deterministic repro.

Here is the part most guides skip. Even well-written steps bounce because steps capture actions, not state. The same five clicks can pass or fail depending on the API response, a feature flag, timing, cached data, the viewport, or the account. The 'Works for me!' empirical study of Firefox and Eclipse quantified the cost: non-reproducible reports are about 17% of all bug reports and stay active roughly three months longer than reproducible ones. The data-fusion follow-up found roughly 14% of Eclipse's non-reproducible reports simply lacked the information required to reproduce. Notably, 66% of the non-reproducible reports that were eventually fixed had in fact been reproduced once enough information finally arrived, which suggests the missing ingredient is usually state, not effort.
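A toy model makes the actions-versus-state point concrete. The scenario is hypothetical: `SessionState`, its fields, and the flagged checkout path are assumptions invented for this sketch. The action is identical in every call; only the state differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionState:
    cart_items: int          # item count returned by the cart API
    new_checkout_flag: bool  # hypothetical feature flag


def click_checkout(state: SessionState) -> str:
    """Simulate one identical click; the outcome depends only on state."""
    if state.cart_items == 0:
        # Hypothetical bug: the flagged code path renders nothing for an empty cart.
        return "blank page" if state.new_checkout_flag else "empty-cart message"
    return "order confirmation"


if __name__ == "__main__":
    reporter = SessionState(cart_items=0, new_checkout_flag=True)
    developer = SessionState(cart_items=2, new_checkout_flag=False)
    print(click_checkout(reporter))   # the reporter sees the failure
    print(click_checkout(developer))  # the developer sees 'works for me'
```

A report that lists only 'click Checkout' cannot distinguish these two runs; recording the cart response and the flag value can.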

How this shows up in a real BugMojo bug report

In a BugMojo report the steps do not arrive alone. The browser extension records the failure with the surrounding state attached — an rrweb session replay of exactly what the user did, the console output, and the network request that fed the data. So 'Step 4: click Checkout, expected order confirmation, actual blank page' sits next to the precise POST /api/cart response that returned an empty cart, and the replay shows the click that triggered it. The prose steps become a recipe you can re-run, not a description you have to reconstruct. That is the difference between a developer reproducing on the first try and a ticket that ping-pongs for three months.

State matters even more once the reader is an agent. Cursor shipped a /debug command in April 2026 aimed squarely at bugs that are 'hard to reproduce or understand', where the agent generates hypotheses, adds log statements, and uses runtime information to localize the fault — and its Bugbot reports that '70%+ of flags get resolved before merge'. Reproduction is becoming an agent task, not just a human one. But an agent reading prose steps alone still guesses at the triggering state. BugMojo's MCP server hands the agent (Claude Code, Cursor) the steps plus the replay, console, and network bundle, which is the gap between 'reproduce this for me' as a hunt and the same request as a deterministic task.
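To illustrate what 'steps plus state' could look like as data, here is a hedged sketch of a repro bundle. It is not BugMojo's schema or its MCP payload, neither of which is shown in this article; `build_bundle`, `missing_state`, and every field name are hypothetical. The helper reports which pieces of state are absent, the situation in which a reader, human or agent, has to guess.

```python
import json
from typing import Any, Dict, List, Optional

# Assumed state categories, taken from the three named in the text.
STATE_KEYS = ("console", "network", "replay")


def build_bundle(
    steps: List[str],
    console: Optional[List[str]] = None,
    network: Optional[Dict[str, Any]] = None,
    replay: Optional[str] = None,
) -> Dict[str, Any]:
    """Group prose steps with whatever surrounding state was captured."""
    return {
        "steps": steps,
        "state": {"console": console, "network": network, "replay": replay},
    }


def missing_state(bundle: Dict[str, Any]) -> List[str]:
    """List the state categories that are empty or absent in the bundle."""
    return [key for key in STATE_KEYS if not bundle["state"].get(key)]


def to_json(bundle: Dict[str, Any]) -> str:
    """Serialize the bundle so it can be attached to a ticket or passed to a tool."""
    return json.dumps(bundle, indent=2)
```

A bundle built from steps alone reports all three categories as missing, while one carrying a console log, a request record, and a replay reference reports none.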

Feature	BugMojo	Issue tracker + test tool (Jira/TestRail)
Structured STR + expected/actual fields	✓	✓
rrweb session replay captured with the steps	✓	—
Console + exact network request attached	✓	Manual attachment
Steps + state handed to an AI agent over MCP	✓	—
Formal test-case library with run history	—	✓
Deep workflow / sprint / Jira admin	—	✓

Two-sided: BugMojo ships steps with the state that produced them over MCP, but it is not a manual-test-case manager or a CI test runner.

Sources

Bug Writing Guidelines — Steps to reproduce are 'the most important part of any bug report' — Mozilla / Bugzilla (2025)
Works for me! Characterizing non-reproducible bug reports (Firefox + Eclipse empirical study) — Mozilla Foundation / Empirical Software Engineering (2022)
Why are Some Bugs Non-Reproducible? An Empirical Investigation using Data Fusion — arXiv (ICSME 2020) (2021)
CLI Debug Mode and /btw Support — /debug for bugs 'hard to reproduce or understand' — Cursor / Anysphere (2026-04-14)
Configuring issue templates — Steps to reproduce / Expected / Actual fields — GitHub Docs (2026)
Cursor Bugbot — '70%+ of flags get resolved before merge' — Cursor / Anysphere (2026)
