Checkpoints and recovery: retrying without starting over

Mar 2026

Long browser tasks fail in the middle. A navigation times out, a button moves, the target serves an error page, the network glitches. If the agent is ten steps into a twelve-step task, restarting from the beginning is wasteful and often unreliable: you have to navigate again, authenticate again, and handle the same UI again.

Traditional automation engines either restart the whole task or give up. Some platforms try to catch exceptions and retry the one bad step, but that only works if the step does not depend on anything that came before it. Most real tasks have dependencies: you cannot retry a form submission if you have not filled the form.

BrowserPilot records checkpoints as the agent progresses. After each successful step, it saves a snapshot of the browser state: the DOM, the cookies, the local storage, the open tabs. If a step fails, the agent can retry from the last checkpoint instead of restarting from the beginning.
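To make the idea concrete, here is a minimal sketch of what a checkpoint record could hold. The names Checkpoint and CheckpointStore, and the dict-shaped browser state, are assumptions for illustration; they are not BrowserPilot's actual API or storage format.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical snapshot taken after a successful step."""
    step_index: int
    dom: str
    cookies: dict
    local_storage: dict
    open_tabs: tuple


class CheckpointStore:
    """Hypothetical in-memory store; a real one would persist snapshots."""

    def __init__(self):
        self._checkpoints = []

    def save(self, step_index, browser_state):
        # Copy mutable parts so later steps cannot alter the snapshot.
        checkpoint = Checkpoint(
            step_index=step_index,
            dom=browser_state["dom"],
            cookies=deepcopy(browser_state["cookies"]),
            local_storage=deepcopy(browser_state["local_storage"]),
            open_tabs=tuple(browser_state["open_tabs"]),
        )
        self._checkpoints.append(checkpoint)
        return checkpoint

    def latest(self):
        # The most recent successful step, or None if nothing has succeeded yet.
        return self._checkpoints[-1] if self._checkpoints else None
```

The point of the copies is that a snapshot has to stay frozen at the moment the step succeeded, even as the live browser state keeps changing.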

When you retry a run, BrowserPilot restores the checkpoint (the browser state at the end of the last successful step), then re-executes the failed step. If that step depends on earlier ones, for example because it queries data that an earlier step loaded, that context is still there.
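The restore-then-retry flow can be sketched as a small loop. This is an illustration under assumptions, not BrowserPilot's implementation: steps are plain functions that mutate a state dict, and TransientError, run_with_checkpoints, and max_retries are hypothetical names.

```python
from copy import deepcopy


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, network glitches)."""


def run_with_checkpoints(steps, initial_state, max_retries=2):
    """Run steps in order; on a transient failure, restore the last
    checkpoint and re-execute only the failed step."""
    state = deepcopy(initial_state)
    checkpoint = deepcopy(state)  # state at the end of the last successful step
    recovered = False
    for step in steps:
        failures = 0
        while True:
            try:
                step(state)
            except TransientError:
                failures += 1
                if failures > max_retries:
                    raise  # retries exhausted: the run fails outright
                # Discard any half-applied changes from the failed attempt.
                state = deepcopy(checkpoint)
                recovered = True
            else:
                checkpoint = deepcopy(state)  # step succeeded: new checkpoint
                break
    return state, recovered
```

Earlier steps never run twice: only the step that failed is re-executed, against the state the previous step left behind.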

Recovery rate becomes a dashboard metric you can watch per template. What fraction of your runs hit a transient error and recover? What fraction fail outright? You can correlate recovery with time of day, day of week, or target site changes. A high recovery rate might mean your targets are flaky; a low recovery rate might mean your tasks are brittle.
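As a rough sketch of how those two fractions could be computed from run records: the record shape and the outcome labels ("clean", "recovered", "failed") are assumptions made for this example, and both fractions are taken over all runs of a template.

```python
from collections import Counter, defaultdict


def recovery_rates(runs):
    """Return, per template, the fraction of runs that recovered from a
    transient error and the fraction that failed outright."""
    counts = defaultdict(Counter)
    for run in runs:
        counts[run["template"]][run["outcome"]] += 1

    rates = {}
    for template, outcomes in counts.items():
        total = sum(outcomes.values())
        rates[template] = {
            "recovered": outcomes["recovered"] / total,
            "failed": outcomes["failed"] / total,
        }
    return rates
```

Dividing by the number of runs that hit an error, rather than by all runs, is an equally reasonable definition; whichever you pick, keep it the same across templates so the numbers stay comparable.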

Retries are not automatic. If your task says 'click the button three times', the agent does not know when to retry. But if your task says 'fill the form and submit', and submission fails, the agent can be configured to retry the submission step, not the whole form.
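One way to picture that configuration is a per-step retry policy that is off unless you turn it on. The StepPolicy fields and the should_retry helper below are hypothetical; the real configuration surface may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepPolicy:
    """Hypothetical per-step retry setting."""
    name: str
    retryable: bool = False  # opt in explicitly; nothing is retried by default
    max_retries: int = 0


def should_retry(policy, failures_so_far):
    """Retry only steps that opted in, and only while retries remain."""
    return policy.retryable and failures_so_far <= policy.max_retries


# 'Fill the form and submit': only the submission step is marked retryable.
task = [
    StepPolicy("fill_form"),
    StepPolicy("submit_form", retryable=True, max_retries=2),
]
```

With this shape, a failed submission is retried up to twice from the checkpoint taken after the form was filled, while a failure in fill_form is not retried at all.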