Running browser agents in CI

Oct 2025

The most valuable QA check exercises a real user journey on every release: signing up with a new account, logging in with an existing one, adding items to a cart and checking out, or filling in a critical form. These flows test not just your code but its integration with third-party services, payment processors, and email systems.

From CI, triggering a run is simple: send one authenticated POST to BrowserPilot's API with your task description, output schema, and template (if you have saved one). You get back a run id immediately, so the CI step does not block while the run executes.
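
As a rough illustration, the trigger step might look like the Python sketch below. The endpoint URL, the JSON field names (`task`, `output_schema`, `template`, `run_id`), and the bearer-token header are assumptions made for the example, not BrowserPilot's documented API; check the real reference before copying any of them.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL from the actual API reference.
API_URL = "https://api.browserpilot.example/v1/runs"


def build_run_request(api_key, task, output_schema, template_id=None):
    """Build (but do not send) the authenticated POST that starts a run."""
    # Field names here are assumed for illustration.
    payload = {"task": task, "output_schema": output_schema}
    if template_id is not None:
        payload["template"] = template_id
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_run(request):
    """Send the request and return the run id (assumed 'run_id' field)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["run_id"]
```

Splitting "build" from "send" keeps the request easy to inspect in a unit test without touching the network.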

From there you have two options: wait for a signed webhook (the better, more reliable choice) or poll the run endpoint (simpler to set up, but it costs more API calls). Either way, you get the result and can decide whether to proceed with the deploy.
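
Both options can be sketched in a few lines of Python. The signing scheme (hex-encoded HMAC-SHA256 over the raw request body), the `status` field, and its terminal values are assumptions for illustration only; the actual webhook format and run states are whatever the BrowserPilot documentation specifies. `fetch_run` is a hypothetical callable that performs the GET against the run endpoint.

```python
import hashlib
import hmac
import time

# Assumed terminal states; the real API may name them differently.
TERMINAL_STATES = {"succeeded", "failed"}


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a webhook signature, assuming hex HMAC-SHA256 of the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def poll_run(fetch_run, run_id, interval=5.0, timeout=600.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call fetch_run(run_id) until the run reaches a terminal state."""
    deadline = clock() + timeout
    while True:
        run = fetch_run(run_id)
        if run["status"] in TERMINAL_STATES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['status']!r} after {timeout}s")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters is a design choice for testability: a test can poll a fake run without waiting in real time.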

If the run completes successfully, you get the result: did the signup work, did the login succeed, was the checkout successful? You can assert on that result in your CI pipeline and move forward.
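
A minimal sketch of that assertion step follows, assuming the run payload carries a `status` string and a `result` object whose boolean fields mirror your output schema. The field names (`signup_ok`, `login_ok`, `checkout_ok`) are hypothetical and would come from the schema you submitted with the run.

```python
import sys


def check_run(run, required=("signup_ok", "login_ok", "checkout_ok")):
    """Return a list of problems; an empty list means the deploy may proceed."""
    status = run.get("status")
    if status != "succeeded":  # assumed success value
        return [f"run ended with status {status!r}"]
    result = run.get("result") or {}
    # Require an explicit True so a missing field counts as a failure.
    return [f"{name} was not true" for name in required
            if result.get(name) is not True]


def gate(run, required=("signup_ok", "login_ok", "checkout_ok")):
    """Print any problems to stderr and return a process exit code."""
    problems = check_run(run, required)
    for problem in problems:
        print(f"QA gate: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

In a pipeline, the final step would call `sys.exit(gate(run))` so that a non-zero exit code blocks the deploy.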

If the run fails, you have something better than a timeout or a cryptic error: a step-by-step replay with screenshots. You can see exactly which step failed, what the browser saw, and what the agent tried. You can reproduce it locally if needed (the same input should fail the same way), or you can read through the replay and diagnose the issue directly.

This is more powerful than a traditional end-to-end test because you are testing against a real target (your live site, staging, or a test environment), not a mock. You are testing the full integration, and if it fails you have a rich record of what happened. When a deploy breaks the checkout flow, you do not just get a CI failure; you get a full session recording showing exactly what went wrong.