SafeTest Forge

TypeScript React Anthropic Node.js Python

SafeTest Forge is an AI-powered, local-first test generation and debugging tool, written in TypeScript and built with the Claude Agent SDK. It generates, runs, repairs, inspects, and rewinds Python pytest tests. Point it at a Python repository, specify a target module, and the agent writes tests, executes them locally, and fixes failures, all while you watch the live event trace in the UI.

The repository is open source. Live Claude runs require your own Anthropic API key, but local development, tests, and smoke evaluation work in fake mode without one. New language support is planned for future releases.

What makes this project interesting is the end-to-end agent loop. The Claude Agent SDK handles streaming events, structured output, checkpoint capture, and subagent definitions. When a generated test fails, the agent reads the error output, diagnoses the issue, and attempts a repair round automatically. If something goes wrong, you can rewind to any checkpoint and inspect exactly what the agent was thinking at that point. The entire flow, from test generation to execution to repair, happens locally on your machine with full visibility into every step.
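The generate, run, and repair flow described above can be sketched in a few lines of TypeScript. This is a minimal illustration and not the project's code: the names TestAgent, Runner, TraceEvent, and runWithRepair are hypothetical, the agent and runner are injected as plain synchronous functions for brevity (a real run would be asynchronous and streamed), and nothing here uses the Claude Agent SDK API.

```typescript
// Hypothetical shapes for illustration only; not the SafeTest Forge or SDK API.
type TestResult = { passed: boolean; output: string };

interface TestAgent {
  generate(targetModule: string): string;
  repair(testSource: string, failureOutput: string): string;
}

type Runner = (testSource: string) => TestResult;

type TraceEvent = { phase: "generate" | "run" | "repair"; detail: string };

function runWithRepair(
  agent: TestAgent,
  runTests: Runner,
  targetModule: string,
  maxRepairRounds = 1, // assumption: mirrors the "one repair round" capability
): { result: TestResult; trace: TraceEvent[] } {
  const trace: TraceEvent[] = [];

  let source = agent.generate(targetModule);
  trace.push({ phase: "generate", detail: targetModule });

  let result = runTests(source);
  trace.push({ phase: "run", detail: result.passed ? "passed" : "failed" });

  // Feed the failure output back to the agent, then re-run the repaired tests.
  for (let round = 0; round < maxRepairRounds && !result.passed; round++) {
    source = agent.repair(source, result.output);
    trace.push({ phase: "repair", detail: `round ${round + 1}` });
    result = runTests(source);
    trace.push({ phase: "run", detail: result.passed ? "passed" : "failed" });
  }

  return { result, trace };
}

// Deterministic stand-ins, in the spirit of a fake mode that needs no API key.
const fakeAgent: TestAgent = {
  generate: (mod) => `assert ${mod}.add(1, 2) == 4`, // deliberately wrong
  repair: (src) => src.replace("== 4", "== 3"),
};

const fakeRunner: Runner = (src) =>
  src.includes("== 3")
    ? { passed: true, output: "" }
    : { passed: false, output: "AssertionError" };

const outcome = runWithRepair(fakeAgent, fakeRunner, "calc");
```

With these stand-ins the first run fails, one repair round rewrites the assertion, and the second run passes. The trace array records each phase in order, which is the kind of record a live event view could render.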

Current Capabilities

  • CLI-First V1 Flow - Run, cancel, report, trace, and rewind commands through a complete CLI interface
  • Python Repo Validation - Package-shape detection, and rejection of ambiguous monorepos when --target is not given
  • Policy Enforcement - Test-only writes and a restricted shell allowlist keep the agent from touching production code
  • Claude Agent SDK Integration - Streaming events, structured output, checkpoint capture, and subagent definitions
  • Deterministic Fake-Agent Path - Unit, integration, and CLI smoke tests run without an API key
  • Local Pytest Execution - Stdout/stderr capture, timeout classification, cancellation polling, and one repair round
  • Live-Run Cancellation - Propagation through the run-level abort controller, including persisted cancel requests observed across processes
  • Flat-File Persistence - Runs, traces, reports, checkpoints, and fake rewind snapshots stored under .safetest-forge/
  • React UI - 2-column layout with color-coded trace badges, phase progress bar, click-to-copy run ID, stat grid report panel, and REST + SSE wiring
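To make the policy enforcement item above more concrete, here is a minimal sketch of a test-only write check combined with a shell allowlist. Everything in it is an assumption made for illustration: the ToolRequest shape, the checkPolicy function, the pytest-style file name patterns, and the allowlisted commands are hypothetical and are not taken from the SafeTest Forge source.

```typescript
// Hypothetical request shape; the real tool's types may differ.
type ToolRequest =
  | { kind: "write"; path: string }
  | { kind: "shell"; argv: string[] };

// Assumed allowlist; the actual set of permitted commands is not documented here.
const ALLOWED_COMMANDS = new Set(["pytest", "python"]);

function isTestPath(relPath: string): boolean {
  const normalized = relPath.replace(/\\/g, "/");
  // Reject absolute paths (POSIX or drive-letter style).
  if (/^([A-Za-z]:)?\//.test(normalized)) return false;

  const parts = normalized.split("/");
  // Reject any traversal out of the repository root.
  if (parts.includes("..")) return false;

  const file = parts[parts.length - 1] ?? "";
  const inTestsDir = parts.slice(0, -1).includes("tests");
  // Common pytest discovery names: test_*.py and *_test.py.
  const looksLikeTest = /^test_.*\.py$/.test(file) || /_test\.py$/.test(file);
  return looksLikeTest || (inTestsDir && file === "conftest.py");
}

function isCommandAllowed(argv: string[]): boolean {
  // Checking the argv array avoids parsing shell metacharacters from a string.
  const exe = argv[0];
  return exe !== undefined && ALLOWED_COMMANDS.has(exe);
}

function checkPolicy(req: ToolRequest): { allowed: boolean; reason: string } {
  if (req.kind === "write") {
    return isTestPath(req.path)
      ? { allowed: true, reason: "write targets a test file" }
      : { allowed: false, reason: "writes are limited to test files" };
  }
  return isCommandAllowed(req.argv)
    ? { allowed: true, reason: "command is on the allowlist" }
    : { allowed: false, reason: "command is not on the allowlist" };
}
```

Under these assumptions, a write to tests/test_math.py is permitted, a write to src/app.py is refused, and a shell request such as rm -rf . is refused because rm is not on the allowlist. Taking commands as an argument array instead of a single string is a design choice that keeps the check simple.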

For setup instructions, usage details, and architecture overview, see the GitHub repository.

Future Plans

  • Support for additional programming languages beyond Python