Autonomous Red-Blue Teaming agents that detect, exploit, and fix smart contract bugs via Foundry.
Smart contract security is a high-stakes "Dark Forest": a single vulnerability can lead to millions in losses. Traditional static analysis tools (like Slither) are noisy, and manual auditing is slow and expensive. SoliForge introduces an autonomous Red-Blue Teaming workflow powered by LLMs. Unlike passive scanners, SoliForge acts as an active adversary and a defender simultaneously:

- The Red Agent acts as a hacker. It analyzes static-analysis reports and writes executable Foundry (.t.sol) exploit scripts that prove the vulnerability is real (e.g., draining a bank contract via reentrancy).
- The Blue Agent acts as a senior engineer. It analyzes the successful exploit and patches the source code automatically, applying the Checks-Effects-Interactions (CEI) pattern.
- The Gatekeeper validates the fix by running regression tests in a sandboxed Docker environment.

The loop continues until the contract withstands every generated exploit. SoliForge turns security auditing from a manual consulting service into an automated, verifiable, and self-healing continuous-integration process.
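The Red -> Blue -> Gatekeeper cycle can be sketched as a plain-Python loop. This is a simplified stand-in for the actual agent workflow: the agents below are toy stubs operating on a dict, and the function names are illustrative, not the project's real API.

```python
# Minimal sketch of the Red -> Blue -> Gatekeeper loop (toy stand-ins:
# the "contract" is a dict whose "vulns" set shrinks as Blue patches it).

MAX_ROUNDS = 10

def red_agent(contract):
    """Try to weaponize one remaining vulnerability into an exploit."""
    return next(iter(contract["vulns"]), None)  # None -> no exploit found

def blue_agent(contract, exploit):
    """Patch the vulnerability proven by the exploit (e.g., apply CEI)."""
    patched = dict(contract)
    patched["vulns"] = contract["vulns"] - {exploit}
    return patched

def gatekeeper(contract, exploit):
    """Re-check the patched contract; True means the fix holds."""
    return exploit not in contract["vulns"]

def audit_loop(contract):
    for _ in range(MAX_ROUNDS):
        exploit = red_agent(contract)
        if exploit is None:
            return contract, "secure"    # Red can no longer break it
        candidate = blue_agent(contract, exploit)
        if gatekeeper(candidate, exploit):
            contract = candidate         # fix validated, continue the loop
    return contract, "max_rounds_reached"

bank = {"name": "Bank.sol", "vulns": {"reentrancy", "unchecked-send"}}
fixed, status = audit_loop(bank)
print(status, fixed["vulns"])  # -> secure set()
```

The key design point the sketch preserves is that a patch is only accepted after the Gatekeeper confirms the exploit no longer succeeds; an unvalidated patch never replaces the current contract state.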
SoliForge is built on a Python backend orchestrated by LangGraph, which manages the stateful interaction between agents.

The Tech Stack:

- Brain: Alibaba Cloud's Qwen (via DashScope) serves as the core LLM. Its instruction-following capability was crucial for generating valid Solidity code.
- Orchestration: LangGraph defines the cyclical workflow (Discovery -> Weaponize -> Fix -> Validate).
- Security Tools: Slither handles the initial scan and Foundry executes the exploits. All compilation and testing runs inside isolated Docker containers, so the agents can execute arbitrary code without endangering the host.
- Frontend: React + Vite provides a real-time "Mission Control" dashboard that visualizes the battle between the Red and Blue agents.

The "Hacky" Part (Prompt Engineering): A major challenge was LLMs hallucinating invalid reentrancy attacks (e.g., forgetting to implement the recursive receive() hook). We solved this by injecting a strict "One-Shot Template" into the Red Agent's system prompt. Forcing the LLM to follow a fixed "Attacker" contract structure effectively teaches the model EVM mechanics (such as native balance checks vs. mapping updates) at generation time. This raised the compilation success rate of AI-generated exploits from under 30% to over 90%.
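The one-shot template trick can be illustrated roughly as follows. The template text and the helper function are illustrative assumptions, not the production prompt; the point is that the structure LLMs tend to hallucinate (the recursive receive() hook) is pinned down in the system prompt itself.

```python
# Illustrative version of the Red Agent's "one-shot template" system prompt.
# ATTACKER_TEMPLATE and build_red_system_prompt are hypothetical names.

ATTACKER_TEMPLATE = """\
// You MUST follow this exact structure. Fill in only the TODOs.
contract Attacker {
    Victim target;
    constructor(address _t) { target = Victim(_t); }

    function attack() external payable {
        // TODO: seed a deposit, then trigger the vulnerable withdraw
    }

    receive() external payable {
        // TODO: re-enter target.withdraw() while the victim's
        // balance mapping has not yet been zeroed
    }
}
"""

def build_red_system_prompt(slither_report: str) -> str:
    """Assemble the Red Agent's system prompt around the fixed template."""
    return (
        "You are a smart-contract exploit engineer.\n"
        "Write a Foundry test (.t.sol) proving the finding below is exploitable.\n"
        f"Static analysis findings:\n{slither_report}\n\n"
        "Your Attacker contract MUST match this template:\n"
        f"{ATTACKER_TEMPLATE}"
    )

prompt = build_red_system_prompt("reentrancy-eth in Bank.withdraw() (Slither)")
assert "receive() external payable" in prompt  # recursion hook always present
```

Because the template ships inside every prompt, the model cannot "forget" the re-entry hook: the worst case is a template with unfilled TODOs, which fails compilation loudly instead of producing a plausible-looking but inert exploit.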

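Running the generated tests inside an isolated container can be sketched like this. The image name, resource limits, and flags are assumptions for illustration; the real project may configure Docker differently.

```python
import subprocess

# Sketch of sandboxed Foundry execution. The image name below is an
# illustrative assumption, not the project's actual configuration.
SANDBOX_IMAGE = "ghcr.io/foundry-rs/foundry:latest"

def build_sandbox_cmd(project_dir: str, match_test: str) -> list[str]:
    """Build a docker command that runs `forge test` with no network access."""
    return [
        "docker", "run", "--rm",
        "--network", "none",              # exploits run fully offline
        "--memory", "1g", "--cpus", "1",  # cap runaway compilations
        "-v", f"{project_dir}:/src",      # writable: forge emits out/ artifacts
        "-w", "/src",
        SANDBOX_IMAGE,
        "forge", "test", "--match-test", match_test, "-vvv",
    ]

def run_exploit(project_dir: str, match_test: str) -> bool:
    """Return True if the exploit test passed, i.e. the attack succeeded."""
    result = subprocess.run(
        build_sandbox_cmd(project_dir, match_test),
        capture_output=True, text=True,
    )
    return result.returncode == 0
```

Disabling the network and capping memory/CPU means an AI-generated test can at worst burn its own container's resources: even arbitrary Solidity (or a malicious ffi call) cannot reach the host or the internet.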
