Naughty Agents

web-of-trust reviewers blacklisting computer-controlling AI agents

Created At

ETHGlobal New York 2025

Project Description

Naughty Agents is a decentralized, human-in-the-loop (HITL) security protocol designed to mitigate the emerging threat of AI agent hijacking, with a focus on protecting on-chain financial actions.

As AI agents gain autonomy over digital wallets (e.g., Coinbase Server Wallets) to perform tasks like trading, swapping, or paying for services, the risk of manipulation by malicious actors increases. A significant emerging threat vector is the "Malicious Image Patch" (MIP), an adversarial attack delivered through visual inputs, for example when an agent views a compromised social media feed (Source: Anthropic: Sleeper Agents). These attacks can hijack the agent's objective, leading to unauthorized transactions and loss of funds.

Our solution creates a robust, on-chain "firewall" that verifies every transaction an agent proposes.

The protocol operates on a "Trust but Verify" principle, enforced at the Smart Contract Account (SCA) level. It uses an on-chain registry to instantly block known-malicious transactions (Blacklist). Unknown transactions are automatically reverted by the SCA and escalated to a decentralized network of human reviewers (The Review Oracle).
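
The decision flow described above can be sketched as a small off-chain model in Python (the language of the project's agent simulator). This is an illustration only, not the project's Solidity code: the class, method, and field names are hypothetical, and the `approved` set is an assumption, since the description only states what happens to blacklisted and unknown transactions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"        # transaction may execute
    BLOCK = "block"        # known-malicious: reverted
    ESCALATE = "escalate"  # unknown: reverted and queued for human review


@dataclass
class SecurityModuleModel:
    """Hypothetical off-chain model of the blacklist / review flow."""
    blacklist: set = field(default_factory=set)
    approved: set = field(default_factory=set)  # assumption: reviewed-and-safe actions
    review_queue: list = field(default_factory=list)

    def check(self, action_id: str) -> Verdict:
        # Known-malicious actions are blocked immediately.
        if action_id in self.blacklist:
            return Verdict.BLOCK
        # Assumption: actions previously cleared by reviewers may proceed.
        if action_id in self.approved:
            return Verdict.ALLOW
        # Anything else is unknown: revert and escalate, queuing it only once.
        if action_id not in self.review_queue:
            self.review_queue.append(action_id)
        return Verdict.ESCALATE

    def resolve(self, action_id: str, malicious: bool) -> None:
        # A reviewer decision moves the action out of the queue.
        self.review_queue.remove(action_id)
        (self.blacklist if malicious else self.approved).add(action_id)


if __name__ == "__main__":
    module = SecurityModuleModel(blacklist={"drain-wallet"})
    print(module.check("drain-wallet"))  # blocked
    print(module.check("new-swap"))      # unknown, escalated
    module.resolve("new-swap", malicious=False)
    print(module.check("new-swap"))      # now allowed
```

Note that in this model an unknown action never executes on first sight; it only becomes executable after a reviewer decision, which mirrors the revert-then-escalate behavior described above.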

The system is powered by a crypto-economic model. Users pay a subscription fee for protection, which funds rewards for the reviewers. The integrity of the reviewer network is secured by a "Web of Trust" with a delegated slashing mechanism (simplified for MVP), ensuring all participants are financially incentivized to act honestly. Naughty Agents makes on-chain AI safety a public good, secured by the community, for the community.
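
As a rough illustration of the incentive loop, the following Python sketch models subscription fees funding a reward pool and reviewers putting up a stake that can be forfeited. All names and amounts are hypothetical. It does not model delegation within the Web of Trust, and it leaves open where slashed funds go, because the description does not specify either.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewerEconomy:
    """Toy ledger: subscription fees fund rewards, reviewer stakes can be slashed."""
    reward_pool: int = 0
    stakes: dict = field(default_factory=dict)  # reviewer -> staked amount

    def pay_subscription(self, amount: int) -> None:
        # User fees accumulate in the pool that pays reviewers.
        self.reward_pool += amount

    def stake(self, reviewer: str, amount: int) -> None:
        # A reviewer locks funds as "skin in the game".
        self.stakes[reviewer] = self.stakes.get(reviewer, 0) + amount

    def reward(self, reviewer: str, amount: int) -> int:
        # Only staked reviewers can be paid, and never more than the pool holds.
        if self.stakes.get(reviewer, 0) <= 0:
            raise ValueError("reviewer has no stake")
        paid = min(amount, self.reward_pool)
        self.reward_pool -= paid
        return paid

    def slash(self, reviewer: str) -> int:
        # Simplified, immediate slashing: the whole stake is forfeited.
        # The destination of the slashed funds is left to the caller.
        return self.stakes.pop(reviewer, 0)


if __name__ == "__main__":
    economy = ReviewerEconomy()
    economy.pay_subscription(100)
    economy.stake("alice", 50)
    print(economy.reward("alice", 30), economy.reward_pool)
    print(economy.slash("alice"), economy.stakes)
```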

How it's Made

Naughty Agents is a decentralized security protocol that acts as an on-chain firewall to prevent hijacked AI agents from draining user funds. We built a full-stack solution with on-chain enforcement at the Smart Contract Account (SCA) level.


Core Architecture & Tech

Our system combines a React frontend, Solidity smart contracts, and a Python agent simulator.

  • Frontend: We bootstrapped a React/Vite app using @coinbase/create-cdp-app. User interactions are powered by Viem via the pre-configured CDP hooks, connecting directly to our smart contracts.

  • On-Chain Logic (Solidity & Hardhat 3): The protocol's core is built on-chain.

    • Enforcement Layer: A user's UserSCA (Smart Contract Account) has a mandatory SecurityModule hook. This module intercepts every transaction, reverting malicious or unknown ones before they can execute.
    • Protocol Contracts: A WebOfTrust manages reviewer staking, an ActionRegistry stores the on-chain blacklist, and a ReviewOracle queues unknown transactions for human review.
  • Agent Simulator: A Python script using web3.py simulates a hijacked agent proposing a malicious transaction, allowing us to demonstrate the protocol's real-time defense.


Leveraging Partner Technologies

We built our project on the Coinbase CDP stack, which was crucial for rapid development.

  • Coinbase CDP Stack: Embedded Wallets provided seamless email-based onboarding and served as the designated Operator key for the user's SCA. The CDP Hooks and integrated Viem client drastically simplified frontend development.
  • Hardhat 3: This was the backbone for our smart contract development. We used its Viem integration for robust, type-safe testing and Hardhat Ignition for streamlined and repeatable deployments.
  • Base: Our protocol is designed for a low-cost L2 like Base, as the on-chain security checks on every transaction would be too expensive on L1.

Notable Hackathon Hacks

To deliver a working MVP, we made two key simplifications:

  • Simplified Slashing: We implemented an immediate, user-triggered slashing function in the WebOfTrust contract. This demonstrates the economic incentive model (skin in the game) without building a complex, multi-stage arbitration system.
  • Deterministic Action Hashing: We created a unique "fingerprint" for any transaction by calculating keccak256(abi.encode(dest, value, data)). This simple but effective method allowed our ActionRegistry to easily identify and block known-malicious actions.