Proof-of-defeat

A fully on-chain, self-learning NFT that trains against you, adapts to your playstyle, and evolves into your ultimate rival.

Created At

Agentic Ethereum

Project Description

Proof of Defeat (PoD) is a fully on-chain reinforcement learning framework that enables NFTs to learn and adapt through user interactions. Using Q-learning, each NFT maintains a state-action table, adjusting its strategy based on past games. Unlike static NFTs, PoD NFTs dynamically evolve, becoming personalized AI opponents.
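The Q-learning step behind this adaptation can be sketched as follows. This is a minimal sketch, not PoD's contract code: it assumes integer fixed-point values (`SCALE = 1_000` meaning 1.0), since contracts usually avoid floating point, and the learning rate `ALPHA`, discount `GAMMA`, and all names are hypothetical, illustrative choices.

```rust
/// Fixed-point scale: 1_000 represents 1.0 (assumed; PoD's real scale is not stated).
pub const SCALE: i64 = 1_000;
/// Learning rate alpha = 0.1 and discount factor gamma = 0.9 (illustrative values).
pub const ALPHA: i64 = 100;
pub const GAMMA: i64 = 900;

/// One Q-learning update in integer fixed-point:
/// Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
pub fn q_update(q_sa: i64, reward: i64, next_q_values: &[i64]) -> i64 {
    // Best value reachable from the next state; 0 if the game ended (no next state).
    let max_next = next_q_values.iter().copied().max().unwrap_or(0);
    // Temporal-difference target, rescaled after the fixed-point multiply.
    let target = reward + GAMMA * max_next / SCALE;
    // Move Q(s, a) a fraction alpha of the way toward the target.
    q_sa + ALPHA * (target - q_sa) / SCALE
}
```

For example, a first win (reward `SCALE`) from an unseen state with no follow-up state moves the entry from 0 to 100, i.e. 0.1.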

PoD encodes the player's move history as a base-3 integer, allowing efficient on-chain storage. NFTs select actions using an ε-greedy policy, balancing exploration and exploitation. All learning happens fully on-chain, using Arbitrum Stylus for its more efficient memory handling compared with the EVM.
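A minimal sketch of a base-3 history encoding, assuming Rock = 0, Paper = 1, Scissors = 2 and a hypothetical window of the last three moves (`HISTORY_LEN`). Each new move shifts the existing digits up one place and the modulus drops the oldest move, so the whole window fits in one small integer (0 to 26 here). The names and digit assignment are illustrative, not PoD's.

```rust
/// Moves as base-3 digits (this digit assignment is an assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Move {
    Rock = 0,
    Paper = 1,
    Scissors = 2,
}

/// Number of past moves remembered (hypothetical window size).
pub const HISTORY_LEN: u32 = 3;

/// Append a move to the base-3 encoded history, dropping the oldest one.
pub fn push_move(history: u32, m: Move) -> u32 {
    let modulus = 3u32.pow(HISTORY_LEN);
    (history * 3 + m as u32) % modulus
}

/// Decode the state back into its digits, oldest move first.
pub fn decode(mut history: u32) -> Vec<u32> {
    let mut digits = vec![0; HISTORY_LEN as usize];
    for i in (0..HISTORY_LEN as usize).rev() {
        digits[i] = history % 3;
        history /= 3;
    }
    digits
}
```

One consequence of this simple scheme: an empty history and three consecutive Rocks both encode to 0, so a contract would need a separate move counter (or a reserved state) to tell them apart.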

Rock-Paper-Scissors serves as the initial proof of concept because human players show behavioral biases, such as predictable move sequences, that reinforcement learning can exploit. Prior research has found that AI agents can outperform human players by recognizing these patterns and countering them.

You can also pit your NFT against other NFTs to see how well you've trained it.

Beyond RPS, PoD is designed as a modular AI framework for turn-based games. Developers can integrate their own games and learning mechanisms by implementing chooseMove(), ownerOf(), and updateQTable(), so that NFTs can train and compete without off-chain computation.
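One way those three hooks might look in Rust is sketched below. The trait name `LearningGame`, the snake_case method names, their signatures, and the `ToyRps` implementation are all hypothetical. The toy keeps fixed-point Q-values (1_000 = 1.0) per token, state, and action, and for brevity uses Q-learning with a discount factor of 0, so the update needs no next state.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// 20-byte account address (stand-in for an on-chain address type).
pub type Address = [u8; 20];

/// Hypothetical Rust counterpart of chooseMove / ownerOf / updateQTable.
pub trait LearningGame {
    /// Pick an action for `token_id` in the encoded game `state`.
    fn choose_move(&self, token_id: u64, state: u32) -> u8;
    /// Owner of `token_id`, if it has been minted.
    fn owner_of(&self, token_id: u64) -> Option<Address>;
    /// Learn from the `reward` received for playing `action` in `state`.
    fn update_q_table(&mut self, token_id: u64, state: u32, action: u8, reward: i64);
}

/// Toy Rock-Paper-Scissors game (hypothetical).
#[derive(Default)]
pub struct ToyRps {
    owners: HashMap<u64, Address>,
    q: HashMap<(u64, u32, u8), i64>,
}

impl ToyRps {
    pub fn mint(&mut self, token_id: u64, owner: Address) {
        self.owners.insert(token_id, owner);
    }
}

impl LearningGame for ToyRps {
    fn choose_move(&self, token_id: u64, state: u32) -> u8 {
        // Greedy: highest Q-value; ties go to the lowest action index.
        (0u8..3)
            .max_by_key(|&a| {
                let q = self.q.get(&(token_id, state, a)).copied().unwrap_or(0);
                (q, Reverse(a))
            })
            .unwrap()
    }

    fn owner_of(&self, token_id: u64) -> Option<Address> {
        self.owners.get(&token_id).copied()
    }

    fn update_q_table(&mut self, token_id: u64, state: u32, action: u8, reward: i64) {
        // Q-learning with alpha = 0.1 and gamma = 0: Q <- Q + (r - Q) / 10.
        let q = self.q.entry((token_id, state, action)).or_insert(0);
        *q += (reward - *q) / 10;
    }
}
```

Keeping the game behind a trait like this means a battle or training loop only needs to call the three methods, whatever the underlying game is.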

PoD introduces self-learning NFTs, merging AI and blockchain to create autonomous, evolving digital assets.

How it's Made

PoD is built on Arbitrum Stylus, using Rust for smart contract development to enable efficient on-chain reinforcement learning. Compared with Solidity, Rust offers safer memory management and compiles to WebAssembly (WASM), which makes it well suited to implementing Q-learning inside a smart contract.

The core learning contract maintains a StorageMap-based Q-table, storing state-action values per user. The history buffer is encoded as a base-3 integer, ensuring compact state representation. Each move selection follows an ε-greedy policy, balancing exploration vs. exploitation, while Q-value updates are fully on-chain, without reliance on off-chain computation.
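Below is an off-chain mock of a per-user Q-table with ε-greedy selection. A nested `HashMap` stands in for the contract's StorageMap, and the flat slot index `state * 3 + action` is an assumed layout. The random word `rand` is supplied by the caller as a placeholder: how PoD sources randomness on-chain is not described here. All names are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// 20-byte account address (stand-in for an on-chain address type).
pub type User = [u8; 20];

/// Number of actions (Rock, Paper, Scissors).
const ACTIONS: u32 = 3;

/// Mock of a nested StorageMap<user, StorageMap<slot, value>> (assumed layout).
#[derive(Default)]
pub struct UserQTables {
    tables: HashMap<User, HashMap<u32, i64>>,
}

impl UserQTables {
    /// Flat slot index for (state, action).
    fn slot(state: u32, action: u8) -> u32 {
        state * ACTIONS + action as u32
    }

    pub fn get(&self, user: &User, state: u32, action: u8) -> i64 {
        self.tables
            .get(user)
            .and_then(|t| t.get(&Self::slot(state, action)))
            .copied()
            .unwrap_or(0) // unset entries read as 0, like unset contract storage
    }

    pub fn set(&mut self, user: User, state: u32, action: u8, value: i64) {
        self.tables
            .entry(user)
            .or_default()
            .insert(Self::slot(state, action), value);
    }

    /// ε-greedy: explore with probability epsilon_pct / 100, otherwise exploit.
    pub fn choose(&self, user: &User, state: u32, epsilon_pct: u64, rand: u64) -> u8 {
        if rand % 100 < epsilon_pct {
            // Explore: pick an action from the remaining bits of the random word.
            ((rand / 100) % ACTIONS as u64) as u8
        } else {
            // Exploit: highest Q-value, ties broken toward the lowest action index.
            (0..ACTIONS as u8)
                .max_by_key(|&a| (self.get(user, state, a), Reverse(a)))
                .unwrap()
        }
    }
}
```

Keying the outer map by user keeps each player's training separate, so the same state can map to different best moves for different opponents.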

A separate battle contract, deployed on Arbitrum Sepolia, runs NFT vs. NFT battles and lets developers plug in custom games through a standardized game interface. This modular design is meant to make it straightforward to add new AI games later.
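To illustrate an NFT vs. NFT match, here is a sketch that scores a fixed number of Rock-Paper-Scissors rounds between two policies. The policies are plain closures that see the opponent's past moves; `outcome`, `battle`, and the Rock = 0, Paper = 1, Scissors = 2 encoding are hypothetical, and the real battle contract's interface is not shown in this description.

```rust
/// Returns 1 if `a` wins, -1 if `b` wins, 0 on a draw.
/// Moves: Rock = 0, Paper = 1, Scissors = 2 (assumed encoding).
pub fn outcome(a: u8, b: u8) -> i8 {
    debug_assert!(a < 3 && b < 3);
    match (3 + a - b) % 3 {
        0 => 0,
        1 => 1,
        _ => -1,
    }
}

/// Plays `rounds` rounds and returns (wins_a, wins_b, draws).
/// Each policy receives the opponent's moves so far.
pub fn battle<A, B>(rounds: usize, mut nft_a: A, mut nft_b: B) -> (u32, u32, u32)
where
    A: FnMut(&[u8]) -> u8,
    B: FnMut(&[u8]) -> u8,
{
    let mut hist_a: Vec<u8> = Vec::new();
    let mut hist_b: Vec<u8> = Vec::new();
    let (mut wins_a, mut wins_b, mut draws) = (0u32, 0u32, 0u32);
    for _ in 0..rounds {
        // Both sides choose before either move is revealed.
        let a = nft_a(&hist_b);
        let b = nft_b(&hist_a);
        match outcome(a, b) {
            1 => wins_a += 1,
            -1 => wins_b += 1,
            _ => draws += 1,
        }
        hist_a.push(a);
        hist_b.push(b);
    }
    (wins_a, wins_b, draws)
}
```

The `(3 + a - b) % 3` trick avoids a lookup table: 0 means a draw, 1 means `a` beats `b`, and 2 means `b` wins.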

I also built a subgraph on The Graph that indexes the smart contracts' events; it powers a leaderboard and lets you view your NFT's past interactions.

The frontend is built with Next.js and Tailwind CSS. Blockchain interactions are handled with ethers.js (v6), and NFT metadata is generated dynamically using DiceBear avatars, giving every trained NFT a unique visual identity.

I also used Arbitrum's debugging tool, thewizard.app, which saved hours of debugging time during contract deployment and testing.
