LLMBench

Verifiable on-chain drift benchmarking for Large Language Models

Created At

ETHGlobal Paris

Project Description

This project was inspired by the following paper: https://arxiv.org/abs/2307.09009

The paper details how ChatGPT's behavior and performance have changed over time. Because these changes are unpredictable, companies have difficulty integrating the models into their pipelines. The researchers developed a set of benchmarks and ran them on two snapshots of OpenAI's models. This project makes that benchmarking process recurrent and stores the results on-chain for immutability and transparency.
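To make the idea of drift concrete, here is a minimal sketch of how two runs of the same benchmark could be compared. The `BenchmarkRun` type, the `accuracy_drift` helper, and all numbers are hypothetical and chosen for illustration; they are not taken from the project or the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """One run of a benchmark against one model (hypothetical structure)."""
    model: str       # model identifier, e.g. "gpt-4"
    timestamp: int   # Unix time at which the run finished
    correct: int     # number of items answered correctly
    total: int       # number of items in the benchmark

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def accuracy_drift(earlier: BenchmarkRun, later: BenchmarkRun) -> float:
    """Return the change in accuracy (later minus earlier) for one model."""
    if earlier.model != later.model:
        raise ValueError("runs must come from the same model")
    return later.accuracy - earlier.accuracy


# Made-up numbers, for illustration only.
earlier_run = BenchmarkRun("gpt-4", timestamp=1_000, correct=420, total=500)
later_run = BenchmarkRun("gpt-4", timestamp=2_000, correct=255, total=500)
drift = accuracy_drift(earlier_run, later_run)
print(f"accuracy changed by {drift:+.2%}")
```

A negative value means the later run scored lower. Running the comparison on a schedule, rather than on two fixed snapshots, is what turns a one-off measurement into a drift history.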

How it's Made

The project is split into three modules:

Front-end: a single-page application built with Vue.js 3 that shows the GPT-3.5 Turbo and GPT-4 benchmarks
Smart Contract: a contract deployed on Gnosis that stores the benchmark results
LLMDrift Scripts: scripts meant to run on Bacalhau. They run the LLMDrift benchmarks on the current gpt-3.5-turbo and gpt-4 models and write the results to the Gnosis chain. The scripts are based on the "lchen001/LLMDrift" repo, developed by the researchers behind the aforementioned paper.
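The hand-off between the scripts and the contract can be sketched as follows. This is an assumption-laden illustration, not the project's code: the record schema, the `build_record` and `record_digest` helpers, and the example benchmark name are all hypothetical, and SHA-256 is used only as a stand-in fingerprint (the source does not describe the contract's interface or what exactly is written on-chain).

```python
import hashlib
import json


def build_record(model: str, benchmark: str, timestamp: int,
                 correct: int, total: int) -> dict:
    """Assemble one benchmark result as a plain dictionary (hypothetical schema)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    return {
        "model": model,
        "benchmark": benchmark,
        "timestamp": timestamp,
        "correct": correct,
        "total": total,
    }


def record_digest(record: dict) -> str:
    """Hash a canonical JSON encoding so that equal records give equal digests."""
    # Sorted keys and fixed separators make the encoding independent of
    # dictionary insertion order and whitespace.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up values, for illustration only.
record = build_record("gpt-4", "example-benchmark", 1_000, correct=255, total=500)
digest = record_digest(record)
print(digest)
```

Storing raw counts rather than a floating-point accuracy keeps the record exact, and a deterministic digest lets anyone recompute the fingerprint from the published record and compare it with what was stored.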
