Playground

EARLY ACCESS

Run evaluations interactively. Choose a mode to test model × framework configurations against real tasks. Sign up for early access to unlock all modes.

Run your own evaluations

Sign in with GitHub to run live evaluations and see execution traces.