Playground
EARLY ACCESS
Run evaluations interactively. Choose a mode below to test model × framework configurations against real tasks. Sign up for early access to unlock all modes.
Run your own evaluations
Sign in with GitHub to run live evaluations and see execution traces.