Playground

EARLY ACCESS

Run evaluations interactively. Choose a mode below to test model × framework configurations against real tasks. Sign up for early access to unlock all modes.

Run your own evaluations

Sign in with GitHub to run live evaluations and see execution traces.

CRTF