Model × Framework Matrix

CRR scores for each model-framework pairing, derived from agent evaluations. Empty cells have not been evaluated yet.

4 of 66 baselines evaluated

Frameworks (columns): AutoGen, CrewAI, LangGraph, OpenAI Agents SDK, Raw FC, Smolagents

Model                CRR score (framework column not preserved)
Claude Opus 4.6      -
Claude Sonnet 4.6    73.3
DeepSeek V3.2        65.4
Gemini 3 Flash       -
Gemini 3.1 Pro       -
GPT-5.4              67.7
GPT-5.4 Pro          -
Grok 4.20 Beta       65.5
Llama 4 Maverick     -
Mistral Large        -
Qwen 3.5             -
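
The coverage figure for this matrix (4 of 66) follows from the row and column counts: 11 models times 6 frameworks gives 66 cells, of which 4 carry a score. A minimal sketch of that arithmetic is below. The names MODELS, FRAMEWORKS, SCORES, and coverage are illustrative only and are not part of the crtf package; scores are keyed by model alone because the framework for each score is not recoverable from this text.

```python
# Rows and columns as listed in the matrix.
MODELS = [
    "Claude Opus 4.6", "Claude Sonnet 4.6", "DeepSeek V3.2",
    "Gemini 3 Flash", "Gemini 3.1 Pro", "GPT-5.4", "GPT-5.4 Pro",
    "Grok 4.20 Beta", "Llama 4 Maverick", "Mistral Large", "Qwen 3.5",
]
FRAMEWORKS = [
    "AutoGen", "CrewAI", "LangGraph",
    "OpenAI Agents SDK", "Raw FC", "Smolagents",
]

# Assumption: one evaluated cell per scored model, framework unknown.
SCORES = {
    "Claude Sonnet 4.6": 73.3,
    "DeepSeek V3.2": 65.4,
    "GPT-5.4": 67.7,
    "Grok 4.20 Beta": 65.5,
}


def coverage(models, frameworks, scores):
    """Return (evaluated cells, total cells) for the matrix."""
    total = len(models) * len(frameworks)
    evaluated = len(scores)
    return evaluated, total


evaluated, total = coverage(MODELS, FRAMEWORKS, SCORES)
print(f"{evaluated} of {total} baselines evaluated")
```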

See a gap? Fill it.

Benchmark your agent against real scenarios. Any framework — Anthropic, OpenAI, LangGraph, or raw HTTP.

pip install crtf · 30-line quickstart · Free during beta