Model Rankings

AI models ranked

Curated rankings of frontier and open-source AI models for developers. Scores are calibrated against the current frontier so older generations do not look artificially competitive.

55 models ranked; this page shows ranks 1 to 12.

🥇

GPT-5.5

⚡ Top Pick · Text · Code
OpenAI · 1M ctx

OpenAI's newest flagship model for agentic coding, professional knowledge work, data analysis, computer use, and long-running tool workflows. It is positioned as a step up from GPT-5.4 with stronger system understanding, better debugging behavior, and API availability for production developers.

Frontier-relative: 96
Coding 94 · Reasoning 94 · Instruction 92 · Speed 64 · Cost eff. 48
Cost: $5 in / $30 out per 1M tokens
Best for: Agentic coding · Computer use · Tool workflows · Professional work
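Prices on these cards are quoted per 1M tokens, with input and output billed at different rates, so the cost of a single request depends on its token mix. A minimal sketch, assuming plain linear per-token billing with no prompt caching, batch discounts, or separate reasoning-token charges (the page does not describe these); `request_cost` is a hypothetical helper, not a provider API:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Estimate the USD cost of one request from per-1M-token list prices.

    Assumes plain linear billing: no prompt caching, batch discounts,
    or separate charges for reasoning tokens.
    """
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# GPT-5.5 list prices from this page: $5 in / $30 out per 1M tokens.
cost = request_cost(100_000, 10_000, input_per_m=5.0, output_per_m=30.0)
print(f"${cost:.2f}")  # → $0.80
```

Because output tokens cost 6× as much as input tokens here, 10K output tokens ($0.30) cost more than half as much as 100K input tokens ($0.50).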
🥈

Claude Opus 4.8

💻 Best Coding · Text · Code · Image
Anthropic · 1M ctx

Anthropic's current Opus-tier model for complex reasoning, agentic coding, and high-autonomy workflows. It combines Claude's strong coding profile with a 1M-token context window at a lower price than Claude Fable 5.

Frontier-relative: 96
Coding 97 · Reasoning 95 · Instruction 94 · Speed 54 · Cost eff. 44
Cost: $5 in / $25 out per 1M tokens
Best for: Agentic coding · High autonomy · 1M context · Claude Code
🥉

Claude Fable 5

⚡ Top Pick · Text · Code · Image
Anthropic · 1M ctx

Anthropic's most capable widely released Claude model, built for demanding reasoning and long-horizon agentic work. Use it when autonomy, context depth, and complex multi-step execution matter more than low latency.

Frontier-relative: 95
Coding 96 · Reasoning 96 · Instruction 94 · Speed 50 · Cost eff. 34
Cost: $10 in / $50 out per 1M tokens
Best for: Long-horizon agents · Complex reasoning · High autonomy · 1M context
#4

Gemini 3.5 Flash

Text · Code · Image
Google · 1M ctx (1,048,576 tokens)

Google's stable Gemini 3.5 production model, aimed at sustained frontier-level performance with strong coding, agentic loops, grounding, tool use, and multimodal input. It balances capability, 1M-token context, and production cost better than the older Gemini 2.5 models.

Frontier-relative: 95
Coding 89 · Reasoning 90 · Instruction 88 · Speed 78 · Cost eff. 68
Cost: $1.50 in / $9 out per 1M tokens
Best for: Agent loops · 1M context · Grounding · Multimodal input
#5

Grok 4.3

Text · Code · Image
xAI · 1M ctx

xAI's newest flagship chat model, with configurable reasoning, agentic tool calling, image input, and a 1M-token context window; xAI positions it as a low-hallucination model. Use server-side search tools when current events or live data matter.

Frontier-relative: 95
Coding 88 · Reasoning 90 · Instruction 84 · Speed 82 · Cost eff. 78
Cost: $1.25 in / $2.50 out per 1M tokens
Best for: Agentic tools · Reasoning control · 1M context · Search workflows
#6

DeepSeek V4 Pro

🔓 Open Source · Text · Code
DeepSeek · 1M ctx

DeepSeek's V4 Pro open-weight model for agentic coding, math, STEM reasoning, and 1M-context workflows. It supports OpenAI-compatible and Anthropic-compatible APIs with thinking and non-thinking modes.

Frontier-relative: 92
Coding 90 · Reasoning 91 · Instruction 83 · Speed 64 · Cost eff. 96
Cost: $0.435 in / $0.87 out per 1M tokens
Best for: Open weights · 1M context · Thinking mode · Low API cost
#7

Gemini 3.1 Pro Preview

Text · Code · Image
Google · 1M ctx (1,048,576 tokens)

Google's preview Pro model optimized for software engineering behavior, tool use, agentic workflows, stronger thinking, and grounded multimodal reasoning. Use it for evaluation and advanced workflows where preview volatility is acceptable.

Frontier-relative: 91
Coding 86 · Reasoning 91 · Instruction 86 · Speed 58 · Cost eff. 56
Cost: $2 in / $12 out per 1M tokens
Best for: Vibe coding · Tool use · Grounded reasoning · 1M context
#8

GPT-5.5 Pro

🧠 Best Reasoning · Text · Code
OpenAI · 1M ctx

Higher-accuracy GPT-5.5 variant for the hardest reasoning, science, finance, research, and verification-heavy workflows. It costs substantially more than the base GPT-5.5 model, so reserve it for tasks where confidence and depth matter more than throughput.

Frontier-relative: 90
Coding 92 · Reasoning 97 · Instruction 92 · Speed 42 · Cost eff. 12
Cost: $30 in / $180 out per 1M tokens
Best for: Hard reasoning · Science · Research · High-confidence analysis
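GPT-5.5 Pro's list prices are exactly 6× GPT-5.5's on both input ($30 vs $5) and output ($180 vs $30), so the same request should cost six times as much whatever its input/output mix. A quick check under a simple linear-billing assumption; `request_cost` is a hypothetical helper, and caching or batch discounts are ignored:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """USD cost of one request under plain linear per-1M-token billing (assumed)."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# (input $/1M, output $/1M) list prices from this page.
GPT_5_5 = (5.0, 30.0)
GPT_5_5_PRO = (30.0, 180.0)

# Prompt-heavy, balanced, and output-heavy token mixes.
for tokens_in, tokens_out in [(200_000, 2_000), (50_000, 50_000), (5_000, 100_000)]:
    base = request_cost(tokens_in, tokens_out, *GPT_5_5)
    pro = request_cost(tokens_in, tokens_out, *GPT_5_5_PRO)
    print(f"{tokens_in:>7} in / {tokens_out:>7} out: "
          f"base ${base:.3f}, pro ${pro:.3f}, ratio {pro / base:.1f}x")
```

Every line reports a 6.0x ratio, which is why the Pro variant only pays off when the extra accuracy is worth a sixfold spend on that task.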
#9

Claude Sonnet 4.6

⚡ Top Pick · Text · Code
Anthropic · 1M ctx

Anthropic's Sonnet-tier model, with strong SWE-bench results, precise instruction following, and built-in extended thinking. A go-to choice for agentic coding workflows at a lower price than the Opus and Fable tiers.

Frontier-relative: 90
Coding 84 · Reasoning 84 · Instruction 90 · Speed 68 · Cost eff. 62
Cost: $3 in / $15 out per 1M tokens
Best for: Agentic coding · SWE-bench leader · Extended thinking · Instruction accuracy
#10

GPT-5.3 Codex

💻 Best Coding · Code · Text
OpenAI · 400K ctx

OpenAI's Codex-specialized model for long-running software engineering, terminal work, refactors, frontend implementation, and defensive security review. It is currently available across paid Codex surfaces while OpenAI works toward API access.

Frontier-relative: 89
Coding 90 · Reasoning 91 · Instruction 88 · Speed 70 · Cost eff. 40
Cost: $5 in / $30 out per 1M tokens
Best for: Codex agent work · Refactors · Terminal tasks · Security review
#11

Mistral Medium 3.5

🔓 Open Source · Text · Code · Image
Mistral · 256K ctx

Mistral's frontier-class open-weight multimodal model for agentic and coding use cases. It brings function calling, agents, built-in tools, structured outputs, OCR, FIM, and a 256K context window under a Modified MIT license.

Frontier-relative: 88
Coding 86 · Reasoning 85 · Instruction 84 · Speed 74 · Cost eff. 70
Cost: $1.50 in / $7.50 out per 1M tokens
Best for: Open weights · Agentic coding · OCR · Structured outputs
#12

Grok Build 0.1

Code · Text
xAI · 256K ctx

xAI's coding-specific model trained for fast agentic coding workflows. It trades some flagship breadth for lower latency, lower price, and a developer-focused 256K context window.

Frontier-relative: 87
Coding 86 · Reasoning 80 · Instruction 82 · Speed 88 · Cost eff. 82
Cost: $1 in / $2 out per 1M tokens
Best for: Fast coding · Agent workflows · 256K context · Low cost
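The cards trade capability against price, and several descriptions suggest reserving expensive models for tasks that need them. One way to act on that is to pick the cheapest listed model that clears a minimum Coding score for a given token workload. A minimal sketch using the Coding scores and list prices from this page, assuming linear per-token billing; `Model`, `cheapest_model`, and the threshold values are illustrative and not part of the ranking methodology:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    coding: int          # Coding score from this page
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Assumes plain linear billing (no caching or batch discounts).
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


MODELS = [
    Model("GPT-5.5", 94, 5.0, 30.0),
    Model("Claude Opus 4.8", 97, 5.0, 25.0),
    Model("Claude Fable 5", 96, 10.0, 50.0),
    Model("Gemini 3.5 Flash", 89, 1.5, 9.0),
    Model("Grok 4.3", 88, 1.25, 2.5),
    Model("DeepSeek V4 Pro", 90, 0.435, 0.87),
    Model("Gemini 3.1 Pro Preview", 86, 2.0, 12.0),
    Model("GPT-5.5 Pro", 92, 30.0, 180.0),
    Model("Claude Sonnet 4.6", 84, 3.0, 15.0),
    Model("GPT-5.3 Codex", 90, 5.0, 30.0),
    Model("Mistral Medium 3.5", 86, 1.5, 7.5),
    Model("Grok Build 0.1", 86, 1.0, 2.0),
]


def cheapest_model(min_coding: int, input_tokens: int,
                   output_tokens: int) -> Optional[Model]:
    """Return the cheapest model whose Coding score is at least min_coding,
    or None if no listed model qualifies."""
    eligible = [m for m in MODELS if m.coding >= min_coding]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m.cost(input_tokens, output_tokens))


# Example workload: 200K input tokens, 20K output tokens.
for threshold in (85, 95, 98):
    pick = cheapest_model(threshold, input_tokens=200_000, output_tokens=20_000)
    print(threshold, pick.name if pick else "no listed model qualifies")
```

Coding is only one axis; the same filter could key on Reasoning, Instruction, or context size, and a real routing policy would also weigh latency and the preview or API-availability caveats noted on individual cards.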