LLM Comparison (Apr 2026)

⚓ LLM    📅 2026-04-21    👤 Pragmatismo


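| Benchmark | GLM-5.1 | GLM-5 | Qwen3.6-Plus | Minimax M2.7 | DeepSeek-V3.2 | Kimi K2.5 | Claude Opus 4.6 | Gemini 3.1 Pro | GPT-5.4 |
|---|---|---|---|---|---|---|---|---|---|
| HLE | 31.0 | 30.5 | 28.8 | 28.0 | 25.1 | 31.5 | 36.7 | 45.0 | 39.8 |
| HLE (w/ Tools) | 52.3 | 50.4 | 50.6 | - | 40.8 | 51.8 | 53.1* | 51.4* | 52.1* |
| AIME 2026 | 95.3 | 95.4 | 95.1 | 89.8 | 95.1 | 94.5 | 95.6 | 98.2 | 98.7 |
| HMMT Nov. 2025 | 94.0 | 96.9 | 94.6 | 81.0 | 90.2 | 91.1 | 96.3 | 94.8 | 95.8 |
| HMMT Feb. 2026 | 82.6 | 82.8 | 87.8 | 72.7 | 79.9 | 81.3 | 84.3 | 87.3 | 91.8 |
| IMOAnswerBench | 83.8 | 82.5 | 83.8 | 66.3 | 78.3 | 81.8 | 75.3 | 81.0 | 91.4 |
| GPQA-Diamond | 86.2 | 86.0 | 90.4 | 87.0 | 82.4 | 87.6 | 91.3 | 94.3 | 92.0 |
| SWE-Bench Pro | 58.4 | 55.1 | 56.6 | 56.2 | - | 53.8 | 57.3 | 54.2 | 57.7 |
| NL2Repo | 42.7 | 35.9 | 37.9 | 39.8 | - | 32.0 | 49.8 | 33.4 | 41.3 |
| Terminal-Bench 2.0 (Terminus-2) | 63.5 | 56.2 | 61.6 | - | 39.3 | 50.8 | 65.4 | 68.5 | - |
| Terminal-Bench 2.0 (Best self-reported) | 69.0 (Claude Code) | 56.2 (Claude Code) | - | 57.0 (Claude Code) | 46.4 (Claude Code) | - | - | - | 75.1 (Codex) |
| CyberGym | 68.7 | 48.3 | - | - | 17.3 | 41.3 | 66.6 | 38.8 | 66.3 |
| BrowseComp | 68.0 | 62.0 | - | - | 51.4 | 60.6 | - | - | - |
| BrowseComp (w/ Context Manage) | 79.3 | 75.9 | - | - | 67.6 | 74.9 | 84.0 | 85.9 | 82.7 |
| τ³-Bench | 70.6 | 69.2 | 70.7 | 67.6 | 69.2 | 66.0 | 72.4 | 67.1 | 72.9 |
| MCP-Atlas (Public Set) | 71.8 | 69.2 | 74.1 | 48.8 | 62.2 | 63.8 | 73.8 | 69.2 | 67.2 |
| Tool-Decathlon | 40.7 | 38.0 | 39.8 | 46.3 | 35.2 | 27.8 | 47.2 | 48.8 | 54.6 |
| Vending Bench 2 | $5,634.41 | $4,432.12 | $5,114.87 | - | $1,034.00 | $1,198.46 | $8,017.59 | $911.21 | $6,144.18 |
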
🏷️ benchmark 🏷️ comparison 🏷️ llm