benchmarks
5 articles tagged with "benchmarks"

Claude 3.5 Sonnet vs Claude 3 Opus: Which Should You Use?
Claude 3.5 Sonnet outperforms Claude 3 Opus on most benchmarks while running twice as fast. Here's the detailed comparison and when each model makes sense.

Claude 3.5 Sonnet Released: Faster Than Opus with Better Coding
Anthropic launches Claude 3.5 Sonnet with performance exceeding Claude 3 Opus at twice the speed. Major improvements in coding, visual analysis, and instruction following.

How Claude 3 Improved Code Generation Accuracy
Claude 3 Opus scores 84.9% on the HumanEval coding benchmark, up from 71.2% for Claude 2. Here's what changed and what it means for real development work.

Claude 3 Opus Tops GPT-4 on Industry Benchmarks
Claude 3 Opus outperforms GPT-4 on MMLU, GPQA, and coding benchmarks. Here's what the numbers mean for real-world business applications.

Claude 3 Family Released: Opus, Sonnet, and Haiku Explained
Anthropic launches the Claude 3 family with three models: Opus (most capable), Sonnet (balanced), and Haiku (fast). All three include vision capabilities.