Hacker News
Codex Daily Benchmarks for Degradation Tracking (Marginlab.ai) (marginlab.ai)
1 point by wendgeabos 29 days ago | past
Claude Code daily benchmarks for degradation tracking (marginlab.ai)
760 points by qwesr123 29 days ago | past | 355 comments
No one is evaluating AI coding agents in the way they are used (marginlab.ai)
1 point by qwesr123 45 days ago | past
Claude Code Daily Degradation Tracker (marginlab.ai)
3 points by qwesr123 49 days ago | past | 3 comments
Anatomy of a Coding Agent: A step-by-step illustration (marginlab.ai)
3 points by qwesr123 67 days ago | past
How are coding assistants evaluated? SWE-Bench Pro Explorer (marginlab.ai)
2 points by qwesr123 70 days ago | past
SWE-Bench: The $500B Benchmark (marginlab.ai)
5 points by qwesr123 71 days ago | past
