The 50% drop in price for inputs and 33% for outputs vs. the previous 4o model is huge.
It also appears to be topping various benchmarks, ZeroEval's Leaderboard on hugging face [0] actually shows that it beats even Claude 3.5 Sonnet on CRUX [1] which is a code reasoning benchmark.
Shameless plug, I'm the co-founder of Double.bot (YC W23). After seeing the leaderboard above we actually added it to our copilot for anyone to try for free [2]. We try to add all new models the same day they are released
Sonnet 3.5 has a lot of alignment issues. It many times refused to answer simple coding questions I asked, just because it considered them "unsafe". 4o is much more relaxed.
Regarding math, sonnet is a bit better than 4o though.
The 50% drop in price for inputs and 33% for outputs vs. the previous 4o model is huge.
It also appears to be topping various benchmarks, ZeroEval's Leaderboard on hugging face [0] actually shows that it beats even Claude 3.5 Sonnet on CRUX [1] which is a code reasoning benchmark.
Shameless plug, I'm the co-founder of Double.bot (YC W23). After seeing the leaderboard above we actually added it to our copilot for anyone to try for free [2]. We try to add all new models the same day they are released
[0]https://huggingface.co/spaces/allenai/ZeroEval
[1]https://crux-eval.github.io/
[2]https://double.bot/