It has been pretty much a benchmark for memorization for a while. There is a paper on the subject somewhere.
SWE-Bench Pro public is newer, but it's not live, so it will slowly get memorized as well. The private dataset is more interesting, as are the results there:
> If your job is to translate requirements into code manually - and that's it - you're the generalist travel agent.
I’ve been a full-stack web programmer at five different companies over the last fifteen years, big and small, e-commerce and B2B, junior to senior to staff, and that has never fully described my responsibilities.
I'm also curious what results we would get if SWE came up with a new set of 500 problems to run all these models against, to guard against overfitting.
Won’t those models gradually become outdated (for anything related to events that happen after the model was trained, new programming languages or framework versions, etc.) if no one is around to continually re-train them?
Let's hope the people not subject to a warrant sue ICE's pants off. As far as I can tell, most of the dragnets are either in public places or with the permission of the property owner.
I'd love to be wrong, because it would mean the judiciary has a chance to shut this down, but I fear that outside of a few civil rights suits this will have to be remedied at the ballot box.
There is a hard limit on the number of atomic elements, and an even smaller limit on the number of solvents that facilitate chemical reactions, and water is demonstrably both the best and the most common of them in the universe.
So while it may be possible for life to exist without water, any alternatives should reasonably be expected to be even rarer than water-based life.