In case the Claude team cares about feedback on the free model:
I've been using the free model via chat from the beginning, and this is the first time I'm seriously considering moving away from Claude. Before last month, Claude's Sonnet model was consistent in quality, but now the responses are all over the place. It's hard to replicate the issue, as it only happens once in a while. I rarely encountered hallucinations from Claude models on questions from my domain, but since last month I have observed an abundance of them.
I work on some aspects of intelligence in birds, primarily songbirds. There have been some efforts to find general intelligence (the "g" cognitive factor) in birds over the last 15-20 years. The results have been mixed, as you would expect. Animal intelligence has evolved for survival, and designing experiments to test it is quite hard.
Research has shown that brain size matters, but not that much; we should look at relative brain size instead.
> Animal intelligence has evolved for survival, and designing experiments to test it is quite hard.
My conure is extremely intelligent at times, learning a trick on the second try or doing what I ask him immediately. Most of the time, though, he understands but decides to just ignore me.
I mean regarding the domains of intelligence and how to test them.
With humans, performance in one cognitive test generally correlates with performance in another, and so on. So, intelligence across domains.
Researchers test the same thing with animals. The issue is that an animal's intelligence is tied to its ecology: what is it worth for an animal to solve a task that has no significance in its life? The counterargument is that if an animal's intelligence is similar to human intelligence, we should find similar results in both.
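To make the g-factor idea concrete, here's a minimal sketch with simulated data (all numbers are hypothetical, not from any real study): if a shared latent factor drives performance, test scores correlate, and the first principal component of the correlation matrix soaks up a large share of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 200 subjects: a latent "g" factor plus task-specific noise,
# so scores on five different cognitive tasks are positively correlated.
n = 200
g = rng.normal(size=n)
tasks = np.column_stack([g + rng.normal(scale=0.8, size=n) for _ in range(5)])

# Correlation matrix across tasks, then its eigendecomposition.
corr = np.corrcoef(tasks, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order

# Share of total variance captured by the first principal component,
# a crude stand-in for a "g" estimate.
g_share = eigvals[-1] / eigvals.sum()
print(f"Variance explained by first component: {g_share:.0%}")
```

With a real animal dataset the hard part is, as noted above, choosing tasks that are ecologically meaningful in the first place; the math is the easy bit.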
That's true. I was much younger back then and didn't think about privacy.
Yeah, it was pretty bad how they incorporated the G+ account into everything. The way G+ worked (at least in my friend circle), ordinary people had little business there. It was very hobby focused.
Nope. Google+ was a ghost town. They made the right call to shut it down and focus their efforts on YouTube.
The videos and comments on YT are superb training data, every bit as good as Google+ was.
In 2025, YouTube’s total revenue (advertising + subscriptions like YouTube Premium and TV) surpassed $60 billion. If they spun out YT, it would have a market cap of $500-600bn, putting it in the top 20 companies.
Google+ would never have been worth much as the 7th most popular social network.
This I find hard to believe. Most YT comments are just noise. Even the UX of writing comments on YT is terrible: comments randomly appear and disappear, and you are never sure if it's the YT algorithm, a technical issue, or a specific moderation practice. I'm pretty sure that if they valued YT comments as data, they would have put a bit more effort into that side of the platform.
The videos are good training data, but the comments? The comment UX is so non-conducive to discussion, and the general quality is very low compared to what used to be on Google+ (to be fair, the self-selected users of Google+ were not representative of the general population).
Interesting, but not surprising to me. Once a field expert guides the models, they will most likely reach a solution. The models are good at doing the tedious work for experts. For hard or complicated questions, the models often have blind spots.
An expert trying to find a solution to a problem with no solution may spend decades with no results.
Worse yet, proving there is no solution often requires totally different techniques.
There are some problems currently in a limbo of sorts: we tried to tackle them and were not successful, and we don't know if we just need new math to solve them or if they can't be solved at all.
That would be something. Definitely more exciting. But from what I have seen so far, the models are not there yet.
It's a tricky situation for people who might want to work on hard problems like this. Is it worth spending the time and money fiddling around with the models?
In research, you can't show your progress by showing how many ways you have failed (which I don't like). Universities, grant agencies, etc. require you to work on solvable problems.
I don't think so. I went through the output of Opus 4.6 vs GPT 5.4 Pro. They were given different directions/prompts: Opus 4.6 was asked to test and verify many things. Opus 4.6 tried many different approaches, and its chains of thought are more interesting to me.
How many math PhD students do you have? If you set the problem up right, something like this per year, on average, is a good pace.
How are they cheaper? Your average grant where I am can pay for a couple of PhD students. I could afford to pay for inference costs out of my own salary, no grant needed. Completely different economic scales here. I like students better of course, but funding is drying up these days.
I was speaking generally; I don't work in maths. PhD students do lots of things other than research. If we asked a PhD student to just solve these kinds of problems and nothing else, the student would do it without much difficulty.
I guess it's different somewhere like Europe. But in Canada, most PhD students are paid through TAships, not primarily through grants. The average salary is 25k/year; take 6-10k out for tuition and that's 15-19k/year. You get a student doing so many things for less pay. I guess if your job only requires research, then you can do it.
Bayesian methods are not better than frequentist methods, and vice versa. I use both, but mostly Bayesian.
Bayesian approaches take a long time (thinking, building models, choosing priors, running simulations, etc.), but they provide better estimates and a better understanding of the parameters. I hate point estimates and decisions based on arbitrary p-values. Whenever possible, I use Bayesian methods.
There are a few. I use zoteroGPT to extract things (e.g., methods, sample size, species) from a bunch of papers in a collection. I don't use it for summaries.