My grad school research was on computational models of human/machine cognition, and I'm now commercializing it as a 'proof-of-human API' for bot detection, spam reduction, and identity verification.
A common mistake is assuming that AI capability implies humanness. If you know exactly where to look, you can start to identify differences between even improving frontier models and human cognition.
One concrete example from a forthcoming blog post of mine:
[begin]
In fact, CAPTCHAs can still be effective if you know where to look.
We ran 75 trials -- 388 total attempts -- benchmarking three frontier AI agents against reCAPTCHA v2 image challenges, across two categories: static challenges, where each tile in the grid is judged as an individual target, and cross-tile challenges, where a single object spans multiple tiles.
On static challenges, the agents performed respectably. Claude Sonnet 4.5 solved 47%. Gemini 2.5 Pro: 56%. GPT-5: 23%.
On cross-tile challenges: Claude scored 0%. Gemini: 2%. GPT-5: 1%.
In contrast, humans find cross-tile challenges easier than static ones. If you spot one tile that matches the target, your visual system follows the object into adjacent tiles automatically.
Agents find them nearly impossible. They evaluate each tile independently, produce perfectly rectangular selections, and fail on partial occlusion and boundary-spanning objects. They process the grid as nine separate classification problems. Humans process it as one scene.
The challenges hardest for humans -- ambiguous static grids where the target is small or unclear -- are easiest for agents. The challenges easiest for humans -- following an object across tiles -- are hardest for agents. The difficulty curves are inverted. Not because agents are dumb, but because the two systems solve the problem with fundamentally different architectures.
Faking an output means producing the right answer. Faking a process means reverse-engineering the computational dynamics of a biological brain and reproducing them in real time. The first problem can be reduced to a machine learning classifier. The second is an unsolved scientific problem.
The standard objection is that any test can be defeated with sufficient incentive. But fraudsters weren't the ones who built the visual neural networks that defeated text CAPTCHAs -- researchers were. And fraudsters aren't the ones building quantum computers to undermine cryptography. The cost of spoofing an iris scan is an engineering problem. The cost of reproducing human cognition is a scientific one. These are not the same category of difficulty.
[end]
>The first problem can be reduced to a machine learning classifier. The second is an unsolved scientific problem.
I can't believe people are still using this as a generic anti-AI argument, when a decade ago people were insisting there was no way AI could have the capabilities frontier LLMs have today. Moreover, it's unclear whether the gap even exists. Even if we grant that the grid pattern is some fundamental constraint AI models can't surpass, it doesn't seem too hard to work around: infill across the grid lines and present the nine tiles to the LLM as one image.
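The workaround sketched above -- re-assembling the nine tiles into one image so the model sees a single scene -- is mechanically simple. Here's a minimal, dependency-free sketch; `stitch_tiles` is my own hypothetical helper, with tiles modeled as plain 2D lists of pixel values (a real pipeline would use PIL or numpy on actual screenshots):

```python
# Sketch of the "present the grid as one image" workaround: combine nine
# equally sized tiles (row-major order) into one composite pixel array,
# so a vision model can process the whole scene instead of nine crops.

def stitch_tiles(tiles, grid=3):
    """Combine grid*grid equally sized tiles into one 2D pixel array."""
    assert len(tiles) == grid * grid
    th, tw = len(tiles[0]), len(tiles[0][0])  # tile height and width
    composite = [[0] * (tw * grid) for _ in range(th * grid)]
    for idx, tile in enumerate(tiles):
        r0, c0 = (idx // grid) * th, (idx % grid) * tw  # tile's top-left
        for r in range(th):
            for c in range(tw):
                composite[r0 + r][c0 + c] = tile[r][c]
    return composite

# Nine 2x2 "tiles", each filled with its own index, as dummy pixel data.
tiles = [[[i] * 2 for _ in range(2)] for i in range(9)]
full = stitch_tiles(tiles)
print(len(full), len(full[0]))  # 6 6
print(full[0])                  # [0, 0, 1, 1, 2, 2]
```

Whether this actually closes the cross-tile gap is an empirical question, but it shows the "grid constraint" is at least not a hard input limitation.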
> “…reverse-engineering the computational dynamics of a biological brain and reproducing them in real time…”
That is not an anti-AI argument; it's an open and unsolved question. Your optimism is appreciated, but dismissing it and assuming it's already solved is foolish and naive.
Can you comment on the notion that Turnstile's primary goal isn't to keep bots out 100% but instead to slow them down to "human" speeds?
Asking because as a dev I hate when sites don't allow bots... however can appreciate that automation should be rate-limited. IOW, isn't preventing bot access actually an anti-pattern since rate-limiting is sufficient?
I see a lot of marketing which bashes Turnstile [detection] rates and tries to leverage this misunderstood nuance. And, it seems to be a dishonest point of contention but am willing to hear opposing arguments.
Cloudflare is really good at network bot detection. Rate-limiting is super helpful here, for example during DDoS attacks.
Our customers are a little different. They sometimes struggle with high-volume bot attacks (e.g. SMS toll fraud in ticketing marketplaces), but we specifically focus on online platforms that want to verify a human is on the other side of the screen. For example, survey pollsters and labor marketplaces want to stop a slow agent that can complete a traditional CAPTCHA even if it's solving it at human speed.
I see. I'll have to read the marketing more closely next time, lol. The cynic in me only notices the detection rate comparisons, which I'm sure the marketing folks don't mind much ;-)
It doesn't catch OpenAI's agent even though the mouse/click behavior is clearly pretty botlike. One hypothesis is that Google reCAPTCHA is overindexing on browser-level signals rather than behavioral movement.
I think about this as a startup founder building a 'proof-of-human' layer on the Internet.
One of the hard parts in this space is deciding what level of transparency to have. We're advancing the thesis that behavioral biometrics offers robust continuous authentication that helps distinguish bot from human and good actors from bad, but people are understandably skeptical of trusting black-box models, for accuracy and/or privacy reasons.
We've defaulted to a lot of transparency, publishing research online (and hopefully in scientific journals), but we've seen the downside: competitors make unverifiable claims about their own in-house behavioral tools kept behind company walls, and investors constantly worry about an arms race.
As someone genuinely interested (and incentivized!) to build a great solution in this space, what are good protocols/examples to follow?
Great question! One of the core results of this paper was to explain this discrepancy. Basically, we found a 'mixture of theories' -- a hybrid of prospect theory and expected utility theory, where people essentially arbitrate between the two decision-making mechanisms depending on the complexity of the gamble.
> Curious that you can "mix" PT & EU functionals (with perceptron) but not the corresponding "decision-making mechanisms"..?
Great push. We actually can't make any mechanistic claims from the data/math in this paper. From an ML prediction standpoint, we're mixing the PT and EU functionals together; to what extent that mirrors the actual cognitive process, we have to remain agnostic. That said, a reason this arbitration between EU and PT is intriguing is that there's a lot of work on arbitration between dual-process models in psychology (System 1 and 2; model-free and model-based; labor versus leisure; etc.)
You can generalize theories of decision-making into broad functional forms and then apply gradient descent to find the best parameters for that functional form. For example, prospect theory multiplies a utility function U(x) by a probability weighting function w(p). Kahneman and Tversky proposed one specific U(x) and w(p), but with autodiff we can search over the whole space of such functions.
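To make that concrete, here's a hedged, self-contained sketch of the recipe: write prospect theory as a parametric functional form (power utility, Tversky-Kahneman probability weighting) and fit its parameters to binary choice data by gradient descent. The paper presumably uses autodiff; I use finite-difference gradients here so the example runs with no dependencies, and the data is synthetic, not the paper's:

```python
# Fit prospect-theory parameters (alpha, gamma) to choices between a
# gamble "win x with prob p, else 0" and a sure amount, by gradient
# descent on the negative log-likelihood of a logistic choice rule.
import math, random

def weight(p, gamma):   # Tversky-Kahneman probability weighting w(p)
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

def value(x, alpha):    # power utility for gains U(x)
    return x**alpha

def pt_value(p, x, alpha, gamma):
    return weight(p, gamma) * value(x, alpha)

def nll(params, data):
    """Mean negative log-likelihood of the observed choices."""
    alpha, gamma = params
    total = 0.0
    for p, x, sure, chose_gamble in data:
        diff = pt_value(p, x, alpha, gamma) - value(sure, alpha)
        diff = max(-30.0, min(30.0, diff))    # numerical safety clamp
        prob = 1.0 / (1.0 + math.exp(-diff))  # logistic choice rule
        total -= math.log(prob if chose_gamble else 1.0 - prob)
    return total / len(data)

def fit(data, steps=200, lr=0.05, eps=1e-4):
    params = [1.0, 1.0]  # start at expected utility (no distortion)
    best = (nll(params, data), params)
    for _ in range(steps):
        base = nll(params, data)
        grad = []
        for i in range(2):  # finite-difference gradient, one dim at a time
            bumped = list(params)
            bumped[i] += eps
            grad.append((nll(bumped, data) - base) / eps)
        params = [min(3.0, max(0.1, v - lr * g)) for v, g in zip(params, grad)]
        cur = nll(params, data)
        if cur < best[0]:
            best = (cur, params)
    return best[1]

# Synthetic choices from a "true" distorted decision-maker (alpha=0.7, gamma=0.5).
random.seed(0)
data = []
for _ in range(300):
    p, x = random.uniform(0.05, 0.95), random.uniform(1, 10)
    sure = random.uniform(0.5, 5)
    data.append((p, x, sure, pt_value(p, x, 0.7, 0.5) > value(sure, 0.7)))

fitted = fit(data)
print(fitted)  # fitted (alpha, gamma) -- the fit improves on the EU start
```

Swapping in different functional forms for `weight` and `value` is exactly the "search over the whole space" move; autodiff just makes the gradient step exact and fast.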
Can you explain what a “differentiable” decision theory is? I understand, for instance, maximizing expected value (and taking a derivative to get a maximum), but I don’t understand how the concept of maximizing expected value could itself be made into a derivative.
Edit: Seems like a “differentiable theory” is just one that can be framed in terms of an optimization problem that can be solved by gradient descent. Is that right?
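That reading matches how I understand it too. A tiny self-contained illustration (my own toy example, not from the paper): even plain expected-utility maximization can be framed as gradient ascent on a decision variable, here the stake in a favorable bet, rather than solved analytically:

```python
# Pick the stake f in a favorable bet to maximize expected log utility
# (a Kelly-style problem), by numerical gradient ascent on f.
import math

p, b = 0.6, 1.0  # win probability, payout per unit staked

def expected_log_utility(f):
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

f, lr, eps = 0.5, 0.5, 1e-6
for _ in range(500):
    # Central finite-difference estimate of d(EU)/df.
    grad = (expected_log_utility(f + eps) - expected_log_utility(f - eps)) / (2 * eps)
    f = min(0.99, max(0.0, f + lr * grad))

print(round(f, 3))  # 0.2 -- the analytic Kelly fraction p - (1-p)/b
```

"Differentiable" then just means the theory's objective and parameters are smooth enough that this kind of gradient step works, whether you're optimizing the decision (as here) or fitting the theory's parameters to data.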
I think a common misconception of Moneyball is that it's about analytics. The broader lesson is the need to systematically find undervalued assets, in sports, business, etc.
One of the interesting 'post-Moneyball' stories is when old-school scouting methods came back onto the scene. People started overvaluing the new popularized statistics, and the market advantage was to combine the analytics and traditional approach in a cost-efficient manner.
The 2014/2015 Royals capitalized on this to some degree, picking up players who didn't strike out or walk much, at a time when players who walked a lot were at a super premium.
Some of the smarter teams in the NFL seem to be figuring out that maybe running backs aren't completely fungible, as has been the mantra for a while.