I used AI tools during development, same as most people writing code right now. The research direction, experiments, and conclusions are mine: I read the papers, designed the experiments, ran them, and documented where things broke. The repo includes 60+ experiment iterations, result logs showing failures, and documentation that corrects earlier optimistic claims. That's not a pattern you'd get from prompting a model to generate a project. I'm one person, so yes, AI helped with implementation. The research was mine.
A 14B model, even at Q4, isn't realistic for coding on a single 12GB RTX 3060: token speed is too slow, since these are dense models. You aren't getting a good MoE model under 30B. You can do OCR, STT, and TTS really well, and for sub-10B LLMs the good use cases are classification, summarization, and extraction.
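To put rough numbers on the memory side (illustrative figures, not measurements):

    # Back-of-envelope VRAM for a dense 14B model at ~4-bit quantization
    params = 14e9
    bytes_per_param = 0.5          # ~4 bits/weight, ignoring group-scale overhead
    weights_gb = params * bytes_per_param / 1e9
    print(f"weights alone: ~{weights_gb:.0f} GB")   # ~7 GB
    # Add KV cache, activations, and runtime overhead and a 12 GB card is tight.
    # Every generated token also has to read all 14B weights (no sparsity),
    # which is what keeps dense-model token speed down on this class of GPU.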
Is it possible for such a small model to outperform Gemini 3, or is this a case of benchmarks not showing the reality? I would love to be hopeful, but so far an open-source model has never been better than a closed one, even when the benchmarks said it was.
Off the top of my head: for a lot of OCR tasks, it's kind of worse for the model to be smart. I don't want my OCR to make stuff up or answer questions; I want it to recognize what is actually on the page.
Sometimes what is on the page is ambiguous. Imagine a scan where the dot over the i is missing in a word like "this". What's on the page is "thls" but to transcribe it that way would be an error outside of forensic contexts.
I am reminded that it's basically impossible to read cursive writing in a language you don't know, even if it uses the same alphabet.
Yes, but that's context-specific. If your goal with OCR is to make text indexable and searchable with regular text search, then transcribing "lesser" as "lesfer" is bad. And handwriting can often be so bad that you need context to make the call about what the scribbles are actually trying to say.
Evaluation methods are bad too, because people don't think critically about what the downstream task is. Word Error Rate and Character Error Rate are terrible metrics for most historical HTR, yet they're what people use out of habit.
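For anyone who hasn't looked under the hood, CER is just edit distance normalized by reference length, which is why it can't tell a defensible reading of degraded text from an outright error. A minimal sketch:

    def levenshtein(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def cer(hypothesis, reference):
        return levenshtein(hypothesis, reference) / len(reference)

    print(cer("thls", "this"))   # 0.25 -- one substitution, whatever the downstream task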
It's a bit like how, for a long time, BLEU was the metric for translation quality. BLEU is based on n-gram similarity to a reference translation, so naturally translation methods based on and targeting n-gram similarity (e.g. pre-NN Google Translate) did well, and looked much better than they actually were.
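Roughly what that overlap looks like (a simplified sketch with clipped n-gram counts and no brevity penalty, so not full BLEU):

    from collections import Counter
    import math

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def ngram_overlap_score(hyp, ref, max_n=4):
        precisions = []
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matched = sum(min(count, r[g]) for g, count in h.items())
            precisions.append(matched / max(sum(h.values()), 1))
        # geometric mean of the per-order precisions
        return math.exp(sum(math.log(p) for p in precisions) / max_n) if all(precisions) else 0.0

    hyp = "the cat sat on the mat".split()
    ref = "the cat sat on a mat".split()
    print(ngram_overlap_score(hyp, ref))   # ~0.54: it rewards surface overlap, not meaning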
Interesting. Won't stuff like entity extraction suffer? Especially in multilingual use cases. My worry is that a smaller model might not realize some text is actually a person's name because it is very unusual.
The model does not need to be that smart to understand that a name it does not know, starting with a capital letter, is the name of a place or a person. It does not need to be aware of whom the name refers to; it just needs to transcribe it.
Also, there are generalist models that fit comfortably in 7B parameters and have enough of a grasp of a dozen or so languages. Like the older Mistral, which had the best multilingual support at the time; newer models around that size are probably good candidates too. I am not surprised that a specialised multilingual model can fit in 8B or so.
This is very interesting. Especially the last part, where gpt-5.2 and gpt-oss stand out with the very similar outcome of being 90%+ Serious.
I tested this locally and got the same result with gpt-oss 120b, but only at the default 'medium' reasoning effort. When I used 'low' I kept getting more playful responses with emojis, and when I used 'high' I kept getting more guessing responses.
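If anyone wants to reproduce this, here's roughly how I'd flip the setting against a local OpenAI-compatible endpoint. The exact knob depends on your serving stack; this sketch assumes the gpt-oss convention of stating the reasoning level in the system prompt, and the URL, model name, and prompt are placeholders:

    from openai import OpenAI

    # Assumed local OpenAI-compatible server (Ollama/llama.cpp/vLLM); adjust URL and model name.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    def ask(prompt, effort="medium"):
        # gpt-oss convention: "Reasoning: low|medium|high" in the system prompt.
        # Your server may expose a dedicated reasoning-effort option instead.
        resp = client.chat.completions.create(
            model="gpt-oss:120b",
            messages=[
                {"role": "system", "content": f"Reasoning: {effort}"},
                {"role": "user", "content": prompt},
            ],
        )
        return resp.choices[0].message.content

    for effort in ("low", "medium", "high"):
        print(effort, "->", ask("Your test prompt here", effort))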
I had a lot of fun with this and it provided me with more insight than I would have thought.