Hidden or undocumented features like this always have a strange appeal. Part of it is nostalgia for older software where small Easter eggs or experimental features would sometimes ship in production builds.
The concept of long-running background agents sounds appealing, but the real challenge tends to be reliability and task definition rather than raw model capability.
If an agent runs unattended for hours, small errors compound quickly. Even simple misunderstandings about file structure or instructions can derail the whole process.
The idea of exposing a structured crawl endpoint feels like a natural evolution of robots.txt and sitemaps.
If more sites provided explicit machine-readable entry points for crawlers, indexing could become a lot less wasteful. Right now crawlers spend a lot of effort rediscovering the same structure over and over.
It also raises interesting questions about whether sites will eventually provide different views for humans vs. automated agents in a more formalized way.
I expect that if we still used REST, indexing would be even less wasteful.
I've found myself falling pretty hard on the side of making APIs work for humans and expecting LLM providers to optimize around that. I don't need an MCP for a CLI tool, for example, I just need a good man page or `--help` documentation.
I know that in practice this is no longer the case, if it ever was.
But semantic HTML is exactly that explicit machine-readable entrypoint. I am firmly entrenched in the opinion that HTML and the DOM are only for machines to read; it just happens to also be somewhat understandable to some humans. Take an average webpage and look at all the characters (bytes) in there: often two-thirds won't ever be shown to humans.
Point being: we don't need to invent something new. We just need to realize we already have it and use it correctly. Other than requiring a better understanding of web tech, it has no downsides.
The low-hanging fruit is the frameworks out there, which should really do a better job of leveraging semantics in their output.
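The byte-ratio claim above is easy to sanity-check: parse a page and compare the total byte count against only the text a browser would actually render. A minimal sketch using Python's stdlib `html.parser` (the sample `page` string is a made-up example, not a real site):

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collects only text that would actually be rendered to a human."""
    SKIP = {"script", "style", "head", "title", "meta"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting level inside skipped elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)

# A toy page: most of its bytes are markup/script, not visible text.
page = ("<html><head><script>var x = 1;</script></head>"
        "<body><p class='intro'>Hello</p></body></html>")
p = VisibleText()
p.feed(page)
visible = "".join(p.chunks).strip()
print(len(page), len(visible))  # total bytes vs. human-visible bytes
```

On real pages the ratio is typically far worse than this toy example, which is exactly the "two-thirds never shown" point.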
The only ones benefiting from 'wasteful' crawling are the anti-bot solution vendors. Everyone else is incentivized to crawl as efficiently as possible.
I yearn for the days when a single-kilobyte GET was enough. Now it's endless waste: spawning entire browsers larger than operating systems, with mitigations, hacks, and proxies. Requesting access directly from webmasters is only met with silence. All of my once-simple hobbyist programs are now bloated beyond belief and less reliable than ever.
> It also raises interesting questions about whether sites will eventually provide different views for humans vs. automated agents in a more formalized way.
This raises a further interesting question: would it exacerbate supply-chain injection attacks? Show the innocuous page to the human, another to the bot.
With Google covering only 3%, I wonder how much people still care, and whether they should. Funny: I own and know sites that are by far the best resource on their topic but that, according to Google, have too many links. It's like I ask you for a page about Cuban chains and you say you don't have it because it had too many links. Or your greengrocer suddenly doesn't have apples because his supplier now offers more than five different kinds, so he will never buy there again.
One of the interesting things about Unicode is how many symbols exist that almost no one encounters in normal software.
Every once in a while you run into something like this and realize the standard is not just for text encoding but also a kind of archive of specialized notation from different fields.
It makes you wonder how many other symbols are sitting in the table that are still mostly unknown outside the niche communities that originally needed them.
Oh shit I didn't know! Amazing! Is there a COMBINING ENCLOSING BUTTON too? We'd also need SHOULDER, SHOULDER BIG and THUMBSTICK, then we'd have something.
Most games aren't shipping with full-fat Unicode support or typefaces that could display those icons, though. Plus, it'd start to break down with controllers that aren't simple A/B/X/Y.
By "game tutorials", I think they mean modern successors to the role GameFAQs used to play.
There is a combining character that, by its description, sounds like it should be implemented to do the desired thing (U+20DD Combining Enclosing Circle), but my fonts don't render it very well when I stuff geometric characters matching the PlayStation buttons into it.
Without spaces:
△⃝□⃝×⃝○⃝
With two spaces between each one so you can see how "enclosing" is getting interpreted:
△⃝ □⃝ ×⃝ ○⃝
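For anyone who wants to reproduce the experiment, a small sketch that builds the same sequences from explicit codepoints (the shape choices for each PlayStation button are the same ones used above):

```python
# U+20DD COMBINING ENCLOSING CIRCLE is appended after each base shape;
# how well the circle actually encloses it is entirely up to the font.
ENCLOSING_CIRCLE = "\u20dd"

buttons = {
    "triangle": "\u25b3",  # △ WHITE UP-POINTING TRIANGLE
    "square":   "\u25a1",  # □ WHITE SQUARE
    "cross":    "\u00d7",  # × MULTIPLICATION SIGN
    "circle":   "\u25cb",  # ○ WHITE CIRCLE
}

for name, base in buttons.items():
    # Each "glyph" is two codepoints: base character + combining mark.
    print(name, base + ENCLOSING_CIRCLE)
```

The combining mark follows its base in the code unit sequence, which is why copy-pasting these often splits apart in editors that handle combining marks poorly.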
For the Markdown renderer I'm working on to replace WordPress for my blog, I resorted to shortcodes which resolve to CSS styling the `<kbd>` tag with `title` attributes to clarify and the occasional bit of inline SVG for things where I didn't want to specify a fixed font to get sufficient consistency, like PlayStation button glyphs.
(In all fairness, it's a nerd-snipe made based on the idea that I'll be more willing to blog about things I have nice tools for. I don't currently typeset button presses in any form.)
*nod* As-is, we're stuck with hacks like custom shortcodes and emoji.
...though, given the inconsistent naming of consistently laid-out buttons, I think anything that makes its way into Unicode should include something that follows the lead of what Batocera Linux does on their Wiki and with custom emojis in their Discord.
See https://wiki.batocera.org/configure_a_controller for an example of how they look inline, but the gist is that it's an outline of the SNES-originated diamond of action buttons that pretty much everyone but Nintendo uses these days, and which is embodied in XInput and the SDL Gamepad API, with one of the circles filled in to represent the button in question.
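The "inconsistent naming of consistently laid-out buttons" problem is easy to see if you key the diamond by physical position rather than label. A hypothetical mapping (the position names are illustrative, not from any particular API, though SDL's Gamepad API takes a similar positional approach):

```python
# The same physical diamond position carries a different label per vendor,
# which is why a position-based glyph (as on the Batocera wiki) is clearer
# than any one vendor's lettering.
POSITIONS = ["south", "east", "west", "north"]

LABELS = {
    "Xbox":        {"south": "A", "east": "B", "west": "X", "north": "Y"},
    "Nintendo":    {"south": "B", "east": "A", "west": "Y", "north": "X"},
    "PlayStation": {"south": "Cross", "east": "Circle",
                    "west": "Square", "north": "Triangle"},
}

for vendor, mapping in LABELS.items():
    print(vendor, [mapping[p] for p in POSITIONS])
```

Note that Xbox and Nintendo swap A/B and X/Y across the diagonal, so "press A" is ambiguous in a way "press the south button" never is.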
With more and more players expecting emoji support in text-entry boxes, more games are starting to ship with full Unicode support. Also, I've noticed that the optional ligatures OpenType supports have become a big style thing in certain game genres and for certain studios. HarfBuzz's "Old MIT" license shows up more often in the credits of games and game engines.
I don't know if that's a good reason to push to standardize controller glyphs to Unicode, though.
(ETA: Plus the other obvious reason more games and game engines are bringing in full Unicode support is localization, especially for Arabic and CJK. We're a bit past the point where AAA games only feel a need to support EFIGS. The Middle East and Asia are huge markets, especially for mobile games.)
Given it’s a table, one would be able to iterate over each, “be wrong on the Internet” about the character and wait for said niche communities to swoop in to make a correction.
It's nearly impossible to know or implement all of UTF-8/16, because besides UTF support you also need to provide fonts for every script. Thousands of scalable fonts take a lot of memory. That's why using such characters is risky: somewhere along the path (logs to email to presentation to Word to Excel to CSV to database, for example) a missing font will display them as trash.
For years, Ł support in Python on Windows, for example, would sometimes break when importing from poor-quality Excel files, haha.
Normally there's a single "font of last resort" that's used for particularly obscure characters, although even those don't cover everything; the extended Egyptian hieroglyphs don't display for me, for example: https://en.wikipedia.org/wiki/Egyptian_Hieroglyphs_Extended-...
A single font can contain a maximum of 65,536 glyphs (TrueType/OpenType glyph IDs are 16-bit), but there are over 150,000 defined Unicode characters, so a single font of last resort isn't possible, unfortunately. Complete coverage would require multiple fonts.
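The gap between the glyph-ID limit and the size of the assigned repertoire can be checked with the stdlib `unicodedata` module. A quick sketch, counting codepoints with an assigned name (the exact total depends on the Unicode version bundled with your Python build, so no fixed number is asserted here):

```python
import unicodedata

# TrueType/OpenType glyph indices are 16-bit, so one font tops out here.
GLYPH_LIMIT = 65536

# Count codepoints that have an assigned character name in this build's
# Unicode data; unassigned and control codepoints return the default.
assigned = sum(
    1 for cp in range(0x110000)
    if unicodedata.name(chr(cp), None) is not None
)

print(assigned, assigned > GLYPH_LIMIT)
```

This undercounts slightly (named sequences and some assigned-but-nameless codepoints are missed), but it still lands well above 65,536, which is the point.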
I'd never looked into this in detail before, but you're right: it looks like Android ships over 100 fonts. Although composite font representation is supposed to fix this.