I have been thinking about the parallels between early compilers and LLMs for a while now. I know I'm not the only one... But I hadn't seen a retrospective on how predictions and outcomes played out last time around. This post is the outcome of my own digging.
Thank you! On this note, I am curious if the content could be hosted and integrated with an existing app from the [fediverse](https://fediverse.party/en/miscellaneous/). Need to do some research.
This is a really cool tool. I have been playing with similar scraping capabilities, so I appreciate you sharing the source code as well. People who are saying "loads of scraping tools already exist" have likely not suffered through the current state of the art, as heuristic-based approaches absolutely pale in comparison to what an LLM can extract.
He's my ancestor! He hovers just on the edge of obscurity as a historical figure, and most people have never heard of him, so it's cool to see a summary like this out in the wild. My grandfather still shares exactly the same name too, though he's the last of the juniors.
He is known by all members of the Chilean Navy [1]; he is not obscure in this part of the world. By the way, I am Chilean and lived most of my childhood just a block away from Lord Cochrane street [0], in downtown Santiago, which is one of the busiest places in the whole country (and those adjacent blocks have seen some history).
In case you're interested, here is a photo I took in the Museo Marítimo Nacional, Valparaíso, Chile - a full-length stained glass portrait of him: https://i.imgur.com/XSr6wwG.jpg
Don't forget that he indirectly, and in hindsight, contributed to the independence of Peru as well. Most of the fighting of course did not happen at sea, but just as air superiority matters a great deal today, naval superiority mattered in the 1820s. The upper hand he gave to the expedition from Chile/Argentina made it a war effectively centered on land, and led in part, a few years down the line, to Peru's independence from Spain.
Somewhat ironically, the Irish diaspora (the Flight of the Wild Geese, which happened in 1691 and scattered the Irish Catholic military aristocracy across Catholic Europe and, with their descendants, Europe in general) likely had a larger cumulative role in effecting many of the independence movements. After all, O'Higgins is not exactly a native Spanish surname. Informally, the scions of important Irish Catholic families, with their paths to advancement in Britain blocked, filtered out across the continent over the next hundred years or so. They ended up having an outsized influence on military and colonial administration through the officer corps of the nations where they settled, stretching from Spain to France to Austria to Russia. They were not exactly mercenaries but a true diaspora, before the concept solidified in today's terms. One can argue that it was an early and self-inflicted 'brain drain', brought on by policies of the post-Williamite British state, that may have had longer-term consequences for the British (the loss of Minorca and the loss of America came in fairly rapid succession, for example). The exiles' distinct and separate traditions also allowed them to be attached less to a sovereign than to a cause. Much more study is needed on this, but what is certain is that policies made out of fear undermined the British and aided continental powers for generations.
Look at the names etched on the Arc de Triomphe and you'll notice how many of them are distinctly not French. Dillon (Arthur Dillon, related to the Viscount Dillon, who also fought for the American side during the revolution), Clarke (whose father served in the Dillon Regiment), and MacDonald (whose ties to Flora Macdonald during the Jacobite Rising precipitated his family's exile, but who nevertheless served as an officer, first under Dillon, before later siding with Napoleon and taking independent command, primarily in Switzerland) are just some of the prominent names inscribed. Sectarianism's long tail, etched in stone right there.
And the main purpose of the data structure is to recover term frequencies and document frequencies. We also store positional information to allow phrase matching.
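To make that concrete, here is a minimal sketch of a positional inverted index in plain Python. It is not the library's actual implementation: the function names (`build_index`, `term_freq`, `doc_freq`, `phrase_match`) and the whitespace tokenizer are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map term -> {doc_id: [positions]} (naive lowercase/whitespace tokenizer)."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def term_freq(index, term, doc_id):
    """Number of times `term` occurs in document `doc_id`."""
    return len(index.get(term, {}).get(doc_id, []))


def doc_freq(index, term):
    """Number of documents containing `term` at least once."""
    return len(index.get(term, {}))


def phrase_match(index, phrase):
    """Sorted doc ids where the terms of `phrase` occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    # Only documents containing every term can match the phrase.
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)
    hits = []
    for doc_id in sorted(candidates):
        later = [set(p[doc_id]) for p in postings[1:]]
        # A match starts at `start` if term i+1 sits at position start+i+1.
        if any(
            all(start + i + 1 in s for i, s in enumerate(later))
            for start in postings[0][doc_id]
        ):
            hits.append(doc_id)
    return hits


docs = ["the quick brown fox", "the brown quick fox", "quick quick fox"]
index = build_index(docs)
```

The positions are what make `phrase_match` possible; term and document frequencies alone would only tell you that both words occur somewhere in a document.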
BM25 of course is just one such way of using these stats. But you can also get raw termfreqs and docfreqs of matching terms and do whatever you want with them mathematically :).
The BM25 here tries to align with Lucene's internal BM25 calculation.
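For reference, a rough sketch of a Lucene-style BM25 score for a single term, computed from raw term and document frequencies. The function names are hypothetical, and the formula is my understanding of recent Lucene versions (idf as `ln(1 + (N - df + 0.5) / (df + 0.5))`, defaults `k1=1.2` and `b=0.75`, no `(k1 + 1)` factor in the numerator). Lucene also stores document lengths in a lossy compressed form, so exact scores may differ slightly from this.

```python
import math


def bm25_idf(doc_freq, num_docs):
    """Lucene-style idf; stays positive even for very common terms."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(term_freq, doc_freq, num_docs, doc_len, avg_doc_len,
                    k1=1.2, b=0.75):
    """BM25 contribution of one term to one document's score."""
    if term_freq == 0:
        return 0.0
    # Longer-than-average documents get a larger denominator (lower score).
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return bm25_idf(doc_freq, num_docs) * term_freq / (term_freq + length_norm)
```

A multi-term query score would then just be the sum of `bm25_term_score` over the matching terms, and the same inputs could be fed into any other scoring function instead.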