The idea is that NPUs are more power efficient for convolutional neural network operations. I don't know whether they actually are more power efficient, but it'd be wrong to dismiss them just because they don't unlock new capabilities or perform well for very large models. For smaller ML applications like blurring backgrounds, object detection, or OCR, they could be beneficial for battery life.
Yes, the idea before the whole shove LLMs into everything era was that small, dedicated models for different tasks would be integrated into both the OS and applications.
If you're using a recent phone with a camera, it's likely using ML models that may or may not be using AI accelerators/NPUs on the device itself. The small models are there, though.
Same thing with translation, subtitles, etc. All small local models doing specialized tasks well.
OCR on smartphones is a clear winner in this area. Stepping back, it's just mind blowing how easy it is to take a picture of text and then select it and copy and paste it into whatever. And I totally just take it for granted.
Not sure about all NPUs, but TPUs like Google's Coral accelerator are absolutely, massively more efficient per watt than a GPU, at least for things like image processing.
Even with modern digital codecs and streaming, there's usually chroma subsampling[1], so the color channels may have non-square "pixels" even if overall pixels are nominally square. I most often see 4:2:0 subsampling, which still has square pixels, but at half resolution in each dimension. However 4:2:2 is also fairly common, and it has half resolution in only one dimension, so the pixels are 2:1. You'd have trouble getting a video decoding library to mess this up though.
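To make the subsampling ratios concrete, here's a small illustrative sketch (not tied to any particular codec library; the function name `plane_sizes` is my own) computing per-plane dimensions for a nominally WxH frame under the common schemes:

```python
def plane_sizes(width, height, scheme="4:2:0"):
    """Return (luma, chroma) plane dimensions for a nominally WxH frame."""
    if scheme == "4:4:4":   # no subsampling: chroma matches luma
        return (width, height), (width, height)
    if scheme == "4:2:2":   # half horizontal resolution only -> 2:1 chroma "pixels"
        return (width, height), (width // 2, height)
    if scheme == "4:2:0":   # half resolution in both dimensions -> square, but coarser
        return (width, height), (width // 2, height // 2)
    raise ValueError(f"unknown scheme: {scheme}")

# A 1920x1080 4:2:0 frame carries only 960x540 chroma samples, so each
# chroma sample covers a square 2x2 block of luma pixels.
print(plane_sizes(1920, 1080, "4:2:0"))
# Under 4:2:2 the chroma grid is 960x1080: rectangular 2:1 samples.
print(plane_sizes(1920, 1080, "4:2:2"))
```

So "half resolution in each dimension" (4:2:0) keeps the chroma samples square, while "half resolution in one dimension" (4:2:2) makes them 2:1 rectangles, exactly as described above.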
> This metalanguage must have some kind of constructs to describe unknown things, or things that are deliberately simplified in favor of exposition.
Perhaps you're thinking of mathematics.
If you have to be able to represent arbitrary abstract logical constructs, I don't think you can formalize the whole language ahead of time. I think the best you can do is allow for ad-hoc formalization of notation while trying to keep any newly introduced notation reasonably consistent with previously introduced notation.
I largely agree with this, but at the same time, I empathize with TFA's author. I think it's because LLMs feel categorically different from other technological leaps I've been excited about.
The recent results in LLMs and diffusion models are undeniably, incredibly impressive, even if they're not to the point of being universally useful for real work. However, they fill me with a feeling of supreme disappointment, because each is just a big black box we shoved an unreasonable amount of data into, and now the black box is the best image-processing/natural-language-processing system we've ever made. Depending on how you look at it, they're either so unimaginably complex that we'll never understand how they really work, or so brain-dead simple that there's nothing to really understand at all. It's like some cruel joke the universe decided to play on people who like to think hard and understand the systems around them.
Agree totally. Reminiscent of Paul Erdős's reaction to the proof of the Four Colour Problem.
It's been quite good reading these comments, because a lot of them have put into words my own largely negative feelings about the ubiquitous AI hype, which I have found hard to articulate. Your second paragraph resonates, as does someone else's comment about being attracted to computer science because they like fiddly detail, and so being uninterested in a machine that hides all of it, and a third comment about how so-called "busy work" is actually a good way of padding out difficult stuff, so a job of work becomes much less palatable when it is excised entirely.
The other thing I find deeply depressing is the degree to which people are thrilled (genuinely) by dreadful-looking AI art and unbearable-to-read AI prose. It makes me think I've been kidding myself for years that people by and large have a degree of taste. Then again, maybe it just means it's not to my taste.
But think about it: if digital painting were solved not by a machine learning model but by human-readable code, it would be an even bleaker and crueler joke, wouldn't it?
Interesting that people seem to have this assumption.
"The lesson is considered "bitter" because it is less anthropocentric than many researchers expected and so they have been slow to accept it."
I mean, there are so many people on the planet; it's easy to feel useless when you know you can be replaced by millions of other humans. How is that different from being replaced by a computer?
I was not sure how AGI would come to us, but I assumed there would be AGI in the future.
The weirdest thing for me is mathematics and physics: I assumed those would be such easy fields to find something 'new' in through brute force alone. I'm more shocked that this is only happening now.
I realized with DeepMind and AlphaFold that the smartest people with the best tools are in industry, and specifically in the IT industry, because they are a lot better at using tools to help them than regular researchers who struggle to write code.
Ditto. It's also significantly lighter weight than competing readers (at least when I bought mine), has physical buttons, is available in color models, and has really good battery life, possibly because it runs a custom Linux instead of Android.
Like the author, I've also found myself wanting to recover an accidentally deleted file. Luckily, some git operations, like `git add` and `git stash`, store files in the repo, even if they're not ultimately committed. Eventually, those files will be garbage collected, but they can stick around for some time.
Git doesn't expose tools to easily search for these files, but I was able to recover the file I deleted by using libgit2 to enumerate all the blobs in the repo, search them for a known string, and dump the contents of matching blobs.
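For those who'd rather not reach for libgit2, plain git can find these blobs too: `git fsck --unreachable` lists objects no longer referenced by any commit or the index, and `git cat-file -p` dumps a blob's contents. Here's a hedged, self-contained sketch (a different approach from the libgit2 one above) that builds a throwaway repo, stages a file, deletes it, and then recovers the orphaned blob by searching for a known string:

```python
import os
import subprocess
import tempfile

def git(*args, cwd):
    """Run a git command and return its stdout."""
    return subprocess.run(("git",) + args, cwd=cwd,
                          capture_output=True, text=True).stdout

# Set up a throwaway repo with a staged-then-deleted file.
repo = tempfile.mkdtemp()
git("init", "-q", cwd=repo)
with open(os.path.join(repo, "notes.txt"), "w") as f:
    f.write("important secret note\n")
git("add", "notes.txt", cwd=repo)        # blob now lives in .git/objects
os.remove(os.path.join(repo, "notes.txt"))  # working copy gone, blob remains
git("rm", "--cached", "-q", "notes.txt", cwd=repo)  # unstage -> unreachable

# Enumerate unreachable blobs and grep each for a known string.
recovered = None
for line in git("fsck", "--unreachable", cwd=repo).splitlines():
    parts = line.split()  # e.g. "unreachable blob <sha>"
    if len(parts) == 3 and parts[1] == "blob":
        content = git("cat-file", "-p", parts[2], cwd=repo)
        if "secret" in content:
            recovered = content
print(recovered)
```

Note that `git fsck` may also report staged-but-uncommitted blobs as "dangling" rather than "unreachable" depending on the repo's state, so in practice it's worth scanning both kinds of output.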
I'm not the author, but these video-in-game projects typically work in a few phases:
1. Get the game into a specific state by performing specific actions, moving to specific positions, performing specific inputs, etc. so that a portion of the game state in RAM happens to be an executable program.
2. Jump to that executable code, such as by corrupting the return address on the stack with a buffer overflow.
3. (optional) The program from 1 may be a simple "bootstrap" program which lets the player directly write a new, larger program using controller inputs then jumps to the new program.
4. The program reads the video and audio from the stream of controller inputs, decodes them, and displays them. The encoding is usually an ad-hoc scheme designed to take advantage of the available hardware. The stream of replayed inputs is computed directly from the media files.
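To give a flavor of phase 4, here's a hypothetical encoding sketch (the real schemes are game-specific and these function names are my own): an NES controller reports 8 buttons per frame, i.e. one byte, so a naive payload scheme can map one byte of media data to one controller's button mask each frame.

```python
# NES controller button order, one bit each: a full read is one byte.
BUTTONS = ["A", "B", "Select", "Start", "Up", "Down", "Left", "Right"]

def byte_to_buttons(b: int) -> list[str]:
    """Which buttons are 'held' this frame to encode payload byte b."""
    return [name for i, name in enumerate(BUTTONS) if b & (1 << i)]

def buttons_to_byte(held: list[str]) -> int:
    """What the in-game decoder reconstructs from the input port."""
    return sum(1 << BUTTONS.index(name) for name in held)

# The media file becomes a stream of per-frame button masks...
payload = bytes([0x41, 0xFF, 0x00])
frames = [byte_to_buttons(b) for b in payload]
# ...and the in-game program reads them back, byte for byte.
assert bytes(buttons_to_byte(f) for f in frames) == payload
```

Real projects squeeze out more bandwidth than this, e.g. by reading multiple controllers via a multitap or clocking the controller ports more than once per frame, but the round-trip idea is the same.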
Specifically, this TAS abuses the fact that SMB doesn't clear RAM on bootup to use SMB3 to write $16 to the "continue world" RAM location, and then hotswaps to SMB1 to start the game in World N-1 which makes the rest of the TAS possible. If you download the TAS and use it with SMB1 on an emulator, the included base savestate will already have $16 in that RAM location, for convenience. The main HN link for this submission has the full technical writeup.