>equally true of human transcription, in which individual words are often [UNINTELLIGIBLE].
ML systems somewhat notoriously do not necessarily make the same sorts of errors that a human would. And I'd expect a large portion of the errors to be transcribing the wrong words rather than indicating that a word couldn't be transcribed. That sort of error means that you can't really get away with manually reviewing just 3% of the audio.
That ML tends to make weird mistakes, rather than the subtle, in-context ones human transcribers make, is likely to make its errors easier to spot.
And there are humans in the loop too, and an enormous amount of redundancy in the questions and answers, so even plausible false transcriptions will get picked up on if they matter. Nobody gets sent to jail simply because the transcription process - human or machine - accidentally substitutes "I did it" in place of "I didn't" midway through a two-hour interview.
The thing is that 'likely' is very far away from 'always'.
There is no guarantee the mistake will be easy to spot.
For entertainment purposes AI transcription is awesome.
For serious business applications, the ability to recognize mistakes will continue to be a field that demands serious attention. It would be interesting to see an AI process double-check itself, and also run a logic check on whether the transcription makes sense, so that it can flag sections as incongruous or of dubious reliability.
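A crude version of that flagging already falls out of per-word confidence scores, which many ASR systems can emit. Here's a minimal sketch (the function name, threshold, and hard-coded scores are all hypothetical, just to illustrate the idea of routing only low-confidence sections to human review):

```python
# Hypothetical sketch: flag transcript segments whose average word-level
# confidence falls below a threshold, so a human reviews only those.
# The (word, confidence) pairs stand in for scores an ASR system emits;
# the 0.85 threshold is an arbitrary assumption for illustration.

def flag_dubious_segments(segments, threshold=0.85):
    """Return indices of segments whose mean confidence is below threshold."""
    flagged = []
    for i, words in enumerate(segments):
        avg = sum(conf for _, conf in words) / len(words)
        if avg < threshold:
            flagged.append(i)
    return flagged

segments = [
    [("the", 0.99), ("meeting", 0.97), ("started", 0.95)],
    [("I", 0.98), ("did", 0.60), ("it", 0.55)],      # dubious segment
    [("at", 0.96), ("nine", 0.94), ("o'clock", 0.93)],
]
print(flag_dubious_segments(segments))  # → [1]
```

The catch, of course, is the one raised above: the model can be confidently wrong, so confidence scores catch the "couldn't hear it" errors but not the plausible substitutions. A real pipeline would want a second, independent check (e.g. a consistency pass over the whole transcript) on top of this.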
+1. There is a widespread "metric fallacy" or "task fallacy" going around. Models of course optimize for metrics, so they tend to perform well on those same metrics.
Humans, however, are not simply metric optimizers. Though it's always in the interest of those corporations producing metric optimizers (i.e. models) to paint humans as such, so their models shine in comparison. They want humans to look like bad machines, so it looks like they should be automated. Not to say they shouldn't in many cases, just that there's a clear one-sidedness in all corporate PR (and funded research, especially that research which is also PR).
All this to say that yes I agree with you. And if we humans don't want our unsustainable economic growth to turn us even more into machines (as our bureaucratic creep has done quite well thus far), we should fight such rhetoric that aims to paint humans simply as machines or task-doers.