It's a great tool. Unlike vector DBs alone, Marqo handles the full process that a lot of people end up wanting vector DBs for (e.g. having structured data, using LLMs to create embeddings, and performing search/CRUD on the embeddings plus the original data).
Most people who end up needing a vector DB, like me, want to use LLMs on a specific, often private, dataset or use case. Typically you start with something like unstructured JSON data, then need to pick and manage the LLMs that create the embeddings, then store those embeddings and the original JSON data in a vector DB. The application then does some variety of CRUD operations plus search over both the original data and the embeddings.
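The workflow above can be sketched in a few lines of Python. This is a toy, assuming a bag-of-words vector as a stand-in for a real LLM embedding model and a plain in-memory dict as the "vector DB"; `DocStore`, `embed`, and the sample documents are hypothetical and are not Marqo's API. It only illustrates the point of keeping each original document and its embedding together, so that structured filters and similarity search run against the same record.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for an LLM embedding model: unit-length bag-of-words vector."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


@dataclass
class DocStore:
    """Hypothetical store keeping each original document next to its embedding."""
    text_field: str = "text"
    docs: dict[str, dict] = field(default_factory=dict)
    vectors: dict[str, dict[str, float]] = field(default_factory=dict)

    def upsert(self, doc_id: str, doc: dict) -> None:
        # Create/update: re-embed whenever the source document changes.
        self.docs[doc_id] = doc
        self.vectors[doc_id] = embed(doc[self.text_field])

    def get(self, doc_id: str) -> dict | None:
        return self.docs.get(doc_id)

    def delete(self, doc_id: str) -> None:
        # Remove the document and its embedding together so they never drift apart.
        self.docs.pop(doc_id, None)
        self.vectors.pop(doc_id, None)

    def search(self, query: str, k: int = 3, **filters) -> list[tuple[str, float, dict]]:
        q = embed(query)
        hits = []
        for doc_id, vec in self.vectors.items():
            doc = self.docs[doc_id]
            # Exact-match filter on the original (structured) fields first...
            if any(doc.get(key) != value for key, value in filters.items()):
                continue
            # ...then cosine similarity on the embedding (both vectors are unit length).
            score = sum(w * vec.get(tok, 0.0) for tok, w in q.items())
            hits.append((doc_id, score, doc))
        hits.sort(key=lambda hit: hit[1], reverse=True)
        return hits[:k]


# Usage with made-up documents.
store = DocStore()
store.upsert("1", {"text": "Refund policy for damaged items", "lang": "en"})
store.upsert("2", {"text": "Shipping times to Europe", "lang": "en"})
store.upsert("3", {"text": "Política de reembolso", "lang": "es"})

results = store.search("damaged item refund", k=2, lang="en")
for doc_id, score, doc in results:
    print(doc_id, round(score, 3), doc["text"])
```

A real setup would swap `embed` for a model call and the linear scan for an approximate nearest-neighbour index such as HNSW; the glue code that tends to pile up is keeping both halves in sync on every create, update, and delete.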
Chroma, Pinecone, and I guess FAISS/HNSWlib/etc. only handle the vector operations. What I really want, and what Marqo does, is something that handles everything end to end.
This is interesting, but what problem does it solve better than Ctrl+F-ing a transcript? It seems like it would be a worse solution when the precise way someone says something matters (e.g. journalists parsing an interview, or students studying their recorded lectures), and that it would be most useful if you were working with a large volume of recorded audio, such as customer service calls. This makes me somewhat uncomfortable, but perhaps I'm not fully understanding how it works.
Being able to handle and ask questions of audio data is a pretty big field. https://www.assemblyai.com/, for example, is a company entirely dedicated to audio intelligence. They have some great example use cases on their page.
Diarization can be done on-premises using pyannote (which is what they use in the article). Hugging Face offers a library to run models locally and an API to run them on their cloud. Pyannote is available under an MIT licence.
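Once pyannote has produced speaker turns locally, one thing you'd typically do is attach those speakers to the transcript. The sketch below assumes both the diarization output and the transcript have already been reduced to plain (start, end, ...) records in seconds. As far as I know, pyannote's result can be iterated with `itertracks(yield_label=True)` to get those turns. `Turn`, `Segment`, `label_segments`, and the sample data are hypothetical, and matching by largest time overlap is just one simple heuristic, not necessarily what the article does.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization turn: who spoke from start to end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """One transcript segment from a speech-to-text model (seconds)."""
    start: float
    end: float
    text: str


def overlap(a: Segment, b: Turn) -> float:
    """Length of time (seconds) the two intervals share; 0.0 if disjoint."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def label_segments(segments: list[Segment], turns: list[Turn]) -> list[tuple[str, str]]:
    """Give each segment the speaker whose turn overlaps it the most."""
    labelled = []
    for seg in segments:
        best = max(turns, key=lambda turn: overlap(seg, turn), default=None)
        if best is None or overlap(seg, best) == 0.0:
            speaker = "UNKNOWN"  # no diarization turn covers this segment
        else:
            speaker = best.speaker
        labelled.append((speaker, seg.text))
    return labelled


# Made-up example data. With pyannote, turns could (assumption) be built from
# diarization.itertracks(yield_label=True), which yields (segment, track, label).
turns = [Turn(0.0, 4.2, "SPEAKER_00"), Turn(4.2, 9.0, "SPEAKER_01")]
segments = [
    Segment(0.5, 3.9, "Thanks for calling, how can I help?"),
    Segment(4.0, 8.5, "My order never arrived."),
    Segment(10.0, 11.0, "Hello?"),
]

for speaker, text in label_segments(segments, turns):
    print(f"{speaker}: {text}")
```

Segments that straddle a speaker change get the speaker who talks for most of them; splitting at word-level timestamps would be the finer-grained alternative if your speech-to-text model provides them.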
Vosk is really good, but it's also a good example of an open-source project with great potential that doesn't scale up because the person behind it is a douchebag.
The documentation is poor, and what you do find on the web is sparse, outdated shit, so it's really hard to get help.
Not a dumb question at all! Essentially, as this blog shows, there is a lot of logic and work involved in doing what you said (i.e. pass raw data into an LLM, get embeddings, store them in a vector DB, then query both the embeddings and the original data), and Marqo does that work for you.