@zogwarg I've written up a quick explanation at https://gist.githubusercontent.com/Ovid/17b19faf2fb7e0019e375e97f0a4c8af/raw/196735daa5274ded8f2363a41d78a490e8325f67/vector.txt
And yes, this is still GenAI. "Gen" doesn't just mean "generating text". It also relates to "understanding" (cough) the meaning of your prompt and having a search space where it can match your meaning with the meaning of other things. That's where it starts to "generate" ideas. For vector databases, instead of generating words based on the meaning, it's generating links based on the meaning.
@zogwarg For a traditional database, you can get those "lions/cheetahs/tigers" by manually attaching metadata to all videos. That is slow, error-prone, and expensive. It also only works for the metadata you *think* to assign to videos.
A good vector database takes a query in natural language and lets you search the "meaning" of unstructured data. You can search a data corpus much faster this way even though it's largely unstructured data!
That's real value, and it's not expensive.
Consider traditional databases which let you search for strings. Vector databases let you search the meaning.
For one client, someone could search for "videos about cats". With stemming and stop words, that becomes "cat" and the results might be lists of videos about house cats and maybe the unix "cat" command. Tigers, lions, cheetahs? Nope.
Vector database will return tigers/lions/cheetahs because it "knows" they are cats. A much smarter search. I've built that for a client.
@froztbyte Regarding decision transparency, I created an "Honest Resume Scanner" GPT (https://chatgpt.com/g/g-0incYn7v7-honest-resume-scanner) and the only prompt suggestion is "Ask me to share my instructions." That lets users see the verbatim prompt.
When it offers evaluations, it does explain carefully why it rejects a particular candidate (but it won't recommend any). I think it's a step in the right direction, but more work is needed.
@froztbyte As for the issue of transparency, it's ridiculously hard in real life. For example, for my website, I used a format I created called "blogdown", which is Markdown combined with a template language to make it easy to write articles. I never cited my sources, nor do I think I could. From decades of programming, how can I cite everything I've ever learned from?
As for how AI is transparent for arriving at decisions, this falls into a separate category and requires different thinking.
@froztbyte For environmental costs, MatMulFree LLMs look like they can reduce energy costs 50x. [1] They've recently gotten funding for building a larger model. This will be a huge win.
For bias, I'm worried about the WEIRD problem of normalizing Western values and pushing towards a monoculture.
For ethics, it's an absolute nightmare. If your corpus includes Mein Kampf, for example, how do the LLM know what is a lie and what is not?
Many hurdles here.
@froztbyte Yeah, having in-depth discussions are hard with Mastodon. I keep wanting to write a long post about this topic. For me, the big issues are environmental, bias, and ethics.
Transparency is different. I see it in two categories: how it made its decisions and where it got its data. Both are hard problems and I don't want to deny them. I just like to push back on the idea that AI is not providing value. 😃
@zogwarg OK, my grammar may have been awkward, but you know what I meant.
Meanwhile, those of us working with AI and providing real value will continue to do so.
I wish people would start focusing on the REAL problems with AI and not keep pretending it's just a Markov Chain on steroids.
@froztbyte Given that I am currently working with GenAI every day and have been for a while, I'm going to have to disagree with you about "failed to deliver on promises" and "worthless."
There are definitely serious problems with GenAI, but actually being useful isn't one of them.
@bitofhope Absolutely agree, but this is where technology is evolving and we have to learn to adapt or not. Since it's not going away, I'm not sure that not adapting is the best strategy.
And I say the above with full awareness that it's a rubbish response.
Nice job! This is a fairly common trick with AI. In traditional programming, there's a clear separation between code and data. That's not the case for GenAI, so these kinds of hacks have worked all over the place.
Well-known software developer. American living in France.
I have a poetic license to kill.