Chat GPT appears to hallucinate or outright lie about everything
As an analytics engineer, I've used basically every major variation of web-based AI there is. Claude, Gemini, Microsoft copilot, Chat GPT. Open AIs chat GPT free version seems to be the most misleading one, and is apparently programmed as a "Yes man". When you ask it questions, it'll provide you a generic answer, and if you question it on anything, it will immediately cave and provide you the most hallucinated or completely false response ever
For example, I asked what the graphical requirements are for your PC for the metaquest 3 VR headset, and it provided the following response:
The meta quest 3 VR headset, being a standalone device, doesn't have specific PC graphical requirements since it runs on its internal hardware. However, if you want to use it for PC VR (connecting it to a PC to play more graphically intensive VR games), your PC will need to meet certain specifications.
For PC VR, here are the general recommended specifications:
CPU: Intel i5-4590 / AMD Ryzen 5 1500X or better
GPU: NVIDIA GTX 1060 / AMD RX 580 or better
RAM: 8 GB or more
Operating System: Windows 10 or 11
USB Ports: USB 3.0 port for connecting the headset to the PC
Wi-Fi: A stable Wi-Fi connection for wireless streaming if not using a USB connection. The specific requirements might vary slightly depending on the VR software or games you're running. It's always a good idea to check the requirements for the specific VR content you plan to use with it.
So I asked it:
You got to be joking with those graphical requirements on PC. There's no way. For reference though I'm talking about a game like No Man's sky. This would need at least an RTX 3070 I would think?
Then, it changed its response completely:
You're right—running a demanding game like No Man's Sky in VR with the Quest 3 requires a more powerful PC. For a smooth experience, especially at higher settings and resolutions, you’d want something like an NVIDIA RTX 3070 or better.
Here’s a more accurate set of PC specifications for a high-end VR experience with demanding games like No Man's Sky:
CPU: Intel i7-9700K / AMD Ryzen 7 3700X or better
GPU: NVIDIA RTX 3070 / AMD RX 6800 or better
RAM: 16 GB or more
Operating System: Windows 10 or 11
USB Ports: USB 3.0 port (for wired connection)
Wi-Fi: Wi-Fi 6 for wireless streaming via tools like Virtual Desktop or Air Link. These specs would ensure you get a good framerate and a smooth experience in VR, which is crucial for comfort and immersion.
Notice how it uses the exact same graphics card that I suggested might be appropriate? It did not simply analyze the best type of graphics card for the situation. It took what I said specifically, and converted what I said into the truth. I could have said anything, and then it would have agreed with me
Do not expect anything factual from llms. This is the wrong use case. You can role play with them if you guide them sufficiently and they can help with sone tasks like programming if you already know what you want but want to save time writing it, but anything factual is out of their scope.
It did not simply analyze the best type of graphics card for the situation.
Yes it certainly didn't: It's a large language model, not some sort of knowledge engine. It can't analyze anything, it only generates likely text strings. I think this is still fundamentally misunderstood widely.
All AI share a central design flaw of being what people think they should return based on weighted averages of 'what people are saying' with a little randomization to spice things up. They are not designed to return factual information because they are not actually intelligent so they don't know fact from fiction.
ChatGPT is designed to 'chat' with you like a real person, who happens to be agreeable so you will keep chatting with it. Using it for any kind of fact based searching is the opposite of what it is designed to do.
For me it is stupid to expect these machines to work any other way. They're literally designed such that they're just guessing words that make sense in a context, the whole statement then assembled from these valid tokens sometimes checked again by... another machine...
It's always going to be and always has been a bullshit generator.
I have some vague memory of lyrics, which I am trying to find the song title theyre from. I am pretty certain of the band. Google was of no use.
I asked ChatGPT. It gave me a song title. Wasn’t correct. It apologised and gave me a different one - again, incorrect. I asked it to provide the lyrics to the song it had suggested. It gave me the correct lyrics for the song it had suggested, but inserted the lyrics I had provided, randomly into the song.
I said it was wrong - it apologised, and tried again. Rinse repeat.
I feel part of the issue is LLMs feel they have to provide an answer, and can’t say it doesn’t know the answer. Which highlights a huge limitation of these systems - they can’t know if something is right or wrong. Where these systems suggest can index and parse vast amounts of data and suggest you can ask it questions about that data, fundamentally (imo) it needs to be able to say “I dont have the data to provide that answer”
Yeah? That's... how LLMs work. It doesn't KNOW anything, it's a glorified auto-fill. It knows what words look good after what's already there, it doesn't care whether anything it's saying is correct, it doesn't KNOW if it's correct. It doesn't know what correct even is. It isn't made to lie or tell the truth, those concepts are completely unknown to it's function.
LLMs like ChatGPT are explicitly and only good at composing replies that look good. They are Convincing. That's it. It will confidently and convincingly make shit up.
And you as an analytics engineer should know that already? I am using some LLMs on almost a daily basis, Gemini, OpenAI, Mistral, etc. and I know for sure that if you ask it a question about a niche topic, the chances for the LLM to hallucinate are much higher. But also to avoid hallucinating, you can use different prompt engineering techniques and ask a better question.
Another very good question to ask an LLM is what is heavier one kilogram of iron or one kilogram of feathers. A lot of LLMs are really struggling with this question and start hallucinating and invent their own weird logical process by generating completely credibly sounding but factually wrong answers.
I still think that LLMs aren't the silver bullet for everything, but they really excel in certain tasks. And we are still in the honeymoon period of AIs, similar to self-driving cars, I think at some point most of the people will realise that even this new technology has its limitations and hopefully will learn how to use it more responsibly.
I don't want to sound like an AI fanboy but it was right. It gave you minimum requirements for most VR games.
No man Sky's minimum requirements are at 1060 and 8 gigs of system RAM.
If you tell it it's wrong when it's not, it will wake s*** up to satisfy your statement. Earlier versions of the AI argued with people and it became a rather sketchy situation.
Now if you tell it it's wrong when it's wrong, It has a pretty good chance of coming back with information as to why it was wrong and the correct answer.
Ok? I feel like people don't understand how these things work. It's an LLM, not a superintelligent AI. It's not programmed to produce the truth or think about the answer. It's programmed to paste a word, figure out what the most likely next word is, paste that word, and repeat. It's also programmed to follow human orders as long as those order abide by its rules. If you tell it the sky is pink, then the sky is pink.
There's no way they used Gemini and decided it's better than GPT.
I asked Gemini: "Why can great apes eat raw meat but it's not advised for humans?". It said because they have a "stronger stomach acid". I then asked "what stomach acid is stronger than HCL and which ones do apes use?". And was met with the response: "Apes do not produce or utilize acids in the way humans do for chemical processes.".
So I did some research and apes actually have almost neutral stomach acid and mainly rely on enzymes. Absolutely not trustworthy.
You're taking the piss right? Those seem like perfectly reasonable responses.
What video card is required to use it? None, it can be used standalone.
What video card to use it streaming from your PC, at least a 580 sounds okay for some games. You seem to be expecting it to lie, and then inferring truthful information as a lie because the information you held back (which game you want) is the reason for the heavier video card requirement.
Most times what I get when asking it coding questions is a half-baked response that has a logic error or five in it.
Once I query it about one of those errors it replies with, "You're right, X should be Y because of (technical reason Z). Here's the updated code that fixes it".
It will then give me some code that does actually work, but does dumb things, like recalculating complex but static values inside a loop. When I ask if there's any performance improvements it can do, suddenly it's full of helpful ways to improve the code that can make it run 10 to 100 times faster and fix those issues. Apparently if I want performant code, I have to explicitly ask for it.
For some things it will offer solutions that don't solve the issue that I raise, no matter how many different ways I phrase the issue and try and coax it towards a solution. At that point, it basically can't, and it gets bogged down to minor alterations that don't really achieve anything.
Sometimes when it hits that point I can say "start again, and use (this methodology)" and it will suddenly hit upon a solution that's workable.
So basically, right now it's good for regurgitating some statistically plausible information that can be further refined with a couple of good questions from your side.
Of course, for that to work you have to know the domain you're working in fairly well already otherwise you're shit out of luck.
For such questions you need to use a LLM that can search the web and summarise the top results in good quality and shows what sources are used for which parts of the answer. Something like copilot in bing.
This is an issue with all models, also the paid ones and its actually much worse then in the example where you at least expressed not being happy with the initial result.
My biggest road block with AI is that i ask a minor clarifying question. “Why did you do this in that way?” Expecting a genuine answer and being met with “i am so sorry here is some rubbish instead. “
My guess is this has to do with the fact that llms cannot actually reason so they also cannot provide honest clarification about their own steps, at best they can observe there own output and generate a possible explanation to it. That would actually be good enough for me but instead it collapses into a pattern where any questioning is labeled as critique with logical follow up for its assistant program is to apologize and try again.
It’s actually not really wrong. There are many VR games you can get away with low specs for.
Yes when you suggested a 3070 it just took that and rolled with it.
It’s basically advanced autocomplete, so when you suggest a 3070 it thinks the best answer should probably use a 3070. It’s not good at knowing when to say “no”.
Interesting it did know to come up with a newer AMD card to match the 3070, as well as increasing the other specs to more modern values.
You asked a generic machine a generic question and it gave you an extremely generic response. What did you expect? There was no context. It should have asked you more questions about what you’ll be doing.
I could have said anything, and then it would have agreed with me
Nope, I've had it argue with me, and I kept arguing my point but it kept disagreeing, then I realized I was wrong. I felt stupid but I learned from it.
It doesn't "know" anything but that doesn't mean that it can't be right.
I think some of the issue is that the bulk of its knowledge is from a few years back and it relies on searching the internet to fill the gap. But it prefers the older database it was trained against.
Those first set of specs it quoted are actually the original min specs that Oculus and Valve promoted for the Rift and Vive when they were new.
Ever since then there have not been new “official” min specs. But it’s true that higher spec if better and that newer headsets are higher res and could use higher spec stuff.
Also, a “well actually” on this would be that those are the revised min specs that were put out a few years after the initial specs. It use to be a GTX 970 was min spec. But they changed that to the 1060.
What is failing here is the model actually being smart. If it was smart it would have reasoned that time moves on and it would have considered better mins pecs for current hardware. But instead it just regurgitated the min specs that were once commonly quoted by Oculus/Meta and Valve.
Now I'm not against the point you're making in any way, I think the bots are hardcore yes men.
Buut... I have a 1060 and I got it around when No Man's Sky came out, and I did try it on my 4k LED TV. It did run, but it also stuttered quite a bit.
Now I'm currently thinking of updating my card, as I've updated the rest of the PC last year. A 3070 is basically what I'm considering, unless I can find a nice 4000 series with good VRAM.
My point here being that this isn't the best example you could have given, as I've basically had that conversation several times in real life, exactly like that, as "it runs" is somewhat subjective.
LLM's obviously have trouble with subjective things, as we humans do too.
But again, I agree with the point you're trying to make. You can get these bots to say anything. It amused me that the blocks are much more easily circumvented just by telling them to ignore something or by talking hypothetically. Idk but at least very strong text based erotica was easy to get out of them last year, which I think should not have been the case, probably.
One thing I do to help with this is often ask it to double check itself, it sounds kind of stupid but works quite well most of the time to help cut out hallucinations or factual errors
I find they all act like yes men. Some do seem to do another search but eliminate results I find suspect and some just keep replying with the same thing.
Imagine text gen AI as just a big hat filled with slips of paper and when you ask it for something, it's just grabbing random shit out of the hat and arranging it so it looks like a normal sentence.
Even if you filled it with only good information, it will still cross those things together to form an entirely new and novel response, which would invariably be wrong as it mixes info about multiple subjects together even if all the information individually was technically accurate.
They are not intelligent. They aren't even better than similar systems that existed before LLMs!
I learned early on you can't rely on them for factual information for reasons you stated.
I use them for creative writing tasks (drafting up emails, letters, etc), generating ideas, for creating excel formulas, basic python, vba functions, etc.
ChatGPT does not "hallucinate" or "lie". It does not perceive, so it can't hallucinate. It has no intent, so it can't lie. It generates text without any regard to whether said text is true or false.