The Google device was able to deliver a detailed description of “The Nakba.”
Google is coming in for sharp criticism after a video went viral of the Google Nest assistant refusing to answer basic questions about the Holocaust while having no problem answering questions about the Nakba.
If you train your large language model on all the internet's bullshit and don't want bullshit to come out, there aren't a lot of good options. Garbage in, garbage out.
Yes, false information is technically undesirable, but that's not really what that word is trying to convey. The goal should be accurate information, not agreeable information. If the truth is objectionable/offensive, it should still be easily findable.
I'm actually wondering what censorship even is. Because if you are going to include every nonsense blog and asshat that has some unfounded garbage to spew, the quality of your product will potentially be garbage. So you end up with the question of which sources to include, and you probably end up with authoritative sources that are held in higher regard.
The issue we already see with Google search is that SEO spam and generated websites that all form a large circle jerk are set up to fool the algorithm. This will be the case for LLMs as well. The longer they are in use, the better people will understand how to game the system. And then bad actors will get these things to say whatever they want.
I don't know a solution, but my guess is that it lies in what used to happen for the Encyclopaedia Britannica and the like: large pools of experts who curate the underlying sources. Like in libraries.
Nah, I think the solution is simpler: multiple competing algorithms. Gaming one system is pretty easy, gaming five isn't. So if a search company wants to keep its top results good, it needs to swap between a handful of good ranking algorithms to keep SEO hunters at bay.
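Roughly what I have in mind, as a toy sketch in Python (the rankers and the per-query blending are made up for illustration, not anything a real search engine does): someone who optimizes their pages against one scorer only wins on the queries where that scorer happens to dominate.

    import random

    # Toy sketch: several independent ranking functions over (query, doc) pairs.
    # A spammer who reverse-engineers one scorer still has to beat the others,
    # because the serving layer blends scorers differently on every query.

    def rank_by_keywords(query, doc):
        # naive term-overlap score
        terms = set(query.lower().split())
        words = doc["text"].lower().split()
        return sum(w in terms for w in words) / (len(words) or 1)

    def rank_by_links(query, doc):
        # pretend link-authority signal, independent of page text
        return doc["inbound_links"] / (doc["inbound_links"] + 100)

    def rank_by_freshness(query, doc):
        # newer docs score higher
        return 1.0 / (1 + doc["age_days"])

    RANKERS = [rank_by_keywords, rank_by_links, rank_by_freshness]

    def search(query, docs):
        # per-query random blend of rankers, so tuning pages against any single
        # scorer only helps on a fraction of queries
        weights = [random.random() for _ in RANKERS]
        total = sum(weights)
        weights = [w / total for w in weights]

        def score(doc):
            return sum(w * r(query, doc) for w, r in zip(weights, RANKERS))

        return sorted(docs, key=score, reverse=True)

    docs = [
        {"text": "history overview of the region", "inbound_links": 500, "age_days": 900},
        {"text": "buy cheap history history history", "inbound_links": 5, "age_days": 2},
    ]
    print([d["text"] for d in search("history overview", docs)])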
Hiring experts is certainly a good idea, but due to the sheer size of the internet, it's not going to be feasible.
As for the original discussion about censorship in search, I take it to mean the intentional hiding or demotion of relevant results due to the content of those results. SEO spam isn't relevant because it's not what the customer likely wants, so hiding/demoting it doesn't count as censorship imo.
Censorship is simply intentionally limiting the information that someone else has available to them, and it is bad. Letting them curate their own information is fine, but they should have a choice over what they see.
I disagree. The whole "buyer beware" approach does not work. Everyone is entitled to their own opinions but not to their own facts. Plenty of people out there are not able to curate their own content and rely on others to do it for them. Librarians, curators: there are jobs specifically for that purpose.
I think it is time, no, overdue, for proper curation to take over again. But the task is so enormous that it will be a challenge to figure out how to do it properly. And commercial entities will always have incentives that are not aligned with those of the broader populace, so there is that.
Ideally, the user would be in complete control of what gets censored for them. The service would simply flag content by category and the user could selectively show or hide each category.
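Something like this, as a minimal sketch (the categories and the flagging structure are invented for illustration; a real service would need actual classifiers behind the labels): the service only labels content, and each user decides which labels get hidden.

    from dataclasses import dataclass, field

    @dataclass
    class Item:
        text: str
        categories: set = field(default_factory=set)  # labels assigned by the service

    @dataclass
    class UserPrefs:
        hidden_categories: set = field(default_factory=set)  # chosen by the user

        def visible(self, item: Item) -> bool:
            # hide only if the item carries a category this user opted out of
            return not (item.categories & self.hidden_categories)

    def feed(items, prefs):
        # the service never removes anything outright; it just honors the user's toggles
        return [i for i in items if prefs.visible(i)]

    items = [
        Item("news article", {"politics"}),
        Item("graphic war footage", {"politics", "violence"}),
        Item("cat video", {"animals"}),
    ]

    prefs = UserPrefs(hidden_categories={"violence"})
    print([i.text for i in feed(items, prefs)])  # news article, cat video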
Individual instances can and do, but decentralization means anyone can spin one up with their own rules.
I bet if you looked around there'd be plenty of lawless absolutist instances that allow all manner of free speech, but none will adhere exactly to your own moral ideals besides the one you made yourself.