Is there a way to search through posts across all instances/communities?
Hi, I'm relatively new here.
Like probably a lot of people, I appended the word "reddit" to most of my searches to get discussions around my interests/news or to find "curated" things. I noticed that this does not work as well with "Lemmy", for example, since the user base is smaller and more spread out.
I've looked for a search engine for Lemmy or the fediverse with no luck. There are some that find communities, but none that return posts, it seems.
If it does not exist, is this even something that can be built? Would one need to build a crawler or use an API? I would be very interested in contributing to such a project.
Try https://www.search-lemmy.com. It has a few bugs, but it should work for you, especially if you set your home instance to something large like Lemmy.world.
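If you'd rather query a single instance yourself, Lemmy also exposes an HTTP API. Here is a minimal sketch, assuming the v3 search endpoint (/api/v3/search with the q, type_, listing_type and limit parameters) and a response whose "posts" entries each carry a "post" object with "name" and "ap_id". The function names are my own, and this only searches what that one instance already knows about, not the whole fediverse.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(instance, query, limit=10):
    """Build a post-search URL for one Lemmy instance (assumed v3 API)."""
    params = urlencode({
        "q": query,
        "type_": "Posts",        # only return posts, not communities or users
        "listing_type": "All",   # include federated content the instance has seen
        "limit": limit,
    })
    return f"https://{instance}/api/v3/search?{params}"


def extract_posts(payload):
    """Pull (title, canonical URL) pairs out of a decoded search response."""
    results = []
    for view in payload.get("posts", []):
        post = view.get("post", {})
        results.append((post.get("name"), post.get("ap_id")))
    return results


def search_posts(instance, query, limit=10):
    """Query the instance over HTTPS and return (title, URL) pairs."""
    with urlopen(build_search_url(instance, query, limit), timeout=10) as resp:
        return extract_posts(json.load(resp))
```

Calling search_posts("lemmy.world", "self hosting") would then return whatever that instance has indexed. Some instances may reject requests without a custom User-Agent header, so treat this as a starting point rather than a finished client.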
Searching the whole Fediverse, literally all of it, 100%, is technically impossible or at least very hard to implement, and if implemented, it'd eat up lots of CPU power and network bandwidth.
It's simply next to impossible for any instance of any Fediverse project, also for any centralised or decentralised dedicated search engine, to know all instances and all content on it without all instances actively pushing their existence, their status and all their content to the search engine in real-time.
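To make the discovery problem concrete, here is a rough sketch of how a crawler would find instances: start from a few known ones and follow their peer lists breadth-first. The get_peers callable is hypothetical; in practice it would have to be backed by whatever list of known peers each project publishes, which I'm assuming exists and is reachable. The point is that anything not linked from an already-known instance never shows up.

```python
from collections import deque


def crawl_instances(seeds, get_peers, max_instances=1000):
    """Breadth-first discovery of instances reachable from the seed domains.

    get_peers(domain) is a hypothetical callable returning the domains
    that the given instance says it federates with.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(seen) < max_instances:
        domain = queue.popleft()
        try:
            peers = get_peers(domain)
        except Exception:
            # Unreachable or misbehaving instance: skip it and keep crawling.
            continue
        for peer in peers:
            if peer not in seen and len(seen) < max_instances:
                seen.add(peer)
                queue.append(peer)
    return seen


# Toy federation graph: "brand-new.example" exists but nobody links to it yet.
toy_graph = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example"],
    "c.example": ["d.example"],
    "d.example": [],
    "brand-new.example": [],
}
found = crawl_instances(["a.example"], lambda d: toy_graph.get(d, []))
```

In this toy run, found contains the four connected instances but not brand-new.example, which is exactly the case where only an active push from the instance itself could help.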
A search engine that literally covers all of the Fediverse with no exception has to even know about brand-new instances that have just been started a split-second ago. An instance that's so new doesn't even have any connections into the Fediverse yet, probably no content and only one account, the admin account. (Replace "account" with "channel" on Hubzilla and (streams).)
So if someone spins up a new instance of whatever project, that search feature has to know about that instance immediately before the instance even connects with anything. That said, I'm not sure when that search feature is expected to know about a new Hubzilla hub, since ActivityPub is optional per hub and per channel and AFAIK off by default for both. Should the search feature already know about the hub while ActivityPub is still off and nothing in the Fediverse that isn't Hubzilla or (streams) can connect to it anyway, or should it only learn about the instance the second the hub admin turns ActivityPub on?
And when the admin of a new instance puts out a test post to see if it runs as desired, and the instance still isn't connected to any other instance, the search feature would have to know about that test post immediately so you can find it if that's what you're looking for.
Mind you, Google doesn't know everything on the Internet either.
A search engine that literally covers all of the Fediverse with no exception has to even know about brand-new instances that have just been started a split-second ago. An instance that’s so new doesn’t even have any connections into the Fediverse yet, probably no content and only one account, the admin account. (Replace “account” with “channel” on Hubzilla and (streams).)
So if someone spins up a new instance of whatever project, that search feature has to know about that instance immediately before the instance even connects with anything.
Yes, but who would want a search engine to specifically cover empty servers with half a nanosecond of lifetime? For all practical intents and purposes, people search for content, which already excludes these theoretical edge cases. More realistically, people will search for quality content, which implies some engagement happened and some upvotes accumulated. There is no value in discovering servers before users have discovered them; on the contrary.
If you really care about new and empty servers, you're looking for a fediverse monitoring tool rather than a search engine. And even for those, it's questionable what the value of those entries would be. I would prefer that they be filtered out so they don't bloat the numbers.
Sure, this is technically true, but it doesn't really address the human need to find things. It would be better if some group of Fediverse instances came together under a common banner and agreed on certain protocols that make things like mass-indexing easier. This would enable a better frontend experience for people trying to find good content. In fact, I think building more protocols on top of the existing one would be exactly in line with the philosophical underpinnings of the Fediverse.
Do we know whether Lemmy posts are getting indexed by Google? I haven’t had much luck throwing “lemmy” into my Google searches, but presumably if we keep doing it, Lemmy should start getting more traffic and rank higher?