The Fediverse (Lemmy, Mastodon, etc.) is based on a following/subscribing model: each instance only "sees" what its users are currently following or subscribed to. This keeps storage and resource usage lower, since no instance needs a complete copy of the entire Fediverse. This third party works more like a web crawler such as Google's, going from instance to instance and saving the data. Hopefully Lemmy will add a discovery feature like this in the future, maybe something like Mastodon relays that aggregates community lists, but it would definitely put more strain on each instance.
They explain it on the project's GitHub:
How does discovery work?
It uses a seed list of communities and scans the equivalent of the /instances federation lists, and then creates jobs to scan each of those servers.
How long till my instance shows up?
How long it takes to discover a new instance can vary depending on if you post content that's picked up by one of these servers.
Since the crawler looks at lists of federated instances, we can't discover instances that aren't on those lists.
Additionally, the lists are cached for 24 hours, so it can take up to 24 hours for an instance to show up after it's been discovered.
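For anyone curious what that kind of crawl looks like, here's a rough Python sketch. It is not the project's actual code, just my illustration of the idea described in the FAQ: start from a seed list, read each server's list of federated instances, and queue a scan job for every server not seen before. The `discover` function, the `get_federated` callback, and the `.example` hostnames are all made up for the example, and a plain dictionary stands in for the real network requests.

```python
from collections import deque


def discover(seeds, get_federated):
    """Breadth-first discovery starting from a seed list of instances.

    get_federated is a hypothetical callback that returns the list of
    instances a given host federates with (assumed, not a real API).
    """
    seen = set(seeds)
    queue = deque(seeds)  # pending "scan this server" jobs
    while queue:
        host = queue.popleft()
        for peer in get_federated(host):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)  # create a job to scan the new server
    return seen


# Toy federation lists standing in for real servers (made-up hostnames).
FEDERATION = {
    "a.example": ["b.example"],
    "b.example": ["a.example", "c.example"],
    "c.example": [],
    "d.example": [],  # exists, but is on nobody's federation list
}

found = discover(["a.example"], lambda host: FEDERATION.get(host, []))
print(sorted(found))
```

Note that `d.example` never turns up, because no crawled server lists it. That's the same limitation the FAQ mentions: an instance that isn't on anyone's federation list can't be discovered.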