Long story short, they turn each song into a compact digital fingerprint that takes very little space to store. When you start recording, the app builds a fingerprint of your clip as well, then compares it against the fingerprints in its database.
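To make the fingerprint-and-compare idea concrete, here is a toy sketch in Python. It is loosely modeled on the peak-pair hashing approach that has been publicly described for this kind of system, and it is not Shazam's or Google's actual code. Everything in it is an assumption made for illustration: the names (`dominant_bins`, `fingerprint`, `build_index`, `match`, `melody`), the frame size, and the fact that the clip lines up exactly with the song's frames, which a real recording would not.

```python
import cmath
import math
from collections import Counter

FRAME = 64  # samples per analysis frame (assumed value for this toy)

def melody(bins, frame=FRAME):
    """Hypothetical helper: synthesize one pure tone per frame."""
    return [math.sin(2 * math.pi * k * n / frame)
            for k in bins for n in range(frame)]

def dominant_bins(samples, frame=FRAME):
    """Return the strongest frequency bin of each frame (naive DFT)."""
    peaks = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        best_bin, best_mag = 0, -1.0
        for k in range(1, frame // 2):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame)
                      for n, x in enumerate(chunk))
            if abs(acc) > best_mag:
                best_bin, best_mag = k, abs(acc)
        peaks.append(best_bin)
    return peaks

def fingerprint(samples, fan_out=3):
    """Hash pairs of nearby peaks as (bin1, bin2, frame gap), tagged with the anchor frame."""
    peaks = dominant_bins(samples)
    hashes = []
    for i, f1 in enumerate(peaks):
        for gap in range(1, fan_out + 1):
            if i + gap < len(peaks):
                hashes.append(((f1, peaks[i + gap], gap), i))
    return hashes

def build_index(songs):
    """Map every hash to the (song name, frame) positions where it occurs."""
    index = {}
    for name, samples in songs.items():
        for h, t in fingerprint(samples):
            index.setdefault(h, []).append((name, t))
    return index

def match(index, clip):
    """Vote per (song, time offset); many hashes agreeing on one offset means a hit."""
    votes = Counter()
    for h, t_clip in fingerprint(clip):
        for name, t_song in index.get(h, []):
            votes[(name, t_song - t_clip)] += 1
    if not votes:
        return None
    (name, _offset), _count = votes.most_common(1)[0]
    return name

songs = {
    "A": melody([3, 7, 12, 5, 9, 14, 4, 10]),
    "B": melody([6, 2, 11, 8, 13, 3, 15, 7]),
}
index = build_index(songs)
clip = songs["A"][2 * FRAME:6 * FRAME]  # a short excerpt from the middle of song A
print(match(index, clip))
```

The part that makes this cheap is that the database stores only small hashes, not audio, and a lookup is just a dictionary hit plus a vote on which song and offset the clip's hashes agree on.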
Some Google folks wrote up how they run the Now Playing feature offline with very little resource impact. It's a great paper to read and quite easy to understand.
https://arxiv.org/abs/1711.10958
I've only ever used SoundHound and never had a failure. In fact, it once found a song my son couldn't find using Shazam. I imagine that could happen either way sometimes; I'm just stating my experience. I don't think either one is vastly superior to the other, I just thought SH was out there earlier. But I could be wrong.
Around 10 years ago I was trying to write something like it for a company. That was a long time ago, so there is probably a lot of AI involved now, but back then the approach was to build a heatmap of the binary and look for matches.
I always felt like it had something to do with calculating the BPM, the key, and the words or something, and that they kept a database of this data to filter things down.
The real question is, how do you get Shazam to actually work!? From my understanding, it's now part of Snapchat but I'm never able to get it to work when I need it!