Automatic Speech Recognition in Linux - Seeking Experiences and Recommendations
Hi folks, I'm in a bit of a personal crisis at the moment and need to quickly find speech transcription software that works on Linux, doesn't take much time to set up, and can help me transcribe a number of audio clips of under 15 minutes each.
Can someone recommend a program that can transcribe some audio recordings for me and is relatively simple to set up and use?
Do such programs need a GPU to run effectively? I'm running a Dell XPS 9370 laptop, which only has integrated graphics.
My backup plan is to listen and transcribe by hand, so recommendations of a program that will allow me to self-transcribe by typing while listening at a reduced rate are also appreciated.
If any experienced transcribers are reading this, have you found that your pedals worked well with Linux?
Normally I would try out all the different programs and do more than the small number of searches I've done, but my timeline doesn't allow time to build a cluster of custom-coded transcription bots running Gentoo on hand-soldered hardware.
My environment is EndeavourOS running on a Dell XPS 9370. Internet is over Wi-Fi, with no external dongles or anything else currently hooked up.
I’ve had good experiences with whisper.cpp (should be in the AUR). I used the large model on my GPU (3060), and it filled 11.5 of the 12 GB of VRAM, so you might have to settle for a smaller model. The speed was pretty much real time on my GPU, so it might be quite a bit slower on your CPU, unless the smaller models are also a lot faster (I never tested them because I didn't need to).
The large model had pretty much perfect accuracy (only 5 or so mistakes in ~40 pages of transcriptions), and that was with Dutch audio recorded on a smartphone. If it can handle my pretty horrible conditions, your audio should (hopefully) be no problem to transcribe.
I used the base model and it ran at a very acceptable speed with CPU only. Decent accuracy considering the recording was mediocre quality at best. Thank you for the suggestion.
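For anyone finding this later, here is a rough sketch of how a folder of clips could be pushed through whisper.cpp on CPU. It is only an illustration under some assumptions that are not from this thread: the binary is assumed to be called whisper-cli (older builds name it main, and the AUR package may differ), the model is assumed to sit at models/ggml-base.bin, and ffmpeg is assumed to be installed for converting the clips to the 16 kHz mono WAV that whisper.cpp expects. The function names are made up for this example.

```python
"""Batch-transcribe audio clips with whisper.cpp on CPU (illustrative sketch)."""
import subprocess
import sys
from pathlib import Path

# Assumptions, adjust to your install: neither value comes from the thread.
WHISPER_BIN = "whisper-cli"                 # older builds call this "main"
MODEL_PATH = Path("models/ggml-base.bin")   # hypothetical model location
AUDIO_EXTS = {".mp3", ".m4a", ".ogg", ".flac", ".wav"}


def ffmpeg_cmd(src: Path, wav: Path) -> list[str]:
    """Build the ffmpeg command that converts src to 16 kHz mono 16-bit WAV."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav)]


def whisper_cmd(wav: Path, out_base: Path,
                language: str = "auto", threads: int = 4) -> list[str]:
    """Build the whisper.cpp command; the transcript lands in out_base + '.txt'."""
    return [WHISPER_BIN, "-m", str(MODEL_PATH), "-f", str(wav),
            "-l", language, "-t", str(threads),
            "-otxt", "-of", str(out_base)]


def transcribe_all(folder: Path, language: str = "auto") -> None:
    """Convert and transcribe every audio file directly inside folder."""
    out_dir = folder / "transcripts"
    out_dir.mkdir(exist_ok=True)
    for src in sorted(folder.iterdir()):
        if src.suffix.lower() not in AUDIO_EXTS:
            continue
        wav = out_dir / (src.stem + ".wav")
        # check=True stops the batch as soon as either tool reports an error.
        subprocess.run(ffmpeg_cmd(src, wav), check=True)
        subprocess.run(whisper_cmd(wav, out_dir / src.stem, language), check=True)


# Only runs when a folder is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    transcribe_all(Path(sys.argv[1]))
```

Saved under a name of your choosing and run with a folder path as its only argument, this should leave one .txt per clip in a transcripts subfolder. Setting the language explicitly (for example "nl" for Dutch) instead of "auto" may help on short or noisy clips.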