Sentence structure means that it kind of can't happen in real time as such, because you would need to wait until potentially the end of the sentence to get words that appear early in the sentence in an accurate and natural-ish translation. If "20 seconds later" counts as real time, then I guess, barring run-on sentences (which are much more common in speech than in writing).
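To make the latency point concrete, here is a minimal sketch, assuming a translator that only starts once it has heard a full sentence (ending in ".", "?" or "!"). The `Word` type, the timestamps and the function names are hypothetical illustrations, not how any particular product works, and translation compute time is ignored. It measures how long each word sits in the buffer before its sentence is complete:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

@dataclass
class Word:
    text: str
    t: float  # hypothetical: seconds since stream start when the word was heard

SENTENCE_END = (".", "?", "!")

def buffer_sentences(words: Iterable[Word]) -> Iterator[Tuple[List[Word], float]]:
    """Group words into sentences; yield each sentence with the time it becomes translatable."""
    buf: List[Word] = []
    for w in words:
        buf.append(w)
        if w.text.endswith(SENTENCE_END):
            yield buf, w.t  # sentence complete: only now can it be translated
            buf = []
    if buf:  # trailing fragment with no terminator (stream ended mid-sentence)
        yield buf, buf[-1].t

def per_word_delay(words: Iterable[Word]) -> List[Tuple[str, float]]:
    """How long each word waits before its whole sentence has been heard."""
    delays: List[Tuple[str, float]] = []
    for sentence, ready_at in buffer_sentences(words):
        for w in sentence:
            delays.append((w.text, ready_at - w.t))
    return delays

print(per_word_delay([Word("The", 0.0), Word("cat", 0.5), Word("sat.", 1.5)]))
```

The first word waits the longest, and a long run-on sentence pushes that wait up for every word in it, which is why sentence-level buffering ends up several seconds behind live speech.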
you would need to wait until potentially the end of the sentence to get words that appear early in the sentence in an accurate and natural-ish translation
Yandex Browser already does this, but only into Russian. It has something like a 10-15 second delay for live streams (at least on YouTube), but it works as well as the auto-generated transcription.
Here’s the funny part: their American accent totally made it believable.
It’s very clear that even with the AI-generated voice, they are not native Mandarin speakers. They sound like your typical foreigners who have been learning Chinese for a few years lol. I don’t know if it’s the dataset they’re trained on or just how the algorithm works, but it’s very interesting.
Makes me think about what it would be like if Chinese ever becomes an international language, in the way English has and Latin did before it. It makes me giggle to think about Mandarin with a backwoods Tennessee drawl.
Even when two language varieties have phonemes that are considered to be “the same sound”, the average pronunciation of those phonemes is going to differ, so I assume that’s a lot of what’s going on here. The other thing is that English and Chinese have a lot of phonemes whose possible pronunciations overlap barely or not at all, so the algorithm is picking the closest match.
Felix in the replies: "I’m crying at how beautiful this is. I support AI now. all I have ever wanted is for the show to be credibly portrayed as a Chinese podcast"
Native Mandarin speaker here. They all sound like your typical Westerners who have lived in China for a number of years. It’s more interesting that the AI was able to give them that realistic Western accent than a proper regional Chinese accent.
How good are the translations? Does anyone know Chinese and can compare the original to the translated version? I'd be interested to hear how accurate the translations are. Even passably accurate transcription services are a huge boon.
If we get to the point where this kind of stuff can run accurately, in real time, on a mobile device, we will have destroyed the need for real-time speech translators (sorry, translators) and, more importantly, most of the language barriers within the international working class, no? Cautiously optimistic.
I already wrote this below, but Yandex Browser already does this! It only translates into Russian, and with live streams (on YouTube, for example) you get a ~15-second delay.
It’s basically a real-time transcription -> translation -> voice generation pipeline, so the accuracy is only as good as the transcript it manages to extract.
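A minimal sketch of why a cascaded pipeline is capped by its transcript, assuming three independent stages where each one only sees the previous stage's output. The stage names, the toy lexicon and the deliberate mishearing are all hypothetical stand-ins for real speech recognition, machine translation and text-to-speech models; nothing here reflects Yandex's actual implementation:

```python
from typing import Callable, Dict, List

# Hypothetical stage signatures; real systems would plug ASR, MT and TTS models in here.
Transcriber = Callable[[str], List[str]]
Translator = Callable[[List[str]], List[str]]
Synthesizer = Callable[[List[str]], str]

def run_pipeline(audio: str, transcribe: Transcriber,
                 translate: Translator, synthesize: Synthesizer) -> str:
    transcript = transcribe(audio)       # stage 1: speech -> source-language text
    translation = translate(transcript)  # stage 2: sees ONLY the transcript, never the audio
    return synthesize(translation)       # stage 3: target-language text -> "speech"

# Toy stages for illustration only ("audio" is just a string here).
LEXICON: Dict[str, str] = {"hello": "ni hao", "friend": "pengyou", "fiend": "e'mo"}

def toy_transcribe(audio: str) -> List[str]:
    # Simulate a recognition error: "friend" is misheard as "fiend".
    return ["fiend" if w == "friend" else w for w in audio.lower().split()]

def toy_translate(words: List[str]) -> List[str]:
    # Word-by-word lookup; unknown words pass through in angle brackets.
    return [LEXICON.get(w, f"<{w}>") for w in words]

def toy_synthesize(words: List[str]) -> str:
    return " ".join(words)

print(run_pipeline("Hello friend", toy_transcribe, toy_translate, toy_synthesize))
```

The translator gets "fiend" and faithfully translates the wrong word, so a transcription error survives all the way to the generated voice; no later stage has access to the original audio to correct it.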
I am not that scared for my job in the next 10 years. As long as people don't trust Bazinga Translate from the Torment Nexus Company to interpret for the doctor who walks them through the details of their operation, I am safe. Remember, many countries require a stamp from a sworn translator for legal documents. Not to mention how shit AI translation still is for Arabic (if it can recognize the letters to begin with, lmao). I can pre-translate a text with it and then go through it again; sometimes it even saves me time.
Eh, I dunno, speech-to-text algorithms are still kinda garbage. They work well on simple sentences, but once you throw in colloquialisms and abbreviations they just crap out.