I suppose that if someone builds a system where an LLM maps to speech not just from the text to be spoken, but also from a text description of the voice -- like Tortoise TTS, but with a Stable Diffusion-style prompt for the description -- it'd be possible to hear SirMechsALot's voice. That'd be interesting.