Comment · r/Stutter · March 6, 2025

Content

Hi Laphi,

Interesting! It turns out that we have a software app that automatically removes disfluencies from your speech. The output can be fluent text superimposed on your video. Or the text can be re-synthesized into audible 'speech' in a synthetic voice (either a canned voice or a clone that sounds like you), which is played back to you or to your Zoom partners. You don't have to do any editing: you just speak, and the software does the rest. It's a subtle consequence of AI processing of speech, but I won't bore you with the details.

If you go the superimposed-text-onto-video route, the processing is almost real time. The text won't be exactly in sync with your lips, but it will lag only a couple of seconds behind. At present, the synthesized-speech output is slower: you need to finish a sentence before it starts processing the first word of that sentence. So there would be 'gaps' in the conversation if you were using this audio system to communicate over, e.g., Zoom.

If you are interested, I'll send you a demo and a link to a web-based 'toy' app called the Voice Mirror, which just plays your re-synthesized speech back to you (it doesn't link to Zoom). Best of luck to you in your fluency journey!

Themes

Therapy & Professional
Community & Support

Subthemes

Assistive Devices
Research & Resources

Codes (1)

telephone_video