Let's build a speech-to-speech chatbot

Juan Ovalle

Feb 11

Apply speech-to-text (STT), LLMs and text-to-speech (TTS) to replicate ChatGPT's voice assistant feature.

3 Comments

L00ng

Feb 11

Great post, but this is not a true speech-to-speech implementation like OpenAI's Realtime API.

https://openai.com/index/introducing-the-realtime-api/

Paul Iusztin

Feb 11

You can imagine it's not 1:1 with what a $100B company is doing. Conceptually it's the same, but they invested a lot more in streaming everything in real time for a "real feel". That's only extra engineering.

Muhammad hadi

Feb 11

Adding wake word detection on top would make it more complete and a rockstar project for a resume.

It would keep listening, respond only when a wake word is detected, and process just that fragment of speech.

I have been looking into AI voice agents and how big that industry is going to be in the near future.
