3 Comments
L00ng

Great post, but this is not a true speech-to-speech implementation like OpenAI's Realtime API.

https://openai.com/index/introducing-the-realtime-api/

Paul Iusztin

As you can imagine, it's not 1:1 with what a $100B company is doing. Conceptually it's the same; they have simply invested a lot more in streaming everything in real time for a "real" feel, but that's extra engineering on top.

Muhammad hadi

Adding wake-word detection on top would make it more complete and a rockstar project for a resume.

It would keep listening, respond only when a wake word is detected, and process only that fragment of speech.

I have been looking into AI voice agents and how big that industry is going to be in the near future.
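The wake-word gating described in the comment above can be sketched as a small two-state loop: stay idle and discard input until the wake word arrives, then collect speech until silence and hand off only that fragment. This is a minimal sketch, not part of the original post. It assumes audio has already been turned into a stream of transcribed words (a real system would typically run a dedicated wake-word model on raw audio frames), uses `None` as a hypothetical stand-in for detected silence, and the function name `gate_on_wake_word` and the wake word "jarvis" are purely illustrative.

```python
from typing import Iterable, List, Optional


def gate_on_wake_word(frames: Iterable[Optional[str]], wake_word: str) -> List[List[str]]:
    """Return only the speech fragments that follow a wake word.

    Each frame is a hypothetical transcribed word; None stands in for
    silence, which ends the current fragment.
    """
    fragments: List[List[str]] = []
    current: List[str] = []
    awake = False
    for frame in frames:
        if not awake:
            # Idle state: keep listening, but discard everything until the wake word.
            if frame is not None and frame.lower() == wake_word.lower():
                awake = True
                current = []
            continue
        if frame is None:
            # Silence closes the fragment: hand it off and go back to idle.
            if current:
                fragments.append(current)
            awake = False
            continue
        # Awake state: accumulate the words of the request.
        current.append(frame)
    # Flush a fragment that is still open when the stream ends.
    if awake and current:
        fragments.append(current)
    return fragments


stream = ["music", "talk", "jarvis", "what's", "the", "weather", None,
          "random", "chatter", None, "jarvis", "set", "a", "timer"]
print(gate_on_wake_word(stream, "jarvis"))
# [["what's", 'the', 'weather'], ['set', 'a', 'timer']]
```

Only the two fragments after "jarvis" reach the downstream pipeline (speech-to-text, LLM, text-to-speech); the background chatter in between is dropped, which is what saves compute compared with processing everything the microphone hears.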
