Apply speech-to-text (STT), LLMs and text-to-speech (TTS) to replicate ChatGPT's voice assistant feature.
Great post, but this is not a real speech-to-speech implementation like OpenAI's real-time api.
https://openai.com/index/introducing-the-realtime-api/
You can imagine it's not 1:1 with what a 100B company is doing. Conceptually, it's the same, but they invested a lot more in streaming everything in real-time for a "real feel", but that's only extra engineering
Having wake word detection on top will make it more complete and a rockstar project for resume.
It keeps listening and respond when a wake word is detected and only process that fragment of speech etc.
I have been looking into AI voice agents and how big that industry is going to be in near future.
Great post, but this is not a real speech-to-speech implementation like OpenAI's real-time api.
https://openai.com/index/introducing-the-realtime-api/
You can imagine it's not 1:1 with what a 100B company is doing. Conceptually, it's the same, but they invested a lot more in streaming everything in real-time for a "real feel", but that's only extra engineering
Having wake word detection on top will make it more complete and a rockstar project for resume.
It keeps listening and respond when a wake word is detected and only process that fragment of speech etc.
I have been looking into AI voice agents and how big that industry is going to be in near future.