Story
1. Introduction & Key Features
My project is a real-time AI voice assistant built on an ESP32. It listens to your question, transcribes it via Deepgram, queries ChatGPT for an up-to-date answer, then speaks the reply through an I²S amplifier, showing each step on a 16×2 LCD. The result is a hands-free, web-connected assistant that needs no PC.
- Live Web Data: Fetches news, weather, stock quotes, or other current information.
- Local Logging: Records every question as a WAV file on the microSD card.
- Clear UI: The 16×2 LCD shows “Ready for Ans...”, “record start...”, “Speech to text”, and the actual text.
- Portable Power: Runs from any USB charger or power bank, with no built-in battery to maintain.
2. Hardware Design & Assembly
All pin-out and wiring details can be found in the Schematic and Layout section. In brief, the core modules are:
- ESP32-WROOM-32: Main controller with Wi-Fi and I²S
- INMP441 I²S Mic: Digital microphone
- MAX98357A Amp: I²S audio output to speaker
- microSD Module: WAV recording and logs
- 16×2 I²C LCD: Status display
- Pushbutton: Record trigger
3. Project Demonstration Video
See it in action:
https://drive.google.com/file/d/1I4ZKJhOAJsSllxNhRCoU3JL7TKS1CD7C/view?usp=drivesdk
4. Software Workflow & Code
st=>start: Button Press
rec=>operation: Record WAV to SD
stt=>operation: Deepgram STT → Text
gpt=>operation: ChatGPT API → Reply Text
tts=>operation: OpenAI TTS → Audio
play=>end: Play via I²S → LCD shows steps
st->rec->stt->gpt->tts->play
- Initialize Peripherals (I²S, SD, LCD, Wi-Fi)
- Record Audio: Hold button → WAV saved to SD.
- Transcribe: Upload WAV to Deepgram → receive plain text.
- Chat & TTS: Send text to ChatGPT → get reply → send reply to OpenAI TTS.
- Playback: Stream audio through MAX98357A → display messages.
- Loop: Ready for next query.
Code Highlights:
- lib_audio_recording.ino – I²S + SD card WAV writer
- lib_audio_transcription.ino – HTTPS POST to Deepgram STT
- lib_OpenAI_Chat.ino – OpenAI Chat Completions API handler
- lib_audio_tts.ino – OpenAI TTS playback routine
- main.ino – Orchestrates button, LCD, and state machine
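To illustrate what a WAV writer has to produce, here is a minimal sketch that builds the canonical 44-byte header for uncompressed PCM audio. It is not taken from lib_audio_recording.ino. The helper names `putLE` and `makeWavHeader` are hypothetical, and the recording format is not stated in this write-up, so the 16 kHz, 16-bit mono values in the usage note are only an example.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Store the low `bytes` bytes of v in little-endian order, as RIFF/WAV requires.
static void putLE(uint8_t* p, uint32_t v, int bytes) {
    for (int i = 0; i < bytes; ++i) p[i] = static_cast<uint8_t>(v >> (8 * i));
}

// Build the 44-byte header for a PCM WAV file holding `dataBytes` bytes of samples.
std::array<uint8_t, 44> makeWavHeader(uint32_t sampleRate, uint16_t bitsPerSample,
                                      uint16_t channels, uint32_t dataBytes) {
    std::array<uint8_t, 44> h{};
    const uint16_t blockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);
    const uint32_t byteRate = sampleRate * blockAlign;

    std::memcpy(&h[0], "RIFF", 4);
    putLE(&h[4], 36 + dataBytes, 4);   // RIFF chunk size = file size minus 8
    std::memcpy(&h[8], "WAVEfmt ", 8);
    putLE(&h[16], 16, 4);              // fmt chunk size for plain PCM
    putLE(&h[20], 1, 2);               // audio format 1 = PCM
    putLE(&h[22], channels, 2);
    putLE(&h[24], sampleRate, 4);
    putLE(&h[28], byteRate, 4);        // bytes of audio per second
    putLE(&h[32], blockAlign, 2);      // bytes per sample frame
    putLE(&h[34], bitsPerSample, 2);
    std::memcpy(&h[36], "data", 4);
    putLE(&h[40], dataBytes, 4);       // size of the sample data that follows
    return h;
}
```

For example, `makeWavHeader(16000, 16, 1, 32000)` describes one second of 16 kHz, 16-bit mono audio. A common pattern, assumed here rather than confirmed from the repository, is to write a header with `dataBytes = 0` when recording starts, then seek back to offset 0 and rewrite it once the final length is known.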
5. Reference Code
Find the complete source code and documentation on GitHub:
https://github.com/kaloprojects/KALO-ESP32-Voice-ChatGPT
6. Step-by-Step Build Tutorial
- Solder headers to each module.
- Wire the modules according to the pin diagram (PDF) in the Schematic and Layout section.
- Flash the firmware via the Arduino IDE.
- Insert a FAT32-formatted microSD card.
- Power the board from a USB charger and watch the LCD boot.
- Hold the button and speak, then watch and listen!
7. Lessons Learned & Pitfalls
- I²S Buffering: Tune buffer sizes to prevent underruns.
- Network Timeouts: Implement retries for STT and GPT calls.
- Power Stability: Use a reliable USB supply to avoid voltage drops during Wi-Fi transmission.
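The first two pitfalls can be made concrete with a little arithmetic and a small helper. The sketch below is an illustration under assumptions, not the project's code: `dmaBufferMs`, `backoffDelayMs`, and `withRetries` are hypothetical names, and the default delays (500 ms base, 8 s cap) are arbitrary starting points. The buffer formula assumes the buffer length is counted in sample frames, as with the `dma_buf_count` and `dma_buf_len` fields of the legacy ESP-IDF I²S driver configuration.

```cpp
#include <cstdint>
#include <functional>

// Milliseconds of audio the I2S DMA buffers can hold: count * frames / sample rate.
// More buffered audio tolerates longer stalls before an underrun, at the cost of RAM.
double dmaBufferMs(unsigned bufCount, unsigned framesPerBuf, unsigned sampleRateHz) {
    return 1000.0 * bufCount * framesPerBuf / sampleRateHz;
}

// Delay after failed attempt n (0-based): baseMs * 2^n, capped at maxMs.
uint32_t backoffDelayMs(uint32_t baseMs, uint32_t maxMs, unsigned attempt) {
    const unsigned shift = attempt < 20 ? attempt : 20;  // keep the shift bounded
    const uint64_t d = static_cast<uint64_t>(baseMs) << shift;
    return d > maxMs ? maxMs : static_cast<uint32_t>(d);
}

// Call `request` until it returns true or maxAttempts is reached.
// `wait` is injected so the helper can be tested without sleeping;
// on the ESP32 it could simply wrap delay().
bool withRetries(const std::function<bool()>& request,
                 const std::function<void(uint32_t)>& wait,
                 unsigned maxAttempts, uint32_t baseMs = 500, uint32_t maxMs = 8000) {
    for (unsigned attempt = 0; attempt < maxAttempts; ++attempt) {
        if (request()) return true;
        if (attempt + 1 < maxAttempts) wait(backoffDelayMs(baseMs, maxMs, attempt));
    }
    return false;
}
```

As a worked example, 8 buffers of 256 frames at 16 kHz hold 8 × 256 / 16000 s = 128 ms of audio. For the network calls, a request that fails twice and then succeeds would wait 500 ms and then 1000 ms with the defaults above.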
Conclusion
This project combines embedded audio I/O, cloud-based AI, and live web data in one compact, USB-powered device. It is a practical starting point for anyone who wants to explore portable AI assistants.