Story
What Is This Project About?
This project is about creating a compact, low-power, and smart voice assistant that works in real-time using just a single ESP32-S3 microcontroller, a microphone, and a speaker.
It listens to your voice, processes your question through Google’s Gemini AI, and speaks out the answer using Text-to-Speech. It supports general questions, math queries, and even real-time translation – all from your voice.
And the best part? It's completely portable, fits in your hand, and costs way less than building similar setups with Raspberry Pi or cloud services.
How Does It Work?
Here’s the complete working flow of the system:
- You press a button and speak your query
- ESP32 records your voice and sends the audio to Deepgram API for transcription
- The text is forwarded to Gemini AI API for a smart, real-time response
- The response text is sent to Google TTS API
- ESP32 receives the audio and plays it back via a speaker
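To make the flow above more concrete, here is a rough host-side sketch in Python of the three cloud calls chained together. It is not the firmware from this project, just a way to prototype the request and response shapes on a PC before porting them to the ESP32. The endpoint URLs, the JSON field names, and the `gemini-1.5-flash` model name are assumptions taken from the public REST APIs of Deepgram, Gemini, and Google Cloud Text-to-Speech; the actual build may use different endpoints or models. The helper names and the `send` callback are hypothetical, and `send` is injected so the chain can be exercised without network access.

```python
import base64
import json

# Assumed endpoints (public REST APIs); the original project may differ.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"
GEMINI_URL = ("https://generativelanguage.googleapis.com/v1beta/"
              "models/{model}:generateContent")
TTS_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def transcribe_request(wav_bytes, api_key):
    """Step 2: describe the speech-to-text request carrying the recorded audio."""
    return {
        "url": DEEPGRAM_URL,
        "headers": {"Authorization": f"Token {api_key}",
                    "Content-Type": "audio/wav"},
        "body": wav_bytes,
    }


def parse_transcript(resp):
    """Pull the first transcript out of a Deepgram-style JSON response."""
    return resp["results"]["channels"][0]["alternatives"][0]["transcript"]


def gemini_request(prompt, api_key, model="gemini-1.5-flash"):
    """Step 3: describe the LLM request. The model name is a placeholder."""
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return {
        "url": GEMINI_URL.format(model=model),
        "headers": {"x-goog-api-key": api_key,
                    "Content-Type": "application/json"},
        "body": json.dumps(payload).encode("utf-8"),
    }


def parse_gemini(resp):
    """Pull the first text part out of a Gemini-style JSON response."""
    return resp["candidates"][0]["content"]["parts"][0]["text"]


def tts_request(text, api_key, language="en-US"):
    """Step 4: describe the text-to-speech request, asking for MP3 audio."""
    payload = {
        "input": {"text": text},
        "voice": {"languageCode": language},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return {
        "url": TTS_URL,
        "headers": {"x-goog-api-key": api_key,
                    "Content-Type": "application/json"},
        "body": json.dumps(payload).encode("utf-8"),
    }


def parse_tts(resp):
    """Decode the base64 audio payload from a Cloud TTS-style response."""
    return base64.b64decode(resp["audioContent"])


def run_pipeline(wav_bytes, keys, send):
    """Chain the three calls: audio in, spoken answer (audio bytes) out.

    `send` is a hypothetical callable that performs the HTTP POST described
    by a request dict and returns the parsed JSON response as a dict.
    """
    question = parse_transcript(send(transcribe_request(wav_bytes, keys["deepgram"])))
    answer = parse_gemini(send(gemini_request(question, keys["gemini"])))
    return parse_tts(send(tts_request(answer, keys["google_tts"])))
```

In a real run, `send` would wrap an HTTP client, and the returned bytes would be handed to the audio output stage. Keeping the request builders separate from the transport makes it easier to check each payload before moving the same logic into firmware.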
List of Components
- ESP32-WROOM-32D – The brain of the project, offering Wi-Fi and Bluetooth connectivity.
- MAX98357 Amplifier – For high-quality audio output.
- INMP441 MEMS Microphone – Captures voice input with precision.
- SD Card Module – Stores audio files or configuration data.
- LiPo Battery – Powers the device.
- TP4056 Charging Module – Handles battery charging and power management.
- Miscellaneous – HT7833 3.3 V voltage regulator IC, PCB or breadboard, connectors, etc.
Why This Project Matters
This project is more than just a cool DIY gadget. It showcases the power of AI in embedded systems, proves that you don’t need expensive hardware to work with LLMs, and provides a real use case for voice AI on a low-cost edge device, with the heavy processing handled by cloud APIs.
By sharing this project, I hope to inspire makers, educators, and developers to try out AI + IoT in an accessible way. Whether you're teaching kids, building smart assistants, or prototyping an AI translator, this project is a great foundation.
Tutorial Video
https://youtu.be/zvR9DTfMwPE?si=DvWAgLvEsCJJbt20