Build a Voice Assistant with Conversational AI on a Raspberry Pi


Introduction

In this tutorial you will learn how to build a voice assistant powered by Conversational AI running on a Raspberry Pi. Like familiar home assistants such as Alexa on Amazon Echo, Google Assistant on Google Home, or Siri on Apple devices, your Eleven voice assistant will listen for a hotword, in our case “Hey Eleven”, and then initiate an ElevenLabs Conversational AI session to assist the user.

Prefer to jump straight to the code?

Find the example project on GitHub.

Requirements

  • A Raspberry Pi 5 or similar device.
  • A microphone and speaker.
  • Python 3.9 or higher installed on your machine.
  • An ElevenLabs account with an API key.

Setup

Install dependencies

On Debian-based systems you can install the dependencies with:

$sudo apt-get update
>sudo apt-get install libportaudio2 libportaudiocpp0 portaudio19-dev libasound-dev libsndfile1-dev -y

Create the project

On your Raspberry Pi, open the terminal and create a new directory for your project.

$mkdir eleven-voice-assistant
>cd eleven-voice-assistant

Create and activate a new virtual environment:

$python -m venv .venv # Only required the first time you set up the project
>source .venv/bin/activate

Install the dependencies:

$pip install tflite-runtime
>pip install librosa
>pip install EfficientWord-Net
>pip install "elevenlabs[pyaudio]"

Now create a new Python file called hotword.py and add the following code:

hotword.py
import os
import signal
import time
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector

from eff_word_net.audio_processing import Resnet50_Arc_loss

# from eff_word_net import samples_loc

from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation, ConversationInitiationData
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface

convai_active = False

elevenlabs = ElevenLabs()
agent_id = os.getenv("ELEVENLABS_AGENT_ID")
api_key = os.getenv("ELEVENLABS_API_KEY")

dynamic_vars = {
    'user_name': 'Thor',
    'greeting': 'Hey'
}

config = ConversationInitiationData(
    dynamic_variables=dynamic_vars
)

base_model = Resnet50_Arc_loss()

eleven_hw = HotwordDetector(
    hotword="hey_eleven",
    model=base_model,
    reference_file=os.path.join("hotword_refs", "hey_eleven_ref.json"),
    threshold=0.7,
    relaxation_time=2
)

def create_conversation():
    """Create a new conversation instance."""
    return Conversation(
        # API client and agent ID.
        elevenlabs,
        agent_id,
        config=config,

        # Assume auth is required when API_KEY is set.
        requires_auth=bool(api_key),

        # Use the default audio interface.
        audio_interface=DefaultAudioInterface(),

        # Simple callbacks that print the conversation to the console.
        callback_agent_response=lambda response: print(f"Agent: {response}"),
        callback_agent_response_correction=lambda original, corrected: print(f"Agent: {original} -> {corrected}"),
        callback_user_transcript=lambda transcript: print(f"User: {transcript}"),

        # Uncomment if you want to see latency measurements.
        # callback_latency_measurement=lambda latency: print(f"Latency: {latency}ms"),
    )

def start_mic_stream():
    """Start or restart the microphone stream."""
    global mic_stream
    try:
        # Always create a new stream instance.
        mic_stream = SimpleMicStream(
            window_length_secs=1.5,
            sliding_window_secs=0.75,
        )
        mic_stream.start_stream()
        print("Microphone stream started")
    except Exception as e:
        print(f"Error starting microphone stream: {e}")
        mic_stream = None
        time.sleep(1)  # Wait a bit before retrying.

def stop_mic_stream():
    """Stop the microphone stream safely."""
    global mic_stream
    try:
        if mic_stream:
            # SimpleMicStream doesn't have a stop_stream method,
            # so drop the reference and recreate the stream next time.
            mic_stream = None
            print("Microphone stream stopped")
    except Exception as e:
        print(f"Error stopping microphone stream: {e}")

# Initialize the microphone stream.
mic_stream = None
start_mic_stream()

print("Say 'Hey Eleven'")
while True:
    if not convai_active:
        try:
            # Make sure we have a valid mic stream.
            if mic_stream is None:
                start_mic_stream()
                continue

            frame = mic_stream.getFrame()
            result = eleven_hw.scoreFrame(frame)
            if result is None:
                # No voice activity.
                continue
            if result["match"]:
                print("Wakeword uttered", result["confidence"])

                # Stop the microphone stream to avoid conflicts.
                stop_mic_stream()

                # Start the Conversational AI session.
                print("Start ConvAI Session")
                convai_active = True

                try:
                    # Create a new conversation instance.
                    conversation = create_conversation()

                    # Start the session.
                    conversation.start_session()

                    # Set up a signal handler for graceful shutdown.
                    def signal_handler(sig, frame):
                        print("Received interrupt signal, ending session...")
                        try:
                            conversation.end_session()
                        except Exception as e:
                            print(f"Error ending session: {e}")

                    signal.signal(signal.SIGINT, signal_handler)

                    # Wait for the session to end.
                    conversation_id = conversation.wait_for_session_end()
                    print(f"Conversation ID: {conversation_id}")

                except Exception as e:
                    print(f"Error during conversation: {e}")
                finally:
                    # Cleanup.
                    convai_active = False
                    print("Conversation ended, cleaning up...")

                    # Give some time for cleanup.
                    time.sleep(1)

                    # Restart the microphone stream.
                    start_mic_stream()
                    print("Ready for next wake word...")

        except Exception as e:
            print(f"Error in wake word detection: {e}")
            # Try to restart the microphone stream on error.
            mic_stream = None
            time.sleep(1)
            start_mic_stream()

Agent configuration

1. Sign in to ElevenLabs

Go to elevenlabs.io and sign in to your account.

2. Create a new agent

Navigate to Conversational AI > Agents and create a new agent from the blank template.

3. Set the first message

Set the first message, using the dynamic variables defined in hotword.py:

{{greeting}} {{user_name}}, Eleven here, what's up?
4. Set the system prompt

Set the system prompt. You can find our best practices docs here.

You are a helpful conversational AI assistant with access to a weather tool. When users ask about
weather conditions, use the get_weather tool to fetch accurate, real-time data. The tool requires
a latitude and longitude - use your geographic knowledge to convert location names to coordinates
accurately.
Never ask users for coordinates - you must determine these yourself. Always report weather
information conversationally, referring to locations by name only. For weather requests:
1. Extract the location from the user's message
2. Convert the location to coordinates and call get_weather
3. Present the information naturally and helpfully
For non-weather queries, provide friendly assistance within your knowledge boundaries. Always be
concise, accurate, and helpful.
5. Set up a server tool

We’ll set up a simple server tool that will fetch the weather data for us. Follow the setup steps here to set up the tool.
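As a rough sketch of what the tool's backend could do, here is a minimal Python helper built on the free Open-Meteo forecast API (no API key required). The function names, the chosen weather variables, and the summary wording are illustrative assumptions, not part of the tool setup the docs describe.

```python
import json
import urllib.request

OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"

def build_weather_url(latitude: float, longitude: float) -> str:
    """Build the Open-Meteo request URL for the given coordinates."""
    return (
        f"{OPEN_METEO_URL}?latitude={latitude}&longitude={longitude}"
        "&current=temperature_2m,wind_speed_10m"
    )

def summarize_weather(payload: dict) -> str:
    """Turn an Open-Meteo 'current' payload into a short sentence for the agent."""
    current = payload["current"]
    return (
        f"Currently {current['temperature_2m']} degrees Celsius "
        f"with wind at {current['wind_speed_10m']} km/h."
    )

def get_weather(latitude: float, longitude: float) -> str:
    """Fetch and summarize the current weather (requires network access)."""
    with urllib.request.urlopen(build_weather_url(latitude, longitude)) as resp:
        return summarize_weather(json.load(resp))
```

The agent supplies the latitude and longitude itself (per the system prompt above), so the tool only needs to accept two numeric parameters and return a short text summary.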

Run the app

To run the app, first set the required environment variables:

$export ELEVENLABS_API_KEY=YOUR_API_KEY
>export ELEVENLABS_AGENT_ID=YOUR_AGENT_ID

Then simply run the following command:

$python hotword.py

Now say “Hey Eleven” to start the conversation. Happy chattin’!
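If you want the assistant to start automatically when the Pi boots, you could wrap it in a systemd unit. The sketch below assumes the project lives in /home/pi/eleven-voice-assistant and that the two environment variables are stored in a .env file there; adjust the paths and user to match your setup.

```ini
[Unit]
Description=Eleven voice assistant
After=network-online.target sound.target

[Service]
User=pi
WorkingDirectory=/home/pi/eleven-voice-assistant
# File containing ELEVENLABS_API_KEY and ELEVENLABS_AGENT_ID, one per line.
EnvironmentFile=/home/pi/eleven-voice-assistant/.env
ExecStart=/home/pi/eleven-voice-assistant/.venv/bin/python hotword.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Save it as /etc/systemd/system/eleven-assistant.service, then enable it with `sudo systemctl enable --now eleven-assistant`.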

[Optional] Train your custom hotword

Generate training audio

To generate the hotword embeddings, you can use ElevenLabs to generate four training samples. Simply navigate to Text To Speech within your ElevenLabs app, and type in your hotword, e.g. “Hey Eleven”. Select a voice and click on the “Generate” button.

After the audio has been generated, download the audio file and save it into a folder called hotword_training_audio at the root of your project. Repeat this process three more times with different voices.
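Instead of clicking through the UI four times, you could also script the sample generation with the ElevenLabs Python SDK. The helper below is a hypothetical convenience: the output file names and model choice are assumptions, you must supply your own voice IDs, and it needs ELEVENLABS_API_KEY set plus the elevenlabs package installed.

```python
import os

def sample_path(index: int, out_dir: str = "hotword_training_audio") -> str:
    """File name for the nth 'Hey Eleven' training sample (illustrative layout)."""
    return os.path.join(out_dir, f"hey_eleven_{index}.mp3")

def generate_samples(voice_ids, text="Hey Eleven"):
    """Generate one training sample per voice. Requires ELEVENLABS_API_KEY."""
    # Imported lazily so sample_path stays usable without the SDK installed.
    from elevenlabs.client import ElevenLabs

    client = ElevenLabs()
    os.makedirs("hotword_training_audio", exist_ok=True)
    for i, voice_id in enumerate(voice_ids):
        # convert() streams the generated MP3 back as chunks of bytes.
        audio = client.text_to_speech.convert(
            voice_id=voice_id,
            text=text,
            model_id="eleven_multilingual_v2",
        )
        with open(sample_path(i), "wb") as f:
            for chunk in audio:
                f.write(chunk)
```

Calling `generate_samples` with four different voice IDs produces the same hotword_training_audio folder as the manual steps above.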

Train the hotword

In your terminal, with your virtual environment activated, run the following command to train the hotword:

$python -m eff_word_net.generate_reference --input-dir hotword_training_audio --output-dir hotword_refs --wakeword hey_eleven --model-type resnet_50_arc

This will generate the hey_eleven_ref.json file in the hotword_refs folder. Now you simply need to update the reference_file parameter in the HotwordDetector class in hotword.py to point to the new reference file and you’re good to go!