Build a Voice Assistant with Conversational AI on a Raspberry Pi


Introduction

In this tutorial you will learn how to build a voice assistant powered by Conversational AI running on a Raspberry Pi. Like familiar home assistants such as Alexa on Amazon Echo, Google Assistant on Google Home, or Siri on Apple devices, your Eleven voice assistant will listen for a hotword, in our case “Hey Eleven”, and then initiate an ElevenLabs Conversational AI session to assist the user.

Prefer to jump straight to the code?

Find the example project on GitHub.

Requirements

  • A Raspberry Pi 5 or similar device.
  • A microphone and speaker.
  • Python 3.9 or higher installed on your machine.
  • An ElevenLabs account with an API key.

Setup

Install dependencies

On Debian-based systems you can install the dependencies with:

$sudo apt-get update
>sudo apt-get install libportaudio2 libportaudiocpp0 portaudio19-dev libasound-dev libsndfile1-dev -y

Create the project

On your Raspberry Pi, open the terminal and create a new directory for your project.

$mkdir eleven-voice-assistant
>cd eleven-voice-assistant

Create and activate a new virtual environment:

$python -m venv .venv # Only required the first time you set up the project
>source .venv/bin/activate

Install the dependencies:

$pip install tflite-runtime
>pip install librosa
>pip install EfficientWord-Net
>pip install "elevenlabs[pyaudio]"

Now create a new Python file called hotword.py and add the following code:

hotword.py
import os
import signal
import time
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector

from eff_word_net.audio_processing import Resnet50_Arc_loss

# from eff_word_net import samples_loc

from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation, ConversationInitiationData
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface

convai_active = False

elevenlabs = ElevenLabs()
agent_id = os.getenv("ELEVENLABS_AGENT_ID")
api_key = os.getenv("ELEVENLABS_API_KEY")

dynamic_vars = {
    'user_name': 'Thor',
    'greeting': 'Hey'
}

config = ConversationInitiationData(
    dynamic_variables=dynamic_vars
)

base_model = Resnet50_Arc_loss()

eleven_hw = HotwordDetector(
    hotword="hey_eleven",
    model=base_model,
    reference_file=os.path.join("hotword_refs", "hey_eleven_ref.json"),
    threshold=0.7,
    relaxation_time=2
)

def create_conversation():
    """Create a new conversation instance."""
    return Conversation(
        # API client and agent ID.
        elevenlabs,
        agent_id,
        config=config,

        # Assume auth is required when API_KEY is set.
        requires_auth=bool(api_key),

        # Use the default audio interface.
        audio_interface=DefaultAudioInterface(),

        # Simple callbacks that print the conversation to the console.
        callback_agent_response=lambda response: print(f"Agent: {response}"),
        callback_agent_response_correction=lambda original, corrected: print(f"Agent: {original} -> {corrected}"),
        callback_user_transcript=lambda transcript: print(f"User: {transcript}"),

        # Uncomment if you want to see latency measurements.
        # callback_latency_measurement=lambda latency: print(f"Latency: {latency}ms"),
    )

def start_mic_stream():
    """Start or restart the microphone stream."""
    global mic_stream
    try:
        # Always create a new stream instance.
        mic_stream = SimpleMicStream(
            window_length_secs=1.5,
            sliding_window_secs=0.75,
        )
        mic_stream.start_stream()
        print("Microphone stream started")
    except Exception as e:
        print(f"Error starting microphone stream: {e}")
        mic_stream = None
        time.sleep(1)  # Wait a bit before retrying.

def stop_mic_stream():
    """Stop the microphone stream safely."""
    global mic_stream
    try:
        if mic_stream:
            # SimpleMicStream doesn't have a stop_stream method,
            # so drop the reference and recreate the stream next time.
            mic_stream = None
            print("Microphone stream stopped")
    except Exception as e:
        print(f"Error stopping microphone stream: {e}")

# Initialize the microphone stream.
mic_stream = None
start_mic_stream()

print("Say 'Hey Eleven'")
while True:
    if not convai_active:
        try:
            # Make sure we have a valid mic stream.
            if mic_stream is None:
                start_mic_stream()
                continue

            frame = mic_stream.getFrame()
            result = eleven_hw.scoreFrame(frame)
            if result is None:
                # No voice activity.
                continue
            if result["match"]:
                print("Wakeword uttered", result["confidence"])

                # Stop the microphone stream to avoid conflicts.
                stop_mic_stream()

                # Start the Conversational AI session.
                print("Start ConvAI Session")
                convai_active = True

                try:
                    # Create a new conversation instance.
                    conversation = create_conversation()

                    # Start the session.
                    conversation.start_session()

                    # Set up a signal handler for graceful shutdown.
                    def signal_handler(sig, frame):
                        print("Received interrupt signal, ending session...")
                        try:
                            conversation.end_session()
                        except Exception as e:
                            print(f"Error ending session: {e}")

                    signal.signal(signal.SIGINT, signal_handler)

                    # Wait for the session to end.
                    conversation_id = conversation.wait_for_session_end()
                    print(f"Conversation ID: {conversation_id}")

                except Exception as e:
                    print(f"Error during conversation: {e}")
                finally:
                    # Cleanup.
                    convai_active = False
                    print("Conversation ended, cleaning up...")

                    # Give some time for cleanup.
                    time.sleep(1)

                    # Restart the microphone stream.
                    start_mic_stream()
                    print("Ready for next wake word...")

        except Exception as e:
            print(f"Error in wake word detection: {e}")
            # Try to restart the microphone stream on error.
            mic_stream = None
            time.sleep(1)
            start_mic_stream()

Agent configuration

1. Sign in to ElevenLabs

Go to elevenlabs.io and sign in to your account.

2. Create a new agent

Navigate to Conversational AI > Agents and create a new agent from the blank template.

3. Set the first message

Set the first message, using the dynamic variables defined in hotword.py:

{{greeting}} {{user_name}}, Eleven here, what's up?
4. Set the system prompt

Set the system prompt. You can find our best practices docs here.

You are a helpful conversational AI assistant with access to a weather tool. When users ask about
weather conditions, use the get_weather tool to fetch accurate, real-time data. The tool requires
a latitude and longitude - use your geographic knowledge to convert location names to coordinates
accurately.
Never ask users for coordinates - you must determine these yourself. Always report weather
information conversationally, referring to locations by name only. For weather requests:
1. Extract the location from the user's message
2. Convert the location to coordinates and call get_weather
3. Present the information naturally and helpfully
For non-weather queries, provide friendly assistance within your knowledge boundaries. Always be
concise, accurate, and helpful.
5. Set up a server tool

We’ll set up a simple server tool that will fetch the weather data for us. Follow the setup steps here to set up the tool.
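As a rough sketch of what the tool's backend could do, here is a minimal Python helper built on the free Open-Meteo forecast API (no API key required). The function names, the chosen weather variables, and the summary wording are illustrative assumptions, not part of the tool setup the docs describe.

```python
import json
import urllib.request

OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"

def build_weather_url(latitude: float, longitude: float) -> str:
    """Build the Open-Meteo request URL for the given coordinates."""
    return (
        f"{OPEN_METEO_URL}?latitude={latitude}&longitude={longitude}"
        "&current=temperature_2m,wind_speed_10m"
    )

def summarize_weather(payload: dict) -> str:
    """Turn an Open-Meteo 'current' payload into a short sentence for the agent."""
    current = payload["current"]
    return (
        f"Currently {current['temperature_2m']} degrees Celsius "
        f"with wind at {current['wind_speed_10m']} km/h."
    )

def get_weather(latitude: float, longitude: float) -> str:
    """Fetch and summarize the current weather (requires network access)."""
    with urllib.request.urlopen(build_weather_url(latitude, longitude)) as resp:
        return summarize_weather(json.load(resp))
```

The agent supplies the latitude and longitude itself (per the system prompt above), so the tool only needs to accept two numeric parameters and return a short text summary.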

Run the app

To run the app, first set the required environment variables:

$export ELEVENLABS_API_KEY=YOUR_API_KEY
>export ELEVENLABS_AGENT_ID=YOUR_AGENT_ID

Then simply run the following command:

$python hotword.py

Now say “Hey Eleven” to start the conversation. Happy chattin’!
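If you want the assistant to start automatically when the Pi boots, you could wrap it in a systemd unit. The sketch below assumes the project lives in /home/pi/eleven-voice-assistant and that the two environment variables are stored in a .env file there; adjust the paths and user to match your setup.

```ini
[Unit]
Description=Eleven voice assistant
After=network-online.target sound.target

[Service]
User=pi
WorkingDirectory=/home/pi/eleven-voice-assistant
# File containing ELEVENLABS_API_KEY and ELEVENLABS_AGENT_ID, one per line.
EnvironmentFile=/home/pi/eleven-voice-assistant/.env
ExecStart=/home/pi/eleven-voice-assistant/.venv/bin/python hotword.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Save it as /etc/systemd/system/eleven-assistant.service, then enable it with `sudo systemctl enable --now eleven-assistant`.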

[Optional] Train your custom hotword

Generate training audio

To generate the hotword embeddings, you can use ElevenLabs to generate four training samples. Simply navigate to Text To Speech within your ElevenLabs app, and type in your hotword, e.g. “Hey Eleven”. Select a voice and click on the “Generate” button.

After the audio has been generated, download the audio file and save it into a folder called hotword_training_audio at the root of your project. Repeat this process three more times with different voices.
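Instead of clicking through the UI four times, you could also script the sample generation with the ElevenLabs Python SDK. The helper below is a hypothetical convenience: the output file names and model choice are assumptions, you must supply your own voice IDs, and it needs ELEVENLABS_API_KEY set plus the elevenlabs package installed.

```python
import os

def sample_path(index: int, out_dir: str = "hotword_training_audio") -> str:
    """File name for the nth 'Hey Eleven' training sample (illustrative layout)."""
    return os.path.join(out_dir, f"hey_eleven_{index}.mp3")

def generate_samples(voice_ids, text="Hey Eleven"):
    """Generate one training sample per voice. Requires ELEVENLABS_API_KEY."""
    # Imported lazily so sample_path stays usable without the SDK installed.
    from elevenlabs.client import ElevenLabs

    client = ElevenLabs()
    os.makedirs("hotword_training_audio", exist_ok=True)
    for i, voice_id in enumerate(voice_ids):
        # convert() streams the generated MP3 back as chunks of bytes.
        audio = client.text_to_speech.convert(
            voice_id=voice_id,
            text=text,
            model_id="eleven_multilingual_v2",
        )
        with open(sample_path(i), "wb") as f:
            for chunk in audio:
                f.write(chunk)
```

Calling `generate_samples` with four different voice IDs produces the same hotword_training_audio folder as the manual steps above.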

Train the hotword

In your terminal, with your virtual environment activated, run the following command to train the hotword:

$python -m eff_word_net.generate_reference --input-dir hotword_training_audio --output-dir hotword_refs --wakeword hey_eleven --model-type resnet_50_arc

This will generate the hey_eleven_ref.json file in the hotword_refs folder. Now you simply need to update the reference_file parameter in the HotwordDetector class in hotword.py to point to the new reference file and you’re good to go!