ros2_whisper repository
Repository Summary
| Field | Value |
|---|---|
| Description | Whisper C++ Inference Action Server for ROS 2 |
| Checkout URI | https://github.com/ros-ai/ros2_whisper.git |
| VCS Type | git |
| VCS Version | main |
| Last Updated | 2024-12-13 |
| Dev Status | UNKNOWN |
| Released | UNRELEASED |
| Tags | No category tags. |
| Contributing | Help Wanted (-), Good First Issues (-), Pull Requests to Review (-) |
Packages
| Name | Version |
|---|---|
| audio_listener | 1.4.0 |
| transcript_manager | 1.4.0 |
| whisper_bringup | 1.4.0 |
| whisper_cpp_vendor | 1.4.0 |
| whisper_demos | 1.4.0 |
| whisper_idl | 1.4.0 |
| whisper_server | 1.4.0 |
| whisper_util | 1.4.0 |
README
ROS 2 Whisper
ROS 2 inference for whisper.cpp.
Example
This example shows a live transcription of the first minute of the 6th chapter of Harry Potter and the Philosopher’s Stone from Audible.

Build
- Install pyaudio (see its install instructions).
- Build this repository:
mkdir -p ros-ai/src && cd ros-ai/src && \
git clone https://github.com/ros-ai/ros2_whisper.git && cd .. && \
colcon build --symlink-install --cmake-args -DGGML_CUDA=On --no-warn-unused-cli
Demos
Configure whisper parameters in whisper.yaml.
Whisper On Key
Run the inference action server (this will download models to $HOME/.cache/whisper.cpp):
ros2 launch whisper_bringup bringup.launch.py
Run a client node (activated on space bar press):
ros2 run whisper_demos whisper_on_key
Stream
Bring up whisper:
ros2 launch whisper_bringup bringup.launch.py
Launch the live transcription stream:
ros2 run whisper_demos stream
Parameters
To enable/disable inference, you can set the active parameter from the command line with:
ros2 param set /whisper/inference active false # false/true
- Audio will still be saved to the buffer, but whisper will not be run.
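The same toggle can also be done programmatically by calling the inference node's standard set_parameters service. The sketch below is a minimal example using rclpy and rcl_interfaces; the node name /whisper/inference and the parameter name active are taken from the command above, everything else is illustrative.

```python
# Minimal sketch: toggle the 'active' parameter of /whisper/inference from Python.
import rclpy
from rclpy.node import Node
from rcl_interfaces.srv import SetParameters
from rcl_interfaces.msg import Parameter, ParameterValue, ParameterType


def set_inference_active(active: bool) -> None:
    rclpy.init()
    node = Node('inference_toggle')
    # Every ROS 2 node exposes a <node_name>/set_parameters service.
    client = node.create_client(SetParameters, '/whisper/inference/set_parameters')
    if not client.wait_for_service(timeout_sec=5.0):
        raise RuntimeError('parameter service not available, is whisper running?')
    request = SetParameters.Request()
    request.parameters = [Parameter(
        name='active',
        value=ParameterValue(type=ParameterType.PARAMETER_BOOL, bool_value=active))]
    future = client.call_async(request)
    rclpy.spin_until_future_complete(node, future)
    node.get_logger().info(f'set active={active}: {future.result().results[0].successful}')
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    set_inference_active(False)  # disable inference; pass True to re-enable
```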
Available Actions
The action server is exposed on the inference topic and uses the Inference.action type.
- The feedback message regularly publishes the actively changing portion of the transcript.
- The final result contains the stale and active portions from the start of the inference.
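As a sketch of how a client might consume this action: the example below assumes the Inference action is exported by the whisper_idl package and that the server is reachable under /whisper/inference (check the whisper_demos sources for the exact names); the goal, feedback, and result fields are not enumerated here, so defaults and whole-message logging are used.

```python
# Hedged sketch of an Inference action client; the action name '/whisper/inference'
# and the whisper_idl import path are assumptions, not verified against the demos.
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node
from whisper_idl.action import Inference  # assumed location of Inference.action


class InferenceClient(Node):
    def __init__(self):
        super().__init__('inference_client')
        self._client = ActionClient(self, Inference, '/whisper/inference')

    def run(self):
        self._client.wait_for_server()
        goal = Inference.Goal()  # goal fields depend on Inference.action; defaults used here
        send_future = self._client.send_goal_async(goal, feedback_callback=self._on_feedback)
        rclpy.spin_until_future_complete(self, send_future)
        goal_handle = send_future.result()
        if not goal_handle.accepted:
            self.get_logger().error('goal rejected')
            return
        result_future = goal_handle.get_result_async()
        rclpy.spin_until_future_complete(self, result_future)
        # The result carries the stale and active transcript portions since inference started.
        self.get_logger().info(f'result: {result_future.result().result}')

    def _on_feedback(self, feedback_msg):
        # Feedback carries the actively changing portion of the transcript.
        self.get_logger().info(f'feedback: {feedback_msg.feedback}')


def main():
    rclpy.init()
    InferenceClient().run()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```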
Published Topics
Messages of type AudioTranscript.msg are published on /whisper/transcript_stream whenever the transcript is updated; each message contains the entire transcript (stale and active portions).
Internally, the topic /whisper/tokens of type WhisperTokens.msg is used to transfer the model output between nodes.
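A minimal listener for the transcript stream might look like the sketch below. The whisper_idl import path is an assumption, and the AudioTranscript fields are not documented here, so the callback simply logs the whole message.

```python
# Minimal sketch: subscribe to the full transcript published on /whisper/transcript_stream.
import rclpy
from rclpy.node import Node
from whisper_idl.msg import AudioTranscript  # assumed location of AudioTranscript.msg


class TranscriptListener(Node):
    def __init__(self):
        super().__init__('transcript_listener')
        self.create_subscription(
            AudioTranscript, '/whisper/transcript_stream', self._on_transcript, 10)

    def _on_transcript(self, msg):
        # Each message contains the entire transcript (stale and active portions);
        # field names depend on AudioTranscript.msg, so the whole message is logged.
        self.get_logger().info(str(msg))


def main():
    rclpy.init()
    rclpy.spin(TranscriptListener())


if __name__ == '__main__':
    main()
```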
Troubleshoot
- Encoder inference time: https://github.com/ggerganov/whisper.cpp/issues/10#issuecomment-1302462960