Package Summary

Tags No category tags.
Version 2.1.29
License BSD
Build type CATKIN
Use RECOMMENDED

Repository Summary

Checkout URI https://github.com/jsk-ros-pkg/jsk_3rdparty.git
VCS Type git
VCS Version master
Last Updated 2025-01-09
Dev Status DEVELOPED
CI status No Continuous Integration
Released RELEASED
Tags No category tags.
Contributing Help Wanted (0)
Good First Issues (0)
Pull Requests to Review (0)

Package Description

ROS wrapper for Python SpeechRecognition library

Maintainers

  • Yuki Furuta

Authors

  • Yuki Furuta

ros_speech_recognition

A ROS package for speech-to-text services.
This package uses Python package SpeechRecognition as a backend.

Tutorials

Normal tutorial

  1. Install this package and SpeechRecognition
  sudo apt install ros-${ROS_DISTRO}-ros-speech-recognition
  
  2. Launch speech recognition node
  roslaunch ros_speech_recognition speech_recognition.launch
  
  3. Echo /speech_to_text
  rostopic echo /speech_to_text
  # the recognition results are printed here
  

Parrotry tutorial

Parrotry (オウム返し in Japanese) means repeating back what was just said, like a parrot.

# English
roslaunch ros_speech_recognition parrotry.launch
# Japanese
roslaunch ros_speech_recognition parrotry.launch language:=ja-JP
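Launch arguments are passed as name:=value pairs, as in the Japanese example above. The following is a minimal sketch for assembling such a command from a Python script. The helper name build_roslaunch_command is hypothetical and not part of this package; it only builds the argument list and starts nothing.

```python
def build_roslaunch_command(package, launch_file, **launch_args):
    """Assemble a roslaunch command line as a list of strings.

    Hypothetical helper for illustration; it does not start anything.
    """
    command = ["roslaunch", package, launch_file]
    for name, value in sorted(launch_args.items()):
        if isinstance(value, bool):
            # The launch files in this package write booleans as lowercase true/false.
            value = "true" if value else "false"
        command.append("{}:={}".format(name, value))
    return command


command = build_roslaunch_command(
    "ros_speech_recognition", "parrotry.launch", language="ja-JP"
)
print(" ".join(command))
```

On a machine with ROS installed, the resulting list could be handed to subprocess.run to start the launch file.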

speech_recognition_node.py Interface

Publishing Topics

  • ~voice_topic (speech_recognition_msgs/SpeechRecognitionCandidates)

    Speech recognition candidates topic name.

    Topic name is set by parameter ~voice_topic, and default value is speech_to_text.

  • sound_play (sound_play/SoundRequestAction)

    Action client to play sound on events. If the action server is not available or ~enable_sound_effect is False, no sound is played.
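A subscriber to the candidates topic usually wants one transcript rather than the whole list. The sketch below picks the most confident candidate. It assumes the message carries parallel lists named transcript and confidence (check the field names with rosmsg show speech_recognition_msgs/SpeechRecognitionCandidates). The function best_transcript is hypothetical, and a SimpleNamespace stands in for the real message so the example runs without ROS.

```python
from types import SimpleNamespace


def best_transcript(msg, min_confidence=0.0):
    """Return the most confident transcript, or None if none qualifies.

    Assumes msg has parallel lists `transcript` and `confidence`;
    a transcript without a confidence entry counts as 0.0.
    """
    candidates = []
    for index, text in enumerate(msg.transcript):
        score = msg.confidence[index] if index < len(msg.confidence) else 0.0
        if score >= min_confidence:
            candidates.append((score, text))
    if not candidates:
        return None
    # Pick the pair with the highest confidence score.
    return max(candidates, key=lambda pair: pair[0])[1]


# Stand-in for a received message (assumed field names).
msg = SimpleNamespace(
    transcript=["hello world", "hollow word"],
    confidence=[0.92, 0.41],
)
print(best_transcript(msg, min_confidence=0.8))
```

In a real node this function would be called from a rospy.Subscriber callback on the topic named by ~voice_topic. The 0.8 cutoff borrows the default of the confidence_threshold argument of parrotry.launch; whether parrotry applies its threshold in exactly this way is an assumption.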

Subscribing Topics

  • ~audio_topic (audio_common_msgs/AudioData)

    Audio stream data to be recognized.

    Topic name is set by parameter ~audio_topic, and default value is audio.
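The audio message carries raw bytes, so the ~depth and ~n_channel parameters decide how those bytes map to samples. The sketch below decodes one chunk under the assumption that the stream is signed 16-bit little-endian PCM with interleaved channels (matching the default depth of 16). The function split_channels is hypothetical and is not the node's own decoder.

```python
import array
import struct
import sys


def split_channels(data, n_channel=1):
    """Decode raw 16-bit PCM bytes into one list of samples per channel.

    Assumes signed 16-bit little-endian, interleaved samples (depth=16).
    """
    samples = array.array("h")  # "h" = signed 16-bit integer
    # Drop a trailing partial frame so the byte count is a whole number of frames.
    usable = len(data) - len(data) % (2 * n_channel)
    samples.frombytes(bytes(data[:usable]))
    if sys.byteorder == "big":
        samples.byteswap()  # the input is assumed to be little-endian
    return [list(samples[ch::n_channel]) for ch in range(n_channel)]


# Two stereo frames: (left=1, right=-1) and (left=2, right=-2).
raw = struct.pack("<4h", 1, -1, 2, -2)
left, right = split_channels(raw, n_channel=2)
print(left, right)
```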

Advertising Services

  • speech_recognition (speech_recognition_msgs/SpeechRecognition)

    Service for speech recognition

  • speech_recognition/start (std_srvs/Empty)

    Start service for speech recognition

    This service is available when parameter ~continuous is True.

  • speech_recognition/stop (std_srvs/Empty)

    Stop service for speech recognition

    This service is available when parameter ~continuous is True.

Parameters

  • ~voice_topic (String, default: speech_to_text)

    Publishing voice topic name

  • ~audio_topic (String, default: audio)

    Subscribing audio topic name

  • ~enable_sound_effect (Bool, default: True)

    Flag to enable or disable the sound effects played on recognition events.

  • ~language (String, default: en-US)

    Language to be recognized

  • ~engine (Enum[String], default: Google)

    Speech-to-text engine (To see full options use dynamic_reconfigure)

  • ~energy_threshold (Double, default: 300)

    Threshold for Voice activity detection

  • ~dynamic_energy_threshold (Bool, default: True)

    Adaptive estimation for energy_threshold

  • ~dynamic_energy_adjustment_damping (Double, default: 0.15)

    Damping threshold for dynamic VAD

  • ~dynamic_energy_ratio (Double, default: 1.5)

    Energy ratio for dynamic VAD

  • ~pause_threshold (Double, default: 0.8)

    Seconds of non-speaking audio before a phrase is considered complete

  • ~operation_timeout (Double, default: 0.0)

    Seconds after an internal operation (e.g., an API request) starts before it times out

  • ~listen_timeout (Double, default: 0.0)

    The maximum number of seconds that this will wait for a phrase to start before giving up

  • ~phrase_time_limit (Double, default: 10.0)

    The maximum number of seconds that this will allow a phrase to continue before stopping and returning the part of the phrase processed before the time limit was reached

  • ~phrase_threshold (Double, default: 0.3)

    Minimum seconds of speaking audio before we consider the speaking audio a phrase

  • ~non_speaking_duration (Double, default: 0.5)

    Seconds of non-speaking audio to keep on both sides of the recording

  • ~duration (Double, default: 10.0)

    Seconds of waiting for speech

  • ~depth (Int, default: 16)

    Depth of audio signal

  • ~n_channel (Int, default: 1)

    Total number of channels in audio data (e.g. 1: mono, 2: stereo)

  • ~sample_rate (Int, default: 16000)

    Sample rate of audio signal

  • ~buffer_size (Int, default: 10240)

    Maximum buffer size to store audio data for speech recognition

  • ~start_signal (String, default: /usr/share/sounds/freedesktop/stereo/bell.ogg)

    Path to sound file for bell on the start of audio capture

  • ~recognized_signal (String, default: /usr/share/sounds/freedesktop/stereo/message.ogg)

    Path to sound file for bell on the end of audio capture

  • ~success_signal (String, default: /usr/share/sounds/freedesktop/stereo/message-new-instant.ogg)

    Path to sound file for bell on getting successful recognition result

  • ~timeout_signal (String, default: /usr/share/sounds/freedesktop/stereo/network-connectivity-lost.ogg)

    Path to sound file for bell on timeout for recognition

  • ~continuous (Bool, default: False)

    Selects whether recognition results are delivered through a topic (True) or a service (False). By default, the service is used.

  • ~auto_start (Bool, default: True)

    Starting the speech recognition when launching.

  • ~self_cancellation (Bool, default: True)

    Whether the node ignores sound heard while the actions listed in ~tts_action_names are running.

    This option keeps the robot's own voice out of recognition.

  • ~tts_action_names (List[String], default: ['sound_play'])

    Text-to-speech action names used for self cancellation.

    The node ignores voice heard while any of these text-to-speech actions is running.

  • ~tts_tolerance (Float, default: 1.0)

    Tolerance in seconds for self cancellation.

    The node keeps ignoring voice for this many seconds after the ~tts_action_names actions finish running.

  • ~google_key (String, default: None)

    Auth key for Google API. If None, the public key is used. (There is no guarantee that the public key will not be blocked.)
    This is valid only if ~engine is Google.

  • ~google_cloud_credentials_json (String, default: None)

    Path to credential json file. For JSK users, you can download from Google Drive link. This is valid only if ~engine is GoogleCloud.

  • ~google_cloud_preferred_phrases ([String], default: None)

    Preferred phrases parameters. This is valid only if ~engine is GoogleCloud.

  • ~bing_key (String, default: None)

    Auth key for Bing API.
    This is valid only if ~engine is Bing.

  • ~vosk_model_path (String, default: None)

    Path to trained model for Vosk API. This is valid only if ~engine is Vosk.

    If en-US or ja is selected as ~language, you do not need to specify the path. To load other models, please download them from Model list.
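The three dynamic threshold parameters (~energy_threshold, ~dynamic_energy_adjustment_damping, ~dynamic_energy_ratio) interact. The sketch below follows the update rule that the SpeechRecognition library's Recognizer is understood to apply to each audio chunk while listening; treat the formula as an assumption and check it against the library version you have installed. The function name and the energy values are made up for illustration.

```python
def update_energy_threshold(energy_threshold, chunk_energy, seconds_per_chunk,
                            damping=0.15, ratio=1.5):
    """One step of the adaptive voice-activity threshold (assumed rule).

    The threshold drifts toward `chunk_energy * ratio`; a damping value
    closer to 1 makes it drift more slowly.
    """
    # Scale the damping so adaptation speed does not depend on chunk length.
    chunk_damping = damping ** seconds_per_chunk
    target = chunk_energy * ratio
    return energy_threshold * chunk_damping + target * (1.0 - chunk_damping)


# Start from the default threshold and feed three quiet one-second chunks.
threshold = 300.0
for _ in range(3):
    threshold = update_energy_threshold(
        threshold, chunk_energy=100.0, seconds_per_chunk=1.0
    )
    print(round(threshold, 2))
```

With steady ambient energy the threshold settles near ratio times that energy, which is why a ratio above 1 keeps background noise below the speech trigger level.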

Author

Yuki Furuta «furushchev@jsk.imi.i.u-tokyo.ac.jp»

CHANGELOG

Changelog for package ros_speech_recognition

2.1.29 (2025-01-05)

  • [doc] fix typo in jsk_3rdparty/ros_speech_recognition/README.md (#499)
  • Contributors: Yukina Iwata

2.1.28 (2023-07-24)

2.1.27 (2023-06-24)

  • fix package.xml/CMakeLists.txt to supress catkin_lint errors (#479)
  • Contributors: Kei Okada

2.1.26 (2023-06-14)

  • add LICENSE files (#476)
  • Contributors: Kei Okada

2.1.25 (2023-06-08)

  • [ros_speech_recognition] Add vosk engine (#474)
  • Pr/use sound themes freedesktop (#472)
  • add test to check if ros node is loadable (#463)
  • add self.conf_thresh in __init_ function (#457)
  • [ros_speech_recognition] add ubuntu-sounds dependency (#453)
  • [ros_speech_recognition] Return if result is empty (#443)
  • [ros_speece_recognition] Set confidence value of google (#434)
  • [ros_speech_recognition] add parrotry.launch (#414)
  • [ros_ speech_recognition] update default arg for speech_recognition.launch (#412)
  • [ros_speech_recogniton, respeaker_ros] add confidence field (#411)
  • [ros_speech_recognition] add self cancellation for speech recogntion (#413)
  • [#405 and #410] Fix CI (#415)
  • add ROS interface for https://cloud.google.com/natural-language (#304)
  • GithubAction: add test for aarch64(melodic) / indigo (arm64) (#365)
    • pgm_learner/respeaker_ros/ros_speech_recognition/rosping: increase time-limit/wait-time
  • Explicit python interpreter in catkin_virtualenv (#367)
  • .github/workflow: integrate all yaml to one (#338)
  • [ros_speech_recognition] Fixed the behavior of launch file (#336)
  • [ros_speech_recognition] add auto_start in speech_recognition_node.py (#301)
  • [ros_speech_recognition] add SpeechRecognitionCandidatesToString node (#303)
  • Enable sound play flag (#315)
  • Contributors: Aiko Ichikura, Aoi Nakane, Kei Okada, Koki Shinjo, Naoto Tsukamoto, Naoya Yamaguchi, Shingo Kitagawa, Yoshiki Obinata, Iory Yanokura

2.1.24 (2021-07-26)

2.1.23 (2021-07-21)

2.1.22 (2021-06-10)

  • enable to change topic name from speech_recognition.launch (#254)
  • support SpeakerDiarization, see https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#SpeechRecognitionAlternative (#244)
    • [ros_speech_recognition] Add doc to speech_recognition.launch add doc to args, and we need to use rosparm for device, not param. because 'device: ' causes load_parameters: unable to set parameters (last param was [/speech_recognition/depth=16]): cannot marshal None unless allow_none is enabled error
    • more exception message for self.recognize
  • Use PYTHON_INTERPRETER python3 in ros_speech_recognition (#225)
  • Contributors: Kei Okada, Naoya Yamaguchi, Shingo Kitagawa

2.1.21 (2020-08-19)

2.1.20 (2020-08-07)

2.1.19 (2020-07-21)

2.1.18 (2020-07-20)

  • Fix for noetic (#200)
    • fix 2to3, with print, raise, exception
  • [ros_speech_recognition] Enable multi channel audio recognition (#198)
    • adjust type code to the CPU platform
    • replace rosparam name: channels -> n_channel
    • add rosparam description to README
    • enable multi channel audio recognition
  • Add args to ros_speech_recognition (#197)
    • Add flac as run_depend for SpeechRecognition pip package
    • Use catkin_virtualenv to use SpeechRecognition pip package
    • Add arguments and params to pass rostest
    • Add test for ros_speech_recognition
    • add args to launch
    • add pip install to tutorials
    • add param description to README
  • Contributors: Kei Okada, Naoya Yamaguchi

2.1.17 (2020-04-16)

2.1.16 (2020-04-16)

2.1.15 (2019-12-12)

2.1.14 (2019-11-21)

  • set SoundRequest.volume for kinetic (#173)
  • Contributors: Kei Okada

2.1.13 (2019-07-10)

2.1.12 (2019-05-25)

  • fixes GoogleCloud auth (#158)
  • Contributors: jonasius

2.1.11 (2018-08-29)

2.1.10 (2018-04-25)

2.1.9 (2018-04-24)

2.1.8 (2018-04-17)

2.1.7 (2018-04-09)

2.1.6 (2017-11-21)

2.1.5 (2017-11-20)

  • ros_speech_recognition: add continuous mode (#127)
  • ros_speech_recognition: add README (#123)
  • add ros_speech_recognition package (#121)
  • Contributors: Yuki Furuta

2.1.4 (2017-07-16)

2.1.3 (2017-07-07)

2.1.2 (2017-07-06)

2.1.1 (2017-07-05)

2.1.0 (2017-07-02)

2.0.20 (2017-05-09)

2.0.19 (2017-02-22)

2.0.18 (2016-10-28)

2.0.17 (2016-10-22)

2.0.16 (2016-10-17)

2.0.15 (2016-10-16)

2.0.14 (2016-03-20)

2.0.13 (2015-12-15)

2.0.12 (2015-11-26)

2.0.11 (2015-10-07 14:16)

2.0.10 (2015-10-07 12:47)

2.0.9 (2015-09-26)

2.0.8 (2015-09-15)

2.0.7 (2015-09-14)

2.0.6 (2015-09-08)

2.0.5 (2015-08-23)

2.0.4 (2015-08-18)

2.0.3 (2015-08-01)

2.0.2 (2015-06-29)

2.0.1 (2015-06-19 21:21)

2.0.0 (2015-06-19 10:41)

1.0.71 (2015-05-17)

1.0.70 (2015-05-08)

1.0.69 (2015-05-05 12:28)

1.0.68 (2015-05-05 09:49)

1.0.67 (2015-05-03)

1.0.66 (2015-04-03)

1.0.65 (2015-04-02)

1.0.64 (2015-03-29)

1.0.63 (2015-02-19)

1.0.62 (2015-02-17)

1.0.61 (2015-02-11)

1.0.60 (2015-02-03 10:12)

1.0.59 (2015-02-03 04:05)

1.0.58 (2015-01-07)

1.0.57 (2014-12-23)

1.0.56 (2014-12-17)

1.0.55 (2014-12-09)

1.0.54 (2014-11-15)

1.0.53 (2014-11-01)

1.0.52 (2014-10-23)

1.0.51 (2014-10-20 16:01)

1.0.50 (2014-10-20 01:50)

1.0.49 (2014-10-13)

1.0.48 (2014-10-12)

1.0.47 (2014-10-08)

1.0.46 (2014-10-03)

1.0.45 (2014-09-29)

1.0.44 (2014-09-26 09:17)

1.0.43 (2014-09-26 01:08)

1.0.42 (2014-09-25)

1.0.41 (2014-09-23)

1.0.40 (2014-09-19)

1.0.39 (2014-09-17)

1.0.38 (2014-09-13)

1.0.37 (2014-09-08)

1.0.36 (2014-09-01)

1.0.35 (2014-08-16)

1.0.34 (2014-08-14)

1.0.33 (2014-07-28)

1.0.32 (2014-07-26)

1.0.31 (2014-07-23)

1.0.30 (2014-07-15)

1.0.29 (2014-07-02)

1.0.28 (2014-06-24)

1.0.27 (2014-06-10)

1.0.26 (2014-05-30)

1.0.25 (2014-05-26)

1.0.24 (2014-05-24)

1.0.23 (2014-05-23)

1.0.22 (2014-05-22)

1.0.21 (2014-05-20)

1.0.20 (2014-05-09)

1.0.19 (2014-05-06)

1.0.18 (2014-05-04)

1.0.17 (2014-04-20)

1.0.16 (2014-04-19 23:29)

1.0.15 (2014-04-19 20:19)

1.0.14 (2014-04-19 12:52)

1.0.13 (2014-04-19 11:06)

1.0.12 (2014-04-18 16:58)

1.0.11 (2014-04-18 08:18)

1.0.10 (2014-04-17)

1.0.9 (2014-04-12)

1.0.8 (2014-04-11)

1.0.7 (2014-04-10)

1.0.6 (2014-04-07)

1.0.5 (2014-03-31)

1.0.4 (2014-03-29)

1.0.3 (2014-03-19)

1.0.2 (2014-03-12)

1.0.1 (2014-03-07)

1.0.0 (2014-03-05)

Wiki Tutorials

This package does not provide any links to tutorials in its rosindex metadata. You can check on the ROS Wiki Tutorials page for the package.

Launch files

  • launch/speech_recognition.launch
      • launch_sound_play [default: true] — Launch sound_play node to speak
      • launch_audio_capture [default: true] — Launch audio_capture node to publish audio topic from microphone
      • audio_topic [default: /audio] — Name of audio topic captured from microphone
      • voice_topic [default: /speech_to_text] — Name of text topic of recognized speech
      • n_channel [default: 1] — Number of channels of audio topic and microphone. '$ pactl list short sinks' to check your hardware
      • depth [default: 16] — Bit depth of audio topic and microphone. '$ pactl list short sinks' to check your hardware
      • sample_rate [default: 16000] — Frame rate of audio topic and microphone. '$ pactl list short sinks' to check your hardware
      • device [default: ] — Card and device number of microphone (e.g. hw:0,0). Check the card and device numbers with '$ arecord -l', then use hw:[card number],[device number]
      • engine [default: Google] — Speech-to-text engine. One of Google, GoogleCloud, Sphinx, Wit, Bing, Houndify, IBM
      • language [default: en-US] — Speech to text language. For Japanese, set ja-JP.
      • continuous [default: true] — If false, /speech_recognition service is published. If true, /speech_to_text topic is published.
      • auto_start [default: true] — Whether speech_recognition starts automatically or not. This parameter works when continuous is true
      • self_cancellation [default: true] — Whether to ignore audio heard while the robot is speaking.
      • tts_tolerance [default: 1.0] — Tolerance in seconds for deciding whether the robot is still speaking
      • tts_action_names [default: ['sound_play']] — TTS action names. Output from these action servers is ignored by speech recognition
  • launch/parrotry.launch
      • use_google [default: true]
      • language [default: en-US]
      • confidence_threshold [default: 0.8]
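The self_cancellation, tts_action_names and tts_tolerance arguments together describe a simple gate: audio is dropped while a watched TTS action is running and for a short time afterwards. The sketch below shows that decision in isolation. The function should_ignore_audio and its arguments are hypothetical and only follow the parameter descriptions, not the node's actual code.

```python
def should_ignore_audio(now, tts_running, last_tts_end, tts_tolerance=1.0):
    """Return True if audio captured at time `now` should be discarded.

    now:           current time in seconds.
    tts_running:   True while any watched TTS action is active.
    last_tts_end:  time in seconds when the latest TTS action finished,
                   or None if none has run yet.
    tts_tolerance: extra seconds to keep ignoring audio after TTS ends.
    """
    if tts_running:
        return True
    if last_tts_end is None:
        return False
    return (now - last_tts_end) < tts_tolerance


# The robot finished speaking at t=10.0 s.
print(should_ignore_audio(10.5, False, 10.0))  # still inside the tolerance window
print(should_ignore_audio(11.5, False, 10.0))  # tolerance has elapsed
```

A larger tolerance helps in rooms with echo or with speakers that buffer playback, at the cost of missing speech that starts right after the robot stops talking.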

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.



Package Summary

Tags No category tags.
Version 2.1.29
License BSD
Build type CATKIN
Use RECOMMENDED

Repository Summary

Checkout URI https://github.com/jsk-ros-pkg/jsk_3rdparty.git
VCS Type git
VCS Version master
Last Updated 2025-01-09
Dev Status DEVELOPED
CI status Continuous Integration
Released RELEASED
Tags No category tags.
Contributing Help Wanted (0)
Good First Issues (0)
Pull Requests to Review (0)

Package Description

ROS wrapper for Python SpeechRecognition library

Additional Links

Maintainers

  • Yuki Furuta

Authors

  • Yuki Furuta

ros_speech_recognition

A ROS package for speech-to-text services.
This package uses Python package SpeechRecognition as a backend.

Tutorials

Normal tutorial

  1. Install this package and SpeechReconition
  sudo apt install ros-${ROS_DISTRO}-ros-speech-recognition
  
  1. Launch speech recognition node
  roslaunch ros_speech_recognition speech_recognition.launch
  
  1. Echo /speech_to_text
  rostopic echo /speech_to_text
  # you can get the recognition result
  

Parrotry tutorial

Parrotry mean オウム返し in Japanese

# english
roslaunch ros_speech_recognition parrotry.launch
# japanese
roslaunch ros_speech_recognition parrotry.launch language:=ja-JP

speech_recognition_node.py Interface

Publishing Topics

  • ~voice_topic (speech_recognition_msgs/SpeechRecognitionCandidates)

    Speech recognition candidates topic name.

    Topic name is set by parameter ~voice_topic, and default value is speech_to_text.

  • sound_play (sound_play/SoundRequestAction)

    Action client to play sound on events. If the action server is not available or ~enable_sound_effect is False, no sound is played.

Subscribing Topics

  • ~audio_topic (audio_common_msgs/AudioData)

    Audio stream data to be recognized.

    Topic name is set by parameter ~audio_topic and default value is audio.
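
The audio topic carries raw bytes whose layout is described by the ~depth, ~n_channel and ~sample_rate parameters. As a rough sketch of what multi-channel data looks like, the snippet below splits interleaved samples into one list per channel. It assumes little-endian signed 16-bit interleaved PCM, which this document does not confirm, and split_channels is a hypothetical helper, not part of this package.

```python
import struct


def split_channels(raw, n_channel=1, depth=16):
    """Split interleaved signed PCM bytes into one sample list per channel.

    Assumes little-endian signed 16-bit samples (depth=16); other depths
    are out of scope for this sketch.
    """
    if depth != 16:
        raise ValueError("this sketch only handles 16-bit audio")
    frame_bytes = 2 * n_channel
    usable = len(raw) - (len(raw) % frame_bytes)  # drop a trailing partial frame
    samples = struct.unpack("<%dh" % (usable // 2), raw[:usable])
    # Channel ch owns every n_channel-th sample, starting at index ch.
    return [list(samples[ch::n_channel]) for ch in range(n_channel)]


if __name__ == "__main__":
    # Two stereo frames: (left=1, right=-1), (left=2, right=-2)
    raw = struct.pack("<4h", 1, -1, 2, -2)
    left, right = split_channels(raw, n_channel=2)
    print(left, right)
```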

Advertising Services

  • speech_recognition (speech_recognition_msgs/SpeechRecognition)

    Service for speech recognition

  • speech_recognition/start (std_srvs/Empty)

    Start service for speech recognition

    This service is available when parameter ~continuous is True.

  • speech_recognition/stop (std_srvs/Empty)

    Stop service for speech recognition

    This service is available when parameter ~continuous is True.

Parameters

  • ~voice_topic (String, default: speech_to_text)

    Publishing voice topic name

  • ~audio_topic (String, default: audio)

    Subscribing audio topic name

  • ~enable_sound_effect (Bool, default: True)

    Flag to enable or disable playing sound effects on recognition events.

  • ~language (String, default: en-US)

    Language to be recognized

  • ~engine (Enum[String], default: Google)

    Speech-to-text engine (To see full options use dynamic_reconfigure)

  • ~energy_threshold (Double, default: 300)

    Threshold for Voice activity detection

  • ~dynamic_energy_threshold (Bool, default: True)

    Adaptive estimation for energy_threshold

  • ~dynamic_energy_adjustment_damping (Double, default: 0.15)

    Damping threshold for dynamic VAD

  • ~dynamic_energy_ratio (Double, default: 1.5)

    Energy ratio for dynamic VAD

  • ~pause_threshold (Double, default: 0.8)

    Seconds of non-speaking audio before a phrase is considered complete

  • ~operation_timeout (Double, default: 0.0)

    Seconds after an internal operation (e.g., an API request) starts before it times out

  • ~listen_timeout (Double, default: 0.0)

    The maximum number of seconds that this will wait for a phrase to start before giving up

  • ~phrase_time_limit (Double, default: 10.0)

    The maximum number of seconds that this will allow a phrase to continue before stopping and returning the part of the phrase processed before the time limit was reached

  • ~phrase_threshold (Double, default: 0.3)

    Minimum seconds of speaking audio before we consider the speaking audio a phrase

  • ~non_speaking_duration (Double, default: 0.5)

    Seconds of non-speaking audio to keep on both sides of the recording

  • ~duration (Double, default: 10.0)

    Seconds of waiting for speech

  • ~depth (Int, default: 16)

    Depth of audio signal

  • ~n_channel (Int, default: 1)

    Total number of channels in audio data (e.g. 1: mono, 2: stereo)

  • ~sample_rate (Int, default: 16000)

    Sample rate of audio signal

  • ~buffer_size (Int, default: 10240)

    Maximum buffer size to store audio data for speech recognition

  • ~start_signal (String, default: /usr/share/sounds/freedesktop/stereo/bell.ogg)

    Path to sound file for bell on the start of audio capture

  • ~recognized_signal (String, default: /usr/share/sounds/freedesktop/stereo/message.ogg)

    Path to sound file for bell on the end of audio capture

  • ~success_signal (String, default: /usr/share/sounds/freedesktop/stereo/message-new-instant.ogg)

    Path to sound file for bell on getting successful recognition result

  • ~timeout_signal (String, default: /usr/share/sounds/freedesktop/stereo/network-connectivity-lost.ogg)

    Path to sound file for bell on timeout for recognition

  • ~continuous (Bool, default: False)

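    Whether to publish results on a topic (True) or serve them through the service (False). By default, the service is used.

  • ~auto_start (Bool, default: True)

    Whether to start speech recognition at launch. This parameter takes effect when ~continuous is True.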

  • ~self_cancellation (Bool, default: True)

    Whether the node ignores sound heard while the actions in ~tts_action_names are running.

    This option keeps the robot's own voice out of recognition.

  • ~tts_action_names (List[String], default: ['sound_play'])

    Text-to-speech action names for self cancellation.

    The node ignores the voice heard while these text-to-speech actions are running.

  • ~tts_tolerance (Float, default: 1.0)

    Tolerance in seconds for self cancellation.

    The node keeps ignoring the voice for this many seconds after the ~tts_action_names actions finish running.

  • ~google_key (String, default: None)

    Auth key for the Google API. If None, the public key is used (with no guarantee that it will not be blocked).
    This is valid only if ~engine is Google.

  • ~google_cloud_credentials_json (String, default: None)

    Path to the credentials JSON file. JSK users can download it from the Google Drive link. This is valid only if ~engine is GoogleCloud.

  • ~google_cloud_preferred_phrases ([String], default: None)

    Preferred phrases parameters. This is valid only if ~engine is GoogleCloud.

  • ~bing_key (String, default: None)

    Auth key for Bing API.
    This is valid only if ~engine is Bing.

  • ~vosk_model_path (String, default: None)

    Path to trained model for Vosk API. This is valid only if ~engine is Vosk.

    If en-US or ja is selected as ~language, you do not need to specify the path. To load other models, please download them from Model list.
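
The voice activity detection parameters above (~energy_threshold, ~dynamic_energy_threshold, ~dynamic_energy_adjustment_damping, ~dynamic_energy_ratio) interact roughly as in the following sketch. It is modeled on the exponential smoothing that the SpeechRecognition backend documents for its dynamic energy threshold, but it is a simplified illustration under that assumption, not this node's code. The rms and update_energy_threshold helpers are hypothetical names.

```python
import math


def rms(samples):
    """Root-mean-square energy of a list of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def update_energy_threshold(energy_threshold, energy, seconds_per_buffer,
                            damping=0.15, ratio=1.5):
    """Move the threshold toward `energy * ratio` by exponential smoothing."""
    # Scale the damping so that it behaves the same for any buffer length.
    d = damping ** seconds_per_buffer
    target = energy * ratio
    return energy_threshold * d + target * (1.0 - d)


if __name__ == "__main__":
    threshold = 300.0  # ~energy_threshold default
    quiet_buffer = [100, -100] * 512  # 1024 samples of constant magnitude 100
    seconds_per_buffer = len(quiet_buffer) / 16000.0  # ~sample_rate default
    for _ in range(50):
        threshold = update_energy_threshold(
            threshold, rms(quiet_buffer), seconds_per_buffer)
    # The threshold drifts from 300 toward 1.5 times the ambient energy.
    print(round(threshold, 1))
```

In this picture a buffer counts as speech when its energy exceeds the current threshold, so a damping value closer to 1 makes the threshold adapt more slowly to changes in ambient noise.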

Author

Yuki Furuta «furushchev@jsk.imi.i.u-tokyo.ac.jp»

CHANGELOG

Changelog for package ros_speech_recognition

2.1.29 (2025-01-05)

  • [doc] fix typo in jsk_3rdparty/ros_speech_recognition/README.md (#499)
  • Contributors: Yukina Iwata

2.1.28 (2023-07-24)

2.1.27 (2023-06-24)

  • fix package.xml/CMakeLists.txt to suppress catkin_lint errors (#479)
  • Contributors: Kei Okada

2.1.26 (2023-06-14)

  • add LICENSE files (#476)
  • Contributors: Kei Okada

2.1.25 (2023-06-08)

  • [ros_speech_recognition] Add vosk engine (#474)
  • Pr/use sound themes freedesktop (#472)
  • add test to check if ros node is loadable (#463)
  • add self.conf_thresh in __init__ function (#457)
  • [ros_speech_recognition] add ubuntu-sounds dependency (#453)
  • [ros_speech_recognition] Return if result is empty (#443)
  • [ros_speech_recognition] Set confidence value of google (#434)
  • [ros_speech_recognition] add parrotry.launch (#414)
  • [ros_speech_recognition] update default arg for speech_recognition.launch (#412)
  • [ros_speech_recognition, respeaker_ros] add confidence field (#411)
  • [ros_speech_recognition] add self cancellation for speech recognition (#413)
  • [#405 and #410] Fix CI (#415)
  • add ROS interface for https://cloud.google.com/natural-language (#304)
  • GithubAction: add test for aarch64(melodic) / indigo (arm64) (#365)
    • pgm_learner/respeaker_ros/ros_speech_recognition/rosping: increase time-limit/wait-time
  • Explicit python interpreter in catkin_virtualenv (#367)
  • .github/workflow: integrate all yaml to one (#338)
  • [ros_speech_recognition] Fixed the behavior of launch file (#336)
  • [ros_speech_recognition] add auto_start in speech_recognition_node.py (#301)
  • [ros_speech_recognition] add SpeechRecognitionCandidatesToString node (#303)
  • Enable sound play flag (#315)
  • Contributors: Aiko Ichikura, Aoi Nakane, Kei Okada, Koki Shinjo, Naoto Tsukamoto, Naoya Yamaguchi, Shingo Kitagawa, Yoshiki Obinata, Iory Yanokura

2.1.24 (2021-07-26)

2.1.23 (2021-07-21)

2.1.22 (2021-06-10)

  • enable to change topic name from speech_recognition.launch (#254)
  • support SpeakerDiarization, see https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#SpeechRecognitionAlternative (#244)
    • [ros_speech_recognition] Add doc to speech_recognition.launch: add doc to args, and use rosparam for device, not param, because 'device: ' causes the error "load_parameters: unable to set parameters (last param was [/speech_recognition/depth=16]): cannot marshal None unless allow_none is enabled"
    • more exception message for self.recognize
  • Use PYTHON_INTERPRETER python3 in ros_speech_recognition (#225)
  • Contributors: Kei Okada, Naoya Yamaguchi, Shingo Kitagawa

2.1.21 (2020-08-19)

2.1.20 (2020-08-07)

2.1.19 (2020-07-21)

2.1.18 (2020-07-20)

  • Fix for noetic (#200)
    • fix 2to3, with print, raise, exception
  • [ros_speech_recognition] Enable multi channel audio recognition (#198)
    • adjust type code to the CPU platform
    • replace rosparam name: channels -> n_channel
    • add rosparam description to README
    • enable multi channel audio recognition
  • Add args to ros_speech_recognition (#197)
    • Add flac as run_depend for SpeechRecognition pip package
    • Use catkin_virtualenv to use SpeechRecognition pip package
    • Add arguments and params to pass rostest
    • Add test for ros_speech_recognition
    • add args to launch
    • add pip install to tutorials
    • add param description to README
  • Contributors: Kei Okada, Naoya Yamaguchi

2.1.17 (2020-04-16)

2.1.16 (2020-04-16)

2.1.15 (2019-12-12)

2.1.14 (2019-11-21)

  • set SoundRequest.volume for kinetic (#173)
  • Contributors: Kei Okada

2.1.13 (2019-07-10)

2.1.12 (2019-05-25)

  • fixes GoogleCloud auth (#158)
  • Contributors: jonasius

2.1.11 (2018-08-29)

2.1.10 (2018-04-25)

2.1.9 (2018-04-24)

2.1.8 (2018-04-17)

2.1.7 (2018-04-09)

2.1.6 (2017-11-21)

2.1.5 (2017-11-20)

  • ros_speech_recognition: add continuous mode (#127)
  • ros_speech_recognition: add README (#123)
  • add ros_speech_recognition package (#121)
  • Contributors: Yuki Furuta

2.1.4 (2017-07-16)

2.1.3 (2017-07-07)

2.1.2 (2017-07-06)

2.1.1 (2017-07-05)

2.1.0 (2017-07-02)

2.0.20 (2017-05-09)

2.0.19 (2017-02-22)

2.0.18 (2016-10-28)

2.0.17 (2016-10-22)

2.0.16 (2016-10-17)

2.0.15 (2016-10-16)

2.0.14 (2016-03-20)

2.0.13 (2015-12-15)

2.0.12 (2015-11-26)

2.0.11 (2015-10-07 14:16)

2.0.10 (2015-10-07 12:47)

2.0.9 (2015-09-26)

2.0.8 (2015-09-15)

2.0.7 (2015-09-14)

2.0.6 (2015-09-08)

2.0.5 (2015-08-23)

2.0.4 (2015-08-18)

2.0.3 (2015-08-01)

2.0.2 (2015-06-29)

2.0.1 (2015-06-19 21:21)

2.0.0 (2015-06-19 10:41)

1.0.71 (2015-05-17)

1.0.70 (2015-05-08)

1.0.69 (2015-05-05 12:28)

1.0.68 (2015-05-05 09:49)

1.0.67 (2015-05-03)

1.0.66 (2015-04-03)

1.0.65 (2015-04-02)

1.0.64 (2015-03-29)

1.0.63 (2015-02-19)

1.0.62 (2015-02-17)

1.0.61 (2015-02-11)

1.0.60 (2015-02-03 10:12)

1.0.59 (2015-02-03 04:05)

1.0.58 (2015-01-07)

1.0.57 (2014-12-23)

1.0.56 (2014-12-17)

1.0.55 (2014-12-09)

1.0.54 (2014-11-15)

1.0.53 (2014-11-01)

1.0.52 (2014-10-23)

1.0.51 (2014-10-20 16:01)

1.0.50 (2014-10-20 01:50)

1.0.49 (2014-10-13)

1.0.48 (2014-10-12)

1.0.47 (2014-10-08)

1.0.46 (2014-10-03)

1.0.45 (2014-09-29)

1.0.44 (2014-09-26 09:17)

1.0.43 (2014-09-26 01:08)

1.0.42 (2014-09-25)

1.0.41 (2014-09-23)

1.0.40 (2014-09-19)

1.0.39 (2014-09-17)

1.0.38 (2014-09-13)

1.0.37 (2014-09-08)

1.0.36 (2014-09-01)

1.0.35 (2014-08-16)

1.0.34 (2014-08-14)

1.0.33 (2014-07-28)

1.0.32 (2014-07-26)

1.0.31 (2014-07-23)

1.0.30 (2014-07-15)

1.0.29 (2014-07-02)

1.0.28 (2014-06-24)

1.0.27 (2014-06-10)

1.0.26 (2014-05-30)

1.0.25 (2014-05-26)

1.0.24 (2014-05-24)

1.0.23 (2014-05-23)

1.0.22 (2014-05-22)

1.0.21 (2014-05-20)

1.0.20 (2014-05-09)

1.0.19 (2014-05-06)

1.0.18 (2014-05-04)

1.0.17 (2014-04-20)

1.0.16 (2014-04-19 23:29)

1.0.15 (2014-04-19 20:19)

1.0.14 (2014-04-19 12:52)

1.0.13 (2014-04-19 11:06)

1.0.12 (2014-04-18 16:58)

1.0.11 (2014-04-18 08:18)

1.0.10 (2014-04-17)

1.0.9 (2014-04-12)

1.0.8 (2014-04-11)

1.0.7 (2014-04-10)

1.0.6 (2014-04-07)

1.0.5 (2014-03-31)

1.0.4 (2014-03-29)

1.0.3 (2014-03-19)

1.0.2 (2014-03-12)

1.0.1 (2014-03-07)

1.0.0 (2014-03-05)

Wiki Tutorials

This package does not provide any links to tutorials in its rosindex metadata. You can check on the ROS Wiki Tutorials page for the package.

Launch files

  • launch/speech_recognition.launch
      • launch_sound_play [default: true] — Launch sound_play node to speak
      • launch_audio_capture [default: true] — Launch audio_capture node to publish audio topic from microphone
      • audio_topic [default: /audio] — Name of audio topic captured from microphone
      • voice_topic [default: /speech_to_text] — Name of text topic of recognized speech
      • n_channel [default: 1] — Number of channels of the audio topic and microphone. Run 'pactl list short sinks' to check your hardware.
      • depth [default: 16] — Bit depth of the audio topic and microphone. Run 'pactl list short sinks' to check your hardware.
      • sample_rate [default: 16000] — Sample rate of the audio topic and microphone. Run 'pactl list short sinks' to check your hardware.
      • device [default: ] — Card and device number of the microphone (e.g. hw:0,0). Check the card number and device number with 'arecord -l', then use hw:[card number],[device number].
      • engine [default: Google] — Speech-to-text engine: Google, GoogleCloud, Sphinx, Wit, Bing, Houndify, IBM
      • language [default: en-US] — Speech-to-text language. For Japanese, set ja-JP.
      • continuous [default: true] — If false, the /speech_recognition service is advertised. If true, the /speech_to_text topic is published.
      • auto_start [default: true] — Whether speech recognition starts automatically. This argument takes effect only when continuous is true.
      • self_cancellation [default: true] — Whether to skip recognition of audio captured while the robot is speaking.
      • tts_tolerance [default: 1.0] — Tolerance in seconds for deciding whether the robot is speaking
      • tts_action_names [default: ['sound_play']] — TTS action names. Output from these action servers is ignored by speech recognition.
  • launch/parrotry.launch
      • use_google [default: true]
      • language [default: en-US]
      • confidence_threshold [default: 0.8]
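
The self_cancellation, tts_tolerance and tts_action_names arguments of speech_recognition.launch describe a time-window rule: audio is skipped while a text-to-speech action is running and for a short tolerance afterwards. A minimal sketch of that rule in plain Python follows. The should_ignore helper and the interval representation are hypothetical and chosen for illustration; the node's real bookkeeping may differ.

```python
def should_ignore(audio_time, tts_intervals, tts_tolerance=1.0):
    """Return True if audio at `audio_time` overlaps TTS playback.

    `tts_intervals` is a list of (start, end) times in seconds during which
    a text-to-speech action was running. Audio is also ignored for
    `tts_tolerance` seconds after each interval ends.
    """
    return any(start <= audio_time <= end + tts_tolerance
               for start, end in tts_intervals)


if __name__ == "__main__":
    speaking = [(10.0, 12.0)]  # the robot spoke from t=10 s to t=12 s
    for t in (9.5, 11.0, 12.8, 13.5):
        print(t, should_ignore(t, speaking))
```

A larger tolerance helps when the speaker output echoes or the audio pipeline adds latency, at the cost of missing speech that starts right after the robot finishes talking.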

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.


Package Summary

Tags No category tags.
Version 2.1.29
License BSD
Build type CATKIN
Use RECOMMENDED

Repository Summary

Checkout URI https://github.com/jsk-ros-pkg/jsk_3rdparty.git
VCS Type git
VCS Version master
Last Updated 2025-01-09
Dev Status DEVELOPED
CI status No Continuous Integration
Released RELEASED
Tags No category tags.
Contributing Help Wanted (0)
Good First Issues (0)
Pull Requests to Review (0)

Package Description

ROS wrapper for Python SpeechRecognition library

Additional Links

Maintainers

  • Yuki Furuta

Authors

  • Yuki Furuta

ros_speech_recognition

A ROS package for speech-to-text services.
This package uses Python package SpeechRecognition as a backend.

Tutorials

Normal tutorial

  1. Install this package and SpeechReconition
  sudo apt install ros-${ROS_DISTRO}-ros-speech-recognition
  
  1. Launch speech recognition node
  roslaunch ros_speech_recognition speech_recognition.launch
  
  1. Echo /speech_to_text
  rostopic echo /speech_to_text
  # you can get the recognition result
  

Parrotry tutorial

Parrotry mean オウム返し in Japanese

# english
roslaunch ros_speech_recognition parrotry.launch
# japanese
roslaunch ros_speech_recognition parrotry.launch language:=ja-JP

speech_recognition_node.py Interface

Publishing Topics

  • ~voice_topic (speech_recognition_msgs/SpeechRecognitionCandidates)

    Speech recognition candidates topic name.

    Topic name is set by parameter ~voice_topic, and default value is speech_to_text.

  • sound_play (sound_play/SoundRequestAction)

    Action client to play sound on events. If the action server is not available or ~enable_sound_effect is False, no sound is played.

Subscribing Topics

  • ~audio_topic (audio_common_msgs/AudioData)

    Audio stream data to be recognized.

    Topis name is set by parameter ~audio_topic and default value is audio.

Advertising Services

  • speech_recognition (speech_recognition_msgs/SpeechRecognition)

    Service for speech recognition

  • speech_recognition/start (std_srvs/Empty)

    Start service for speech recognition

    This service is available when parameter ~contiunous is True.

  • speech_recognition/start (std_srvs/Empty)

    Stop service for speech recognition

    This service is available when parameter ~contiunous is True.

Parameters

  • ~voice_topic (String, default: speech_to_text)

    Publishing voice topic name

  • ~audio_topic (String, default: audio)

    Subscribing audio topic name

  • ~enable_sound_effect (Bool, default: True)

    Flag to enable or disable sound to play sound on recognition.

  • ~language (String, default: en-US)

    Language to be recognized

  • ~engine (Enum[String], default: Google)

    Speech-to-text engine (To see full options use dynamic_reconfigure)

  • ~energy_threshold (Double, default: 300)

    Threshold for Voice activity detection

  • ~dynamic_energy_threshold (Bool, default: True)

    Adaptive estimation for energy_threshold

  • ~dynamic_energy_adjustment_damping (Double, default: 0.15)

    Damping threshold for dynamic VAD

  • ~dynamic_energy_ratio (Double, default: 1.5)

    Energy ratio for dynamic VAD

  • ~pause_threshold (Double, default: 0.8)

    Seconds of non-speaking audio before a phrase is considered complete

  • ~operation_timeout (Double, default: 0.0)

    Seconds after an internal operation (e.g., an API request) starts before it times out

  • ~listen_timeout (Double, default: 0.0)

    The maximum number of seconds that this will wait for a phrase to start before giving up

  • ~phrase_time_limit (Double, default: 10.0)

    The maximum number of seconds that this will allow a phrase to continue before stopping and returning the part of the phrase processed before the time limit was reached

  • ~phrase_threshold (Double, default: 0.3)

    Minimum seconds of speaking audio before we consider the speaking audio a phrase

  • ~non_speaking_duration (Double, default: 0.5)

    Seconds of non-speaking audio to keep on both sides of the recording

  • ~duration (Double, default: 10.0)

    Seconds of waiting for speech

  • ~depth (Int, default: 16)

    Depth of audio signal

  • ~n_channel (Int, default: 1)

    Total number of channels in audio data (e.g. 1: mono, 2: stereo)

  • ~sample_rate (Int, default: 16000)

    Sample rate of audio signal

  • ~buffer_size (Int, default: 10240)

    Maximum buffer size to store audio data for speech recognition

  • ~start_signal (String, default: /usr/share/sounds/freedesktop/stereo/bell.ogg)

    Path to sound file for bell on the start of audio capture

  • ~recognized_signal (String, default: /usr/share/sounds/freedesktop/stereo/message.ogg)

    Path to sound file for bell on the end of audio capture

  • ~success_signal (String, default: /usr/share/sounds/freedesktop/stereo/message-new-instant.ogg)

    Path to sound file for bell on getting successful recognition result

  • ~timeout_signal (String, default: /usr/share/sounds/freedesktop/stereo/network-connectivity-lost.ogg)

    Path to sound file for bell on timeout for recognition

  • ~continuous (Bool, default: False)

    Selects whether recognition results are delivered via topic (True) or service (False). By default, the service is used.

  • ~auto_start (Bool, default: True)

    Whether to start speech recognition automatically when the node is launched.

  • ~self_cancellation (Bool, default: True)

    Whether the node ignores the sound heard while the actions in ~tts_action_names are running.

    This option keeps the robot's own voice out of recognition.

  • ~tts_action_names (List[String], default: ['sound_play'])

    Text-to-speech action names for self cancellation.

    The node ignores the voice heard while any of these text-to-speech actions is running.

  • ~tts_tolerance (Float, default: 1.0)

    Tolerance in seconds for self cancellation.

    The node keeps ignoring the voice for this many seconds after the actions in ~tts_action_names finish running.

  • ~google_key (String, default: None)

    Auth key for Google API. If None, the public key is used. (There is no guarantee that the public key will not be blocked.)
    This is valid only if ~engine is Google.

  • ~google_cloud_credentials_json (String, default: None)

    Path to credential json file. For JSK users, you can download from Google Drive link. This is valid only if ~engine is GoogleCloud.

  • ~google_cloud_preferred_phrases ([String], default: None)

    Preferred phrases parameters. This is valid only if ~engine is GoogleCloud.

  • ~bing_key (String, default: None)

    Auth key for Bing API.
    This is valid only if ~engine is Bing.

  • ~vosk_model_path (String, default: None)

    Path to trained model for Vosk API. This is valid only if ~engine is Vosk.

    If en-US or ja is selected as ~language, you do not need to specify the path. To load other models, please download them from Model list.
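
The three dynamic energy parameters interact as an exponential smoothing of the threshold towards a multiple of the current ambient energy. The sketch below follows the update rule used by the SpeechRecognition backend as far as we know it; treat it as an illustration of how ~energy_threshold, ~dynamic_energy_adjustment_damping and ~dynamic_energy_ratio relate, not as this node's exact code. update_energy_threshold is a hypothetical name.

```python
def update_energy_threshold(energy_threshold, energy, seconds_per_buffer,
                            damping=0.15, ratio=1.5):
    """Return the adapted threshold after observing one audio buffer.

    energy is the measured energy of the buffer and seconds_per_buffer
    is its duration (number of frames divided by the sample rate).
    """
    # Damping is defined per second, so scale it to the buffer duration.
    buffer_damping = damping ** seconds_per_buffer
    # The threshold drifts towards ratio times the observed energy.
    target = energy * ratio
    return energy_threshold * buffer_damping + target * (1.0 - buffer_damping)
```

A damping value closer to 1 makes the threshold adapt more slowly, and a larger ratio keeps the threshold further above the ambient level.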

Author

Yuki Furuta «furushchev@jsk.imi.i.u-tokyo.ac.jp»

CHANGELOG

Changelog for package ros_speech_recognition

2.1.29 (2025-01-05)

  • [doc] fix typo in jsk_3rdparty/ros_speech_recognition/README.md (#499)
  • Contributors: Yukina Iwata

2.1.28 (2023-07-24)

2.1.27 (2023-06-24)

  • fix package.xml/CMakeLists.txt to supress catkin_lint errors (#479)
  • Contributors: Kei Okada

2.1.26 (2023-06-14)

  • add LICENSE files (#476)
  • Contributors: Kei Okada

2.1.25 (2023-06-08)

  • [ros_speech_recognition] Add vosk engine (#474)
  • Pr/use sound themes freedesktop (#472)
  • add test to check if ros node is loadable (#463)
  • add self.conf_thresh in __init_ function (#457)
  • [ros_speech_recognition] add ubuntu-sounds dependency (#453)
  • [ros_speech_recognition] Return if result is empty (#443)
  • [ros_speece_recognition] Set confidence value of google (#434)
  • [ros_speech_recognition] add parrotry.launch (#414)
  • [ros_ speech_recognition] update default arg for speech_recognition.launch (#412)
  • [ros_speech_recogniton, respeaker_ros] add confidence field (#411)
  • [ros_speech_recognition] add self cancellation for speech recogntion (#413)
  • [#405 and #410] Fix CI (#415)
  • add ROS interface for https://cloud.google.com/natural-language (#304)
  • GithubAction: add test for aarch64(melodic) / indigo (arm64) (#365)
    • pgm_learner/respeaker_ros/ros_speech_recognition/rosping: increase time-limit/wait-time
  • Explicit python interpreter in catkin_virtualenv (#367)
  • .github/workflow: integrate all yaml to one (#338)
  • [ros_speech_recognition] Fixed the behavior of launch file (#336)
  • [ros_speech_recognition] add auto_start in speech_recognition_node.py (#301)
  • [ros_speech_recognition] add SpeechRecognitionCandidatesToString node (#303)
  • Enable sound play flag (#315)
  • Contributors: Aiko Ichikura, Aoi Nakane, Kei Okada, Koki Shinjo, Naoto Tsukamoto, Naoya Yamaguchi, Shingo Kitagawa, Yoshiki Obinata, Iory Yanokura

2.1.24 (2021-07-26)

2.1.23 (2021-07-21)

2.1.22 (2021-06-10)

  • enable to change topic name from speech_recognition.launch (#254)
  • support SpeakerDiarization, see https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#SpeechRecognitionAlternative (#244)
    • [ros_speech_recognition] Add doc to speech_recognition.launch add doc to args, and we need to use rosparm for device, not param. because 'device: ' causes load_parameters: unable to set parameters (last param was [/speech_recognition/depth=16]): cannot marshal None unless allow_none is enabled error
    • more exception message for self.recognize
  • Use PYTHON_INTERPRETER python3 in ros_speech_recognition (#225)
  • Contributors: Kei Okada, Naoya Yamaguchi, Shingo Kitagawa

2.1.21 (2020-08-19)

2.1.20 (2020-08-07)

2.1.19 (2020-07-21)

2.1.18 (2020-07-20)

  • Fix for noetic (#200)
    • fix 2to3, with print, raise, exception
  • [ros_speech_recognition] Enable multi channel audio recognition (#198)
    • adjust type code to the CPU platform
    • replace rosparam name: channels -> n_channel
    • add rosparam description to README
    • enable multi channel audio recognition
  • Add args to ros_speech_recognition (#197)
    • Add flac as run_depend for SpeechRecognition pip package
    • Use catkin_virtualenv to use SpeechRecognition pip package
    • Add arguments and params to pass rostest
    • Add test for ros_speech_recognition
    • add args to launch
    • add pip install to tutorials
    • add param description to README
  • Contributors: Kei Okada, Naoya Yamaguchi

2.1.17 (2020-04-16)

2.1.16 (2020-04-16)

2.1.15 (2019-12-12)

2.1.14 (2019-11-21)

  • set SoundRequest.volume for kinetic (#173)
  • Contributors: Kei Okada

2.1.13 (2019-07-10)

2.1.12 (2019-05-25)

  • fixes GoogleCloud auth (#158)
  • Contributors: jonasius

2.1.11 (2018-08-29)

2.1.10 (2018-04-25)

2.1.9 (2018-04-24)

2.1.8 (2018-04-17)

2.1.7 (2018-04-09)

2.1.6 (2017-11-21)

2.1.5 (2017-11-20)

  • ros_speech_recognition: add continuous mode (#127)
  • ros_speech_recognition: add README (#123)
  • add ros_speech_recognition package (#121)
  • Contributors: Yuki Furuta

2.1.4 (2017-07-16)

2.1.3 (2017-07-07)

2.1.2 (2017-07-06)

2.1.1 (2017-07-05)

2.1.0 (2017-07-02)

2.0.20 (2017-05-09)

2.0.19 (2017-02-22)

2.0.18 (2016-10-28)

2.0.17 (2016-10-22)

2.0.16 (2016-10-17)

2.0.15 (2016-10-16)

2.0.14 (2016-03-20)

2.0.13 (2015-12-15)

2.0.12 (2015-11-26)

2.0.11 (2015-10-07 14:16)

2.0.10 (2015-10-07 12:47)

2.0.9 (2015-09-26)

2.0.8 (2015-09-15)

2.0.7 (2015-09-14)

2.0.6 (2015-09-08)

2.0.5 (2015-08-23)

2.0.4 (2015-08-18)

2.0.3 (2015-08-01)

2.0.2 (2015-06-29)

2.0.1 (2015-06-19 21:21)

2.0.0 (2015-06-19 10:41)

1.0.71 (2015-05-17)

1.0.70 (2015-05-08)

1.0.69 (2015-05-05 12:28)

1.0.68 (2015-05-05 09:49)

1.0.67 (2015-05-03)

1.0.66 (2015-04-03)

1.0.65 (2015-04-02)

1.0.64 (2015-03-29)

1.0.63 (2015-02-19)

1.0.62 (2015-02-17)

1.0.61 (2015-02-11)

1.0.60 (2015-02-03 10:12)

1.0.59 (2015-02-03 04:05)

1.0.58 (2015-01-07)

1.0.57 (2014-12-23)

1.0.56 (2014-12-17)

1.0.55 (2014-12-09)

1.0.54 (2014-11-15)

1.0.53 (2014-11-01)

1.0.52 (2014-10-23)

1.0.51 (2014-10-20 16:01)

1.0.50 (2014-10-20 01:50)

1.0.49 (2014-10-13)

1.0.48 (2014-10-12)

1.0.47 (2014-10-08)

1.0.46 (2014-10-03)

1.0.45 (2014-09-29)

1.0.44 (2014-09-26 09:17)

1.0.43 (2014-09-26 01:08)

1.0.42 (2014-09-25)

1.0.41 (2014-09-23)

1.0.40 (2014-09-19)

1.0.39 (2014-09-17)

1.0.38 (2014-09-13)

1.0.37 (2014-09-08)

1.0.36 (2014-09-01)

1.0.35 (2014-08-16)

1.0.34 (2014-08-14)

1.0.33 (2014-07-28)

1.0.32 (2014-07-26)

1.0.31 (2014-07-23)

1.0.30 (2014-07-15)

1.0.29 (2014-07-02)

1.0.28 (2014-06-24)

1.0.27 (2014-06-10)

1.0.26 (2014-05-30)

1.0.25 (2014-05-26)

1.0.24 (2014-05-24)

1.0.23 (2014-05-23)

1.0.22 (2014-05-22)

1.0.21 (2014-05-20)

1.0.20 (2014-05-09)

1.0.19 (2014-05-06)

1.0.18 (2014-05-04)

1.0.17 (2014-04-20)

1.0.16 (2014-04-19 23:29)

1.0.15 (2014-04-19 20:19)

1.0.14 (2014-04-19 12:52)

1.0.13 (2014-04-19 11:06)

1.0.12 (2014-04-18 16:58)

1.0.11 (2014-04-18 08:18)

1.0.10 (2014-04-17)

1.0.9 (2014-04-12)

1.0.8 (2014-04-11)

1.0.7 (2014-04-10)

1.0.6 (2014-04-07)

1.0.5 (2014-03-31)

1.0.4 (2014-03-29)

1.0.3 (2014-03-19)

1.0.2 (2014-03-12)

1.0.1 (2014-03-07)

1.0.0 (2014-03-05)

Wiki Tutorials

This package does not provide any links to tutorials in its rosindex metadata. You can check on the ROS Wiki Tutorials page for the package.

Launch files

  • launch/speech_recognition.launch
      • launch_sound_play [default: true] — Launch sound_play node to speak
      • launch_audio_capture [default: true] — Launch audio_capture node to publish audio topic from microphone
      • audio_topic [default: /audio] — Name of audio topic captured from microphone
      • voice_topic [default: /speech_to_text] — Name of text topic of recognized speech
      • n_channel [default: 1] — Number of channels of audio topic and microphone. Run '$ pactl list short sinks' to check your hardware.
      • depth [default: 16] — Bit depth of audio topic and microphone. Run '$ pactl list short sinks' to check your hardware.
      • sample_rate [default: 16000] — Frame rate of audio topic and microphone. Run '$ pactl list short sinks' to check your hardware.
      • device [default: ] — Card and device number of microphone (e.g. hw:0,0). You can check the card number and device number with '$ arecord -l', then use hw:[card number],[device number].
      • engine [default: Google] — Speech to text engine: Google, GoogleCloud, Sphinx, Wit, Bing, Houndify, IBM
      • language [default: en-US] — Speech to text language. For Japanese, set ja-JP.
      • continuous [default: true] — If false, the /speech_recognition service is advertised. If true, the /speech_to_text topic is published.
      • auto_start [default: true] — Whether speech_recognition starts automatically or not. This parameter takes effect only when continuous is true.
      • self_cancellation [default: true] — Whether to ignore audio while the robot is speaking.
      • tts_tolerance [default: 1.0] — Tolerance in seconds for deciding whether the robot is still speaking
      • tts_action_names [default: ['sound_play']] — TTS action names. Output from these action servers is ignored by speech recognition.
  • launch/parrotry.launch
      • use_google [default: true]
      • language [default: en-US]
      • confidence_threshold [default: 0.8]
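
The self_cancellation, tts_tolerance and tts_action_names arguments of speech_recognition.launch together describe a simple timing rule. The sketch below expresses that rule as a pure function under stated assumptions: should_ignore_audio is a hypothetical name, times are plain seconds, and the way the node actually tracks the state of the TTS actions may differ.

```python
def should_ignore_audio(now, tts_running, last_tts_end, tts_tolerance=1.0):
    """Return True if audio captured at time now should be discarded.

    tts_running says whether any watched TTS action is currently active,
    and last_tts_end is the time the most recent one finished (or None).
    """
    if tts_running:
        # The robot is speaking right now.
        return True
    if last_tts_end is None:
        # No TTS action has run yet.
        return False
    # Keep ignoring audio for tts_tolerance seconds after speech ends.
    return (now - last_tts_end) < tts_tolerance
```

A larger tts_tolerance reduces the chance of recognizing the tail of the robot's own speech, at the cost of missing a user who answers immediately.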

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.



Package Summary

Tags No category tags.
Version 2.1.29
License BSD
Build type CATKIN
Use RECOMMENDED

Repository Summary

Checkout URI https://github.com/jsk-ros-pkg/jsk_3rdparty.git
VCS Type git
VCS Version master
Last Updated 2025-01-09
Dev Status DEVELOPED
CI status Continuous Integration
Released RELEASED
Tags No category tags.
Contributing Help Wanted (0)
Good First Issues (0)
Pull Requests to Review (0)

Package Description

ROS wrapper for Python SpeechRecognition library

Additional Links

Maintainers

  • Yuki Furuta

Authors

  • Yuki Furuta

ros_speech_recognition

A ROS package for speech-to-text services.
This package uses Python package SpeechRecognition as a backend.

Tutorials

Normal tutorial

  1. Install this package and SpeechReconition
  sudo apt install ros-${ROS_DISTRO}-ros-speech-recognition
  
  1. Launch speech recognition node
  roslaunch ros_speech_recognition speech_recognition.launch
  
  1. Echo /speech_to_text
  rostopic echo /speech_to_text
  # you can get the recognition result
  

Parrotry tutorial

Parrotry mean オウム返し in Japanese

# english
roslaunch ros_speech_recognition parrotry.launch
# japanese
roslaunch ros_speech_recognition parrotry.launch language:=ja-JP

speech_recognition_node.py Interface

Publishing Topics

  • ~voice_topic (speech_recognition_msgs/SpeechRecognitionCandidates)

    Speech recognition candidates topic name.

    Topic name is set by parameter ~voice_topic, and default value is speech_to_text.

  • sound_play (sound_play/SoundRequestAction)

    Action client to play sound on events. If the action server is not available or ~enable_sound_effect is False, no sound is played.

Subscribing Topics

  • ~audio_topic (audio_common_msgs/AudioData)

    Audio stream data to be recognized.

    Topis name is set by parameter ~audio_topic and default value is audio.

Advertising Services

  • speech_recognition (speech_recognition_msgs/SpeechRecognition)

    Service for speech recognition

  • speech_recognition/start (std_srvs/Empty)

    Start service for speech recognition

    This service is available when parameter ~contiunous is True.

  • speech_recognition/start (std_srvs/Empty)

    Stop service for speech recognition

    This service is available when parameter ~contiunous is True.

Parameters

  • ~voice_topic (String, default: speech_to_text)

    Publishing voice topic name

  • ~audio_topic (String, default: audio)

    Subscribing audio topic name

  • ~enable_sound_effect (Bool, default: True)

    Flag to enable or disable sound to play sound on recognition.

  • ~language (String, default: en-US)

    Language to be recognized

  • ~engine (Enum[String], default: Google)

    Speech-to-text engine (To see full options use dynamic_reconfigure)

  • ~energy_threshold (Double, default: 300)

    Threshold for Voice activity detection

  • ~dynamic_energy_threshold (Bool, default: True)

    Adaptive estimation for energy_threshold

  • ~dynamic_energy_adjustment_damping (Double, default: 0.15)

    Damping threshold for dynamic VAD

  • ~dynamic_energy_ratio (Double, default: 1.5)

    Energy ratio for dynamic VAD

  • ~pause_threshold (Double, default: 0.8)

    Seconds of non-speaking audio before a phrase is considered complete

  • ~operation_timeout (Double, default: 0.0)

    Seconds after an internal operation (e.g., an API request) starts before it times out

  • ~listen_timeout (Double, default: 0.0)

    The maximum number of seconds that this will wait for a phrase to start before giving up

  • ~phrase_time_limit (Double, default: 10.0)

    The maximum number of seconds that this will allow a phrase to continue before stopping and returning the part of the phrase processed before the time limit was reached

  • ~phrase_threshold (Double, default: 0.3)

    Minimum seconds of speaking audio before we consider the speaking audio a phrase

  • ~non_speaking_duration (Double, default: 0.5)

    Seconds of non-speaking audio to keep on both sides of the recording

  • ~duration (Double, default: 10.0)

    Seconds of waiting for speech

  • ~depth (Int, default: 16)

    Bit depth of audio signal

  • ~n_channel (Int, default: 1)

    Total number of channels in audio data (e.g. 1: mono, 2: stereo)

  • ~sample_rate (Int, default: 16000)

    Sample rate of audio signal

  • ~buffer_size (Int, default: 10240)

    Maximum buffer size to store audio data for speech recognition

  • ~start_signal (String, default: /usr/share/sounds/freedesktop/stereo/bell.ogg)

    Path to sound file for bell on the start of audio capture

  • ~recognized_signal (String, default: /usr/share/sounds/freedesktop/stereo/message.ogg)

    Path to sound file for bell on the end of audio capture

  • ~success_signal (String, default: /usr/share/sounds/freedesktop/stereo/message-new-instant.ogg)

    Path to sound file for bell on getting successful recognition result

  • ~timeout_signal (String, default: /usr/share/sounds/freedesktop/stereo/network-connectivity-lost.ogg)

    Path to sound file for bell on timeout for recognition

  • ~continuous (Bool, default: False)

    Selects whether results are delivered by topic or by service. If True, results are published on the topic; if False (the default), the service is used.

  • ~auto_start (Bool, default: True)

    Start speech recognition when the node is launched.

  • ~self_cancellation (Bool, default: True)

    If True, the node does not recognize sound heard while any of the ~tts_action_names actions is running.

    This option keeps the robot's own voice out of the recognition results.

  • ~tts_action_names (List[String], default: ['sound_play'])

    Text-to-speech action names for self cancellation.

    The node ignores voice heard while any of these text-to-speech actions is running.

  • ~tts_tolerance (Float, default: 1.0)

    Tolerance in seconds for self cancellation.

    The node keeps ignoring voice for this many seconds after the ~tts_action_names actions finish running.

  • ~google_key (String, default: None)

    Auth key for Google API. If None, the public key is used (there is no guarantee that it will not be blocked).
    This is valid only if ~engine is Google.

  • ~google_cloud_credentials_json (String, default: None)

    Path to the credentials JSON file. JSK users can download it from the Google Drive link. This is valid only if ~engine is GoogleCloud.

  • ~google_cloud_preferred_phrases ([String], default: None)

    Preferred phrases parameters. This is valid only if ~engine is GoogleCloud.

  • ~bing_key (String, default: None)

    Auth key for Bing API.
    This is valid only if ~engine is Bing.

  • ~vosk_model_path (String, default: None)

    Path to trained model for Vosk API. This is valid only if ~engine is Vosk.

    If en-US or ja is selected as ~language, you do not need to specify the path. To load other models, please download them from Model list.
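
The voice activity parameters (~energy_threshold, ~dynamic_energy_threshold, ~dynamic_energy_adjustment_damping and ~dynamic_energy_ratio) interact, which a small numeric sketch can make concrete. The code below assumes the backend applies a damped moving average that pulls the threshold toward the measured chunk energy multiplied by the ratio, and that a chunk counts as speech when its energy exceeds the threshold. The function names and the chunk size of 1024 samples are illustrative assumptions, not taken from this package.

```python
def update_energy_threshold(threshold, chunk_energy, seconds_per_chunk,
                            damping=0.15, ratio=1.5):
    """Move the threshold toward ratio * chunk_energy.

    The damping value is expressed per second, so it is rescaled to the
    duration of one audio chunk before it is applied.
    """
    chunk_damping = damping ** seconds_per_chunk
    target = chunk_energy * ratio
    return threshold * chunk_damping + target * (1.0 - chunk_damping)


def is_speech(chunk_energy, threshold):
    """A chunk is treated as speech when its energy exceeds the threshold."""
    return chunk_energy > threshold


threshold = 300.0                    # default ~energy_threshold
seconds_per_chunk = 1024 / 16000.0   # assumed chunk size at ~sample_rate 16000

# Feed about one second of quiet ambient noise with energy 100.
for _ in range(15):
    threshold = update_energy_threshold(threshold, 100.0, seconds_per_chunk)

print("threshold after quiet input: %.1f" % threshold)
```

With quiet input the threshold drifts down from 300 toward 150 (the ambient energy times the ratio), so softer speech is detected in a quiet room. With ~dynamic_energy_threshold set to False the threshold would stay at its configured value.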

Author

Yuki Furuta «furushchev@jsk.imi.i.u-tokyo.ac.jp»

CHANGELOG

Changelog for package ros_speech_recognition

2.1.29 (2025-01-05)

  • [doc] fix typo in jsk_3rdparty/ros_speech_recognition/README.md (#499)
  • Contributors: Yukina Iwata

2.1.28 (2023-07-24)

2.1.27 (2023-06-24)

  • fix package.xml/CMakeLists.txt to suppress catkin_lint errors (#479)
  • Contributors: Kei Okada

2.1.26 (2023-06-14)

  • add LICENSE files (#476)
  • Contributors: Kei Okada

2.1.25 (2023-06-08)

  • [ros_speech_recognition] Add vosk engine (#474)
  • Pr/use sound themes freedesktop (#472)
  • add test to check if ros node is loadable (#463)
  • add self.conf_thresh in __init__ function (#457)
  • [ros_speech_recognition] add ubuntu-sounds dependency (#453)
  • [ros_speech_recognition] Return if result is empty (#443)
  • [ros_speech_recognition] Set confidence value of google (#434)
  • [ros_speech_recognition] add parrotry.launch (#414)
  • [ros_speech_recognition] update default arg for speech_recognition.launch (#412)
  • [ros_speech_recognition, respeaker_ros] add confidence field (#411)
  • [ros_speech_recognition] add self cancellation for speech recognition (#413)
  • [#405 and #410] Fix CI (#415)
  • add ROS interface for https://cloud.google.com/natural-language (#304)
  • GithubAction: add test for aarch64(melodic) / indigo (arm64) (#365)
    • pgm_learner/respeaker_ros/ros_speech_recognition/rosping: increase time-limit/wait-time
  • Explicit python interpreter in catkin_virtualenv (#367)
  • .github/workflow: integrate all yaml to one (#338)
  • [ros_speech_recognition] Fixed the behavior of launch file (#336)
  • [ros_speech_recognition] add auto_start in speech_recognition_node.py (#301)
  • [ros_speech_recognition] add SpeechRecognitionCandidatesToString node (#303)
  • Enable sound play flag (#315)
  • Contributors: Aiko Ichikura, Aoi Nakane, Kei Okada, Koki Shinjo, Naoto Tsukamoto, Naoya Yamaguchi, Shingo Kitagawa, Yoshiki Obinata, Iory Yanokura

2.1.24 (2021-07-26)

2.1.23 (2021-07-21)

2.1.22 (2021-06-10)

  • enable to change topic name from speech_recognition.launch (#254)
  • support SpeakerDiarization, see https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#SpeechRecognitionAlternative (#244)
    • [ros_speech_recognition] Add doc to speech_recognition.launch: add doc to args, and use rosparam for device instead of param, because 'device: ' causes the error load_parameters: unable to set parameters (last param was [/speech_recognition/depth=16]): cannot marshal None unless allow_none is enabled
    • more exception message for self.recognize
  • Use PYTHON_INTERPRETER python3 in ros_speech_recognition (#225)
  • Contributors: Kei Okada, Naoya Yamaguchi, Shingo Kitagawa

2.1.21 (2020-08-19)

2.1.20 (2020-08-07)

2.1.19 (2020-07-21)

2.1.18 (2020-07-20)

  • Fix for noetic (#200)
    • fix 2to3, with print, raise, exception
  • [ros_speech_recognition] Enable multi channel audio recognition (#198)
    • adjust type code to the CPU platform
    • replace rosparam name: channels -> n_channel
    • add rosparam description to README
    • enable multi channel audio recognition
  • Add args to ros_speech_recognition (#197)
    • Add flac as run_depend for SpeechRecognition pip package
    • Use catkin_virtualenv to use SpeechRecognition pip package
    • Add arguments and params to pass rostest
    • Add test for ros_speech_recognition
    • add args to launch
    • add pip install to tutorials
    • add param description to README
  • Contributors: Kei Okada, Naoya Yamaguchi

2.1.17 (2020-04-16)

2.1.16 (2020-04-16)

2.1.15 (2019-12-12)

2.1.14 (2019-11-21)

  • set SoundRequest.volume for kinetic (#173)
  • Contributors: Kei Okada

2.1.13 (2019-07-10)

2.1.12 (2019-05-25)

  • fixes GoogleCloud auth (#158)
  • Contributors: jonasius

2.1.11 (2018-08-29)

2.1.10 (2018-04-25)

2.1.9 (2018-04-24)

2.1.8 (2018-04-17)

2.1.7 (2018-04-09)

2.1.6 (2017-11-21)

2.1.5 (2017-11-20)

  • ros_speech_recognition: add continuous mode (#127)
  • ros_speech_recognition: add README (#123)
  • add ros_speech_recognition package (#121)
  • Contributors: Yuki Furuta

2.1.4 (2017-07-16)

2.1.3 (2017-07-07)

2.1.2 (2017-07-06)

2.1.1 (2017-07-05)

2.1.0 (2017-07-02)

2.0.20 (2017-05-09)

2.0.19 (2017-02-22)

2.0.18 (2016-10-28)

2.0.17 (2016-10-22)

2.0.16 (2016-10-17)

2.0.15 (2016-10-16)

2.0.14 (2016-03-20)

2.0.13 (2015-12-15)

2.0.12 (2015-11-26)

2.0.11 (2015-10-07 14:16)

2.0.10 (2015-10-07 12:47)

2.0.9 (2015-09-26)

2.0.8 (2015-09-15)

2.0.7 (2015-09-14)

2.0.6 (2015-09-08)

2.0.5 (2015-08-23)

2.0.4 (2015-08-18)

2.0.3 (2015-08-01)

2.0.2 (2015-06-29)

2.0.1 (2015-06-19 21:21)

2.0.0 (2015-06-19 10:41)

1.0.71 (2015-05-17)

1.0.70 (2015-05-08)

1.0.69 (2015-05-05 12:28)

1.0.68 (2015-05-05 09:49)

1.0.67 (2015-05-03)

1.0.66 (2015-04-03)

1.0.65 (2015-04-02)

1.0.64 (2015-03-29)

1.0.63 (2015-02-19)

1.0.62 (2015-02-17)

1.0.61 (2015-02-11)

1.0.60 (2015-02-03 10:12)

1.0.59 (2015-02-03 04:05)

1.0.58 (2015-01-07)

1.0.57 (2014-12-23)

1.0.56 (2014-12-17)

1.0.55 (2014-12-09)

1.0.54 (2014-11-15)

1.0.53 (2014-11-01)

1.0.52 (2014-10-23)

1.0.51 (2014-10-20 16:01)

1.0.50 (2014-10-20 01:50)

1.0.49 (2014-10-13)

1.0.48 (2014-10-12)

1.0.47 (2014-10-08)

1.0.46 (2014-10-03)

1.0.45 (2014-09-29)

1.0.44 (2014-09-26 09:17)

1.0.43 (2014-09-26 01:08)

1.0.42 (2014-09-25)

1.0.41 (2014-09-23)

1.0.40 (2014-09-19)

1.0.39 (2014-09-17)

1.0.38 (2014-09-13)

1.0.37 (2014-09-08)

1.0.36 (2014-09-01)

1.0.35 (2014-08-16)

1.0.34 (2014-08-14)

1.0.33 (2014-07-28)

1.0.32 (2014-07-26)

1.0.31 (2014-07-23)

1.0.30 (2014-07-15)

1.0.29 (2014-07-02)

1.0.28 (2014-06-24)

1.0.27 (2014-06-10)

1.0.26 (2014-05-30)

1.0.25 (2014-05-26)

1.0.24 (2014-05-24)

1.0.23 (2014-05-23)

1.0.22 (2014-05-22)

1.0.21 (2014-05-20)

1.0.20 (2014-05-09)

1.0.19 (2014-05-06)

1.0.18 (2014-05-04)

1.0.17 (2014-04-20)

1.0.16 (2014-04-19 23:29)

1.0.15 (2014-04-19 20:19)

1.0.14 (2014-04-19 12:52)

1.0.13 (2014-04-19 11:06)

1.0.12 (2014-04-18 16:58)

1.0.11 (2014-04-18 08:18)

1.0.10 (2014-04-17)

1.0.9 (2014-04-12)

1.0.8 (2014-04-11)

1.0.7 (2014-04-10)

1.0.6 (2014-04-07)

1.0.5 (2014-03-31)

1.0.4 (2014-03-29)

1.0.3 (2014-03-19)

1.0.2 (2014-03-12)

1.0.1 (2014-03-07)

1.0.0 (2014-03-05)

Wiki Tutorials

This package does not provide any links to tutorials in its rosindex metadata. You can check on the ROS Wiki Tutorials page for the package.

Launch files

  • launch/speech_recognition.launch
      • launch_sound_play [default: true] — Launch sound_play node to speak
      • launch_audio_capture [default: true] — Launch audio_capture node to publish audio topic from microphone
      • audio_topic [default: /audio] — Name of audio topic captured from microphone
      • voice_topic [default: /speech_to_text] — Name of text topic of recognized speech
      • n_channel [default: 1] — Number of channels of audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
      • depth [default: 16] — Bit depth of audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
      • sample_rate [default: 16000] — Frame rate of audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
      • device [default: ] — Card and device number of microphone (e.g. hw:0,0). Check the card number and device number with '$ arecord -l', then use hw:[card number],[device number]
      • engine [default: Google] — Speech-to-text engine. Options: Google, GoogleCloud, Sphinx, Wit, Bing, Houndify, IBM
      • language [default: en-US] — Speech to text language. For Japanese, set ja-JP.
      • continuous [default: true] — If false, the /speech_recognition service is advertised. If true, the /speech_to_text topic is published.
      • auto_start [default: true] — Whether speech_recognition starts automatically or not. This parameter works when continuous is true
      • self_cancellation [default: true] — If true, audio captured while the robot is speaking is not recognized.
      • tts_tolerance [default: 1.0] — Tolerance in seconds when deciding whether the robot is still speaking
      • tts_action_names [default: ['sound_play']] — TTS action names. Audio heard while these action servers are active is ignored by speech recognition
  • launch/parrotry.launch
      • use_google [default: true]
      • language [default: en-US]
      • confidence_threshold [default: 0.8]
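
parrotry.launch takes a confidence_threshold argument, and the changelog notes that recognition results carry a confidence field. How the launch file applies the threshold is not described here. The sketch below assumes that candidates scoring below the threshold are discarded and that a result provides parallel lists of transcripts and confidences. The helper name best_transcript is hypothetical.

```python
def best_transcript(transcripts, confidences, threshold=0.8):
    """Return the most confident transcript that reaches the threshold.

    Returns None when no candidate qualifies, including the case where the
    engine supplied no confidence values at all.
    """
    best = None
    best_conf = threshold
    for text, conf in zip(transcripts, confidences):
        if conf >= best_conf:
            best, best_conf = text, conf
    return best


print(best_transcript(["hello", "hollow"], [0.93, 0.41]))
print(best_transcript(["hello"], [0.5]))
```

In a subscriber callback for the voice topic, the two lists would come from the received message, for example msg.transcript and msg.confidence if the message type exposes those fields.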

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.
