Repository Summary

| Field | Value |
|---|---|
| Description | ASR/NLP/TTS deep learning inference library for NVIDIA Jetson using PyTorch and TensorRT |
| Checkout URI | https://github.com/dusty-nv/jetson-voice.git |
| VCS Type | git |
| VCS Version | master |
| Last Updated | 2022-03-10 |
| Dev Status | UNKNOWN |
| Released | UNRELEASED |
| Tags | nlp text-to-speech deep-learning pytorch speech-recognition jetson tensorrt jetson-nano |
| Contributing | Help Wanted (-), Good First Issues (-), Pull Requests to Review (-) |

Packages

| Name | Version |
|---|---|
| jetson_voice_ros | 0.0.0 |
README
jetson-voice
jetson-voice is an ASR/NLP/TTS deep learning inference library for Jetson Nano, TX1/TX2, Xavier NX, and AGX Xavier. It supports Python and JetPack 4.4.1 or newer. The DNN models were trained with NeMo and deployed with TensorRT for optimized performance. All computation is performed using the onboard GPU.
Currently the included capabilities cover automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS).
The NLP models use the DistilBERT transformer architecture for reduced memory usage and increased performance. For samples of the text-to-speech output, see the TTS Audio Samples section below.
Running the Container
jetson-voice is distributed as a Docker container due to its number of dependencies. Pre-built container images are available on DockerHub for JetPack 4.4.1 and newer:
```bash
dustynv/jetson-voice:r32.4.4   # JetPack 4.4.1 (L4T R32.4.4)
dustynv/jetson-voice:r32.5.0   # JetPack 4.5 (L4T R32.5.0) / JetPack 4.5.1 (L4T R32.5.1)
dustynv/jetson-voice:r32.6.1   # JetPack 4.6 (L4T R32.6.1)
dustynv/jetson-voice:r32.7.1   # JetPack 4.6.1 (L4T R32.7.1)
```
To download and run the container, you can simply clone this repo and use the `docker/run.sh` script:

```bash
$ git clone --branch dev https://github.com/dusty-nv/jetson-voice
$ cd jetson-voice
$ docker/run.sh
```

> note: if you want to use a USB microphone or speaker, plug it in before you start the container
There are some optional arguments to `docker/run.sh` that you can use:

- `-r` (`--run`) specifies a run command; otherwise the container starts in an interactive shell.
- `-v` (`--volume`) mounts a directory from the host into the container (`/host/path:/container/path`).
- `--dev` starts the container in development mode, where all the source files are mounted for easy editing.

The run script automatically mounts the `data/` directory into the container, which stores the models and other data files. Files you save there from inside the container will also show up under `data/` on the host.
Automatic Speech Recognition (ASR)
The speech recognition in jetson-voice is a streaming service, so it's intended to be used on live sources and transcribes the audio in 1-second chunks. It uses a QuartzNet-15x5 model followed by a CTC beamsearch decoder and language model to further refine the raw output of the network. It detects breaks in the audio to determine the ends of sentences. For information about the ASR APIs, refer to `jetson_voice/asr.py`, and see `examples/asr.py` for example usage.

After you start the container, first run a test audio file (wav/ogg/flac) through `examples/asr.py` to verify that the system is functional. Run this command (and all subsequent commands) inside the container:
```bash
$ examples/asr.py --wav data/audio/dusty.wav

hi
hi hi this is dust
hi hi this is dusty check
hi hi this is dusty check one two
hi hi this is dusty check one two three
hi hi this is dusty check one two three.
what's the weather or
what's the weather going to be tomorrow
what's the weather going to be tomorrow in pittsburgh
what's the weather going to be tomorrow in pittsburgh.
today is
today is wednesday
today is wednesday tomorrow is thursday
today is wednesday tomorrow is thursday.
i would like
i would like to order a large
i would like to order a large pepperoni pizza
i would like to order a large pepperoni pizza.
is it going to be
is it going to be cloudy tomorrow.
```
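The incremental transcripts above come from feeding the decoder fixed 1-second windows of audio. As a rough, self-contained sketch of that chunking step (assuming 16 kHz mono input; `chunk_audio` is an illustrative helper, not part of the jetson-voice API):

```python
# Illustration of splitting an audio stream into fixed 1-second chunks,
# as a streaming ASR service does. chunk_audio() is a hypothetical helper;
# the 16 kHz rate is an assumption typical of QuartzNet-style models.

SAMPLE_RATE = 16000  # samples per second (assumed)

def chunk_audio(samples, sample_rate=SAMPLE_RATE, chunk_seconds=1.0):
    """Yield fixed-size chunks; the final chunk is zero-padded to full length."""
    chunk_size = int(sample_rate * chunk_seconds)
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        if len(chunk) < chunk_size:
            chunk = chunk + [0.0] * (chunk_size - len(chunk))  # pad last chunk
        yield chunk

if __name__ == "__main__":
    audio = [0.1] * 40000                 # 2.5 s of dummy audio at 16 kHz
    chunks = list(chunk_audio(audio))
    print(len(chunks), len(chunks[-1]))   # prints: 3 16000
```

Each chunk would then be passed to the model, which is why the transcript grows roughly one second of words at a time.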
The first time you run each model, TensorRT will take a few minutes to optimize it.
This optimized model is then cached to disk, so the next time you run the model it will load faster.
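The build-once, cache-to-disk pattern can be sketched with a small stand-in for the TensorRT optimization step (all names here are hypothetical; jetson-voice's actual caching is internal to the library):

```python
import hashlib
import os
import tempfile

def cached_build(model_name, build_engine, cache_dir):
    """Return (engine_bytes, cache_hit). build_engine stands in for the
    slow TensorRT optimization; its output is serialized to disk so
    subsequent runs load the cached copy instead of rebuilding."""
    key = hashlib.sha256(model_name.encode()).hexdigest()[:16]
    path = os.path.join(cache_dir, f"{model_name}.{key}.engine")
    if os.path.exists(path):
        with open(path, "rb") as f:        # cache hit: fast load
            return f.read(), True
    engine = build_engine(model_name)      # cache miss: slow build
    with open(path, "wb") as f:
        f.write(engine)
    return engine, False

if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    slow_build = lambda name: (name + "-engine").encode()
    _, hit = cached_build("quartznet", slow_build, cache_dir)
    print(hit)   # prints: False (first run builds and caches)
    _, hit = cached_build("quartznet", slow_build, cache_dir)
    print(hit)   # prints: True (second run loads from disk)
```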
Live Microphone
To test the ASR on a mic, first list the audio devices in your system to get the audio device IDs:

```bash
$ scripts/list_audio_devices.sh

Audio Input Devices
----------------------------------------
Input Device ID 1 - 'tegra-snd-t210ref-mobile-rt565x: - (hw:1,0)' (inputs=16) (sample_rate=44100)
Input Device ID 2 - 'tegra-snd-t210ref-mobile-rt565x: - (hw:1,1)' (inputs=16) (sample_rate=44100)
Input Device ID 3 - 'tegra-snd-t210ref-mobile-rt565x: - (hw:1,2)' (inputs=16) (sample_rate=44100)
Input Device ID 4 - 'tegra-snd-t210ref-mobile-rt565x: - (hw:1,3)' (inputs=16) (sample_rate=44100)
```
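If you want to pick a device ID programmatically rather than by eye, the listing format above is regular enough to parse. A hedged helper (not part of jetson-voice; the line format is assumed from the sample output above):

```python
import re

# Matches lines like:
# Input Device ID 1 - 'tegra-snd-...: - (hw:1,0)' (inputs=16) (sample_rate=44100)
DEVICE_RE = re.compile(
    r"Input Device ID (\d+) - '(.+)' \(inputs=(\d+)\) \(sample_rate=(\d+)\)"
)

def parse_devices(listing):
    """Return a list of (device_id, name, inputs, sample_rate) tuples."""
    return [
        (int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4)))
        for m in DEVICE_RE.finditer(listing)
    ]
```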
*README truncated at 100 lines; see the repository for the full file.*