Repository Summary
| Field | Value |
|---|---|
| Description | llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2 |
| Checkout URI | https://github.com/mgonzs13/llama_ros.git |
| VCS Type | git |
| VCS Version | main |
| Last Updated | 2025-07-20 |
| Dev Status | UNKNOWN |
| Released | UNRELEASED |
| Tags | audio, cpp, embeddings, llama, gpt, ros2, vlm, reranking, multimodal, llm, langchain, llava, llamacpp, ggml, gguf, rerank, llavacpp |
| Contributing | Help Wanted (-), Good First Issues (-), Pull Requests to Review (-) |
Packages
| Name | Version |
|---|---|
| llama_bringup | 5.3.1 |
| llama_bt | 1.0.0 |
| llama_cli | 5.3.1 |
| llama_cpp_vendor | 5.3.1 |
| llama_demos | 5.3.1 |
| llama_hfhub_vendor | 5.3.1 |
| llama_msgs | 5.3.1 |
| llama_ros | 5.3.1 |
README
llama_ros
This repository provides a set of ROS 2 packages that integrate llama.cpp into ROS 2. Using the llama_ros packages, you can incorporate the powerful optimization capabilities of llama.cpp into your ROS 2 projects by running GGUF-based LLMs and VLMs. You can also use features from llama.cpp such as GBNF grammars and modify LoRAs in real time.
Related Projects
- chatbot_ros → This chatbot, integrated into ROS 2, uses whisper_ros to listen to people's speech and llama_ros to generate responses. The chatbot is controlled by a state machine created with YASMIN.
- explainable_ros → A ROS 2 tool to explain the behavior of a robot. Using the LangChain integration, logs are stored in a vector database; RAG is then applied to retrieve the logs relevant to a user's question, which is answered with llama_ros.
Installation
To run llama_ros with CUDA, first install the CUDA Toolkit. Then, compile llama_ros with `--cmake-args -DGGML_CUDA=ON` to enable CUDA support.
```shell
cd ~/ros2_ws/src
git clone https://github.com/mgonzs13/llama_ros.git
pip3 install -r llama_ros/requirements.txt
cd ~/ros2_ws
rosdep install --from-paths src --ignore-src -r -y
colcon build --cmake-args -DGGML_CUDA=ON # add this for CUDA
```
Docker
Build the llama_ros Docker image or download one from DockerHub. You can choose to build llama_ros with CUDA (`USE_CUDA`) and choose the CUDA version (`CUDA_VERSION`). Remember that you have to use `DOCKER_BUILDKIT=0` to compile llama_ros with CUDA when building the image.

```shell
DOCKER_BUILDKIT=0 docker build -t llama_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .
```

Run the Docker container. If you want to use CUDA, you have to install the NVIDIA Container Toolkit and add `--gpus all`.

```shell
docker run -it --rm --gpus all llama_ros
```
Usage
llama_cli
llama_ros includes commands to speed up testing GGUF-based LLMs within the ROS 2 ecosystem. The following commands are integrated into the ROS 2 CLI:
launch
This command launches an LLM from a YAML file. The YAML configuration launches the LLM in the same way as a regular launch file. Here is an example of how to use it:
```shell
ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/StableLM-Zephyr.yaml
```
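The referenced YAML describes the model to download from HuggingFace and its runtime parameters. As a rough illustration (the keys and values below are assumptions; consult the files under llama_bringup/models for the actual schema and shipped examples), such a file might look like:

```yaml
# Illustrative launch config (keys and values are assumptions; see
# llama_bringup/models for the real schema and shipped examples).
n_ctx: 2048                                       # context window size
n_batch: 8                                        # batch size for prompt processing
n_gpu_layers: 0                                   # layers to offload to the GPU (CUDA build)
n_predict: 2048                                   # maximum tokens to generate

model_repo: "TheBloke/stablelm-zephyr-3b-GGUF"    # hypothetical HuggingFace repo
model_filename: "stablelm-zephyr-3b.Q5_K_M.gguf"  # hypothetical GGUF file
```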
prompt
This command sends a prompt to a launched LLM. It takes a string, which is the prompt, and supports the following arguments:

- `-r`, `--reset`: whether to reset the LLM before prompting
- `-t`, `--temp`: the temperature value
- `--image-url`: image URL to send to a VLM
Here is an example of how to use it:
```shell
ros2 llama prompt "Do you know ROS 2?" -t 0.0
```
Launch Files
First of all, you need to create a launch file to use llama_ros or llava_ros. This launch file contains the main parameters to download the model from HuggingFace and configure it. Take a look at the following example and the predefined launch files.
llama_ros (Python Launch)
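As a starting point, here is a minimal sketch of such a Python launch file, assuming the `create_llama_launch` helper from `llama_bringup.utils`; the parameter values and the model repository/filename are illustrative, so check the predefined launch files in llama_bringup for the real options:

```python
# Minimal illustrative launch file (parameter values and model names are
# assumptions; see llama_bringup's predefined launch files for real examples).
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch(
            n_ctx=2048,      # context window size
            n_batch=8,       # batch size for prompt processing
            n_gpu_layers=0,  # layers to offload to the GPU (CUDA build)
            n_threads=4,     # CPU threads used for inference
            n_predict=2048,  # maximum tokens to generate
            model_repo="TheBloke/stablelm-zephyr-3b-GGUF",    # hypothetical repo
            model_filename="stablelm-zephyr-3b.Q5_K_M.gguf",  # hypothetical file
        )
    ])
```

You can then run it with `ros2 launch` like any other launch file.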