No released version of this package exists for any ROS distro; the information below is taken from the GitHub repository.

rosgpt_vision package from rosgpt_vision repo

rosgpt_vision

ROS Distro: github

Package Summary

Tags: No category tags.
Version: 0.0.0
License: TODO: License declaration
Build type: AMENT_PYTHON
Use: RECOMMENDED

Repository Summary

Description: Commanding robots using only Language Models' prompts
Checkout URI: https://github.com/bilel-bj/rosgpt_vision.git
VCS Type: git
VCS Version: main
Last Updated: 2025-02-16
Dev Status: UNKNOWN
Released: UNRELEASED
Tags: robotics language-models ros2 robotic-vision large-language-models llm prompt-engineering chatgpt language-models-are-next robotic-design-patterns prompting-robotic-modalities visual-language-models
Contributing: Help Wanted (-), Good First Issues (-), Pull Requests to Review (-)

Package Description

TODO: Package description

Additional Links

No additional links.

Maintainers

  • bbenjdira

Authors

No additional authors.

ROSGPT_Vision: Commanding Robots Using Only Language Models’ Prompts

Bilel Benjdira, Anis Koubaa, and Anas M. Ali (arXiv | YouTube)

Robotics and Internet of Things Lab (RIOTU Lab), Prince Sultan University, Saudi Arabia

Inspired by ROSGPT. Both projects aim to bridge the gap between robotics, natural language understanding, and image analysis.

Collaborators who want to participate in this project are very welcome.


  • ROSGPT_Vision is a new robotic framework designed to command robots using only two prompts:
    • a Visual Prompt (for visual semantic features), and
    • an LLM Prompt (to regulate robotic reactions).
  • It is based on a new robotic design pattern: Prompting Robotic Modalities (PRM).
  • ROSGPT_Vision is used to develop CarMate, a robotic application for monitoring driver distractions and providing real-time vocal notifications. It showcases cost-effective development.
  • We demonstrated how to optimize the prompting strategies to improve the application.
  • The LangChain framework is used to easily customize prompts.
  • More details are described in the academic paper “ROSGPT_Vision: Commanding Robots using only Language Models’ Prompts”.

Video Demo

An illustrative video demonstration of ROSGPT_Vision is provided: ROSGPT Video Demonstration


Overview

ROSGPT_Vision offers a unified platform that allows robots to perceive, interpret, and interact with visual data through natural language. The framework leverages state-of-the-art language models, including LLAVA, MiniGPT-4, and Caption-Anything, to facilitate advanced reasoning about image data. LangChain is used for easy customization of the prompts. The provided implementation includes the CarMate application, a driver monitoring and assistance system designed to ensure safe and efficient driving experiences.

ROSGPT_Vision diagram

Prompting Robotic Modalities (PRM) Design Pattern

  • A new design approach emphasizing modular and individualized sensory queries.
  • Uses specific Modality Language Models (MLM) for textual interpretations of inputs, like the Vision Language Model (VLM) for visual data.
  • Ensures precise data collection by treating each sensory input separately.
  • Task Modality’s Role: Serves as the central coordinator, synthesizing data from various modalities.

For more information, see the arXiv paper. A minimal illustrative sketch of the pattern follows.
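As a rough, non-authoritative sketch of the PRM idea (the class and function names below are illustrative assumptions, not the framework's actual API), a Task Modality can be pictured as a coordinator that queries one Modality Language Model per sensor and fuses their textual outputs before handing them to an LLM:

```python
from typing import Callable, Dict

# Hypothetical illustration of the PRM design pattern: each sensory modality is
# wrapped by a "Modality Language Model" (MLM) that turns raw input into text,
# and a Task Modality coordinates them. All names here are assumptions.

class TaskModality:
    """Central coordinator: queries each MLM separately, then reasons over text."""

    def __init__(self, mlms: Dict[str, Callable[[object], str]],
                 llm: Callable[[str], str]):
        self.mlms = mlms   # e.g. {"vision": vlm_describe} for a Vision Language Model
        self.llm = llm     # language model that regulates the robotic reaction

    def step(self, inputs: Dict[str, object]) -> str:
        # Query every sensory modality individually (modular, per-modality prompting).
        descriptions = {name: self.mlms[name](data) for name, data in inputs.items()}
        # Synthesize the textual interpretations into a single LLM prompt.
        fused = "\n".join(f"[{name}] {text}" for name, text in descriptions.items())
        return self.llm(fused)

# Toy usage with stand-in models (ROSGPT_Vision itself would call a real VLM such as LLaVA).
if __name__ == "__main__":
    vlm = lambda image: "The driver is looking at a phone instead of the road."
    llm = lambda prompt: "Please put the phone down and keep your eyes on the road."
    coordinator = TaskModality({"vision": vlm}, llm)
    print(coordinator.step({"vision": "camera_frame_placeholder"}))
```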

CarMate Application

CarMate is a complete application for monitoring driver behavior that was developed simply by setting two prompts in the YAML configuration file. It automatically analyses the input video using the Visual prompt, determines what should be done using the LLM prompt, and gives the driver an instant alert when needed.

These are the prompts used to develop the application, without needing extra code:

The Visual prompt:

Visual prompt: "Describe the driver’s current level of focus 
 	on driving based on the visual cues, Answer with one short sentence."

The LLM prompt:

LLM prompt:"Consider the following ontology: You must write your Reply 
 	with one short sentence. Behave as a carmate that surveys the driver 
  	and gives him advice and instruction to drive safely. You will be given 
   	human language prompts describing an image. Your task is to provide 
	appropriate instructions to the driver based on the description."
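
As noted above, LangChain is used to customize the prompts. The following is only a hedged sketch (the variable name `description`, the template wiring, and the sample description are assumptions for illustration, not ROSGPT_Vision's actual code) of how the CarMate LLM prompt could be expressed as a LangChain `PromptTemplate` and filled with the description produced by the Visual prompt:

```python
from langchain.prompts import PromptTemplate

# Illustrative sketch only: wrapping the CarMate LLM prompt in a LangChain
# PromptTemplate. The "description" variable and this wiring are assumptions.
carmate_template = PromptTemplate(
    input_variables=["description"],
    template=(
        "Consider the following ontology: You must write your Reply with one "
        "short sentence. Behave as a carmate that surveys the driver and gives "
        "him advice and instruction to drive safely. You will be given human "
        "language prompts describing an image. Your task is to provide "
        "appropriate instructions to the driver based on the description.\n\n"
        "Description: {description}"
    ),
)

# Fill the template with a sample output of the Visual prompt and send it to an LLM.
prompt_text = carmate_template.format(
    description="The driver is looking at a phone instead of the road."
)
print(prompt_text)
```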

Below are three example scenarios captured during driving:

Scenario 1: The driver is using phone

The top box shows the description generated by the image semantics module for the input image using the Visual prompt, while the second box shows the alert generated for the driver using the LLM prompt.

Scenario 2: The driver is taking pictures

Scenario 3: The driver is drinking

Installation

To use ROSGPT_Vision, follow these steps:

1. Prepare the code and the environment

Clone our repository, then create a Python environment and activate it via the following commands:

```bash
git clone https://github.com/bilel-bj/ROSGPT_Vision.git
```
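
The remaining setup commands are cut off by the truncation note below. As a hedged sketch only (the environment name, Python version, and the use of conda are assumptions, not necessarily what the repository's full README specifies), the environment-creation step described above could look like:

```bash
# Illustrative sketch: create and activate a Python environment for ROSGPT_Vision.
# The environment name and Python version are assumptions for illustration.
cd ROSGPT_Vision
conda create -n rosgpt_vision python=3.10 -y
conda activate rosgpt_vision
```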

File truncated at 100 lines; see the full file on GitHub.

CHANGELOG
No CHANGELOG found.

Dependant Packages

No known dependants.

Launch files

No launch files found.

Messages

No message files found.

Services

No service files found.

Plugins

No plugins found.

Recent questions tagged rosgpt_vision at Robotics Stack Exchange
