
Package Summary

Version 0.4.0
License Apache-2.0
Build type AMENT_CMAKE
Use RECOMMENDED

Repository Summary

Checkout URI https://github.com/selfpatch/ros2_medkit.git
VCS Type git
VCS Version main
Last Updated 2026-03-22
Dev Status DEVELOPED
Released RELEASED

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

  • bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default (confirmation_threshold = -1), faults are confirmed immediately when reported; no additional configuration is needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

| Service | Type | Description |
|---|---|---|
| ~/report_fault | ros2_medkit_msgs/srv/ReportFault | Report a fault occurrence |
| ~/list_faults | ros2_medkit_msgs/srv/ListFaults | Query faults with filtering |
| ~/clear_fault | ros2_medkit_msgs/srv/ClearFault | Clear/acknowledge a fault |
| ~/get_snapshots | ros2_medkit_msgs/srv/GetSnapshots | Get topic snapshots for a fault |
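
The `~/` prefix in the service names is the ROS 2 private-name shorthand: it expands to the node's fully qualified name, which is why the Quick Start commands call `/fault_manager/report_fault`. A minimal sketch of that expansion, assuming the node is named `fault_manager` (as the Quick Start paths suggest); the `/robot1` namespace is a hypothetical example:

```python
def resolve_private_name(name, node_name, namespace="/"):
    """Expand a leading '~' to the node's fully qualified name."""
    if not name.startswith("~"):
        return name  # already absolute or relative; nothing to expand
    ns = namespace.rstrip("/")  # "/" becomes "", "/robot1" stays "/robot1"
    rest = name[1:]             # "~/report_fault" becomes "/report_fault"
    return f"{ns}/{node_name}{rest}"


if __name__ == "__main__":
    # Default (root) namespace, as in the Quick Start commands.
    print(resolve_private_name("~/report_fault", "fault_manager"))
    # Hypothetical custom namespace.
    print(resolve_private_name("~/report_fault", "fault_manager", "/robot1"))
```

When the fault manager runs in a custom namespace, the service paths used in the Quick Start commands change accordingly.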

Features

  • Multi-source aggregation: Same fault_code from different sources creates a single fault
  • Occurrence tracking: Counts total reports and tracks all reporting sources
  • Severity escalation: Fault severity is updated if a higher severity is reported
  • Persistent storage: SQLite backend ensures faults survive node restarts
  • Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
  • Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
  • Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
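
The first three features can be illustrated with a small in-memory model. This is only a sketch of the described behavior, not the node's implementation: the class and field names are hypothetical, and it assumes that a larger severity integer means a more severe fault.

```python
from dataclasses import dataclass, field


@dataclass
class AggregatedFault:
    """One fault record per fault_code (hypothetical structure)."""
    fault_code: str
    severity: int
    occurrence_count: int = 0
    sources: set = field(default_factory=set)


class FaultAggregator:
    def __init__(self):
        self._faults = {}  # fault_code -> AggregatedFault

    def report(self, fault_code, severity, source_id):
        fault = self._faults.get(fault_code)
        if fault is None:
            # First report of this code creates the single aggregated fault.
            fault = AggregatedFault(fault_code, severity)
            self._faults[fault_code] = fault
        fault.occurrence_count += 1      # occurrence tracking
        fault.sources.add(source_id)     # remember every reporting source
        # Severity escalation: only ever raised, never lowered by a report.
        fault.severity = max(fault.severity, severity)
        return fault

    def clear(self, fault_code):
        """Remove the fault; return True if it existed."""
        return self._faults.pop(fault_code, None) is not None


if __name__ == "__main__":
    agg = FaultAggregator()
    agg.report("MOTOR_OVERHEAT", 1, "/motor_node")
    fault = agg.report("MOTOR_OVERHEAT", 2, "/thermal_monitor")
    print(fault)
```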

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| storage_type | string | "sqlite" | Storage backend: "sqlite" or "memory" |
| database_path | string | "/var/lib/ros2_medkit/faults.db" | Path to SQLite database file |
| confirmation_threshold | int | -1 | Counter value at which faults are confirmed |
| healing_enabled | bool | false | Enable automatic healing via PASSED events |
| healing_threshold | int | 3 | Counter value at which faults are healed |
| auto_confirm_after_sec | double | 0.0 | Auto-confirm PREFAILED faults after timeout (0 = disabled) |
| entity_thresholds.config_file | string | "" | Path to YAML file with per-entity debounce threshold overrides |
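
How the threshold parameters interact is easiest to see in a small model. The sketch below follows the behavior stated on this page (FAILED events decrement the counter, PASSED events increment it, CRITICAL severity bypasses debounce) and makes several assumptions: the counter is not clamped or reset, the `critical` flag stands in for a CRITICAL severity report, and the `NONE` and `HEALED` status labels are hypothetical.

```python
from dataclasses import dataclass

FAILED, PASSED = "FAILED", "PASSED"


@dataclass
class DebounceCounter:
    confirmation_threshold: int = -1  # default: first FAILED event confirms
    healing_enabled: bool = False
    healing_threshold: int = 3
    counter: int = 0
    status: str = "NONE"              # hypothetical initial status label

    def on_event(self, event, critical=False):
        if event == FAILED:
            self.counter -= 1
            if critical or self.counter <= self.confirmation_threshold:
                self.status = "CONFIRMED"
            elif self.status != "CONFIRMED":
                # Reported, but not yet at the confirmation threshold.
                self.status = "PREFAILED"
        elif event == PASSED:
            self.counter += 1
            if (self.healing_enabled
                    and self.status in ("PREFAILED", "CONFIRMED")
                    and self.counter >= self.healing_threshold):
                self.status = "HEALED"  # hypothetical status label
        return self.status


if __name__ == "__main__":
    debounced = DebounceCounter(confirmation_threshold=-3)
    for _ in range(3):
        print(debounced.on_event(FAILED), debounced.counter)
```

With the default threshold of -1 the first FAILED event already reaches the threshold, which matches the immediate confirmation described in the Quick Start.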

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

| Parameter | Type | Default | Description |
|---|---|---|---|
| snapshots.enabled | bool | true | Enable/disable snapshot capture |
| snapshots.background_capture | bool | false | Use background subscriptions (caches latest message) vs on-demand capture |
| snapshots.timeout_sec | double | 1.0 | Timeout waiting for topic message (on-demand mode) |
| snapshots.max_message_size | int | 65536 | Maximum message size in bytes (larger messages skipped) |
| snapshots.default_topics | string[] | [] | Topics to capture for all faults |
| snapshots.config_file | string | "" | Path to YAML config for fault_specific and patterns |
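
The difference between background and on-demand capture comes down to whether the latest message is already cached when a fault is confirmed. The following sketch models the background mode with a hypothetical `LatestMessageCache` class; it assumes the size limit is checked against the raw payload length when a message arrives, which may not match where the node applies it.

```python
class LatestMessageCache:
    """Keeps the most recent payload per topic (hypothetical helper)."""

    def __init__(self, max_message_size=65536):
        self.max_message_size = max_message_size
        self._latest = {}  # topic -> most recent accepted payload

    def on_message(self, topic, payload):
        """Store the payload unless it exceeds the size limit."""
        if len(payload) > self.max_message_size:
            return False  # oversized messages are skipped
        self._latest[topic] = payload
        return True

    def snapshot(self, topics):
        """Return cached payloads for the requested topics, without waiting."""
        return {t: self._latest[t] for t in topics if t in self._latest}


if __name__ == "__main__":
    cache = LatestMessageCache(max_message_size=16)
    cache.on_message("/motor/temperature", b"81.5")
    cache.on_message("/camera/image", b"x" * 1024)  # skipped: too large
    print(cache.snapshot(["/motor/temperature", "/camera/image"]))
```

In on-demand mode there is no such cache, so capture has to wait for the next message on each topic, bounded by snapshots.timeout_sec.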

Topic Resolution Priority:

  1. fault_specific - Exact match for fault code (configured via YAML config file)
  2. patterns - Regex pattern match (configured via YAML config file)
  3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel
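
The resolution priority can be sketched as a lookup over the example config above. This is an illustration under stated assumptions rather than the node's code: it assumes the first level that matches wins outright, that patterns must match the whole fault code (`re.fullmatch`), and that patterns are tried in file order. The `/diagnostics` default topic is a hypothetical value.

```python
import re

# The example snapshots.yaml above, written out as a Python dict.
CONFIG = {
    "fault_specific": {
        "MOTOR_OVERHEAT": ["/joint_states", "/motor/temperature"],
    },
    "patterns": {
        "MOTOR_.*": ["/joint_states", "/cmd_vel"],
    },
}


def resolve_topics(fault_code, config, default_topics=()):
    # 1. fault_specific: exact match on the fault code.
    specific = config.get("fault_specific", {})
    if fault_code in specific:
        return list(specific[fault_code])
    # 2. patterns: first regex that matches the whole fault code.
    for pattern, topics in config.get("patterns", {}).items():
        if re.fullmatch(pattern, fault_code):
            return list(topics)
    # 3. default_topics: fallback for every other fault.
    return list(default_topics)


if __name__ == "__main__":
    for code in ("MOTOR_OVERHEAT", "MOTOR_STALL", "BATTERY_LOW"):
        print(code, resolve_topics(code, CONFIG, ["/diagnostics"]))
```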

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. The database runs in WAL (write-ahead logging) mode, which allows reads to proceed concurrently with a write.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
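
To make the SQLite behavior concrete, the sketch below enables WAL mode and shows a record surviving a simulated restart, using Python's standard `sqlite3` module. The `faults` table and its columns are hypothetical and are not the package's actual schema; a temporary file stands in for database_path.

```python
import os
import sqlite3
import tempfile


def open_fault_db(path):
    """Open the database, switch it to WAL mode, and ensure the table exists."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS faults ("
        "fault_code TEXT PRIMARY KEY, "
        "severity INTEGER NOT NULL, "
        "status TEXT NOT NULL)"
    )
    conn.commit()
    return conn, mode


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "faults.db")
        conn, mode = open_fault_db(path)
        conn.execute(
            "INSERT OR REPLACE INTO faults VALUES (?, ?, ?)",
            ("MOTOR_OVERHEAT", 2, "CONFIRMED"),
        )
        conn.commit()
        conn.close()
        # Reopening the same file simulates a node restart.
        conn, _ = open_fault_db(path)
        rows = conn.execute("SELECT fault_code, status FROM faults").fetchall()
        conn.close()
        return mode, rows


if __name__ == "__main__":
    print(demo())
```

WAL mode applies only to file-backed databases, so the memory backend has no equivalent setting.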


CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

  • Per-entity confirmation and healing thresholds via manifest configuration (#269)
  • Default rosbag storage format changed from sqlite3 to mcap
  • Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
  • Build: use shared cmake modules from ros2_medkit_cmake package
  • Build: centralized clang-tidy configuration
  • Contributors: @bburda

0.3.0 (2026-02-27)

  • Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
  • Clean up pending_clusters_ when fault cleared before min_count (#211)
  • Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
  • Contributors: @bburda, @eclipse0922

0.2.0 (2026-02-07)

  • Initial rosdistro release
  • Central fault management node with ROS 2 services:
    • ReportFault - report FAILED/PASSED events with debounce filtering
    • GetFaults - query faults with filtering by severity, status, correlation
    • ClearFault - clear/acknowledge faults
  • Debounce filtering with configurable thresholds:
    • FAILED events decrement counter, PASSED events increment
    • Configurable confirmation_threshold (default: -1, immediate)
    • Optional healing support (healing_enabled, healing_threshold)
    • Time-based auto-confirmation (auto_confirm_after_sec)
    • CRITICAL severity bypasses debounce
  • Dual storage backends:
    • SQLite persistent storage with WAL mode (default)
    • In-memory storage for testing/lightweight deployments
  • Snapshot capture on fault confirmation:
    • Topic data captured as JSON with configurable topic resolution
    • Priority: fault_specific > patterns > default_topics
    • Stored in SQLite with indexed fault_code lookup
    • Auto-cleanup on fault clear
  • Rosbag capture with ring buffer:
    • Configurable duration, post-fault recording, topic selection
    • Lazy start mode (start on PREFAILED) or immediate
    • Auto-cleanup of bag files, storage limits (max_bag_size_mb)
    • GetRosbag service for bag file metadata
  • Fault correlation engine:
    • Hierarchical mode: root cause to symptom relationships
    • Auto-cluster mode: group similar faults within time window
    • YAML-based configuration with pattern wildcards
    • Muted faults tracking, auto-clear on root cause resolution
  • FaultEvent publishing on ~/events topic for SSE streaming
  • Wall clock timestamps (compatible with use_sim_time)
  • Contributors: Bartosz Burda, Michal Faferek


Package Summary

Version 0.4.0
License Apache-2.0
Build type AMENT_CMAKE
Use RECOMMENDED

Repository Summary

Description
Checkout URI https://github.com/selfpatch/ros2_medkit.git
VCS Type git
VCS Version main
Last Updated 2026-03-22
Dev Status DEVELOPED
Released RELEASED
Contributing Help Wanted (-)
Good First Issues (-)
Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

  • bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service Type Description
~/report_fault ros2_medkit_msgs/srv/ReportFault Report a fault occurrence
~/list_faults ros2_medkit_msgs/srv/ListFaults Query faults with filtering
~/clear_fault ros2_medkit_msgs/srv/ClearFault Clear/acknowledge a fault
~/get_snapshots ros2_medkit_msgs/srv/GetSnapshots Get topic snapshots for a fault

Features

  • Multi-source aggregation: Same fault_code from different sources creates a single fault
  • Occurrence tracking: Counts total reports and tracks all reporting sources
  • Severity escalation: Fault severity is updated if a higher severity is reported
  • Persistent storage: SQLite backend ensures faults survive node restarts
  • Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
  • Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
  • Fault correlation (optional): Root cause analysis with symptom muting and auto-clear

Parameters

Parameter Type Default Description
storage_type string "sqlite" Storage backend: "sqlite" or "memory"
database_path string "/var/lib/ros2_medkit/faults.db" Path to SQLite database file
confirmation_threshold int -1 Counter value at which faults are confirmed
healing_enabled bool false Enable automatic healing via PASSED events
healing_threshold int 3 Counter value at which faults are healed
auto_confirm_after_sec double 0.0 Auto-confirm PREFAILED faults after timeout (0 = disabled)
entity_thresholds.config_file string "" Path to YAML file with per-entity debounce threshold overrides

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

Parameter Type Default Description
snapshots.enabled bool true Enable/disable snapshot capture
snapshots.background_capture bool false Use background subscriptions (caches latest message) vs on-demand capture
snapshots.timeout_sec double 1.0 Timeout waiting for topic message (on-demand mode)
snapshots.max_message_size int 65536 Maximum message size in bytes (larger messages skipped)
snapshots.default_topics string[] [] Topics to capture for all faults
snapshots.config_file string "" Path to YAML config for fault_specific and patterns

Topic Resolution Priority:

  1. fault_specific - Exact match for fault code (configured via YAML config file)
  2. patterns - Regex pattern match (configured via YAML config file)
  3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.

Usage

File truncated at 100 lines see the full file

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

  • Per-entity confirmation and healing thresholds via manifest configuration (#269)
  • Default rosbag storage format changed from sqlite3 to mcap
  • Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
  • Build: use shared cmake modules from ros2_medkit_cmake package
  • Build: centralized clang-tidy configuration
  • Contributors: \@bburda

0.3.0 (2026-02-27)

  • Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
  • Clean up pending_clusters_ when fault cleared before min_count (#211)
  • Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
  • Contributors: \@bburda, \@eclipse0922

0.2.0 (2026-02-07)

  • Initial rosdistro release
  • Central fault management node with ROS 2 services:
    • ReportFault - report FAILED/PASSED events with debounce filtering
    • GetFaults - query faults with filtering by severity, status, correlation
    • ClearFault - clear/acknowledge faults
  • Debounce filtering with configurable thresholds:
    • FAILED events decrement counter, PASSED events increment
    • Configurable confirmation_threshold (default: -1, immediate)
    • Optional healing support (healing_enabled, healing_threshold)
    • Time-based auto-confirmation (auto_confirm_after_sec)
    • CRITICAL severity bypasses debounce
  • Dual storage backends:
    • SQLite persistent storage with WAL mode (default)
    • In-memory storage for testing/lightweight deployments
  • Snapshot capture on fault confirmation:
    • Topic data captured as JSON with configurable topic resolution
    • Priority: fault_specific > patterns > default_topics
    • Stored in SQLite with indexed fault_code lookup
    • Auto-cleanup on fault clear
  • Rosbag capture with ring buffer:
    • Configurable duration, post-fault recording, topic selection
    • Lazy start mode (start on PREFAILED) or immediate
    • Auto-cleanup of bag files, storage limits (max_bag_size_mb)
    • GetRosbag service for bag file metadata
  • Fault correlation engine:
    • Hierarchical mode: root cause to symptom relationships
    • Auto-cluster mode: group similar faults within time window
    • YAML-based configuration with pattern wildcards
    • Muted faults tracking, auto-clear on root cause resolution
  • FaultEvent publishing on ~/events topic for SSE streaming
  • Wall clock timestamps (compatible with use_sim_time)
  • Contributors: Bartosz Burda, Michal Faferek

Launch files

No launch files found

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.

Recent questions tagged ros2_medkit_fault_manager at Robotics Stack Exchange

No version for distro kilted showing jazzy. Known supported distros are highlighted in the buttons above.

Package Summary

Version 0.4.0
License Apache-2.0
Build type AMENT_CMAKE
Use RECOMMENDED

Repository Summary

Description
Checkout URI https://github.com/selfpatch/ros2_medkit.git
VCS Type git
VCS Version main
Last Updated 2026-03-22
Dev Status DEVELOPED
Released RELEASED
Contributing Help Wanted (-)
Good First Issues (-)
Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

  • bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service Type Description
~/report_fault ros2_medkit_msgs/srv/ReportFault Report a fault occurrence
~/list_faults ros2_medkit_msgs/srv/ListFaults Query faults with filtering
~/clear_fault ros2_medkit_msgs/srv/ClearFault Clear/acknowledge a fault
~/get_snapshots ros2_medkit_msgs/srv/GetSnapshots Get topic snapshots for a fault

Features

  • Multi-source aggregation: Same fault_code from different sources creates a single fault
  • Occurrence tracking: Counts total reports and tracks all reporting sources
  • Severity escalation: Fault severity is updated if a higher severity is reported
  • Persistent storage: SQLite backend ensures faults survive node restarts
  • Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
  • Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
  • Fault correlation (optional): Root cause analysis with symptom muting and auto-clear

Parameters

Parameter Type Default Description
storage_type string "sqlite" Storage backend: "sqlite" or "memory"
database_path string "/var/lib/ros2_medkit/faults.db" Path to SQLite database file
confirmation_threshold int -1 Counter value at which faults are confirmed
healing_enabled bool false Enable automatic healing via PASSED events
healing_threshold int 3 Counter value at which faults are healed
auto_confirm_after_sec double 0.0 Auto-confirm PREFAILED faults after timeout (0 = disabled)
entity_thresholds.config_file string "" Path to YAML file with per-entity debounce threshold overrides

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

Parameter Type Default Description
snapshots.enabled bool true Enable/disable snapshot capture
snapshots.background_capture bool false Use background subscriptions (caches latest message) vs on-demand capture
snapshots.timeout_sec double 1.0 Timeout waiting for topic message (on-demand mode)
snapshots.max_message_size int 65536 Maximum message size in bytes (larger messages skipped)
snapshots.default_topics string[] [] Topics to capture for all faults
snapshots.config_file string "" Path to YAML config for fault_specific and patterns

Topic Resolution Priority:

  1. fault_specific - Exact match for fault code (configured via YAML config file)
  2. patterns - Regex pattern match (configured via YAML config file)
  3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.

Usage

File truncated at 100 lines see the full file

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

  • Per-entity confirmation and healing thresholds via manifest configuration (#269)
  • Default rosbag storage format changed from sqlite3 to mcap
  • Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
  • Build: use shared cmake modules from ros2_medkit_cmake package
  • Build: centralized clang-tidy configuration
  • Contributors: \@bburda

0.3.0 (2026-02-27)

  • Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
  • Clean up pending_clusters_ when fault cleared before min_count (#211)
  • Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
  • Contributors: \@bburda, \@eclipse0922

0.2.0 (2026-02-07)

  • Initial rosdistro release
  • Central fault management node with ROS 2 services:
    • ReportFault - report FAILED/PASSED events with debounce filtering
    • GetFaults - query faults with filtering by severity, status, correlation
    • ClearFault - clear/acknowledge faults
  • Debounce filtering with configurable thresholds:
    • FAILED events decrement counter, PASSED events increment
    • Configurable confirmation_threshold (default: -1, immediate)
    • Optional healing support (healing_enabled, healing_threshold)
    • Time-based auto-confirmation (auto_confirm_after_sec)
    • CRITICAL severity bypasses debounce
  • Dual storage backends:
    • SQLite persistent storage with WAL mode (default)
    • In-memory storage for testing/lightweight deployments
  • Snapshot capture on fault confirmation:
    • Topic data captured as JSON with configurable topic resolution
    • Priority: fault_specific > patterns > default_topics
    • Stored in SQLite with indexed fault_code lookup
    • Auto-cleanup on fault clear
  • Rosbag capture with ring buffer:
    • Configurable duration, post-fault recording, topic selection
    • Lazy start mode (start on PREFAILED) or immediate
    • Auto-cleanup of bag files, storage limits (max_bag_size_mb)
    • GetRosbag service for bag file metadata
  • Fault correlation engine:
    • Hierarchical mode: root cause to symptom relationships
    • Auto-cluster mode: group similar faults within time window
    • YAML-based configuration with pattern wildcards
    • Muted faults tracking, auto-clear on root cause resolution
  • FaultEvent publishing on ~/events topic for SSE streaming
  • Wall clock timestamps (compatible with use_sim_time)
  • Contributors: Bartosz Burda, Michal Faferek

Launch files

No launch files found

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.

Recent questions tagged ros2_medkit_fault_manager at Robotics Stack Exchange

No version for distro rolling showing jazzy. Known supported distros are highlighted in the buttons above.

Package Summary

Version 0.4.0
License Apache-2.0
Build type AMENT_CMAKE
Use RECOMMENDED

Repository Summary

Description
Checkout URI https://github.com/selfpatch/ros2_medkit.git
VCS Type git
VCS Version main
Last Updated 2026-03-22
Dev Status DEVELOPED
Released RELEASED
Contributing Help Wanted (-)
Good First Issues (-)
Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

  • bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service Type Description
~/report_fault ros2_medkit_msgs/srv/ReportFault Report a fault occurrence
~/list_faults ros2_medkit_msgs/srv/ListFaults Query faults with filtering
~/clear_fault ros2_medkit_msgs/srv/ClearFault Clear/acknowledge a fault
~/get_snapshots ros2_medkit_msgs/srv/GetSnapshots Get topic snapshots for a fault

Features

  • Multi-source aggregation: Same fault_code from different sources creates a single fault
  • Occurrence tracking: Counts total reports and tracks all reporting sources
  • Severity escalation: Fault severity is updated if a higher severity is reported
  • Persistent storage: SQLite backend ensures faults survive node restarts
  • Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
  • Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
  • Fault correlation (optional): Root cause analysis with symptom muting and auto-clear

Parameters

Parameter Type Default Description
storage_type string "sqlite" Storage backend: "sqlite" or "memory"
database_path string "/var/lib/ros2_medkit/faults.db" Path to SQLite database file
confirmation_threshold int -1 Counter value at which faults are confirmed
healing_enabled bool false Enable automatic healing via PASSED events
healing_threshold int 3 Counter value at which faults are healed
auto_confirm_after_sec double 0.0 Auto-confirm PREFAILED faults after timeout (0 = disabled)
entity_thresholds.config_file string "" Path to YAML file with per-entity debounce threshold overrides

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

Parameter Type Default Description
snapshots.enabled bool true Enable/disable snapshot capture
snapshots.background_capture bool false Use background subscriptions (caches latest message) vs on-demand capture
snapshots.timeout_sec double 1.0 Timeout waiting for topic message (on-demand mode)
snapshots.max_message_size int 65536 Maximum message size in bytes (larger messages skipped)
snapshots.default_topics string[] [] Topics to capture for all faults
snapshots.config_file string "" Path to YAML config for fault_specific and patterns

Topic Resolution Priority:

  1. fault_specific - Exact match for fault code (configured via YAML config file)
  2. patterns - Regex pattern match (configured via YAML config file)
  3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.

Usage

File truncated at 100 lines see the full file

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

  • Per-entity confirmation and healing thresholds via manifest configuration (#269)
  • Default rosbag storage format changed from sqlite3 to mcap
  • Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
  • Build: use shared cmake modules from ros2_medkit_cmake package
  • Build: centralized clang-tidy configuration
  • Contributors: \@bburda

0.3.0 (2026-02-27)

  • Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
  • Clean up pending_clusters_ when fault cleared before min_count (#211)
  • Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
  • Contributors: \@bburda, \@eclipse0922

0.2.0 (2026-02-07)

  • Initial rosdistro release
  • Central fault management node with ROS 2 services:
    • ReportFault - report FAILED/PASSED events with debounce filtering
    • GetFaults - query faults with filtering by severity, status, correlation
    • ClearFault - clear/acknowledge faults
  • Debounce filtering with configurable thresholds:
    • FAILED events decrement counter, PASSED events increment
    • Configurable confirmation_threshold (default: -1, immediate)
    • Optional healing support (healing_enabled, healing_threshold)
    • Time-based auto-confirmation (auto_confirm_after_sec)
    • CRITICAL severity bypasses debounce
  • Dual storage backends:
    • SQLite persistent storage with WAL mode (default)
    • In-memory storage for testing/lightweight deployments
  • Snapshot capture on fault confirmation:
    • Topic data captured as JSON with configurable topic resolution
    • Priority: fault_specific > patterns > default_topics
    • Stored in SQLite with indexed fault_code lookup
    • Auto-cleanup on fault clear
  • Rosbag capture with ring buffer:
    • Configurable duration, post-fault recording, topic selection
    • Lazy start mode (start on PREFAILED) or immediate
    • Auto-cleanup of bag files, storage limits (max_bag_size_mb)
    • GetRosbag service for bag file metadata
  • Fault correlation engine:
    • Hierarchical mode: root cause to symptom relationships
    • Auto-cluster mode: group similar faults within time window
    • YAML-based configuration with pattern wildcards
    • Muted faults tracking, auto-clear on root cause resolution
  • FaultEvent publishing on ~/events topic for SSE streaming
  • Wall clock timestamps (compatible with use_sim_time)
  • Contributors: Bartosz Burda, Michal Faferek

Launch files

No launch files found

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.

Recent questions tagged ros2_medkit_fault_manager at Robotics Stack Exchange

No version for distro github showing jazzy. Known supported distros are highlighted in the buttons above.

Package Summary

Version 0.4.0
License Apache-2.0
Build type AMENT_CMAKE
Use RECOMMENDED

Repository Summary

Description
Checkout URI https://github.com/selfpatch/ros2_medkit.git
VCS Type git
VCS Version main
Last Updated 2026-03-22
Dev Status DEVELOPED
Released RELEASED
Contributing Help Wanted (-)
Good First Issues (-)
Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

  • bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service Type Description
~/report_fault ros2_medkit_msgs/srv/ReportFault Report a fault occurrence
~/list_faults ros2_medkit_msgs/srv/ListFaults Query faults with filtering
~/clear_fault ros2_medkit_msgs/srv/ClearFault Clear/acknowledge a fault
~/get_snapshots ros2_medkit_msgs/srv/GetSnapshots Get topic snapshots for a fault

Features

  • Multi-source aggregation: Same fault_code from different sources creates a single fault
  • Occurrence tracking: Counts total reports and tracks all reporting sources
  • Severity escalation: Fault severity is updated if a higher severity is reported
  • Persistent storage: SQLite backend ensures faults survive node restarts
  • Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
  • Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
  • Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
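
The first three features (aggregation, occurrence tracking, severity escalation) can be pictured with a small standalone model. This is only a sketch under assumptions: FaultRecord, FaultAggregator, and the field names are hypothetical and do not reflect the node's actual C++ types or message fields, and it assumes that a larger severity number means a more severe fault.

```python
from dataclasses import dataclass, field


@dataclass
class FaultRecord:
    """Aggregated state for one fault_code (hypothetical model, not the node's real type)."""
    fault_code: str
    severity: int
    occurrence_count: int = 0
    reporting_sources: set = field(default_factory=set)


class FaultAggregator:
    """Keeps one record per fault_code, no matter how many sources report it."""

    def __init__(self):
        self._faults = {}

    def report(self, fault_code, severity, source_id):
        record = self._faults.get(fault_code)
        if record is None:
            record = FaultRecord(fault_code=fault_code, severity=severity)
            self._faults[fault_code] = record
        # Every report counts as one occurrence and remembers who sent it.
        record.occurrence_count += 1
        record.reporting_sources.add(source_id)
        # Assumption: severity only escalates; a lower-severity report leaves it unchanged.
        record.severity = max(record.severity, severity)
        return record


aggregator = FaultAggregator()
aggregator.report("MOTOR_OVERHEAT", severity=1, source_id="/motor_node")
record = aggregator.report("MOTOR_OVERHEAT", severity=2, source_id="/thermal_monitor")
aggregator.report("MOTOR_OVERHEAT", severity=1, source_id="/motor_node")
print(record.occurrence_count, record.severity, sorted(record.reporting_sources))
# prints: 3 2 ['/motor_node', '/thermal_monitor']
```

Three reports from two sources yield one record with a count of 3, the highest severity seen, and both source IDs.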

Parameters

Parameter | Type | Default | Description
storage_type | string | "sqlite" | Storage backend: "sqlite" or "memory"
database_path | string | "/var/lib/ros2_medkit/faults.db" | Path to SQLite database file
confirmation_threshold | int | -1 | Counter value at which faults are confirmed
healing_enabled | bool | false | Enable automatic healing via PASSED events
healing_threshold | int | 3 | Counter value at which faults are healed
auto_confirm_after_sec | double | 0.0 | Auto-confirm PREFAILED faults after timeout (0 = disabled)
entity_thresholds.config_file | string | "" | Path to YAML file with per-entity debounce threshold overrides
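
To make the threshold parameters concrete, here is a minimal sketch of a counter-based debounce, assuming the behaviour described in the changelog (FAILED events decrement the counter, PASSED events increment it). The DebounceCounter class, the "HEALED" status label, and the exact comparison rules are assumptions for illustration; the real node may clamp or reset the counter differently, and this sketch ignores auto_confirm_after_sec and the CRITICAL bypass.

```python
from dataclasses import dataclass

FAILED, PASSED = "FAILED", "PASSED"


@dataclass
class DebounceCounter:
    """Debounce state for one fault after its first report (hypothetical model)."""
    confirmation_threshold: int = -1  # default from the parameter table
    healing_enabled: bool = False     # default from the parameter table
    healing_threshold: int = 3        # default from the parameter table
    counter: int = 0
    status: str = "PREFAILED"

    def on_event(self, event):
        if event == FAILED:
            self.counter -= 1
            # Assumption: confirm once the counter reaches the (negative) threshold.
            if self.counter <= self.confirmation_threshold:
                self.status = "CONFIRMED"
        elif event == PASSED:
            self.counter += 1
            # Assumption: heal once the counter reaches the healing threshold.
            if self.healing_enabled and self.counter >= self.healing_threshold:
                self.status = "HEALED"  # hypothetical status label
        return self.status


# Default threshold of -1: the first FAILED event confirms the fault.
print(DebounceCounter().on_event(FAILED))

# A threshold of -3 needs three FAILED events before confirmation.
strict = DebounceCounter(confirmation_threshold=-3)
print([strict.on_event(FAILED) for _ in range(3)])
```

With the default of -1, a single FAILED report moves the counter from 0 to -1 and confirms the fault, which matches the "confirmed immediately" behaviour described in Quick Start.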

Snapshot Parameters

Snapshots capture topic data at the moment a fault is confirmed, for post-mortem debugging.

Parameter | Type | Default | Description
snapshots.enabled | bool | true | Enable/disable snapshot capture
snapshots.background_capture | bool | false | Use background subscriptions (caches latest message) vs on-demand capture
snapshots.timeout_sec | double | 1.0 | Timeout waiting for topic message (on-demand mode)
snapshots.max_message_size | int | 65536 | Maximum message size in bytes (larger messages skipped)
snapshots.default_topics | string[] | [] | Topics to capture for all faults
snapshots.config_file | string | "" | Path to YAML config for fault_specific and patterns

Topic Resolution Priority:

  1. fault_specific - Exact match for fault code (configured via YAML config file)
  2. patterns - Regex pattern match (configured via YAML config file)
  3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel
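
The resolution order can be sketched in a few lines of Python, using the example configuration above as an already-parsed dictionary. The function name resolve_topics and the /diagnostics default are hypothetical. Two behaviours are assumed rather than taken from the source: a pattern must match the whole fault code, and the first matching pattern in file order wins.

```python
import re

# Mirrors the snapshots.yaml example, already parsed into a dict.
CONFIG = {
    "fault_specific": {"MOTOR_OVERHEAT": ["/joint_states", "/motor/temperature"]},
    "patterns": {"MOTOR_.*": ["/joint_states", "/cmd_vel"]},
}
DEFAULT_TOPICS = ["/diagnostics"]  # hypothetical value for snapshots.default_topics


def resolve_topics(fault_code, config, default_topics):
    """Return the topics to snapshot for fault_code, highest priority first."""
    # 1. fault_specific: exact match on the fault code.
    specific = config.get("fault_specific", {})
    if fault_code in specific:
        return specific[fault_code]
    # 2. patterns: first regex that matches the whole code (assumed semantics).
    for pattern, topics in config.get("patterns", {}).items():
        if re.fullmatch(pattern, fault_code):
            return topics
    # 3. default_topics: fallback for every other fault.
    return default_topics


print(resolve_topics("MOTOR_OVERHEAT", CONFIG, DEFAULT_TOPICS))
print(resolve_topics("MOTOR_STALL", CONFIG, DEFAULT_TOPICS))
print(resolve_topics("BATTERY_LOW", CONFIG, DEFAULT_TOPICS))
```

MOTOR_OVERHEAT matches both its exact entry and the MOTOR_.* pattern; the exact entry takes precedence, so /motor/temperature is captured instead of /cmd_vel.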

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. The database uses SQLite's write-ahead logging (WAL) mode, which allows reads to proceed while a write is in progress.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
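
As a rough illustration of the SQLite backend's behaviour, the following sketch opens a database file in WAL mode with Python's standard sqlite3 module and shows that a row is still present after the file is reopened. The faults table and its columns are hypothetical; the node's actual schema is not described here.

```python
import os
import sqlite3
import tempfile


def open_fault_db(path):
    """Open (or create) a fault database in WAL mode; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    # The PRAGMA returns the journal mode now in effect ("wal" for a file database).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS faults ("
        "fault_code TEXT PRIMARY KEY, severity INTEGER, occurrence_count INTEGER)"
    )
    conn.commit()
    return conn, mode


db_path = os.path.join(tempfile.mkdtemp(), "faults.db")
conn, mode = open_fault_db(db_path)
conn.execute("INSERT INTO faults VALUES ('MOTOR_OVERHEAT', 2, 1)")
conn.commit()
conn.close()

# Reopening the file stands in for a node restart: the row is still there.
conn, mode = open_fault_db(db_path)
rows = conn.execute("SELECT fault_code, severity FROM faults").fetchall()
print(mode, rows)
conn.close()
```

Passing ":memory:" instead of a file path gives storage that disappears when the connection closes, which is the trade-off the memory backend makes.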

Usage

File truncated at 100 lines; see the full file.

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

  • Per-entity confirmation and healing thresholds via manifest configuration (#269)
  • Default rosbag storage format changed from sqlite3 to mcap
  • Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
  • Build: use shared cmake modules from ros2_medkit_cmake package
  • Build: centralized clang-tidy configuration
  • Contributors: @bburda

0.3.0 (2026-02-27)

  • Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
  • Clean up pending_clusters_ when fault cleared before min_count (#211)
  • Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
  • Contributors: @bburda, @eclipse0922

0.2.0 (2026-02-07)

  • Initial rosdistro release
  • Central fault management node with ROS 2 services:
    • ReportFault - report FAILED/PASSED events with debounce filtering
    • GetFaults - query faults with filtering by severity, status, correlation
    • ClearFault - clear/acknowledge faults
  • Debounce filtering with configurable thresholds:
    • FAILED events decrement counter, PASSED events increment
    • Configurable confirmation_threshold (default: -1, immediate)
    • Optional healing support (healing_enabled, healing_threshold)
    • Time-based auto-confirmation (auto_confirm_after_sec)
    • CRITICAL severity bypasses debounce
  • Dual storage backends:
    • SQLite persistent storage with WAL mode (default)
    • In-memory storage for testing/lightweight deployments
  • Snapshot capture on fault confirmation:
    • Topic data captured as JSON with configurable topic resolution
    • Priority: fault_specific > patterns > default_topics
    • Stored in SQLite with indexed fault_code lookup
    • Auto-cleanup on fault clear
  • Rosbag capture with ring buffer:
    • Configurable duration, post-fault recording, topic selection
    • Lazy start mode (start on PREFAILED) or immediate
    • Auto-cleanup of bag files, storage limits (max_bag_size_mb)
    • GetRosbag service for bag file metadata
  • Fault correlation engine:
    • Hierarchical mode: root cause to symptom relationships
    • Auto-cluster mode: group similar faults within time window
    • YAML-based configuration with pattern wildcards
    • Muted faults tracking, auto-clear on root cause resolution
  • FaultEvent publishing on ~/events topic for SSE streaming
  • Wall clock timestamps (compatible with use_sim_time)
  • Contributors: Bartosz Burda, Michal Faferek

Launch files

No launch files found

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.





No version for distro noetic showing jazzy. Known supported distros are highlighted in the buttons above.

Package Summary

Version 0.4.0
License Apache-2.0
Build type AMENT_CMAKE
Use RECOMMENDED

Repository Summary

Description
Checkout URI https://github.com/selfpatch/ros2_medkit.git
VCS Type git
VCS Version main
Last Updated 2026-03-22
Dev Status DEVELOPED
Released RELEASED
Contributing Help Wanted (-)
Good First Issues (-)
Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

  • bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service Type Description
~/report_fault ros2_medkit_msgs/srv/ReportFault Report a fault occurrence
~/list_faults ros2_medkit_msgs/srv/ListFaults Query faults with filtering
~/clear_fault ros2_medkit_msgs/srv/ClearFault Clear/acknowledge a fault
~/get_snapshots ros2_medkit_msgs/srv/GetSnapshots Get topic snapshots for a fault

Features

  • Multi-source aggregation: Same fault_code from different sources creates a single fault
  • Occurrence tracking: Counts total reports and tracks all reporting sources
  • Severity escalation: Fault severity is updated if a higher severity is reported
  • Persistent storage: SQLite backend ensures faults survive node restarts
  • Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
  • Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
  • Fault correlation (optional): Root cause analysis with symptom muting and auto-clear

Parameters

Parameter Type Default Description
storage_type string "sqlite" Storage backend: "sqlite" or "memory"
database_path string "/var/lib/ros2_medkit/faults.db" Path to SQLite database file
confirmation_threshold int -1 Counter value at which faults are confirmed
healing_enabled bool false Enable automatic healing via PASSED events
healing_threshold int 3 Counter value at which faults are healed
auto_confirm_after_sec double 0.0 Auto-confirm PREFAILED faults after timeout (0 = disabled)
entity_thresholds.config_file string "" Path to YAML file with per-entity debounce threshold overrides

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

Parameter Type Default Description
snapshots.enabled bool true Enable/disable snapshot capture
snapshots.background_capture bool false Use background subscriptions (caches latest message) vs on-demand capture
snapshots.timeout_sec double 1.0 Timeout waiting for topic message (on-demand mode)
snapshots.max_message_size int 65536 Maximum message size in bytes (larger messages skipped)
snapshots.default_topics string[] [] Topics to capture for all faults
snapshots.config_file string "" Path to YAML config for fault_specific and patterns

Topic Resolution Priority:

  1. fault_specific - Exact match for fault code (configured via YAML config file)
  2. patterns - Regex pattern match (configured via YAML config file)
  3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.

Usage

File truncated at 100 lines see the full file

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

  • Per-entity confirmation and healing thresholds via manifest configuration (#269)
  • Default rosbag storage format changed from sqlite3 to mcap
  • Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
  • Build: use shared cmake modules from ros2_medkit_cmake package
  • Build: centralized clang-tidy configuration
  • Contributors: @bburda

0.3.0 (2026-02-27)

  • Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
  • Clean up pending_clusters_ when fault cleared before min_count (#211)
  • Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
  • Contributors: @bburda, @eclipse0922

0.2.0 (2026-02-07)

  • Initial rosdistro release
  • Central fault management node with ROS 2 services:
    • ReportFault - report FAILED/PASSED events with debounce filtering
    • GetFaults - query faults with filtering by severity, status, correlation
    • ClearFault - clear/acknowledge faults
  • Debounce filtering with configurable thresholds:
    • FAILED events decrement counter, PASSED events increment
    • Configurable confirmation_threshold (default: -1, immediate)
    • Optional healing support (healing_enabled, healing_threshold)
    • Time-based auto-confirmation (auto_confirm_after_sec)
    • CRITICAL severity bypasses debounce
  • Dual storage backends:
    • SQLite persistent storage with WAL mode (default)
    • In-memory storage for testing/lightweight deployments
  • Snapshot capture on fault confirmation:
    • Topic data captured as JSON with configurable topic resolution
    • Priority: fault_specific > patterns > default_topics
    • Stored in SQLite with indexed fault_code lookup
    • Auto-cleanup on fault clear
  • Rosbag capture with ring buffer:
    • Configurable duration, post-fault recording, topic selection
    • Lazy start mode (start on PREFAILED) or immediate
    • Auto-cleanup of bag files, storage limits (max_bag_size_mb)
    • GetRosbag service for bag file metadata
  • Fault correlation engine:
    • Hierarchical mode: root cause to symptom relationships
    • Auto-cluster mode: group similar faults within time window
    • YAML-based configuration with pattern wildcards
    • Muted faults tracking, auto-clear on root cause resolution
  • FaultEvent publishing on ~/events topic for SSE streaming
  • Wall clock timestamps (compatible with use_sim_time)
  • Contributors: Bartosz Burda, Michal Faferek
