Package Summary
Tags | No category tags. |
Version | 2.3.0 |
License | Apache 2.0 |
Build type | AMENT_CMAKE |
Use | RECOMMENDED |
Repository Summary
Checkout URI | https://gitlab.com/ApexAI/performance_test.git |
VCS Type | git |
VCS Version | master |
Last Updated | 2024-09-24 |
Dev Status | MAINTAINED |
CI status | No Continuous Integration |
Released | RELEASED |
Maintainers
- Apex AI, Inc.
performance_test
The performance_test tool tests latency and other performance metrics of various middleware implementations that support a pub/sub pattern. It is used to simulate non-functional performance of your application.
The performance_test tool allows you to quickly set up a pub/sub configuration, e.g. number of publisher/subscribers, message size, QOS settings, middleware. The following metrics are automatically recorded when the application is running:
- latency: the time a message takes to travel from a publisher to a subscriber. The latency is measured by timestamping the sample when it is published and subtracting that timestamp (carried in the sample) from the time measured when the sample arrives at the subscriber (only logged when a subscriber is created)
- CPU usage: percentage of the total system-wide CPU usage (logged separately for each instance of perf_test)
- resident memory: heap allocations, shared memory segments, stack (used for the system's internal work) (logged separately for each instance of perf_test)
- sample statistics: number of samples received, sent, and lost per experiment run.
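The latency measurement described in the list above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the Sample class and the publish/receive helpers are hypothetical names, and a monotonic clock is assumed so that the subtraction is meaningful on a single machine.

```python
import time
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical message that carries its own publish timestamp."""
    payload: bytes
    publish_time_ns: int = 0


def publish(sample: Sample) -> Sample:
    # Stamp the sample just before it would be handed to the transport.
    sample.publish_time_ns = time.monotonic_ns()
    return sample


def receive(sample: Sample) -> int:
    # Read the clock on arrival and subtract the timestamp carried in the sample.
    arrival_ns = time.monotonic_ns()
    return arrival_ns - sample.publish_time_ns


if __name__ == "__main__":
    latency_ns = receive(publish(Sample(payload=b"\x00" * 1024)))
    print(f"latency: {latency_ns} ns")
```

Because both timestamps come from the same clock, this only works when publisher and subscriber run on the same machine.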
The master branch is compatible with the following ROS 2 versions:
- rolling
- jazzy
- iron
- humble
- galactic
- foxy
- eloquent
- dashing
- Apex.OS
How to use this document
- Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
- If needed, find more detailed information about building and running
- Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
- Check out the tools for visualizing the results
- If desired, read about the design and architecture of the tool.
Example
This example shows how to test the non-functional performance of the following configuration:
Option | Value |
---|---|
Plugin | Cyclone DDS |
Message type | Array1k |
Publishing rate | 100Hz |
Topic name | test_topic |
Duration of the experiment | 30s |
Number of publisher(s) | 1 (default) |
Number of subscriber(s) | 1 (default) |
- Install ROS 2
- Install Cyclone DDS to /opt/cyclonedds
- Build performance_test with the CMake build flag for Cyclone DDS:
source /opt/ros/rolling/setup.bash
cd ~/perf_test_ws
colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
source ./install/setup.bash
- Run with the communicator option for Cyclone DDS (the --communication arg was renamed to --communicator in 2.0.0; the short -c is unchanged):
mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communicator CycloneDDS \
--msg Array1k \
--rate 100 \
--topic test_topic \
--max-runtime 30 \
--logfile experiment/log.csv
At the end of the experiment, a CSV log file will be generated in the experiment folder with a name that starts with log.
Building the performance_test tool
For a simple example, see Dockerfile.rclcpp.
The performance_test tool is structured as a ROS 2 package, so colcon is used to build it. Therefore, you must source a ROS 2 installation:
source /opt/ros/rolling/setup.bash
Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:
mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Replace <plugin> with the value listed for your chosen plugin, e.g. CYCLONEDDS
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash
Running an experiment
The performance_test experiments are run through the perf_test executable. To find the available settings, run with --help (note the required and default arguments):
~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help
- The -c argument should match the selected middleware plugin from the build phase.
- The --msg argument should be one of the supported message types, which are shown in the --help output.
Single machine or distributed system?
Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.
For running tests on a single machine, you can choose between the following options:
- Intraprocess means that the publisher and subscriber threads are in the same process.
perf_test <options> --num-sub-threads 1 --num-pub-threads 1
- Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of perf_test must be run, e.g.
# Start the subscriber first
perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
sleep 1 # give the subscriber time to finish initializing
perf_test <options> --num-sub-threads 0 --num-pub-threads 1
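The same sequence can also be scripted. The following is a minimal sketch, assuming perf_test can be found under the path you pass in; the helper names are hypothetical, and only the --num-sub-threads and --num-pub-threads flags are taken from this document.

```python
import subprocess
import time
from typing import List, Tuple


def interprocess_commands(perf_test: str, options: List[str]) -> Tuple[List[str], List[str]]:
    """Build the subscriber-only and publisher-only command lines."""
    subscriber = [perf_test, *options, "--num-sub-threads", "1", "--num-pub-threads", "0"]
    publisher = [perf_test, *options, "--num-sub-threads", "0", "--num-pub-threads", "1"]
    return subscriber, publisher


def run_interprocess(perf_test: str, options: List[str], startup_delay_s: float = 1.0) -> int:
    """Start the subscriber first, then run the publisher to completion."""
    sub_cmd, pub_cmd = interprocess_commands(perf_test, options)
    sub_proc = subprocess.Popen(sub_cmd)
    try:
        time.sleep(startup_delay_s)  # give the subscriber time to finish initializing
        return subprocess.run(pub_cmd, check=False).returncode
    finally:
        sub_proc.wait()  # blocks until the subscriber exits on its own
```

Include --max-runtime in the options so that the subscriber terminates by itself; otherwise the final wait never returns.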
On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:
# On the main machine
perf_test <options> --roundtrip-mode Main
# On the relay machine:
perf_test <options> --roundtrip-mode Relay
In relay mode, the Main machine sends messages to the Relay machine, which immediately sends the messages back. The Main machine receives the relayed message and reports the round-trip latency. Therefore, the reported latency will be roughly double the latency reported in non-relay mode.
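To compare a round-trip result with a non-relay result, the one-way latency can be approximated as half of the round-trip value. This is a minimal sketch, assuming both directions take about the same time, which real networks do not guarantee; the function name is illustrative and not part of perf_test.

```python
from statistics import mean
from typing import Iterable


def estimate_one_way_latency(round_trip_latencies_ms: Iterable[float]) -> float:
    """Approximate the mean one-way latency as half the mean round-trip latency."""
    samples = list(round_trip_latencies_ms)
    if not samples:
        raise ValueError("at least one round-trip sample is required")
    return mean(samples) / 2.0


if __name__ == "__main__":
    # Made-up round-trip values in milliseconds, for illustration only.
    print(estimate_one_way_latency([0.8, 1.0, 1.2]))
```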
Single machine, single thread
An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a message, and the subscriber immediately takes it.
perf_test <options> -e INTRA_THREAD
Notes:
- This is only available when zero copy transfer is enabled
- This requires exactly one publisher and one subscriber
- This is not compatible with roundtrip mode
Middleware plugins
The performance test tool can measure the performance of a variety of communication solutions from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines.
The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.
The following plugins are currently implemented:
Eclipse Cyclone DDS
- Eclipse Cyclone DDS 0.9.0b1
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
- Communication plugin: -c CycloneDDS
- Docker file: Dockerfile.CycloneDDS
- Available transports:
  - Cyclone DDS zero copy requires RouDi to be running.

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Eclipse Cyclone DDS C++ binding
- Eclipse Cyclone DDS C++ bindings 0.9.0b1
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
- Communication plugin: -c CycloneDDS-CXX
- Docker file: Dockerfile.CycloneDDS-CXX
- Available transports:
  - Cyclone DDS zero copy requires RouDi to be running.

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Eclipse iceoryx
- iceoryx (latest master as of Feb 13)
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ICEORYX
- Communication plugin: -c iceoryx
- Docker file: Dockerfile.iceoryx
- The iceoryx plugin is not a DDS implementation.
  - The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
- To run with the iceoryx plugin, RouDi must be running.
- Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| LoanedSamples | LoanedSamples | Not supported by performance_test |
eProsima Fast DDS
- FastDDS 2.6.2
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=FASTDDS
- Communication plugin: -c FastRTPS
- Docker file: Dockerfile.FastDDS
- Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), LoanedSamples (--zero-copy) | SHMEM (default), LoanedSamples (--zero-copy) | UDP |
OCI OpenDDS
- OpenDDS 3.13.2
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=OPENDDS
- Communication plugin: -c OpenDDS
- Docker file: Dockerfile.OpenDDS
- Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| TCP | TCP | TCP |
RTI Connext DDS
- RTI Connext DDS 5.3.1+
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
- Communication plugin: -c ConnextDDS
- Docker file: Not available
- A license is required
- You need to source an RTI Connext DDS environment.
  - If RTI Connext DDS was installed with ROS 2 (Linux only):
source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
  - If RTI Connext DDS is installed separately, you can source the following script to set the environment:
source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
- Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |
RTI Connext DDS Micro
- Connext DDS Micro 3.0.3
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
- Communication plugin: -c ConnextDDSMicro
- Docker file: Not available
- A license is required
- Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |
Framework plugins
The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.
ROS 2
The performance_test tool can also measure the performance of a variety of RMW implementations, through the ROS 2 rclcpp::Publisher and rclcpp::Subscription API.
- ROS 2 rclcpp::Publisher and rclcpp::Subscription
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
- Communication plugin:
  - Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
  - Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
  - rclcpp::WaitSet: -c rclcpp-waitset
- Docker file: Dockerfile.rclcpp
- Available underlying RMW implementations:
  - ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
  - Follow these instructions to use a different RMW implementation
- Available transports: depends on underlying RMW implementation
  - LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above
Apex.OS
- Apex.OS
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
  - It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
- Communication plugin: -c ApexOSPollingSubscription
- Docker file: Not available
- Available underlying RMW implementations: rmw_apex_middleware
- Available transports:

| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Analyze the results
After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. It is possible to add custom data to the log file by setting the APEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.
# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"
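Escaping the quotes by hand is error-prone. As an alternative sketch, the same value can be generated with Python's json module and passed through the environment when launching the experiment. The helper name is hypothetical; the keys are the ones from the example above.

```python
import json
import os
from typing import Dict


def environment_with_custom_data(custom_data: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the current environment with APEX_PERFORMANCE_TEST set."""
    env = dict(os.environ)
    # json.dumps produces correctly quoted JSON, so no manual escaping is needed.
    env["APEX_PERFORMANCE_TEST"] = json.dumps(custom_data)
    return env


if __name__ == "__main__":
    env = environment_with_custom_data({
        "My Version": "1.0.4",
        "My Image Version": "5.2",
        "My OS Version": "Ubuntu 16.04",
    })
    print(env["APEX_PERFORMANCE_TEST"])
```

The returned dictionary can be passed as the env argument of subprocess.run when starting perf_test.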
Plot the results
To plot the results in the JSON or CSV log files, see the plotter README.
Architecture
Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.
Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.
Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.
An example plugin is available here.
Performance optimizations
- On Linux-based platforms, when run with --prevent-cpu-idle, perf_test writes 0 to /dev/cpu_dma_latency and holds the file handle open, which prevents the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
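The mechanism can be sketched in a few lines. This illustrates the Linux PM QoS device interface rather than the tool's own code: the function name is hypothetical, writing to /dev/cpu_dma_latency normally requires root, and the request only stays in effect while the file remains open.

```python
import struct
from typing import BinaryIO


def hold_cpu_dma_latency(path: str = "/dev/cpu_dma_latency", target_us: int = 0) -> BinaryIO:
    """Write a latency target and return the open handle; the request ends on close."""
    handle = open(path, "wb", buffering=0)
    # The kernel accepts the target as a 32-bit signed integer in microseconds.
    handle.write(struct.pack("i", target_us))
    return handle


# Typical use (needs root on a real system):
#   handle = hold_cpu_dma_latency()
#   ... run the latency-sensitive work ...
#   handle.close()  # closing the file withdraws the request
```

The path is a parameter so that the helper can be exercised against an ordinary file without touching the real device.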
Future extensions and limitations
- Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QOS settings to be configured. The other QOS settings are hardcoded in the application.
- Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
- Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
- The FastRTPS wait-set does not support timeouts, which can cause the receive call to never abort. In that case the performance test must be killed manually.
- Connext DDS Micro does not support the INTRA transport with reliable QoS and the history kind set to keep_all. Always set keep-last as the QoS history kind when using reliable.
Possible additional communication mechanisms which could be implemented are:
- Raw UDP communication
Building with limited resources
When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:
- This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.
Changelog for package performance_test
X.Y.Z (YYYY/MM/DD)
2.3.0 (2024/09/24)
Removed
- Moved apex_performance_plotter to its own package here
2.2.0 (2024/05/15)
Added
- performance_test can be built with ROS 2 Iron and Jazzy
Changed
- Renamed the --dds-domain_id CLI arg to --dds-domain-id
- When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
- --zero-copy has been separated into two flags:
  - --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
  - --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
  - --zero-copy is now an alias for --shared-memory --loaned-samples
  - Supported plugins include:
    - -c CycloneDDS
    - -c CycloneDDS-CXX
    - -c ApexOSPollingSubscription
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp
2.1.0 (2024/04/17)
Added
- Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread
Changed
- Change the default --history arg from KEEP_ALL to KEEP_LAST
- Change the default --history-depth arg from 1000 to 16
- If --expected-num-pubs is unspecified, set it to the same value as -p
- If --expected-num-subs is unspecified, set it to the same value as -s
Fixed
- Removed an unused variable to fix a Clang build
- Remove unused variable names in the Plugin abstract class
- Fix a potential lockup in PublisherTask on QNX
2.0.0 (2024/03/19)
Added
- Add experimental bazel support
bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
- Add a rudimentary socket-based plugin for testing the bazel support
bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help
Changed
- Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
- Renamed the --communication CLI arg to --communicator. The short -c is unchanged.
Removed
- Removed the deprecated CLI flags for QOS settings:
  - Instead of --reliable, use --reliability RELIABLE
  - Instead of --transient, use --durability TRANSIENT_LOCAL
  - Instead of --keep-last, use --history KEEP_LAST
- Removed the obsolete BoundedSequenceFlat messages
- Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.
Fixed
- Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
- Ensure that the first few published samples are sent at the expected rate
1.5.2 (YYYY/MM/DD)
Added
- --prevent-cpu-idle is available on QNX
Changed
- JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
- Switch to build as C++17 by default
Fixed
- Zero copy transfer is again enabled for the rclcpp publisher
1.5.0 (2023/06/14)
Added
- New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
- Some smaller Array messages, down to 32 bits
- Added support to the FastDDS plugin for bounded and unbounded sequences
Changed
- Update the README to better explain how to use this tool with Apex.OS
- In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
- Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
- Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
- Adjusted the Array message sizes to make the name match the contents
- Updated apex_os_communicator to use the new zero-copy API
1.4.2 (2023/03/15)
Added
- Added perfplot support for JSON log files
Changed
- Migrate the Apex.OS target to use rosidl_get_typesupport_target
- Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins
1.4.1 (2023/02/23)
Changed
- Updated the iceoryx plugin to the latest master as of Feb 13
1.4.0 (2023/02/20)
Added
- New message type BoundedSequenceFlat
  - This is a BoundedSequence with the @flat annotation
  - Sizes range from 1kB to 8MB, like Array and BoundedSequence
Changed
- Messages of different types can be optionally included via CMake args:
  - -DENABLE_MSGS_ARRAY (default ON)
  - -DENABLE_MSGS_STRUCT (default ON)
  - -DENABLE_MSGS_POINT_CLOUD (default ON)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
  - -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_ALL (default OFF)
    - when ON, overrides the other defaults to ON
    - you can still optionally exclude some messages by explicitly setting them to OFF
Removed
- Removed a few messages:
  - Range
  - RadarTrack
  - RadarDetection
  - NavSatFix
Fixed
- In all cases, including loaned messages, capture the timestamp as the last step of initializing the message
1.3.7 (2023/01/04)
1.3.6 (2023/01/03)
Fixed
- Set the correct IDL_GEN_ROOT for rclcpp plugins
1.3.5 (2022/12/05)
Fixed
- Exit cleanly when a publisher process terminates before a subscriber process
1.3.4 (2022/11/28)
Changed
- Updated Apex.OS plugins to use the unified LoanedSample::data()
1.3.3 (2022/11/28)
Fixed
- Implement the missing take() method in ApexOSPollingSubscriptionSubscriber
1.3.2 (2022/11/21)
Fixed
- Capture the this pointer in the lambda in the iceoryx publisher
1.3.1 (2022/11/21)
Added
- New Apex.OS plugin, compatible with the ThreadedRunners
  - The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
  - The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
  - The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
  - The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor
Changed
- Refactored FastRTPS communicator plugin:
  - Uses DDS compliant API
  - Code generator updated
  - Implementation for publish_loaned()
- Dockerfile improvements
Removed
- CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally depending on the means of communication used.
1.3.0 (2022/08/25)
Added
- New execution strategy option:
  - The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
  - A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
  - For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor
Changed
- Significantly refactored the communicator plugins:
  - Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
  - The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
  - The plugin does not require any special logic to support roundtrip mode
  - It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
  - Split publish() into publish_copy() and publish_loaned()
- Significantly refactored the runner framework:
  - The runner framework is responsible for the experiment metrics
  - It manages the roundtrip mode logic
  - It is extensible for different execution strategies or thread configurations
- The iceoryx plugin now uses the untyped API, for improved performance
1.2.1 (2022/06/30)
Fixed
- Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value
1.2.0 (2022/06/28)
Changed
- The CLI arguments for specifying the output type have changed:
  - For console output, updated every second, add --print-to-console
  - For file output, use --logfile my_file.csv or --logfile my_file.json
  - The type will be deduced from the file name
  - If neither of these options is specified, then a warning will print, and the experiment will still run
- The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
- The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process
Removed
- The publisher and subscriber loop reserve metrics are no longer recorded or reported
Fixed
- CPU usage will no longer be stuck at 0
Removed
- The pub/sub loop reserve time metrics
1.1.2 (2022/06/08)
Changed
- Use steady_clock for all platforms, including QNX QOS
1.1.1 (2022/06/07)
Changed
- Significant refactor to simplify the analysis pipeline
Fixed
- Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled
1.1.0 (2022/06/02)
Added
- New Apex.OS Polling Subscription plugin
- Compatibility with ROS2 Humble
1.0.0 (2022/05/12)
Added
- More expressive perf_test CLI args for QOS settings
- A plugin for Cyclone DDS with C++ bindings v0.9.0b1
Changed
- CLI args for QOS settings:
  - --reliability <RELIABLE|BEST_EFFORT>
  - --durability <TRANSIENT_LOCAL|VOLATILE>
  - --history <KEEP_LAST|KEEP_ALL>
- master branch is compatible with many ROS2 distributions:
  - dashing
  - eloquent
  - foxy
  - galactic
  - rolling
Deprecated
- CLI flags for QOS settings:
  - --reliable
  - --transient
  - --keep-last
Removed
- The branches for specific ROS2 distributions have been deleted
Fixed
- CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS2 distribution
Package Dependencies
Name |
---|
rclcpp |
ros_environment |
ament_cmake |
rosidl_default_generators |
rmw_implementation |
rosidl_default_runtime |
ament_cmake_gtest |
ament_lint_auto |
ament_lint_common |
System Dependencies
Name |
---|
git |
Dependant Packages
Name | Deps |
---|---|
performance_report |
Launch files
Messages
Services
Plugins
Recent questions tagged performance_test at Robotics Stack Exchange
performance_test package from performance_test repoperformance_report performance_test performance_test_ros1_msgs performance_test_ros1_publisher |
|
Package Summary
Tags | No category tags. |
Version | 2.3.0 |
License | Apache 2.0 |
Build type | AMENT_CMAKE |
Use | RECOMMENDED |
Repository Summary
Checkout URI | https://gitlab.com/ApexAI/performance_test.git |
VCS Type | git |
VCS Version | master |
Last Updated | 2024-09-24 |
Dev Status | MAINTAINED |
CI status | No Continuous Integration |
Released | RELEASED |
Tags | No category tags. |
Contributing |
Help Wanted (0)
Good First Issues (0) Pull Requests to Review (0) |
Package Description
Additional Links
Maintainers
- Apex AI, Inc.
Authors
performance_test
[TOC]
The performance_test tool tests latency and other performance metrics of various middleware implementations that support a pub/sub pattern. It is used to simulate non-functional performance of your application.
The performance_test tool allows you to quickly set up a pub/sub configuration, e.g. number of publisher/subscribers, message size, QOS settings, middleware. The following metrics are automatically recorded when the application is running:
- latency: corresponds to the time a message takes to travel from a publisher to subscriber. The latency is measured by timestamping the sample when it’s published and subtracting the timestamp (from the sample) from the measured time when the sample arrives at the subscriber (only logged when a subscriber is created)
-
CPU usage: percentage of the total system wide CPU usage (logged separately for each instance of
perf_test
) -
resident memory: heap allocations, shared memory segments, stack (used for system’s internal
work) (logged separately for each instance of
perf_test
) - sample statistics: number of samples received, sent, and lost per experiment run.
This master
branch is compatible with the following ROS 2 versions
- rolling
- jazzy
- iron
- humble
- galactic
- foxy
- eloquent
- dashing
- Apex.OS
How to use this document
- Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
- If needed, find more detailed information about building and running
- Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
- Check out the tools for visualizing the results
- If desired, read about the design and architecture of the tool.
Example
This example shows how to test the non-functional performance of the following configuration:
Option | Value |
---|---|
Plugin | Cyclone DDS |
Message type | Array1k |
Publishing rate | 100Hz |
Topic name | test_topic |
Duration of the experiment | 30s |
Number of publisher(s) | 1 (default) |
Number of subscriber(s) | 1 (default) |
-
Install ROS 2
-
Install Cyclone DDS to /opt/cyclonedds
-
Build performance_test with the CMake build flag for Cyclone DDS:
source /opt/ros/rolling/setup.bash
cd ~/perf_test_ws
colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
source ./install/setup.bash
- Run with the communication plugin option for Cyclone DDS:
mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communication CycloneDDS
--msg Array1k
--rate 100
--topic test_topic
--max-runtime 30
--logfile experiment/log.csv
At the end of the experiment, a CSV log file will be generated in the experiment folder with a name
that starts with log
.
Building the performance_test tool
For a simple example, see Dockerfile.rclcpp.
The performance_test tool is structured as a ROS 2 package, so colcon
is used to build it.
Therefore, you must source a ROS 2 installation:
source /opt/ros/rolling/setup.bash
Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:
mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Square brackets denote optional arguments, like in the Python documentation.
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash
Running an experiment
The performance_test experiments are run through the perf_test
executable.
To find the available settings, run with --help
(note the required and default arguments):
~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help
- The
-c
argument should match the selected middleware plugin from the build phase. - The
--msg
argument should be one of the supported message types, which are shown in the--help
output.
Single machine or distributed system?
Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.
For running tests on a single machine, you can choose between the following options:
- Intraprocess means that the publisher and subscriber threads are in the same process.
perf_test <options> --num-sub-threads 1 --num-pub-threads 1
- Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of the performance_test must be run, e.g.
# Start the subscriber first
perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
sleep 1 # give the subscriber time to finish initializing
perf_test <options> --num-sub-threads 0 --num-pub-threads 1
On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:
# On the main machine
perf_test <options> --roundtrip-mode Main
# On the relay machine:
perf_test <options> --roundtrip-mode Relay
In relay mode, the Main machine sends messages to the Relay machine, which immediately sends the messages back. The Main machine receives the relayed message and reports the round-trip latency. The reported latency is therefore roughly double the latency reported in non-relay mode.
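If a one-way figure is wanted, halving the round-trip value gives a rough estimate. The sketch below assumes that the outbound and return paths are symmetric, which the tool does not guarantee. The helper name estimate_one_way is hypothetical, and the input is whatever latency value you read from the log.

```shell
# Hypothetical helper: halve a round-trip latency to approximate the
# one-way latency. Assumes symmetric paths; the unit is unchanged.
estimate_one_way() {
  awk -v rtt="$1" 'BEGIN { printf "%.3f\n", rtt / 2 }'
}

# Example: a round-trip latency of 1.5 (in the unit used by the log).
estimate_one_way 1.5
```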
Single machine, single thread
An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a message, and the subscriber immediately takes it.
perf_test <options> -e INTRA_THREAD
Notes:
- This is only available when zero copy transfer is enabled
- This requires exactly one publisher and one subscriber
- This is not compatible with roundtrip mode
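The constraints in these notes can be checked before launching an experiment. The following sketch assumes the caller already knows the publisher and subscriber counts and whether zero copy and roundtrip mode are in use. The function check_intra_thread and its argument order are hypothetical and not part of perf_test.

```shell
# Hypothetical pre-flight check for -e INTRA_THREAD.
# Arguments: <num_pubs> <num_subs> <zero_copy: yes|no> <roundtrip: yes|no>
check_intra_thread() {
  [ "$1" -eq 1 ] || { echo "need exactly one publisher" >&2; return 1; }
  [ "$2" -eq 1 ] || { echo "need exactly one subscriber" >&2; return 1; }
  [ "$3" = "yes" ] || { echo "zero copy transfer must be enabled" >&2; return 1; }
  [ "$4" = "no" ] || { echo "roundtrip mode is not supported" >&2; return 1; }
}
```

The function returns 0 when all three notes are satisfied and 1 otherwise, so it can guard a perf_test call in a script.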
Middleware plugins
The performance test tool can measure the performance of a variety of communication solutions from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines.
The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.
The following plugins are currently implemented:
Eclipse Cyclone DDS
- Eclipse Cyclone DDS 0.9.0b1
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
- Communication plugin: -c CycloneDDS
- Docker file: Dockerfile.CycloneDDS
- Available transports:
  - Cyclone DDS zero copy requires RouDi to be running.
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Eclipse Cyclone DDS C++ binding
- Eclipse Cyclone DDS C++ bindings 0.9.0b1
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
- Communication plugin: -c CycloneDDS-CXX
- Docker file: Dockerfile.CycloneDDS-CXX
- Available transports:
  - Cyclone DDS zero copy requires RouDi to be running.
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Eclipse iceoryx
- iceoryx (latest master as of Feb 13)
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ICEORYX
- Communication plugin: -c iceoryx
- Docker file: Dockerfile.iceoryx
- The iceoryx plugin is not a DDS implementation.
  - The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
- To run with the iceoryx plugin, RouDi must be running.
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| LoanedSamples | LoanedSamples | Not supported by performance_test |
eProsima Fast DDS
- FastDDS 2.6.2
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=FASTDDS
- Communication plugin: -c FastRTPS
- Docker file: Dockerfile.FastDDS
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), LoanedSamples (--zero-copy) | SHMEM (default), LoanedSamples (--zero-copy) | UDP |
OCI OpenDDS
- OpenDDS 3.13.2
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=OPENDDS
- Communication plugin: -c OpenDDS
- Docker file: Dockerfile.OpenDDS
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| TCP | TCP | TCP |
RTI Connext DDS
- RTI Connext DDS 5.3.1+
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
- Communication plugin: -c ConnextDDS
- Docker file: Not available
- A license is required
- You need to source an RTI Connext DDS environment.
  - If RTI Connext DDS was installed with ROS 2 (Linux only):
    source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
  - If RTI Connext DDS is installed separately, you can source the following script to set the environment:
    source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |
RTI Connext DDS Micro
- Connext DDS Micro 3.0.3
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
- Communication plugin: -c ConnextDDSMicro
- Docker file: Not available
- A license is required
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |
Framework plugins
The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.
ROS 2
The performance_test tool can also measure the performance of a variety of RMW implementations, through the ROS 2 rclcpp::Publisher and rclcpp::Subscription API.
- ROS 2 rclcpp::Publisher and rclcpp::Subscription
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
- Communication plugin:
  - Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
  - Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
  - rclcpp::WaitSet: -c rclcpp-waitset
- Docker file: Dockerfile.rclcpp
- Available underlying RMW implementations:
  - ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
  - Follow these instructions to use a different RMW implementation
- Available transports: depends on underlying RMW implementation
  - LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above
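Selecting a different RMW implementation is typically done through the RMW_IMPLEMENTATION environment variable, which ROS 2 reads at startup. The wrapper below is a sketch rather than part of performance_test: run_with_rmw is a hypothetical name, and it assumes the chosen RMW package is installed and its environment has been sourced.

```shell
# Hypothetical wrapper: run any command with a specific RMW implementation
# without changing the environment of the calling shell.
run_with_rmw() {
  rmw="$1"
  shift
  env RMW_IMPLEMENTATION="$rmw" "$@"
}

# Example invocation (commented out because it needs a built perf_test):
# run_with_rmw rmw_cyclonedds_cpp perf_test <options> -c rclcpp-waitset
```

Scoping the variable to a single command keeps back-to-back runs with different RMW implementations from affecting each other.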
Apex.OS
- Apex.OS
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
  - It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
- Communication plugin: -c ApexOSPollingSubscription
- Docker file: Not available
- Available underlying RMW implementations: rmw_apex_middleware
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Analyze the results
After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. It is possible to add custom data to the log file by setting the APEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.
# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"
Plot the results
To plot the results in the JSON or CSV log files, see the plotter README.
Architecture
Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.
Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.
Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.
An example plugin is available here.
Performance optimizations
- On Linux-based platforms, perf_test writes 0 to /dev/cpu_dma_latency and holds open the file handle, which will prevent the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
Future extensions and limitations
- Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QOS settings to be configured. The other QOS settings are hardcoded in the application.
- Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
- Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
- The FastRTPS wait-set does not support timeouts, so the receiving side may never abort. In that case the performance test must be killed manually.
- The Connext DDS Micro INTRA transport does not support reliable QoS combined with the keep_all history kind. Always set the history kind to KEEP_LAST when using reliable QoS.
Possible additional communication mechanisms that could be implemented:
- Raw UDP communication
Building with limited resources
When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:
- This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.
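As a sketch of how such options combine on the command line, the helper below turns a list of message groups into -DENABLE_MSGS_<GROUP>=OFF arguments. The option names are the ones listed in this package's changelog. The helper name disable_msgs_args is hypothetical, and the colcon command is only printed, not run.

```shell
# Hypothetical helper: print one -DENABLE_MSGS_<GROUP>=OFF argument per
# message group given, separated by spaces.
disable_msgs_args() {
  args=""
  for group in "$@"; do
    args="$args -DENABLE_MSGS_${group}=OFF"
  done
  # Strip the leading space before printing.
  printf '%s\n' "${args# }"
}

# Print (not run) a build command without PointCloud and Struct messages.
echo "colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2 $(disable_msgs_args POINT_CLOUD STRUCT)"
```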
Changelog for package performance_test
X.Y.Z (YYYY/MM/DD)
2.3.0 (2024/09/24)
Removed
- Moved apex_performance_plotter to its own package here
2.2.0 (2024/05/15)
Added
- performance_test can be built with ROS 2 Iron and Jazzy
Changed
- Renamed the --dds-domain_id CLI arg to --dds-domain-id
- When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
- --zero-copy has been separated into two flags:
  - --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
  - --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
  - --zero-copy is now an alias for --shared-memory --loaned-samples
  - Supported plugins include:
    - -c CycloneDDS
    - -c CycloneDDS-CXX
    - -c ApexOSPollingSubscription
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp
2.1.0 (2024/04/17)
Added
- Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread
Changed
- Change the default --history arg from KEEP_ALL to KEEP_LAST
- Change the default --history-depth arg from 1000 to 16
- If --expected-num-pubs is unspecified, set it to the same value as -p
- If --expected-num-subs is unspecified, set it to the same value as -s
Fixed
- Removed an unused variable to fix a Clang build
- Remove unused variable names in the Plugin abstract class
- Fix a potential lockup in PublisherTask on QNX
2.0.0 (2024/03/19)
Added
- Add experimental bazel support
  bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
- Add a rudimentary socket-based plugin for testing the bazel support
  bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help
Changed
- Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
  colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
- Renamed the --communication CLI arg to --communicator. The short -c is unchanged.
Removed
- Removed the deprecated CLI flags for QOS settings:
  - Instead of --reliable, use --reliability RELIABLE
  - Instead of --transient, use --durability TRANSIENT_LOCAL
  - Instead of --keep-last, use --history KEEP_LAST
- Removed the obsolete BoundedSequenceFlat messages
- Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.
Fixed
- Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
- Ensure that the first few published samples are sent at the expected rate
1.5.2 (YYYY/MM/DD)
Added
- --prevent-cpu-idle is available on QNX
Changed
- JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
- Switch to build as C++17 by default
Fixed
- Zero copy transfer is again enabled for the rclcpp publisher
1.5.0 (2023/06/14)
Added
- New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
- Some smaller Array messages, down to 32 bits
- Added support to the FastDDS plugin for bounded and unbounded sequences
Changed
- Update the README to better explain how to use this tool with Apex.OS
- In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
- Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
- Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
- Adjusted the Array message sizes to make the name match the contents
- Updated apex_os_communicator to use the new zero-copy API
1.4.2 (2023/03/15)
Added
- Added perfplot support for JSON log files
Changed
- Migrate the Apex.OS target to use rosidl_get_typesupport_target
- Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins
1.4.1 (2023/02/23)
Changed
- Updated the iceoryx plugin to the latest master as of Feb 13
1.4.0 (2023/02/20)
Added
- New message type BoundedSequenceFlat
  - This is a BoundedSequence with the @flat annotation
  - Sizes range from 1kB to 8MB, like Array and BoundedSequence
Changed
- Messages of different types can be optionally included via CMake args:
  - -DENABLE_MSGS_ARRAY (default ON)
  - -DENABLE_MSGS_STRUCT (default ON)
  - -DENABLE_MSGS_POINT_CLOUD (default ON)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
  - -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_ALL (default OFF)
    - when ON, overrides the other defaults to ON
    - you can still optionally exclude some messages by explicitly setting them to OFF
Removed
- Removed a few messages:
  - Range
  - RadarTrack
  - RadarDetection
  - NavSatFix
Fixed
- In all cases, including loaned messages, capture the timestamp as the last step of initializing the message
1.3.7 (2023/01/04)
1.3.6 (2023/01/03)
Fixed
- Set the correct IDL_GEN_ROOT for rclcpp plugins
1.3.5 (2022/12/05)
Fixed
- Exit cleanly when a publisher process terminates before a subscriber process
1.3.4 (2022/11/28)
Changed
- Updated Apex.OS plugins to use the unified LoanedSample::data()
1.3.3 (2022/11/28)
Fixed
- Implement the missing take() method in ApexOSPollingSubscriptionSubscriber
1.3.2 (2022/11/21)
Fixed
- Capture the this pointer in the lambda in the iceoryx publisher
1.3.1 (2022/11/21)
Added
- New Apex.OS plugin, compatible with the ThreadedRunners
  - The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
  - The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
  - The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
  - The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor
Changed
- Refactored FastRTPS communicator plugin:
  - Uses DDS compliant API
  - Code generator updated
  - Implementation for publish_loaned()
  - Dockerfile improvements
Removed
- CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally depending on the communication means used.
1.3.0 (2022/08/25)
Added
- New execution strategy option:
  - The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
  - A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
  - For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor
Changed
- Significantly refactored the communicator plugins:
  - Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
  - The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
  - The plugin does not require any special logic to support roundtrip mode
  - It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
  - Split publish() into publish_copy() and publish_loaned()
- Significantly refactored the runner framework:
  - The runner framework is responsible for the experiment metrics
  - It manages the roundtrip mode logic
  - It is extensible for different execution strategies or thread configurations
- The iceoryx plugin now uses the untyped API, for improved performance
1.2.1 (2022/06/30)
Fixed
- Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value
1.2.0 (2022/06/28)
Changed
- The CLI arguments for specifying the output type have changed:
  - For console output, updated every second, add --print-to-console
  - For file output, use --logfile my_file.csv or --logfile my_file.json
    - The type will be deduced from the file name
  - If neither of these options is specified, then a warning will print, and the experiment will still run
- The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
- The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process
Removed
- The publisher and subscriber loop reserve metrics are no longer recorded or reported
Fixed
- CPU usage will no longer be stuck at 0
Removed
- The pub/sub loop reserve time metrics
1.1.2 (2022/06/08)
Changed
- Use steady_clock for all platforms, including QNX QOS
1.1.1 (2022/06/07)
Changed
- Significant refactor to simplify the analysis pipeline
Fixed
- Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled
1.1.0 (2022/06/02)
Added
- New Apex.OS Polling Subscription plugin
- Compatibility with ROS2 Humble
1.0.0 (2022/05/12)
Added
- More expressive perf_test CLI args for QOS settings
- A plugin for Cyclone DDS with C++ bindings v0.9.0b1
Changed
- CLI args for QOS settings:
  - --reliability <RELIABLE|BEST_EFFORT>
  - --durability <TRANSIENT_LOCAL|VOLATILE>
  - --history <KEEP_LAST|KEEP_ALL>
- master branch is compatible with many ROS2 distributions:
  - dashing
  - eloquent
  - foxy
  - galactic
  - rolling
Deprecated
- CLI flags for QOS settings:
  - --reliable
  - --transient
  - --keep-last
Removed
- The branches for specific ROS2 distributions have been deleted
Fixed
- CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS2 distribution
Package Dependencies
Deps | Name |
---|---|
rclcpp | |
ros_environment | |
ament_cmake | |
rosidl_default_generators | |
rmw_implementation | |
rosidl_default_runtime | |
ament_cmake_gtest | |
ament_lint_auto | |
ament_lint_common |
System Dependencies
Name |
---|
git |
Dependant Packages
Name | Deps |
---|---|
performance_report |
- Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
- If needed, find more detailed information about building and running
- Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
- Check out the tools for visualizing the results
- If desired, read about the design and architecture of the tool.
Example
This example shows how to test the non-functional performance of the following configuration:
Option | Value |
---|---|
Plugin | Cyclone DDS |
Message type | Array1k |
Publishing rate | 100Hz |
Topic name | test_topic |
Duration of the experiment | 30s |
Number of publisher(s) | 1 (default) |
Number of subscriber(s) | 1 (default) |
-
Install ROS 2
-
Install Cyclone DDS to /opt/cyclonedds
-
Build performance_test with the CMake build flag for Cyclone DDS:
source /opt/ros/rolling/setup.bash
cd ~/perf_test_ws
colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
source ./install/setup.bash
- Run with the communication plugin option for Cyclone DDS:
mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communication CycloneDDS
--msg Array1k
--rate 100
--topic test_topic
--max-runtime 30
--logfile experiment/log.csv
At the end of the experiment, a CSV log file will be generated in the experiment folder with a name
that starts with log
.
Building the performance_test tool
For a simple example, see Dockerfile.rclcpp.
The performance_test tool is structured as a ROS 2 package, so colcon
is used to build it.
Therefore, you must source a ROS 2 installation:
source /opt/ros/rolling/setup.bash
Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:
mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Square brackets denote optional arguments, like in the Python documentation.
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash
Running an experiment
The performance_test experiments are run through the perf_test
executable.
To find the available settings, run with --help
(note the required and default arguments):
~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help
- The
-c
argument should match the selected middleware plugin from the build phase. - The
--msg
argument should be one of the supported message types, which are shown in the--help
output.
Single machine or distributed system?
Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.
For running tests on a single machine, you can choose between the following options:
- Intraprocess means that the publisher and subscriber threads are in the same process.
perf_test <options> --num-sub-threads 1 --num-pub-threads 1
- Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of the performance_test must be run, e.g.
# Start the subscriber first
perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
sleep 1 # give the subscriber time to finish initializing
perf_test <options> --num-sub-threads 0 --num-pub-threads 1
On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:
# On the main machine
perf_test <options> --roundtrip-mode Main
# On the relay machine:
perf_test <options> --roundtrip-mode Relay
In relay mode, the Main machine sends messages to the Relay machine, which immediately sends the messages back. The Main machine receives the relayed message, and reports the round-trip latency. Therefore, the reported latency will be roughly double the latency compared to the latency reported in non-relay mode.
Single machine, single thread
An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a messages, and the subscriber immediately takes it.
perf_test <options> -e INTRA_THREAD
Notes:
- This is only available when zero copy transfer is enabled
- This requires exactly one publisher and one subscriber
- This is not compatible with roundtrip mode
Middleware plugins
The performance test tool can measure the performance of a variety of communication solutions from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines.
The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.
The following plugins are currently implemented:
Eclipse Cyclone DDS
- Eclipse Cyclone DDS 0.9.0b1
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
- Communication plugin:
-c CycloneDDS
- Docker file: Dockerfile.CycloneDDS
- Available transports:
- Cyclone DDS zero copy requires
RouDi
to be running.
-
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|——-|———————|——————–|
| INTRA (default), SHMEM (
--shared-memory
), LoanedSamples (--zero-copy
) | UDP (default), SHMEM (--shared-memory
), LoanedSamples (--zero-copy
) | UDP |
- Cyclone DDS zero copy requires
RouDi
to be running.
-
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|——-|———————|——————–|
| INTRA (default), SHMEM (
Eclipse Cyclone DDS C++ binding
- Eclipse Cyclone DDS C++ bindings 0.9.0b1
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
- Communication plugin:
-c CycloneDDS-CXX
- Docker file: Dockerfile.CycloneDDS-CXX
- Available transports:
- Cyclone DDS zero copy requires the
RouDi
to be running.
-
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|——-|———————|——————–|
| INTRA (default), SHMEM (
--shared-memory
), LoanedSamples (--zero-copy
) | UDP (default), SHMEM (--shared-memory
), LoanedSamples (--zero-copy
) | UDP |
- Cyclone DDS zero copy requires the
RouDi
to be running.
-
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|——-|———————|——————–|
| INTRA (default), SHMEM (
Eclipse iceoryx
- iceoryx (latest master as of Feb 13)
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=ICEORYX
- Communication plugin:
-c iceoryx
- Docker file: Dockerfile.iceoryx
- The iceoryx plugin is not a DDS implementation.
- The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
- To run with the iceoryx plugin, RouDi must be running.
- Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |———–|———————|———————————–| | LoanedSamples | LoanedSamples | Not supported by performance_test |
eProsima Fast DDS
- FastDDS 2.6.2
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=FASTDDS
- Communication plugin:
-c FastRTPS
- Docker file: Dockerfile.FastDDS
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|——-|———————|——————–|
| INTRA (default), LoanedSamples (
--zero-copy
) | SHMEM (default), LoanedSamples (--zero-copy
) | UDP |
OCI OpenDDS
- OpenDDS 3.13.2
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=OPENDDS
- Communication plugin:
-c OpenDDS
- Docker file: Dockerfile.OpenDDS
- Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | TCP | TCP | TCP |
RTI Connext DDS
- RTI Connext DDS 5.3.1+
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
- Communication plugin: -c ConnextDDS
- Docker file: Not available
- A license is required
- You need to source an RTI Connext DDS environment.
  - If RTI Connext DDS was installed with ROS 2 (Linux only):
    source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
  - If RTI Connext DDS is installed separately, you can source the following script to set the environment:
    source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |
RTI Connext DDS Micro
- Connext DDS Micro 3.0.3
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
- Communication plugin: -c ConnextDDSMicro
- Docker file: Not available
- A license is required
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |
Framework plugins
The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.
ROS 2
The performance test tool can also measure the performance of a variety of RMW implementations, through the ROS 2 rclcpp::publisher and rclcpp::subscriber API.
- ROS 2 rclcpp::publisher and rclcpp::subscriber
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
- Communication plugin:
  - Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
  - Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
  - rclcpp::WaitSet: -c rclcpp-waitset
- Docker file: Dockerfile.rclcpp
- Available underlying RMW implementations:
  - ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
  - Follow these instructions to use a different RMW implementation
- Available transports: depends on underlying RMW implementation
  - LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above
Apex.OS
- Apex.OS
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
  - It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
- Communication plugin: -c ApexOSPollingSubscription
- Docker file: Not available
- Available underlying RMW implementations: rmw_apex_middleware
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Analyze the results
After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. It is possible to add custom data to the log file by setting the APEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.
# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"
Plot the results
To plot the results in the JSON or CSV log files, see the plotter README.
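According to the 1.2.0 changelog entry, the log type is deduced from the file name given to --logfile. The following C++ sketch shows one way such a deduction could work. It is not taken from the performance_test sources: the LogFormat enum and the deduce_log_format function are hypothetical names, and treating the extension case-insensitively is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical enum; the real tool may represent log formats differently.
enum class LogFormat { Csv, Json, Unknown };

// Deduce the log format from the extension of the file name.
// Assumption: the extension is compared case-insensitively.
LogFormat deduce_log_format(const std::string & file_name)
{
  const std::string::size_type dot = file_name.find_last_of('.');
  if (dot == std::string::npos) {
    return LogFormat::Unknown;  // no extension at all
  }
  std::string ext = file_name.substr(dot + 1);
  std::transform(
    ext.begin(), ext.end(), ext.begin(),
    [](unsigned char c) {return static_cast<char>(std::tolower(c));});
  if (ext == "csv") {
    return LogFormat::Csv;
  }
  if (ext == "json") {
    return LogFormat::Json;
  }
  return LogFormat::Unknown;
}
```

With this sketch, experiment/log.csv maps to the CSV format, and a name without a recognized extension maps to Unknown, which a caller could report as an error.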
Architecture
Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.
Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.
Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.
An example plugin is available here.
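The declare-but-do-not-define arrangement described above can be sketched in C++ as follows. This is a simplified illustration, not the project's code: the Plugin interface is reduced to a single hypothetical name() method, create_plugin() is an assumed name for the factory method, and DemoPlugin stands in for a real middleware plugin. In a real build, the last part would live in a separate translation unit selected by -DPERFORMANCE_TEST_PLUGIN.

```cpp
#include <memory>
#include <string>

// Simplified, hypothetical stand-in for the Plugin abstraction.
class Plugin
{
public:
  virtual ~Plugin() = default;
  virtual std::string name() const = 0;
};

// Core side: the static factory method is declared here,
// but deliberately not defined by the core.
class PluginFactory
{
public:
  static std::unique_ptr<Plugin> create_plugin();
};

// Core logic depends only on the abstract interface and calls
// the factory method directly.
std::string describe_selected_plugin()
{
  return "selected plugin: " + PluginFactory::create_plugin()->name();
}

// Middleware side: exactly one plugin provides the definition.
class DemoPlugin : public Plugin
{
public:
  std::string name() const override {return "demo";}
};

std::unique_ptr<Plugin> PluginFactory::create_plugin()
{
  return std::make_unique<DemoPlugin>();
}
```

Because the core only declares PluginFactory::create_plugin(), linking fails unless a plugin provides the definition, and linking two plugins that both define it produces a duplicate-definition error. This is what enforces the "exactly one plugin" rule at build time in this sketch.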
Performance optimizations
- On Linux-based platforms, perf_test writes 0 to /dev/cpu_dma_latency and holds the file handle open, which prevents the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
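A minimal C++ sketch of this technique is shown below, assuming the Linux PM QoS interface, where a latency request stays active for as long as the file descriptor remains open. The class name CpuDmaLatencyRequest and the choice to write the value as a binary 32-bit integer are assumptions, so the actual perf_test implementation may differ. Opening the device usually requires elevated privileges, so the sketch reports failure instead of aborting.

```cpp
#include <cstdint>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical RAII helper: requests a CPU DMA latency of 0 microseconds
// and keeps the request alive until the object is destroyed.
class CpuDmaLatencyRequest
{
public:
  explicit CpuDmaLatencyRequest(const char * path = "/dev/cpu_dma_latency")
  {
    fd_ = ::open(path, O_WRONLY);
    if (fd_ < 0) {
      return;  // device missing or insufficient privileges
    }
    const std::int32_t target_us = 0;
    const ssize_t written = ::write(fd_, &target_us, sizeof(target_us));
    if (written != static_cast<ssize_t>(sizeof(target_us))) {
      ::close(fd_);
      fd_ = -1;
    }
  }

  // Closing the descriptor releases the latency request.
  ~CpuDmaLatencyRequest()
  {
    if (fd_ >= 0) {
      ::close(fd_);
    }
  }

  CpuDmaLatencyRequest(const CpuDmaLatencyRequest &) = delete;
  CpuDmaLatencyRequest & operator=(const CpuDmaLatencyRequest &) = delete;

  // True while the request is being held.
  bool active() const {return fd_ >= 0;}

private:
  int fd_{-1};
};
```

An experiment runner could create one such object before starting the publishers and subscribers and let it go out of scope when the experiment ends.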
Future extensions and limitations
- Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QOS settings to be configured. The other QOS settings are hardcoded in the application.
- Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
- Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
- The FastRTPS wait-set does not support timeouts, which can prevent the receiver from aborting. In that case, the performance test must be killed manually.
- The Connext DDS Micro INTRA transport does not support reliable QoS combined with the keep_all history kind. Always set the history kind to keep-last when using reliable.
Possible additional communication plugins which could be implemented:
- Raw UDP communication
Building with limited resources
When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:
- This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.
Changelog for package performance_test
X.Y.Z (YYYY/MM/DD)
2.3.0 (2024/09/24)
Removed
- Moved apex_performance_plotter to its own package here
2.2.0 (2024/05/15)
Added
- performance_test can be built with ROS 2 Iron and Jazzy
Changed
- Renamed the --dds-domain_id CLI arg to --dds-domain-id
- When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
- --zero-copy has been separated into two flags:
  - --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
  - --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
  - --zero-copy is now an alias for --shared-memory --loaned-samples
  - Supported plugins include:
    - -c CycloneDDS
    - -c CycloneDDS-CXX
    - -c ApexOSPollingSubscription
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp
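The flag semantics listed above can be illustrated with a short C++ sketch. It is not the tool's actual argument handling: resolve_transport, resolve_domain_id, and parse_u32 are hypothetical helpers, and falling back to domain 0 when neither the CLI argument nor ROS_DOMAIN_ID is set is an assumption.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>

struct TransportFlags
{
  bool shared_memory;
  bool loaned_samples;
};

// --zero-copy acts as an alias for --shared-memory --loaned-samples.
TransportFlags resolve_transport(bool shared_memory, bool loaned_samples, bool zero_copy)
{
  return TransportFlags{shared_memory || zero_copy, loaned_samples || zero_copy};
}

// Parse a non-negative decimal integer; returns false on malformed input.
bool parse_u32(const char * text, std::uint32_t & out)
{
  if (text == nullptr || *text == '\0') {
    return false;
  }
  errno = 0;
  char * end = nullptr;
  const unsigned long long value = std::strtoull(text, &end, 10);
  if (errno != 0 || *end != '\0' ||
    value > std::numeric_limits<std::uint32_t>::max())
  {
    return false;
  }
  out = static_cast<std::uint32_t>(value);
  return true;
}

// The CLI value wins; otherwise fall back to ROS_DOMAIN_ID.
// Assumption: domain 0 is used when neither source is available.
std::uint32_t resolve_domain_id(const char * cli_value)
{
  std::uint32_t id = 0;
  if (parse_u32(cli_value, id)) {
    return id;
  }
  if (parse_u32(std::getenv("ROS_DOMAIN_ID"), id)) {
    return id;
  }
  return 0;
}
```

Passing a null pointer for cli_value models an unspecified --dds-domain-id, in which case the environment variable decides the result.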
2.1.0 (2024/04/17)
Added
- Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread
Changed
- Change the default --history arg from KEEP_ALL to KEEP_LAST
- Change the default --history-depth arg from 1000 to 16
- If --expected-num-pubs is unspecified, set it to the same value as -p
- If --expected-num-subs is unspecified, set it to the same value as -s
Fixed
- Removed an unused variable to fix a Clang build
- Remove unused variable names in the Plugin abstract class
- Fix a potential lockup in PublisherTask on QNX
2.0.0 (2024/03/19)
Added
- Add experimental bazel support
bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
- Add a rudimentary socket-based plugin for testing the bazel support
  - bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help
Changed
- Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
  colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
- Renamed the --communication CLI arg to --communicator. The short -c is unchanged.
Removed
- Removed the deprecated CLI flags for QOS settings:
  - Instead of --reliable, use --reliability RELIABLE
  - Instead of --transient, use --durability TRANSIENT_LOCAL
  - Instead of --keep-last, use --history KEEP_LAST
- Removed the obsolete BoundedSequenceFlat messages
- Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.
Fixed
- Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
- Ensure that the first few published samples are sent at the expected rate
1.5.2 (YYYY/MM/DD)
Added
- --prevent-cpu-idle is available on QNX
Changed
- JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
- Switch to build as C++17 by default
Fixed
- Zero copy transfer is again enabled for the rclcpp publisher
1.5.0 (2023/06/14)
Added
- New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
- Some smaller Array messages, down to 32 bits
- Added support to the FastDDS plugin for bounded and unbounded sequences
Changed
- Update the README to better explain how to use this tool with Apex.OS
- In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
- Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
- Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
- Adjusted the Array message sizes to make the name match the contents
- Updated apex_os_communicator to use the new zero-copy API
1.4.2 (2023/03/15)
Added
- Added perfplot support for JSON log files
Changed
- Migrate the Apex.OS target to use rosidl_get_typesupport_target
- Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins
1.4.1 (2023/02/23)
Changed
- Updated the iceoryx plugin to the latest master as of Feb 13
1.4.0 (2023/02/20)
Added
- New message type BoundedSequenceFlat
  - This is a BoundedSequence with the @flat annotation
  - Sizes range from 1kB to 8MB, like Array and BoundedSequence
Changed
- Messages of different types can be optionally included via CMake args:
  - -DENABLE_MSGS_ARRAY (default ON)
  - -DENABLE_MSGS_STRUCT (default ON)
  - -DENABLE_MSGS_POINT_CLOUD (default ON)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
  - -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_ALL (default OFF)
    - when ON, overrides the other defaults to ON
    - you can still optionally exclude some messages by explicitly setting them to OFF
Removed
- Removed a few messages:
  - Range
  - RadarTrack
  - RadarDetection
  - NavSatFix
Fixed
- In all cases, including loaned messages, capture the timestamp as the last step of initializing the message
1.3.7 (2023/01/04)
1.3.6 (2023/01/03)
Fixed
- Set the correct IDL_GEN_ROOT for rclcpp plugins
1.3.5 (2022/12/05)
Fixed
- Exit cleanly when a publisher process terminates before a subscriber process
1.3.4 (2022/11/28)
Changed
- Updated Apex.OS plugins to use the unified LoanedSample::data()
1.3.3 (2022/11/28)
Fixed
- Implement the missing take() method in ApexOSPollingSubscriptionSubscriber
1.3.2 (2022/11/21)
Fixed
- Capture the this pointer in the lambda in the iceoryx publisher
1.3.1 (2022/11/21)
Added
- New Apex.OS plugin, compatible with the ThreadedRunners
  - The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
  - The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
  - The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
  - The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor
Changed
- Refactored FastRTPS communicator plugin:
  - Uses DDS compliant API
  - Code generator updated
  - Implementation for publish_loaned()
- Dockerfile improvements
Removed
- CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally, depending on the communication means used.
1.3.0 (2022/08/25)
Added
- New execution strategy option:
  - The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
  - A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
  - For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor
Changed
- Significantly refactored the communicator plugins:
  - Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
  - The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
  - The plugin does not require any special logic to support roundtrip mode
  - It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
  - Split publish() into publish_copy() and publish_loaned()
- Significantly refactored the runner framework:
  - The runner framework is responsible for the experiment metrics
  - It manages the roundtrip mode logic
  - It is extensible for different execution strategies or thread configurations
- The iceoryx plugin now uses the untyped API, for improved performance
1.2.1 (2022/06/30)
Fixed
- Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value
1.2.0 (2022/06/28)
Changed
- The CLI arguments for specifying the output type have changed:
  - For console output, updated every second, add --print-to-console
  - For file output, use --logfile my_file.csv or --logfile my_file.json
    - The type will be deduced from the file name
  - If neither of these options is specified, then a warning will print, and the experiment will still run
- The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
- The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process
Removed
- The publisher and subscriber loop reserve metrics are no longer recorded or reported
Fixed
- CPU usage will no longer be stuck at 0
Removed
- The pub/sub loop reserve time metrics
1.1.2 (2022/06/08)
Changed
- Use steady_clock for all platforms, including QNX QOS
1.1.1 (2022/06/07)
Changed
- Significant refactor to simplify the analysis pipeline
Fixed
- Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled
1.1.0 (2022/06/02)
Added
- New Apex.OS Polling Subscription plugin
- Compatibility with ROS2 Humble
1.0.0 (2022/05/12)
Added
- More expressive perf_test CLI args for QOS settings
- A plugin for Cyclone DDS with C++ bindings v0.9.0b1
Changed
- CLI args for QOS settings:
  - --reliability <RELIABLE|BEST_EFFORT>
  - --durability <TRANSIENT_LOCAL|VOLATILE>
  - --history <KEEP_LAST|KEEP_ALL>
- master branch is compatible with many ROS2 distributions:
  - dashing
  - eloquent
  - foxy
  - galactic
  - rolling
Deprecated
- CLI flags for QOS settings:
  - --reliable
  - --transient
  - --keep-last
Removed
- The branches for specific ROS2 distributions have been deleted
Fixed
- CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS2 distribution
Package Dependencies
Name |
---|
rclcpp |
ros_environment |
ament_cmake |
rosidl_default_generators |
rmw_implementation |
rosidl_default_runtime |
ament_cmake_gtest |
ament_lint_auto |
ament_lint_common |
System Dependencies
Name |
---|
git |
Dependant Packages
Name | Deps |
---|---|
performance_report |
Eclipse iceoryx
- iceoryx (latest master as of Feb 13)
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=ICEORYX
- Communication plugin:
-c iceoryx
- Docker file: Dockerfile.iceoryx
- The iceoryx plugin is not a DDS implementation.
- The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
- To run with the iceoryx plugin, RouDi must be running.
- Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |———–|———————|———————————–| | LoanedSamples | LoanedSamples | Not supported by performance_test |
eProsima Fast DDS
- FastDDS 2.6.2
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=FASTDDS
- Communication plugin:
-c FastRTPS
- Docker file: Dockerfile.FastDDS
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|——-|———————|——————–|
| INTRA (default), LoanedSamples (
--zero-copy
) | SHMEM (default), LoanedSamples (--zero-copy
) | UDP |
OCI OpenDDS
- OpenDDS 3.13.2
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=OPENDDS
- Communication plugin:
-c OpenDDS
- Docker file: Dockerfile.OpenDDS
- Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | TCP | TCP | TCP |
RTI Connext DDS
- RTI Connext DDS 5.3.1+
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
- Communication plugin:
-c ConnextDDS
- Docker file: Not available
- A license is required
- You need to source an RTI Connext DDS environment.
- If RTI Connext DDS was installed with ROS 2 (Linux only):
source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
- If RTI Connext DDS is installed separately, you can source the following script to set the
environment:
source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
- If RTI Connext DDS was installed with ROS 2 (Linux only):
- Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA | SHMEM | UDP |
RTI Connext DDS Micro
- Connext DDS Micro 3.0.3
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
- Communication plugin:
-c ConnextDDSMicro
- Docker file: Not available
- A license is required
- Available transports: | Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines | |——-|———————|——————–| | INTRA | SHMEM | UDP |
Framework plugins
The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.
ROS 2
The performance test tool can also measure the performance of a variety of RMW implementations,
through the ROS 2 rclcpp::publisher
and rclcpp::subscriber
API.
- ROS 2
rclcpp::publisher
andrclcpp::subscriber
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=ROS2
(default) - Communication plugin:
- Callback with Single Threaded Executor:
-c rclcpp-single-threaded-executor
- Callback with Static Single Threaded Executor:
-c rclcpp-static-single-threaded-executor
-
rclcpp::WaitSet
:-c rclcpp-waitset
- Callback with Single Threaded Executor:
- Docker file: Dockerfile.rclcpp
-
Available underlying RMW implementations:
- ROS 2 Rolling is pre-configured to use
rmw_fastrtps_cpp
- Follow these instructions to use a different RMW implementation
- ROS 2 Rolling is pre-configured to use
- Available transports: depends on underlying RMW implementation
- LoanedSamples are available (
--zero-copy
) forROS_DISTRO = foxy
and above
- LoanedSamples are available (
Apex.OS
- Apex.OS
- CMake build flag:
-DPERFORMANCE_TEST_PLUGIN=APEX_OS
- It is also required to
source /opt/ApexOS/setup.bash
instead of a ROS 2 distribution
- It is also required to
- Communication plugin:
-c ApexOSPollingSubscription
- Docker file: Not available
- Available underlying RMW implementations:
rmw_apex_middleware
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|——-|———————|——————–|
| UDP (default), SHMEM (
--shared-memory
), LoanedSamples (--zero_copy
) | UDP (default), SHMEM (--shared-memory
), LoanedSamples (--zero_copy
) | UDP |
Analyze the results
After an experiment is run with the -l
flag, a log file is recorded. Both CSV
and JSON formats are supported. It is possible to add custom data to the log
file by setting theAPEX_PERFORMANCE_TEST
environment variable before running
an experiment, e.g.
# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"
Plot the results
To plot the results in the JSON or CSV log files, see the plotter README.
Architecture
Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.
Each middleware has a different API. Thanks to the Plugin
abstraction, the core logic of
setting up and running an experiment is completely decoupled from the implementation details
of sending and receiving individual messages.
Exactly one Plugin
implementation is selected at build time. The design is similar to the
Abstract Factory pattern.
performance_test
declares, but does not define, a static factory method in the PluginFactory
class. Each middleware provides a definition for this factory method to create a concrete Plugin
implementation, and perf_test
calls this factory method directly.
An example plugin is available here.
Performance optimizations
- On linux-based platforms,
perf_test
writes0
to/dev/cpu_dma_latency
and holds open the file handle, which will prevent the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
Future extensions and limitations
- Communication frameworks like DDS have a huge amount of settings. This tool only allows the most common QOS settings to be configured. The other QOS settings are hardcoded in the application.
- Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
- Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
- FastRTPS wait-set does not support timeouts which can lead to the receiving not aborting. In that case the performance test must be manually killed.
- Using Connext DDS Micro INTRA transport with
reliable
QoS and history kind set tokeep_all
is not supported with Connext Micro. Setkeep-last
as QoS history kind always when usingreliable
.
Possible additional communication which could be implemented are:
- Raw UDP communication
Building with limited resources
When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:
- This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.
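As a rough sketch, a reduced build command can be assembled programmatically. The helper name `colcon_build_command` is hypothetical; the option names follow the `-DENABLE_MSGS_*` list in the 1.4.0 changelog entry and the `-DPERFORMANCE_TEST_PLUGIN` option described in the build instructions.

```python
def colcon_build_command(plugin, disabled_msgs=()):
    """Return a colcon command line that disables the given message groups.

    Hypothetical helper: `disabled_msgs` holds suffixes such as
    "POINT_CLOUD", matching the -DENABLE_MSGS_<NAME> CMake options.
    """
    cmake_args = [f"-DPERFORMANCE_TEST_PLUGIN={plugin}"]
    cmake_args += [f"-DENABLE_MSGS_{name}=OFF" for name in disabled_msgs]
    return ["colcon", "build", "--cmake-args", *cmake_args]


# Example: build the Cyclone DDS plugin without PointCloud messages.
print(" ".join(colcon_build_command("CYCLONEDDS", ["POINT_CLOUD"])))
```

The returned list can be passed to `subprocess.run` from the workspace root.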
Changelog for package performance_test
X.Y.Z (YYYY/MM/DD)
2.3.0 (2024/09/24)
Removed
- Moved apex_performance_plotter to its own package here
2.2.0 (2024/05/15)
Added
- performance_test can be built with ROS 2 Iron and Jazzy
Changed
- Renamed the --dds-domain_id CLI arg to --dds-domain-id
- When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
- --zero-copy has been separated into two flags:
  - --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
  - --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
  - --zero-copy is now an alias for --shared-memory --loaned-samples
  - Supported plugins include:
    - -c CycloneDDS
    - -c CycloneDDS-CXX
    - -c ApexOSPollingSubscription
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp
2.1.0 (2024/04/17)
Added
- Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread
Changed
- Change the default --history arg from KEEP_ALL to KEEP_LAST
- Change the default --history-depth arg from 1000 to 16
- If --expected-num-pubs is unspecified, set it to the same value as -p
- If --expected-num-subs is unspecified, set it to the same value as -s
Fixed
- Removed an unused variable to fix a Clang build
- Remove unused variable names in the Plugin abstract class
- Fix a potential lockup in PublisherTask on QNX
2.0.0 (2024/03/19)
Added
- Add experimental bazel support
  - bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
- Add a rudimentary socket-based plugin for testing the bazel support
  - bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help
Changed
- Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
- Renamed the --communication CLI arg to --communicator. The short -c is unchanged.
Removed
- Removed the deprecated CLI flags for QOS settings:
  - Instead of --reliable, use --reliability RELIABLE
  - Instead of --transient, use --durability TRANSIENT_LOCAL
  - Instead of --keep-last, use --history KEEP_LAST
- Removed the obsolete BoundedSequenceFlat messages
- Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.
Fixed
- Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
- Ensure that the first few published samples are sent at the expected rate
1.5.2 (YYYY/MM/DD)
Added
- --prevent-cpu-idle is available on QNX
Changed
- JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
- Switch to build as C++17 by default
Fixed
- Zero copy transfer is again enabled for the rclcpp publisher
1.5.0 (2023/06/14)
Added
- New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
- Some smaller Array messages, down to 32 bits
- Added support to the FastDDS plugin for bounded and unbounded sequences
Changed
- Update the README to better explain how to use this tool with Apex.OS
- In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
- Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
- Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
- Adjusted the Array message sizes to make the name match the contents
- Updated apex_os_communicator to use the new zero-copy API
1.4.2 (2023/03/15)
Added
- Added perfplot support for JSON log files
Changed
- Migrate the Apex.OS target to use rosidl_get_typesupport_target
- Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins
1.4.1 (2023/02/23)
Changed
- Updated the iceoryx plugin to the latest master as of Feb 13
1.4.0 (2023/02/20)
Added
- New message type BoundedSequenceFlat
  - This is a BoundedSequence with the @flat annotation
  - Sizes range from 1kB to 8MB, like Array and BoundedSequence
Changed
- Messages of different types can be optionally included via CMake args:
  - -DENABLE_MSGS_ARRAY (default ON)
  - -DENABLE_MSGS_STRUCT (default ON)
  - -DENABLE_MSGS_POINT_CLOUD (default ON)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
  - -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_ALL (default OFF)
    - when ON, overrides the other defaults to ON
    - you can still optionally exclude some messages by explicitly setting them to OFF
Removed
- Removed a few messages:
  - Range
  - RadarTrack
  - RadarDetection
  - NavSatFix
Fixed
- In all cases, including loaned messages, capture the timestamp as the last step of initializing the message
1.3.7 (2023/01/04)
1.3.6 (2023/01/03)
Fixed
- Set the correct IDL_GEN_ROOT for rclcpp plugins
1.3.5 (2022/12/05)
Fixed
- Exit cleanly when a publisher process terminates before a subscriber process
1.3.4 (2022/11/28)
Changed
- Updated Apex.OS plugins to use the unified LoanedSample::data()
1.3.3 (2022/11/28)
Fixed
- Implement the missing take() method in ApexOSPollingSubscriptionSubscriber
1.3.2 (2022/11/21)
Fixed
- Capture the this pointer in the lambda in the iceoryx publisher
1.3.1 (2022/11/21)
Added
- New Apex.OS plugin, compatible with the ThreadedRunners
  - The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
  - The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
  - The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
  - The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor
Changed
- Refactored FastRTPS communicator plugin:
  - Uses DDS compliant API
  - Code generator updated
  - Implementation for publish_loaned()
- Dockerfile improvements
Removed
- CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally depending on the communication means used.
1.3.0 (2022/08/25)
Added
- New execution strategy option:
  - The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
  - A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
  - For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor
Changed
- Significantly refactored the communicator plugins:
  - Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
  - The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
  - The plugin does not require any special logic to support roundtrip mode
  - It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
  - Split publish() into publish_copy() and publish_loaned()
- Significantly refactored the runner framework:
  - The runner framework is responsible for the experiment metrics
  - It manages the roundtrip mode logic
  - It is extensible for different execution strategies or thread configurations
- The iceoryx plugin now uses the untyped API, for improved performance
1.2.1 (2022/06/30)
Fixed
- Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value
1.2.0 (2022/06/28)
Changed
- The CLI arguments for specifying the output type have changed:
  - For console output, updated every second, add --print-to-console
  - For file output, use --logfile my_file.csv or --logfile my_file.json
    - The type will be deduced from the file name
  - If neither of these options is specified, then a warning will print, and the experiment will still run
- The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
- The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process
Removed
- The publisher and subscriber loop reserve metrics are no longer recorded or reported
Fixed
- CPU usage will no longer be stuck at 0
Removed
- The pub/sub loop reserve time metrics
1.1.2 (2022/06/08)
Changed
- Use steady_clock for all platforms, including QNX QOS
1.1.1 (2022/06/07)
Changed
- Significant refactor to simplify the analysis pipeline
Fixed
- Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled
1.1.0 (2022/06/02)
Added
- New Apex.OS Polling Subscription plugin
- Compatibility with ROS2 Humble
1.0.0 (2022/05/12)
Added
- More expressive perf_test CLI args for QOS settings
- A plugin for Cyclone DDS with C++ bindings v0.9.0b1
Changed
- CLI args for QOS settings:
  - --reliability <RELIABLE|BEST_EFFORT>
  - --durability <TRANSIENT_LOCAL|VOLATILE>
  - --history <KEEP_LAST|KEEP_ALL>
- master branch is compatible with many ROS2 distributions:
  - dashing
  - eloquent
  - foxy
  - galactic
  - rolling
Deprecated
- CLI flags for QOS settings:
  - --reliable
  - --transient
  - --keep-last
Removed
- The branches for specific ROS2 distributions have been deleted
Fixed
- CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS2 distribution
Package Dependencies
Deps | Name |
---|---|
rclcpp | |
ros_environment | |
ament_cmake | |
rosidl_default_generators | |
rmw_implementation | |
rosidl_default_runtime | |
ament_cmake_gtest | |
ament_lint_auto | |
ament_lint_common |
System Dependencies
Name |
---|
git |
Dependant Packages
Name | Deps |
---|---|
performance_report |
performance_test
[TOC]
The performance_test tool tests latency and other performance metrics of various middleware implementations that support a pub/sub pattern. It is used to simulate non-functional performance of your application.
The performance_test tool allows you to quickly set up a pub/sub configuration, e.g. number of publisher/subscribers, message size, QOS settings, middleware. The following metrics are automatically recorded when the application is running:
- latency: corresponds to the time a message takes to travel from a publisher to subscriber. The latency is measured by timestamping the sample when it’s published and subtracting the timestamp (from the sample) from the measured time when the sample arrives at the subscriber (only logged when a subscriber is created)
- CPU usage: percentage of the total system wide CPU usage (logged separately for each instance of perf_test)
- resident memory: heap allocations, shared memory segments, stack (used for system’s internal work) (logged separately for each instance of perf_test)
- sample statistics: number of samples received, sent, and lost per experiment run.
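The latency and sample-statistics definitions above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function names are hypothetical, a monotonic clock stands in for the steady clock, and counting lost samples as sent minus received is an assumption made for illustration.

```python
import time


def stamp_sample(sample_id):
    """Publisher side: attach the publish timestamp to the sample."""
    return {"id": sample_id, "sent_ns": time.monotonic_ns()}


def latency_ns(sample, received_ns=None):
    """Subscriber side: arrival time minus the timestamp carried by the sample."""
    if received_ns is None:
        received_ns = time.monotonic_ns()
    return received_ns - sample["sent_ns"]


def sample_statistics(num_sent, num_received):
    """Per-run counters; 'lost' is assumed here to be sent minus received."""
    return {
        "sent": num_sent,
        "received": num_received,
        "lost": num_sent - num_received,
    }
```

Because both timestamps come from the same clock, this only gives a meaningful latency when publisher and subscriber run on the same machine.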
This master branch is compatible with the following ROS 2 versions
- rolling
- jazzy
- iron
- humble
- galactic
- foxy
- eloquent
- dashing
- Apex.OS
How to use this document
- Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
- If needed, find more detailed information about building and running
- Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
- Check out the tools for visualizing the results
- If desired, read about the design and architecture of the tool.
Example
This example shows how to test the non-functional performance of the following configuration:
Option | Value |
---|---|
Plugin | Cyclone DDS |
Message type | Array1k |
Publishing rate | 100Hz |
Topic name | test_topic |
Duration of the experiment | 30s |
Number of publisher(s) | 1 (default) |
Number of subscriber(s) | 1 (default) |
- Install ROS 2
- Install Cyclone DDS to /opt/cyclonedds
- Build performance_test with the CMake build flag for Cyclone DDS:
source /opt/ros/rolling/setup.bash
cd ~/perf_test_ws
colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
source ./install/setup.bash
- Run with the communication plugin option for Cyclone DDS:
mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communicator CycloneDDS \
--msg Array1k \
--rate 100 \
--topic test_topic \
--max-runtime 30 \
--logfile experiment/log.csv
At the end of the experiment, a CSV log file will be generated in the experiment folder with a name that starts with log.
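When experiments are scripted, the same run can be described as an argument list. The sketch below is a hypothetical helper, not part of the tool; it only uses flags that appear in this document (`-c`, `--msg`, `--rate`, `--topic`, `--max-runtime`, `--logfile`).

```python
def perf_test_command(executable, communicator, msg, rate_hz, topic,
                      max_runtime_s, logfile):
    """Build the perf_test argument list for one experiment (hypothetical helper)."""
    return [
        executable,
        "-c", communicator,
        "--msg", msg,
        "--rate", str(rate_hz),
        "--topic", topic,
        "--max-runtime", str(max_runtime_s),
        "--logfile", logfile,
    ]


# The configuration from the table above.
command = perf_test_command(
    "./install/performance_test/lib/performance_test/perf_test",
    "CycloneDDS", "Array1k", 100, "test_topic", 30, "experiment/log.csv",
)
print(" ".join(command))
```

Passing the list to `subprocess.run(command, check=True)` would start the experiment, assuming the workspace has been built and sourced as described above.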
Building the performance_test tool
For a simple example, see Dockerfile.rclcpp.
The performance_test tool is structured as a ROS 2 package, so colcon is used to build it. Therefore, you must source a ROS 2 installation:
source /opt/ros/rolling/setup.bash
Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:
mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Replace <plugin> with the value for the selected middleware, e.g. CYCLONEDDS
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash
Running an experiment
The performance_test experiments are run through the perf_test executable. To find the available settings, run with --help (note the required and default arguments):
~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help
- The -c argument should match the selected middleware plugin from the build phase.
- The --msg argument should be one of the supported message types, which are shown in the --help output.
Single machine or distributed system?
Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.
For running tests on a single machine, you can choose between the following options:
- Intraprocess means that the publisher and subscriber threads are in the same process.
perf_test <options> --num-sub-threads 1 --num-pub-threads 1
- Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of the performance_test must be run, e.g.
# Start the subscriber first
perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
sleep 1 # give the subscriber time to finish initializing
perf_test <options> --num-sub-threads 0 --num-pub-threads 1
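The interprocess sequence above can also be driven from a script. This is a minimal sketch with hypothetical helper names; it assumes `base_cmd` already contains the executable and the shared options, including `--max-runtime`, so that the subscriber process terminates on its own.

```python
import subprocess
import time


def split_pub_sub(base_cmd):
    """Derive the subscriber-only and publisher-only command lines."""
    sub_cmd = [*base_cmd, "--num-sub-threads", "1", "--num-pub-threads", "0"]
    pub_cmd = [*base_cmd, "--num-sub-threads", "0", "--num-pub-threads", "1"]
    return sub_cmd, pub_cmd


def run_interprocess(base_cmd, startup_delay_s=1.0):
    """Start the subscriber first, then the publisher; return both exit codes."""
    sub_cmd, pub_cmd = split_pub_sub(base_cmd)
    subscriber = subprocess.Popen(sub_cmd)
    try:
        # Give the subscriber time to finish initializing.
        time.sleep(startup_delay_s)
        publisher = subprocess.run(pub_cmd, check=False)
    finally:
        # Always reap the subscriber, even if the publisher failed to start.
        subscriber_rc = subscriber.wait()
    return subscriber_rc, publisher.returncode
```

The one-second delay mirrors the `sleep 1` in the shell version and may need tuning for middleware with slower discovery.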
On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:
# On the main machine
perf_test <options> --roundtrip-mode Main
# On the relay machine:
perf_test <options> --roundtrip-mode Relay
In relay mode, the Main machine sends messages to the Relay machine, which immediately sends the messages back. The Main machine receives the relayed message, and reports the round-trip latency. Therefore, the reported latency will be roughly double the latency compared to the latency reported in non-relay mode.
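Since the round trip covers the path twice, a rough one-way figure can be derived by halving. The helper below is a hypothetical post-processing sketch; it assumes both directions take about the same time and that the relay's forwarding delay is negligible, which may not hold on every network.

```python
def one_way_estimates(round_trip_latencies_us):
    """Estimate one-way latency as half of each round-trip latency.

    Assumption for illustration only: symmetric paths and a negligible
    forwarding delay on the relay machine.
    """
    return [rtt / 2.0 for rtt in round_trip_latencies_us]


# Round-trip samples of 200 us and 400 us give roughly 100 us and 200 us one way.
print(one_way_estimates([200.0, 400.0]))
```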
Single machine, single thread
An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a message, and the subscriber immediately takes it.
perf_test <options> -e INTRA_THREAD
Notes:
- This is only available when zero copy transfer is enabled
- This requires exactly one publisher and one subscriber
- This is not compatible with roundtrip mode
Middleware plugins
The performance test tool can measure the performance of a variety of communication solutions from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines.
The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.
The following plugins are currently implemented:
Eclipse Cyclone DDS
- Eclipse Cyclone DDS 0.9.0b1
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
- Communication plugin: -c CycloneDDS
- Docker file: Dockerfile.CycloneDDS
- Available transports:
  - Cyclone DDS zero copy requires RouDi to be running.
Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
---|---|---|
INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Eclipse Cyclone DDS C++ binding
- Eclipse Cyclone DDS C++ bindings 0.9.0b1
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
- Communication plugin: -c CycloneDDS-CXX
- Docker file: Dockerfile.CycloneDDS-CXX
- Available transports:
  - Cyclone DDS zero copy requires RouDi to be running.
Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
---|---|---|
INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Eclipse iceoryx
- iceoryx (latest master as of Feb 13)
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ICEORYX
- Communication plugin: -c iceoryx
- Docker file: Dockerfile.iceoryx
- The iceoryx plugin is not a DDS implementation.
  - The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
- To run with the iceoryx plugin, RouDi must be running.
- Available transports:
Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
---|---|---|
LoanedSamples | LoanedSamples | Not supported by performance_test |
eProsima Fast DDS
- FastDDS 2.6.2
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=FASTDDS
- Communication plugin: -c FastRTPS
- Docker file: Dockerfile.FastDDS
- Available transports:
Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
---|---|---|
INTRA (default), LoanedSamples (--zero-copy) | SHMEM (default), LoanedSamples (--zero-copy) | UDP |
OCI OpenDDS
- OpenDDS 3.13.2
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=OPENDDS
- Communication plugin: -c OpenDDS
- Docker file: Dockerfile.OpenDDS
- Available transports:
Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
---|---|---|
TCP | TCP | TCP |
RTI Connext DDS
- RTI Connext DDS 5.3.1+
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
- Communication plugin: -c ConnextDDS
- Docker file: Not available
- A license is required
- You need to source an RTI Connext DDS environment.
  - If RTI Connext DDS was installed with ROS 2 (Linux only):
source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
  - If RTI Connext DDS is installed separately, you can source the following script to set the environment:
source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
- Available transports:
Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
---|---|---|
INTRA | SHMEM | UDP |
RTI Connext DDS Micro
- Connext DDS Micro 3.0.3
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
- Communication plugin: -c ConnextDDSMicro
- Docker file: Not available
- A license is required
- Available transports:
Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
---|---|---|
INTRA | SHMEM | UDP |
Framework plugins
The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.
ROS 2
The performance test tool can also measure the performance of a variety of RMW implementations, through the ROS 2 rclcpp::publisher and rclcpp::subscriber API.
- ROS 2 rclcpp::publisher and rclcpp::subscriber
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
- Communication plugin:
  - Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
  - Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
  - rclcpp::WaitSet: -c rclcpp-waitset
- Docker file: Dockerfile.rclcpp
- Available underlying RMW implementations:
  - ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
  - Follow these instructions to use a different RMW implementation
- Available transports: depends on underlying RMW implementation
  - LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above
Apex.OS
- Apex.OS
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
  - It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
- Communication plugin: -c ApexOSPollingSubscription
- Docker file: Not available
- Available underlying RMW implementations: rmw_apex_middleware
- Available transports:
Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
---|---|---|
UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Analyze the results
After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. It is possible to add custom data to the log file by setting the APEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.
# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"
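When experiments are launched from a script, the same variable can be generated from a dictionary, which avoids hand-escaping the quotes. The helper name `custom_log_metadata_env` is hypothetical; only the `APEX_PERFORMANCE_TEST` variable name and its JSON format come from this document.

```python
import json
import os


def custom_log_metadata_env(metadata, base_env=None):
    """Return a copy of the environment with APEX_PERFORMANCE_TEST set.

    Hypothetical helper: `metadata` is serialized to JSON, matching the
    format of the shell example above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["APEX_PERFORMANCE_TEST"] = json.dumps(metadata)
    return env


env = custom_log_metadata_env({
    "My Version": "1.0.4",
    "My Image Version": "5.2",
    "My OS Version": "Ubuntu 16.04",
})
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` when starting perf_test.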
Plot the results
To plot the results in the JSON or CSV log files, see the plotter README.
Architecture
Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.
Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.
Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.
An example plugin is available here.
Performance optimizations
- On Linux-based platforms, when --prevent-cpu-idle is specified, perf_test writes 0 to /dev/cpu_dma_latency and holds the file handle open, which prevents the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
Future extensions and limitations
- Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QOS settings to be configured. The other QOS settings are hardcoded in the application.
- Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
- Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
- FastRTPS wait-set does not support timeouts which can lead to the receiving not aborting. In that case the performance test must be manually killed.
- Using the Connext DDS Micro INTRA transport with reliable QoS and history kind set to keep_all is not supported. Always set keep-last as the QoS history kind when using reliable.
Possible additional communication mechanisms which could be implemented are:
- Raw UDP communication
Building with limited resources
When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:
- This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.
Changelog for package performance_test
X.Y.Z (YYYY/MM/DD)
2.3.0 (2024/09/24)
Removed
- Moved apex_performance_plotter to its own package here
2.2.0 (2024/05/15)
Added
- performance_test can be built with ROS 2 Iron and Jazzy
Changed
- Renamed the --dds-domain_id CLI arg to --dds-domain-id
- When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
- --zero-copy has been separated into two flags:
  - --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
  - --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
  - --zero-copy is now an alias for --shared-memory --loaned-samples
  - Supported plugins include:
    - -c CycloneDDS
    - -c CycloneDDS-CXX
    - -c ApexOSPollingSubscription
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp
2.1.0 (2024/04/17)
Added
- Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread
Changed
- Change the default --history arg from KEEP_ALL to KEEP_LAST
- Change the default --history-depth arg from 1000 to 16
- If --expected-num-pubs is unspecified, set it to the same value as -p
- If --expected-num-subs is unspecified, set it to the same value as -s
Fixed
- Removed an unused variable to fix a Clang build
- Remove unused variable names in the Plugin abstract class
- Fix a potential lockup in PublisherTask on QNX
2.0.0 (2024/03/19)
Added
- Add experimental bazel support
  - bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
- Add a rudimentary socket-based plugin for testing the bazel support
  - bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help
Changed
- Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
- Renamed the --communication CLI arg to --communicator. The short -c is unchanged.
Removed
- Removed the deprecated CLI flags for QOS settings:
  - Instead of --reliable, use --reliability RELIABLE
  - Instead of --transient, use --durability TRANSIENT_LOCAL
  - Instead of --keep-last, use --history KEEP_LAST
- Removed the obsolete BoundedSequenceFlat messages
- Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.
Fixed
- Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
- Ensure that the first few published samples are sent at the expected rate
1.5.2 (YYYY/MM/DD)
Added
- --prevent-cpu-idle is available on QNX
Changed
- JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
- Switch to build as C++17 by default
Fixed
- Zero copy transfer is again enabled for the rclcpp publisher
1.5.0 (2023/06/14)
Added
- New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
- Some smaller Array messages, down to 32 bits
- Added support to the FastDDS plugin for bounded and unbounded sequences
Changed
- Update the README to better explain how to use this tool with Apex.OS
- In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
- Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
- Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
- Adjusted the Array message sizes to make the name match the contents
- Updated apex_os_communicator to use the new zero-copy API
1.4.2 (2023/03/15)
Added
- Added perfplot support for JSON log files
Changed
- Migrate the Apex.OS target to use rosidl_get_typesupport_target
- Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins
1.4.1 (2023/02/23)
Changed
- Updated the iceoryx plugin to the latest master as of Feb 13
1.4.0 (2023/02/20)
Added
- New message type BoundedSequenceFlat
  - This is a BoundedSequence with the @flat annotation
  - Sizes range from 1kB to 8MB, like Array and BoundedSequence
Changed
- Messages of different types can be optionally included via CMake args:
  - -DENABLE_MSGS_ARRAY (default ON)
  - -DENABLE_MSGS_STRUCT (default ON)
  - -DENABLE_MSGS_POINT_CLOUD (default ON)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
  - -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_ALL (default OFF)
    - when ON, overrides the other defaults to ON
    - you can still optionally exclude some messages by explicitly setting them to OFF
Removed
- Removed a few messages:
  - Range
  - RadarTrack
  - RadarDetection
  - NavSatFix
Fixed
- In all cases, including loaned messages, capture the timestamp as the last step of initializing the message
1.3.7 (2023/01/04)
1.3.6 (2023/01/03)
Fixed
- Set the correct IDL_GEN_ROOT for rclcpp plugins
1.3.5 (2022/12/05)
Fixed
- Exit cleanly when a publisher process terminates before a subscriber process
1.3.4 (2022/11/28)
Changed
- Updated Apex.OS plugins to use the unified LoanedSample::data()
1.3.3 (2022/11/28)
Fixed
- Implement the missing take() method in ApexOSPollingSubscriptionSubscriber
1.3.2 (2022/11/21)
Fixed
- Capture the this pointer in the lambda in the iceoryx publisher
1.3.1 (2022/11/21)
Added
- New Apex.OS plugin, compatible with the ThreadedRunners
  - The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
  - The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
  - The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
  - The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor
Changed
- Refactored FastRTPS communicator plugin:
  - Uses DDS compliant API
  - Code generator updated
  - Implementation for publish_loaned()
- Dockerfile improvements
Removed
- CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally depending on the communication means used.
1.3.0 (2022/08/25)
Added
- New execution strategy option:
  - The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
  - A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
  - For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor
Changed
- Significantly refactored the communicator plugins:
  - Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
  - The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
  - The plugin does not require any special logic to support roundtrip mode
  - It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
  - Split publish() into publish_copy() and publish_loaned()
- Significantly refactored the runner framework:
  - The runner framework is responsible for the experiment metrics
  - It manages the roundtrip mode logic
  - It is extensible for different execution strategies or thread configurations
- The iceoryx plugin now uses the untyped API, for improved performance
1.2.1 (2022/06/30)
Fixed
- Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value
1.2.0 (2022/06/28)
Changed
- The CLI arguments for specifying the output type have changed:
  - For console output, updated every second, add --print-to-console
  - For file output, use --logfile my_file.csv or --logfile my_file.json
    - The type will be deduced from the file name
  - If neither of these options is specified, then a warning will print, and the experiment will still run
- The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
- The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process
Removed
- The publisher and subscriber loop reserve metrics are no longer recorded or reported
Fixed
- CPU usage will no longer be stuck at 0
Removed
- The pub/sub loop reserve time metrics
1.1.2 (2022/06/08)
Changed
- Use steady_clock for all platforms, including QNX QOS
1.1.1 (2022/06/07)
Changed
- Significant refactor to simplify the analysis pipeline
Fixed
- Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled
1.1.0 (2022/06/02)
Added
- New Apex.OS Polling Subscription plugin
- Compatibility with ROS2 Humble
1.0.0 (2022/05/12)
Added
- More expressive perf_test CLI args for QOS settings
- A plugin for Cyclone DDS with C++ bindings v0.9.0b1
Changed
- CLI args for QOS settings:
  - --reliability <RELIABLE|BEST_EFFORT>
  - --durability <TRANSIENT_LOCAL|VOLATILE>
  - --history <KEEP_LAST|KEEP_ALL>
- master branch is compatible with many ROS2 distributions:
  - dashing
  - eloquent
  - foxy
  - galactic
  - rolling
Deprecated
- CLI flags for QOS settings:
  - --reliable
  - --transient
  - --keep-last
Removed
- The branches for specific ROS2 distributions have been deleted
Fixed
- CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS2 distribution
Wiki Tutorials
Package Dependencies
Deps | Name |
---|---|
rclcpp | |
ros_environment | |
ament_cmake | |
rosidl_default_generators | |
rmw_implementation | |
rosidl_default_runtime | |
ament_cmake_gtest | |
ament_lint_auto | |
ament_lint_common |
System Dependencies
Name |
---|
git |
Dependant Packages
Name | Deps |
---|---|
performance_report |
performance_test
The performance_test tool tests latency and other performance metrics of various middleware implementations that support a pub/sub pattern. It is used to simulate non-functional performance of your application.
The performance_test tool allows you to quickly set up a pub/sub configuration, e.g. number of publisher/subscribers, message size, QOS settings, middleware. The following metrics are automatically recorded when the application is running:
- latency: corresponds to the time a message takes to travel from a publisher to subscriber. The latency is measured by timestamping the sample when it’s published and subtracting the timestamp (from the sample) from the measured time when the sample arrives at the subscriber (only logged when a subscriber is created)
- CPU usage: percentage of the total system wide CPU usage (logged separately for each instance of perf_test)
- resident memory: heap allocations, shared memory segments, stack (used for system’s internal work) (logged separately for each instance of perf_test)
- sample statistics: number of samples received, sent, and lost per experiment run.
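The latency definition above can be illustrated with a minimal Python sketch. It is not the tool's implementation (perf_test is written in C++): the Sample class and both function names are hypothetical, and a monotonic clock is assumed, so the subtraction is only meaningful when publisher and subscriber run on the same machine.

```python
import time
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical message; a real sample also carries a payload."""
    sample_id: int
    publish_time_ns: int


def publish(sample_id: int) -> Sample:
    # Stamp the sample as the last step before handing it to the middleware.
    return Sample(sample_id=sample_id, publish_time_ns=time.monotonic_ns())


def latency_ns(sample: Sample) -> int:
    # Read the clock as soon as the sample arrives, then subtract the
    # timestamp that travelled inside the sample.
    receive_time_ns = time.monotonic_ns()
    return receive_time_ns - sample.publish_time_ns


sample = publish(sample_id=0)
measured = latency_ns(sample)
print(f"latency: {measured} ns")
```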
The master branch is compatible with the following ROS 2 versions:
- rolling
- jazzy
- iron
- humble
- galactic
- foxy
- eloquent
- dashing
- Apex.OS
How to use this document
- Start here for a quick example of building and running the performance_test tool with the Cyclone DDS plugin.
- If needed, find more detailed information about building and running
- Or, if the quick example is good enough, skip ahead to the list of supported middleware plugins to learn how to test a specific middleware implementation.
- Check out the tools for visualizing the results
- If desired, read about the design and architecture of the tool.
Example
This example shows how to test the non-functional performance of the following configuration:
Option | Value |
---|---|
Plugin | Cyclone DDS |
Message type | Array1k |
Publishing rate | 100Hz |
Topic name | test_topic |
Duration of the experiment | 30s |
Number of publisher(s) | 1 (default) |
Number of subscriber(s) | 1 (default) |
- Install ROS 2
- Install Cyclone DDS to /opt/cyclonedds
- Build performance_test with the CMake build flag for Cyclone DDS:
source /opt/ros/rolling/setup.bash
cd ~/perf_test_ws
colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
source ./install/setup.bash
- Run with the communication plugin option for Cyclone DDS:
mkdir experiment
./install/performance_test/lib/performance_test/perf_test --communicator CycloneDDS \
  --msg Array1k \
  --rate 100 \
  --topic test_topic \
  --max-runtime 30 \
  --logfile experiment/log.csv
At the end of the experiment, a CSV log file will be generated in the experiment folder with a name that starts with log.
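When the same experiment is repeated with different settings, it can help to assemble the command line programmatically. The sketch below is an illustration only: build_perf_test_command is a hypothetical helper, and it uses just the flags shown in the example above (with the short -c form for the plugin).

```python
import shlex


def build_perf_test_command(executable, communicator, msg, rate_hz, topic,
                            max_runtime_s, logfile):
    """Return the argument list for one perf_test run."""
    return [
        executable,
        "-c", communicator,
        "--msg", msg,
        "--rate", str(rate_hz),
        "--topic", topic,
        "--max-runtime", str(max_runtime_s),
        "--logfile", logfile,
    ]


command = build_perf_test_command(
    executable="./install/performance_test/lib/performance_test/perf_test",
    communicator="CycloneDDS",
    msg="Array1k",
    rate_hz=100,
    topic="test_topic",
    max_runtime_s=30,
    logfile="experiment/log.csv",
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the workspace has been built and sourced.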
Building the performance_test tool
For a simple example, see Dockerfile.rclcpp.
The performance_test tool is structured as a ROS 2 package, so colcon is used to build it. Therefore, you must source a ROS 2 installation:
source /opt/ros/rolling/setup.bash
Select a middleware plugin from this list. Then build the performance_test tool with the selected middleware:
mkdir -p ~/perf_test_ws/src
cd ~/perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
# At this stage, you need to choose which middleware you want to use
# The list of available flags is described in the middleware plugins section
# Replace <plugin> with the value for the selected middleware, e.g. CYCLONEDDS
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_PLUGIN=<plugin>
source install/setup.bash
Running an experiment
The performance_test experiments are run through the perf_test executable. To find the available settings, run with --help (note the required and default arguments):
~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help
- The -c argument should match the selected middleware plugin from the build phase.
- The --msg argument should be one of the supported message types, which are shown in the --help output.
Single machine or distributed system?
Based on the configuration you want to test, the usage of the performance_test tool differs. The different possibilities are explained below.
For running tests on a single machine, you can choose between the following options:
- Intraprocess means that the publisher and subscriber threads are in the same process.
perf_test <options> --num-sub-threads 1 --num-pub-threads 1
- Interprocess means that the publisher and subscriber are in different processes. To test interprocess communication, two instances of perf_test must be run, e.g.
# Start the subscriber first
perf_test <options> --num-sub-threads 1 --num-pub-threads 0 &
sleep 1 # give the subscriber time to finish initializing
perf_test <options> --num-sub-threads 0 --num-pub-threads 1
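The same subscriber-first sequence can be scripted. The following is a sketch under stated assumptions: run_interprocess is a hypothetical helper, and the demo at the bottom launches two trivial Python processes as stand-ins so that it runs without perf_test installed. In real use, each argument list would be a perf_test command with the thread counts shown above, and the subscriber needs its own stop condition (for example --max-runtime) or the wait never returns.

```python
import subprocess
import sys
import time


def run_interprocess(subscriber_cmd, publisher_cmd, startup_delay_s=1.0):
    """Start the subscriber, wait, run the publisher, return both exit codes."""
    subscriber = subprocess.Popen(subscriber_cmd)
    try:
        # Give the subscriber time to finish initializing.
        time.sleep(startup_delay_s)
        publisher_rc = subprocess.run(publisher_cmd).returncode
        subscriber_rc = subscriber.wait()
    finally:
        # Do not leave a stray subscriber behind if something went wrong.
        if subscriber.poll() is None:
            subscriber.kill()
            subscriber.wait()
    return subscriber_rc, publisher_rc


# Stand-in commands; replace with the two perf_test invocations.
stand_in = [sys.executable, "-c", "pass"]
exit_codes = run_interprocess(stand_in, stand_in, startup_delay_s=0.1)
print(exit_codes)
```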
On a distributed system, testing latency is difficult, because the clocks are probably not perfectly synchronized between the two devices. To work around this, the performance_test tool supports relay mode, which allows for a round-trip style of communication:
# On the main machine
perf_test <options> --roundtrip-mode Main
# On the relay machine:
perf_test <options> --roundtrip-mode Relay
In relay mode, the Main machine sends messages to the Relay machine, which immediately sends them back. The Main machine receives the relayed message and reports the round-trip latency. The reported latency will therefore be roughly double the latency reported in non-relay mode.
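To compare a round-trip result against a one-way measurement, a rough conversion is to halve it. The helper below is a hypothetical illustration, not part of the tool, and it assumes both directions of the path cost the same, which may not hold on an asymmetric network.

```python
def estimate_one_way_latency_ms(round_trip_ms: float) -> float:
    """Halve a round-trip latency; valid only for symmetric paths."""
    if round_trip_ms < 0:
        raise ValueError("latency cannot be negative")
    return round_trip_ms / 2.0


one_way_ms = estimate_one_way_latency_ms(2.0)
print(one_way_ms)
```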
Single machine, single thread
An intra-thread configuration is experimentally supported, in which a publisher and subscriber both operate in the same thread. The publisher writes a message, and the subscriber immediately takes it.
perf_test <options> -e INTRA_THREAD
Notes:
- This is only available when zero copy transfer is enabled
- This requires exactly one publisher and one subscriber
- This is not compatible with roundtrip mode
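The three constraints in the notes above can be checked before launching an experiment. This is a minimal sketch: check_intra_thread_config and its parameters are hypothetical, and representing "no roundtrip mode" as None is an assumption made for the example.

```python
from typing import List, Optional


def check_intra_thread_config(num_pubs: int, num_subs: int, zero_copy: bool,
                              roundtrip_mode: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the setup is allowed."""
    problems = []
    if not zero_copy:
        problems.append("INTRA_THREAD requires zero copy transfer")
    if num_pubs != 1 or num_subs != 1:
        problems.append("INTRA_THREAD requires exactly one publisher and one subscriber")
    if roundtrip_mode is not None:
        problems.append("INTRA_THREAD is not compatible with roundtrip mode")
    return problems


valid = check_intra_thread_config(num_pubs=1, num_subs=1, zero_copy=True)
invalid = check_intra_thread_config(num_pubs=2, num_subs=1, zero_copy=False,
                                    roundtrip_mode="Main")
print(valid)
print(invalid)
```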
Middleware plugins
The performance test tool can measure the performance of a variety of communication solutions from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines.
The performance_test tool implements an executor that runs the publisher(s) and/or the subscriber(s) in their own thread.
The following plugins are currently implemented:
Eclipse Cyclone DDS
- Eclipse Cyclone DDS 0.9.0b1
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS
- Communication plugin: -c CycloneDDS
- Docker file: Dockerfile.CycloneDDS
- Available transports:
  - Cyclone DDS zero copy requires RouDi to be running.
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Eclipse Cyclone DDS C++ binding
- Eclipse Cyclone DDS C++ bindings 0.9.0b1
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CYCLONEDDS_CXX
- Communication plugin: -c CycloneDDS-CXX
- Docker file: Dockerfile.CycloneDDS-CXX
- Available transports:
  - Cyclone DDS zero copy requires RouDi to be running.
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Eclipse iceoryx
- iceoryx (latest master as of Feb 13)
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ICEORYX
- Communication plugin: -c iceoryx
- Docker file: Dockerfile.iceoryx
- The iceoryx plugin is not a DDS implementation.
  - The DDS-specific options (such as domain ID, durability, and reliability) do not apply.
- To run with the iceoryx plugin, RouDi must be running.
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| LoanedSamples | LoanedSamples | Not supported by performance_test |
eProsima Fast DDS
- FastDDS 2.6.2
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=FASTDDS
- Communication plugin: -c FastRTPS
- Docker file: Dockerfile.FastDDS
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA (default), LoanedSamples (--zero-copy) | SHMEM (default), LoanedSamples (--zero-copy) | UDP |
OCI OpenDDS
- OpenDDS 3.13.2
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=OPENDDS
- Communication plugin: -c OpenDDS
- Docker file: Dockerfile.OpenDDS
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| TCP | TCP | TCP |
RTI Connext DDS
- RTI Connext DDS 5.3.1+
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDS
- Communication plugin: -c ConnextDDS
- Docker file: Not available
- A license is required
- You need to source an RTI Connext DDS environment.
  - If RTI Connext DDS was installed with ROS 2 (Linux only):
    source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
  - If RTI Connext DDS is installed separately, you can source the following script to set the environment:
    source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |
RTI Connext DDS Micro
- Connext DDS Micro 3.0.3
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=CONNEXTDDSMICRO
- Communication plugin: -c ConnextDDSMicro
- Docker file: Not available
- A license is required
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| INTRA | SHMEM | UDP |
Framework plugins
The performance_test tool can also measure the end-to-end latency of a framework. In this case, the executor of the framework is used to run the publisher(s) and/or the subscriber(s). The potential overhead of the rclcpp or rmw layer is measured.
ROS 2
The performance test tool can also measure the performance of a variety of RMW implementations, through the ROS 2 rclcpp::publisher and rclcpp::subscriber API.
- ROS 2 rclcpp::publisher and rclcpp::subscriber
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=ROS2 (default)
- Communication plugin:
  - Callback with Single Threaded Executor: -c rclcpp-single-threaded-executor
  - Callback with Static Single Threaded Executor: -c rclcpp-static-single-threaded-executor
  - rclcpp::WaitSet: -c rclcpp-waitset
- Docker file: Dockerfile.rclcpp
- Available underlying RMW implementations:
  - ROS 2 Rolling is pre-configured to use rmw_fastrtps_cpp
  - Follow these instructions to use a different RMW implementation
- Available transports: depends on underlying RMW implementation
  - LoanedSamples are available (--zero-copy) for ROS_DISTRO = foxy and above
Apex.OS
- Apex.OS
- CMake build flag: -DPERFORMANCE_TEST_PLUGIN=APEX_OS
  - It is also required to source /opt/ApexOS/setup.bash instead of a ROS 2 distribution
- Communication plugin: -c ApexOSPollingSubscription
- Docker file: Not available
- Available underlying RMW implementations: rmw_apex_middleware
- Available transports:
| Pub/sub in same process | Pub/sub in different processes on same machine | Pub/sub in different machines |
|---|---|---|
| UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP (default), SHMEM (--shared-memory), LoanedSamples (--zero-copy) | UDP |
Analyze the results
After an experiment is run with the -l flag, a log file is recorded. Both CSV and JSON formats are supported. It is possible to add custom data to the log file by setting the APEX_PERFORMANCE_TEST environment variable before running an experiment, e.g.
# JSON format
export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"
Plot the results
To plot the results in the JSON or CSV log files, see the plotter README.
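When experiments are launched from a script, the APEX_PERFORMANCE_TEST metadata described under Analyze the results can be generated with a JSON library instead of hand-escaped quotes. This is a sketch only: the keys are the illustrative ones from the shell example, and the commented-out launch line assumes perf_test is on the PATH.

```python
import json
import os

# Illustrative metadata; any keys relevant to the experiment can be used.
metadata = {
    "My Version": "1.0.4",
    "My Image Version": "5.2",
    "My OS Version": "Ubuntu 16.04",
}

# Copy the current environment and add the serialized dictionary to it.
env = dict(os.environ)
env["APEX_PERFORMANCE_TEST"] = json.dumps(metadata)

# The environment would then be handed to the experiment process, e.g.:
# subprocess.run(["perf_test", "--logfile", "experiment/log.json"], env=env)
print(env["APEX_PERFORMANCE_TEST"])
```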
Architecture
Apex.AI’s Performance Testing in ROS 2 white paper (available here) describes how to design a fair and unbiased performance test, and is the basis for this project.
Each middleware has a different API. Thanks to the Plugin abstraction, the core logic of setting up and running an experiment is completely decoupled from the implementation details of sending and receiving individual messages.
Exactly one Plugin implementation is selected at build time. The design is similar to the Abstract Factory pattern. performance_test declares, but does not define, a static factory method in the PluginFactory class. Each middleware provides a definition for this factory method to create a concrete Plugin implementation, and perf_test calls this factory method directly.
An example plugin is available here.
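The decoupling described in this section can be sketched in a few lines of Python. This is an analogy, not the tool's C++ code: only the names Plugin and PluginFactory come from the text above, while the method names, InMemoryPlugin, and run_experiment are hypothetical. In the real tool the factory body is supplied at build time; here one class is simply hard-wired.

```python
from abc import ABC, abstractmethod
from collections import deque


class Plugin(ABC):
    """Middleware-agnostic interface; the method names are illustrative."""

    @abstractmethod
    def publish(self, topic: str, sample: bytes) -> None: ...

    @abstractmethod
    def take(self, topic: str): ...


class InMemoryPlugin(Plugin):
    """Hypothetical stand-in for a real middleware plugin."""

    def __init__(self) -> None:
        self._queues = {}

    def publish(self, topic, sample):
        self._queues.setdefault(topic, deque()).append(sample)

    def take(self, topic):
        queue = self._queues.get(topic)
        return queue.popleft() if queue else None


class PluginFactory:
    @staticmethod
    def create_plugin() -> Plugin:
        # Stand-in for the definition that the selected middleware provides.
        return InMemoryPlugin()


def run_experiment(num_samples: int) -> int:
    """Core logic: depends only on the Plugin interface."""
    plugin = PluginFactory.create_plugin()
    received = 0
    for i in range(num_samples):
        plugin.publish("test_topic", i.to_bytes(4, "little"))
        if plugin.take("test_topic") is not None:
            received += 1
    return received


received = run_experiment(num_samples=10)
print(received)
```

Swapping the middleware then means changing only what create_plugin returns; run_experiment stays untouched.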
Performance optimizations
- On Linux-based platforms, perf_test writes 0 to /dev/cpu_dma_latency and holds open the file handle, which will prevent the CPU from entering any idle states for the duration of the experiment. This should result in lower message latency and lower variance in that latency.
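The write-and-hold-open technique can be sketched as a Python context manager. This is an illustration, not the tool's code: prevent_cpu_idle is a hypothetical name, the value is written as a binary 32-bit integer as the Linux PM QoS interface accepts, and the demo targets a temporary stand-in file because opening the real device normally requires root.

```python
import contextlib
import os
import struct
import tempfile


@contextlib.contextmanager
def prevent_cpu_idle(path="/dev/cpu_dma_latency", max_latency_us=0):
    """Request a CPU latency bound for as long as the file stays open."""
    with open(path, "wb", buffering=0) as handle:
        # The request is a native-endian signed 32-bit value in microseconds.
        handle.write(struct.pack("=i", max_latency_us))
        # The kernel drops the request when the handle is closed, so the
        # experiment has to run inside this block.
        yield


# Demo against a stand-in file so the sketch runs without privileges.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = os.path.join(tmp, "cpu_dma_latency")
    with prevent_cpu_idle(path=stand_in):
        pass  # run the experiment here
    with open(stand_in, "rb") as f:
        written = f.read()

print(len(written))
```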
Future extensions and limitations
- Communication frameworks like DDS have a huge number of settings. This tool only allows the most common QOS settings to be configured. The other QOS settings are hardcoded in the application.
- Only one publisher per topic is allowed, because the data verification logic does not support matching data to the different publishers.
- Some communication plugins can get stuck in their internal loops if too much data is received. Figuring out ways around such issues is one of the goals of this tool.
- The FastRTPS wait-set does not support timeouts, which can prevent the receiver from aborting. In that case the performance test must be killed manually.
- Connext DDS Micro does not support the INTRA transport with reliable QoS and the history kind set to keep_all. Always set keep-last as the QoS history kind when using reliable.
Possible additional communication plugins which could be implemented:
- Raw UDP communication
Building with limited resources
When building this tool, the compiler must perform a lot of template expansion. This can be overwhelming for a system with a low-power CPU or limited RAM. There are some additional CMake options which can reduce the system load during compilation:
- This tool includes many different message types, each with many different sizes. Reduce the number of messages, and thus the compilation load, by disabling one or more message types. For example, to build without PointCloud messages, add -DENABLE_MSGS_POINT_CLOUD=OFF to the --cmake-args. The message types, and their options for enabling/disabling, can be found here.
Changelog for package performance_test
X.Y.Z (YYYY/MM/DD)
2.3.0 (2024/09/24)
Removed
- Moved apex_performance_plotter to its own package here
2.2.0 (2024/05/15)
Added
- performance_test can be built with ROS 2 Iron and Jazzy
Changed
- Renamed the --dds-domain_id CLI arg to --dds-domain-id
- When --dds-domain-id is unspecified, fall back to the ROS_DOMAIN_ID environment variable
- --zero-copy has been separated into two flags:
  - --shared-memory: Enable shared-memory transfer in the plugin. This is meant to replace the need to manually set runtime flags via CYCLONEDDS_URI, APEX_MIDDLEWARE_SETTINGS, etc.
  - --loaned-samples: When publishing messages in the plugin, borrow loaned samples instead of publishing by copy
  - --zero-copy is now an alias for --shared-memory --loaned-samples
  - Supported plugins include:
    - -c CycloneDDS
    - -c CycloneDDS-CXX
    - -c ApexOSPollingSubscription
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
    - -c rclcpp-* with RMW_IMPLEMENTATION=rmw_fastrtps_cpp
2.1.0 (2024/04/17)
Added
- Add new function prepare() to the Publisher and Subscriber API, intended to allow participant discovery without blocking the main thread
Changed
- Change the default --history arg from KEEP_ALL to KEEP_LAST
- Change the default --history-depth arg from 1000 to 16
- If --expected-num-pubs is unspecified, set it to the same value as -p
- If --expected-num-subs is unspecified, set it to the same value as -s
Fixed
- Removed an unused variable to fix a Clang build
- Remove unused variable names in the Plugin abstract class
- Fix a potential lockup in PublisherTask on QNX
2.0.0 (2024/03/19)
Added
- Add experimental bazel support
  - bazel build //performance_test --//:plugin_implementation=//path/to/a/plugin
- Add a rudimentary socket-based plugin for testing the bazel support
  - bazel run //performance_test --//:plugin_implementation=//performance_test/plugins/demo:demo_plugin -- --help
Changed
- Instead of enabling/disabling each plugin, you select exactly one with a CMake string option, for example:
  colcon build --cmake-args -DPERFORMANCE_TEST_PLUGIN=ROS2
- Renamed the --communication CLI arg to --communicator. The short -c is unchanged.
Removed
- Removed the deprecated CLI flags for QOS settings:
  - Instead of --reliable, use --reliability RELIABLE
  - Instead of --transient, use --durability TRANSIENT_LOCAL
  - Instead of --keep-last, use --history KEEP_LAST
- Removed the obsolete BoundedSequenceFlat messages
- Removed the superfluous --msg-list CLI flag. The --help message already lists the available messages.
Fixed
- Update the Apex.OS Runner to use executor_runner::deferred instead of executor_runner::deferred_tag()
- Ensure that the first few published samples are sent at the expected rate
1.5.2 (YYYY/MM/DD)
Added
- --prevent-cpu-idle is available on QNX
Changed
- JSON log files will contain all values in the APEX_PERFORMANCE_TEST dictionary, instead of the five specific values used previously
- Switch to build as C++17 by default
Fixed
- Zero copy transfer is again enabled for the rclcpp publisher
1.5.0 (2023/06/14)
Added
- New CLI switch --prevent-cpu-idle (linux only). When specified, perf_test will use /dev/cpu_dma_latency to request that the CPU not enter any sleep states, to potentially give more consistent results
- Some smaller Array messages, down to 32 bits
- Added support to the FastDDS plugin for bounded and unbounded sequences
Changed
- Update the README to better explain how to use this tool with Apex.OS
- In the Runner, allocate the AnalysisResults on the stack instead of using shared_ptr
- Subscriber methods accept a callback parameter, instead of returning a vector of results, to reduce heap usage
- Refactored the interaction between SubscriberStats and AnalysisResult to remove the need for a std::vector of latency samples, to reduce heap usage
- Adjusted the Array message sizes to make the name match the contents
- Updated apex_os_communicator to use the new zero-copy API
1.4.2 (2023/03/15)
Added
- Added perfplot support for JSON log files
Changed
- Migrate the Apex.OS target to use rosidl_get_typesupport_target
- Preallocate the JSON logger’s string buffer to prevent reallocations after the experiment begins
1.4.1 (2023/02/23)
Changed
- Updated the iceoryx plugin to the latest master as of Feb 13
1.4.0 (2023/02/20)
Added
- New message type BoundedSequenceFlat
  - This is a BoundedSequence with the @flat annotation
  - Sizes range from 1kB to 8MB, like Array and BoundedSequence
Changed
- Messages of different types can be optionally included via CMake args:
  - -DENABLE_MSGS_ARRAY (default ON)
  - -DENABLE_MSGS_STRUCT (default ON)
  - -DENABLE_MSGS_POINT_CLOUD (default ON)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_BOUNDED_SEQUENCE_FLAT (default OFF)
  - -DENABLE_MSGS_UNBOUNDED_SEQUENCE (default OFF)
  - -DENABLE_MSGS_ALL (default OFF)
    - when ON, overrides the other defaults to ON
    - you can still optionally exclude some messages by explicitly setting them to OFF
Removed
- Removed a few messages:
  - Range
  - RadarTrack
  - RadarDetection
  - NavSatFix
Fixed
- In all cases, including loaned messages, capture the timestamp as the last step of initializing the message
1.3.7 (2023/01/04)
1.3.6 (2023/01/03)
Fixed
- Set the correct IDL_GEN_ROOT for rclcpp plugins
1.3.5 (2022/12/05)
Fixed
- Exit cleanly when a publisher process terminates before a subscriber process
1.3.4 (2022/11/28)
Changed
- Updated Apex.OS plugins to use the unified LoanedSample::data()
1.3.3 (2022/11/28)
Fixed
- Implement the missing take() method in ApexOSPollingSubscriptionSubscriber
1.3.2 (2022/11/21)
Fixed
- Capture the this pointer in the lambda in the iceoryx publisher
1.3.1 (2022/11/21)
Added
- New Apex.OS plugin, compatible with the ThreadedRunners
  - The INTER_THREAD and INTRA_THREAD execution strategies, combined with -c ApexOSPollingSubscription, will use the ThreadedRunner instances
  - The new APEX_SINGLE_EXECUTOR execution strategy will add all publishers and subscribers to a single Apex.OS Executor
  - The new APEX_EXECUTOR_PER_COMMUNICATOR execution strategy will add each publisher and each subscriber to its own Apex.OS Executor instance
  - The new APEX_CHAIN execution strategy will add a publisher and subscriber as a chain of nodes to an Apex.OS Executor
Changed
- Refactored FastRTPS communicator plugin:
  - Uses DDS compliant API
  - Code generator updated
  - Implementation for publish_loaned()
- Dockerfile improvements
Removed
- CLI arg --disable-async. Synchronous / asynchronous publishing should be configured externally depending on the communication means used.
1.3.0 (2022/08/25)
Added
- New execution strategy option:
  - The default -e INTER_THREAD runs each publisher and subscriber in its own separate thread, which matches the previous behavior
  - A new -e INTRA_THREAD, which runs a single publisher and subscriber in the same thread. The publisher writes, and the subscriber immediately takes it
  - For Apex.OS specifically, some optimized execution strategies which use the proprietary Apex.OS executor
Changed
- Significantly refactored the communicator plugins:
  - Each plugin is split into an implementation of a Publisher and a Subscriber, instead of a single Communicator
  - The plugin is no longer responsible for managing the metrics, such as sample count, lost samples, and latency
  - The plugin does not require any special logic to support roundtrip mode
  - It is safe for the plugins to initialize their data writers and readers at construction time, instead of delaying the initialization to the first call of publish() or update_subscription()
  - Split publish() into publish_copy() and publish_loaned()
- Significantly refactored the runner framework:
  - The runner framework is responsible for the experiment metrics
  - It manages the roundtrip mode logic
  - It is extensible for different execution strategies or thread configurations
- The iceoryx plugin now uses the untyped API, for improved performance
1.2.1 (2022/06/30)
Fixed
- Capture the timestamp as soon as a message is received, instead of just before storing the metrics, to reduce the reported latency to a more correct value
1.2.0 (2022/06/28)
Changed
- The CLI arguments for specifying the output type have changed:
  - For console output, updated every second, add --print-to-console
  - For file output, use --logfile my_file.csv or --logfile my_file.json
    - The type will be deduced from the file name
  - If neither of these options is specified, then a warning will print, and the experiment will still run
- The linter configurations are now configured locally. This means that the output of colcon test should be the same no matter the installed ROS distribution.
- The --zero-copy arg is now valid even if the publisher and subscriber(s) are in the same process
Removed
- The publisher and subscriber loop reserve metrics are no longer recorded or reported
Fixed
- CPU usage will no longer be stuck at 0
Removed
- The pub/sub loop reserve time metrics
1.1.2 (2022/06/08)
Changed
- Use steady_clock for all platforms, including QNX QOS
1.1.1 (2022/06/07)
Changed
- Significant refactor to simplify the analysis pipeline
Fixed
- Add some missing definitions when Apex.OS is enabled, but the rclcpp plugins are disabled
1.1.0 (2022/06/02)
Added
- New Apex.OS Polling Subscription plugin
- Compatibility with ROS2 Humble
1.0.0 (2022/05/12)
Added
- More expressive perf_test CLI args for QOS settings
- A plugin for Cyclone DDS with C++ bindings v0.9.0b1
Changed
- CLI args for QOS settings:
  - --reliability <RELIABLE|BEST_EFFORT>
  - --durability <TRANSIENT_LOCAL|VOLATILE>
  - --history <KEEP_LAST|KEEP_ALL>
- master branch is compatible with many ROS2 distributions:
  - dashing
  - eloquent
  - foxy
  - galactic
  - rolling
Deprecated
- CLI flags for QOS settings:
  - --reliable
  - --transient
  - --keep-last
Removed
- The branches for specific ROS2 distributions have been deleted
Fixed
- CI jobs and Dockerfiles are decoupled from the middleware bundled with the ROS2 distribution