Repository Summary
| | |
|---|---|
| Description | |
| Checkout URI | https://github.com/aws-samples/amazon-eks-autonomous-driving-data-service.git |
| VCS Type | git |
| VCS Version | master |
| Last Updated | 2025-06-07 |
| Dev Status | UNKNOWN |
| Released | UNRELEASED |
| Tags | No category tags. |
| Contributing | Help Wanted (-), Good First Issues (-), Pull Requests to Review (-) |
Packages
| Name | Version |
|---|---|
| a2d2_msgs | 1.0.0 |
| fordav_msgs | 1.0.0 |
| fordav_pointcloud | 1.0.0 |
README
Autonomous Driving Data Service (ADDS)
Overview
ADDS is a ROS-based data service for replaying selected drive scenes from multimodal driving datasets. The multimodal dataset used with ADDS is typically gathered during the development of advanced driver assistance systems (ADAS) or autonomous driving systems (ADS), and comprises 2D image, 3D point cloud, and vehicle bus data.
One common reason for replaying drive data is to visualize it. ADDS supports visualization of replayed data using any ROS visualization tool, for example Foxglove Studio.
ADDS is supported on ROS 1 Noetic and ROS 2 Humble. ADDS is pre-configured to use the Audi Autonomous Driving Dataset (A2D2) and the Ford Multi-AV Seasonal Dataset, and can be extended to other autonomous driving datasets.
In the following sections, we describe the ADDS logical dataset design and runtime data services. This is followed by a step-by-step tutorial for building and using ADDS with the A2D2 dataset. Finally, we discuss extending ADDS to other datasets.
Logical dataset design
Any multimodal dataset used with ADDS is assumed to contain drive data gathered from a homogeneous vehicle fleet. By homogeneous, we mean that all the vehicles in the fleet have the same vehicle sensor array configuration and vehicle bus data attributes. Each vehicle in the fleet can have distinct calibration data for the sensor array configuration.
Each ADDS runtime instance serves one multimodal dataset. To serve multiple datasets, you need a corresponding number of ADDS runtime instances.
Each multimodal dataset comprises multimodal frame data and tabular data.
Frame data
The serialized multimodal data acquired in the vehicle in some file format, for example MDF 4, MCAP, or ROS bag, must be decomposed into discrete timestamped 2D image and 3D point cloud data frames, and the frames must be stored in an Amazon S3 bucket under some bucket prefix.
The workflow for decomposing the serialized data and storing it in S3 is not prescribed by ADDS. For many public datasets, for example the A2D2: Audi Autonomous Driving Dataset and the Ford Multi-AV Seasonal Dataset, the serialized data has already been decomposed into discrete data frames. However, for these types of public datasets, one may still need to extract the discrete data frames from compressed archives (e.g., Zip or Tar files) and upload them to the ADDS S3 bucket.
For the discrete data frames stored in the S3 bucket, we need a fast retrieval mechanism so we can replay the data frames on demand. For that purpose, we build a data frame manifest and store it in an Amazon Redshift database table: this is described in detail in drive data. The manifest contains pointers to the data frames stored in the S3 bucket. For the A2D2 dataset, the extraction and loading of the drive data is done automatically during the step-by-step tutorial.
The vehicle bus data is stored in an Amazon Redshift table and is described in vehicle bus data.
Tabular data
Each logical multimodal dataset must use a distinct Amazon Redshift named schema. Below, we describe the data definition language (DDL) for creating the required Amazon Redshift tables within a given logical dataset's schema_name.
For the A2D2 dataset, we use a2d2 as the Redshift schema name, and all the required tables are created automatically during the step-by-step tutorial.
Amazon Redshift defines, but does not enforce, primary and foreign key constraints. This is especially important to understand so you can avoid duplicating data in Redshift tables when you extend ADDS to other datasets.
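Because Redshift will not reject duplicate rows, it is worth verifying a load explicitly. A hedged sketch of such a duplicate check against the drive data manifest table, assuming the a2d2 schema used in the tutorial:

```sql
-- Find manifest rows loaded more than once; an empty result means no
-- duplicates. Redshift itself does not enforce the primary key.
SELECT vehicle_id, scene_id, sensor_id, data_ts, COUNT(*) AS copies
FROM a2d2.drive_data
GROUP BY vehicle_id, scene_id, sensor_id, data_ts
HAVING COUNT(*) > 1;
```

A check like this can be run after each bulk load, before serving the dataset.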
Vehicle data
Vehicle data is stored in the schema_name.vehicle table. The DDL for the table is shown below:
CREATE TABLE IF NOT EXISTS schema_name.vehicle
(
vehicleid VARCHAR(255) NOT NULL ENCODE lzo,
description VARCHAR(255) ENCODE lzo,
PRIMARY KEY (vehicleid)
)
DISTSTYLE ALL;
The vehicleid refers to the required unique vehicle identifier. The description is optional.
For the A2D2 dataset, the vehicle data is automatically loaded into the a2d2.vehicle table from vehicle.csv during the step-by-step tutorial.
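For illustration only (the tutorial populates this table from vehicle.csv automatically), a row for a hypothetical vehicle might look like:

```sql
-- Hypothetical example row; the vehicle id and description are made up.
INSERT INTO a2d2.vehicle (vehicleid, description)
VALUES ('v001', 'Test vehicle 1');
```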
Sensor data
Sensor data is stored in the schema_name.sensor table. The DDL for the table is shown below:
CREATE TABLE IF NOT EXISTS schema_name.sensor
(
sensorid VARCHAR(255) NOT NULL ENCODE lzo,
description VARCHAR(255) ENCODE lzo,
PRIMARY KEY (sensorid)
)
DISTSTYLE ALL;
Each sensorid must be unique in a dataset and must refer to a sensor in the homogeneous vehicle sensor array configuration. The description is optional.
The implicit sensorid value Bus is reserved and denotes the vehicle bus: this implicit value is not stored in the schema_name.sensor table.
For the A2D2 dataset, the sensor data is automatically loaded into the a2d2.sensor table from sensors.csv during the step-by-step tutorial.
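For illustration only (the tutorial populates this table from sensors.csv automatically), sensor registration might look like the following; the sensor ids here are hypothetical, not the actual A2D2 values:

```sql
-- Hypothetical example rows; real A2D2 rows come from sensors.csv.
INSERT INTO a2d2.sensor (sensorid, description)
VALUES
  ('camera_front_center', 'Front-center camera'),
  ('lidar_front_center', 'Front-center lidar');
-- Note: the reserved implicit sensorid 'Bus' is never inserted here.
```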
Drive data
Image and point cloud data frames must be stored in an Amazon S3 bucket. Pointers to the S3 data frames must be stored in the schema_name.drive_data table. The DDL for the table is shown below:
CREATE TABLE IF NOT EXISTS schema_name.drive_data
(
vehicle_id VARCHAR(255) NOT NULL ENCODE text255,
scene_id VARCHAR(255) NOT NULL ENCODE text255,
sensor_id VARCHAR(255) NOT NULL ENCODE text255,
data_ts BIGINT NOT NULL SORTKEY,
s3_bucket VARCHAR(255) NOT NULL ENCODE lzo,
s3_key VARCHAR(255) NOT NULL ENCODE lzo,
PRIMARY KEY (vehicle_id, scene_id, sensor_id, data_ts),
FOREIGN KEY (vehicle_id) REFERENCES schema_name.vehicle(vehicleid),
FOREIGN KEY (sensor_id) REFERENCES schema_name.sensor(sensorid)
)
DISTSTYLE AUTO;
The scene_id is an arbitrary identifier for a unique drive scene.
The data_ts is the acquisition timestamp for a discrete data frame and is typically measured in International Atomic Time (TAI).
The s3_bucket is the name of the Amazon S3 bucket, and the s3_key is the key for the data frame stored in the S3 bucket.
For the A2D2 dataset, the drive data is automatically loaded into the a2d2.drive_data table during the step-by-step tutorial.
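To illustrate how the manifest supports on-demand replay, a sketch of a frame-retrieval query follows; the scene id, sensor id, and timestamp range are hypothetical placeholders, not actual A2D2 values:

```sql
-- Fetch S3 pointers for one scene and sensor, in acquisition order,
-- so the data service can stream the frames back for replay.
SELECT s3_bucket, s3_key, data_ts
FROM a2d2.drive_data
WHERE vehicle_id = 'v001'                 -- hypothetical vehicle id
  AND scene_id   = 'scene_0001'           -- hypothetical scene id
  AND sensor_id  = 'camera_front_center'  -- hypothetical sensor id
ORDER BY data_ts;
```

Because data_ts is the table's sort key, range and ordering queries like this one scan efficiently, which is what makes on-demand replay fast.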