![]() |
digestible package from digestible repodigestible |
Package Summary
Tags | No category tags. |
Version | 0.1.0 |
License | Apache License 2.0 |
Build type | AMENT_CMAKE |
Use | RECOMMENDED |
Repository Summary
Description | A modern C++ implementation of a merging t-digest data structure. |
Checkout URI | https://github.com/tier4/digestible.git |
VCS Type | git |
VCS Version | main |
Last Updated | 2025-06-13 |
Dev Status | MAINTAINED |
CI status | No Continuous Integration |
Released | RELEASED |
Tags | No category tags. |
Contributing |
Help Wanted (0)
Good First Issues (0) Pull Requests to Review (0) |
Package Description
Additional Links
Maintainers
- Temkei Kem
Authors
digestible
A modern C++ implementation of a merging t-digest
data structure.
t-digests
are data structures that provide highly accurate statistics for a large number of inputs in a small memory footprint. Things like minimum, maximum, average, quantile, and continuous distribution are readily calculated. Think improved histogram: compact way of accurately summarizing data where the bucket sizes need not be set in advance.
Notable features include:
- No runtime memory allocation.
- Strict, user-defined upper bound on structure size.
- Written in modern C++ (c++17).
- Templated
mean
andweight
data types - save memory for smaller input ranges with a smaller data type. - Speed (average 61ns insertion time per element with a c. 2019 Intel i7)
Requirements:
- A C++17 compatible compiler (developed with
clang 7
using--std=c++17
). -
CMake
(for building utility and running tests).
Sample Code
#include "digestible/digestible.h"
using namespace std;
using namespace digestible;
// Structure will have at most 20 elements.
// Also default to values of type float and weights of type unsigned.
tdigest digest(20);
// Add some data into the structure.
digest.insert(5);
digest.insert(10);
digest.insert(4);
// digestible buffers input data; explicitly merge it before querying statistics.
digest.merge();
printf("50th quantile is: %f\n", digest.quantile(50));
Additional examples can be found in the tests
directory.
digestible
Utility
To assist in integrating digestible
in your implementation, please see utility
directory.
Compression Factor
As the lone parameter to the digestible
constructor, compression factor requires some discussion. Internally t-digests
use the compression factor to distribute values throughout the input range. When combined with an internal distribution function a t-digest
will distribute data such that the head and tail have fewer points and the middle has significantly more. Higher compression factors will yield better accuracy at the expense of increased memory usage. The digestible utility
can be used to help strike this balance.
Data Structure Size
For a given compression factor f
, value type size v
, and weight type size w
, the approximate t-digest
size in bytes is:
f * max(v, w) * 8 + 64
Acknowledgments
This work would not be possible without the original algorithm and reference implementation by tdunning. A local copy of the original t-digest
paper can be found here.