Opening a parent issue to track Metrics SDK improvements for Stable release.
Background
The primary function of the Metrics SDK is to accept a measurement, that is, a number of type T along with a slice of KeyValue pairs (T, &[KeyValue]), aggregate these measurements in memory, and export the aggregated values to Readers/Exporters as needed. Our main goals are correctness, thread-safety, memory-efficiency, and high performance, particularly on the "hot path" where measurements are reported. Correctness, thread-safety, and memory-efficiency require extensive testing via unit tests and stress tests.
Performance issues
1. Cloning and allocation on the hot path - A significant portion of the overhead comes from cloning the incoming slice to prepare the AttributeSet. We can avoid the allocation in many cases by using a thread-local Vec, but that still requires a copy. The copy can also be avoided by carefully designing temporary data structures that hold references only.
2. Sorting of the keys - This is an "identity" requirement, so it cannot be avoided entirely. However, it is possible to avoid it in the common case by storing both the sorted and the original orders for quick lookups.
3. De-duplication of attributes with the same key - This is another "identity" requirement, but the same idea as above can be used to avoid it in the common case.
4. Contention - The use of a Mutex around the HashMap for aggregations leads to heavy contention. Replacing it with a RwLock and applying interior mutability could lessen this issue, though sharding may be necessary for further scalability improvements, as demonstrated here.
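The ideas above on sorting, de-duplication, and contention can be sketched together as follows. This is a minimal illustration under stated assumptions, not the SDK's or the prototype's actual code: `Attr`, `canonicalize`, and `SumAggregator` are hypothetical names, attributes are simplified to `(&'static str, &'static str)` tuples instead of `KeyValue`, only `u64` sums are handled, and "last value wins" for duplicate keys is an assumed policy.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the SDK's KeyValue type.
type Attr = (&'static str, &'static str);

/// Builds the identity of an attribute set: sorted by key, duplicate keys
/// removed. Assumption: the last value given for a key wins.
fn canonicalize(attrs: &[Attr]) -> Vec<Attr> {
    let mut v = attrs.to_vec();
    v.reverse(); // put later duplicates first ...
    v.sort_by_key(|(k, _)| *k); // ... the stable sort preserves that order ...
    v.dedup_by_key(|(k, _)| *k); // ... and dedup keeps the first of each run
    v
}

#[derive(Default)]
struct SumAggregator {
    // Both the canonical order and every caller-supplied order seen so far
    // map to the same shared tracker.
    trackers: RwLock<HashMap<Vec<Attr>, Arc<AtomicU64>>>,
}

impl SumAggregator {
    fn add(&self, value: u64, attrs: &[Attr]) {
        // Fast path: shared read lock, lookup by the caller's order.
        // No sort, no de-duplication, no allocation; the update itself is
        // an atomic add (interior mutability), so no write lock is needed.
        if let Some(tracker) = self.trackers.read().unwrap().get(attrs) {
            tracker.fetch_add(value, Ordering::Relaxed);
            return;
        }

        // Slow path: first time this exact ordering is seen.
        let canonical = canonicalize(attrs);
        let mut map = self.trackers.write().unwrap();
        let tracker = map.entry(canonical).or_default().clone();
        // Remember the caller's order so the next call takes the fast path.
        map.entry(attrs.to_vec()).or_insert_with(|| tracker.clone());
        tracker.fetch_add(value, Ordering::Relaxed);
    }

    /// Reads the current sum for an attribute set, in any order.
    fn value(&self, attrs: &[Attr]) -> Option<u64> {
        let canonical = canonicalize(attrs);
        let map = self.trackers.read().unwrap();
        map.get(canonical.as_slice())
            .map(|tracker| tracker.load(Ordering::Relaxed))
    }
}

fn main() {
    let agg = SumAggregator::default();
    agg.add(1, &[("method", "GET"), ("status", "200")]);
    agg.add(2, &[("status", "200"), ("method", "GET")]); // new order: slow path once
    agg.add(3, &[("status", "200"), ("method", "GET")]); // same order again: fast path
    assert_eq!(agg.value(&[("method", "GET"), ("status", "200")]), Some(6));
}
```

The sketch trades memory for speed: each distinct ordering a caller uses adds one extra map entry. It does not show sharding or the thread-local Vec idea, and an exporter built on it would need to skip the non-canonical keys (or de-duplicate trackers by pointer) when collecting.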
Some issues, like calculating the hash outside of the lock and special-casing 0-attributes, were addressed already. Also, a lot of ideas were discussed in the past (community meetings, PRs, issues). I have attempted prototyping several of them here: https://github.com/cijothomas/metrics-mini/tree/main/metrics/src. Many of the issues from 1, 2, 3, and part of 4 have been addressed in the prototype, giving huge performance improvements. I plan to incorporate them into this repo soon.
It is unlikely that we will fix all performance issues for 1.0, but the goal is to ensure that the fixes can continue after 1.0 without any breaking changes. This requires trimming off unnecessary public APIs and avoiding exposing any internals to readers/exporters.
Correctness issues:
Lack of adequate testing - The existing test suite does not sufficiently confirm the accuracy of aggregations. Although a few tests have been introduced to demonstrate known issues (see this and this), much more thorough testing is required.
There are virtually no tests in a multi-threaded setup. While the Rust compiler protects against some issues (such as data races), it cannot ensure correctness of the aggregated results in any way; that requires carefully orchestrated tests.
Stress tests should also be leveraged to verify memory efficiency.
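As an illustration of the kind of multi-threaded test meant here, the sketch below hammers an aggregator from several threads and checks the collected total. `Counter` is a hypothetical, deliberately simple Mutex-based aggregator that exists only to give the test something to exercise; a real test would drive the SDK's instruments and a test exporter instead.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-in for the SDK's attribute type.
type Attr = (&'static str, &'static str);

// Hypothetical minimal aggregator, not the SDK's implementation.
#[derive(Default)]
struct Counter {
    sums: Mutex<HashMap<Vec<Attr>, u64>>,
}

impl Counter {
    fn add(&self, value: u64, attrs: &[Attr]) {
        let mut key = attrs.to_vec();
        key.sort(); // attribute order must not affect identity
        *self.sums.lock().unwrap().entry(key).or_insert(0) += value;
    }

    fn collect(&self) -> HashMap<Vec<Attr>, u64> {
        self.sums.lock().unwrap().clone()
    }
}

/// Reports `per_thread` measurements of value 1 from each of `threads`
/// threads, alternating attribute order, and returns the collected sums.
fn run_concurrent(threads: usize, per_thread: u64) -> HashMap<Vec<Attr>, u64> {
    let counter = Counter::default();
    // Scoped threads may borrow `counter`; all are joined when the scope ends.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for i in 0..per_thread {
                    if i % 2 == 0 {
                        counter.add(1, &[("k1", "v1"), ("k2", "v2")]);
                    } else {
                        counter.add(1, &[("k2", "v2"), ("k1", "v1")]);
                    }
                }
            });
        }
    });
    counter.collect()
}

fn main() {
    let sums = run_concurrent(8, 10_000);
    // Exactly one stream, and no measurement lost or double-counted.
    assert_eq!(sums.len(), 1);
    assert_eq!(sums[&vec![("k1", "v1"), ("k2", "v2")]], 80_000);
}
```

The same harness shape extends to stress tests for memory: run it with many distinct attribute sets and assert on the number of tracked streams rather than on the totals.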
Most of the correctness issues can be fixed via better test coverage. One thing to note is that the "Views" feature expands the testing matrix significantly, because it can alter aggregations/attributes and produce multiple metric streams from a single measurement. This is the main reason to remove "Views" from the scope of the first stable release.