Releases: streamingfast/substreams
v1.6.0
Compatibility
Note: As of graph-node release v0.35.0, substreams that use "index modules" are not yet supported and cannot be used for Substreams-powered-Subgraphs.
Upgrading
Note: Upgrading to v1.6.0 will require changing the tier1 and tier2 versions concurrently, as the internal protocol has changed.
Highlights
Index Modules and Block Filter
- Index Modules and Block Filter can now be used to speed up processing and reduce the amount of parsed data.
- When indexes are used along with the `BlockFilter` attribute on a mapper, blocks can be skipped completely: they will not be run in downstream modules or sent in the output stream, except in the live segment or in dev-mode, where an empty 'clock' is still sent.
- See https://github.com/streamingfast/substreams-foundational-modules for an example implementation.
- Blocks that are skipped will still appear in the metering as "read bytes" (unless a full segment is skipped), but the index stores themselves are not "metered"
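As a rough illustration of the skipping behaviour described above, the following Python sketch models an index as a set of keys per block and a block filter as a set of wanted keys. It is a conceptual model only: `run_with_filter`, `block_index`, `wanted_keys` and the example keys are hypothetical names and do not correspond to the Substreams API or manifest syntax.

```python
from typing import Dict, List, Set


def run_with_filter(
    block_index: Dict[int, Set[str]],
    wanted_keys: Set[str],
    dev_mode: bool = False,
) -> List[dict]:
    """Return the messages a consumer would see, in block order.

    Blocks whose index shares no key with `wanted_keys` are skipped.
    In dev mode an empty 'clock' message is still emitted for them,
    mirroring the note above (the live-segment case is not modeled).
    """
    out = []
    for number in sorted(block_index):
        if block_index[number] & wanted_keys:
            out.append({"block": number, "kind": "data"})
        elif dev_mode:
            out.append({"block": number, "kind": "clock"})
    return out


# Hypothetical index contents for three blocks.
index = {
    100: {"evt:transfer"},
    101: {"evt:mint"},
    102: {"evt:transfer", "evt:burn"},
}
prod = run_with_filter(index, {"evt:transfer"})
dev = run_with_filter(index, {"evt:transfer"}, dev_mode=True)
```

In this model block 101 never reaches downstream modules in production mode, while dev mode still reports a clock for it.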
Scheduling / speed improvements
- The scheduler no longer duplicates work in the first segments of a request with multiple stages.
- Fix all issues with running a substreams where modules have different "initial blocks"
- Maximum Tier1 output speed improved for data that is already processed
- Tier1 'FileWalker' now polls more aggressively on local filesystem to prevent extra seconds of wait time.
Fixed
- Fix a bug in the `gui` that would crash when trying to restart the stream.
- Fix total read bytes in case data is already cached
- Fixed issues when processing modules with different initialBlocks
Added
- New environment variable `SUBSTREAMS_WORKERS_RAMPUP_TIME` can specify the initial delay before tier1 will reach the number of tier2 concurrent requests.
- Add 'clock' output to `substreams run` command, useful mostly for performance testing or pre-caching
- (alpha) Introduce the `wasip1/tinygo-v1` binary type.
Changed / Removed
- Disabled the `otelcol://` tracing protocol; its mere presence affected performance.
- Previous value for `SUBSTREAMS_WORKERS_RAMPUP_TIME` was `4s`, now set to `0`, disabling the mechanism by default.
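To make the effect of `SUBSTREAMS_WORKERS_RAMPUP_TIME` concrete, here is a minimal Python sketch of a worker ramp-up. It assumes a linear ramp that starts at one worker, which these notes do not specify; `allowed_workers` and its parameters are hypothetical names, not part of Substreams.

```python
def allowed_workers(elapsed_s: float, rampup_s: float, max_workers: int) -> int:
    """Concurrent tier2 requests allowed `elapsed_s` seconds into a request.

    Assumption: the limit grows linearly from 1 to `max_workers` over
    `rampup_s` seconds. A ramp-up time of 0 disables the mechanism, so the
    full worker count is available immediately.
    """
    if rampup_s <= 0 or elapsed_s >= rampup_s:
        return max_workers
    ramped = int(max_workers * elapsed_s / rampup_s)
    return max(1, ramped)
```

Under this model, the old `4s` setting with 10 workers allows 5 workers after two seconds, whereas the new default of `0` allows all 10 from the start.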
v1.5.6
Fixes
- Fix bug where substreams tier2 would sometimes write outputs with the wrong tag (leaked from another tier1 request)
Remove
- Removed MaxWasmFuel since it is not supported in Wazero
v1.5.5
Fixes
- bump wazero to fix issue with certain substreams causing the server process to freeze
Add
- add `substreams_tier1_worker_retry_counter` metric to count all worker errors returned by tier2
- add `substreams_tier1_worker_rejected_overloaded_counter` metric to count only worker errors with string "service currently overloaded"
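The relationship between the two counters can be sketched as follows. This is an illustrative Python model, not the server's implementation: the substring check, the counter keys and the sample error messages are assumptions based only on the descriptions above.

```python
from collections import Counter

OVERLOADED = "service currently overloaded"


def record_worker_error(counters: Counter, message: str) -> None:
    """Count every worker error, and separately those reporting overload."""
    counters["worker_retry"] += 1
    if OVERLOADED in message:
        counters["worker_rejected_overloaded"] += 1


counters = Counter()
# Hypothetical error messages returned by tier2.
for msg in [
    "connection reset",
    "rpc error: service currently overloaded",
    "deadline exceeded",
]:
    record_worker_error(counters, msg)
```

The second counter is always a subset of the first, so comparing the two indicates how many retries are caused by overloaded tier2 nodes rather than other failures.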
v1.5.4
Fixes
- fix a possible panic() when a request is interrupted during the file loading phase of a squashing operation.
- fix a rare possibility of stalling if only some fullkv stores caches were deleted, but further segments were still present.
- fix stats counters for store operations time
v1.5.3
Performance, memory leak and bug fixes
Server
- fix memory leak on substreams execution (by bumping wazero dependency)
- prevent substreams-tier1 stopping if blocktype auto-detection times out
- allow specifying blocktype directly in Tier1 config to skip auto-detection
- fix missing error handling when writing output data to files. This could result in tier1 request just "hanging" waiting for the file never produced by tier2.
- fix handling of dstore error in tier1 'execout walker' causing stalling issues on S3 or on unexpected storage errors
- increase number of retries on storage when writing states or execouts (5 -> 10)
- prevent slow squashing when loading each segment from full KV store (can happen when a stage contains multiple stores)
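The storage retry change listed above (5 -> 10) can be pictured with a generic retry helper. This is a sketch under assumptions: `with_retries` is a hypothetical name, `OSError` stands in for whatever error the storage layer raises, and whether the real count means retries or total attempts is not stated in these notes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T], attempts: int = 10, delay_s: float = 0.0) -> T:
    """Call `op` until it succeeds or `attempts` tries have failed."""
    last_error = None
    for _ in range(attempts):
        try:
            return op()
        except OSError as err:  # stand-in for a storage error
            last_error = err
            time.sleep(delay_s)
    raise RuntimeError(f"storage write failed after {attempts} attempts") from last_error
```

A write that fails transiently seven times would be abandoned with a limit of 5 but succeed with a limit of 10.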
Gui
- prevent 'gui' command from crashing on 'incomplete' spkgs without moduledocs (when using --skip-package-validation)
v1.5.2
- Fix a context leak causing tier1 responses to slow down progressively
v1.5.1
- Fix a panic on tier2 when not using any wasm extension.
- Fix a thread leak on metering GRPC emitter
- Rollback scheduler optimisation: different stages can run concurrently if they are schedulable. This will prevent taking much time to execute when restarting close to HEAD.
- Add `substreams_tier2_active_requests` and `substreams_tier2_request_counter` prometheus metrics
- Fix the `tools tier2call` method to make it work with the new 'generic' tier2 (added necessary flags)
v1.5.0
Operators
- A single substreams-tier2 instance can now serve requests for multiple chains or networks. All network-specific parameters are now passed from Tier1 to Tier2 in the internal ProcessRange request.
Important
Since the `tier2` services will now get the network information from the `tier1` request, you must make sure that the file paths and network addresses will be the same for both tiers.
Tip
The cached 'partial' files no longer contain the "trace ID" in their filename, preventing accumulation of "unsquashed" partial store files. The system will delete files under '{modulehash}/state' named in this format `{blocknumber}-{blocknumber}.{hexadecimal}.partial.zst` when it runs into them.
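Operators who want to spot such leftover files themselves could match the format above with a regular expression. The following Python sketch is an assumption-laden illustration: it treats block numbers as plain decimal digits and the hexadecimal part as case-insensitive, and `is_legacy_partial` is a hypothetical helper, not a Substreams tool.

```python
import re

# {blocknumber}-{blocknumber}.{hexadecimal}.partial.zst
LEGACY_PARTIAL = re.compile(r"^\d+-\d+\.[0-9a-fA-F]+\.partial\.zst$")


def is_legacy_partial(filename: str) -> bool:
    """True if `filename` matches the old partial-file naming format."""
    return LEGACY_PARTIAL.match(filename) is not None
```

For example, a hypothetical name such as `0000001000-0000002000.abc123def.partial.zst` matches, while a name without the hexadecimal component does not.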
v1.4.0
Client
- Implement a `use` feature, enabling a module to use an existing module by overriding its inputs or initial block. (The overriding inputs should have the same output types as the inputs of the module being overridden.) See the substreams-db-graph-converter repository for an example of this feature.
- Fix panic when using '--header (-H)' flag on `gui` command
- When packing substreams, pick up docs from the README.md or README in the same directory as the manifest, when top-level package.doc is empty
- Added "Total read bytes" summary at the end of 'substreams run' command
Server performance in "production-mode"
Some redundant reprocessing has been removed, along with better use of caches to avoid reading the blocks multiple times when possible. Concurrent requests may benefit from each other's work to a certain extent (up to 75%).
- All module outputs are now cached. (previously, only the last module was cached, along with the "store snapshots", to allow parallel processing). (this will increase disk usage, there is no automatic removal of old module caches)
- Tier2 will now read back mapper outputs (if they exist) to prevent running them again. Additionally, it will not read back the full blocks if its inputs can be satisfied from existing cached mapper outputs.
- Tier2 will skip processing completely if it's processing the last stage and the `output_module` is a mapper that has already been processed (ex: when multiple requests are indexing the same data at the same time)
- Tier2 will skip processing completely if it's processing a stage that is not the last, but all the stores and outputs have been processed and cached.
- The "partial" store outputs no longer contain the trace ID in the filename, allowing them to be reused. If many requests point to the same modules being squashed, the squasher will detect if another Tier1 has squashed its file and reload the store from the produced full KV.
- Scheduler modification: a stage now waits for the previous stage to have completed the same segment before running, to take advantage of the cached intermediate layers.
- Improved file listing performance for Google Storage backends by 25%!
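The scheduler rule in the list above (a stage waits for the previous stage to complete the same segment) can be expressed as a small predicate. This Python sketch models only that one rule; `is_schedulable` and the `(stage, segment)` job representation are hypothetical, and the real scheduler applies further constraints that are left out here.

```python
from typing import Set, Tuple

Job = Tuple[int, int]  # (stage, segment)


def is_schedulable(stage: int, segment: int, completed: Set[Job]) -> bool:
    """A job may run once the previous stage has finished the same segment.

    Stage 0 has no previous stage, so its segments are always eligible.
    Jobs that are already completed are never scheduled again.
    """
    if (stage, segment) in completed:
        return False
    return stage == 0 or (stage - 1, segment) in completed
```

With only `(0, 0)` completed, stage 1 may start segment 0 but not segment 1, so it can reuse the cached outputs that stage 0 produced for that segment.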
Operator concerns
- Tier2 service now supports a maximum concurrent requests limit. Default set to 0 (unlimited).
- Readiness metric for Substreams tier1 app is now named `substreams_tier1` (was mistakenly called `firehose` before).
- Added back readiness metric for Substreams tier2 app (named `substreams_tier2`).
- Added metric `substreams_tier1_active_worker_requests` which gives the number of active Substreams worker requests a tier1 app is currently doing against tier2 nodes.
- Added metric `substreams_tier1_worker_request_counter` which gives the total Substreams worker requests a tier1 app made against tier2 nodes.
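The maximum concurrent requests limit mentioned in the list above, with 0 meaning unlimited, can be sketched as a simple counter guarded by a lock. This is an illustrative Python model: `RequestLimiter` is a hypothetical class, and rejecting (rather than queueing) requests over the limit is an assumption not confirmed by these notes.

```python
import threading


class RequestLimiter:
    """Refuse requests beyond `max_concurrent`; 0 means unlimited."""

    def __init__(self, max_concurrent: int = 0) -> None:
        self.max_concurrent = max_concurrent
        self.active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot, returning False if the limit is reached."""
        with self._lock:
            if self.max_concurrent and self.active >= self.max_concurrent:
                return False
            self.active += 1
            return True

    def release(self) -> None:
        """Free a slot previously reserved with `try_acquire`."""
        with self._lock:
            self.active -= 1
```

A caller would wrap each request in `try_acquire()` and `release()`, returning an error to the client when `try_acquire()` yields `False`.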