mabel.data.writers.stream_writer
joocer edited this page Jun 4, 2021 · 4 revisions
Create a Data Writer to write data records into partitions.
Parameters:
dataset - string (optional)
The name of the dataset; this is used to map to a path.
schema - mabel.validator.Schema (optional)
The schema used to test records for conformity. The default is no schema, and therefore no validation.
format - string (optional)
The format used to write data:
- jsonl: raw JSON lines
- lzma: LZMA-compressed JSON lines
- zstd: Zstandard-compressed JSON lines (default)
- parquet: Apache Parquet
idle_timeout_seconds - integer (optional)
The number of seconds a writer can be inactive before it is evicted from the pool. The default is 30 seconds.
writer_pool_capacity - integer (optional)
The number of writers to keep in the writer pool before writers are evicted for being over capacity. The default is 5.
blob_size - integer (optional)
The maximum size of a blob. The default is 32MB.
inner_writer - BaseWriter (optional)
The component used to commit data. The default is the NullWriter.
Different inner_writers may take or require additional parameters.
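To illustrate how `format` and `blob_size` interact, here is a minimal, stdlib-only sketch. It is not mabel's implementation: the class `BlobBuffer`, its `commit` method and the `committed` list (standing in for an `inner_writer`) are hypothetical. Only the `jsonl` and `lzma` formats are shown because `zstd` and `parquet` need third-party packages. The sketch assumes a blob is committed once the uncompressed buffer would exceed `blob_size`, and that `append` returns the number of records in the current blob.

```python
import json
import lzma
from typing import Callable, Dict, List


def _identity(data: bytes) -> bytes:
    return data


# Hypothetical mapping from the documented `format` names to compressors;
# zstd and parquet are omitted because they need third-party packages.
COMPRESSORS: Dict[str, Callable[[bytes], bytes]] = {
    "jsonl": _identity,
    "lzma": lzma.compress,
}


class BlobBuffer:
    """Hypothetical buffer: collects JSON lines and commits a blob at blob_size."""

    def __init__(self, format: str = "jsonl", blob_size: int = 32 * 1024 * 1024):
        if format not in COMPRESSORS:
            raise ValueError(f"format not supported by this sketch: {format}")
        self.compress = COMPRESSORS[format]
        self.blob_size = blob_size
        self.buffer = bytearray()
        self.records_in_blob = 0
        self.committed: List[bytes] = []  # stands in for the inner_writer

    def append(self, record: dict) -> int:
        line = (json.dumps(record) + "\n").encode("utf-8")
        # Assumption: the size check uses the uncompressed buffer length.
        if self.buffer and len(self.buffer) + len(line) > self.blob_size:
            self.commit()
        self.buffer.extend(line)
        self.records_in_blob += 1
        return self.records_in_blob  # records in the current blob

    def commit(self) -> None:
        # Compress the whole blob once, then start an empty one.
        if self.buffer:
            self.committed.append(self.compress(bytes(self.buffer)))
        self.buffer = bytearray()
        self.records_in_blob = 0
```

With `blob_size=40`, each `{"id": n}` line is 10 bytes. The first four appends return 1 to 4, and the fifth starts a new blob and returns 1.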
Append a new record to the Writer.
Parameters:
record - dictionary
The record to append to the Writer
Returns:
integer
The number of records in the current blob
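The following sketch shows how an append step might combine the optional schema check with the returned count. It is hypothetical. `ValidatingAppender` is not a mabel class, and the schema here is a plain dict of field names to Python types, not `mabel.validator.Schema`. Records that fail the check raise `ValueError` and are not counted. With no schema, every record is accepted, matching the documented default.

```python
from typing import Dict, Optional


class ValidatingAppender:
    """Hypothetical stand-in for append(): validate, count, return the count."""

    def __init__(self, schema: Optional[Dict[str, type]] = None):
        # None means no validation, as in the documented default.
        self.schema = schema
        self.records_in_blob = 0

    def _conforms(self, record: dict) -> bool:
        if self.schema is None:
            return True
        # Every schema field must be present with a value of the expected type.
        return all(
            field in record and isinstance(record[field], expected)
            for field, expected in self.schema.items()
        )

    def append(self, record: dict) -> int:
        if not isinstance(record, dict):
            raise TypeError("record must be a dictionary")
        if not self._conforms(record):
            raise ValueError(f"record does not conform to the schema: {record}")
        self.records_in_blob += 1
        return self.records_in_blob
```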
Writer Pool Management
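The constructor's `idle_timeout_seconds` and `writer_pool_capacity` parameters control when pooled writers are evicted. Below is a minimal sketch of such a policy, assuming writers are keyed by a partition name and that the least recently used writer goes first when the pool is over capacity. The `WriterPool` class, its `touch` method and the injectable `clock` are hypothetical and do not describe mabel's code. Eviction here only records the partition name in `evicted`, where a real pool would commit and close the writer.

```python
import time
from collections import OrderedDict
from typing import Callable, List


class WriterPool:
    """Hypothetical pool: evicts idle writers, then LRU writers over capacity."""

    def __init__(
        self,
        idle_timeout_seconds: float = 30,
        writer_pool_capacity: int = 5,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.idle_timeout_seconds = idle_timeout_seconds
        self.writer_pool_capacity = writer_pool_capacity
        self.clock = clock
        # partition -> last-used time, ordered least to most recently used
        self.last_used: "OrderedDict[str, float]" = OrderedDict()
        self.evicted: List[str] = []  # stands in for committing/closing writers

    def touch(self, partition: str) -> None:
        """Mark the writer for `partition` as used, creating it if needed."""
        self.last_used[partition] = self.clock()
        self.last_used.move_to_end(partition)
        self.evict()

    def evict(self) -> None:
        now = self.clock()
        # Idle eviction: writers unused for longer than the timeout.
        for partition, used in list(self.last_used.items()):
            if now - used > self.idle_timeout_seconds:
                self._close(partition)
        # Capacity eviction: least recently used first.
        while len(self.last_used) > self.writer_pool_capacity:
            self._close(next(iter(self.last_used)))

    def _close(self, partition: str) -> None:
        del self.last_used[partition]
        self.evicted.append(partition)
```

Injecting `clock` keeps the sketch testable without sleeping. Ordering by last use makes capacity eviction a simple pop from the front.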
This file was automatically generated and is not authoritative; if in doubt, the code will tell you unambiguously what it does.