
mabel.data.writers.stream_writer

joocer edited this page Jun 4, 2021 · 4 revisions

CLASS: StreamWriter ()

Create a Data Writer to write data records into partitions.

Parameters

  • dataset - string (optional)
    The name of the dataset - this is used to map to a path
  • schema - mabel.validator.Schema (optional)
    Schema used to test records for conformity; the default is no schema, and therefore no validation
  • format - string (optional)
    The output format, one of:
      - jsonl: raw JSON lines
      - lzma: lzma compressed JSON lines
      - zstd: Zstandard compressed JSON lines (default)
      - parquet: Apache Parquet
  • idle_timeout_seconds - integer (optional)
    The number of seconds a writer may sit idle before it is evicted from the pool; the default is 30 seconds
  • writer_pool_capacity - integer (optional)
    The number of writers to keep in the writer pool before writers are evicted for exceeding capacity; the default is 5
  • blob_size - integer (optional)
    The maximum size of a blob; the default is 32 MB
  • inner_writer - BaseWriter (optional)
    The component used to commit data; the default writer is the NullWriter

Note

  • Different inner_writers may take or require additional parameters.
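
The jsonl and lzma formats listed above can be illustrated with the Python standard library alone. The sketch below is not mabel's code: encode_jsonl and decode_jsonl are hypothetical helpers, used only to show what "JSON lines" and "lzma compressed JSON lines" look like as byte payloads. The zstd and parquet formats are left out because they need third-party packages.

```python
import json
import lzma


def encode_jsonl(records):
    """Serialise dictionaries as JSON lines: one JSON document per line."""
    return "".join(json.dumps(record) + "\n" for record in records).encode("utf-8")


def decode_jsonl(payload):
    """Parse a JSON lines payload back into a list of dictionaries."""
    return [json.loads(line) for line in payload.decode("utf-8").splitlines() if line]


records = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]

raw = encode_jsonl(records)        # what a "jsonl" blob would hold
compressed = lzma.compress(raw)    # what an "lzma" blob would hold

# Round trip: decompress, then parse line by line
restored = decode_jsonl(lzma.decompress(compressed))
```

Because each record sits on its own line, a reader can process a blob one record at a time without loading the whole payload as a single JSON document.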

append (record)

Append a new record to the Writer

Parameters

  • record - dictionary
    The record to append to the Writer

Returns

  • integer
    The number of records in the current blob
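
To make the return value concrete, here is a minimal sketch of how a count of "records in the current blob" could behave once a blob size limit is involved. ToyBlobWriter, its commit method, and the rollover rule are assumptions made for illustration; they are not taken from mabel's source, and the real StreamWriter may decide when to start a new blob differently.

```python
import json


class ToyBlobWriter:
    """Hypothetical stand-in for a writer; not mabel's implementation."""

    def __init__(self, blob_size=64):
        self.blob_size = blob_size   # maximum bytes per blob (assumed rule)
        self.lines = []              # encoded records in the current blob
        self.bytes_in_blob = 0
        self.committed = []          # finished blobs, kept in memory here

    def append(self, record):
        line = (json.dumps(record) + "\n").encode("utf-8")
        # Start a new blob if this record would push the current one over the limit
        if self.lines and self.bytes_in_blob + len(line) > self.blob_size:
            self.commit()
        self.lines.append(line)
        self.bytes_in_blob += len(line)
        # Mirror the documented return value: records in the current blob
        return len(self.lines)

    def commit(self):
        """Close the current blob and start an empty one."""
        if self.lines:
            self.committed.append(b"".join(self.lines))
        self.lines = []
        self.bytes_in_blob = 0


writer = ToyBlobWriter(blob_size=40)
counts = [writer.append({"n": i}) for i in range(6)]
writer.commit()  # flush whatever is left in the last blob
```

In this toy version the count climbs with each append and drops back to 1 whenever a new blob is started, so a caller can use the return value to notice a rollover.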

finalize ()

pool_attendant ()

Writer Pool Management
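
The page does not say how the pool is managed, so the following is only a sketch of one plausible policy built from the idle_timeout_seconds and writer_pool_capacity parameters: evict writers that have been idle too long, then evict the least recently used writers until the pool fits its capacity. ToyWriterPool, touch, attend, and the injectable clock are all hypothetical names, and the least-recently-used ordering is an assumption.

```python
import time


class ToyWriterPool:
    """Hypothetical pool bookkeeping; not mabel's implementation."""

    def __init__(self, capacity=5, idle_timeout_seconds=30, clock=time.monotonic):
        self.capacity = capacity
        self.idle_timeout_seconds = idle_timeout_seconds
        self.clock = clock           # injectable so the example is deterministic
        self.last_used = {}          # partition name -> time of last use

    def touch(self, partition):
        """Record that the writer for this partition was just used."""
        self.last_used[partition] = self.clock()

    def attend(self):
        """Evict idle writers, then the oldest ones over capacity.

        Returns the names of the evicted partitions.
        """
        now = self.clock()
        idle = [p for p, t in self.last_used.items()
                if now - t > self.idle_timeout_seconds]
        for partition in idle:
            del self.last_used[partition]

        # Oldest first, so the slice below holds the least recently used
        survivors = sorted(self.last_used, key=self.last_used.get)
        overflow = survivors[:max(0, len(survivors) - self.capacity)]
        for partition in overflow:
            del self.last_used[partition]
        return idle + overflow


# Drive the pool with a fake clock instead of sleeping
now = [0.0]
pool = ToyWriterPool(capacity=2, idle_timeout_seconds=30, clock=lambda: now[0])
for moment, partition in [(0, "a"), (10, "b"), (20, "c"), (35, "d")]:
    now[0] = moment
    pool.touch(partition)
evicted = pool.attend()
```

Here "a" is evicted for inactivity and "b" for capacity, leaving "c" and "d" in the pool.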


This file has been automatically generated; it is not the truth. If in doubt, the code will tell you unambiguously what it does.
