how_to_read_a_dataset
For information on how to call the reader see the reader documentation.
A Dataset is a set of records. Datasets fall into three broad categories:
- Static Data: data that doesn't change. This is generally reference data, such as conversion rates between units.
- Batch Data: data that you can get a complete copy of, although the copy may only be accurate for a period of time. An employee list is an example: you can get a complete list of employees, but joiners and leavers mean the list may change daily.
- Stream Data: data that doesn't end; it is a constant feed of new information, such as readings from a thermometer. New readings are appended to the existing set of readings.
There are two main ways to filter data on reading: filtering by date and filtering by attributes.
All reads are filtered by date unless the reader is created with the parameter raw_path set to True.
Attribute filters are written using lists of tuples. Each tuple has the format (key, op, value). When run, the filter extracts the key field from the dictionary and compares it to the value using the operator op.
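The evaluation of a single tuple can be sketched as follows. This is an illustration only, not the reader's actual implementation: the names OPERATORS and matches are hypothetical, only the comparison operators are covered, and treating a missing key as a non-match is an assumption.

```python
import operator

# Hypothetical operator table covering only the comparison operators.
OPERATORS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def matches(record, condition):
    """Return True if the record satisfies one (key, op, value) tuple."""
    key, op, value = condition
    # Assumption: a record that lacks the key never matches.
    if key not in record:
        return False
    # Look up the comparison function and apply it to the field and the value.
    return OPERATORS[op](record[key], value)


record = {"name": "jupiter", "size": 142984}
print(matches(record, ("name", "==", "jupiter")))
print(matches(record, ("size", ">", 1000000)))
```

The first call compares the name field for equality; the second compares the size field numerically against the threshold.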
The supported op values are: = or ==, !=, <, >, <=, >=, in, !in (not in), contains, !contains (doesn't contain) and like.
If the op is in or !in, the value must be a collection such as a list, a set or a tuple. like behaves similarly to the SQL operator: % is a multi-character wildcard and _ is a single-character wildcard.
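One way to picture the like wildcards is as a translation to a regular expression. The sketch below is a hypothetical helper, not the reader's own code; it assumes the match is case-sensitive and must cover the whole field value.

```python
import re


def like(text, pattern):
    """SQL-style LIKE: % matches any run of characters, _ matches exactly one."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")  # multi-character wildcard (may match nothing)
        elif char == "_":
            parts.append(".")  # single-character wildcard
        else:
            parts.append(re.escape(char))  # everything else is literal
    # fullmatch anchors the pattern at both ends of the text;
    # DOTALL lets the wildcards match newline characters too.
    return re.fullmatch("".join(parts), text, flags=re.DOTALL) is not None


print(like("jupiter", "jup%"))
print(like("jupiter", "j_piter"))
print(like("jupiter", "jup"))
```

Escaping the literal characters matters: without it, a pattern containing a character such as . would be read as a regular-expression wildcard rather than a literal.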
Lists of filters are ANDed together; lists of lists are ORed together:
('name', '==', 'jupiter')
[('name', '==', 'jupiter')]
Both these variations return records where the name field is jupiter.
These are both single-condition filters.
[('name', '==', 'jupiter'), ('size', '>', '1000000')]
Returns records where the name field is jupiter AND the size field is greater than 1 million.
This is a list of conditions that are ANDed together.
[[('name', '==', 'jupiter')], [('name', '==', 'saturn')]]
Returns records where the name field is jupiter OR the name field is saturn.
This is a list of conditions that are ORed together.
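The three filter shapes above (a bare tuple, a list of tuples, a list of lists) could be evaluated along the following lines. This is a minimal sketch under stated assumptions, not the reader's implementation: match_condition, match_filters and select are hypothetical names, only a few operators are included, and the size threshold is written as an integer so that the comparison is numeric.

```python
import operator

# Hypothetical subset of operators, enough for the examples above.
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


def match_condition(record, condition):
    """Evaluate one (key, op, value) tuple against a record."""
    key, op, value = condition
    return key in record and OPS[op](record[key], value)


def match_filters(record, filters):
    """Evaluate a tuple, a list of tuples (AND) or a list of lists (OR)."""
    if isinstance(filters, tuple):
        # A bare tuple is a single condition.
        return match_condition(record, filters)
    if all(isinstance(item, tuple) for item in filters):
        # A list of tuples: every condition must hold.
        return all(match_condition(record, item) for item in filters)
    # A list of lists: at least one inner group must hold.
    return any(match_filters(record, group) for group in filters)


def select(records, filters):
    """Return the names of the records that pass the filters."""
    return [r["name"] for r in records if match_filters(r, filters)]


planets = [
    {"name": "jupiter", "size": 142984},
    {"name": "saturn", "size": 120536},
    {"name": "mars", "size": 6779},
]

print(select(planets, ("name", "==", "jupiter")))
print(select(planets, [("name", "==", "jupiter"), ("size", ">", 1000000)]))
print(select(planets, [[("name", "==", "jupiter")], [("name", "==", "saturn")]]))
```

With this sample data the ANDed filter selects nothing, because no record has a size above 1 million, while the ORed filter selects both jupiter and saturn.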