JSON specs
A JSON spec encodes all of the instructions for processing HXL data in a single JSON object tree. Here is a simple example:
{
"input": "http://example.org/data.csv",
"recipe": [
{
"filter": "with_rows",
"queries": "#org=unicef"
},
{
"filter": "count",
"patterns": "#adm1+name"
}
]
}
This spec will load a HXL-hashtagged dataset from the fictitious URL "http://example.org/data.csv", apply a row filter that keeps only the rows matching the query #org=unicef, then apply a count filter to #adm1+name.
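To make the effect of the two steps concrete, here is a rough plain-Python sketch of what the recipe does to a handful of rows. It does not use the HXL library: the rows are made up for illustration, the helper functions `with_rows` and `count` are hypothetical stand-ins for the real filters, and case-insensitive matching of the query value is an assumption.

```python
from collections import Counter

# Made-up rows standing in for the dataset; keys are HXL hashtags.
rows = [
    {"#org": "UNICEF", "#adm1+name": "Coast"},
    {"#org": "WFP", "#adm1+name": "Coast"},
    {"#org": "unicef", "#adm1+name": "Plains"},
    {"#org": "UNICEF", "#adm1+name": "Coast"},
]


def with_rows(data, tag, value):
    """Keep only rows whose value for `tag` equals `value`.

    The comparison ignores case and surrounding whitespace (an assumption).
    """
    wanted = value.strip().lower()
    return [row for row in data if row.get(tag, "").strip().lower() == wanted]


def count(data, tag):
    """Count how many rows share each distinct value of `tag`."""
    return Counter(row[tag] for row in data)


# Apply the steps in the same order as the recipe.
selected = with_rows(rows, "#org", "unicef")
result = count(selected, "#adm1+name")
```

The point of the sketch is the ordering: the count only sees the rows that survived the row filter, just as each filter in a recipe receives the output of the one before it.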
The top-level spec object has the following properties (only input is required):
- input: the input source, which may be a URL, local filename, or nested JSON spec.
- allow_local: if 1 (true), allow local filenames (default is 0/false).
- sheet_index: the zero-based index of a sheet in an Excel workbook (default is to use the first sheet with HXL hashtags, or if none has HXL hashtags, the first sheet in the workbook).
- timeout: the number of seconds to wait before timing out an HTTP connection.
- verify_ssl: if 0 (false), do not verify SSL certificates; this can be useful for working with self-signed certs. The default is 1/true.
- http_headers: a list of HTTP header/value pairs, e.g. for authorization.
- encoding: the input character encoding to use (default is the encoding returned with an HTTP response, or "utf-8").
- tagger: a spec for adding HXL hashtags to non-tagged data (see below).
- recipe: a list of HXL filters to apply to the data (see below).
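As an illustration of how these properties fit together, the following sketch assembles a spec as a Python dictionary and serialises it to JSON. The helper `build_spec` and the URL "http://example.org/data.xlsx" are hypothetical; only the property names come from the list above.

```python
import json

# Optional top-level properties named in the list above.
OPTIONAL_PROPERTIES = {
    "allow_local", "sheet_index", "timeout", "verify_ssl",
    "http_headers", "encoding", "tagger", "recipe",
}


def build_spec(input_source, **options):
    """Assemble a top-level spec; only the input is required."""
    unknown = sorted(set(options) - OPTIONAL_PROPERTIES)
    if unknown:
        raise ValueError("unknown top-level properties: " + ", ".join(unknown))
    spec = {"input": input_source}
    spec.update(options)
    return spec


# Read the second sheet (index 1) of a workbook, with a 30-second timeout
# and SSL verification switched off.
inner = build_spec(
    "http://example.org/data.xlsx",
    sheet_index=1,
    timeout=30,
    verify_ssl=0,
)

# The input may itself be a nested spec, here wrapped with a one-step recipe.
outer = build_spec(inner, recipe=[{"filter": "sort"}])

text = json.dumps(outer, indent=2)
```

Nesting is useful when one recipe should run on the output of another: the inner spec is processed first, and the outer recipe is applied to its result.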
The value of the top-level tagger property is a JSON object specifying how to apply HXL hashtags to a non-hashtagged dataset. Here is a simple example:
"tagger": {
"match_all": true,
"specs": {
"country": "#country+name",
"iso3": "#country+code",
"cluster": "#section+cluster",
"organisation": "#org"
}
}
Properties:
- match_all: if 1 (true), require that the entire text header match.
- specs: a JSON object where each property name is a string to match against text headers in the dataset (case- and whitespace-insensitive), and the property value is the HXL hashtag and attributes to add. If match_all is true, then the entire header must match after case and space normalisation; otherwise, the string simply must appear somewhere in the header.
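The matching rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the library's implementation: the function names are hypothetical, "whitespace-insensitive" is interpreted as collapsing runs of whitespace and trimming the ends, and returning the first matching spec is an assumption.

```python
import re


def normalise(text):
    """Lower-case the text, collapse runs of whitespace, and trim the ends."""
    return re.sub(r"\s+", " ", text).strip().lower()


def tag_header(header, specs, match_all=False):
    """Return the hashtag for a text header, or None if no spec matches."""
    header_norm = normalise(header)
    for pattern, hashtag in specs.items():
        pattern_norm = normalise(pattern)
        if match_all:
            # The whole normalised header must equal the pattern.
            if header_norm == pattern_norm:
                return hashtag
        elif pattern_norm in header_norm:
            # The pattern only has to appear somewhere in the header.
            return hashtag
    return None


specs = {
    "country": "#country+name",
    "iso3": "#country+code",
    "cluster": "#section+cluster",
    "organisation": "#org",
}

exact = tag_header("  Country ", specs, match_all=True)
too_long = tag_header("Country name", specs, match_all=True)
partial = tag_header("Country name", specs, match_all=False)
```

Here `exact` and `partial` are both "#country+name", while `too_long` is None because "country name" is not identical to "country" once both are normalised.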
The value of the top-level recipe property is a list of JSON objects, each configuring a HXL filter to apply to the data, in the order specified. Here is a simple example:
"recipe": [
{
"filter": "sort"
}
]
In this case, there are no extra properties, so the filter will simply do a default sort on the data. Most filters include (or require) additional properties in their configuration objects. Here is a more complex example of a single filter in a recipe:
{
"filter": "clean_data",
"date": "#date+start",
"date_format": "%Y-%m"
}
The filter property is the only one required in every configuration object; individual filters may require additional properties. See each filter's wiki page for more information about the properties that it supports:
filter property | Filter | Additional properties |
---|---|---|
add_columns | Add columns filter | specs, before |
append | Append datasets filter | append_sources, add_columns, queries |
append_external_list | Append datasets filter (external list) | source_list_url, add_columns, queries |
cache | Cache filter | max_rows |
clean_data | Clean data filter | whitespace, upper, lower, date, date_format, number, number_format, latlon, purge, queries |
count | Count rows filter | patterns, aggregators, queries |
dedup | Deduplicate rows filter | patterns, queries |
expand_lists | Expand lists filter | patterns, separator, correlate, queries |
explode | Explode data filter | header_attribute, value_attribute |
fill_data | Fill data filter | patterns, queries |
implode | Implode data filter | label_pattern, value_pattern |
jsonpath | JSONPath filter | path, patterns, queries |
merge_data | Merge columns filter | merge_source, keys, tags, replace, overwrite, queries |
rename_columns | Rename columns filter | specs |
replace_data | Replace data filter | original, replacement, pattern, use_regex, queries |
replace_data_map | Replace data filter (external map) | map_source, queries |
sort | Sort rows filter | keys, reverse |
with_columns | Cut columns filter | includes |
with_rows | Select rows filter | queries, mask |
without_columns | Cut columns filter | excludes, skip_untagged |
without_rows | Select rows filter | queries, mask |
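The table can also be used as a simple lookup for checking a recipe before it is run. The sketch below is a hypothetical pre-flight check, not part of the library: it copies the filter names and property names from the table and reports any step whose filter or properties are not listed there. It does not know which properties are required, and it does not validate property values.

```python
# Filter names and their additional properties, copied from the table above.
FILTER_PROPERTIES = {
    "add_columns": {"specs", "before"},
    "append": {"append_sources", "add_columns", "queries"},
    "append_external_list": {"source_list_url", "add_columns", "queries"},
    "cache": {"max_rows"},
    "clean_data": {"whitespace", "upper", "lower", "date", "date_format",
                   "number", "number_format", "latlon", "purge", "queries"},
    "count": {"patterns", "aggregators", "queries"},
    "dedup": {"patterns", "queries"},
    "expand_lists": {"patterns", "separator", "correlate", "queries"},
    "explode": {"header_attribute", "value_attribute"},
    "fill_data": {"patterns", "queries"},
    "implode": {"label_pattern", "value_pattern"},
    "jsonpath": {"path", "patterns", "queries"},
    "merge_data": {"merge_source", "keys", "tags", "replace", "overwrite",
                   "queries"},
    "rename_columns": {"specs"},
    "replace_data": {"original", "replacement", "pattern", "use_regex",
                     "queries"},
    "replace_data_map": {"map_source", "queries"},
    "sort": {"keys", "reverse"},
    "with_columns": {"includes"},
    "with_rows": {"queries", "mask"},
    "without_columns": {"excludes", "skip_untagged"},
    "without_rows": {"queries", "mask"},
}


def check_recipe(recipe):
    """Return a list of problems found in a recipe (empty if none)."""
    problems = []
    for index, step in enumerate(recipe):
        name = step.get("filter")
        if name is None:
            problems.append(f"step {index}: missing 'filter' property")
            continue
        if name not in FILTER_PROPERTIES:
            problems.append(f"step {index}: unknown filter '{name}'")
            continue
        # Anything other than 'filter' must be listed for this filter.
        extra = set(step) - {"filter"} - FILTER_PROPERTIES[name]
        for prop in sorted(extra):
            problems.append(
                f"step {index}: '{name}' does not list property '{prop}'"
            )
    return problems


good = check_recipe([
    {"filter": "with_rows", "queries": "#org=unicef"},
    {"filter": "count", "patterns": "#adm1+name"},
    {"filter": "sort"},
])

bad = check_recipe([
    {"filter": "shuffle"},
    {"filter": "sort", "keys": "#date", "upper": "#org"},
    {"queries": "#org=unicef"},
])
```

With these inputs, `good` is an empty list, and `bad` contains one message per faulty step: an unknown filter, a property that the table does not list for sort, and a step with no filter property.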
Standard: http://hxlstandard.org