Select rows filter

TODO: update to include Python and the HXL Proxy as well.

The hxlselect command-line tool creates a new copy of a HXL dataset that includes only the rows matching certain criteria. For example, you can use it to create a dataset containing only the information for a specific geographical region, sector, or beneficiary group.

This command is especially useful in pipelines, where you can use it to select only certain rows for other commands to act on. If you give several --query options, a row is kept if it matches any of them (logical OR); to require every condition (logical AND), chain several hxlselect commands in a pipeline.
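The combination rule can be sketched in plain Python. This is a hypothetical model, not the libhxl API: `select_any`, `is_org1` and `is_health` are illustrative names, and each predicate function stands in for one --query expression.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def select_any(rows: List[Row], predicates: List[Predicate]) -> List[Row]:
    """One hxlselect call with several --query options: keep a row if ANY predicate matches."""
    return [row for row in rows if any(predicate(row) for predicate in predicates)]


def is_org1(row: Row) -> bool:
    return row["#org"] == "Org1"


def is_health(row: Row) -> bool:
    return row["#sector"] == "Health"


rows = [
    {"#org": "Org1", "#sector": "Health"},
    {"#org": "Org1", "#sector": "Education"},
    {"#org": "Org2", "#sector": "Health"},
]

# Logical OR: hxlselect -q org=Org1 -q sector=Health
either = select_any(rows, [is_org1, is_health])

# Logical AND: hxlselect -q org=Org1 | hxlselect -q sector=Health
both = select_any(select_any(rows, [is_org1]), [is_health])
```

Here `either` keeps all three rows, while `both` keeps only the Org1 Health row.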

For example, to create a GeoJSON map layer of only the rows related to the WASH sector, you can use a pipeline like this:

hxlselect --query sector=WASH my-data.csv | \
  hxl2geojson > map-layer.json

There are many more similar examples in the HXL cookbook.

Usage

usage: hxlselect [-h] [-q tag=value] [-v] [infile] [outfile]

Filter rows in a HXL dataset.

positional arguments:
  infile                HXL file to read (if omitted, use standard input).
  outfile               HXL file to write (if omitted, use standard output).

optional arguments:
  -h, --help            show this help message and exit
  -q tag=value, --query tag=value
                        expression for filtering (use multiple times for
                        logical OR): <hashtag><op><value>, where <op> is =,
                        !=, <, <=, >, >=, ~, or !~
  -v, --invert          Show only lines *not* matching criteria
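The <hashtag><op><value> syntax can be parsed with a single regular expression. This is a minimal sketch, not hxlselect's own parser; it assumes the leading # may be written or omitted (the examples on this page omit it), that hashtags may carry +attributes, and that hashtags compare case-insensitively. `parse_query` is a hypothetical helper.

```python
import re
from typing import Tuple

# Two-character operators come first in the alternation, so "targeted_num>=10000"
# is read as ">=" rather than ">" followed by a value of "=10000".
QUERY_PATTERN = re.compile(r"^#?([A-Za-z][\w+]*)(!=|<=|>=|!~|=|<|>|~)(.*)$")


def parse_query(expression: str) -> Tuple[str, str, str]:
    """Split a query expression into (hashtag, operator, value)."""
    match = QUERY_PATTERN.match(expression)
    if match is None:
        raise ValueError(f"not a <hashtag><op><value> expression: {expression!r}")
    tag, op, value = match.groups()
    return tag.lower(), op, value  # hashtags are case-insensitive in this sketch
```

For example, `parse_query("adm1!=XXX")` returns `("adm1", "!=", "XXX")`.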

Note that empty fields always fail a comparison, whatever the operator. For example, --query adm1!=XXX will not match a row whose #adm1 field is empty; it matches only rows that have an explicit value other than XXX.

Comparison operators

hxlselect supports the following comparison operators:

`=`	The hashtag value is identical to the value provided.
`!=`	The hashtag value is not identical to the value provided.
`<`	The hashtag value is less than the value provided (may be unpredictable for non-numeric values).
`<=`	The hashtag value is less than or equal to the value provided (may be unpredictable for non-numeric values).
`>`	The hashtag value is greater than the value provided (may be unpredictable for non-numeric values).
`>=`	The hashtag value is greater than or equal to the value provided (may be unpredictable for non-numeric values).
`~`	The hashtag value matches the [regular-expression pattern](https://docs.python.org/2/library/re.html#regular-expression-syntax) provided.
`!~`	The hashtag value does not match the regular-expression pattern provided.
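Here is a sketch of how one test might be evaluated against a single cell, following the table above and the rule that empty fields always fail. Assumptions not stated on this page: ordering operators compare numerically when both sides parse as numbers and fall back to text comparison otherwise, and `~` uses Python's `re.search`, so the pattern may match anywhere in the value. `compare` is a hypothetical name.

```python
import operator
import re

# Ordering operators from the table, mapped to Python's comparison functions.
ORDERING = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def compare(value: str, op: str, target: str) -> bool:
    """Evaluate <value> <op> <target> for one cell."""
    if value.strip() == "":
        return False  # empty fields always fail, even for "!=" and "!~"
    if op == "=":
        return value == target
    if op == "!=":
        return value != target
    if op == "~":
        return re.search(target, value) is not None
    if op == "!~":
        return re.search(target, value) is None
    if op in ORDERING:
        try:
            return ORDERING[op](float(value), float(target))
        except ValueError:
            # Text fallback: this is where results become "unpredictable".
            return ORDERING[op](value, target)
    raise ValueError(f"unknown operator: {op!r}")
```

The text fallback explains the warning in the table: "3500" < "10000" is False when compared as text, but True when compared as numbers.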

Examples

Starting dataset:

Implementing organisation	Cluster or sector	District	Beneficiaries
#org	#sector	#adm1	#targeted_num
Org1	Health	Coast	10,000
Org1	Education	Coast	25,000
Org2	Health	Mountains	3,500

Example 1: restricting to a single #org

Generate a new dataset, including only the rows where the #org is "Org1":

hxlselect --query org=Org1 \
  <dataset-in.csv >dataset-out.csv

Result:

Implementing organisation	Cluster or sector	District	Beneficiaries
#org	#sector	#adm1	#targeted_num
Org1	Health	Coast	10,000
Org1	Education	Coast	25,000
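The same selection can be sketched with Python's csv module. The sketch also shows that the text header row and the hashtag row are copied through unchanged. It makes some simplifying assumptions: the hashtag row is taken to be the first row whose non-empty cells all start with #, hashtags are matched exactly (no +attributes), and the data is comma-separated. `find_hashtag_row` and `select_equal` are hypothetical helpers.

```python
import csv
import io
from typing import List


def find_hashtag_row(rows: List[List[str]]) -> int:
    """Return the index of the first row whose non-empty cells all start with '#'."""
    for index, row in enumerate(rows):
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells and all(cell.startswith("#") for cell in cells):
            return index
    raise ValueError("no HXL hashtag row found")


def select_equal(text: str, tag: str, wanted: str) -> List[List[str]]:
    """Keep header rows plus the data rows where the `tag` column equals `wanted`."""
    rows = list(csv.reader(io.StringIO(text)))
    tag_row = find_hashtag_row(rows)
    column = [cell.strip() for cell in rows[tag_row]].index(tag)
    header = rows[: tag_row + 1]  # text headers and hashtags are copied unchanged
    data = [row for row in rows[tag_row + 1 :] if len(row) > column and row[column] == wanted]
    return header + data


DATASET = """Implementing organisation,Cluster or sector,District,Beneficiaries
#org,#sector,#adm1,#targeted_num
Org1,Health,Coast,"10,000"
Org1,Education,Coast,"25,000"
Org2,Health,Mountains,"3,500"
"""

result = select_equal(DATASET, "#org", "Org1")
```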

Example 2: excluding a #sector

Generate a new dataset containing only the rows whose #sector is not "Education":

hxlselect --invert --query sector=Education \
  <dataset-in.csv >dataset-out.csv

Result:

Implementing organisation	Cluster or sector	District	Beneficiaries
#org	#sector	#adm1	#targeted_num
Org1	Health	Coast	10,000
Org2	Health	Mountains	3,500
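If --invert simply negates the match, it is not always equivalent to the != operator. The help text ("Show only lines *not* matching criteria") suggests this, but the page does not say how --invert treats empty fields, so this is an assumption. Under it, a row with an empty #sector fails sector=Education and is therefore kept by --invert, whereas sector!=Education drops it, because empty fields always fail. The Org3 row below is hypothetical, added to show the difference.

```python
from typing import Dict, List

Row = Dict[str, str]


def sector_equals(row: Row, value: str) -> bool:
    sector = row.get("#sector", "")
    return sector != "" and sector == value  # empty fields fail


def sector_not_equals(row: Row, value: str) -> bool:
    sector = row.get("#sector", "")
    return sector != "" and sector != value  # empty fields fail here too


rows: List[Row] = [
    {"#org": "Org1", "#sector": "Health"},
    {"#org": "Org1", "#sector": "Education"},
    {"#org": "Org2", "#sector": "Health"},
    {"#org": "Org3", "#sector": ""},  # hypothetical row with no sector
]

# --invert --query sector=Education (assumed to be plain negation)
inverted = [row for row in rows if not sector_equals(row, "Education")]
# --query sector!=Education
negated = [row for row in rows if sector_not_equals(row, "Education")]
```

Here `inverted` keeps the Org3 row, but `negated` does not.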

Example 3: limiting by count

Include only the rows where #targeted_num is greater than or equal to 10,000:

hxlselect --query 'targeted_num>=10000' \
  <dataset-in.csv >dataset-out.csv

The query is quoted so that the shell does not interpret > as an output redirection.

Result:

Implementing organisation	Cluster or sector	District	Beneficiaries
#org	#sector	#adm1	#targeted_num
Org1	Health	Coast	10,000
Org1	Education	Coast	25,000
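The result above keeps "10,000" and drops "3,500". This implies that the values are compared as numbers after the thousands separators are removed, because a plain text comparison gets this case wrong. The sketch below assumes that normalisation (stripping commas). `parse_count` and `at_least` are hypothetical helpers. Stripping commas this way would be wrong for data that uses a comma as the decimal separator.

```python
from typing import List, Optional


def parse_count(text: str) -> Optional[float]:
    """Hypothetical normaliser: drop thousands separators, then parse as a number."""
    cleaned = text.strip().replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def at_least(text: str, threshold: float) -> bool:
    number = parse_count(text)
    return number is not None and number >= threshold


values: List[str] = ["10,000", "25,000", "3,500"]

# Text comparison: "10,000" < "10000" because "," sorts before "0".
as_text = [value for value in values if value >= "10000"]
# Numeric comparison after normalisation matches the result table.
as_numbers = [value for value in values if at_least(value, 10000)]
```

With text comparison, `as_text` is `["25,000", "3,500"]`. The numeric version gives `["10,000", "25,000"]`.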
