Cut columns filter

TODO: update to include Python and the HXL Proxy as well.

The hxlcut command-line tool creates a new copy of a HXL dataset with some of the columns removed. For example, you can use this tool in a batch script to remove columns with personally identifiable information (such as #email) before each public release of a dataset.

There are two ways to use this command:

- Provide a whitelist of HXL hashtags for columns to include: only the listed columns will appear in the output.
- Provide a blacklist of HXL hashtags for columns to exclude: everything except the listed columns will appear in the output.

If security is a major concern, the whitelist approach ensures that any new columns you add to your source dataset won't accidentally leak out, because you have to add them to the whitelist explicitly. If robustness is a major concern, the blacklist approach ensures that any new columns you add to your source dataset won't be omitted from the output.
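The trade-off can be seen in a small Python sketch. This is an illustration only, not how hxlcut is implemented: the `cut_columns` helper, the `#phone` column, and the sample values are hypothetical, and tags are matched by simple string comparison.

```python
# Illustration only: a simplified model of include/exclude filtering.
def cut_columns(rows, include=None, exclude=None):
    """Keep or drop columns by HXL hashtag.

    rows[0] is assumed to be the hashtag row; tags are passed without '#'.
    """
    tags = [cell.lstrip("#") for cell in rows[0]]
    if include is not None:
        # Whitelist: keep only the listed tags.
        keep = [i for i, tag in enumerate(tags) if tag in include]
    else:
        # Blacklist: keep everything except the listed tags.
        keep = [i for i, tag in enumerate(tags) if tag not in (exclude or [])]
    return [[row[i] for i in keep] for row in rows]


# The source dataset later gains a (hypothetical) #phone column.
rows = [
    ["#org", "#email", "#phone", "#sector", "#adm1"],
    ["Org1", "contact@example.org", "555-0100", "Health", "Coast"],
]

whitelisted = cut_columns(rows, include=["sector", "adm1"])
blacklisted = cut_columns(rows, exclude=["email"])

print(whitelisted[0])  # ['#sector', '#adm1']
print(blacklisted[0])  # ['#org', '#phone', '#sector', '#adm1']
```

The whitelist output is unchanged by the new column, while the blacklist output passes #phone through until someone adds it to the exclusion list.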

Note that if there are multiple columns with the same HXL hashtag, this command operates on all of them.
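As a rough Python illustration of that behaviour (the second #sector column and its value are made up for the example, and this is not the tool's own code), excluding a tag removes every column that carries it:

```python
# Hypothetical dataset with two columns sharing the #sector hashtag.
tags = ["#org", "#sector", "#sector", "#adm1"]
row = ["Org1", "Health", "Nutrition", "Coast"]

exclude = {"#sector"}

# Every column whose tag is excluded is dropped, not just the first match.
keep = [i for i, tag in enumerate(tags) if tag not in exclude]

kept_tags = [tags[i] for i in keep]
kept_row = [row[i] for i in keep]

print(kept_tags)  # ['#org', '#adm1']
print(kept_row)   # ['Org1', 'Coast']
```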

Usage

usage: hxlcut [-h] [-i tag,tag...] [-x tag,tag...] [infile] [outfile]

Cut columns from a HXL dataset.

positional arguments:
  infile                HXL file to read (if omitted, use standard input).
  outfile               HXL file to write (if omitted, use standard output).

optional arguments:
  -h, --help            show this help message and exit
  -i tag,tag..., --include tag,tag...
                        Comma-separated list of column tags to include
  -x tag,tag..., --exclude tag,tag...
                        Comma-separated list of column tags to exclude

Examples

Starting dataset:

Implementing organisation	Contact	Cluster or sector	District
#org	#email	#sector	#adm1
Org1	[email protected]	Health	Coast
Org1	[email protected]	Education	Coast
Org2	[email protected]	Health	Mountains

Example 1: using a whitelist

You want to produce a dataset containing only #sector and #adm1, no matter what additional columns appear:

hxlcut --include sector,adm1 MyData.csv

Result:

#sector	#adm1
Health	Coast
Education	Coast
Health	Mountains

Example 2: using a blacklist

You want to remove the #email column for privacy reasons, but retain any other columns in the source dataset.

hxlcut --exclude email MyData.csv

Result:

#org	#sector	#adm1
Org1	Health	Coast
Org1	Education	Coast
Org2	Health	Mountains
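The batch-script use mentioned in the introduction could look roughly like the following Python sketch. It assumes hxlcut is installed and on the PATH, and uses only the `--exclude` option and the positional `infile` and `outfile` arguments from the usage text above. The function names and the output filename `MyData-public.csv` are hypothetical.

```python
import subprocess


def hxlcut_command(infile, outfile, exclude):
    """Build the argument list for an hxlcut call that drops the given tags."""
    return ["hxlcut", "--exclude", ",".join(exclude), infile, outfile]


def publish(infile, outfile, exclude=("email",)):
    """Run hxlcut; raises CalledProcessError if the tool exits with an error."""
    subprocess.run(hxlcut_command(infile, outfile, exclude), check=True)


# Show the command that publish() would run, without executing it.
command = hxlcut_command("MyData.csv", "MyData-public.csv", ["email"])
print(" ".join(command))
```

Calling `publish("MyData.csv", "MyData-public.csv")` before each release would then write the cut copy to the second file. Swapping `--exclude` for `--include` gives the whitelist variant.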

Standard: http://hxlstandard.org | Mailing list: [email protected]
