Count rows filter
TODO: update to include Python and the HXL Proxy as well.
The hxlcount command-line tool creates a new HXL dataset that is an aggregation of a larger dataset: it counts how often certain combinations of values appear for the HXL tags you specify. This command is especially useful for three purposes:
- To generate reports (e.g. how many activities are taking place in each district for each sector).
- To anonymise data for privacy or security (e.g. rolling up numbers to a higher administrative level).
- To perform quality control by checking the different values for a tag (e.g. "WASH" vs "Wash" vs "Water and Sanitation" vs "Water & Sanitation").
For use within a Python application, see the `hxl.filters.count` module.
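
At its core, the aggregation groups rows by the values of the selected tags and counts each group. The sketch below shows that idea in plain Python. It is not the libhxl implementation. `count_combinations` is a hypothetical helper, and it assumes the rows have already been parsed into dicts keyed by hashtag. The toy data illustrates the quality-control use: spelling variants such as "WASH" and "Wash" come out as separate groups.

```python
from collections import Counter


def count_combinations(rows, tags):
    """Count how often each combination of values for `tags` occurs.

    Hypothetical helper (not part of libhxl). `rows` is an iterable of dicts
    mapping hashtags (e.g. "#org") to values. Returns (values, count) pairs
    sorted by the value tuple.
    """
    counts = Counter(tuple(row.get(tag, "") for tag in tags) for row in rows)
    return sorted(counts.items())


# Toy data, invented for illustration only.
rows = [
    {"#org": "OXFAM", "#sector": "WASH"},
    {"#org": "OXFAM", "#sector": "Wash"},
    {"#org": "UNICEF", "#sector": "WASH"},
    {"#org": "OXFAM", "#sector": "WASH"},
]
for values, n in count_combinations(rows, ["#sector"]):
    print(values, n)
# ('WASH',) 3
# ('Wash',) 1
```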
```
usage: hxlcount [-h] [-t tag,tag...] [infile] [outfile]

Generate aggregate counts for a HXL dataset

positional arguments:
  infile                HXL file to read (if omitted, use standard input).
  outfile               HXL file to write (if omitted, use standard output).

optional arguments:
  -h, --help            show this help message and exit
  -t tag,tag..., --tags tag,tag...
                        Comma-separated list of column tags to include in
                        aggregated output
```
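
To illustrate roughly what the tool does with `infile`, `outfile` and `--tags`, here is a minimal Python approximation. It is a sketch under several assumptions and is not libhxl's code. `hxlcount_sketch` is hypothetical. It treats the first row whose non-empty cells all start with `#` as the hashtag row, matches tags on their base name only (ignoring attributes such as `+impl`), and sorts its output alphabetically.

```python
import csv
import io
from collections import Counter


def hxlcount_sketch(infile, outfile, tag_spec):
    """Approximate `hxlcount -t tag,tag...` (hypothetical helper, not part of libhxl).

    Reads HXL CSV from `infile`, counts each combination of values in the
    columns matching `tag_spec`, and writes HXL CSV with a #x_total_num column.
    """
    # "org,sector" or "#org, #sector" -> ["#org", "#sector"]
    tags = ["#" + t.strip().lstrip("#").lower() for t in tag_spec.split(",") if t.strip()]
    reader = csv.reader(infile)
    # Skip text headers until the hashtag row (all non-empty cells start with "#").
    for row in reader:
        cells = [c.strip() for c in row if c.strip()]
        if cells and all(c.startswith("#") for c in cells):
            break
    else:
        raise ValueError("no HXL hashtag row found")
    # Match on the base tag only, ignoring attributes such as +impl (a simplification).
    base = [c.strip().lower().split("+")[0] for c in row]
    indices = [base.index(t) for t in tags]  # raises ValueError if a tag is missing
    counts = Counter(
        tuple(r[i].strip() if i < len(r) else "" for i in indices)
        for r in reader
        if any(c.strip() for c in r)  # skip blank lines
    )
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow(tags + ["#x_total_num"])
    for values, n in sorted(counts.items()):
        writer.writerow(list(values) + [n])


# Toy input, invented for illustration only.
data = io.StringIO(
    "Organisation,Sector\n"
    "#org,#sector\n"
    "OXFAM,WASH\n"
    "IRC,Protection\n"
    "OXFAM,WASH\n"
)
out = io.StringIO()
hxlcount_sketch(data, out, "org")
print(out.getvalue(), end="")
# #org,#x_total_num
# IRC,1
# OXFAM,2
```

Passing `sys.stdin` and `sys.stdout` as the file arguments would mimic the tool's default of reading standard input and writing standard output.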
These examples all use an actual 3W (aid activity) dataset from OCHA ROWCA.
List the organisations in a dataset along with the number of times each occurs:
```
hxlcount -t org MyData.csv
```
Result:
#org | #x_total_num |
---|---|
Agency for Technical Cooperation and Development | 12 |
Agronomes et Vétérinaires Sans Frontières | 5 |
Handicap International | 8 |
International Organization for Migration | 35 |
International Rescue Committee | 43 |
OXFAM | 3 |
United Nations Children's Fund | 5 |
United Nations Entity for Gender Equality and the Empowerment of Women | 4 |
United Nations High Commissioner for Refugees | 13 |
United Nations Population Fund | 4 |
World Food Programme | 112 |
This more complex example chains the hxlselect command with hxlcount in a pipeline to count only part of a dataset (here, the number of activities for each organisation working in Tombouctou, Mali):

```
hxlselect -q adm1=Tombouctou MyData.csv | hxlcount -t org
```
Result:
#org | #x_total_num |
---|---|
Agronomes et Vétérinaires Sans Frontières | 2 |
Handicap International | 8 |
International Organization for Migration | 3 |
United Nations Children's Fund | 1 |
United Nations High Commissioner for Refugees | 3 |
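
In Python terms, the same pipeline can be pictured as a filter feeding a counter, with rows streamed from one stage to the next rather than collected first. In this sketch `select_rows` and `count_rows` are hypothetical helpers, not libhxl functions. The `adm1=Tombouctou` query is treated as an exact string match, whereas the real hxlselect may normalise values before comparing them.

```python
from collections import Counter


def select_rows(rows, tag, value):
    """Yield only rows whose `tag` value equals `value` (like a simple hxlselect query)."""
    for row in rows:
        if row.get(tag) == value:
            yield row


def count_rows(rows, tags):
    """Count each combination of values for `tags` (like hxlcount)."""
    return Counter(tuple(row.get(t, "") for t in tags) for row in rows)


# Toy data, invented for illustration only.
rows = [
    {"#org": "IOM", "#adm1": "Tombouctou"},
    {"#org": "IOM", "#adm1": "Gao"},
    {"#org": "UNHCR", "#adm1": "Tombouctou"},
    {"#org": "IOM", "#adm1": "Tombouctou"},
]
# Similar in spirit to: hxlselect -q adm1=Tombouctou | hxlcount -t org
result = count_rows(select_rows(rows, "#adm1", "Tombouctou"), ["#org"])
print(sorted(result.items()))  # [(('IOM',), 2), (('UNHCR',), 1)]
```

Because `select_rows` is a generator, rows pass through one at a time, much like lines through a shell pipe.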
This example counts the activities in Mali for each organisation/sector combination (an organisation appears once for every sector it works in, so an organisation active in two sectors appears twice):

```
hxlselect -q country=Mali MyData.csv | hxlcount -t org,sector
```
Result:
#org | #sector | #x_total_num |
---|---|---|
Agency for Technical Cooperation and Development | Water Sanitation & Hygiene | 12 |
Agronomes et Vétérinaires Sans Frontières | Water Sanitation & Hygiene | 5 |
Handicap International | Water Sanitation & Hygiene | 8 |
International Organization for Migration | Protection | 35 |
International Rescue Committee | Protection | 36 |
International Rescue Committee | Water Sanitation & Hygiene | 7 |
OXFAM | Water Sanitation & Hygiene | 3 |
United Nations Children's Fund | Education | 3 |
United Nations Children's Fund | Water Sanitation & Hygiene | 2 |
United Nations Entity for Gender Equality and the Empowerment of Women | Protection | 4 |
United Nations High Commissioner for Refugees | Protection | 13 |
United Nations Population Fund | Protection | 4 |
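
Each row of this two-tag result is a subtotal, so summing #x_total_num over #sector gives one total per organisation. For the two organisations below, those sums match the counts in the first example (36 + 7 = 43 for the International Rescue Committee, 3 + 2 = 5 for the United Nations Children's Fund). The following quick Python check uses rows copied from the table above. `roll_up` is a hypothetical helper.

```python
from collections import Counter

# (#org, #sector, #x_total_num) rows copied from the result table above.
org_sector_counts = [
    ("International Rescue Committee", "Protection", 36),
    ("International Rescue Committee", "Water Sanitation & Hygiene", 7),
    ("United Nations Children's Fund", "Education", 3),
    ("United Nations Children's Fund", "Water Sanitation & Hygiene", 2),
]


def roll_up(rows):
    """Sum counts over the sector column, leaving one total per organisation."""
    totals = Counter()
    for org, _sector, n in rows:
        totals[org] += n
    return dict(totals)


print(roll_up(org_sector_counts))
# {'International Rescue Committee': 43, "United Nations Children's Fund": 5}
```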
Standard: http://hxlstandard.org | Mailing list: [email protected]