David Megginson edited this page Oct 11, 2016 · 3 revisions

TODO: update to include Python and the HXL Proxy as well.

The hxlcount command-line tool creates a new HXL dataset that is an aggregation of a larger dataset: it counts how often certain combinations of values appear for the HXL tags you specify. This command is especially useful for three purposes:

  1. To generate reports (e.g. how many activities are taking place in each district for each sector).
  2. To anonymise data for privacy or security (e.g. rolling up numbers to a higher administrative level).
  3. To perform quality control by checking the different values for a tag (e.g. "WASH" vs "Wash" vs "Water and Sanitation" vs "Water & Sanitation").
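To make the aggregation concrete, the following is a minimal sketch in plain Python of the kind of counting described above, applied to the quality-control case. It uses only the standard library and is not the libhxl API; the sample data, the simplified layout (hashtag row first, no text header row) and the function name count_values are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: a simplified CSV whose first row is the HXL hashtag row.
SAMPLE = """#org,#sector
Org A,WASH
Org B,Wash
Org C,Water and Sanitation
Org D,WASH
"""


def count_values(text, tag):
    """Count how often each distinct value appears in the column tagged `tag`."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row[tag] for row in reader)


sector_counts = count_values(SAMPLE, "#sector")
for value, total in sorted(sector_counts.items()):
    print(value, total)
```

Each spelling variant is counted as its own value, so inconsistent entries for what is probably a single sector show up as separate rows in the result.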

For use within a Python application, see hxl.filters.count (module).

Usage

usage: hxlcount [-h] [-t tag,tag...] [infile] [outfile]

Generate aggregate counts for a HXL dataset

positional arguments:
  infile                HXL file to read (if omitted, use standard input).
  outfile               HXL file to write (if omitted, use standard output).

optional arguments:
  -h, --help            show this help message and exit
  -t tag,tag..., --tags tag,tag...
                        Comma-separated list of column tags to include in
                        aggregated output

Examples

These examples all use an actual 3W (aid activity) dataset from OCHA ROWCA.

Example 1: count a single field

List the organisations in a dataset along with the number of times each occurs:

hxlcount -t org MyData.csv

Result:

#org | #x_total_num
Agency for Technical Cooperation and Development | 12
Agronomes et Vétérinaires Sans Frontières | 5
Handicap International | 8
International Organization for Migration | 35
International Rescue Committee | 43
OXFAM | 3
United Nations Children's Fund | 5
United Nations Entity for Gender Equality and the Empowerment of Women | 4
United Nations High Commissioner for Refugees | 13
United Nations Population Fund | 4
World Food Programme | 112

Example 2: filter and then count

This more complex example chains the hxlselect command with hxlcount in a pipeline to count only part of a dataset (in this case, the number of activities for each organisation working in Tombouctou, Mali).

hxlselect -q adm1=Tombouctou MyData.csv | hxlcount -t org

Result:

#org | #x_total_num
Agronomes et Vétérinaires Sans Frontières | 2
Handicap International | 8
International Organization for Migration | 3
United Nations Children's Fund | 1
United Nations High Commissioner for Refugees | 3
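The filter-then-count idea can also be sketched in plain Python, with one generator feeding the next much as the shell pipe does. This is only an illustration of the logic, not how hxlselect or hxlcount are implemented; the rows and the helper names select and count are hypothetical.

```python
from collections import Counter

# Hypothetical rows keyed by HXL hashtag (not taken from the 3W dataset).
ROWS = [
    {"#org": "Org A", "#adm1": "Tombouctou"},
    {"#org": "Org A", "#adm1": "Gao"},
    {"#org": "Org B", "#adm1": "Tombouctou"},
    {"#org": "Org A", "#adm1": "Tombouctou"},
]


def select(rows, tag, value):
    """Lazily yield only the rows whose value for `tag` equals `value`."""
    return (row for row in rows if row[tag] == value)


def count(rows, tag):
    """Count how often each value of `tag` appears in the rows passed in."""
    return Counter(row[tag] for row in rows)


# Filter first, then count what is left, mirroring the two-stage pipeline.
tombouctou_counts = count(select(ROWS, "#adm1", "Tombouctou"), "#org")
for org, total in sorted(tombouctou_counts.items()):
    print(org, total)
```

Because the filter runs before the count, rows from other regions never reach the aggregation step, and an organisation with no matching rows does not appear in the result at all.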

Example 3: counting combinations of columns

This example counts the unique organisation/sector combinations for activities in Mali (note that an organisation appears twice if it's carrying out activities in two sectors):

hxlselect -q country=Mali MyData.csv | hxlcount -t org,sector

Result:

#org | #sector | #x_total_num
Agency for Technical Cooperation and Development | Water Sanitation & Hygiene | 12
Agronomes et Vétérinaires Sans Frontières | Water Sanitation & Hygiene | 5
Handicap International | Water Sanitation & Hygiene | 8
International Organization for Migration | Protection | 35
International Rescue Committee | Protection | 36
International Rescue Committee | Water Sanitation & Hygiene | 7
OXFAM | Water Sanitation & Hygiene | 3
United Nations Children's Fund | Education | 3
United Nations Children's Fund | Water Sanitation & Hygiene | 2
United Nations Entity for Gender Equality and the Empowerment of Women | Protection | 4
United Nations High Commissioner for Refugees | Protection | 13
United Nations Population Fund | Protection | 4
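Counting combinations of columns amounts to using the tuple of values for the chosen tags as the counting key. The sketch below shows that idea in plain Python under stated assumptions: the rows are hypothetical, count_combinations is an illustrative helper, and none of this is the libhxl API.

```python
from collections import Counter

# Hypothetical rows keyed by HXL hashtag (not taken from the 3W dataset).
ROWS = [
    {"#org": "Org A", "#sector": "Protection"},
    {"#org": "Org A", "#sector": "Protection"},
    {"#org": "Org A", "#sector": "Education"},
    {"#org": "Org B", "#sector": "Protection"},
]


def count_combinations(rows, tags):
    """Count each distinct combination of values for the listed tags."""
    return Counter(tuple(row[tag] for tag in tags) for row in rows)


combo_counts = count_combinations(ROWS, ["#org", "#sector"])

# One output row per combination: the tag values followed by the count.
for combo, total in sorted(combo_counts.items()):
    print(*combo, total, sep=",")
```

An organisation active in two sectors produces two keys and therefore two output rows, which matches the behaviour noted for this example.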