Append datasets filter

David Megginson edited this page Sep 1, 2015 · 4 revisions

The Append datasets HXL filter extends a dataset with the rows of one or more other datasets: appending one 500-row dataset to another produces a single 1,000-row dataset. It is not to be confused with the Merge columns filter, which combines datasets side by side by adding new columns. A typical use case is combining 3W (activity) data from different clusters into a single report.

  • The order of columns in the result will always be the same as the columns in the first dataset (even if it is empty).
  • Any additional columns that appear in subsequent datasets may be appended or ignored, depending on your requirements.

The Deduplicate rows filter can be useful after this one, if some data might appear in more than one of the source datasets.

Options

You may specify whether to add or ignore extra columns in datasets after the first one. Note that matching of tags and attributes is exact: #adm1+name will not be treated as identical to #adm1.
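
To make the exact-match rule concrete, here is a minimal plain-Python sketch of how the result's columns could be worked out. It is not the libhxl implementation: the function name result_columns and its add_columns parameter are hypothetical, and hashtags are compared as literal strings, which is a simplifying assumption.

```python
def result_columns(first, others, add_columns=True):
    """Return the hashtag row for an appended dataset (illustrative only).

    first: hashtags of the first dataset.
    others: a list of hashtag lists, one per appended dataset.
    Hashtags are compared as literal strings (a simplification).
    """
    columns = list(first)  # the first dataset fixes the column order
    if add_columns:
        for hashtags in others:
            for tag in hashtags:
                if tag not in columns:  # exact match only
                    columns.append(tag)
    return columns


# "#adm1+name" does not match "#adm1", so it counts as an extra column
print(result_columns(["#org", "#adm1"], [["#adm1+name", "#org"]]))
print(result_columns(["#org", "#adm1"], [["#adm1+name", "#org"]], add_columns=False))
```

With extra columns allowed, #adm1+name is added as a new third column after #org and #adm1. With add_columns=False it is dropped.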

Example

First dataset:

#org       #adm1      #status
UNICEF     Coast      IN PROGRESS
Red Cross  Coast      IN PROGRESS
WHO        Mountains  IN PROGRESS

Second dataset:

#adm1      #status      #sector  #org
Coast      PLANNED      Health   MSF
Plains     COMPLETED    CCCM     IOM
Mountains  IN PROGRESS  GBV      WAHO

Combined including extra columns from second dataset:

#org       #adm1      #status      #sector
UNICEF     Coast      IN PROGRESS
Red Cross  Coast      IN PROGRESS
WHO        Mountains  IN PROGRESS
MSF        Coast      PLANNED      Health
IOM        Plains     COMPLETED    CCCM
WAHO       Mountains  IN PROGRESS  GBV

Combined excluding extra columns from second dataset:

#org       #adm1      #status
UNICEF     Coast      IN PROGRESS
Red Cross  Coast      IN PROGRESS
WHO        Mountains  IN PROGRESS
MSF        Coast      PLANNED
IOM        Plains     COMPLETED
WAHO       Mountains  IN PROGRESS
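
The behaviour in this example can be sketched in plain Python using the two datasets above. This is an illustration only, not the libhxl implementation: the append_datasets function and its add_columns flag are hypothetical names, each dataset is assumed to be a list of rows whose first row holds the hashtags, and a dataset that repeats a hashtag is not handled.

```python
def append_datasets(first, others, add_columns=True):
    """Append the rows of each dataset in `others` to `first`.

    Every dataset is a list of rows; its first row holds the hashtags.
    """
    columns = list(first[0])  # column order comes from the first dataset
    if add_columns:
        for dataset in others:
            for tag in dataset[0]:
                if tag not in columns:
                    columns.append(tag)

    result = [columns]
    for dataset in [first] + list(others):
        hashtags = dataset[0]
        for row in dataset[1:]:
            values = dict(zip(hashtags, row))
            # Reorder values by hashtag; columns a dataset lacks stay empty
            result.append([values.get(tag, "") for tag in columns])
    return result


first = [
    ["#org", "#adm1", "#status"],
    ["UNICEF", "Coast", "IN PROGRESS"],
    ["Red Cross", "Coast", "IN PROGRESS"],
    ["WHO", "Mountains", "IN PROGRESS"],
]
second = [
    ["#adm1", "#status", "#sector", "#org"],
    ["Coast", "PLANNED", "Health", "MSF"],
    ["Plains", "COMPLETED", "CCCM", "IOM"],
    ["Mountains", "IN PROGRESS", "GBV", "WAHO"],
]

for row in append_datasets(first, [second]):
    print(row)
```

Passing add_columns=False drops #sector, so every row keeps only the three columns of the first dataset.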

Usage

Command line

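On the command line, use the hxlappend program (hxlappend -h for help). In the example below, each -a option names a dataset to append to file1.csv, and -x excludes columns that do not appear in file1.csv: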

hxlappend -x -a file2.csv -a file3.csv file1.csv

Python

In a Python script, use the append method:

import hxl
source = hxl.data(url).append(url2).append(url3)

HXL Proxy

From the HXL Proxy, choose the "Append datasets" filter type, then add the URLs of up to 10 HXL datasets to append to the current one. Use the checkbox to exclude columns that don't appear in the current dataset.

"Append datasets" filter from HXL Proxy.