Append datasets filter
The Append datasets HXL filter extends a dataset with additional rows from one or more other datasets, so that appending one 500-row dataset to another produces a new 1,000-row dataset. It should not be confused with the Merge columns filter, which combines datasets side by side, e.g. by adding new columns. A typical use case is combining 3W (activity) data from different clusters into a single, merged report.
- The order of columns in the result will always be the same as the columns in the first dataset (even if it is empty).
- Any additional columns that appear in subsequent datasets may be appended or ignored, depending on your requirements.
The Deduplicate rows filter can be useful after this one, if some data might appear in more than one of the source datasets.
You may specify whether to add or ignore extra columns in datasets after the first one. Note that matching of tags and attributes is exact: #adm1+name will not be treated as identical to #adm1.
First dataset:
#org | #adm1 | #status |
---|---|---|
UNICEF | Coast | IN PROGRESS |
Red Cross | Coast | IN PROGRESS |
WHO | Mountains | IN PROGRESS |
Second dataset:
#adm1 | #status | #sector | #org |
---|---|---|---|
Coast | PLANNED | Health | MSF |
Plains | COMPLETED | CCCM | IOM |
Mountains | IN PROGRESS | GBV | WAHO |
Combined including extra columns from second dataset:
#org | #adm1 | #status | #sector |
---|---|---|---|
UNICEF | Coast | IN PROGRESS | |
Red Cross | Coast | IN PROGRESS | |
WHO | Mountains | IN PROGRESS | |
MSF | Coast | PLANNED | Health |
IOM | Plains | COMPLETED | CCCM |
WAHO | Mountains | IN PROGRESS | GBV |
Combined excluding extra columns from second dataset:
#org | #adm1 | #status |
---|---|---|
UNICEF | Coast | IN PROGRESS |
Red Cross | Coast | IN PROGRESS |
WHO | Mountains | IN PROGRESS |
MSF | Coast | PLANNED |
IOM | Plains | COMPLETED |
WAHO | Mountains | IN PROGRESS |
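The behaviour shown in the tables above can be sketched in a few lines of plain Python. This is only an illustration of the rules described on this page (column order taken from the first dataset, exact hashtag matching, extra columns either added or ignored); it is not how libhxl implements the filter. The function name append_datasets, its add_columns parameter, and the list-of-rows representation are assumptions made for this example.

```python
def append_datasets(first, others, add_columns=True):
    """Append the rows of `others` to `first` (illustrative sketch only).

    Each dataset is a list of rows, where row 0 holds the HXL hashtags.
    Hashtags are compared as exact strings, so "#adm1+name" != "#adm1".
    Simplification: assumes no hashtag is repeated within one dataset.
    """
    # The column order always follows the first dataset.
    columns = list(first[0])

    # Optionally add columns that only appear in later datasets.
    if add_columns:
        for dataset in others:
            for tag in dataset[0]:
                if tag not in columns:
                    columns.append(tag)

    result = [columns]
    for dataset in [first] + list(others):
        tags = dataset[0]
        for row in dataset[1:]:
            values = dict(zip(tags, row))
            # Missing columns become empty cells; unknown columns are dropped.
            result.append([values.get(tag, "") for tag in columns])
    return result


first = [
    ["#org", "#adm1", "#status"],
    ["UNICEF", "Coast", "IN PROGRESS"],
    ["Red Cross", "Coast", "IN PROGRESS"],
    ["WHO", "Mountains", "IN PROGRESS"],
]
second = [
    ["#adm1", "#status", "#sector", "#org"],
    ["Coast", "PLANNED", "Health", "MSF"],
    ["Plains", "COMPLETED", "CCCM", "IOM"],
    ["Mountains", "IN PROGRESS", "GBV", "WAHO"],
]

with_extra = append_datasets(first, [second], add_columns=True)
without_extra = append_datasets(first, [second], add_columns=False)

for row in with_extra:
    print(" | ".join(row))
```

Note that values from the second dataset are placed by hashtag, not by position, which is why #org values end up in the first column even though #org is the last column of the second dataset.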
On the command line, use the hxlappend program (hxlappend -h for help):
hxlappend -x -a file2.csv -a file3.csv file1.csv
In a Python script, use the append method:
import hxl
source = hxl.data(url).append(url2).append(url3)
From the HXL Proxy, choose the "Append datasets" filter type, then add the URLs of up to 10 HXL datasets to append to the current one. Use the checkbox to exclude columns that don't appear in the current dataset.
Standard: http://hxlstandard.org | Mailing list: [email protected]