osm-extract-tags is a tool for extracting tag metadata from an OpenStreetMap (OSM) data export with the entire history of the OSM. This repo (osm-tag-aggregator) contains a tool for computing various aggregates from this data. The below aggregates are currently implemented:
For each tag state, compute the total number of map elements that are currently in that state. Only include tag states with >= 5 map elements.
Example output:
{"key":{"state":"DEL","tags":[]},"n":543}
{"key":{"state":"VIS","tags":[]},"n":266}
{"key":{"state":"VIS","tags":["2:pizzeria"]},"n":128}
{"key":{"state":"VIS","tags":["2:bench"]},"n":125}
...
For each tag state, compute the total number of map elements that at some point have been in that state.
Example output:
{"key":{"state":"DEL","tags":[]},"n":2060}
{"key":{"state":"VIS","tags":[]},"n":939}
{"key":{"state":"VIS","tags":["2:water"]},"n":455}
{"key":{"state":"VIS","tags":["1:wall"]},"n":430}
...
For each transition between tag states, compute the total number of times a map element has undergone that transition. If the same map element has transitioned multiple times all transitions are counted. Skip transitions that have ocurred <5 times.
Example output:
...
For each day and tag state, compute a delta (a positive or negative number, but not 0) giving how the number of map elements with that tag state has changed during that day. By integrating the delta:s over time one can get the number of map elements in a tag state for any given day (during OSM's recorded history).
Example output:
...
- A map element means either a node, way or a relation in the OSM data.
- element states
- Visible with tags
["0:tag_value1", "a:tag_value2"]
. The codes0
anda
refer to the list of tags selected or extraction by osm-extract-tags. - Deleted
- Visible with tags
The implementation is streaming and does not need to load the entire input file into memory.
This repo contains a git submodule with test data. To fetch this there are two options when cloning the repo:
# option 1
git clone https://github.com/tagdynamics-org/osm-tag-aggregator.git
git submodule update --init
# option 2
git clone --recurse-submodules https://github.com/tagdynamics-org/osm-tag-aggregator.git
Unit tests can be run from command line as follows (provided gradle is first installed):
gradle wrapper
./gradlew test
Import as a gradle project. During importing, the IDE may ask for the "gradle home directory". See here for instructions on how to determine this.
bash launch.sh aggregator /path/to/input.jsonl /path/to/output.jsonl
aggregator
is one ofLIVE_REVCOUNTS
, ...,PER_DAY_DELTA_COUNTS
. See above.- input file is output from the osm-extract-tags tool.
export DEBIAN_FRONTEND=noninteractive
sudo apt-get -y update
sudo apt-get -y upgrade
sudo apt-get -y install git zip mg tmux gradle
# install gradle wrapper & install dependencies
cd osm-tag-aggregator && gradle wrapper test && cd ..
# install docker: https://docs.docker.com/install/linux/docker-ce/ubuntu/
# https://hub.docker.com/r/hseeberger/scala-sbt/
sudo docker pull hseeberger/scala-sbt:8u171_2.12.6_1.1.6
sudo docker run -v `pwd`/data/:/data -v `pwd`/osm-tag-aggregator:/code -it --rm hseeberger/scala-sbt:8u171_2.12.6_1.1.6
cd /code
gradle wrapper
./gradlew test
mkdir /data/aggregates
bash launch.sh LIVE_REVCOUNTS /data/tag-metadata/tag-history.jsonl /data/aggregates/live-revcounts.jsonl
bash launch.sh TOTAL_REVCOUNTS /data/tag-metadata/tag-history.jsonl /data/aggregates/total-revcounts.jsonl
bash launch.sh PER_DAY_DELTA_COUNTS /data/tag-metadata/tag-history.jsonl /data/aggregates/per-day-deltacounts.jsonl
bash launch.sh TRANSITION_COUNTS /data/tag-metadata/tag-history.jsonl /data/aggregates/transition-counts.jsonl
- OSM input data ~5000M map elements + their version histories
- Exported revision history JSONL file: 552M lines (5528 batches @100k)
(m5.xlarge; 16G memory; 4VCPU)
LIVE_REVCOUNTS 25m
TOTAL_REVCOUNTS 28m
TRANSITION_COUNTS 27m
PER_DAY_DELTA_COUNTS 35m
(t2.medium; 4G memory; 2VCPU)
LIVE_REVCOUNTS 1h33m
TOTAL_REVCOUNTS 1h30m
TRANSITION_COUNTS ??
PER_DAY_DELTA_COUNTS 3h5m
Ideas, questions and/or contributions are welcome.
Copyright 2018 Matias Dahl. Released under the MIT license.
Please note that osm-tag-extract
and osm-tag-aggregator
are designed to process OpenStreetMap data. This data is available under the Open Database License. See also the OSMF wiki regarding OpenStreetMap
data and the GDPR.