Collect stops systematically from RIS::Stations #3
As always, this turned out to be a bit less straightforward than intended...
Idea: Take stop information from RIS::Stations, because this API allows downloading all stops with only a few hundred queries and contains basically the same information as HAFAS.
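To illustrate why a few hundred queries can be enough, here is a minimal paging sketch. The page size of 1000, the offset/limit style of paging and the `fetch_page` callback are assumptions made for illustration and are not taken from this PR; the actual build script may page differently.

```python
import math
from typing import Callable, Iterator

PAGE_SIZE = 1000  # assumed page size, not taken from the PR


def iter_stops(fetch_page: Callable[[int, int], list[dict]],
               page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every stop by requesting consecutive pages until one comes back short.

    `fetch_page(offset, limit)` is a hypothetical callback that performs one
    API query and returns the list of stops in that window.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return
        offset += page_size


def queries_needed(total: int, page_size: int = PAGE_SIZE) -> int:
    """Number of non-empty pages required to cover `total` stops."""
    return math.ceil(total / page_size)
```

Under these assumptions, `queries_needed(290708)` gives 291, which matches the "few hundred queries" order of magnitude. Passing the fetch function in as a callback keeps the paging logic testable without network access or API credentials.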
To build, you need to set the env vars `DB_API_KEY` and `DB_CLIENT_ID`, which can be obtained via the above link for free (10k requests per month). Pre-built data files for convenience are here:

This results in an up-to-date dataset with 290708 stops and, in theory, no missing stops in Germany, although some conditions apply:
In general, RIS::Stations also has other quality issues (like misassigned station groups, see e.g. Elisenstraße, München). Difficult to say whether they are worse than in HAFAS.
Missing fields:
Additional fields, mainly from Stada, which is additionally integrated for train stations:
The weight is now calculated based on the products, the number of children (for the main station of a group) and, for German train stations, the `priceCategory`. I think the price category is a very good indicator of a station's importance, so much so that I included it as an exponential factor, assuming that importance is approximately inversely proportional to the number of stations of the same importance. I tested the weighting a bit with db-hafas-stations-autocomplete and was satisfied, but feel free to play around :)
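As a rough sketch of what such a weighting could look like (this is not the formula used in this PR): the per-product scores, the base of the exponential, the additive treatment of children and the function name `stop_weight` are all assumptions chosen for illustration. The only grounded parts are the three inputs and the fact that the price category enters exponentially.

```python
from typing import Optional

# All constants below are illustrative assumptions, not the values used in this PR.
PRODUCT_WEIGHTS = {"nationalExpress": 10.0, "regional": 4.0, "suburban": 2.0, "bus": 1.0}
CATEGORY_BASE = 2.0   # assumed growth factor per price-category step
LOWEST_CATEGORY = 7   # German price categories run from 1 (most important) to 7


def stop_weight(products: set[str], children: int = 0,
                price_category: Optional[int] = None) -> float:
    """Combine product scores, child count and price category into one weight."""
    # Linear part: served products plus the number of child stops in the group.
    base = sum(PRODUCT_WEIGHTS.get(p, 0.0) for p in products) + children
    if price_category is None:
        # Stops without a price category (e.g. bus stops) get no boost.
        return base
    # Exponential part: each step towards category 1 multiplies the weight.
    return base * CATEGORY_BASE ** (LOWEST_CATEGORY - price_category)
```

With these made-up constants, a category 1 station served only by regional trains ends up with 64 times the weight of a category 7 station with the same products. The base is the obvious knob to tune when testing against the autocomplete.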
Of course, instead of merging this into this repo, we could also create a separate `db-ris-stations`, but I think this would be more confusing than helpful. The only real drawback is the missing international stations, which are also incomplete in the existing collection. In either case, we could set up a GitHub workflow cronjob to automatically update and push the dataset/package, e.g. once a month.