Collect stops systematically from RIS::Stations #3

Open

@traines-source traines-source commented Dec 17, 2024

As always, this turned out to be a bit less straightforward than intended...

Idea: Take stop information from RIS::Stations, because this API allows downloading all stops with only a few hundred queries and contains basically the same information as HAFAS.
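A minimal sketch of what "downloading all stops with only a few hundred queries" could look like as a paged download. The `fetch_page` callable, its `(offset, limit)` parameters and the page size of 1000 are assumptions for illustration; they are not taken from this PR or from the RIS::Stations API definition.

```python
from typing import Callable, Iterator


def collect_stops(
    fetch_page: Callable[[int, int], list],
    page_size: int = 1000,  # assumed page size, not a documented API limit
) -> Iterator[dict]:
    """Yield every stop by requesting consecutive pages.

    `fetch_page(offset, limit)` is a hypothetical callable that performs one
    API query and returns at most `limit` stops starting at `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A page shorter than the requested size means the end was reached.
        if len(page) < page_size:
            return
        offset += page_size
```

With the assumed page size of 1000, a dataset of roughly 290,000 stops would need about 290 requests, which fits comfortably into a monthly quota of 10k requests.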

To build, you need to set env vars DB_API_KEY and DB_CLIENT_ID, which can be obtained via the above link for free (10k requests per month). Pre-built data files for convenience are here:

This results in an up-to-date dataset with 290708 stops and in theory no missing stops in Germany – conditions apply:

  • some stops outside of Germany, particularly bus stops, are not contained in RIS::Stations and are therefore missing, including Malaga from the issue above (in total, 46510 stops are missing compared to my last HAFAS collection)
  • disused stops are no longer contained
  • META stations are not contained
  • Groß Gerau station (8000136) is completely missing from RIS::Stations for some reason – will report that to DB (and surely also some bus stations)
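A comparison like the one behind the "missing stops" figure above can be sketched as a set difference over stop IDs. This is only an illustration: the function name and the assumption that each stop is a dict with an `id` field are hypothetical, and the PR does not say how the count was actually obtained.

```python
def compare_stop_ids(old_stops, new_stops):
    """Return (missing, added) stop IDs between two collections.

    Each stop is assumed to be a dict with an "id" key (hypothetical shape).
    """
    old_ids = {stop["id"] for stop in old_stops}
    new_ids = {stop["id"] for stop in new_stops}
    missing = sorted(old_ids - new_ids)  # in the old collection only
    added = sorted(new_ids - old_ids)    # in the new collection only
    return missing, added
```

Run against the previous HAFAS collection and the new RIS::Stations dataset, `missing` would list stops such as disused or non-German ones that dropped out, and a lookup of a single ID such as 8000136 shows whether a specific station is affected.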

In general, RIS::Stations also has other quality issues (like misassigned station groups, see e.g. Elisenstraße, München). It is difficult to say whether they are worse than in HAFAS.

Missing fields:

  • lines

Additional fields, mainly from Stada, which is additionally integrated for train stations:

  • stadaId, ifoptId, ris100Ids, facilities, reisezentrumOpeningHours, priceCategory

The weight is now calculated from the products, the number of children (for the main station of a group) and, for German train stations, the priceCategory. I think the price category is a very good indicator of a station's importance, so much so that I included it as an exponential factor, assuming that importance is approximately inversely proportional to the number of stations of the same importance. I tested the weighting a bit with db-hafas-stations-autocomplete and was satisfied, but feel free to play around :)
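To make the shape of such a weighting concrete, here is a rough sketch. It is not the formula used in this PR: the per-product weights, the base of 2 and the assumption that price categories run from 1 (most important) to 7 are all illustrative choices.

```python
# Hypothetical per-product weights; the product names are illustrative only.
PRODUCT_WEIGHTS = {
    "nationalExpress": 6.0,
    "national": 5.0,
    "regional": 3.0,
    "suburban": 2.0,
    "bus": 0.5,
}


def stop_weight(products, n_children=0, price_category=None, base=2.0):
    """Combine products, group size and price category into one weight.

    products:        product names served at the stop
    n_children:      number of child stops (non-zero only for the main
                     station of a group)
    price_category:  1 (most important) to 7, or None if the stop has none,
                     e.g. for stops that are not German train stations
    base:            assumed growth factor per price category step
    """
    weight = sum(PRODUCT_WEIGHTS.get(p, 0.0) for p in products)
    weight += n_children
    if price_category is not None:
        # Exponential factor: each step towards category 1 multiplies
        # the weight by `base`.
        weight *= base ** (7 - price_category)
    return weight
```

The exponential factor mirrors the reasoning above: if each more important category contains roughly a constant fraction as many stations as the next one down, a constant multiplier per category keeps importance about inversely proportional to the number of stations sharing it.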

Of course, instead of merging this into this repo, we could also create a separate db-ris-stations, but I think this would be more confusing than helpful. The only real drawback is the missing international stations, which are also incomplete in the existing collection. In both cases we could set up a GitHub workflow cronjob to automatically update and push the dataset/package, e.g. once a month.
