NeTEx

NeTEx is a European standard for exchanging transit data. OTP can import NeTEx into its internal model. The XML parser supports the entire NeTEx specification and is not limited to a specific profile, but only a small subset of the entities is mapped into OTP. When loading NeTEx data, OTP should print warnings for all NeTEx data types that are not loaded.

OTP is tested with data from Entur, which uses the Nordic NeTEx profile. If you find that some part of your data is not imported or supported by OTP, you will need to add support for it in this model. NeTEx is huge, and ONLY data relevant for travel planning should be imported.

OTP assumes the data is valid, and as a main rule the data is not washed or improved inside OTP. Poor data quality should be fixed BEFORE loading the data into OTP. OTP will try to ignore invalid data, allowing the rest to be imported.

Design Goals

  • Import Transit data from NeTEx xml-files
  • Handle large input file sets (10 GB)
  • Allow some data to be shared and group other data together in an isolated scope
  • Support for reading data fast, multi-threaded (the design supports this, but it is not implemented yet)
  • Warn or report issues on poor data, but keep building a graph so one "bad" line does not block the entire import.
  • The import should not put any restrictions on the order of XML types in the files. If ServiceJourney comes before Authority in the xml file, that should be ok. The file-hierarchy is an optional way to group and scope data.

Design

The 2 main classes are the NetexModule and the NetexBundle. The NetexModule is a GraphBuilderModule and is responsible for building all bundles, while a bundle is responsible for importing a Netex bundle, normally a zip-file with a Netex data set. You may start OTP with as many bundles as you like, and you may mix GTFS and NeTEx bundles in the same build.

Design overview

The Netex files are xml-files, and one data set can be more than 5 GB in size. There is no fixed relationship between file names and content as there is in GTFS, where for example stops.txt contains all stops. Instead, OTP imports Netex data based on a file hierarchy.

Netex File Bundle

As seen above, the netex-file-bundle is organized in a hierarchy. This is done to support loading large data sets and to avoid keeping XML DOM entities in memory. Also, the hierarchy prevents entities in different files at the same level from referencing each other. The hierarchy allows OTP to go through the steps of parsing xml data into Netex POJOs, validating the relationships and mapping these POJOs into OTP's internal data model for each set/group of files.

The general rule is that entities referencing other entities should be in the same file or placed at a lower level in the hierarchy, so the referenced object already exists when mapping an entity. There are exceptions to this, for example trip-to-trip interchanges.

The shared data is available during the entire mapping process. The group data is kept in memory for the duration of parsing and mapping each group. Data in one group is not visible to another group.

Within each group there is also shared-group-data and group-files (leaf-files).

  • Entities in group-files can reference other entities in the same file and entities in the shared-group-files and in the global shared-files, but not entities in other group-files.
  • Entities in shared-group-files can reference other entities in the same file and entities in the same group of shared-group-files and in the global shared-files, but not entities in any group-files.
  • Entities in global shared-files can reference other entities in the same file and entities in other global shared-files.

✅ Note! You can configure how your data files are grouped into the 3 levels above using regular expressions in the build-config.json.
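To make the three levels concrete, here is a minimal sketch of how file names could be sorted into global shared files, shared-group files and group files using regular expressions. The class name, the patterns and the file names are all made up for illustration; they are not taken from OTP, and the real patterns are whatever you configure in build-config.json. The sketch assumes the first capture group of a pattern identifies the group a file belongs to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: sort file names into the three hierarchy levels by regex. */
final class FileGroupingSketch {

  // Illustrative patterns only; real values come from your build-config.json.
  static final Pattern SHARED = Pattern.compile("shared-data\\.xml");
  static final Pattern GROUP_SHARED = Pattern.compile("(\\w{3})-.*-shared\\.xml");
  static final Pattern GROUP_FILE = Pattern.compile("(\\w{3})-.*\\.xml");

  final List<String> sharedFiles = new ArrayList<>();
  final Map<String, List<String>> groupSharedFiles = new TreeMap<>();
  final Map<String, List<String>> groupFiles = new TreeMap<>();
  final List<String> ignoredFiles = new ArrayList<>();

  void add(String fileName) {
    if (SHARED.matcher(fileName).matches()) {
      sharedFiles.add(fileName);
      return;
    }
    // Test the more specific pattern first: a shared-group name also matches GROUP_FILE.
    Matcher m = GROUP_SHARED.matcher(fileName);
    if (m.matches()) {
      groupSharedFiles.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(fileName);
      return;
    }
    m = GROUP_FILE.matcher(fileName);
    if (m.matches()) {
      groupFiles.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(fileName);
      return;
    }
    // Anything that matches no pattern is left out of the hierarchy.
    ignoredFiles.add(fileName);
  }

  public static void main(String[] args) {
    FileGroupingSketch grouping = new FileGroupingSketch();
    for (String name : List.of(
        "shared-data.xml", "RUT-common-shared.xml", "RUT-line-1.xml", "VYG-line-7.xml", "README.txt")) {
      grouping.add(name);
    }
    System.out.println("shared:       " + grouping.sharedFiles);
    System.out.println("group-shared: " + grouping.groupSharedFiles);
    System.out.println("group-files:  " + grouping.groupFiles);
    System.out.println("ignored:      " + grouping.ignoredFiles);
  }
}
```

The order of the checks matters in this sketch: because the group-file pattern is the most general, it is tried last.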

Load entities, validate and map into the OTP model

For each level in the hierarchy and each group of files, OTP performs the same steps:

  1. Load XML entities (NeTEx XML DOM POJOs). See NetexDataSourceHierarchy
  2. Parse xml file and insert XML POJOs into the index. See NetexXmlParser
  3. Validate relationships. See Validator
  4. Map XML entities to the OTP internal model. See NetexMapper

OTP loads entities into a hierarchical NetexEntityDataIndex before validating and mapping each entity. Entities may appear in any order in the xml-files, so doing the validation in a separate step ensures all entities are available when validating. If an entity or a required relation is missing, the validator should remove the invalid entity. This makes the mapping easier, because the mapper can assume all required data and entities exist.
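The load, validate, map sequence can be illustrated with a small stand-alone sketch. It is not OTP's Validator or NetexMapper; the `Journey` type, the field names and the warning text are hypothetical. It only shows why a separate validation pass lets entities arrive in any order and lets the mapping step skip null checks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "load everything, then validate, then map". */
final class ValidationSketch {

  /** Stand-in for a NeTEx entity that requires a reference to a line. */
  static final class Journey {
    final String id;
    final String lineRef;

    Journey(String id, String lineRef) {
      this.id = id;
      this.lineRef = lineRef;
    }
  }

  final Map<String, String> lineNamesById = new LinkedHashMap<>();
  final Map<String, Journey> journeysById = new LinkedHashMap<>();
  final List<String> warnings = new ArrayList<>();

  // Loading only stores entities, so the order in the file does not matter.
  void addLine(String id, String name) {
    lineNamesById.put(id, name);
  }

  void addJourney(Journey journey) {
    journeysById.put(journey.id, journey);
  }

  /** Remove journeys whose required line reference cannot be resolved. */
  void validate() {
    journeysById.values().removeIf(j -> {
      boolean valid = j.lineRef != null && lineNamesById.containsKey(j.lineRef);
      if (!valid) {
        warnings.add("Dropping " + j.id + ": unknown line reference " + j.lineRef);
      }
      return !valid;
    });
  }

  /** Mapping can assume every remaining journey has a resolvable line. */
  List<String> map() {
    List<String> trips = new ArrayList<>();
    for (Journey j : journeysById.values()) {
      trips.add(j.id + " on " + lineNamesById.get(j.lineRef));
    }
    return trips;
  }
}
```

In this sketch a journey can be added before the line it refers to; the dangling reference is only detected, reported and dropped once `validate()` runs.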

Collaboration diagram

Here is an outline of the process including the file-hierarchy traversal and the steps at each level:

  1. Load shared-data-files into index.
  2. Validate loaded entities
  3. Map shared-data-entries
  4. For each group:
    1. Load group-shared-files into index
    2. Validate loaded entities
    3. Map group-shared-entries
    4. For each leaf group-file:
      1. Load group-file into index
      2. Validate loaded entities
      3. Map group-entries
      4. Clear leaf data from index
    5. Remove group data from index
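The outline above can be sketched as a pair of nested loops. This is only an illustration of the traversal order, with actual parsing, validation and mapping replaced by log entries; `TraversalSketch`, `Group` and the step names are hypothetical and do not correspond to OTP classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the traversal order; real work is replaced by log entries. */
final class TraversalSketch {

  /** One group with its shared files and its leaf files. */
  static final class Group {
    final String name;
    final List<String> sharedFiles;
    final List<String> leafFiles;

    Group(String name, List<String> sharedFiles, List<String> leafFiles) {
      this.name = name;
      this.sharedFiles = sharedFiles;
      this.leafFiles = leafFiles;
    }
  }

  final List<String> steps = new ArrayList<>();

  // The same three steps run for every set of files, at every level.
  private void loadValidateMap(String scope, List<String> files) {
    steps.add("load " + scope + " " + files);
    steps.add("validate " + scope);
    steps.add("map " + scope);
  }

  void run(List<String> sharedFiles, List<Group> groups) {
    loadValidateMap("shared", sharedFiles);
    for (Group group : groups) {
      loadValidateMap("group-shared " + group.name, group.sharedFiles);
      for (String leaf : group.leafFiles) {
        loadValidateMap("group-file " + leaf, List.of(leaf));
        steps.add("clear leaf " + leaf);
      }
      steps.add("remove group " + group.name);
    }
  }
}
```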

The NetexBundle repeats the exact same steps for each group/set of files. To emulate navigation in the hierarchy, both the NetexEntityDataIndex and the NetexMapper persist data in a "Stack"-like structure. The NetexBundle calls push() and pop() on the index and the mapper to enter and exit each file set at a given level. Entities loaded at a given level are in the local scope, while entities loaded at a higher level are in the global scope. The index has methods to access both local and global scoped entities, but it is only possible to add entities at the local scope.
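A minimal sketch of such a stack-like index follows. It is not the NetexEntityDataIndex API: only `push()` and `pop()` are named in the text above, while `add`, `lookupLocal` and `lookup` are hypothetical names chosen for this example, and a single map per level stands in for the real per-type indexes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an index with stack-like scopes. */
final class ScopedIndexSketch<T> {

  // Head of the deque is the local scope; the rest are the enclosing scopes.
  private final Deque<Map<String, T>> scopes = new ArrayDeque<>();

  ScopedIndexSketch() {
    scopes.push(new HashMap<>()); // root scope, used for the global shared data
  }

  /** Enter a new level (a group, or a leaf file within a group). */
  void push() {
    scopes.push(new HashMap<>());
  }

  /** Leave the current level and discard everything loaded in it. */
  void pop() {
    if (scopes.size() == 1) {
      throw new IllegalStateException("Cannot pop the root scope");
    }
    scopes.pop();
  }

  /** Entities can only be added to the local scope. */
  void add(String id, T entity) {
    scopes.peek().put(id, entity);
  }

  /** Look in the local scope only. */
  T lookupLocal(String id) {
    return scopes.peek().get(id);
  }

  /** Look in the local scope first, then in each enclosing scope. */
  T lookup(String id) {
    for (Map<String, T> scope : scopes) { // iterates from the most recent push to the root
      T entity = scope.get(id);
      if (entity != null) {
        return entity;
      }
    }
    return null;
  }
}
```

With this shape, an entity added before `push()` stays reachable through `lookup` while a group is being processed, and everything added after `push()` disappears on the matching `pop()`, which is what keeps one group's data invisible to the next.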

Package dependencies
