Differences in how Belgian PT operators publish data

by multimob — written on 2024-09-06


It has been almost two years since we started digging into the structure of GTFS files provided by all 4 Belgian PT operators. We observe different practices in the way they publish data.

Time period

NMBS/SNCB creates schedules that are made to last for almost a year. It is a well-established process in the sector that train schedules, which are fairly complicated to draw up because of international trains, are computed first and shared with regional operators so that they can build their own timetables with adequate connections.

TEC often publishes timetables well in advance, including future route changes. STIB/MIVB and De Lijn seem to stick to a regular publication frequency.

Nevertheless, we should expect every operator to change timetables fairly abruptly, in case of disruptions.

Stops

Location

STIB/MIVB has the highest accuracy for stop locations. However, they have been known to deliberately provide misleading stop locations (off by a few metres) in order to make them look good on Google Maps.

TEC and De Lijn provide fairly accurate stop locations, usually less than 10 metres from the real location, but rarely very accurate. Another article illustrates occasional oddities in their data.
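To check a figure such as "less than 10 metres", one option is to compare the coordinates from the feed with a surveyed position using the haversine formula. The following is only a sketch: the function names and the 10-metre default threshold are our own illustration, not part of any operator's tooling.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_close(gtfs_pos, surveyed_pos, threshold_m=10.0):
    """True when the feed position lies within threshold_m of the survey."""
    return haversine_m(*gtfs_pos, *surveyed_pos) <= threshold_m
```

Positions are (latitude, longitude) pairs in decimal degrees, as found in the stop_lat and stop_lon columns of stops.txt.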

Names

All the operators provide full names of stops.

NMBS/SNCB and STIB/MIVB operate on multilingual territories. The base name is exclusively in French, but translations into other languages are provided in a separate file.

In bilingual territories or in Flanders, TEC uses a compound name mixing the two languages; we rework it before uploading to OSM. De Lijn has some bilingual stop signs outside Flanders too but the data they publish does not provide those translations.

Most operators use the stop name in multiple interfaces, e.g. displaying the next stop inside the vehicle, printing the destination of the bus on the front, printing timetables, mobile apps… and some of that software has a fixed limit on the number of characters. For this reason, some names are truncated and are not exactly what is printed on stop signs. For instance, TEC truncates one stop name to "Verlaine-sur-Ourthe Ch de la Pyramide" instead of writing the full name, "Verlaine-sur-Ourthe Chaussée de la Pyramide".

The stop name is sometimes written in a strange manner. TEC capitalises the town name, like this: "VERVIERS Hôtel de Ville". On a few occasions, inexperienced mappers started rewriting all the names on OSM to match that weird spelling convention, resulting in a screaming map. Worse, TEC drops accents on capital letters, e.g. Liège becomes LIEGE in their data feed, and Belœil gets transcoded as BELOEIL, with one extra character. We solved this through the "dictionary" module in mmgtfs, i.e. a list of text strings used to restore the correct names of cities when importing data into OSM.
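The idea of a dictionary lookup can be sketched as follows. This is not the actual mmgtfs code: the function name, the table layout and the three entries are assumptions made for illustration, using the spellings quoted above.

```python
# Hypothetical dictionary: spelling found in the feed -> correct spelling.
TOWN_NAMES = {
    "VERVIERS": "Verviers",
    "LIEGE": "Liège",
    "BELOEIL": "Belœil",
}


def restore_town_name(stop_name, dictionary=TOWN_NAMES):
    """Replace a leading upper-case town name with its dictionary spelling."""
    # Try the longest keys first so that a multi-word town name wins
    # over a shorter key that happens to be its prefix.
    for raw in sorted(dictionary, key=len, reverse=True):
        if stop_name == raw or stop_name.startswith(raw + " "):
            return dictionary[raw] + stop_name[len(raw):]
    # Unknown town: leave the name untouched rather than guess.
    return stop_name
```

An explicit table handles cases that simple case conversion cannot, such as restoring the accent in Liège or the ligature in Belœil.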

Both TEC and De Lijn often add an extra part to the stop name, indicating the platform number ("Dendermonde Station perron 4") or some details about the stop ("afstaphalte", "vervanghalte"). The dictionary module is also used to identify those suffixes and store them separately, in order to present a clean version of the name to end users, while ensuring that the full name is still preserved.
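Separating such a suffix from the name could look like the sketch below. The pattern only covers the three examples quoted above and the function name is hypothetical; a real list of suffixes would be longer.

```python
import re

# Hypothetical suffix patterns, limited to the examples mentioned above.
SUFFIX_PATTERN = re.compile(
    r"\s+(perron\s+\d+|afstaphalte|vervanghalte)$", re.IGNORECASE
)


def split_suffix(full_name):
    """Return (clean_name, suffix); suffix is None when nothing matches."""
    match = SUFFIX_PATTERN.search(full_name)
    if match is None:
        return full_name, None
    # Keep both parts so that the full name can still be rebuilt later.
    return full_name[:match.start()], match.group(1)
```

Returning both parts, rather than discarding the suffix, keeps the full name recoverable while end users see the clean version.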

Finally, NMBS/SNCB adds a country suffix for stations outside Belgium: "Aachen Hbf (DE)", "Paris Nord (FR)".

Uniqueness of stops and active/inactive/temporary stops

By definition, a GTFS feed contains every stop on a separate line, each with a unique stop id.

The problem is: Do those codes match real data? And do we have a 1:1 match with what we can observe?

This is often the difficult part with public transport operators. Some local configurations are particularly complicated: two separate platforms could share one single stop id. TEC in particular often simplifies stations: there are 3 bus platforms along Andenne Station, each with a different bus route, but GTFS has only 2 stop ids for them, so the data does not match what is on the ground. Braine-l’Alleud Collège AB station has multiple platforms, each with its own dedicated route, yet GTFS considers the entire bus station to be one single point. We found many cases like this.

De Lijn introduced the Flex bus service in 2023. Flex bus stops are ordinary stops with a normal id number, and some stops are served by both ordinary bus routes and Flex buses. This is why the best approach is to treat "Flex" as a regular bus route, even though it has no itinerary. Also, Flex buses are bound to one small territory: you can call a Flex bus to connect two Flex stops of the same territory but you cannot reach a Flex stop in another territory. De Lijn provides no list of Flex stops; the only way is to query De Lijn’s website. The information there is usually reliable for a yes/no answer, but there is no good way to automate the query.

Because of this limitation, there is no way to determine whether a stop that is found in GTFS is really active: some obviously are because they are found in timetables, but there are plenty of stops that aren’t. They may either be a Flex stop, or they are a future stop that is already there but not yet in use, or they might be stops that are temporarily not served but which are still physically there, or they might have been recently removed. It is not possible to determine which case applies for all those stops.
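Listing the stops that never appear in a timetable is the easy part, and can be sketched as below. The column names are those of the GTFS specification; the function name and the tiny in-memory feed are made up for the example. The result only says which stops are unscheduled, not which of the cases above applies to them.

```python
import csv
import io


def unscheduled_stop_ids(stops_file, stop_times_file):
    """Return ids of boarding stops that no trip in the feed calls at."""
    candidates = {
        row["stop_id"]
        for row in csv.DictReader(stops_file)
        # An empty or "0" location_type means a stop or platform in GTFS;
        # stations and entrances are not expected in stop_times.txt.
        if (row.get("location_type") or "0") == "0"
    }
    served = {row["stop_id"] for row in csv.DictReader(stop_times_file)}
    return candidates - served


# Tiny made-up feed, only to show the call.
stops_txt = io.StringIO(
    "stop_id,stop_name,location_type\nA,Alpha,0\nB,Beta,\nC,Gamma,1\n"
)
stop_times_txt = io.StringIO("trip_id,stop_id,stop_sequence\nT1,A,1\n")
unscheduled = unscheduled_stop_ids(stops_txt, stop_times_txt)
```

With a real feed, the two arguments would be stops.txt and stop_times.txt opened with newline="" and a UTF-8 encoding.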

Any person who states that: "You just have to take GTFS data and import it to OSM, that is so easy" is someone who lacks information about what GTFS data really contains and how it sometimes misrepresents reality (and how directly importing GTFS into OSM would lead to a low-quality map).

Routes

The list of routes is complete. However, NMBS/SNCB does not provide the numbering of IC and L routes, only the destination. It also includes a long list of bus route relations for bus shuttles, each with the same reference number: "BUS", even though each one of them operates in a separate area.

When the network structure changes, STIB/MIVB piles up multiple iterations of the same route into the same block; route id numbers never change. De Lijn and TEC do the opposite and spawn a new route for every change, even for minor ones. This brings a few issues, explained in another article.

Stops per route

The sequence of stops per route is usually very reliable.

We have reports of TEC adding extra stops on routes. This is mentioned on their website but not reflected in GTFS data. There is a high risk that whenever we regenerate a full list of stops for a bus relation in OSM we revert to a previous version without those stops. This requires inspecting the history of the route relation manually to see if there is a mention of something new. This slows down the import process.

Interesting example in Verviers: a bridge collapsed during the July 2021 floods, obviously making it impossible to serve the stop just on the other side of the river. This lasted for more than 2 years, yet TEC consistently provided normal timetables, as if every stop in the area was served normally, and without any information on their website.

We also have reports of both De Lijn and TEC removing stops that still show up normally in GTFS. This could corrupt OSM data if we automatically restored, in good faith, bus stops that no longer exist. Here is an example: a local user deleted the Heverlee Parkdallaan bus stop in July 2024, suggesting that the stop had been permanently removed, but as of September 2024 the stop still appears totally unchanged in GTFS data. De Lijn’s website still lists the stop, with a notice that it is temporarily not served until January 2025. Acquiring local knowledge about every stop takes time. How many cases like this could there be among the 78,000 stops operated by De Lijn and TEC?

Temporary changes

In general, GTFS data does not distinguish between the normal itinerary and a temporary change. Except for clear cases, such as extra bus routes that are prominently labelled as being a shuttle route, the only way is to inspect the official network map, if it exists, and compare stops manually.

Shapes and service variants

STIB/MIVB provides a very large number of variants of the same route. They have a dedicated team of employees whose primary task is to monitor day-to-day changes, such as small reroutings, and encode them on their own map. This information is made available in the GTFS feed, as a separate variant of the line. Over a period of 15 days, there can be a fairly large number of such variants, because they combine short-term changes on various parts of the same line, each for a different period of time. In addition, many routes have extra variants for shortened trips that leave the route before going back to the depot: sometimes the bus turns around the corner and uses a different stop to drop passengers, which results in an extra variant for this specific trip. So, if you take an easy line such as Bus 71, where every trip runs from one end to the other and the bus depot is at one end, you get 10 variants instead of the 2 you would expect.
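One rough way to count variants is to group trips by their ordered list of stops, as in the sketch below. The function name and input layout are assumptions for illustration; the rows would come from trips.txt and stop_times.txt. It only sees differences in the stop sequence, so a rerouting that keeps the same stops but changes the shape would not be counted.

```python
from collections import defaultdict


def count_variants(trips, stop_times):
    """Count distinct stop sequences per route.

    trips: iterable of (trip_id, route_id)
    stop_times: iterable of (trip_id, stop_sequence, stop_id)
    """
    sequences = defaultdict(list)
    for trip_id, stop_sequence, stop_id in stop_times:
        sequences[trip_id].append((int(stop_sequence), stop_id))

    variants = defaultdict(set)
    for trip_id, route_id in trips:
        # Order the stops of the trip, then use the tuple as the variant key.
        pattern = tuple(stop for _, stop in sorted(sequences[trip_id]))
        variants[route_id].add(pattern)
    return {route: len(patterns) for route, patterns in variants.items()}
```

Two trips in opposite directions naturally count as two variants, which matches the baseline of 2 mentioned above.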

NMBS/SNCB only provides the location of stations but no shapes. Fortunately, railway tracks rarely change and have been mapped for a long time. Yet, this can sometimes be challenging because there might be several ways to approach large stations and, without trusted information from the operator, we must guess which switch was used.

TEC and De Lijn provide shapes that are 98% correct. On a few occasions, those shapes show impossible itineraries, such as a bus crossing a canal where there is no bridge or a bus crossing the woods. The most common mistake is a bus entering a small road, reversing somewhere in the middle and heading back in the opposite direction, even though aerial imagery suggests this is absolutely impossible. Those cases usually take some time, because we have to collect information about the expected itinerary. And every discrepancy between GTFS data and OSM data gets in the way of validation scripts, which will endlessly warn us about a possible mistake in OSM data because it does not match what GTFS believes to be true. The more discrepancies, the more time it will take to review the network.

Timetables

In general, timetable information correctly reflects the real level of service. Flags indicating whether passengers are permitted to board or alight are sometimes incorrect, especially for the railways (first and last stops sometimes bring surprises). TEC sometimes duplicates stops in timetables, next to each other, when a vehicle waits for an interchange or simply spends some idle time at a stop. This is an incorrect use of GTFS; it probably comes from a limitation of the software they use to generate GTFS data. So far we correct it manually when importing a route because the number of occurrences is low. A fix for this can probably be automated in the future.

In a previous article, we identified ghost trips in Charleroi, namely trips that only exist in official data. This is a rare case. We are not aware of the mirror case, i.e. trips that are missing from GTFS data but actually run every day, though such trips probably exist.
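One possible shape for an automated fix of the duplicated stops is sketched below; it is not something we run today. It assumes the rows of one trip are already sorted by stop_sequence and merges consecutive rows for the same stop into a single call. The function name is hypothetical; the field names are those of stop_times.txt.

```python
def collapse_repeated_stops(stop_times):
    """Merge consecutive rows of one trip that refer to the same stop.

    stop_times: list of dicts with at least stop_id, arrival_time and
    departure_time, sorted by stop_sequence.
    """
    merged = []
    for row in stop_times:
        if merged and merged[-1]["stop_id"] == row["stop_id"]:
            # Same stop twice in a row: keep the first arrival and
            # take the departure of the last duplicate.
            merged[-1] = {**merged[-1], "departure_time": row["departure_time"]}
        else:
            merged.append(dict(row))
    return merged
```

The merged call keeps the waiting time as the gap between arrival and departure, which is how GTFS expects a layover at a stop to be expressed. The stop_sequence values are left as they are, since GTFS only requires them to increase along the trip.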


Permalink: https://blog.multimob.be/zzmie3nuir.htm


Screenshots with maps are © OpenStreetMap contributors