Comparing stop locations against GTFS

by multimob — written on 2024-05-01


In a perfect world, GTFS data provides the real location of stops. One of your first tasks when mapping public transport in OSM is to ensure that all stops have the correct location.

The De Lijn network includes about 45,950 stops, each with its own unique stop id. Checking them all will require some level of automation.

We will never stress this enough: do not take GTFS data for granted. Operators make mistakes every now and then, especially on large networks, where the chain of command is complicated. In Flanders, municipalities are in charge of building bus bays and platforms. Their public works departments plan this work, and bus stops can be relocated at unpredictable dates. It may take some time before a supervisor notices and the new location is recorded in the planning database or in the operator's internal mapping system. There are several documented cases of stop relocations that were overlooked for years because no one really noticed or someone simply forgot to update all the systems… or cases where stop features from an old file overwrote new data. Let us put it another way: always try to find a second source, e.g. aerial imagery.

The DLimport/DLcompare scripts can do that. DLimport reads a plain text file from JOSM and selects relevant stops to feed a local database. DLcompare will compare this local database with the local copy of GTFS files and try to match the stops based on their stop id.
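To illustrate the matching step, here is a minimal sketch in Python. It is not the actual DLcompare code. It assumes that the OSM stops are available as a list of dictionaries with a "ref" key (a hypothetical layout) and that this ref holds the same value as the stop_id column of the GTFS stops.txt file. The sample rows are made up.

```python
import csv
import io

def load_gtfs_stops(fileobj, key="stop_id"):
    """Index the rows of a GTFS stops.txt file by the chosen key column."""
    return {row[key]: row for row in csv.DictReader(fileobj)}

def match_stops(osm_stops, gtfs_stops):
    """Pair each OSM stop with its GTFS row; collect refs unknown to GTFS."""
    matched, missing = [], []
    for stop in osm_stops:
        gtfs_row = gtfs_stops.get(stop["ref"])
        if gtfs_row is None:
            missing.append(stop)
        else:
            matched.append((stop, gtfs_row))
    return matched, missing

# Made-up sample data; a real run would open the local copy of stops.txt.
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
202234,Sint-Margriete Groeneweg,51.2500,3.5400
"""

osm_stops = [
    {"ref": "202234", "name": "Sint-Margriete Groeneweg", "lat": 51.2503, "lon": 3.5403},
    {"ref": "999999", "name": "Stop unknown to GTFS", "lat": 51.0, "lon": 3.0},
]

gtfs_stops = load_gtfs_stops(io.StringIO(SAMPLE_STOPS_TXT))
matched, missing = match_stops(osm_stops, gtfs_stops)

for stop, gtfs_row in matched:
    print(stop["name"], "matches GTFS stop", gtfs_row["stop_id"])
for stop in missing:
    print(stop["name"], "has no GTFS counterpart for ref", stop["ref"])
```

Stops that end up in the missing list deserve a manual look: the ref may contain a typo, or the stop may have been removed from the network.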

The output looks like this:

Sint-Margriete Groeneweg (202234) (node 2251292884)
[  OK  ] name
[ WARN ] distance = 34.6 ↙ DB has 51.2466 3.53828
[  OK  ] route_ref:De_Lijn = 62

In this article we will only focus on the output line about the distance.

This is a simple comparison between the lat/lon obtained from OSM and the lat/lon found in the GTFS data. A simple calculation gives the distance in metres. The script reports "OK" for good matches (typically less than 25 metres), "WARN" for possible discrepancies (more than 25 but less than 100 metres), and "FAILED" for large discrepancies (more than 100 metres). These thresholds can be customised in the script.
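The distance check could look roughly like the following Python sketch. We assume the haversine formula here; the real script may use a different calculation. The function and constant names are ours, and the treatment of values of exactly 25 or 100 metres is an arbitrary choice.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres

# Thresholds from the text; adjust them to your own needs.
OK_LIMIT_M = 25.0
WARN_LIMIT_M = 100.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def classify(distance_m):
    """Map a distance in metres to the OK / WARN / FAILED labels."""
    if distance_m < OK_LIMIT_M:
        return "OK"
    if distance_m < WARN_LIMIT_M:
        return "WARN"
    return "FAILED"

# Made-up coordinates, for illustration only.
d = haversine_m(51.2503, 3.5403, 51.2500, 3.5400)
print(classify(d), round(d, 1))
```

At the scale of a bus stop, a simpler flat-earth approximation would give nearly the same result. The haversine formula is used here only because it stays valid for the very large discrepancies too.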

The little arrow has proven to be really useful. It shows the general direction where you should move the stop to get closer to where GTFS thinks it is. The lat/lon coordinates can also be copied directly to the "Move Node" window in JOSM. Always verify and adjust manually afterwards.
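For the curious, an arrow like this can be derived from the bearing between the two positions. The Python sketch below is our own illustration, not the code of the script: it computes the initial bearing from the OSM position towards the GTFS position and rounds it to the nearest of eight compass arrows. The function names are hypothetical.

```python
import math

# Eight arrows, clockwise, starting from north.
ARROWS = ["↑", "↗", "→", "↘", "↓", "↙", "←", "↖"]

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlambda = math.radians(lon2 - lon1)
    y = math.sin(dlambda) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlambda))
    return math.degrees(math.atan2(y, x)) % 360.0

def direction_arrow(osm_lat, osm_lon, gtfs_lat, gtfs_lon):
    """Arrow pointing from the OSM position towards the GTFS position."""
    sector = round(bearing_deg(osm_lat, osm_lon, gtfs_lat, gtfs_lon) / 45.0) % 8
    return ARROWS[sector]

# Made-up coordinates: the GTFS position lies to the south-west of the OSM node.
print(direction_arrow(51.0, 3.0, 50.999, 2.9984))
```

Eight directions are coarse, but that is enough to tell at a glance which way to drag the node.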

False negatives

The most common false negative is the following: two bus stops face each other, one on each side of the road… but their ref codes are swapped. The stops look normal because they have the same name. Supposing that GTFS has a perfectly accurate location, both stops are reported with a small discrepancy, about 7 to 10 metres at most. This is easily overlooked, since for now we focus on the largest discrepancies, over 100 metres.

This has nasty consequences because once in a while we find relations that use a stop on the wrong side of the road. Those relations were automatically imported by our DLgen script, and of course they use the correct ref for all stops in the sequence.

So far, we haven't found a reliable process for this. As usual, some tech enthusiasts might want to create algorithms that calculate the orientation of the road and derive the expected location for each pair of stops. We are not sure that it would work often, given the large number of wrong locations in the source data.
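A much simpler heuristic than the road-orientation idea is sketched below in Python, with the same caveat: it trusts the GTFS locations, so it is only a hint for manual review and not a reliable process. For two stops with the same name, it checks whether exchanging the two refs would reduce the total distance to GTFS. The function names and the 2 metre margin are assumptions of ours.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres

def approx_distance_m(a, b):
    """Flat-earth approximation of the distance between two (lat, lon) points.

    Good enough for stops that are only a few metres apart.
    """
    (lat1, lon1), (lat2, lon2) = a, b
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)

def refs_look_swapped(osm_a, osm_b, gtfs_a, gtfs_b, margin_m=2.0):
    """True if exchanging the two refs would reduce the total distance to GTFS.

    osm_a and osm_b are the OSM positions of two stops with the same name;
    gtfs_a and gtfs_b are the GTFS positions for the refs they currently carry.
    """
    as_tagged = approx_distance_m(osm_a, gtfs_a) + approx_distance_m(osm_b, gtfs_b)
    if_swapped = approx_distance_m(osm_a, gtfs_b) + approx_distance_m(osm_b, gtfs_a)
    return if_swapped + margin_m < as_tagged

# Made-up pair of stops about 9 metres apart, carrying each other's ref.
south, north = (51.00000, 3.00000), (51.00008, 3.00000)
print(refs_look_swapped(south, north, gtfs_a=north, gtfs_b=south))
```

Any pair flagged this way still needs a second source, e.g. aerial imagery, before the refs are exchanged.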


Permalink: https://blog.multimob.be/zz4ys8fo1e.htm

Screenshots with maps are © OpenStreetMap contributors