Towards a general PT monitoring system

by multimob — written on 2023-12-12


Adding stop platforms and stop position nodes to the map and tagging relations is one thing. But route relations can break at any time, and stops can be relocated or deleted, sometimes incorrectly.

In this article, we explore what can go wrong with public transport data in OSM and what one can do to keep that data up to date.

Short preamble: we started mapping public transport in early 2022, but it took a long time before significant changes could be implemented, mostly because we needed a process to filter errors out of the source data and a robust tagging system. In particular, we had to settle all the fringe cases: how to deal with stops where the platform name is part of the name, bilingual territories, stops with several operators, stops in neighbouring countries, and more. As of December 2023, only one part of Flanders is reasonably clean, and many legacy route relations are still on the map even though those routes no longer exist. We expect to intensify our efforts in the next 4-6 months and to have a much cleaner map by the summer.

Monitoring stop platforms

In theory, this is the simplest case because stop platforms are simple objects (nodes, ways or multipolygons) with a unique ID in OSM and a unique ID in our local database.

Checks should cover the name, the location, and possibly various tags (network, operator, zone:*). They may also cover the absence of some tags. This is particularly true for TEC: imports are made difficult because many TEC stops in OSM have several ref:TEC* keys whereas they should only have one. This is a legacy data problem, so a check should flag it as an error.
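The ref:TEC* check can be sketched in a few lines. This is only an illustration: the function names and the sample tag values are made up, and we assume the tags of a stop are available as a plain dictionary.

```python
def tec_ref_keys(tags):
    """Return the sorted list of tag keys that start with 'ref:TEC'."""
    return sorted(key for key in tags if key.startswith("ref:TEC"))


def has_ambiguous_tec_ref(tags):
    """A stop should carry at most one ref:TEC* key; flag it otherwise."""
    return len(tec_ref_keys(tags)) > 1


# Sample data, invented for illustration only.
legacy_stop = {
    "name": "Example stop",
    "ref:TECL": "L0001",
    "ref:TECN": "N0001",
}
clean_stop = {"name": "Example stop", "ref:TECL": "L0001"}

print(has_ambiguous_tec_ref(legacy_stop), tec_ref_keys(legacy_stop))
print(has_ambiguous_tec_ref(clean_stop))
```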

The easiest solution should be to harvest OSM data with the Overpass API and compare each element of the dataset against local data. We already do this with DLimport and DLcompare.
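To give an idea of what such a comparison involves, here is a minimal sketch. It does not show how DLcompare works. It assumes the OSM element is a node in the shape returned by Overpass JSON output (lat, lon, tags) and that the local record is a flat dictionary. The field names of the local record and the 20 m tolerance are assumptions.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def compare_stop(osm, local, max_dist_m=20.0, tag_keys=("network", "operator")):
    """Return the list of fields on which the OSM node and the local record differ."""
    flags = []
    if osm["tags"].get("name") != local["name"]:
        flags.append("name")
    if haversine_m(osm["lat"], osm["lon"], local["lat"], local["lon"]) > max_dist_m:
        flags.append("location")
    for key in tag_keys:
        if osm["tags"].get(key) != local.get(key):
            flags.append(key)
    return flags


# Sample data, invented for illustration only.
osm_node = {"lat": 51.0010, "lon": 4.0, "tags": {"name": "Markt", "network": "DLVB", "operator": "De Lijn"}}
local_row = {"lat": 51.0000, "lon": 4.0, "name": "Markt", "network": "DLVB", "operator": "De Lijn"}

print(compare_stop(osm_node, local_row))
```

Ways and multipolygons have no lat/lon of their own, so a real check would first need a representative point for them.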

Missing stops should also be identified, in both directions: stops that are missing in OSM, but also stops that exist in OSM and do not exist in official data.
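Once both datasets are keyed on the same stop reference, the two-way check is a pair of set differences. The sketch below assumes that such a shared reference exists. The function name and the sample values are hypothetical.

```python
def missing_both_ways(osm_refs, official_refs):
    """Split stop references into those missing in OSM and those missing in official data."""
    osm_refs, official_refs = set(osm_refs), set(official_refs)
    return {
        "missing_in_osm": sorted(official_refs - osm_refs),
        "missing_in_official": sorted(osm_refs - official_refs),
    }


# Sample references, invented for illustration only.
report = missing_both_ways(
    osm_refs=["100001", "100002", "100005"],
    official_refs=["100001", "100002", "100003"],
)
print(report)
```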

We feel it is still too early to automate this process, especially if we want to handle an entire operator (De Lijn or TEC) at once. This is because we still have a few thousand stops that fall into the "missing in one dataset" category, and location or tagging errors yield far too many flags. It is better to clean the network manually, even if it takes months, before launching automated checks. Otherwise we will be overwhelmed with false positives.

Monitoring master relations

We can generate a full set of tags for those. One of the problems is that primary keys for relations cannot be obtained automatically. STIB/MIVB and TEC have no need for one because they use a unique route number per network. De Lijn can have dozens of routes with the same number. They have a primary key, which we tag as ref:De_Lijn, but there is no automated system to obtain it: the only method is to query their website manually, find the desired route in a list of suggestions and extract the value from the URL.

We could set the following tests:

One more problem here is making sure that the master relation is complete. We see no easy way to check whether a route relation still exists but has been removed from its master relation. Checking for an orphan relation is easy, of course, but the problem is to determine which master relation it should be matched with (see above why this is not trivial).
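The easy half, finding orphan route relations, could look like the sketch below. It assumes the relations come as Overpass-style JSON elements (id, tags, members with type and ref). The function name and the sample ids are made up. It does not attempt the hard half, which is deciding which master relation an orphan belongs to.

```python
def orphan_routes(relations):
    """Return ids of route relations that are not a member of any route_master relation."""
    in_master = set()
    for rel in relations:
        if rel.get("tags", {}).get("type") == "route_master":
            in_master.update(
                member["ref"]
                for member in rel.get("members", [])
                if member["type"] == "relation"
            )
    return sorted(
        rel["id"]
        for rel in relations
        if rel.get("tags", {}).get("type") == "route" and rel["id"] not in in_master
    )


# Sample relations, invented for illustration only.
relations = [
    {"id": 1, "tags": {"type": "route_master", "ref": "12"},
     "members": [{"type": "relation", "ref": 10, "role": ""}]},
    {"id": 10, "tags": {"type": "route", "ref": "12"}, "members": []},
    {"id": 11, "tags": {"type": "route", "ref": "12"}, "members": []},
]

print(orphan_routes(relations))
```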

Monitoring route relations

We can start with general Q/A checks about whether the relation is valid.

And now an analysis of the content, i.e. the members of the relation, to honour the PTv2 rules.

Finally, a deeper analysis.

But the hardest part is to match relations in the local database with relations on OSM. A route can have several dozen variants and we only map the most important ones. We have developed a system to identify the subset relations, which we can ignore, but that still leaves many relations. Some of them have the same from and to tags, and there is no universal system to determine how to use via in a non-ambiguous way. With an agnostic system there is probably no good way to say: "This relation in OSM should be this relation in our local database." We will probably end up with a list of candidates, all with the same tags, which we must test.

The idea will be to compare the list of stops of each candidate and see how well it matches.
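One possible way to score such a match is a sequence similarity ratio on the ordered lists of stop references. The sketch below uses Python's difflib for that. This is a choice made for illustration, not a description of an existing tool, and the function names and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def stop_similarity(osm_stops, local_stops):
    """Similarity in [0, 1] between two ordered lists of stop references."""
    return SequenceMatcher(None, osm_stops, local_stops, autojunk=False).ratio()


def rank_candidates(osm_stops, candidates):
    """Return (score, candidate key) pairs, best match first."""
    scored = [
        (stop_similarity(osm_stops, stops), key)
        for key, stops in candidates.items()
    ]
    return sorted(scored, reverse=True)


# Sample data, invented for illustration only.
osm_stops = ["A", "B", "C", "D"]
candidates = {
    "variant_a": ["A", "B", "C", "D"],
    "variant_b": ["A", "B", "D"],
    "variant_c": ["X", "Y"],
}

for score, key in rank_candidates(osm_stops, candidates):
    print(key, round(score, 3))
```

The order of stops matters here, which is what we want: the same stops served in the reverse direction should not count as a good match.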

We might get a few false positives there, for instance relations that have the same stops but a different itinerary. An example is a route with a market every Thursday: buses must go round the block because a street is closed, but no stop is lost. Therefore, there are two different route relations with the same list of stops. They will differ if you look at the shape of their ways, which is something we can evaluate. This might be possible once we develop a module to extract shapefiles (first task, easy) and after that a module to compare two shapes that have different base points (second task, hard).
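For the second task, one candidate technique is a Hausdorff-like distance: for every vertex of one shape, take the distance to the nearest segment of the other shape, and keep the worst case in both directions. Because each vertex is measured against segments rather than against other vertices, the two shapes do not need to share base points. The sketch below is a vertex-based approximation only. It assumes coordinates already projected to a planar system in metres, and all names and sample shapes are made up.

```python
import math


def point_segment_dist(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of p onto the line so that it stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def directed_dist(line_a, line_b):
    """Largest distance from a vertex of line_a to the polyline line_b."""
    segments = list(zip(line_b, line_b[1:]))
    return max(
        min(point_segment_dist(p, a, b) for a, b in segments)
        for p in line_a
    )


def shape_distance(line_a, line_b):
    """Symmetric, vertex-based approximation of the Hausdorff distance."""
    return max(directed_dist(line_a, line_b), directed_dist(line_b, line_a))


# Sample shapes in metres, invented for illustration only.
straight = [(0, 0), (100, 0)]
same_other_points = [(0, 0), (30, 0), (70, 0), (100, 0)]
detour = [(0, 0), (40, 0), (40, 50), (60, 50), (60, 0), (100, 0)]

print(shape_distance(straight, same_other_points))
print(shape_distance(straight, detour))
```

A small distance suggests the same itinerary drawn with different points, while a large one suggests a real detour. Where to put the threshold is something we would have to find out by testing on real routes.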

There are multiple Q/A tests that will be hard to approach in a simple manner. For instance, what if there is an identical or almost identical route relation with the same route number, or a duplicate master relation?


Permalink: https://blog.multimob.be/zzgo3ab8th.htm

Screenshots with maps are © OpenStreetMap contributors