Towards a general PT monitoring system

by multimob — written on 2023-12-12

Mapping stop platforms and stop position nodes, tagging relations and adding them to the map is one thing. But route relations may break at any time, and stops can be relocated or erased, sometimes incorrectly.

In this article, we try to explore everything that could go wrong with public transport data in OSM and what one can do to keep the data up to date.

Short preamble: we started mapping public transport in early 2022, but it took a long time before significant changes could be implemented, mostly because we needed a process to filter errors out of the source data and a robust tagging system. The fringe cases in particular took time: how should we deal with stops where the platform name is part of the stop name, bilingual territories, stops with several operators, stops in neighbouring countries, and more? As of December 2023 only one part of Flanders is reasonably clean, and many legacy route relations are still on the map even though those routes no longer exist. We expect to intensify our efforts in the next 4-6 months and to have a much cleaner map by the summer.

Monitoring stop platforms

In theory, this is the simplest case because stop platforms are simple objects (nodes, ways or multipolygons) with a unique ID in OSM and a unique ID in our local database.

Checks should cover the name, the location, and possibly various tags (network, operator, zone:*), and perhaps also the absence of some tags. This is particularly true for TEC: imports are difficult because many TEC stops in OSM have several ref:TEC* keys whereas they should have only one. This is a legacy data problem, so the check should flag it as an error.
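The ref:TEC* check is easy to express in code. Here is a minimal sketch in Python, assuming each stop has already been loaded as a dictionary of tags keyed by its OSM id. The key names ref:TECB and ref:TECC, the ref values and the stop names are purely illustrative.

```python
def tec_ref_keys(tags):
    """Return the sorted keys of one stop that start with 'ref:TEC'."""
    return sorted(key for key in tags if key.startswith("ref:TEC"))


def flag_multiple_tec_refs(stops):
    """Map OSM id -> offending keys for stops with more than one ref:TEC* key."""
    flagged = {}
    for osm_id, tags in stops.items():
        keys = tec_ref_keys(tags)
        if len(keys) > 1:
            flagged[osm_id] = keys
    return flagged


# Illustrative data only: ids, names and ref values are made up.
stops = {
    "node/1": {"name": "Stop A", "ref:TECB": "B0001"},
    "node/2": {"name": "Stop B", "ref:TECB": "B0002", "ref:TECC": "C0002"},
}
flagged = flag_multiple_tec_refs(stops)
```

Here only node/2 ends up in flagged, together with the two keys a mapper has to choose between.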

The easiest solution should be to harvest OSM data with the Overpass API and compare each element of the dataset against local data. We do this already with DLimport and DLcompare.

Missing stops should also be identified, in both directions, i.e. stops that are missing in OSM but also stops that exist in OSM but do not exist in official data.
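The two-way comparison can be sketched as follows. This is not how DLimport or DLcompare work internally. It is a simplified illustration that assumes both datasets have been reduced to dictionaries keyed by the operator's stop reference, each entry holding a name and WGS84 coordinates. The 30 metre threshold is an arbitrary assumption.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def compare_stops(osm, official, max_offset_m=30.0):
    """Compare two {ref: {"name", "lat", "lon"}} dictionaries in both directions."""
    report = {
        "missing_in_osm": sorted(official.keys() - osm.keys()),
        "missing_in_official": sorted(osm.keys() - official.keys()),
        "name_mismatch": [],
        "moved": [],
    }
    for ref in sorted(osm.keys() & official.keys()):
        o, f = osm[ref], official[ref]
        if o["name"] != f["name"]:
            report["name_mismatch"].append(ref)
        if haversine_m(o["lat"], o["lon"], f["lat"], f["lon"]) > max_offset_m:
            report["moved"].append(ref)
    return report


# Illustrative data only.
osm = {
    "101": {"name": "Station", "lat": 51.0, "lon": 4.0},
    "102": {"name": "Markt", "lat": 51.001, "lon": 4.0},
    "104": {"name": "Kerk", "lat": 51.002, "lon": 4.0},
}
official = {
    "101": {"name": "Station", "lat": 51.0, "lon": 4.001},  # roughly 70 m further east
    "102": {"name": "Grote Markt", "lat": 51.001, "lon": 4.0},
    "103": {"name": "School", "lat": 51.003, "lon": 4.0},
}
report = compare_stops(osm, official)
```

Each of the four lists in the report corresponds to a different kind of manual follow-up, which is why they are kept separate rather than merged into one list of flags.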

We feel it is still too early to automate this process, especially if we want to handle an entire operator (De Lijn or TEC) at once. This is because we still have a few thousand stops that fall into the "missing in one dataset" category, and location or tagging errors yield far too many flags. It is better to clean the network manually, even if it takes months, before launching automated checks; otherwise we will be overwhelmed with false positives.

Monitoring master relations

We can generate a full set of tags for those. One of the problems is that primary keys for relations cannot be obtained automatically. STIB/MIVB and TEC have no need for one because they use a unique route number per network. De Lijn can have dozens of routes with the same number; they do have a primary key, which we tag as ref:De_Lijn, but there is no automated way to obtain it: the only method is to query their website manually, find the desired route in a list of suggestions and extract the value from the URL.

We could set the following tests:

Check that all the expected tags are there
Loose validation on the operator tag, as De Lijn does operate one cross-border line with Connexxion.
Only whitelist a few tags, such as wikidata or wikipedia and warn if there are extra tags
Check that the relation only contains route relations, with the same network, operator and ref tag
Check that the relation does not contain other elements; we sometimes find roads or stops inside master relations.
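Most of the tests listed above can be prototyped in a few lines. The sketch below assumes a master relation has been loaded as a dictionary with "tags" and "members", and that the tags of its member relations are available in a separate lookup. The set of expected tags is our guess for the purpose of the example, not a validated list, and the loose operator validation is left out.

```python
# Assumption: a plausible minimal tag set; adapt it per operator.
EXPECTED_TAGS = {"type", "route_master", "ref", "network", "operator", "name"}
OPTIONAL_TAGS = {"wikidata", "wikipedia"}
SHARED_TAGS = ("network", "operator", "ref")


def check_master(master, routes):
    """Return warnings for one master relation.

    routes maps a relation id to the tags of that member relation.
    """
    warnings = []
    tags = master["tags"]
    for key in sorted(EXPECTED_TAGS - set(tags)):
        warnings.append(f"missing tag: {key}")
    for key in sorted(set(tags) - EXPECTED_TAGS - OPTIONAL_TAGS):
        warnings.append(f"unexpected tag: {key}")
    for member in master["members"]:
        if member["type"] != "relation":
            # Roads or stops that ended up inside the master relation.
            warnings.append(f"non-relation member: {member['type']}/{member['ref']}")
            continue
        route_tags = routes.get(member["ref"], {})
        if route_tags.get("type") != "route":
            warnings.append(f"relation/{member['ref']}: not a route relation")
            continue
        for key in SHARED_TAGS:
            if route_tags.get(key) != tags.get(key):
                warnings.append(f"relation/{member['ref']}: {key} differs from master")
    return warnings


# Illustrative data only.
master = {
    "tags": {"type": "route_master", "route_master": "bus", "ref": "12",
             "network": "NetA", "operator": "OpA", "name": "Bus 12", "colour": "red"},
    "members": [{"type": "relation", "ref": 1},
                {"type": "relation", "ref": 2},
                {"type": "way", "ref": 99}],
}
routes = {
    1: {"type": "route", "ref": "12", "network": "NetA", "operator": "OpA"},
    2: {"type": "route", "ref": "13", "network": "NetA", "operator": "OpA"},
}
warnings = check_master(master, routes)
```

Returning plain strings keeps the sketch short; a real checker would probably want structured results so that warnings can be grouped per relation.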

One more problem here: making sure that the master relation is complete. We see no easy way to check whether a route relation still exists but has been removed from its master relation. Checking for an orphan relation is easy, of course, but the problem is to determine which master relation it should be matched with (see above why this is not trivial).

Monitoring route relations

We can start with general Q/A checks about whether the relation is valid.

Check that all the expected tags are there
If there are only two relations in the master relation, via tags are not supposed to be there; issue a warning if you find some
Only whitelist a few tags, such as fixme, note, via, wikidata or wikipedia and warn if there are extra tags
Check that it belongs to a master relation

Next comes an analysis of the content, i.e. the members of the relation, to make sure the PTv2 rules are honoured.

Members with role=platform have public_transport=platform
Members with role=stop have public_transport=stop_position
There can be no two consecutive members with role=stop
Members with no role have highway=*
Once we have hit a member with no role, there may not be a member later in the list with a role
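These five rules translate almost literally into a single pass over the member list. The sketch below assumes each member is a dictionary with its "role" and the "tags" of the referenced element. It only covers the rules listed above and ignores other PTv2 role variants.

```python
def check_members(members):
    """Return warnings for an ordered list of {"role": str, "tags": dict} members."""
    warnings = []
    seen_roleless = False
    previous_role = None
    for index, member in enumerate(members):
        role = member["role"]
        tags = member["tags"]
        if role == "platform" and tags.get("public_transport") != "platform":
            warnings.append(f"#{index}: role platform without public_transport=platform")
        if role == "stop" and tags.get("public_transport") != "stop_position":
            warnings.append(f"#{index}: role stop without public_transport=stop_position")
        if role == "stop" and previous_role == "stop":
            warnings.append(f"#{index}: two consecutive stop members")
        if role == "":
            seen_roleless = True
            if "highway" not in tags:
                warnings.append(f"#{index}: member without role lacks highway=*")
        elif seen_roleless:
            # Stops and platforms must all come before the ways.
            warnings.append(f"#{index}: member with role '{role}' after the ways")
        previous_role = role
    return warnings


# Illustrative data only.
members = [
    {"role": "stop", "tags": {"public_transport": "stop_position"}},
    {"role": "platform", "tags": {"public_transport": "platform"}},
    {"role": "stop", "tags": {"public_transport": "stop_position"}},
    {"role": "stop", "tags": {"public_transport": "stop_position"}},
    {"role": "", "tags": {"highway": "residential"}},
    {"role": "", "tags": {"building": "yes"}},
    {"role": "platform", "tags": {"highway": "bus_stop"}},
]
member_warnings = check_members(members)
```

The last member triggers two warnings at once, which is intentional: it is both wrongly tagged and in the wrong place.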

Finally, a deeper analysis.

All the members with role=platform have the same operator and network tags as the relation
The name tag of the first stop of the list has a perfect match with the from tag of the relation
The name tag of the last stop of the list has a perfect match with the to tag of the relation
Test the relation between a stop position node and the platform immediately after it in the list.
- If they belong to a stop area relation and with the same roles it is okay
- Otherwise, calculate the distance between them. Issue a warning above 50 metres.
Every way has access permissions that are sufficient to allow buses. Remember that in Belgium buses are legally allowed in highway=pedestrian or highway=living_street. Two challenges here: do we have to make different rules for the parts outside Belgium, and what do we do with all the roads that people retagged as highway=construction?
Determine whether ways are properly chained and if buses use them in the forward or backward direction.
Check for gaps
Check for oneway and oneway:bus tag, issue a warning if a bus uses a road in the wrong direction
Calculate the distance between the first node of the first way and the first stop platform. Issue a warning if it is too far.
Calculate the distance between the last node of the last way and the last stop platform. Issue a warning if it is too far.
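The chaining, gap and oneway checks can be sketched as below. We assume each way is given as its id plus its ordered list of node ids, in relation order. Closed ways such as roundabouts are not handled, and treating oneway:bus as an override of oneway is our assumption about how the two tags should combine.

```python
def start_direction(nodes, next_nodes):
    """Guess the direction of a way that has no predecessor to attach to."""
    if next_nodes:
        next_ends = (next_nodes[0], next_nodes[-1])
        if nodes[0] in next_ends and nodes[-1] not in next_ends:
            return "backward"
    return "forward"


def chain_ways(ways):
    """ways: list of (way_id, [node ids]). Returns (directions, gaps)."""
    directions, gaps = [], []
    end = None  # node where the bus currently is
    for i, (way_id, nodes) in enumerate(ways):
        next_nodes = ways[i + 1][1] if i + 1 < len(ways) else None
        if end is not None and nodes[0] == end:
            direction = "forward"
        elif end is not None and nodes[-1] == end:
            direction = "backward"
        else:
            if end is not None:
                gaps.append((ways[i - 1][0], way_id))
            direction = start_direction(nodes, next_nodes)
        end = nodes[-1] if direction == "forward" else nodes[0]
        directions.append((way_id, direction))
    return directions, gaps


def wrong_way(directions, tags_by_way):
    """Return ids of ways used against their oneway direction."""
    flagged = []
    for way_id, direction in directions:
        tags = tags_by_way.get(way_id, {})
        # Assumption: oneway:bus, when present, overrides oneway.
        oneway = tags.get("oneway:bus", tags.get("oneway", "no"))
        if (oneway == "yes" and direction == "backward") or \
                (oneway == "-1" and direction == "forward"):
            flagged.append(way_id)
    return flagged


# Illustrative data only: way 11 is drawn against the travel direction,
# and nothing connects way 12 to way 13.
ways = [(10, [1, 2, 3]), (11, [5, 4, 3]), (12, [5, 6]), (13, [8, 9]), (14, [9, 10])]
directions, gaps = chain_ways(ways)
flagged = wrong_way(directions, {11: {"oneway": "yes"}, 12: {"oneway": "yes"}})
```

After a gap the chain simply restarts from the next way, so a single missing way produces a single warning rather than a cascade.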

But the hardest part is to match relations in the local database with relations on OSM. A route can have several dozen variants and we only map the most important ones. We have developed a system to identify the subset relations, which we can ignore, but that still leaves many relations. Some of them have the same from and to tags, and there is no universal system to determine how to use via in a non-ambiguous way. With an agnostic system there is probably no good way to say: "This relation in OSM should be this relation in our local database." We will probably end up with a list of candidates, all with the same tags, which we must test.

The idea will be to test the list of stops and see how it matches.
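One possible way to score such a match is sketched below, using the SequenceMatcher class from Python's standard difflib module on ordered lists of stop references. This is an idea we could try rather than something already in place, and the relation ids and stop references are made up.

```python
from difflib import SequenceMatcher


def stop_similarity(local_stops, osm_stops):
    """Similarity between two ordered stop lists, from 0.0 to 1.0."""
    return SequenceMatcher(None, local_stops, osm_stops, autojunk=False).ratio()


def rank_candidates(local_stops, candidates):
    """Rank {relation id: stop list} candidates, best match first."""
    scored = [(rel_id, stop_similarity(local_stops, stops))
              for rel_id, stops in candidates.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Illustrative data only.
local_stops = ["A", "B", "C", "D"]
candidates = {
    100: ["A", "B", "C", "D"],  # same stops, same order
    200: ["A", "B", "D"],       # one stop missing
    300: ["D", "C", "B", "A"],  # opposite direction
}
ranked = rank_candidates(local_stops, candidates)
```

Because the comparison is order-sensitive, the relation for the opposite direction scores poorly even though it serves the same stops, which is what we want here.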

We might get a few false positives there, for instance relations that have the same stops but a different itinerary. An example is a route with a market every Thursday: buses must go round the block because a street is closed, but no stop is lost. Therefore, there are two different route relations with the same list of stops. They will differ if you look at the shape of their ways, which is something we can evaluate. This might be possible once we develop a module to extract shapefiles (first task, easy) and after that a module to compare two shapes that have different base points (second task, hard).
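To give an idea of what the shape comparison could look like, here is a rough sketch based on a Hausdorff-style distance: for every vertex of one line we measure the distance to the nearest point on the other line, and keep the worst case in both directions. Measuring against segments rather than vertices is what makes it tolerant of different base points. We assume both lines have at least two points and are already projected to planar coordinates in metres; the sketch ignores the direction of travel.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b, all given as (x, y) in metres."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Position of the projection of p along the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def directed_distance(line_a, line_b):
    """Largest distance from a vertex of line_a to the nearest point of line_b."""
    return max(
        min(point_segment_distance(p, a, b) for a, b in zip(line_b, line_b[1:]))
        for p in line_a
    )


def shape_distance(line_a, line_b):
    """Symmetric version: the worse of the two directed distances."""
    return max(directed_distance(line_a, line_b), directed_distance(line_b, line_a))


# Illustrative data only.
straight = [(0, 0), (100, 0)]
resampled = [(0, 0), (30, 0), (70, 0), (100, 0)]  # same street, other base points
detour = [(0, 0), (40, 0), (40, 50), (60, 50), (60, 0), (100, 0)]  # round the block

same = shape_distance(straight, resampled)
different = shape_distance(straight, detour)
```

With this data the two versions of the straight street come out as practically identical, while the detour is about 50 metres away from the straight itinerary. A threshold would still have to be chosen by experiment.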

There are multiple Q/A tests that will be hard to approach in a simple manner. For instance, what if there is an identical or almost identical route relation with the same route number, or a duplicate master relation?

Permalink: https://blog.multimob.be/zzgo3ab8th.htm
