Towards a general PT monitoring system
by multimob — written on 2023-12-12
Creating stop platforms and stop position nodes, tagging relations and adding them to the map is one thing. Keeping them correct is another: route relations may break at any time, and stops can be relocated or deleted, sometimes incorrectly.
In this article, we explore what can go wrong with public transport data in OSM and what one can do to keep the data up to date.
Short preamble: we started mapping public transport in early 2022, but it took a long time before significant changes could be implemented, mostly because we needed a process to filter errors out of the source data and a robust tagging system. The fringe cases in particular took time: how to deal with stops where the platform name is part of the stop name, bilingual territories, stops with several operators, stops in neighbouring countries, and more. As of December 2023, only one part of Flanders is reasonably clean, and many legacy route relations are still on the map even though those routes no longer exist. We expect to intensify our efforts in the next 4-6 months and get a much cleaner map by the summer.
Monitoring stop platforms
In theory, this is the simplest case because stop platforms are simple objects (nodes, ways or multipolygons) with a unique ID in OSM and a unique ID in our local database.
Checks should occur on the name, the location, and possibly various tags (network, operator, zone:*). They may also cover the absence of some tags. This is particularly true for TEC: imports are made difficult because many TEC stops in OSM have several ref:TEC* keys whereas they should only have one. This is a problem with legacy data, so it should flag an error.
The easiest solution should be to harvest OSM data with the Overpass API and compare each element of the dataset against local data. We already do this with DLimport and DLcompare.
Missing stops should also be identified, in both directions, i.e. stops that are missing in OSM but also stops that exist in OSM but do not exist in official data.
We feel it is still too early to automate this process, especially if we want to handle an entire operator (De Lijn or TEC) at once. This is because we still have a few thousand stops that fall into the "missing in one dataset" category, and location or tagging errors yield far too many flags. It is better to clean the network manually, even if it takes months, before launching automated checks; otherwise we will be overwhelmed with false positives.
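As an illustration of the comparison described above, here is a minimal sketch in Python. It is not the code of DLimport or DLcompare. It assumes that both datasets have already been loaded into dictionaries keyed by the operator's stop reference; the function names, the dictionary layout and the 30-metre threshold are all hypothetical choices made for this example.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def compare_stops(osm_stops, local_stops, max_shift_m=30.0,
                  checked_tags=("name", "network", "operator")):
    """Compare two {ref: {"lat", "lon", "tags"}} dictionaries.

    max_shift_m is an arbitrary threshold chosen for this sketch.
    """
    report = {
        # Stops known to the operator but absent from OSM.
        "missing_in_osm": sorted(set(local_stops) - set(osm_stops)),
        # Stops present in OSM but unknown to the operator.
        "missing_in_local": sorted(set(osm_stops) - set(local_stops)),
        "flags": [],
    }
    for ref in sorted(set(osm_stops) & set(local_stops)):
        osm, local = osm_stops[ref], local_stops[ref]
        for key in checked_tags:
            if osm["tags"].get(key) != local["tags"].get(key):
                report["flags"].append((ref, f"tag mismatch: {key}"))
        shift = haversine_m(osm["lat"], osm["lon"], local["lat"], local["lon"])
        if shift > max_shift_m:
            report["flags"].append((ref, f"moved {shift:.0f} m"))
    return report


def duplicate_ref_keys(tags, prefix="ref:TEC"):
    """Return the keys starting with prefix when a stop carries more than one."""
    keys = sorted(k for k in tags if k.startswith(prefix))
    return keys if len(keys) > 1 else []
```

With thousands of stops still missing from one dataset or the other, the two "missing" lists would dominate the report, which is why we prefer manual cleaning first.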
Monitoring master relations
We can generate a full set of tags for those. One of the problems is that primary keys for relations are hard to obtain. STIB/MIVB and TEC have no need for one because they use a unique route number per network. De Lijn can have dozens of routes with the same number; they have a primary key, which we tag as ref:De_Lijn, but there is no automated system to obtain it: the only method is to query their website manually, find the desired route in a list of suggestions and extract the value from the URL.
We could set the following tests:
- Check that all the expected tags are there
- Loose validation on the operator tag, as De Lijn does operate one cross-border line with Connexxion.
- Only whitelist a few tags, such as wikidata or wikipedia, and warn if there are extra tags
- Check that the relation only contains route relations, with the same network, operator and ref tag
- Check that the relation does not contain other elements; we sometimes find roads or stops inside master relations.
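The tests listed above could be sketched as follows. This is only an illustration: it assumes relations are available as dictionaries shaped like the elements of Overpass JSON output (a "tags" dictionary and a "members" list whose entries have "type", "ref" and "role"). The set of expected tags and the function name are hypothetical, and the loose validation of the operator tag is left out.

```python
# Tags we expect on every master relation (an assumption for this sketch).
EXPECTED_TAGS = {"type", "route_master", "network", "operator", "ref", "name"}
# Extra tags that are tolerated without a warning.
WHITELISTED_TAGS = {"wikidata", "wikipedia"}


def check_master(master, relations_by_id):
    """Return a list of warnings for one route_master relation."""
    warnings = []
    tags = master.get("tags", {})
    for key in sorted(EXPECTED_TAGS - set(tags)):
        warnings.append(f"missing tag: {key}")
    for key in sorted(set(tags) - EXPECTED_TAGS - WHITELISTED_TAGS):
        warnings.append(f"unexpected tag: {key}")
    for member in master.get("members", []):
        if member["type"] != "relation":
            # Roads or stops that ended up inside the master relation.
            warnings.append(f"non-relation member: {member['type']} {member['ref']}")
            continue
        route = relations_by_id.get(member["ref"])
        if route is None or route.get("tags", {}).get("type") != "route":
            warnings.append(f"member {member['ref']} is not a route relation")
            continue
        for key in ("network", "operator", "ref"):
            if route["tags"].get(key) != tags.get(key):
                warnings.append(f"member {member['ref']}: {key} differs from master")
    return warnings
```

For De Lijn, ref:De_Lijn would presumably join the expected tags; for cross-border lines the operator comparison would need an exception list.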
One more problem here: making sure that the master relation is complete. We see no easy way to check whether a route relation still exists but has been removed from its master relation. Checking for an orphan relation is easy, of course, but the problem is to determine which master relation to match it with (see above why it is not trivial).
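To make the distinction concrete, here is a small sketch of the easy half (finding orphans) and a naive attempt at the hard half (proposing master relations). It assumes the same Overpass-like dictionaries as before; both function names are hypothetical. Matching on network and ref only yields candidates, and for De Lijn there may be many of them.

```python
def find_orphan_routes(relations):
    """Return the ids of route relations that no route_master lists as a member."""
    claimed = set()
    for rel in relations:
        if rel.get("tags", {}).get("type") == "route_master":
            claimed.update(m["ref"] for m in rel.get("members", [])
                           if m["type"] == "relation")
    return sorted(rel["id"] for rel in relations
                  if rel.get("tags", {}).get("type") == "route"
                  and rel["id"] not in claimed)


def candidate_masters(route, relations):
    """Return ids of route_master relations sharing network and ref with the route.

    More than one result means the match is ambiguous and needs a human.
    """
    wanted = (route["tags"].get("network"), route["tags"].get("ref"))
    return sorted(rel["id"] for rel in relations
                  if rel.get("tags", {}).get("type") == "route_master"
                  and (rel["tags"].get("network"), rel["tags"].get("ref")) == wanted)
```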
Monitoring route relations
We can start with general Q/A checks about whether the relation is valid.
- Check that all the expected tags are there
- If there are only two relations in the master relation, via tags are not supposed to be there; issue a warning if you find some
- Only whitelist a few tags, such as fixme, note, via, wikidata or wikipedia, and warn if there are extra tags
- Check that it belongs to a master relation
Next comes an analysis of the content, i.e. the members of the relation, to honour the PTv2 rules.
- Members with role=platform have public_transport=platform
- Members with role=stop have public_transport=stop_position
- There can be no two consecutive members with role=stop
- Members with no role have highway=*
- Once we have hit a member with no role, there may not be a member later in the list with a role
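These five rules translate fairly directly into a single pass over the member list. The sketch below assumes the members come in relation order as Overpass-like dictionaries, and that the tags of each member have been loaded into a lookup keyed by (type, id); that lookup and the function name are hypothetical. Role variants such as platform_entry_only are not handled here.

```python
def check_member_order(members, element_tags):
    """Check the member rules of a route relation.

    members: list of {"type", "ref", "role"} in relation order.
    element_tags: {(type, id): tags} for every member.
    """
    warnings = []
    seen_roleless = False
    previous_role = None
    for index, member in enumerate(members):
        role = member.get("role", "")
        tags = element_tags.get((member["type"], member["ref"]), {})
        label = f"member {index} ({member['type']} {member['ref']})"
        if role == "platform" and tags.get("public_transport") != "platform":
            warnings.append(f"{label}: role platform without public_transport=platform")
        elif role == "stop" and tags.get("public_transport") != "stop_position":
            warnings.append(f"{label}: role stop without public_transport=stop_position")
        elif role == "" and "highway" not in tags:
            warnings.append(f"{label}: no role and no highway tag")
        if role == "stop" and previous_role == "stop":
            warnings.append(f"{label}: two consecutive stop members")
        if role and seen_roleless:
            warnings.append(f"{label}: member with a role after the ways")
        if not role:
            seen_roleless = True
        previous_role = role
    return warnings
```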
Finally, a deeper analysis.
- All the members with role=platform have the operator name of the relation and the network tag correctly set
- The name tag of the first stop of the list has a perfect match with the from tag of the relation
- The name tag of the last stop of the list has a perfect match with the to tag of the relation
- Test the relation between a stop position node and the platform immediately after it in the list.
  - If both belong to the same stop area relation, with the same roles, it is okay
  - Otherwise, calculate the distance between them. Issue a warning above 50 metres.
- Every way has access permissions that are sufficient to allow buses. Remember that in Belgium buses are legally allowed in highway=pedestrian or highway=living_street. Two challenges here: do we have to make different rules for the parts outside Belgium, and what do we do with all the roads that people retagged as highway=construction?
- Determine whether ways are properly chained and if buses use them in the forward or backward direction.
  - Check for gaps
  - Check for the oneway and oneway:bus tags, and issue a warning if a bus uses a road in the wrong direction
- Calculate the distance between the first node of the first way and the first stop platform. Issue a warning if it is too far.
- Calculate the distance between the last node of the last way and the last stop platform. Issue a warning if it is too far.
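The chaining test is the least obvious item in this list, so here is a rough sketch of one way to approach it. It assumes each way is given with its ordered node ids and its tags; the function name and data layout are hypothetical. A way is "forward" when the bus enters at its first node. Roundabouts and other closed ways that are only partly travelled, implied one-ways (such as junction=roundabout) and other exceptions like oneway:psv are deliberately ignored.

```python
def chain_ways(ways):
    """Walk the ways of a route in relation order.

    ways: list of {"id", "nodes": [node ids], "tags": {...}}.
    Returns (directions, warnings); directions[i] is "forward",
    "backward", or None when the direction cannot be determined.
    """
    directions, warnings = [], []
    tail = None  # node where the previous way ended, if known
    for i, way in enumerate(ways):
        first, last = way["nodes"][0], way["nodes"][-1]
        if tail is not None and tail in (first, last):
            direction = "forward" if first == tail else "backward"
        else:
            if i > 0:
                warnings.append(f"gap before way {way['id']}")
            # Start of a chain: orient this way by looking at the next one.
            nxt = ways[i + 1] if i + 1 < len(ways) else None
            next_ends = {nxt["nodes"][0], nxt["nodes"][-1]} if nxt else set()
            if last in next_ends:
                direction = "forward"
            elif first in next_ends:
                direction = "backward"
            else:
                direction = None
        # oneway:bus overrides oneway for buses when it is present.
        rule = way["tags"].get("oneway:bus", way["tags"].get("oneway", "no"))
        if (rule == "yes" and direction == "backward") or \
           (rule == "-1" and direction == "forward"):
            warnings.append(f"way {way['id']} is used against its oneway direction")
        directions.append(direction)
        tail = {"forward": last, "backward": first}.get(direction)
    return directions, warnings
```

An isolated way in the middle of the list produces two gap warnings, one before it and one before the way that follows, which seems acceptable for a first pass.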
But the hardest part is to match relations in the local database with relations on OSM. A route can have several dozen variants and we only map the most important ones. We have developed a system to identify the subset relations, which we can ignore, but that still leaves many relations. Some of them have the same from and to tags, and there is no universal system to determine how to use via in a non-ambiguous way. With an agnostic system there is probably no good way to say: "This relation in OSM should be this relation in our local database." We will probably end up with a list of candidates, all with the same tags, which we must test.
The idea will be to test the list of stops and see how it matches.
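One possible way to score such candidates, sketched here with Python's standard difflib module, is to compare the ordered list of stop references of the OSM relation with the list of each local variant. This is an assumption about how the test could be done, not a description of an existing tool; the function name and the input layout are hypothetical.

```python
from difflib import SequenceMatcher


def rank_candidates(osm_stop_refs, candidates):
    """Rank local route variants by similarity of their stop sequence.

    osm_stop_refs: ordered list of stop references taken from the OSM relation.
    candidates: {local_route_id: ordered list of stop references}.
    Returns a list of (ratio, local_route_id), best match first.
    """
    scored = []
    for local_id, refs in candidates.items():
        # ratio() is 1.0 for identical sequences and drops as stops differ.
        ratio = SequenceMatcher(None, osm_stop_refs, refs, autojunk=False).ratio()
        scored.append((ratio, local_id))
    return sorted(scored, key=lambda item: (-item[0], item[1]))
```

A ratio of 1.0 for exactly one candidate would be a match; several candidates at 1.0 bring us to the false positives discussed next.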
We might get a few false positives there, for instance relations that have the same stops but a different itinerary. An example is a route with a market every Thursday: buses must go round the block because a street is closed, but no stop is lost. Therefore, there are two different route relations with the same list of stops. They will differ if you look at the shape of their ways, which is something we can evaluate. This might be possible once we develop a module to extract shapefiles (first task, easy) and after that a module to compare two shapes that have different base points (second task, hard).
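For the second task, one option we could explore is an approximation of the Hausdorff distance: densify both lines, then take the largest distance from any point of one line to the nearest segment of the other. Because it measures against segments rather than vertices, it does not care that the two shapes have different base points. The sketch below assumes both shapes are lists of at least two (x, y) points already projected to metres; the function names and the 10-metre step are hypothetical. Note that this measure ignores the direction of travel.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Position of the projection of p on the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def densify(line, step):
    """Add intermediate points so consecutive points are at most step apart."""
    out = [line[0]]
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        pieces = int(hypot(bx - ax, by - ay) // step) + 1
        for k in range(1, pieces + 1):
            out.append((ax + (bx - ax) * k / pieces, ay + (by - ay) * k / pieces))
    return out


def directed_distance(line_a, line_b, step=10.0):
    """Largest distance from a sampled point of line_a to line_b."""
    segments = list(zip(line_b, line_b[1:]))
    return max(min(point_segment_distance(p, a, b) for a, b in segments)
               for p in densify(line_a, step))


def shape_distance(line_a, line_b, step=10.0):
    """Symmetric, sampled approximation of the Hausdorff distance."""
    return max(directed_distance(line_a, line_b, step),
               directed_distance(line_b, line_a, step))
```

In the market example, the usual itinerary and the Thursday detour would share all their stops but be separated by roughly the depth of the block, whereas two tracings of the same itinerary should stay within a few metres of each other.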
There are multiple Q/A tests that will be hard to approach in a simple manner. For instance, what if there is an identical or almost identical route relation with the same route number, or a duplicate master relation?
Permalink: https://blog.multimob.be/zzgo3ab8th.htm