📍 mnemonic-place-encoder
29 May 2025 • 12 min read
like w3w for UPRN, but open and with 4 words
The GeoPlace conference was great. The keynote included this pearl of wisdom:
It is hard to compare addresses
It is easy to compare UPRNs
A UPRN (Unique Property Reference Number) is a permanent standard location identifier of up to 12 digits for addressable places in Britain. This machine-readable identifier can be linked ad hoc (with help from human-readable lookups) or post hoc (with matching algorithms). UPRN is a human-computer data bridge for places.
For example, consider tax and voting datasets. A left join identifies households that contribute to democracy financially but not electorally, and it then becomes simple to post those households an invitation to vote! Now imagine trying to join those datasets on addresses recorded in different ways by different systems?!
```sql
-- Taxpayer households with no matching electoral record (an anti-join)
SELECT Taxpayers.*
FROM Taxpayers
LEFT JOIN Voters
  ON Taxpayers.UPRN = Voters.UPRN
WHERE Voters.UPRN IS NULL;
```
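To see why the shared identifier matters, here is a minimal Python sketch using the standard-library `sqlite3` module and an in-memory database. The tables, UPRNs and addresses are made-up toy data: the same household is written "1 High St" in one table and "1 High Street" in the other. An anti-join on addresses therefore flags every household, while the same anti-join on UPRN flags only the one genuinely missing from the electoral roll. The `unmatched` helper is hypothetical.

```python
import sqlite3

# Hypothetical toy data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Taxpayers (UPRN INTEGER, Address TEXT);
    CREATE TABLE Voters (UPRN INTEGER, Address TEXT);
""")
conn.executemany("INSERT INTO Taxpayers VALUES (?, ?)", [
    (1001, "1 High St"),
    (1002, "2 High Street"),
    (1003, "Flat 3, 3 High St"),
])
conn.executemany("INSERT INTO Voters VALUES (?, ?)", [
    (1001, "1 High Street"),          # same place, different spelling
    (1003, "3 High Street, Flat 3"),  # same place, different ordering
])

def unmatched(key):
    """Taxpayer UPRNs with no Voters row matching on `key`.
    `key` is interpolated into the SQL, so only pass our own column names."""
    sql = f"""
        SELECT Taxpayers.UPRN FROM Taxpayers
        LEFT JOIN Voters ON Taxpayers.{key} = Voters.{key}
        WHERE Voters.{key} IS NULL
        ORDER BY Taxpayers.UPRN
    """
    return [row[0] for row in conn.execute(sql)]

print(unmatched("UPRN"))     # [1002]
print(unmatched("Address"))  # [1001, 1002, 1003]
```

Matching on free-text addresses would need normalisation or fuzzy matching to get close to the UPRN answer; the shared identifier makes it an exact equality test.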
Inspiration strikes
Like all good conferences, GeoPlace brings ideas together, and sometimes they collide and forge new ones: some good, some bad, and some (as in this case) playful.
The overriding theme of the event is driving adoption of location standards across public services, with the objective of enabling integration. Public service records very often contain place data, since:
Everything that happens, happens somewhere
Standardising that "somewhere" helps services help each other, since locations are frequently the best bridge between them. Definitive shared place identifiers become the crown jewels of public service integration.
The conference had a workshop session where delegates had quickfire topical discussions, each with its own brief. My favourite: "Best kept secret" considered how to popularise standards, and went down an interesting accessibility route. This discussion brought a couple of fun ideas to mind:
Branding
Since the British Standards kitemark looks a bit like a love heart, why not use it in a call to action:
Love locations? …well… trademarks being what they are, this campaign is a complete non-starter, so let's proceed to the next idea:
Love location standards!
<💗>
Mnemonic Encoding/Decoding
Someone mentioned using what3words to rendezvous in a crowded place, and I thought of w3w for UPRN. Yes, w3w has been found to give similar-looking or similar-sounding encodings to different places. Furthermore, applying word encoding to UPRNs is pointless since, like phone numbers, they are easily looked up in the definitive list, so nobody needs to remember them.
Still… it was fun playing with dictionary encoding. After rummaging through some linguistics corpora and finding lots of tricky words, I asked Mistral to generate a neutral and familiar 1000-word list. It said:
Creating a list of 1000 words manually would be quite extensive […] Let's generate this list using some Python code. We'll start with a basic list and expand it to meet your requirements.
It then ran some Python to pad a 732-word list out to 1000 with prefixes and suffixes:
```python
# Affixes used to pad a 732-word list out to 1000 entries
prefixes = ["Super", "Hyper", "Mega", "Ultra", "Mini", "Micro", "Macro", "Multi"]
suffixes = ["-like", "-ish", "-esque", "-ful", "-less", "-ness", "-able", "-er"]
```
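Mistral's actual padding code isn't reproduced in this post, so the following is only a guess at the general approach: keep the base words, then append prefixed and suffixed variants until the list reaches the target size. `pad_wordlist` and the placeholder base list are hypothetical; the affix lists are the ones above. The variant ordering here will not reproduce the positions of words like "SuperCarpet" in the real dictionary.

```python
from itertools import product

PREFIXES = ["Super", "Hyper", "Mega", "Ultra", "Mini", "Micro", "Macro", "Multi"]
SUFFIXES = ["-like", "-ish", "-esque", "-ful", "-less", "-ness", "-able", "-er"]

def pad_wordlist(base_words, target=1000, prefixes=PREFIXES, suffixes=SUFFIXES):
    """Hypothetical sketch: pad base_words to `target` unique entries,
    first with prefixed variants, then with suffixed ones."""
    words = list(dict.fromkeys(base_words))  # de-duplicate, keep order
    seen = set(words)
    candidates = [p + w for p, w in product(prefixes, base_words)]
    candidates += [w + s for s, w in product(suffixes, base_words)]
    for cand in candidates:
        if len(words) >= target:
            break
        if cand not in seen:
            words.append(cand)
            seen.add(cand)
    if len(words) < target:
        raise ValueError("not enough base words to reach the target size")
    return words[:target]

# Usage with a placeholder 732-word base list
words = pad_wordlist([f"Word{i}" for i in range(732)])
print(len(words), words[732])  # 1000 SuperWord0
```

Every padded entry is, by construction, a near-duplicate of a base word, which is exactly the kind of ambiguity w3w suffers from.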
Padding with affixes is an optimising trick, but we know from w3w that plurals and similar-looking or similar-sounding words introduce ambiguities between locations. Honing this dictionary will involve finding and replacing ambiguities, and testing for them is going to be a sidequest: encoding all the UPRNs and comparing them for similarity using some computational linguistics.
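A cheap first pass at the plural and affix problem is to reduce every word to a crude base form and look for dictionary entries that collapse to the same one. The sketch below is deliberately naive: it strips at most one of the affixes listed earlier plus a trailing plural "s". It is not computational linguistics, just a filter for the most obvious collisions before any heavier similarity testing. `base_form` and `confusable_groups` are hypothetical names.

```python
from collections import defaultdict

PREFIXES = ("super", "hyper", "mega", "ultra", "mini", "micro", "macro", "multi")
SUFFIXES = ("-like", "-ish", "-esque", "-ful", "-less", "-ness", "-able", "-er")

def base_form(word):
    """Crude normaliser: lower-case, drop one known prefix and one known
    suffix, then drop a trailing plural 's' (but not 'ss')."""
    w = word.lower()
    for p in PREFIXES:
        if w.startswith(p) and len(w) > len(p):
            w = w[len(p):]
            break
    for s in SUFFIXES:
        if w.endswith(s):
            w = w[:-len(s)]
            break
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    return w

def confusable_groups(words):
    """Map each shared base form to the dictionary words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[base_form(word)].append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}

print(confusable_groups(["Carpet", "SuperCarpet", "Carpets", "Carpet-like", "Apple", "Banana"]))
# {'carpet': ['Carpet', 'SuperCarpet', 'Carpets', 'Carpet-like']}
```

It will also flag innocent words such as "Supermarket" (reduced to "market"), so its output is a shortlist for a human to review rather than a verdict.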
So far there's been an attempt to compute the Levenshtein distance between all the possible combinations using the rapidfuzz package, but quintillions of comparisons is a bit computationally expensive. I wonder how the person who found ambiguity in w3w did it.
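One way to shrink the problem, offered as a suggestion rather than something tried here: two quads can only be confused if the words in at least one position are confusable, so start by comparing the 1000 dictionary words with each other. That is 1000 × 999 / 2 = 499,500 pairs, which is trivial. The sketch uses a plain dynamic-programming Levenshtein distance so it runs on the standard library alone; `close_pairs` and the one-edit threshold are assumptions to tune. With rapidfuzz installed, `rapidfuzz.process.cdist` with `scorer=rapidfuzz.distance.Levenshtein.distance` should compute the same word-by-word matrix much faster.

```python
from itertools import combinations

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (each insert, delete or
    substitution costs 1), keeping only two rows of the table."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]

def close_pairs(words, max_distance=1):
    """All word pairs within max_distance edits, compared case-insensitively."""
    return [
        (a, b)
        for a, b in combinations(words, 2)
        if levenshtein(a.lower(), b.lower()) <= max_distance
    ]

print(levenshtein("kitten", "sitting"))  # 3
print(close_pairs(["Carpet", "Carpets", "Parrot", "Carrot", "Apple"]))
# [('Carpet', 'Carpets'), ('Parrot', 'Carrot')]
```

Edit distance only captures spelling; sound-alikes (such as homophones) would need a phonetic comparison on top.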
Meanwhile… a 1000-word dictionary is enough to encode every possible combination of three digits, so a 12-digit UPRN, such as the one for 10 Downing St, can be dictionary-encoded into four words:
```mermaid
---
title: "Encoding the UPRN for 10 Downing St against a 1000 word dictionary"
config:
  packet:
    showBits: false
    bitsPerRow: 40
    rowHeight: 75
---
packet-beta
0-39: "10 Downing Street, London, SW1A 2AA"
40-79: "100023336956"
80-89: "100"
90-99: "023"
100-109: "336"
110-119: "956"
120-129: "Cucumber"
130-139: "Lasagne"
140-149: "Submarine"
150-159: "SuperCarpet"
160-199: "1000 word dictionary"
```
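Read as an algorithm, the diagram says: zero-pad the UPRN to 12 digits, split it into four 3-digit groups, and use each group (000 to 999) as an index into the dictionary; decoding reverses the lookup. The minimal sketch below is not Mistral's tool. The full 1000-word dictionary isn't published in this post, so `DICTIONARY` is a hypothetical placeholder ("w000" to "w999") with only the entries that can be inferred from the worked examples in this post patched in (for instance 100 → Cucumber and 956 → SuperCarpet).

```python
UPRN_DIGITS = 12
GROUP = 3  # digits per word, so the dictionary needs 10**3 = 1000 words

# Hypothetical placeholder dictionary; the known entries are inferred from
# the example UPRNs and word quads listed in this post.
DICTIONARY = [f"w{i:03d}" for i in range(1000)]
KNOWN = {0: "Apple", 1: "Banana", 10: "Pineapple", 22: "Spaghetti",
         23: "Lasagne", 34: "Turkey", 100: "Cucumber", 102: "Potato",
         130: "Mushroom", 231: "Pillow", 336: "Submarine", 355: "Hovercraft",
         430: "Skeleton", 457: "Viola", 602: "Admiral", 781: "HyperLemonade",
         956: "SuperCarpet", 990: "MegaUmbrella"}
for i, word in KNOWN.items():
    DICTIONARY[i] = word

def encode(uprn, dictionary):
    """Zero-pad the UPRN to 12 digits and map each 3-digit group to a word."""
    n = int(uprn)
    if not 0 <= n < 10 ** UPRN_DIGITS:
        raise ValueError("UPRN must be between 0 and 999999999999")
    digits = f"{n:0{UPRN_DIGITS}d}"
    return [dictionary[int(digits[i:i + GROUP])]
            for i in range(0, UPRN_DIGITS, GROUP)]

def decode(words, dictionary):
    """Look up each word's index and join the 3-digit groups back together.
    Raises KeyError for a word that is not in the dictionary."""
    if len(words) != UPRN_DIGITS // GROUP:
        raise ValueError("expected exactly four words")
    index = {word: i for i, word in enumerate(dictionary)}
    return int("".join(f"{index[w]:03d}" for w in words))

print(encode(100023336956, DICTIONARY))
# ['Cucumber', 'Lasagne', 'Submarine', 'SuperCarpet']
print(decode(["Pineapple", "Spaghetti", "MegaUmbrella", "Pillow"], DICTIONARY))
# 10022990231
```

Because every 12-digit string maps to exactly one quad and back, the encoding is lossless; shorter UPRNs such as 130102430 simply start with "Apple", the word at index 000.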
If I wanted three words instead of four, the dictionary would have to be ten to the power of four words long, i.e. 10,000, and even getting to 1,000 introduced potential for ambiguity.
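The general rule: with a dictionary of n words and k words per code there are n^k possible codes, and they must cover all 10^12 possible twelve-digit UPRNs, so each word has to carry 12/k digits:

$$
\begin{aligned}
n^{k} &\ge 10^{12} \quad\Longrightarrow\quad n \ge 10^{12/k} \\
k = 4 &: \; n \ge 10^{3} = 1{,}000 \\
k = 3 &: \; n \ge 10^{4} = 10{,}000
\end{aligned}
$$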
As well as providing a 1000-strong dictionary, Mistral vibe-coded the tool below to encode UPRNs to word quads and decode them back. I preferred its list to a few I had found on the web, such as lists of the most common British words, as these sometimes included negative-sounding words like "fear" and "loathing" (although real places have difficult names too, like "Cape Wrath Lighthouse"). Mistral seemed to stay neutral in its word selection. The tool is shared below, as is my rambling conversation with Le Chat. Here are a few UPRNs to consider:
- 1 (or 000000000001)
  - Bristol City Council, City Hall, College Green, City Centre, Bristol, BS1 5TR
  - Apple Apple Apple Banana
- 10022990231
  - The Angel Of The North, Low Eighton, Lamesley, Gateshead, NE9 7UA
  - Pineapple Spaghetti MegaUmbrella Pillow
- 10010457355
  - Stonehenge Stone Circle, Winterbourne Stoke, SP4 7DD
  - Pineapple Pineapple Viola Hovercraft
- 130102430
  - Cape Wrath Lighthouse, Durness, IV27 4QQ
  - Apple Mushroom Potato Skeleton
- 10034781602
  - First And Last Turning Shop, Lizard Point, The Lizard, TR12 7NU
  - Pineapple Turkey HyperLemonade Admiral
- 100023336956
  - Prime Minister & First Lord Of The Treasury, 10 Downing Street, London, SW1A 2AA
  - Cucumber Lasagne Submarine SuperCarpet
Mnemonic place tool
Four Word Locations from UIDs: encode a location identifier to dictionary words, or decode dictionary words back to a location identifier.
TODO
- Better words
  - Instead of padding out a shortlist with prefixes and suffixes, use linguistics to create a suitable dictionary.
- Test for ambiguities
  - Encode all UPRNs (circa 50,000,000) and compare the word quads for similarity.
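Comparing all circa 50 million quads pairwise would mean about 50,000,000 × 49,999,999 / 2 ≈ 1.25 × 10^15 comparisons. One alternative, offered as a suggestion rather than the plan above: once a list of confusable word pairs exists (for example from an edit-distance check over the dictionary), generate, for each real UPRN, the quads that differ from it by one confusable word, and test whether each belongs to a hash set of real UPRNs. That costs roughly 50 million × 4 positions × a handful of alternatives set lookups. `ambiguous_uprns` is a hypothetical helper, confusable pairs are given as dictionary indices (000 to 999), and the example data is made up.

```python
from collections import defaultdict

def ambiguous_uprns(uprns, confusable_index_pairs):
    """Return pairs of real UPRNs whose word quads differ in exactly one
    position, where the two words there form a confusable pair.

    confusable_index_pairs: (i, j) dictionary indices judged confusable."""
    neighbours = defaultdict(set)
    for i, j in confusable_index_pairs:
        neighbours[i].add(j)
        neighbours[j].add(i)

    real = set(uprns)  # hash set: O(1) membership tests
    found = set()
    for uprn in real:
        digits = f"{uprn:012d}"
        groups = [int(digits[k:k + 3]) for k in range(0, 12, 3)]
        for pos, group in enumerate(groups):
            for alt in neighbours.get(group, ()):
                candidate = groups.copy()
                candidate[pos] = alt  # swap in the confusable word's index
                other = int("".join(f"{g:03d}" for g in candidate))
                if other in real:
                    found.add((min(uprn, other), max(uprn, other)))
    return sorted(found)

example = ambiguous_uprns(
    [100023336956, 101023336956, 130102430, 10010457355],  # made-up mix
    [(100, 101), (457, 458)],  # hypothetical confusable index pairs
)
print(example)  # [(100023336956, 101023336956)]
```

This only catches quads one confusable word apart; quads differing in two positions would need a second round of substitutions, at a correspondingly higher cost.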