The NHS and local authorities use our address-matching algorithm "ASSIGN" to link de-identified data on ‘people’ and ‘places’, informing powerful health research and better place-based care.
Using residential addresses as part of health research can help us to understand how a person’s health may be influenced by social and environmental factors, like the characteristics of their household, or local air pollution levels. But patient addresses are entered into NHS records as free text, so the same address can be written in different ways. Without standardised addresses, we can only reliably analyse health data by postcodes or small areas, both of which include multiple households that may not share the same characteristics.
A team of researchers, led by CEG and Endeavour Health Charitable Trust, developed an algorithm that assigns Unique Property Reference Numbers (UPRNs) to patient records. UPRNs are unique identifiers that are routinely allocated to every property and managed in an Ordnance Survey database. The algorithm, known as ASSIGN (AddreSS MatchInG to Unique Property Reference Numbers), compares addresses in patient health records with the Ordnance Survey database one element at a time and decides whether there is a match. It mirrors human pattern recognition, allowing for character swaps, spelling mistakes and abbreviations. The algorithm has been shown to be highly accurate, correctly matching 98.6 per cent of patient addresses while processing 38,000 records per minute.
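The element-by-element comparison described above can be illustrated with a small Python sketch. It is not ASSIGN's implementation, whose matching rules are far more extensive. It assumes each address has already been split into number, street and postcode elements, and it swaps in optimal string alignment distance (a restricted Damerau-Levenshtein distance that counts a swap of two adjacent characters as a single edit) to tolerate spelling mistakes and character swaps. The abbreviation table, the one-edit threshold, the element names and the UPRNs are all hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"rd": "road", "ave": "avenue", "flt": "flat"}

# Hypothetical element split: these are compared exactly (ignoring spaces)...
EXACT_ELEMENTS = ("postcode", "number")
# ...and these are compared with a small tolerance for typos.
FUZZY_ELEMENTS = ("street",)


def normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces and expand known abbreviations."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Treat a swap of two adjacent characters as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def fuzzy_equal(patient: str, reference: str, max_edits: int = 1) -> bool:
    """Equal after normalisation, or within max_edits for longer strings."""
    p, r = normalise(patient), normalise(reference)
    if p == r:
        return True
    # Only tolerate edits on longer text, so short tokens never match loosely.
    return min(len(p), len(r)) >= 5 and osa_distance(p, r) <= max_edits


def match_address(patient: Dict[str, str],
                  references: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the first UPRN whose address agrees with the patient on every element."""
    for uprn, ref in references.items():
        exact_ok = all(
            normalise(patient.get(e, "")).replace(" ", "")
            == normalise(ref.get(e, "")).replace(" ", "")
            for e in EXACT_ELEMENTS
        )
        if exact_ok and all(fuzzy_equal(patient.get(e, ""), ref.get(e, ""))
                            for e in FUZZY_ELEMENTS):
            return uprn
    return None


# Hypothetical reference addresses keyed by made-up UPRNs.
REFERENCE = {
    "100000000001": {"number": "12", "street": "Harbour Road", "postcode": "E1 4NS"},
    "100000000002": {"number": "14", "street": "Harbour Road", "postcode": "E1 4NS"},
}

# Free-text entry with a character swap, an abbreviation and an unspaced postcode.
print(match_address({"number": "12", "street": "Hrabour Rd.", "postcode": "e14ns"}, REFERENCE))
# 100000000001
```

A real matcher would also score and rank competing candidates and handle partial matches, rather than returning the first candidate that agrees on every element.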
ASSIGN has unlocked the potential of UPRNs for place-based health analysis and research. Importantly, the patient records and UPRNs are de-identified, which keeps addresses and patient identities hidden from researchers. The algorithm is open source, quality assured and transparent, and is available for use under a Creative Commons licence.
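The article does not say how UPRNs are de-identified. One common pseudonymisation approach, shown here as an assumption rather than as the method CEG uses, is a keyed hash: the same UPRN always produces the same token, so records can still be linked, but the token cannot be turned back into an address without the secret key. The key, the function name and the UPRN values below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret, held only by the data controller and never by researchers.
SECRET_KEY = b"replace-with-a-securely-stored-random-key"


def pseudonymise_uprn(uprn: str, key: bytes = SECRET_KEY) -> str:
    """Map a UPRN to a stable, non-reversible token using HMAC-SHA256."""
    return hmac.new(key, uprn.strip().encode("utf-8"), hashlib.sha256).hexdigest()


token_a = pseudonymise_uprn("100000000001")
token_b = pseudonymise_uprn(" 100000000001")  # stray whitespace is stripped first
token_c = pseudonymise_uprn("100000000002")
print(token_a == token_b, token_a == token_c)  # True False
```

A plain, unkeyed hash would not be enough here: UPRNs are not secret and form a finite, enumerable set, so anyone holding that list could hash every UPRN and reverse the mapping. Keeping the key with the data controller is what blocks this.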
"We can now link and analyse de-identified health data at household and small area levels. This opens exciting new opportunities to understand health inequalities and the effectiveness of policies to reduce them. Using our ASSIGN algorithm and de-identified UPRNs, patient identities and addresses remain hidden, while our researchers and analysts can build a rich picture of the social and environmental factors that affect health at a population scale."
Carol Dezateux, Professor of Clinical Epidemiology and Health Data Science
Assigning UPRNs to the addresses in health records enables two key things: linking people who share a household at a point in time to understand variations in household health, and linking to other data sources, such as property information and local authority records, to study the wider determinants of health. The algorithm makes bulk address-matching with UPRNs scalable and fast, using a rigorously tested and standardised method.
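Linking people who share a household at a point in time amounts to grouping patient records by de-identified UPRN, keeping only the addresses that were valid on the chosen date. The sketch below assumes a hypothetical schema in which each record holds a patient identifier, a pseudonymised UPRN token and the dates the patient lived there; it is an illustration, not the pipeline used in North East London.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class AddressPeriod:
    """One patient's residence at a pseudonymised UPRN (hypothetical schema)."""
    patient_id: str
    uprn_token: str
    start: date
    end: Optional[date]  # None means the patient still lives at this address


def households_on(periods: List[AddressPeriod], on: date) -> Dict[str, List[str]]:
    """Group patients by UPRN token for address periods valid on the given date.

    Periods are half-open: start is inclusive, end is exclusive.
    """
    households: Dict[str, List[str]] = defaultdict(list)
    for p in periods:
        if p.start <= on and (p.end is None or on < p.end):
            households[p.uprn_token].append(p.patient_id)
    return dict(households)


# Hypothetical de-identified records.
periods = [
    AddressPeriod("p1", "tokA", date(2020, 1, 1), None),
    AddressPeriod("p2", "tokA", date(2021, 6, 1), None),
    AddressPeriod("p3", "tokA", date(2018, 1, 1), date(2021, 1, 1)),
    AddressPeriod("p4", "tokB", date(2019, 3, 1), None),
]

# Household sizes on 1 January 2022 (p3 had moved out by then).
print({t: len(ids) for t, ids in households_on(periods, date(2022, 1, 1)).items()})
# {'tokA': 2, 'tokB': 1}
```

Treating the end date as exclusive means a patient who moved out on a given day is not counted in that household on that day, so each person belongs to at most one household per date as long as their address periods do not overlap.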
CEG worked with NHS North East London to assign UPRNs in real time to every patient address in the GP health records. Our researchers are now using the de-identified data, sometimes linked with other datasets, to investigate the health impacts of household overcrowding and household clustering of people affected by multiple long-term conditions, and to show areas of measles susceptibility as part of our drive to prevent future outbreaks. We are also working with the NHS in Wales and Scotland and with local authorities in London to use ASSIGN to improve population health and inform policy.
MMR map with key [PDF 90KB]
Dr Gill Harper, Prof Carol Dezateux, Zaheer Ahmed, Dr Kelvin Smith and Dr John Robson. In collaboration with David Stables and Paul Simon of Endeavour Health.
ASSIGN was developed by CEG and Endeavour Health Charitable Trust, with support from Barts Charity (MGU0419) and Health Data Research UK (HDR UK). Dr Gill Harper is supported by a UKRI Ernest Rutherford Fellowship.