Sunday, 1 March 2015

How to use an Address Gazetteer in an application that contains addresses


Suppose I had access to the Post Office address file (http://ift.tt/ZlK0pd) and wanted to cleanse about one million addresses. I am trying to work out an algorithm for doing this. Say the address table in the core system, which is the one that needs cleansing, looks like this:



create table address (
    addressline varchar(1000),
    housenumber varchar(10),
    town        varchar(100),
    county      varchar(1000),
    postcode    varchar(20)
);


Say this table contains:



1) Addresses whose postcodes do not exist in the address file
2) Addresses whose postcodes are valid but whose town, county, etc. do not match the postcode
3) Addresses without postcodes
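A minimal Python sketch of sorting each row into the three categories above. It assumes the reference file has already been loaded into a dictionary keyed by postcode, stored uppercase with spaces removed. The names `REFERENCE`, `normalise_postcode` and `classify`, and the sample entry, are hypothetical placeholders and do not reflect the real file's layout.

```python
from typing import Optional

# Hypothetical sample of the reference file, keyed by normalised postcode.
# A real implementation would load this from the licensed address file.
REFERENCE = {
    "AB12CD": ("SPRINGFIELD", "EXAMPLESHIRE"),
}


def normalise_postcode(raw: Optional[str]) -> str:
    """Uppercase and strip all whitespace so 'ab1 2cd' matches 'AB12CD'."""
    return "".join((raw or "").split()).upper()


def classify(postcode: Optional[str], town: Optional[str], county: Optional[str]) -> str:
    """Place one row of the address table into one of the question's categories."""
    key = normalise_postcode(postcode)
    if not key:
        return "no_postcode"                  # category 3
    if key not in REFERENCE:
        return "postcode_not_in_file"         # category 1
    ref_town, ref_county = REFERENCE[key]
    if (town or "").strip().upper() != ref_town or (county or "").strip().upper() != ref_county:
        return "town_or_county_mismatch"      # category 2
    return "ok"
```

Counting rows per category before changing anything shows how much of the million rows each cleansing path would need to handle.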


I was thinking about doing something like this:


If the postcode is valid, cleanse the address line, house number, town and county. If the postcode is missing or does not exist, look up the postcode using the address line, town and county. Does this sound like a valid approach? Should I also use fuzzy lookups to match address lines, towns, counties, etc. that are spelt incorrectly?
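The proposed approach could be sketched roughly as follows. This is Python again and only an illustration that rests on several assumptions. When a postcode exists in the file, the file is treated as authoritative for town and county, and the address line is only normalised. When the postcode is missing or unknown, candidates are first narrowed with a fuzzy town match. The street is then fuzzy-matched using the standard library's `difflib.SequenceMatcher`, which is a plain character-similarity ratio and not a dedicated address matcher. The `RefAddress` layout, the sample rows, the 0.7 town cutoff and the 0.85 street threshold are all hypothetical and would need tuning against real data. House numbers are not checked.

```python
import difflib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RefAddress:
    """Hypothetical reference row; the real file's layout will differ."""
    street: str
    town: str
    county: str
    postcode: str  # stored uppercase with no spaces


# Hypothetical sample rows standing in for the reference address file.
REFERENCE = [
    RefAddress("HIGH STREET", "SPRINGFIELD", "EXAMPLESHIRE", "AB12CD"),
    RefAddress("MILL LANE", "SPRINGFIELD", "EXAMPLESHIRE", "AB13EF"),
    RefAddress("HIGH STREET", "SHELBYVILLE", "EXAMPLESHIRE", "AB29XY"),
]
BY_POSTCODE = {r.postcode: r for r in REFERENCE}
TOWNS = sorted({r.town for r in REFERENCE})


def norm(text: Optional[str]) -> str:
    """Uppercase and collapse runs of whitespace to single spaces."""
    return " ".join((text or "").upper().split())


def cleanse(addressline: Optional[str], town: Optional[str], county: Optional[str],
            postcode: Optional[str], threshold: float = 0.85) -> Tuple[str, str, str, str, str]:
    """Return (addressline, town, county, postcode, status) for one row."""
    key = norm(postcode).replace(" ", "")
    ref = BY_POSTCODE.get(key)
    if ref is not None:
        # Postcode exists in the file: take town and county from the file and
        # only normalise the address line (a postcode may cover several streets).
        return norm(addressline), ref.town, ref.county, ref.postcode, "corrected_from_postcode"

    # Postcode missing or unknown: keep only reference rows whose town is a
    # close fuzzy match (cheap blocking), then fuzzy-match the street name.
    towns = set(difflib.get_close_matches(norm(town), TOWNS, n=3, cutoff=0.7))
    best: Optional[RefAddress] = None
    best_score = 0.0
    for r in REFERENCE:
        if r.town not in towns:
            continue
        score = difflib.SequenceMatcher(None, norm(addressline), r.street).ratio()
        if score > best_score:
            best, best_score = r, score

    if best is not None and best_score >= threshold:
        return best.street, best.town, best.county, best.postcode, "postcode_found_fuzzy"
    # Too uncertain to fix automatically: flag the row instead of guessing.
    return norm(addressline), norm(town), norm(county), key, "manual_review"
```

For example, `cleanse("mil lane", "Springfeld", "Exampleshire", None)` should come back with postcode `AB13EF` and status `postcode_found_fuzzy`. With a million rows and the full file, comparing every address against every reference row would be far too slow. Filtering by town first, or by postcode district where one is available, keeps each fuzzy comparison set small.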




