Accidentally Collecting 30 Million Land Records

07 June 2016

We’ve just finished an interesting, experimental project to open up cadastral data, and the results are too delightful to keep quiet (despite that we have nothing to show for the project right now except a GitHub repository).

U.S. Open Data is a long-time supporter of the Open Addresses project, a volunteer-run project that aggregates government-published address datasets to create a global repository of the coordinates of street addresses. Anecdotally, project volunteers had noticed that a fair number of the data sources contained not just the latitude and longitude of an address, but the boundaries of the parcel. That raised the question of how many of the indexed 257 million addresses might include boundary data that was going unused. Could we have accidentally collected millions of cadastral records?

So we hired PostLight to figure it out for us. Developer Bryan Bickford spent a little over a week creating a Python-based tool to find and extract parcel data from OpenAddresses’ records.

Bryan’s work gave us a hard number: of the 1,511 data sources ingested by OpenAddresses, 383 include parcel boundaries (or 25%). There are a total of 30,461,769 parcels included. Although OpenAddresses has address data for dozens of countries, it turns out that only 5 commingle addresses with parcel boundaries: Brazil, Canada, Singapore, Ukraine, and the United States. 95% of those included parcels are in the United States (29,033,215 in total). That’s a lot of land parcels to have collected accidentally.

If the idea of a comprehensive cadastral map sounds familiar, that’s because this is exactly what Loveland is doing. They’ve put together a collection of over 100 million parcels. We’ll figure out the extent to which their 100 million parcels overlap with our 30 million parcels, to ensure that we can fill in any gaps that we’ve identified in Loveland’s records. The first 10% of what Loveland does is a lot like what OpenAddresses does, but they go so much farther in building atop the data that they collect, instead of just collecting and publishing metadata.

The OpenAddresses project is working on figuring out what to do with parcel records, now that it can be known that it has so many of them. That might mean doing nothing (beyond sharing this data), and it might mean creating an ongoing offshoot of the project to publish this data. It remains to be seen.

Our thanks to the Shuttleworth Foundation for funding this project.