We Must Identify Sustainable Government Data Sources

To create sustainable sources of open government data, it’s essential that we find, employ, and promote data that is held by one part of government, needed by another, and of measurable financial value to government. That value ensures that the data will continue to be shared, making it sustainable in a way that would otherwise be very difficult to achieve.

The response to our December blog entry about open corporate data has been revelatory. Like all states, Virginia registers corporations at the state level. As in many states, most Virginia municipalities charge a business license tax: a tiny percentage of each business’s revenue, usually with a floor of a nominal fee, like $50/year. Also as in many states, Virginia’s list of registered businesses isn’t shared with municipalities, so the municipalities have no way to audit their records to find out which businesses haven’t registered with them. We devised a system to provide this data to municipalities, and demonstrated its substantial financial value to localities throughout Virginia. In the three months since, tax collectors, local elected officials, members of the state legislature, political groups, and citizen activists have responded with great enthusiasm. Now that localities know that this data exists, and that it can generate millions of dollars in income, there is no stopping this data. It will be published so long as Virginia registers businesses and localities charge business license taxes, and it will serve as a model for other states.
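The audit described above amounts to comparing two lists: the state's registry of businesses and a locality's own license records. What follows is only a minimal sketch of that comparison in Python, not the system we built. The column names (`name`, `locality`), the sample rows, and the name-matching rule are all invented for illustration. Real registry data would need more careful matching, for example on addresses or identification numbers.

```python
import csv
import io


def normalize(name: str) -> str:
    """Lowercase a business name and collapse punctuation and whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def unlicensed_businesses(state_rows, local_rows, locality):
    """Return state-registered businesses in `locality` that do not
    appear in that locality's license records."""
    licensed = {normalize(row["name"]) for row in local_rows}
    return [
        row
        for row in state_rows
        if row["locality"].lower() == locality.lower()
        and normalize(row["name"]) not in licensed
    ]


# Invented sample data standing in for the state registry export.
STATE_CSV = """name,locality
Acme Widgets LLC,Charlottesville
Blue Ridge Bakery Inc.,Charlottesville
Tidewater Tackle Co.,Norfolk
"""

# Invented sample data standing in for one locality's license records.
LOCAL_CSV = """name
"ACME WIDGETS, LLC"
"""

state_rows = list(csv.DictReader(io.StringIO(STATE_CSV)))
local_rows = list(csv.DictReader(io.StringIO(LOCAL_CSV)))

missing = unlicensed_businesses(state_rows, local_rows, "Charlottesville")
for row in missing:
    print(row["name"])
```

Normalizing names before comparing them is a deliberate simplification: it lets "ACME WIDGETS, LLC" match "Acme Widgets LLC", but it would miss businesses that trade under a different name than the one they registered.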

Throughout government, at all levels, there is data that one part of government has and another part needs, but where nobody has yet connected those dots. Identifying these datasets is the most powerful lever for opening government data. It’s important to recognize that government employees are rational actors, and this is the rare basis for publishing open data that makes it sensible for individual government employees to participate. It’s crucial that we find more of these datasets, study how to make them useful, write whatever minimal code is necessary to transform the data so that government can consume it, study its resulting value, and then tell that story.

Finding these datasets is not easy. It requires a deep familiarity with the minutiae of government processes combined with a lot of experience working with data. In practice, this necessitates empathic, questioning discussion and collaboration with a great many people at all levels of government. Over time, it becomes possible to pull together threads from many conversations, to identify how one agency’s little-noted data source can be of crucial benefit to another agency’s mission.

Of course, a significant benefit of this process is that everybody gets this data, not just the part of government that needs it. It’s not rational for government employees to publish open data merely because it could be of benefit to the private sector, but by leading with a government-first approach, the private sector gets the data just the same, and gets it sustainably.

Such use cases do not exist for every type of data, nor should they have to. But it’s important that there be a large number of use cases like this, applicable at the local, state, and federal levels, to help establish open data as a sensible norm. The infrastructure that governments create to support these use cases, and the experience that they gain from that work, will later serve to facilitate the publication of data that doesn’t have an immediate financial value to government.

Open data practitioners should work to identify, use, and tell others about government data sources that are of value to other parts of government, so that we can ensure that open data will advance under its own, unstoppable momentum.