The New Trend of Regional Data Centers

Across the U.S., there are a half dozen efforts underway to establish regional open data centers for municipal governments. The hurdles to publishing open data are too high and too numerous for the vast majority of municipal governments. By banding together, the minimum viable product for municipal open data can be simplified substantially.

The Problem

It’s a practical impossibility for most municipal governments to publish open data. There are 39,044 local governments in the U.S., but those with open data programs number in the dozens. To pick a quasi-arbitrary cap, for governments with fewer than perhaps 100,000 citizens (98.7% of subcounty municipal governments), publishing open data is beyond their capabilities. They lack in-house technical expertise. They lack a budget for specialized staff, a data repository, ETL solutions, etc. They’re saddled with lousy, specialized software that has no ability to export data in an open format. Worst of all, they lack clear business cases for why they should open their data holdings.

By banding together, municipalities can work around these obstacles. A central, coordinating entity can determine what data it would be mutually beneficial for them to share, establish norms for that data, and provide shared infrastructure at a viable per-municipality price point.

Normally when a group of geographically proximate local governments band together to establish and share common standards, we call it a “state.” But states haven’t shown leadership in the open data space, despite the extensive amounts of data that they collect from localities. Perhaps their priorities lie elsewhere, perhaps they too lack the technical expertise, or perhaps they find the challenge of working with every municipality in their state too daunting. Whatever the reason, states are sitting this out.

The Solution

Governmental, non-profit, and educational organizations are stepping into this leadership vacuum. (Governmental organizations like community service boards and metropolitan planning organizations are in a particularly good position to fill this role.) Of the half-dozen regional data centers under development, all are in various stages of planning, none are launched yet, and most of them aren’t public knowledge.

The Pittsburgh Regional Data Center is the farthest along. Under new mayor Bill Peduto, the city of Pittsburgh has made a strong push for open data in the past year. True to form, they’ve teamed up with the surrounding Allegheny County, the University of Pittsburgh, and Carnegie Mellon University to create a regional data center, supported by a $1.8M grant from the Richard King Mellon Foundation.

The University of Pittsburgh’s Center for Social and Urban Research (UCSUR) is spearheading the project, after spending the better part of the past decade as a data intermediary in the region. I talked with UCSUR’s Bob Gradeck, who is heading up the project for the organization. He explained that with 120-odd municipalities in the greater Allegheny County area, the challenge is substantial, but so is the potential payoff. They hope to have 10 local governments as a part of their program in a year, working with those governments to find ways that sharing data can help them. The idea isn’t to lead government, but instead to provide the infrastructure and resources to support existing government initiatives with better data. There are people already doing good work who just need support.

The Regional Data Center will provide the technical and administrative support to establish a data repository for each participating municipality. They’ll identify the best data practices that exist in the region, and promote them to other municipal governments. Perhaps most important, they’ll establish a regional data center that can serve as a model for other regions throughout the country.

Looking Ahead

Each of the planned regional data centers is a little different. They serve larger or smaller areas, they have different goals and methods, and they’re run by different kinds of organizations. Although it’s possible that one approach will turn out to be the right one, it’s more likely that different types of data centers will be better in different regions. If the regional data centers are open about their methods, and frank about their successes and failures, their approaches may be a good short-term solution to the seemingly insurmountable obstacles to publishing open data that are faced by municipalities.

In the medium term, states should take over this role, as the cost/technical challenge of running a repository and ETL is driven down. It is the proper role of states to establish the standards for the exchange of data between localities and between the state and localities, and the sooner that they get involved in that, the more control that they’ll have in establishing standards and practices that are to their liking.