U.S. Open Data Blog

Announcing an OpenAddresses Bounty

01 August 2014 Waldo Jaquith

At the US Open Data Institute, we’re big fans of OpenAddresses. They’re working to piece together an address-level map of the entire world, county by county, city by city, town by town. Here in the United States, that data is often published on municipalities’ websites. (For example, see Baltimore, Albuquerque, or Atlanta.) Stitching together this data will ultimately create a map of every addressable property in the country, as entirely open data, which is essential fuel for innovation.

Today we’re announcing a bounty for contributions to OpenAddresses. We’ll pay $10 for each new United States municipality that’s added from now through September 30. All you have to do is file a pull request on the project’s GitHub repository, and the team has to accept your pull request. Come October, we’ll tally up your contributions, get in touch with you to ask about how to pay you, and then we’ll send you money. It’s that easy.

There are a few rules and restrictions. Qualifying contributions must contain data, website, license, compression (where applicable), type, and note fields, and the note field must provide basic information, such as whether the data provides points or polygons for each parcel and what the names of the columns containing address information are. Payments can only be made to people in countries where we can legally send money (e.g., Cuba is probably off the table). We’re capping cumulative payouts at $5,000, and we’ll provide public notice here and on the OpenAddresses repository if that cap is hit before October 1. There is no per-person minimum or maximum: we’ll send $10 for one municipality or $1,000 for one hundred municipalities.
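The rules name the required fields but not the file layout. As a rough self-check before filing a pull request, here is a minimal Python sketch that assumes a source entry is a JSON object keyed by those field names. The URLs, the `type` value, the column names, and the “.zip means compression is needed” heuristic are all placeholders or assumptions; the project’s own contributing guide is the authority on the real format.

```python
import json

# Fields the bounty rules require. "compression" is checked separately because
# it only applies when the downloaded file is compressed.
REQUIRED_FIELDS = ["data", "website", "license", "type", "note"]

def bounty_problems(source: dict) -> list[str]:
    """Return a list of problems; an empty list means the rules appear to be met."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not source.get(f)]
    # Assumption: a .zip download is treated as needing a compression field.
    data_url = str(source.get("data", ""))
    if data_url.lower().endswith(".zip") and not source.get("compression"):
        problems.append("missing field: compression (data looks like a .zip)")
    # The note should at least say whether the data has points or polygons.
    note = str(source.get("note", "")).lower()
    if note and not ("point" in note or "polygon" in note):
        problems.append("note should say whether the data has points or polygons")
    return problems

# Hypothetical source entry; every value here is a placeholder.
example = json.loads("""
{
  "data": "http://example.gov/gis/addresses.zip",
  "website": "http://example.gov/gis/",
  "license": "Public domain",
  "compression": "zip",
  "type": "http",
  "note": "Polygons for each parcel; address columns are ADDR_NUM and STREET."
}
""")

print(bounty_problems(example))  # → []
```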

Ready to get started? See the “Contributing to OpenAddresses” guide and start filing pull requests. Let’s go create a national geoparcel database.

Restaurant Inspection Data

29 July 2014 Christopher Whitaker

When we go out to eat, we assume that the restaurants we eat at are clean, safe, and healthy. State and local governments across the country inspect them on a regular basis to make sure that restaurants are maintaining safe practices.

Unfortunately, results of restaurant health inspections aren’t always published prominently. And when they are, those reports don’t always provide detailed information about the inspection. Did it have any faults? Were they serious? What makes something an “A” or “Pass” rating versus a “C” or “Fail” rating?

Across the country, cities like San Francisco, Chicago, New York, and Boston are making this data available in bulk format.

The problem is that it’s not always easy for consumers to find this information, particularly when they’re out and about, trying to find a place to eat. Thankfully, there’s some great movement towards using this information to help consumers make informed decisions.

HDScores

HDScores is an in-alpha app for iOS and Android that lets users get information on health inspections at nearby restaurants. Impressively, HDScores gathers health inspection data from across the country. It incorporates data from 530,000 establishments, and has access to over three million inspection reports.

The app uses a series of bots to collect the data and import it into HDScores’ database. Its developers use a variety of methods including XPath and regular expressions to build a custom program for each jurisdiction. This allows the team to get data regardless of the system being used. They also convert data from older CSV and Flash-based systems to the new LIVES (Local Inspector Value-entry Specification) standard.
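In scrapers like these, XPath and regular expressions usually work together: XPath locates the rows of a page, and a regular expression pulls values out of free text. The sketch below uses only Python’s standard library and is not HDScores’ code; the page markup, the `id` and `class` values, and the output keys are invented for illustration, and mapping the results onto LIVES columns is not shown. ElementTree supports only a subset of XPath and requires well-formed markup, so real-world HTML often needs a more forgiving parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical inspection page for one jurisdiction. Real pages differ,
# which is why each jurisdiction needs its own small scraper.
PAGE = """
<html><body>
  <table id="inspections">
    <tr class="inspection"><td>Joe's Diner</td><td>Inspected 2014-07-02, Score: 92</td></tr>
    <tr class="inspection"><td>Taco Stand</td><td>Inspected 2014-06-18, Score: 78</td></tr>
  </table>
</body></html>
"""

# Regular expression for the free-text detail cell: captures date and score.
DETAIL_RE = re.compile(r"Inspected (\d{4}-\d{2}-\d{2}), Score: (\d+)")

def scrape(page: str) -> list[dict]:
    root = ET.fromstring(page)
    records = []
    # ElementTree's XPath subset supports attribute predicates like [@id='...'].
    for row in root.findall(".//table[@id='inspections']/tr[@class='inspection']"):
        name_cell, detail_cell = row.findall("td")
        match = DETAIL_RE.search(detail_cell.text or "")
        if match is None:
            continue  # skip rows this jurisdiction formats differently
        date, score = match.groups()
        records.append({"name": name_cell.text, "date": date, "score": int(score)})
    return records

for record in scrape(PAGE):
    print(record)
```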

HDScores will first focus on the consumer market. Currently in private alpha, the app helps users find restaurant inspection data using their phones. When you open the app, it gives you the option to search for an establishment by name or see a list of establishments near you. Clicking on a location gives you information about the restaurant’s inspection history, and clicking on each inspection will give you detailed information about it, including violations, the notes from the inspector, and whether any of the violations were critical. The app also allows you to save your favorite restaurants for easier access.

Eventually, the team would also like to cater to both governments and businesses. Because they have access to data from multiple cities, the team believes they can provide valuable predictive analytics tools to local health departments. Additionally, because they’re gathering information from health departments as soon as it’s updated, they can also provide business referral services. For example, if a health inspector finds roaches at a restaurant and includes that in the report, HDScores could alert a pest control company, allowing them to promote their service to a business that needs it.

HDScores’ iOS app will launch ad-free in mid-August, with the Android app launching this fall.

OpenHealthInspection.com

OpenHealthInspection.com is a web app that shows health inspection data around Virginia. The app was built by the Code for America Hampton Roads Brigade.

When you open the website, it displays a searchable map of nearby restaurants. It also lists nearby establishments. Clicking on a restaurant yields detailed information about the last few inspections. The app also uses helpful icons to distinguish between critical violations, repeat violations, or violations that were corrected during the inspection. The site is also designed responsively, and can use the GPS on the user’s mobile device to find nearby establishments.

The website is open source and composed of a modular trio of programs: a scraper, an API, and the website itself, which is built atop the public API. Their scraper extracts data from HealthSpace, a vendor to health departments across the United States and Canada, so it would be trivial to pop up similar websites for other HealthSpace clients, which notably include the states of Ohio, Tennessee, and Wisconsin.

The work of HDScores and the Code for America Hampton Roads Brigade is poised to do a great deal to help people make informed decisions about where to eat, by doing the hard work of turning open information into open data.

Let’s Reframe Open Data Rhetoric

06 June 2014 Waldo Jaquith

The arguments that persuade the public to support open data can be actively harmful when used to persuade government employees to support open data. We need to reframe how we describe the benefits of open data within government by understanding and accommodating the needs of the people who comprise our governments.

This Isn’t Working

There are two arguments that are often used in favor of open data:

“We, the taxpayers, have paid for this data to be generated and stored, and we have a right to have it.”
“A democratic government has an obligation to be transparent. In the 21st century, that means providing open data.”

These arguments tend to be persuasive to citizens and elected officials. I often use both of these arguments. But there’s an important group to whom they’re not persuasive: government employees, the people who actually open data. Imagine how you’d feel if somebody marched into your office and told you that you needed to do something unfamiliar, confusing, fraught, and not in your job description, because you owe it to her if you love America. You’d probably gin up a reason not to do it.

Government employees are rational actors. They have a job to do, and “open data” appears in the job description of maybe a few dozen of them. In an economy in recovery, government agencies are short on staff and on funding—dedicating time and money to tossing data into the ether in hopes that somebody will do something useful with it…well, that’s not very sensible. We need to emphasize reality-based reasons why governments should produce open data.

Let’s Try This

Government agencies frequently need to exchange data with other government agencies and between departments within the same agency. Local agencies communicate with their state counterparts, state agencies exchange data with other state agencies, the federal government needs to aggregate data from state governments, and so on. The Microsoft SharePoint document management system is widely used in state and large municipal governments, and it provides several methods of sharing data with third parties. Sometimes there might be a shared server used within a single government, with the data perhaps found on the L: drive in the form of an Access database. But in thousands of municipal governments and state agencies, data is shared by simply e-mailing around Excel files. (Or, often, by walking down the hall and knocking on the door of the person who has the file.) More modern agencies might use Google Docs or a similar cloud-based service, and share access to spreadsheets there. Because of these varying practices, a single state agency may well receive data in a different fashion from each of dozens or hundreds of municipalities.

This presents an opportunity for open data. If the staff list were simply a spreadsheet on a website, the real benefit of open data to government might be that Jim from HR wouldn’t have to send the same e-mail every other Friday asking for any changes in staffing; he could just download the spreadsheet. This simplifies things both for Jim and for the people in 20 different agencies who are tired of getting the same request from Jim every two weeks.
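Here is a minimal sketch of Jim’s side of this, assuming the agency posts its staff list as a CSV file with a stable ID column. The URL and column names are hypothetical. Instead of waiting for replies, he downloads the current file and compares it with the copy he saved last time.

```python
import csv
import io
from urllib.request import urlopen

# Hypothetical URL where an agency might post its staff list as CSV.
STAFF_URL = "http://example.gov/data/staff.csv"

def fetch_csv(url: str) -> str:
    """Download a published CSV file as text (not called in the example below)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")

def staff_changes(old_csv: str, new_csv: str, key: str = "employee_id") -> dict:
    """Compare two snapshots of a staff list, keyed on a hypothetical ID column."""
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_csv))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_csv))}
    return {
        "hired": sorted(new.keys() - old.keys()),
        "departed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }

# Two invented snapshots taken two weeks apart.
two_weeks_ago = "employee_id,name,title\n101,Ana,Clerk\n102,Ben,Analyst\n"
today = "employee_id,name,title\n101,Ana,Senior Clerk\n103,Cy,Inspector\n"
print(staff_changes(two_weeks_ago, today))
# → {'hired': ['103'], 'departed': ['102'], 'changed': ['101']}
```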

There are a lot of efficiencies waiting to be realized through intra- and intergovernmental open data. Many municipal agencies have a state counterpart. Many state agencies have a federal counterpart. Agencies collaborate with other agencies within the same government. Divisions of large agencies need to share data. Data needs to move between all of these, and indeed it does, but often awkwardly at best. A non-trivial portion of agency IT departments’ work is facilitating data sharing.

When confidentiality isn’t at issue, sharing governmental data in the open using a common protocol (HTTP) and a common format (CSV, JSON, XML, GeoJSON, etc.) can increase intra- and intergovernmental access to that data, provide more tools for producing and consuming it, and have the happy side effect of emitting open data for the benefit of the public.
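To illustrate how little is needed to offer the same data in more than one common format, here is a minimal Python sketch that turns a CSV export into JSON. The permits dataset is hypothetical.

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert a CSV table (header row plus data rows) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

# Hypothetical dataset an agency might already keep as a spreadsheet export.
permits_csv = "permit_id,address,status\nP-1,12 Main St,issued\nP-2,40 Oak Ave,pending\n"
print(csv_to_json(permits_csv))
```

Both files can then be placed on any existing web server; for a quick local test, `python3 -m http.server` serves the current directory over HTTP.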

Results So Far

I’ve only been talking about this for a couple of months, but it seems to resonate with audiences. When giving a talk in which I propose thinking of open government data in these terms, I can tell who works for government, because their eyes light up, they nod vigorously, and then they all come up and talk to me when I’m done. I have adopted this new tack in the course of advising state and local governments, and I’ve seen how it changes the minds of the holdouts at the table.

This is just one attempt to describe the benefits of open data to government employees. There are certainly others (e.g., reducing the volume of FOIA requests), any number of which might prove more compelling to bureaucrats than simplifying internal data sharing. The important thing is that, when pitching open data to government, we start to frame it in terms of how it will benefit them, by making their lives easier or contributing to their professional success, instead of framing it in terms of what we want or how it will benefit unidentified third parties.

Just Share Your Data

23 May 2014 Waldo Jaquith

In the world of open data advocacy, we talk a lot about standards, platforms, and best practices. These are all important goals, but not one of them should be an obstacle to government publication of open data.

We Have High Standards

It’s true that in the long run, adherence to standards is crucial. That means providing data as CSV, not Excel; JSON, not binary; Markdown, not Word; and so on. That means explicitly releasing data under an open license or, for governments, into the public domain. That means committing to keeping data up-to-date, instead of letting it languish. That means using open data catalog software (like CKAN, Socrata, DKAN, or Junar) to house published datasets. And that means enacting an open data policy, so that everybody in your organization (governmental or otherwise) knows when and how to publish open data, and third parties know what to expect.

These are all important things, but they shouldn’t be obstacles to starting to publish data. That is especially true for small units of government: municipalities, state agencies, and special-purpose governments (e.g., school districts, transit authorities, planning organizations, etc.). Small governments generally lack the IT staff and budget necessary to make a wholesale commitment to open data. As a result, movement towards open data is often the product of just a few people’s interest, with no agency-wide commitment. An insistence on adherence to best practices makes it impossible for such efforts to succeed.

Just Do It

Publishing open data can be really very easy, provided that one isn’t a purist about it. Follow these simple steps:

Inventory the data that you already publish. Look through your website and make a list of all of the data, in any format: PDFs of budgets, Word files of agendas, Excel files of expenses, Shapefiles of zoning maps, and so on. One way to do this is to use Google’s advanced search to search just your site (the “site or domain” field), and use the “file type” field to try searching for PDFs, Excel files, etc.
Create a new page on your website (suggestion: http://example.gov/data/). List each of these files, providing the title, a description, and a link to the file. When possible, provide a date for when the file was last updated and how often it changes.
Tell people what you made.
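The first two steps can be partly scripted. This sketch, under stated assumptions, generates Google queries using the `site:` and `filetype:` operators (the typed equivalents of the advanced-search fields mentioned above) and renders a bare-bones HTML list for a `/data/` page. The file-type list, the entries, and the field names are illustrative only; Google indexes only some file types, so adjust the list to what your site actually hosts.

```python
from html import escape

# Illustrative file types to look for during the inventory step.
FILE_TYPES = ["pdf", "doc", "docx", "xls", "xlsx"]

def inventory_queries(domain: str) -> list[str]:
    """Google search queries that restrict results to one site and one file type."""
    return [f"site:{domain} filetype:{ext}" for ext in FILE_TYPES]

def data_page(entries: list[dict]) -> str:
    """Render a minimal /data/ page: title, link, description, update info."""
    items = []
    for e in entries:
        items.append(
            "<li>"
            f'<a href="{escape(e["url"])}">{escape(e["title"])}</a>: '
            f'{escape(e["description"])} '
            f'(updated {escape(e.get("updated", "unknown"))}, '
            f'{escape(e.get("frequency", "frequency unknown"))})'
            "</li>"
        )
    return "<h1>Data</h1>\n<ul>\n" + "\n".join(items) + "\n</ul>\n"

for query in inventory_queries("example.gov"):
    print(query)

# Hypothetical inventory entry found by the searches above.
entries = [{
    "title": "FY2014 Budget",
    "url": "/files/budget-2014.pdf",
    "description": "Adopted budget (PDF)",
    "updated": "2014-05-01",
    "frequency": "annually",
}]
print(data_page(entries))
```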

This is not the best approach to publishing open data. It fails all kinds of tests about accuracy and licensing and standards. But that doesn’t really matter right now, because it’s data that people have access to that they didn’t have access to before, and that’s the important part. And if the experiment goes well, then the government might be willing to commit some actual resources to open data efforts: dedicated staff time, funding, or a change in policy.

For small governments to publish open data, they just need to go ahead and publish open data. The rest will follow.

The Role of Vendors in Opening Government Data

22 April 2014 Waldo Jaquith

The prominent role that commercial software plays in government agencies at local, state, and federal levels makes its vendors a powerful force in opening data (or, far more often, not opening data). This fact is rarely acknowledged in open government data efforts. We should capitalize on the latent power of vendors, regarding their software as a crucial vector for the expansion of open data.

At some point in discussions about how government should go about opening more data, the word “just” makes its first appearance. As in “they should just publish SQL dumps to the web and let the private sector take it from there” or “they should just export the data as CSV files, since that would be better than nothing.” Generally, this is true—in the abstract, the simple, suggested steps would be easy wins for the sharing of data. But “just” glosses over a real obstacle—legal and practical impossibility. Such suggestions are akin to suggesting that homeowners just cut off the lock on the PG&E power meter on the side of their house and wire in a Zigbee transmitter to monitor their home energy usage, or open up their Comcast-owned cable box to enable a debug mode to monitor signal degradation over time.

There are a handful of reasons why “just” tasks are often much more difficult than they seem like they should be, but the most important one is this: many agencies rely on commercial software that they have neither the tools (the source code) nor the authority (licensing) to change. There are lots of vendors, large and small, that license records management software, such as SAS, Competitive Edge, RBDMS, and CDP, to select a few results from Google. The agencies that use that software are rarely in a position to dictate new features, so “just exporting data as CSV” isn’t actually a thing.

We can pass open data laws, enact open data policies, and appoint Chief Innovation Officers all day long, but if the software relied on by government cannot produce open data, then that’s all for nothing.

Imagine if the major records management vendors included open data functionality. Imagine that it’s enabled by default for all types of data without security risks or personally identifying information, creating bulk downloads and an API for each dataset. The number of municipal, state, and federal datasets that would become available would be larger than those resulting from any other initiative.
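What “enabled by default” might mean in practice can be sketched in a few lines. This hypothetical Python example assumes the vendor’s schema flags each column as personally identifying or not, and exports only unflagged columns as both CSV and JSON bulk downloads. The schema, column names, and records are invented; no actual vendor product is being described.

```python
import csv
import io
import json

# Hypothetical schema: which columns a records system flags as personally
# identifying. In this sketch, only unflagged columns are exported.
SCHEMA = {
    "license_id": {"pii": False},
    "business_name": {"pii": False},
    "owner_ssn": {"pii": True},
    "owner_home_phone": {"pii": True},
    "issued": {"pii": False},
}

def public_columns(schema: dict) -> list[str]:
    return [col for col, meta in schema.items() if not meta["pii"]]

def export_bulk(records: list[dict], schema: dict) -> tuple[str, str]:
    """Return (csv_text, json_text) containing only the non-PII columns."""
    cols = public_columns(schema)
    public = [{c: r.get(c, "") for c in cols} for r in records]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=cols, lineterminator="\n")
    writer.writeheader()
    writer.writerows(public)
    return buf.getvalue(), json.dumps(public)

records = [{
    "license_id": "L-7",
    "business_name": "Joe's Diner",
    "owner_ssn": "000-00-0000",
    "owner_home_phone": "555-0100",
    "issued": "2014-03-01",
}]
csv_text, json_text = export_bulk(records, SCHEMA)
print(csv_text)
print(json_text)
```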

There is no possible number of Open Data Institutes, no number of Code for America cities, no expansion of the Sunlight Foundation that could have an impact anywhere close to the impact that would result from a handful of commercial vendors adding a File → Export as JSON menu option.

How do we get commercial software vendors to add functionality that facilitates the provision of open data? I don’t have the faintest idea. Let’s get to work on figuring that out.
