In Praise of CSV

10 March 2015

Comma Separated Values is the file format that open data advocates love to hate. Compared to JSON, CSV is clumsy; compared to XML, CSV is simplistic. Its reputation is as a tired, limited, it’s-better-than-nothing format. Not only is that reputation undeserved, but CSV should often be your first choice when publishing data.

It’s true that CSV is tired and limited, though superior to not having data, but there’s another side to those coins. One man’s tired is another man’s established. One man’s limited is another man’s focused. And “better than nothing” is, in fact, better than nothing, which is frequently the alternative to producing CSV.

There’s a lot about CSV that makes it a great format:

Again, some of these strengths are weaknesses. The simplicity of the file format makes it terrible for rendering complex data, especially nested data. The lack of typing makes schemas generally impractical, and as a result validation of field contents is also generally impractical. There are vast swaths of data that can’t be reasonably represented as CSV, because they’re too complex.
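
To make those two weaknesses concrete, here is a minimal Python sketch using only the standard library. The sample data, the field names, and the flatten helper are all invented for illustration, and dotted column names are just one common workaround for nesting, not a standard.

```python
import csv
import io

# Hypothetical sample data, invented for this illustration.
raw = "name,population,founded\nSpringfield,30720,1796\n"

# Reading CSV: every value comes back as a string, because the
# format itself carries no type information.
rows = list(csv.DictReader(io.StringIO(raw)))
population = rows[0]["population"]   # a str, not an int
population_as_int = int(population)  # the consumer has to guess the type


def flatten(record, prefix=""):
    """Flatten nested dicts into one level using dotted keys.

    This is an assumed convention for the example; CSV defines no
    way to express nesting.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# A nested record has no natural CSV form, so it is flattened first.
nested = {"name": "Springfield", "location": {"lat": 39.8, "lon": -89.6}}
flat = flatten(nested)

# Writing the flattened record out as CSV.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat))
writer.writeheader()
writer.writerow(flat)
output = buffer.getvalue()
```

Flattening works for a record this small, but it gets awkward quickly once records contain lists or deeply nested structures, which is the kind of data the paragraph above describes as a poor fit for CSV.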

Many datasets that can be represented as CSV should be represented as CSV. CSV lowers the barriers to both producing and consuming open data, and it’s crucial that we continue to drive down the minimum viable product for open data. So knock CSV if you must, but please also produce CSV, to make sure that your data can be used widely and easily.