Every Sufficiently Large Open Dataset…

…deserves a curator,
…needs a promoter,
…requires crowdsourcing, and
…smells like money.

by Robert L. Read, Michelle Hertzfeld, and Hillary Hartley

Data.gov publishes 87,000 different datasets, free of charge and with no strings attached. Some big datasets, like satellite images and weather data, have become the basis of entire industries and need no promotion.

Most datasets are not so exciting. Many are small, some contain errors, most need interpretation, and all require searching.

But every dataset, even the most boring, deserves a little love and attention from a curator who catalogs it and makes it easier to understand. Therefore, every Open Dataset published at data.gov represents a business opportunity in one way or another. Finding the people who are willing to pay for, or at least receive ads when viewing, the datasets is of course a problem, and therefore a business opportunity.

In addition, private companies can add value to the data more easily than the government can. For example, the government can provide basic search functionality, but it struggles to provide more advanced context-specific and content-specific search. Why? First, the government employs very few programmers compared to its needs. Second, the government is not motivated by making money. Finally, the government is bound by laws of privacy, security, and fairness in ways that private firms are not.
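To make the distinction concrete, here is a minimal sketch of the kind of context-specific search a private firm might layer on top of open dataset metadata. Everything here is hypothetical: the catalog records, the `score` and `search` functions, and the idea of boosting results by a caller-supplied context tag are illustrations, not anyone's actual product.

```python
# Hypothetical sketch: ranking dataset metadata records by query relevance,
# with title matches weighted double and an optional context-tag boost.

def score(record, query_terms, boost_tags=()):
    """Score one dataset record against the query terms."""
    s = 0
    for term in query_terms:
        term = term.lower()
        s += 2 * record["title"].lower().count(term)       # title hits count double
        s += record["description"].lower().count(term)
    if any(tag in record["tags"] for tag in boost_tags):   # context-specific boost
        s += 5
    return s

def search(catalog, query, boost_tags=()):
    """Return matching dataset titles, best match first."""
    terms = query.split()
    ranked = sorted(catalog, key=lambda r: score(r, terms, boost_tags), reverse=True)
    return [r["title"] for r in ranked if score(r, terms, boost_tags) > 0]

# A toy catalog of made-up dataset records.
catalog = [
    {"title": "Daily Weather Observations",
     "description": "Station-level temperature and precipitation.", "tags": ["climate"]},
    {"title": "Farmers Market Directory",
     "description": "Locations and hours of farmers markets.", "tags": ["food"]},
    {"title": "Severe Weather Events",
     "description": "Storm and precipitation event reports.", "tags": ["climate"]},
]

print(search(catalog, "weather precipitation", boost_tags=("climate",)))
# → ['Daily Weather Observations', 'Severe Weather Events']
```

Even a sketch this small goes beyond plain keyword matching, which is roughly the gap between a basic government-hosted search box and what a motivated private firm could ship.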

For example, imagine that the government attempted to crowdsource some data cleanup and rating for a large dataset. Before launching, it might have to clear privacy reviews, security assessments, and fairness requirements, among other legal and procedural hurdles.

All of these things not only add to the expense of a government-hosted web or mobile app presence, but also stall the all-important process of rapid, agile development that is responsive to user feedback.

In contrast, what would a private company have to do to create a similar data cleanup process? A private firm could build a crowdsourcing platform quickly, or could simply provide a website that presents a curated view of one or more datasets. More likely, it would reuse an existing platform that could be stood up in a day. And it should, because the American people deserve not just to see the data, but to see the data in an unobscured and comprehensible way.
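The core of such a cleanup platform can be surprisingly small. As a sketch, and assuming a simple majority-vote policy (the function name, threshold, and sample submissions below are all hypothetical), a firm could resolve conflicting crowd-submitted corrections like this:

```python
# Hypothetical sketch: resolving crowd-submitted corrections for one field
# by keeping the value that a majority of contributors agree on.
from collections import Counter

def resolve(submissions, min_votes=2):
    """Return the consensus value for a field, or None if no value
    reaches the minimum vote threshold."""
    if not submissions:
        return None
    value, votes = Counter(submissions).most_common(1)[0]
    return value if votes >= min_votes else None

# Three contributors correct a misspelled agency name; one disagrees.
proposed = ["Dept. of Agriculture", "Dept. of Agriculture", "Dept of Agricluture"]
print(resolve(proposed))  # → Dept. of Agriculture
```

A real platform would add contributor reputation and audit trails, but the point stands: the mechanics are cheap; the compliance burden is what makes the government's version slow.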

This raises the question: how much money can really be made? How much are people willing to pay for a little added value? One is tempted to say: not much. But it also raises another question: how much does it cost to provide basic curation, editing, commenting, crowdsourced analysis, searching, and sorting? The answer: not much. Or at least not much using modern, off-the-shelf open-source software, which allows extraordinarily rapid prototyping and high reuse. There are 87,000+ datasets at Data.gov right now, representing opportunities big and small. Do the math.

7 Responses to “Every Sufficiently Large Open Dataset…”

  2. Sunil

    Hello, Hillary Hartley and Michelle Hertzfeld, this is Sunil. Yes, I agree that private firms could build a crowdsourcing platform quickly, and could simply provide websites that present curated views of multiple datasets.

    I really like all your posts…

  3. Mark

    I was amazed that data.gov can provide this for free.

  5. Abraham Becker

    In addition, as public servants and as data providers, the larger part of the responsibility to provide *quality* data within a given format (csv, xml, etc.) rests solely on the government body producing the data. If a large amount of data is produced, but it is not in a consumable format, or contains numerous errors, its usefulness is minimal. No amount of data mining or programmatic genius can extract useful results from poorly curated data.

    If the US Government is serious about providing useful data, efforts like enforcing a metadata schema to catalog available data sets, as mentioned in the Project Open Data page (http://project-open-data.github.io/schema/), represent a small step in the much larger process of ensuring data quality in addition to availability.

    Data.gov is a fantastic first step in what needs to be a much larger effort to democratize the data that the American people receive as proof that their government takes accountability and integrity seriously. See (http://en.wikipedia.org/wiki/Data_Quality_Act)

    • Robert L. Read

      Those are very good points! Thanks for posting your comments and those valuable links.

Comments are closed.