Under the Hood of the Open Data Engine

            Data.gov started almost four years ago with a simple idea—opening up government data helps citizens make better-informed decisions and empowers businesses to be more innovative.  In the years since then, Data.gov has grown from 47 datasets to 400,000, from a few agencies to 180, and from providing only the data to also providing context, community, and conversations around the data. 

            The new Executive Order and Open Data Policy are groundbreaking: they require agencies to open up new data and information and to present them in human- and machine-readable formats, and they will help usher in the next stage of open data innovation.

            We are seeking your great ideas and constructive criticism as we move forward to the next phase of Data.gov. We want to scale up the quality and quantity of data, be more helpful to American businesses and entrepreneurs looking to use government data and research, more clearly support learning in classrooms, get government data in front of researchers and journalists, and bring the power of open data to American citizens. 

  • What can be done differently and better to make open government data more useful to you? 
  • What features do you want to see?
  • What topics are missing or incomplete?
  • In what ways can we better connect with your community?

            It’s all about getting you to the data you need as quickly as possible, in a variety of machine-readable formats, with better search, more APIs, easier ways to share data, and more federated data resources.  You’ve told us via forums, listservs, hackathons, blogs, and meetups around the country that we need more and better capabilities for developers and innovators.  We are listening.

            So, what’s new and different?  

            Search.  We’ve taken a lesson from other open data sites and built our new catalog on a great tool, CKAN.  This new catalog harvests data from all the US federal agencies, as well as other organizations that are part of the government geospatial community.  The improved search gives you options ranging from simple keyword search to filtering and faceting by tags, formats, publishers, and locations.  The geospatial search allows you to draw your own custom bounding box. Found some great data? Each dataset has a persistent, unique URL, so you can save and share your search results; that is good news for linked data fans and an easy reference for researchers and data journalists. Check out a sneak peek at our new combined catalog. You can compare this to the old separate catalogs for “raw” and geospatial data.  We are finalizing a few things on the catalog, so let us know what needs to be different.  Need local data? Check out the data published by Cities.Data.gov, Counties.Data.gov, and States.Data.gov.

            APIs.  What is the most often heard phrase at meetups and hack days? “Give me an API and get out of my way.”  We hear you. As more and more agencies launch developer portals, we are building an API catalog: an automated, filterable listing of all APIs across government.  While leaders like the Labor Department and Census Bureau already offer a range of advanced APIs, we recognize that other agencies are newer to this.  To help, we’ve been rolling out a range of tools and resources to empower all federal agencies to adopt an “API first” model, which will speed the growth of the web services that developers can use to further their innovation.  The new catalog comes with a full RESTful JSON API for all metadata fields, so everything in the web interface can also be done via the API (from search queries to downloading data files).
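
            As a rough illustration of doing a web-interface search through the API instead, here is a minimal sketch of a keyword search with a format filter. It assumes the standard CKAN `package_search` action with its `q`, `fq`, and `rows` parameters and the usual `success`/`result` response envelope. The base URL, the helper names, and the sample response are ours for illustration only, so check the live API before relying on them.

```python
import json
from urllib.parse import urlencode

# Assumed base URL for the catalog's Action API; verify before use.
CATALOG_API = "http://catalog.data.gov/api/3/action"


def build_search_url(query, res_format=None, rows=10):
    """Build a package_search URL for a keyword query with an optional format filter."""
    params = {"q": query, "rows": rows}
    if res_format:
        # fq adds a filter on an indexed field; res_format is assumed to be one.
        params["fq"] = f"res_format:{res_format}"
    return f"{CATALOG_API}/package_search?{urlencode(params)}"


def unwrap(response_text):
    """Return the "result" payload of an Action API response, or raise on failure."""
    envelope = json.loads(response_text)
    if not envelope.get("success"):
        raise RuntimeError(f"API call failed: {envelope.get('error')}")
    return envelope["result"]


def dataset_titles(response_text):
    """List the titles of the datasets in a package_search response."""
    return [dataset["title"] for dataset in unwrap(response_text)["results"]]


if __name__ == "__main__":
    print(build_search_url("minimum wage", res_format="CSV"))
    # A canned response stands in for a real network call.
    sample = '{"success": true, "result": {"count": 1, "results": [{"title": "Example dataset"}]}}'
    print(dataset_titles(sample))
```

            Keeping the URL building and response parsing separate from the network call makes both easy to check without hitting the live site.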

            Data publishing.  Soon, gone will be the days when agencies have to enter their metadata into a Data.gov form or send over a spreadsheet (yikes!).  Later this month we will start harvesting JSON files from agencies that are publishing catalogs.
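
            To make the harvesting idea a little more concrete, here is a minimal sketch of how a harvester might read an agency-published JSON catalog and flag entries that lack basic metadata. The file layout (a JSON list of dataset entries) and the field names in `REQUIRED_FIELDS` are assumptions for illustration, not the official schema; the Open Data Policy's metadata guidance is the place to look for the real requirements.

```python
import json

# Hypothetical minimum set of fields; the real schema may differ.
REQUIRED_FIELDS = ("title", "description", "identifier", "modified")


def missing_fields(entry):
    """Return the required fields that are absent or empty in one catalog entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


def check_catalog(catalog_text):
    """Split a catalog into harvestable entries and a report of problem entries.

    Returns (valid_entries, problems), where problems maps an entry's position
    in the catalog to the list of fields it is missing.
    """
    entries = json.loads(catalog_text)
    valid, problems = [], {}
    for position, entry in enumerate(entries):
        gaps = missing_fields(entry)
        if gaps:
            problems[position] = gaps
        else:
            valid.append(entry)
    return valid, problems
```

            A harvester built along these lines could load the valid entries right away and send the problem report back to the publishing agency.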

            Data pages. We are running through some options for new designs and formats to enhance the usability of the site, including the pages for the datasets.  We’ll open these ideas up to you as they evolve, but stay tuned for suggestions developers have given us for showing what is related to the dataset you are viewing:

  • News results
  • Related datasets
  • Ideas from you on how to use the data and comments about the data
  • Apps and services that are using that dataset
  • Questions and answers

            Open source. Data.gov has also gone open source.  Want to download and use the code or, better yet, contribute extensions, code, and ideas? Jump over to GitHub and start hacking.  You’ll notice multiple forks here contributed by our international partners in open data as part of the Open Government Platform (OGPL).  The Government of India contributes a Drupal 6 and Drupal 7 code base, Canada is contributing its Web Experience Toolkit, and the Open Knowledge Foundation in the United Kingdom provides CKAN 2.0.  You can contribute directly to one of these code bases, to OGPL overall, or create a new fork.

            Open questions.  We are encouraging government data owners to chat with you in a new Open Data community at Stack Exchange (coming next week), where you can discuss and improve the quality of the data. This way, questions about open data also become a form of open data themselves.

            So if you’re passionate about the possibilities of open data and what new frontiers need to be explored or what barriers need to be demolished, share your ideas publicly or one on one!  We will be launching some new features this month and throughout the summer and fall as we hear back from you.  Help us put the data to work.  Data liberación!    


                                                            The Data.gov Team

16 Responses to “Under the Hood of the Open Data Engine”

  1. Clinton Johnson


  2. Ron Dales

    This is useful.

  3. Anonymous

    Looks good! Watch out though, your link to stack exchange contains a referrer id, so when users go to the site it gives us a friendly “TurtlePowered thought you’d be interested in this proposed Q&A site.”
    I look forward to seeing what changes come to data.gov over the following year!

    • jholm@jpl.nasa.gov

      Thanks for the pointer. Stack Exchange should provide a robust place for questions and answers and we are almost through the launch process with them.  I’ll be sure to update the link when we get our live site.

    • Jaydles

      Jeanne, the current link is to the Area 51 proposal – now that the site’s in public beta, it should be going either to the site itself:


      or to the “About” page for the site:


      • sally.bourrie@gsa.gov

        Thank you for letting us know. I’ve corrected the link.

  4. clintpickens@libertyresources.org

    I am so glad to be part of this crowd.

  5. baoveplus

    I’ll be sure to update the link when we get our live site.

    Thank you for sharing Awesome!

  6. Ann Landers

    I believe this internet site contains some really great information for everyone. “Philosophy triumphs easily over past evils and future evils but present evils triumph over it.” by La Rochefoucauld.

  7. love1911.giaybupbe

    Thanks so much, this is the best site I’ve used.

  8. hainguyen

    I see model that will grow ever more quickly the web services that developers can use to further their innovation.

  9. Anonymous

    I am impressed by the quality of information on this website. That’s all i can say..useful information shared… i am very happy to read this article.. thanks for giving us nice info. fantastic walk-through. i appreciate this post

  10. TMLutas

    APIs are useless when they don’t actually work. You need a test suite you periodically run to verify that the growth in the site hasn’t broken things like getting a listing of your dataset. Such tests should be automated and painless enough to run frequently, like daily.

    That’s a gentle hint that as of writing the site’s broken.

    Here’s an example of a command that’s not working:
    curl -O http://catalog.data.gov/api/3/action/package_list -d '{}'
    while this similar command, which returns a shorter result, works just fine:
    curl -O http://catalog.data.gov/api/3/action/group_list -d '{}'

  11. visit web site

    Nice article, thank you very much! This is impossible for me to find further readings on the topic. Would you please point me to more readings? Thanks.

  12. Sims Obrien

    It’s an important thing in current times. All the data are stored in the open data engine.

  13. Max Riley

    I’m impressed by this site! The data definitely helped me to draw comparisons to my site when I am doing stats up between my country and US!

    Especially when there’s data on minimum wage and we don’t have any minimum wage in our country.

Comments are closed.