The Web has fundamentally changed our society, becoming one of the most important technologies of modern times. Despite its incredible success, however, the Web is still limited in many ways. One of these is that when it comes to working with data, whether in a database, a spreadsheet on your computer, or even a table on a Web page, you simply don’t get the sort of support from the Web that you get for documents. You cannot easily search for data, you cannot easily move data from one application to another, and you cannot easily understand the information that the data conveys without a lot of application programming support. As Tim Berners-Lee, the inventor of the Web, put it in a 2002 article, “for the data in our lives we are still pre-Web!”
In the few years since those words were written, however, we have come a long way. Data providers, such as the government agencies that have published many thousands of datasets through data.gov, have made much more information available. Companies such as Google, Microsoft, and IBM (among many others) have created Web visualization tools that help make it possible to see that data. And most important of all, a set of technologies known collectively as the Semantic Web has made it possible to link that data together on the Web and to use it in new and exciting ways. Below, we explain in some detail how this technology works, and why we are so excited about bringing it to data.gov to help people create new and exciting data mashups from the many datasets available on this site.
Semantic Web Value Proposition
At its heart, the Semantic Web is really about extending standard Web technologies to better deal with data on the Web. Its fundamental approach is to create a uniform data model and a simple grammar, realized through a Web language known as the RDF standard. Following the well-known RESTful architectural style (which underlies almost all of today’s major Web applications), RDF provides a means to give Web addresses to data elements. Essentially, the resources associated with subjects, predicates (or properties) and objects (sometimes described using the perhaps more familiar ‘entity, attribute, value’) are all given HTTP-dereferenceable URIs, with different serialized representations of these resources available to suit machine (apps) and human (browser) user agent preferences. Combining standard networking protocols, a standard application protocol, a standard uniform data modeling language, and its corresponding standard query language, SPARQL, results in an overall standards-based API for open government data. By normalizing the way we access, process, persist, and visualize data across open government information domains, we can simplify mashup creation and maintenance, enhance productivity, and unleash more innovation powered by democratized data. Whether you’re a developer or a manager, you should consider how these technologies can help you get more done with less work.
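To make the data model concrete, here is a minimal sketch of RDF triples and pattern matching in plain Python, without any RDF library. The namespace, dataset identifiers, and data values are invented for illustration; `None` plays the role of a variable in a SPARQL-style basic graph pattern.

```python
# A sketch of the RDF data model: each statement is a
# (subject, predicate, object) triple, and subjects/predicates
# are HTTP URIs. These example URIs are illustrative only.

EX = "http://example.gov/data/"    # hypothetical namespace
DC = "http://purl.org/dc/terms/"   # Dublin Core terms vocabulary

triples = {
    (EX + "dataset/123", DC + "title", "Toxic Release Inventory"),
    (EX + "dataset/123", DC + "publisher", EX + "agency/epa"),
    (EX + "agency/epa", DC + "title", "Environmental Protection Agency"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL basic graph pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What is the title of dataset 123?"
for s, p, o in match(s=EX + "dataset/123", p=DC + "title"):
    print(o)
```

Because every element is a URI, any other publisher can make statements about the same dataset, and the statements merge naturally into one graph; that is the linking that a relational table or spreadsheet cannot offer.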
Where We Are Today
We’ve deployed some open source infrastructure tools into our existing data.gov environment that reflect our standards-based approach and provide a query point for our mashup collaborators in the Tetherless World Constellation at Rensselaer Polytechnic Institute. It’s no coincidence that their leaders are co-inventors of the Semantic Web, along with Sir Tim Berners-Lee and others. As an IPA’d expert to the Data.gov PMO, Professor Jim Hendler and a team of students have created the featured applications that leverage this pattern using open government data available on data.gov, along with a number of interesting demo apps and tutorials on their site that demonstrate new technology standards we’re interested in utilizing and other functional capabilities we’re moving toward, like faceted browsing, semantic search, automating distributed catalog maintenance, and more. The existing data.gov datasets they’ve converted into RDF have yielded over six billion triples, making data.gov one of the largest open sets of RDF data in the world!
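The conversion of tabular datasets into RDF can be sketched simply: each row becomes a subject, each column a predicate. The sketch below is a hedged illustration of that general pattern, not the actual conversion pipeline used for data.gov; the namespace and column names are invented.

```python
import csv
import io

# Illustrative only: turn each row of a small CSV into RDF-style
# triples. The "id" column supplies the subject; every other
# column becomes a predicate/value pair.

EX = "http://example.gov/data/"   # hypothetical namespace

raw = io.StringIO("id,pollutant,tons\n42,lead,3.5\n43,mercury,0.8\n")

triples = []
for row in csv.DictReader(raw):
    subject = EX + "release/" + row["id"]
    for column, value in row.items():
        if column != "id":
            triples.append((subject, EX + "vocab/" + column, value))

for t in triples:
    print(t)
```

Run at scale across thousands of datasets, a row-by-row conversion like this is how a catalog of spreadsheets and CSV files can grow into billions of triples.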
Where We’re Going
Linked Data holds tremendous promise for transparent sharing, participatory linking, and collaborative curation of government information online. To establish open government data as a service platform and a subsystem of the Internet Operating System in the Linked Data cloud, there are some basics we need to address to enable a federation of distributed data publishers and consumers. We want to maintain the desired level of autonomy and healthy diversity across this self-organizing value chain, and to keep the link as the fundamental measure of value on this scale-free network, while still enabling meaningful aggregation and analysis of the whole through its parts. To do that, we need to turn our focus now to common and distinct vocabularies, and to policy-backed conventions for persistent URI schemes.
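What a policy-backed convention for persistent URIs might look like can be sketched as a small minting function. The base authority, path pattern, and parameters below are hypothetical, offered only to show the shape of such a convention, not any official data.gov scheme.

```python
from urllib.parse import quote

# A hypothetical persistent-URI convention of the form
#   {base}/{agency}/{concept}/{key}
# The base authority is illustrative, not a real endpoint.

BASE = "http://data.gov/id"

def mint_uri(agency: str, concept: str, key: str) -> str:
    """Build a stable identifier from an agency, a concept
    (the type of thing), and a natural key. Percent-encoding
    keeps keys with spaces or reserved characters safe."""
    return "/".join([BASE, quote(agency), quote(concept), quote(key)])

print(mint_uri("epa", "facility", "TRI 12345"))
```

The point of fixing such a pattern by policy is that identifiers stay stable as publishing systems change, so links made against them today still resolve years from now.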
Fortunately, the international Linked Data community of government agency practitioners, open government data advocates, research institutes, and leading academic institutions, in collaboration with industry and voluntary consensus standards development organizations, has been blazing the trail we’re on for quite a while. Many existing and emerging cross-domain vocabulary standards for data catalogs, linked datasets, publishing, geospatial data, concept schemes, provenance, licensing, versioning, and more are specified using standard languages for vocabulary and ontology design, and enjoy popular use or active experimentation. Whatever its mission, almost every agency will need to express these common concepts in its published open government data.
In addition to common metadata concerns, unique agency mission areas will also require domain-specific vocabularies. Many examples with deep roots in semantic technologies already exist, so we’ll be working to bring these into the Web of Data, with an initial emphasis on areas of national and international concern and interest, like health and the environment. We believe that data quality and value are emergent properties determined by the network effect. In light of this, we’ll begin to explore social media tools to facilitate the creation of metadata vocabularies and the curation of their corresponding instance datasets, with both seen as objects of social collaboration. We think that the ‘Social Data Web’, which combines the features and capabilities of the Social Web and the Web of Data, is a powerful idea that will lower our coordination costs and allow independent evolution and interlinking across government information domains. There are inspiring examples of socially managed data sites, and many existing and emerging tools with strong Semantic Web support to leverage.
We appreciate how our friends in the UK have been demonstrating their approach to named graphs and URI schemes to establish authority and enable linking in their corresponding groundbreaking initiative, and we’ll continue to collaborate with them and with others who believe the Linked Data platform is an open government game-changer that will help us achieve our shared goal of linking the world’s data. Our data.gov birthday release reflected the current status of our US community around these concepts and the related standards, tools, and techniques on the Web of Data. Although we see a long and winding road ahead, we’re working eight days a week to inject these ideas and technologies into our contributions to the global information ecosystem of open government data. We’ll be establishing more community-enabling sites and tools in the very near future, and we hope you’ll join us on this exciting journey.
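The named-graph idea mentioned above can be sketched by extending the triple with a fourth element that records which graph, and therefore which authority, asserted the statement. All of the URIs below are invented for illustration.

```python
# A minimal sketch of named graphs: each triple carries a fourth
# element, the graph URI, so authority and provenance can be
# tracked per publisher. Example URIs are illustrative only.

quads = [
    # (subject, predicate, object, named graph)
    ("http://example.gov/agency/epa",
     "http://purl.org/dc/terms/title",
     "Environmental Protection Agency",
     "http://example.gov/graph/us"),
    ("http://example.gov/agency/epa",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://example.org/uk-view/epa",
     "http://example.org/graph/uk"),
]

def graph(name):
    """Return only the triples asserted in a given named graph."""
    return [(s, p, o) for s, p, o, g in quads if g == name]

print(graph("http://example.gov/graph/us"))
```

Because each statement is attributable to its graph, a consumer can trust the US graph for US agency descriptions while still merging in links, like the `owl:sameAs` statement above, contributed from elsewhere.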