In my work on Linked Data projects in the Department of Health and Human Services (HHS), and in collaborating with other federal agencies pursuing Linked Data through Data.gov’s semantic community, my colleagues and I frequently leverage the work of many talented international contributors to the Linked Data community. It turns out that many of them have something in common: they’re often affiliated with the Digital Enterprise Research Institute (DERI).
As with our ongoing collaboration with RPI’s Tetherless World Constellation, the Linked Data rock stars at DERI deserve some Data.gov love for the great work they do and the many contributions they have made and continue to make. Their work touches many aspects of what those of us in the US do to help realize Government Linked Data, in conjunction with voluntary consensus standards organizations like the W3C, which is of course central to this open data mission. This post is an overview of just some of the ways we appreciate DERI.
When the Centers for Medicare and Medicaid Services (CMS) decided to publish their Clinical Quality Linked Data on Healthdata.gov, we made extensive use of DERI’s RDF extension for Google Refine to help design the RDF Schemas that define the metadata for a controlled vocabulary for Hospital Compare. We did our first schema pass with Refine+DERI, using it for rapid prototyping. It let us map data sources in CSV/TSV formats to instances of what became our RDFS (Resource Description Framework Schema) vocabularies, which gave us a quick and easy way to see whether the triples generated from the mapping looked like what we wanted. We usually ended up polishing our schemas with powerful IDE-based RDF editors, like the popular TopBraid Composer from TopQuadrant.
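To make the CSV-to-triples idea concrete, here is a minimal sketch in plain Python. It is not how the RDF extension for Refine works internally, and it is not our actual vocabulary: the `example.org` namespaces, the column names, the `Hospital` class, and the sample row are all hypothetical, chosen only to show what a column-to-property mapping produces.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces, for illustration only (not the real CMS URIs).
BASE = "http://example.org/hospital/"
VOCAB = "http://example.org/vocab/hospital#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical mapping from CSV column name to vocabulary property.
COLUMN_TO_PROPERTY = {
    "name": RDFS_LABEL,
    "city": VOCAB + "city",
    "state": VOCAB + "state",
}


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, id_column="provider_id"):
    """Turn each CSV row into N-Triples lines, one subject per row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Mint the subject URI from the row's identifier column.
        subject = "<" + BASE + quote(row[id_column].strip(), safe="") + ">"
        lines.append(f"{subject} <{RDF_TYPE}> <{VOCAB}Hospital> .")
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells rather than emit empty literals
                lines.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return lines


# Made-up sample data, not taken from Hospital Compare.
SAMPLE = (
    "provider_id,name,city,state\n"
    "000001,Example General Hospital,Springfield,IL\n"
)

for line in rows_to_ntriples(SAMPLE):
    print(line)
```

Eyeballing the printed triples is the same kind of sanity check described above: if the subjects, properties, and literals look wrong, the mapping or the vocabulary needs another pass.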
Once we launched our Virtuoso-powered Clinical Quality Linked Data site, Refine+DERI proved useful for even more powerful capabilities, such as reconciliation services that use our SPARQL endpoint. These let us resolve the string-based attributes of health domain entities like hospitals, drawn from multiple publication sites, against the CMS-published URIs that globally disambiguate the identity of those hospitals, so that data from disparate publications automatically aggregates around that identity.
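The reconciliation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real reconciliation service does its own, more forgiving matching, whereas this sketch matches names exactly after simple normalization. The URIs, labels, and function names are hypothetical, and the `PUBLISHED` list stands in for the rows a SPARQL endpoint would return for the query text built by `build_label_query`; no network call is made.

```python
import re

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def normalize(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def build_label_query(hospital_class):
    """SPARQL text that would fetch (uri, label) pairs from an endpoint."""
    return (
        "SELECT ?uri ?label WHERE { "
        f"?uri a <{hospital_class}> ; <{RDFS_LABEL}> ?label . "
        "}"
    )


def reconcile(names, uri_labels):
    """Map each input name to a published URI, or None if nothing matches."""
    index = {normalize(label): uri for uri, label in uri_labels}
    return {name: index.get(normalize(name)) for name in names}


# Hypothetical (uri, label) pairs standing in for SPARQL query results.
PUBLISHED = [
    ("http://example.org/hospital/000001", "Example General Hospital"),
    ("http://example.org/hospital/000002", "St. Sample Medical Center"),
]

# Names as they might appear, inconsistently, on other publication sites.
matches = reconcile(
    ["EXAMPLE GENERAL HOSPITAL", "st sample medical center", "Unknown Clinic"],
    PUBLISHED,
)
for name, uri in matches.items():
    print(name, "->", uri)
```

Once two differently spelled names resolve to the same URI, any data attached to either spelling can be joined on that URI, which is the aggregation effect described above.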
After we’d created some simple RDFS vocabularies that formed the metadata foundation of our newly published Linked Data, we wanted a way to publish those vocabularies to make them easy to access and understand. Here we drew on DERI’s work again, using their Neologism tool to create a catalog of our metadata schemas in use. Neologism runs on the Drupal open source content management system already used by Data.gov, and we used it to stand up the vocabulary catalog site at vocab.data.gov.
At HHS in the Office of the CIO where I work, there’s interesting stuff going on that also benefits from DERI’s contributions and collaborations.
We’ve been experimenting with indexing triples using the SIREn extensions to Solr, working towards the Web-scale metadata indexing features demonstrated by real-time semantic infrastructure sites like Sindice and its crazy cool interfaces like Sig.ma. This is just one example of how innovative startups help to augment government and industry understanding of why Linked Data + Big Data is such a compelling pair, pushing market leaders to embrace, sooner rather than later, the utility of stronger metadata for more than just their application-specific purposes.
We’ve also been working on combining the Linked Data API (LDA), invented by our ever-inspiring UK friends and implemented on data.gov.uk, with the Privacy Preference Ontology (PPO) and related privacy management web applications from DERI’s Social Software Unit. I recently referred to the LDA as a “Web 3.0 API” at the excellent #FCCDevDay, and will continue to promote the LDA/PPO combination as one Linked Data realization of what the PCAST HIT report describes as “data element access services” (DEAS) at this year’s Health Datapalooza. (Unfortunately I can’t be in two places at one time, otherwise I would’ve enjoyed the opportunity to present the Towards Patient Controlled Privacy session at SemTechBizSF with our DERI collaborators.)
More behind-the-scenes work that routinely benefits from substantial DERI engagement includes an ongoing contribution to the creation and promulgation of open standards related to open government data catalogs and communities. But DERI doesn’t stop there: they put these new standards into practice through enhancements to Drupal 7 core, helping make it an even more powerful publishing and visualization tool for the emerging Web of Data. We hope to leverage all of these features and capabilities in our current and ongoing Healthdata.gov modernization efforts. They also create lots of other useful tools and pen helpful blog posts that promote the proper use and integration of standards. Furthermore, DERI folks are active in many other efforts to promote structured data using open standards and help to clarify best practices that will ultimately lead to better integration of international government statistics.
So all this adds up to a great big thank you to DERI, on behalf of Linked Data practitioners and enthusiasts here in the U.S. Please keep up the inspiring and trailblazing leadership that benefits the entire Linked Data community. I’m sure I’ve neglected to mention other significant contributions from DERI individuals that others know about. Please add those in your comments!