Data Harvesting
Data.gov is organized around metadata published by government offices. This metadata is harvested from external websites and aggregated on Data.gov so that it’s easier to browse and search. Some applications may want to consume this metadata programmatically, and there are two ways of doing so, explained below.
Option 1: Harvest Aggregate Metadata
The simplest option is to access metadata in aggregate as it exists on catalog.data.gov. This can be done via our CKAN API or our CSW endpoint. We do not currently provide a single aggregate file of all metadata, but we hope to provide this in the future. Until then, you can follow this GitHub issue for instructions on using the CKAN API to crawl or filter metadata.
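As a rough illustration of crawling the aggregate metadata, the sketch below pages through the CKAN package_search action. It assumes the standard CKAN Action API path (/api/3/action/package_search) on catalog.data.gov and the usual CKAN response shape (result.count and result.results); the function names and page size are hypothetical choices, not part of any Data.gov tooling.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the standard CKAN Action API path on catalog.data.gov.
CKAN_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(start=0, rows=100, fq=None):
    """Build a package_search URL for one page of results."""
    params = {"start": start, "rows": rows}
    if fq:
        params["fq"] = fq  # optional filter query, passed through unchanged
    return CKAN_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def crawl_datasets(fetch=fetch_json, rows=100, fq=None, limit=None):
    """Yield dataset records page by page until the results run out.

    `fetch` is injectable so the paging logic can be exercised offline.
    `limit` caps the number of records yielded (None means no cap).
    """
    start = 0
    yielded = 0
    while True:
        payload = fetch(build_search_url(start, rows, fq))
        results = payload["result"]["results"]
        if not results:
            return
        for dataset in results:
            yield dataset
            yielded += 1
            if limit is not None and yielded >= limit:
                return
        start += len(results)
        if start >= payload["result"]["count"]:
            return


if __name__ == "__main__":
    # Print the titles of the first few datasets (requires network access).
    for record in crawl_datasets(rows=5, limit=5):
        print(record.get("title"))
```

Paging with start and rows keeps each request small. A full crawl of the catalog makes many requests, so a real client would likely want to pause between pages and retry on transient errors.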
Option 2: Harvest From Upstream Harvest Sources
Another option is to go directly to the metadata source. Every harvested source of metadata is listed at https://catalog.data.gov/harvest and via our CKAN API using this filter. As part of Project Open Data, most government offices have transitioned to making all of their metadata available via a standard schema packaged as a data.json file. These files are treated just like any other harvest source, and you can use the CKAN API to filter for only these harvest sources.
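A minimal sketch of the upstream approach follows. The filter values here (dataset_type:harvest and source_type:datajson) and the url field on each harvest source record are assumptions about how the catalog indexes harvest sources, so check them against the filters linked above before relying on them. The data.json handling assumes the Project Open Data layout, where schema v1.1 wraps entries in an object with a "dataset" key and older files are a bare list. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the standard CKAN Action API path on catalog.data.gov.
CKAN_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"

# Assumption: field names and values used to index harvest sources.
DATAJSON_SOURCES_FQ = "dataset_type:harvest AND source_type:datajson"


def harvest_source_query_url(fq=DATAJSON_SOURCES_FQ, rows=100, start=0):
    """Build a package_search URL restricted to harvest sources."""
    params = {"fq": fq, "rows": rows, "start": start}
    return CKAN_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def source_urls(payload):
    """Extract (title, url) pairs from a package_search response.

    Assumption: each harvest source record carries its upstream
    location in a "url" field; records without one are skipped.
    """
    return [
        (source.get("title"), source["url"])
        for source in payload["result"]["results"]
        if source.get("url")
    ]


def datasets_in_catalog(catalog):
    """Return the dataset entries from a parsed data.json document."""
    if isinstance(catalog, dict):
        return catalog.get("dataset", [])  # schema v1.1: object wrapper
    return list(catalog)  # older files: bare list of datasets


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # List a few data.json harvest sources (requires network access).
    listing = fetch_json(harvest_source_query_url(rows=5))
    for title, url in source_urls(listing):
        print(title, url)
```

Once a source URL is known, passing the result of fetch_json(url) to datasets_in_catalog gives that office's dataset entries directly, without going through the aggregate catalog.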