Some institutions may wish to host archives or mirrors of data listed on Data.gov. We recommend that archives and mirrors use the metadata provided by data publishers to manage these collections and to retain complete metadata and data provenance. This metadata also helps manage large collections more systematically and facilitates automated updates and crawls. Information on systematically harvesting the metadata can be found on the Data Harvesting page.
Avoid Orphaned Datasets
For archives and mirrors that were developed without the associated source metadata, we recommend making every attempt to associate the metadata provided by the publishers with the copy of the data, and to make both available together. This can be done by matching the URL that the dataset was downloaded from with the distribution URL listed in the metadata. For the main Federal agencies that follow policies under Project Open Data, the identifier field is required, but it is currently only required to be unique within the agency and may not be globally unique. With that caveat in mind, these identifiers should still be helpful for managing archives and mirrors. Data.gov also generates a globally unique identifier for each metadata record, which can be used to manage updates from our aggregate collection.
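The URL matching described above could be sketched roughly as follows. This is a minimal illustration, not an official Data.gov tool. It assumes the publisher metadata is a Project Open Data style catalog with a `dataset` list whose entries carry `identifier` and a `distribution` list with `downloadURL` or `accessURL` fields. The function names, the agency label, the URL normalization rules, and the example URLs are all hypothetical. Because the identifier is only guaranteed to be unique within an agency, the sketch keys each record by an (agency, identifier) pair.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Normalize a URL for comparison: lowercase scheme and host,
    drop any fragment and any trailing slash on the path (assumed rules)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def build_url_index(catalog, agency):
    """Map each distribution URL in the catalog to an (agency, identifier) key."""
    index = {}
    for dataset in catalog.get("dataset", []):
        key = (agency, dataset.get("identifier"))
        for dist in dataset.get("distribution", []):
            for field in ("downloadURL", "accessURL"):
                url = dist.get(field)
                if url:
                    # Keep the first record seen for a given URL.
                    index.setdefault(normalize_url(url), key)
    return index


def match_files(source_urls, index):
    """Split archived source URLs into matched records and orphans."""
    matched, orphaned = {}, []
    for url in source_urls:
        key = index.get(normalize_url(url))
        if key is None:
            orphaned.append(url)
        else:
            matched[url] = key
    return matched, orphaned


# Hypothetical catalog and archive contents, for illustration only.
catalog = {
    "dataset": [
        {
            "identifier": "ds-001",
            "title": "Example dataset",
            "distribution": [
                {"downloadURL": "https://Example.gov/data/file.csv"}
            ],
        }
    ]
}
index = build_url_index(catalog, "example-agency")
matched, orphaned = match_files(
    [
        "https://example.gov/data/file.csv",
        "https://example.gov/data/other.csv",
    ],
    index,
)
```

In this sketch, URLs left in `orphaned` are the copies that still lack a metadata record and need manual follow-up. A real archive would likely need more careful normalization, for example to handle redirects or a change from `http` to `https`.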
Report Unlisted Datasets
For datasets that have been downloaded and have no known metadata record on Data.gov, please contact us to report them as unlisted datasets. Where metadata is available but the metadata record does not include the correct URL of the downloadable file, please report this as a new “Data Issue” using the orange button on the metadata listing on Data.gov.
Please contact Data.gov if you are hosting a large archive. We are exploring ways to support and coordinate archiving efforts.