15 Responses to “Cleaning Up Metadata Messiness”

  1. ted.habermann@noaa.gov

    Consistency in names across metadata collections is very important for the reasons that Chris describes. An approach that has worked fairly well in the Global Change Community was developed at NASA’s Global Change Master Directory (GCMD). It merges a slash-delimited list of acronyms with the spelled-out names of agencies and the groups within them. For example, the GCMD name for NOAA’s National Geophysical Data Center is DOC/NOAA/NESDIS/NGDC > National Geophysical Data Center, NESDIS, NOAA, U.S. Department of Commerce. This approach acknowledges the structure of Federal Agencies while supporting both acronym and full-text searches. GCMD uses this approach with other keyword sets as well.
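The naming scheme described above can be handled mechanically. The following is a minimal sketch, assuming the label always has the form "ACRONYMS > Full name" as in the NGDC example; the names `AgencyName`, `parse_gcmd_name` and `matches` are hypothetical and are not part of any GCMD tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgencyName:
    acronyms: tuple  # hierarchy from department down, e.g. ("DOC", "NOAA", ...)
    full_name: str   # spelled-out names, most specific unit first


def parse_gcmd_name(label: str) -> AgencyName:
    """Split an 'ACR/ACR/... > Full name' label into its two parts."""
    short, sep, long_name = label.partition(">")
    if not sep:
        raise ValueError(f"expected 'ACRONYMS > Full name', got {label!r}")
    acronyms = tuple(p.strip() for p in short.split("/") if p.strip())
    return AgencyName(acronyms, long_name.strip())


def matches(name: AgencyName, query: str) -> bool:
    """True if the query equals an acronym or occurs in the full name."""
    q = query.strip().lower()
    if not q:
        return False
    return any(q == a.lower() for a in name.acronyms) or q in name.full_name.lower()


ngdc = parse_gcmd_name(
    "DOC/NOAA/NESDIS/NGDC > National Geophysical Data Center, "
    "NESDIS, NOAA, U.S. Department of Commerce"
)
```

With this split, a search for "NESDIS" hits the acronym list and a search for "geophysical data" hits the spelled-out name, which is the dual behavior the comment describes.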

    • Chris Taylor

      This works for agencies that have been around for a while and have not been renamed or completely reorganized recently. The last time I looked, BSEE and BOEM were not on the list (though it’s been about a year since I looked). But even if they are there, should there be a cross-reference to the old name(s), so that someone who searches for BOEM and specifies a date range predating the actual change also gets records for BOEMRE and MMS?
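One way to picture the cross-reference asked about above is a name history with validity dates. This is only a sketch: the `LINEAGES` table and `names_for_search` function are hypothetical, and the dates are illustrative approximations rather than an authoritative record (BSEE, which also succeeded BOEMRE, is left out for brevity).

```python
from datetime import date

# Each lineage lists (name, first_day, last_day); None means open-ended.
# Dates are approximate and for illustration only.
LINEAGES = [
    [
        ("MMS", None, date(2010, 6, 17)),
        ("BOEMRE", date(2010, 6, 18), date(2011, 9, 30)),
        ("BOEM", date(2011, 10, 1), None),
    ],
]


def names_for_search(name, start, end):
    """Return every name in the same lineage that was in use at some
    point during the date range [start, end]."""
    for lineage in LINEAGES:
        if any(n == name for n, _, _ in lineage):
            return [
                n
                for n, first, last in lineage
                if (first is None or first <= end)
                and (last is None or last >= start)
            ]
    return [name]  # no known history: search the name as given


expanded = names_for_search("BOEM", date(2009, 1, 1), date(2012, 1, 1))
```

A search front end could expand the user's agency term this way before querying, so a date range that predates the reorganization still finds records filed under the older names.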

  2. williamlerwin@gmail.com

    Chris – have you looked at FIPS 95-1, Codes for the Identification of Federal and Federally-Assisted Organizations?

  3. scarebearsclan2@gmail.com

    Faceted search is highly dependent on metadata quality. To let users filter results by agency, we need either a standard way of describing agency names or a way to map the different labels that represent the same thing.
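The label-mapping option mentioned above might look like the following. This is a minimal sketch under stated assumptions: the `ALIASES` table, the `agency` record field and both function names are hypothetical, and a real catalog would need a much larger, curated alias list.

```python
from collections import Counter

# Hypothetical alias table: normalized label -> canonical facet value.
ALIASES = {
    "noaa": "NOAA",
    "national oceanic and atmospheric administration": "NOAA",
    "usgs": "USGS",
    "u.s. geological survey": "USGS",
    "us geological survey": "USGS",
}


def canonical_agency(label):
    """Map a free-text agency label to its canonical facet value."""
    key = " ".join(label.lower().split())  # lowercase, collapse whitespace
    # Unknown labels are kept visible so a curator can add them later.
    return ALIASES.get(key, "(unmapped) " + label.strip())


def agency_facet(records):
    """Count records per canonical agency for a facet sidebar."""
    return Counter(canonical_agency(r["agency"]) for r in records)


records = [
    {"agency": "NOAA"},
    {"agency": "National Oceanic and  Atmospheric Administration"},
    {"agency": "U.S. Geological Survey"},
    {"agency": "Dept. of Mysteries"},
]
facet = agency_facet(records)
```

Keeping unmapped labels in their own bucket, rather than dropping them, gives the catalog maintainers a worklist of names that still need a mapping.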

  4. muasamvui.com@gmail.com

    Improving search is so important for helping users find data more quickly.

  5. Anonymous

    As Chris notes, one of the problems is in the naming conventions. Unless we conform globally to a particular set of data, we will never achieve perfection. And to top that off, you have to spell it correctly. That is the biggest problem with boolean searches. Things are getting better, but words by themselves have no real meaning. There is no "data" behind them. This is a huge challenge, especially in our health care system. Keep up the research!

    Bob@scconcierge.net
    http://scconcierge.net

     

  6. Anonymous

    This community will help everyone collect the data. This is a great resource.

  7. Anonymous

    The structure of metadata needs to be standardized for this to become more efficient and useable.

    -Jay

  8. Anonymous

    I really appreciate the work you guys are doing!

  9. Anonymous

    It is very difficult to get data from government sites. Is it possible to collect data from govt sites?

  10. Anonymous

    I will say that it's great having ready access to such large data sets. I just wish that there was an API or some other way for developers to access it. That way we could easily build a website and make it available to thousands of users.

  11. Anonymous

    Faceted search is highly dependent on metadata quality. And to top that off, you have to spell it correctly. That is the biggest problem with boolean searches.

  12. Anonymous

    I wonder if one way of creating a standard for agency names (and thus data coming from them) is to model it on NYSE and NASDAQ by having unique 4 or 5 letter tickers. USGS and NOAA, for example, are set up already. Of course, it would require that subsets of those agencies get on board.

     

  13. isangil@lternet.edu

    The structure of the metadata can help only so much. Even if you were to adopt a more granular standard, you would run into trouble: it is all about how we populate the structure.

    Instead of one placeholder for an organization name or an individual (the <origin> or <publisher> placeholders in the FGDC's profiles and ESRI schemas), you may use child elements to help alleviate ambiguity. Still, metadata providers often submit erroneous information, let alone follow best practices or simple conventions (such as a convention for designating a federal agency).

    To avert this problem, some of us used controlled names and even synonyms (in Drupal taxonomies). We curtail the "creativity" in naming agencies, but we end up with more consistent metadata records.

    Curating legacy metadata is also 'fun'. We faced this problem while building the Oak Ridge based metadata repository, which is based on Mercury. The "origin" field contained every possible interpretation of what a "record creator" can be. See this excerpt:

    U.S. Fish & Wildlife Service,    1
    U.S. Fish & Wildlife Service, National Wetlands Inventory    162
    U.S. Fish & Wildlife Service, National Wetlands Inventory (NWI)    1
    U.S. Fish & Wildlife Service, Refuges & Wildlife, Region 6    3
    U.S. Fish & Wildlife Service, Refuges & Wildlife, Region 6, Division of Realty    91
    U.S. Fish & Wildlife Service, Refuges & Wildlife, Region 6, Division of Realty (Realty)    1
    U.S. Fish & Wildlife Service, Refuges and Wildlife, Region 6    1
    U.S. Fish & Wildlife Service, Refuges and Wildlife, Region 6, Division of Planning originated the Categorical Exclusion used for the August 2004 update of the Red Rock Lakes NWR digital approved acquisition boundary    1
    U.S. Fish & Wildlife Service, Refuges and Wildlife, Region 6, Division of Realty    67
    U.S. Fish & Wildlife Service, Refuges and Wildlife, Region 6, Division of Realty.    32
    U.S. Fish & Wildlife Service, Refuges and Wildlife, Region 6, Division of Realty. Deeds and other documents and maps with
    [[……]]

    The number to the right is the count of occurrences of that exact value in the main node of the clearinghouse, circa 2008.
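Much of the variation in the excerpt above is mechanical ("&" versus "and", trailing punctuation, parenthetical acronyms), so a first curation pass can be scripted. The sketch below is not the approach the repository actually used; `normalize_origin` and its rules are assumptions chosen for this excerpt, and free-text entries such as the Categorical Exclusion line would still need manual review.

```python
import re
from collections import Counter


def normalize_origin(raw):
    """Collapse trivial spelling variants of an 'origin' string."""
    text = raw.replace("&", "and")
    text = re.sub(r"\s*\([^)]*\)", "", text)  # drop "(NWI)", "(Realty)"
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text.rstrip(".,").strip()          # drop trailing "." or ","


# (origin string, occurrences) pairs copied from the excerpt above.
rows = [
    ("U.S. Fish & Wildlife Service,", 1),
    ("U.S. Fish & Wildlife Service, National Wetlands Inventory", 162),
    ("U.S. Fish & Wildlife Service, National Wetlands Inventory (NWI)", 1),
    ("U.S. Fish & Wildlife Service, Refuges & Wildlife, Region 6", 3),
    ("U.S. Fish & Wildlife Service, Refuges & Wildlife, Region 6, Division of Realty", 91),
    ("U.S. Fish & Wildlife Service, Refuges & Wildlife, Region 6, Division of Realty (Realty)", 1),
    ("U.S. Fish & Wildlife Service, Refuges and Wildlife, Region 6", 1),
    ("U.S. Fish & Wildlife Service, Refuges and Wildlife, Region 6, Division of Realty", 67),
    ("U.S. Fish & Wildlife Service, Refuges and Wildlife, Region 6, Division of Realty.", 32),
]

merged = Counter()
for raw, count in rows:
    merged[normalize_origin(raw)] += count
```

On these nine rows the pass leaves four distinct origins, which shows how much of the apparent variety is punctuation rather than a genuinely different creator.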

    This was and is a big problem. We took a stab at it, but the USGS project folded unceremoniously (NBII anyone?)

    I feel your pain. Our Drupal based solutions attempt to reduce this entropy of naming, with some success.

    Cheers, inigo

  14. Michael Ierardi

    Chris:
    Thanks for the analysis. A little analysis can go a long way… and communication among users is a powerful tool. I’m not surprised most Fed agencies are using the FGDC compliant schema, since they probably have been utilizing metadata a bit longer than most other agencies. I will implement a consistent naming convention for the tag throughout our metadata to help eliminate the problem with syncing to data.gov. Though my metadata changes are immediate on the node, my concern is how well your harvesting mechanism on geodata will pick up the changes. Not sure how you tracked me down via this email list, but be sure to keep me informed and updated on any new metadata status you come across. I have implemented an internal Waterportal, and through this interface I have identified existing concerns within some of our historical metadata created 2-3 metadata master back that we are updating.

Comments are closed.