Maintaining Quality Metadata: Toward Effective Metadata Management
University of North Texas Library Digital Initiatives program.
Started a number of collaborative projects with other state libraries and museums.
Congressional Research Service archive
World War Poster Archive
Electronic Theses and Dissertations
http://www.library.unt.edu for others.
Built a metadata management system to establish a standard across all of these different tools. Started with qualified Dublin Core-based descriptive metadata.
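The notes don't show what a record looks like, but a qualified Dublin Core descriptive record could be sketched as below. The field values and the `values_for` helper are invented for illustration; the element and qualifier names follow standard Dublin Core usage, not necessarily UNT's exact schema.

```python
# Hypothetical sketch of one qualified Dublin Core descriptive record,
# modeled as a dict mapping element -> list of (qualifier, value) pairs.
# Values here are invented for illustration.
record = {
    "title": [(None, "World War I recruitment poster")],
    "creator": [("per", "Unknown")],
    "date": [("creation", "1917")],
    "subject": [("LCSH", "World War, 1914-1918 -- Posters")],
    "rights": [("access", "public")],
}

def values_for(rec, element):
    """Return all values recorded for a given element, ignoring qualifiers."""
    return [value for _, value in rec.get(element, [])]
```

Modeling each element as a list keeps repeatable elements (multiple subjects, multiple creators) natural.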
Saw two aspects of digital library data quality:
- The quality of the data in the objects themselves
- The quality of the metadata associated with the objects.
Focus on metadata quality.
Poor metadata quality leads to:
- Poor recall
- Poor precision and inconsistent search results
Most common errors:
- Incorrect data: letter transposition, omission, insertion, substitution, or mis-strokes.
- Missing data: elements and values not present at all.
Other quality challenges:
- Local requirements
- Object heterogeneity
- Very diverse user group
- Digital rights issues
- Training issues: the expertise necessary to create and manage metadata
All of these issues led to metadata quality assurance mechanisms and tools.
Two main tools:
Pre-ingest: metadata creation tools. Validate the mandatory elements, establish templates, use controlled vocabularies (UNTLBS). Templates are applied from the outside onto the content: a web template creator, an external set of rules, and the ability to link each field out to guidelines for how to apply it.
Have different templates for each collection, and a set of 5 attributes that are applied to all entries regardless of the collection.
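The pre-ingest validation described above (mandatory elements plus controlled vocabularies, defined per collection template) could be sketched like this. The `TEMPLATE` contents, the `validate` function, and the vocabulary terms are all hypothetical, not UNT's actual tooling.

```python
# Sketch of pre-ingest validation. A template holds the collection's
# mandatory element names and any per-element controlled vocabulary.
# All names and terms below are invented for illustration.

TEMPLATE = {
    "mandatory": {"title", "date", "rights"},
    "vocabularies": {
        # e.g. an access-rights vocabulary
        "rights": {"public", "restricted"},
    },
}

def validate(record, template):
    """Return a list of problems; an empty list means the record passes.

    A record is a dict mapping element name -> list of string values.
    """
    problems = []
    for element in template["mandatory"]:
        if not record.get(element):
            problems.append(f"missing mandatory element: {element}")
    for element, vocab in template["vocabularies"].items():
        for value in record.get(element, []):
            if value not in vocab:
                problems.append(f"{element}: {value!r} not in controlled vocabulary")
    return problems
```

Running this at creation time, before ingest, is what lets errors be rejected rather than discovered later in the repository.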
Post-ingest: metadata evaluation tools. These examine all items in a collection and identify which metadata fields are not being completed, which lets them target training and other fixes. The tools show how the metadata is actually being applied; depending on its quality, different services can be built on each metadata field. Areas tracked:
- Null values
- Records added per time period
- A clickable map of Texas (clicking an area brings back all items submitted from that area)
- A tag cloud based on terms
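The post-ingest evaluation described above boils down to scanning every record and tallying gaps and term frequencies. A minimal sketch, with invented function names and sample fields:

```python
from collections import Counter

# Sketch of post-ingest evaluation: scan all records in a collection,
# tally which fields are null/empty (to target training), and compute
# term frequencies for a tag cloud. Records are dicts mapping
# field name -> list of string values; names are illustrative only.

def field_gap_report(records, fields):
    """Count, per field, how many records leave it null or empty."""
    gaps = Counter()
    for rec in records:
        for field in fields:
            if not rec.get(field):
                gaps[field] += 1
    return gaps

def tag_cloud(records, field="subject"):
    """Term frequencies for the given field across the collection."""
    return Counter(term for rec in records for term in rec.get(field, []))
```

A report like this is what makes it possible to see, per collection, which fields are being skipped and to direct training there.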
To implement this, look at the level of quality required, the nature of your gap and how to close it, and compromise when needed.
Total size of collection is ~90,000 items.