Data remediation is one of the most important tasks we perform in our museum collection management work. Data remediation is the process of updating, correcting, or organizing data so that it can achieve its intended purpose.
This means the data already exists in a spreadsheet, database, or Collections Management System (CMS) but is missing critical elements, is inaccurate, or is in the wrong field. We rely on data to inform the decisions we make regarding collections care, management, display, and access. When data is missing, incorrect, or not where it is supposed to be, it affects our work and can put the collections at risk of damage or loss.
Data Schema and Content Standards
As museum professionals, we rely on data schema and data content standards to dictate what fields we use, which data goes in which field, and how the data should be entered into each field. It is important to be familiar with the data schema and content standards your museum uses; these are typically found in the museum CMS. The Dublin Core data schema is one of the most widely adopted data schemas, with a straightforward approach to capturing core data across all collection types. It is also the baseline schema that the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) requires repositories to support, which gives collection data the ability to be “harvested” and placed in a collaborative portal. As for data content standards, the Cataloging Cultural Objects (CCO) guide is commonly used to dictate descriptive standards for museum artifacts. With these two standards in mind, we can now review the most important fields in the museum catalog and the common messes we find within them.
Identifier: This is a unique identifier field. This field is intended to capture a unique identification number tied to the object being described.
Common Messes: Missing data, inaccurate or obsolete accession numbers.
Publisher: The name of the archives, heritage organization, museum, or historical society where the materials reside on a permanent basis.
Common Messes: Informal or shortened names for the organization.
Title: This is the title or name of the object. Some repositories choose to use journalism-style titling, while others prefer simple titles.
Common Messes: Inconsistent title styles, inaccurate titles, titles that are descriptive sentences, vague titles.
Creator: The name(s) of the people and/or organization responsible for creating the item being described. Depending on the descriptive standards your heritage organization follows, the creator name can be entered as Last Name, First Name; as First Name Last Name; or both, followed by the birth and death dates of the creator (if a person and if known).
Common Messes: Missing data (if the creator is unknown, then “Unknown” should be used), inconsistent creator name formation, multiple creator name variations (same creator, multiple name authority records).
Date: The date span represented by the earliest and latest dates of creation for the item being described. If the date is unknown, review the item for any context clues that could indicate an era or date approximation. The most common expressions are YYYY-MM-DD and MM-DD-YYYY. If an approximate date is needed, then “circa” should precede the estimated date, for example: circa 1990s.
Common Messes: Missing data, especially when a “circa” date can be determined.
Type: The type of item or object classification, typically a hierarchical vocabulary.
Common Messes: No classification selected. There should always be a classification selected as this is obvious data that can be discerned from reviewing the object.
Format: The format of the item being described along with any accompanying dimensions.
Common Messes: Missing data. At the very least, dimensions should be captured.
Description: A description of the subjects represented within the item, the historical or cultural significance, the predominant formats of the materials used, and an overall sense of size.
Common Messes: Missing, inaccurate, or incomplete data.
Language of Materials: Indicate (as appropriate) the language(s) found within the item.
Common Messes: Missing data. Even if it seems irrelevant, any viewable language should be captured.
Subject: Select subjects that help to describe the item and help external audiences find the item by browsing by subject. The subject area is intended to capture the subject matter of the content depicted by the image or in the document, as well as the format of the material. Subjects are usually a mix of hierarchical controlled vocabularies (e.g. Nomenclature) as well as local vocabulary terms.
Common Messes: No subjects selected, or so few and broad subjects that they are minimally helpful in discovery searches.
Rights Management: The rights statement specific to the item or collection and the ways external audiences may or may not use it. For example, copyright or Creative Commons licenses.
Common Messes: No rights statement provided. There is language for every imaginable rights situation so there should always be a rights statement provided.
Relation: A link or citation of other collection items related to the item being described.
Common Messes: Missing links to related data that exists inside the home museum as well as at related peer museums. This is the most often skipped required field.
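For readers who can export their catalog to a spreadsheet or similar file, two of the most common messes above (missing data and identifiers that are not unique) can be checked with a short script. The following is only a minimal sketch in Python: the lowercase field names, the sample records, and the function names are assumptions made for illustration, not the export format of any particular CMS, and a real audit would need rules that match your museum's own schema and content standards.

```python
from collections import Counter

# Hypothetical field names based on the fields reviewed above;
# a real CMS export will likely label its columns differently.
REQUIRED_FIELDS = [
    "identifier", "publisher", "title", "creator", "date", "type",
    "format", "description", "language", "subject", "rights", "relation",
]


def find_messes(record):
    """Return a list of required fields that are absent or blank in one record."""
    messes = []
    for field in REQUIRED_FIELDS:
        value = str(record.get(field) or "").strip()
        if not value:
            messes.append(f"{field}: missing data")
    return messes


def find_duplicate_identifiers(records):
    """Return identifiers that appear on more than one record, sorted."""
    counts = Counter(str(r.get("identifier") or "").strip() for r in records)
    return sorted(i for i, n in counts.items() if i and n > 1)


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        {"identifier": "1990.12.1", "title": "Quilt", "creator": "Unknown"},
        {"identifier": "1990.12.1", "title": "", "creator": ""},
    ]
    for record in sample:
        print(record.get("identifier"), find_messes(record))
    print("Duplicate identifiers:", find_duplicate_identifiers(sample))
```

A report like this only finds where data is absent or duplicated. Judging whether a title is vague or a rights statement is accurate still requires a person to review the record against the object.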
To be as efficient as possible at our jobs, we require accurate data. While data remediation can feel laborious or overwhelming, there are strategies we can use to assist us in this process. Next week we will begin a review of strategies you can use to start your data cleanup.
Rachael Cristine Woody
If you’d like to learn more, please join us for “Evaluating the Shape of Museum Data”, presented by Rachael Woody TODAY, April 5, 2023 at 11 a.m. Pacific, 2 p.m. Eastern. (Can’t make it? Register anyway and we will send you a link to the recording and slides afterwards). Register now or call 604-278-6717.