Ask Me Anything (AMA): How to Conduct Efficient Data Cleanup
Rachael Cristine Woody
Continuing our follow up to the “Ask Me Anything: How to Prepare for a New CMS” webinar, this post will address questions we received in order to provide additional insight, strategies, and resources regarding data clean up.
Today’s post is about how to conduct efficient data cleanup in preparation for a new museum Collections Management System (CMS). Specifically, questions regarding Microsoft Excel and similar data query and transformation tools.
Q. Are there any recommended queries to run or tools to use in Excel to help streamline the process?
First, begin with the tried-and-true Excel tools. If you’re not familiar with Excel or its more advanced functions, testing with a set of copied or dummy data will allow you to execute functions and see their effect. A lot of data transformation experts start with and continue to use test data and experimentation to hone their efficacy in data remediation (aka cleanup).
Option 1: Excel Spreadsheet
- Freeze views: will allow you to freeze columns and/or rows so that you can compare expansive data with identifying data.
- Filters: will support the filtering of data so that you can focus on specific areas of cleanup where the cleanup needs are likely the same.
- Find and replace: will help you both find and (if desired) replace data that may be inaccurate or incomplete. This is a huge timesaver for repeat data that needs to be updated.
- Remove duplicates: will help identify and give you the option to remove duplicative data.
- Sorting: will allow you to sort data into sets that share similar characteristics.
- Text to columns: will allow you to split up multiple values of data within one cell (aka catalog field).
If you’re more familiar with Excel or wish to learn about some of the more powerful functionality, then launch Excel Power Query Editor from the Excel menu and select your data spreadsheet as the source. There are hundreds, if not thousands, of helpful instructions, forums, and how to videos on all available functions. In addition to playing with test data, it’s worth learning more about each of these functions as they can do quite a bit of heavy lifting in the data remediation department.
Option 2: Excel Power Query Editor
- Create a step or query script: A complex multi-step cleanup process can be tracked and augmented at any step, as well as be saved and replicated with other data sets.
- Change the data type: will allow you to select a different type of data format, such as text, numbers, currency, date, etc.
- Group data: will support the gathering of similar data based on the isolated characteristics of the data.
- Mark data as a key: will allow the identification of a key that holds a relationship between data tables.
- Mass adjust formatting or cleanse text: will force or correct a data format change or remove control characters from the text value.
- Split data with selection of how: will allow you to split up multiple values of data within one cell (aka catalog field); however, with more controlled nuance than available in basic Excel. It will also allow you to extract text before, between, or after delimiters.
- Merge columns with selection of how: Similarly, similar data spread across several cells can be joined with a dictation of how.
- Merge queries with different join kinds: will allow the specification of which join criteria can be imposed with merging data from different tables of data.
Q. Can you recommend programs that help conduct efficient data cleanup?
In addition to Excel, consider Open Refine—an open-source tool. The functionality is somewhat similar to Excel; however there are nuances in how the data can be isolated and controlled, and you may find one tool isa better fit for your work than another.
Open Refine functionality includes:
- Filters: will support the filtering of data so that you can focus on specific areas of cleanup where the cleanup needs are likely the same.
- Cell formatting: will allow you to select a different type of data format, such as text, numbers, currency, date, etc.
- Trimming whitespace: will trim invisible but troublesome white spaces, such as extra spaces within the data.
- Identical text changes: supports the identification of similar data and offering a blanket change to all related data, not just the cell you’re working on.
- Grouping by facet: will support the gathering of similar data based on the isolated characteristics of the data.
- Split or join multi-valued cells: will allow you to split up multiple values of data within one cell (aka catalog field); however, with more controlled nuance than available in basic Excel. Similarly, similar data spread across several cells can be joined.
- Cluster and edit: facilitates the suggesting clustering of data for mass reconciliation.
- Suggested reconciliation: informed by the data, this function recommends potential data reconciliation.
Conclusion
Data cleanup is perhaps the hardest part of preparing for migration to a new CMS. It can take a long time and a lot of your focused attention. However, with the right tools the cleanup process can be made lighter. And, you may be surprised just how handy these data cleanup tools are for other areas of your spreadsheet usage.
Additional Reading
This is part of an Ask Me Anything series, a capstone to months’ worth of posts and webinars. For more details on the content we’ve covered, you may find the following Think Clearly posts of interest:
Common Museum Data Messes to Look For
Create a Plan for Museum Data Cleanup
How to Prepare for Museum Data Migration
Items Required for Successful Data Migration
Make a Plan for Non-Standard Museum Data
Rachael Cristine Woody
Rachael Woody advises on museum strategies, digital museums, collections management, and grant writing for a wide variety of clients. She has authored several titles published by Lucidea Press, including Museum Digital Projects and You. Where to Begin? Rachael is a regular contributor to the Think Clearly blog and a popular presenter.
Never miss another post. Subscribe today!
Similar Posts
Storytelling to Inspire Reflection Using Museum Collections Online
Storytelling with online collections is impactful, whether we choose online-only or as part of a hybrid approach to museum exhibitions.
Museum Collections Online: Learning Through Storytelling
Digitizing museum collections introduces new and engaging opportunities for storytelling. By leveraging digital surrogates—essentially online representations of physical objects—museums can enhance how they present narratives and information to audiences.
The Role of Museum Collections Online in Storytelling & Audience Engagement
Storytelling with museum collections online allows for a great degree of flexibility, offers additional detail, and lends a dynamism that is difficult to produce within a physical exhibition.
Examples of How Archives Can Be Used to Elevate Museum Collections
Last week we reviewed how archives can enhance museum collections online. This week will continue our work with an examination of specific examples, including what items different types of archives may contain and where to capture this data.
Leave a Comment
Comments are reviewed and must adhere to our comments policy.
0 Comments