Ask Me Anything (AMA): How to Conduct Efficient Data Cleanup
Rachael Cristine Woody
Continuing our follow up to the “Ask Me Anything: How to Prepare for a New CMS” webinar, this post will address questions we received in order to provide additional insight, strategies, and resources regarding data clean up.
Today’s post is about how to conduct efficient data cleanup in preparation for a new museum Collections Management System (CMS). Specifically, questions regarding Microsoft Excel and similar data query and transformation tools.
Q. Are there any recommended queries to run or tools to use in Excel to help streamline the process?
First, begin with the tried-and-true Excel tools. If you’re not familiar with Excel or its more advanced functions, testing with a set of copied or dummy data will allow you to execute functions and see their effect. A lot of data transformation experts start with and continue to use test data and experimentation to hone their efficacy in data remediation (aka cleanup).
Option 1: Excel Spreadsheet
- Freeze views: will allow you to freeze columns and/or rows so that you can compare expansive data with identifying data.
- Filters: will support the filtering of data so that you can focus on specific areas of cleanup where the cleanup needs are likely the same.
- Find and replace: will help you both find and (if desired) replace data that may be inaccurate or incomplete. This is a huge timesaver for repeat data that needs to be updated.
- Remove duplicates: will help identify and give you the option to remove duplicative data.
- Sorting: will allow you to sort data into sets that share similar characteristics.
- Text to columns: will allow you to split up multiple values of data within one cell (aka catalog field).
If you’re more familiar with Excel or wish to learn about some of the more powerful functionality, then launch Excel Power Query Editor from the Excel menu and select your data spreadsheet as the source. There are hundreds, if not thousands, of helpful instructions, forums, and how to videos on all available functions. In addition to playing with test data, it’s worth learning more about each of these functions as they can do quite a bit of heavy lifting in the data remediation department.
Option 2: Excel Power Query Editor
- Create a step or query script: A complex multi-step cleanup process can be tracked and augmented at any step, as well as be saved and replicated with other data sets.
- Change the data type: will allow you to select a different type of data format, such as text, numbers, currency, date, etc.
- Group data: will support the gathering of similar data based on the isolated characteristics of the data.
- Mark data as a key: will allow the identification of a key that holds a relationship between data tables.
- Mass adjust formatting or cleanse text: will force or correct a data format change or remove control characters from the text value.
- Split data with selection of how: will allow you to split up multiple values of data within one cell (aka catalog field); however, with more controlled nuance than available in basic Excel. It will also allow you to extract text before, between, or after delimiters.
- Merge columns with selection of how: Similarly, similar data spread across several cells can be joined with a dictation of how.
- Merge queries with different join kinds: will allow the specification of which join criteria can be imposed with merging data from different tables of data.
Q. Can you recommend programs that help conduct efficient data cleanup?
In addition to Excel, consider Open Refine—an open-source tool. The functionality is somewhat similar to Excel; however there are nuances in how the data can be isolated and controlled, and you may find one tool isa better fit for your work than another.
Open Refine functionality includes:
- Filters: will support the filtering of data so that you can focus on specific areas of cleanup where the cleanup needs are likely the same.
- Cell formatting: will allow you to select a different type of data format, such as text, numbers, currency, date, etc.
- Trimming whitespace: will trim invisible but troublesome white spaces, such as extra spaces within the data.
- Identical text changes: supports the identification of similar data and offering a blanket change to all related data, not just the cell you’re working on.
- Grouping by facet: will support the gathering of similar data based on the isolated characteristics of the data.
- Split or join multi-valued cells: will allow you to split up multiple values of data within one cell (aka catalog field); however, with more controlled nuance than available in basic Excel. Similarly, similar data spread across several cells can be joined.
- Cluster and edit: facilitates the suggesting clustering of data for mass reconciliation.
- Suggested reconciliation: informed by the data, this function recommends potential data reconciliation.
Conclusion
Data cleanup is perhaps the hardest part of preparing for migration to a new CMS. It can take a long time and a lot of your focused attention. However, with the right tools the cleanup process can be made lighter. And, you may be surprised just how handy these data cleanup tools are for other areas of your spreadsheet usage.
Additional Reading
This is part of an Ask Me Anything series, a capstone to months’ worth of posts and webinars. For more details on the content we’ve covered, you may find the following Think Clearly posts of interest:
Common Museum Data Messes to Look For
Create a Plan for Museum Data Cleanup
How to Prepare for Museum Data Migration
Items Required for Successful Data Migration
Make a Plan for Non-Standard Museum Data
Rachael Cristine Woody
Rachael Woody advises on museum strategies, digital museums, collections management, and grant writing for a wide variety of clients. She has authored several titles published by Lucidea Press, including Museum Digital Projects and You. Where to Begin? Rachael is a regular contributor to the Think Clearly blog and a popular presenter.
Never miss another post. Subscribe today!
Similar Posts
Demand-Driven Museum Story Inspiration
Let’s look at the numbers and consider taking inspiration from what our visitors are most interested in. This demand-driven approach can be incredibly informative and can offer fodder for stories in perpetuity.
Reimagining “Impossible” Museum Exhibitions as Story Inspiration
Online exhibits lack the constraints that can make it impossible to relate stories in a physical exhibition and can inspire us to share in new ways
Exhibitions of Past, Present, and Future as Story Inspiration
In a previous post, I covered how the flexibility of collections online can inspire online storytelling. This week we’ll tap into another source of storytelling inspiration: exhibitions of the past, present, and future.
Storytelling to Inspire Reflection Using Museum Collections Online
Storytelling with online collections is impactful, whether we choose online-only or as part of a hybrid approach to museum exhibitions.
Leave a Comment
Comments are reviewed and must adhere to our comments policy.
0 Comments