Ask Me Anything (AMA): How to Conduct Efficient Data Cleanup

Rachael Cristine Woody

Rachael Cristine Woody

January 17, 2024

Continuing our follow up to the “Ask Me Anything: How to Prepare for a New CMS” webinar, this post will address questions we received in order to provide additional insight, strategies, and resources regarding data clean up. 

Today’s post is about how to conduct efficient data cleanup in preparation for a new museum Collections Management System (CMS). Specifically, questions regarding Microsoft Excel and similar data query and transformation tools.

Q. Are there any recommended queries to run or tools to use in Excel to help streamline the process?

First, begin with the tried-and-true Excel tools. If you’re not familiar with Excel or its more advanced functions, testing with a set of copied or dummy data will allow you to execute functions and see their effect. A lot of data transformation experts start with and continue to use test data and experimentation to hone their efficacy in data remediation (aka cleanup).

Option 1: Excel Spreadsheet

  • Freeze views: will allow you to freeze columns and/or rows so that you can compare expansive data with identifying data.
  • Filters: will support the filtering of data so that you can focus on specific areas of cleanup where the cleanup needs are likely the same.
  • Find and replace: will help you both find and (if desired) replace data that may be inaccurate or incomplete. This is a huge timesaver for repeat data that needs to be updated.
  • Remove duplicates: will help identify and give you the option to remove duplicative data.
  • Sorting: will allow you to sort data into sets that share similar characteristics.
  • Text to columns: will allow you to split up multiple values of data within one cell (aka catalog field).

If you’re more familiar with Excel or wish to learn about some of the more powerful functionality, then launch Excel Power Query Editor from the Excel menu and select your data spreadsheet as the source. There are hundreds, if not thousands, of helpful instructions, forums, and how to videos on all available functions. In addition to playing with test data, it’s worth learning more about each of these functions as they can do quite a bit of heavy lifting in the data remediation department.

Option 2: Excel Power Query Editor

  • Create a step or query script: A complex multi-step cleanup process can be tracked and augmented at any step, as well as be saved and replicated with other data sets.
  • Change the data type: will allow you to select a different type of data format, such as text, numbers, currency, date, etc.
  • Group data: will support the gathering of similar data based on the isolated characteristics of the data.
  • Mark data as a key: will allow the identification of a key that holds a relationship between data tables.
  • Mass adjust formatting or cleanse text: will force or correct a data format change or remove control characters from the text value.
  • Split data with selection of how: will allow you to split up multiple values of data within one cell (aka catalog field); however, with more controlled nuance than available in basic Excel. It will also allow you to extract text before, between, or after delimiters.
  • Merge columns with selection of how: Similarly, similar data spread across several cells can be joined with a dictation of how.
  • Merge queries with different join kinds: will allow the specification of which join criteria can be imposed with merging data from different tables of data.

Q. Can you recommend programs that help conduct efficient data cleanup?

In addition to Excel, consider Open Refine—an open-source tool. The functionality is somewhat similar to Excel; however there are nuances in how the data can be isolated and controlled, and you may find one tool isa better fit for your work than another.

Open Refine functionality includes:

  • Filters: will support the filtering of data so that you can focus on specific areas of cleanup where the cleanup needs are likely the same.
  • Cell formatting: will allow you to select a different type of data format, such as text, numbers, currency, date, etc.
  • Trimming whitespace: will trim invisible but troublesome white spaces, such as extra spaces within the data.
  • Identical text changes: supports the identification of similar data and offering a blanket change to all related data, not just the cell you’re working on.
  • Grouping by facet: will support the gathering of similar data based on the isolated characteristics of the data.
  • Split or join multi-valued cells: will allow you to split up multiple values of data within one cell (aka catalog field); however, with more controlled nuance than available in basic Excel. Similarly, similar data spread across several cells can be joined.
  • Cluster and edit: facilitates the suggesting clustering of data for mass reconciliation.
  • Suggested reconciliation: informed by the data, this function recommends potential data reconciliation.

Conclusion

Data cleanup is perhaps the hardest part of preparing for migration to a new CMS. It can take a long time and a lot of your focused attention. However, with the right tools the cleanup process can be made lighter. And, you may be surprised just how handy these data cleanup tools are for other areas of your spreadsheet usage.

Additional Reading

This is part of an Ask Me Anything series, a capstone to months’ worth of posts and webinars. For more details on the content we’ve covered, you may find the following Think Clearly posts of interest: 

Common Museum Data Messes to Look For

Create a Plan for Museum Data Cleanup

How to Prepare for Museum Data Migration

Items Required for Successful Data Migration

Make a Plan for Non-Standard Museum Data

Strategies for Effective Museum Data Cleanup – Part 1

Strategies for Effective Museum Data Cleanup – Part 2

Rachael Cristine Woody

Rachael Cristine Woody

Rachael Woody advises on museum strategies, digital museums, collections management, and grant writing for a wide variety of clients. She has authored several titles published by Lucidea Press, including Museum Digital Projects and You. Where to Begin? Rachael is a regular contributor to the Think Clearly blog and a popular presenter.

Similar Posts

Leave a Comment

Comments are reviewed and must adhere to our comments policy.

0 Comments

Pin It on Pinterest

Share This