Interview with Susan Walsh:
Dirty Data, AI, and the 2nd Edition of “Between the Spreadsheets”

Lauren Hays

September 23, 2025

Susan Walsh, widely known as “The Classification Guru,” is the author of Between the Spreadsheets: Classifying and Fixing Dirty Data. Facet Publishing has just released a second edition of the book, which expands on her COAT framework for evaluating and managing data.

I first interviewed Susan in 2022 about the original edition, and we recently reconnected to discuss how the world of data classification has changed and why she felt it was time for an updated version of Between the Spreadsheets.

In the following interview, she shares what’s new in this edition, how generative AI is reshaping the field, and why clean data matters more than ever for information professionals, including special librarians. 

Please introduce yourself to our readers.

I am Susan Walsh, also known as “The Classification Guru, Fixer of Dirty Data.” I am the founder and MD of The Classification Guru Ltd, where we specialize in cleaning, classifying, and transforming messy data. While we started working mainly with procurement, our services now span various departments from finance to marketing—whoever needs us. There is dirty data everywhere! 

I am passionate about clean and accurate data, and I created the COAT framework (making your data Consistent, Organized, Accurate, and Trustworthy) to help organizations manage their data more effectively. I am also a global speaker, TEDx presenter, course trainer for Pluralsight and O’Reilly, and the author of Between the Spreadsheets: Classifying and Fixing Dirty Data, with a 2nd edition that has just come out, and a new Optimizing Sales & Marketing Data book coming out in 2026. 

Briefly summarize Between the Spreadsheets: Classifying and Fixing Dirty Data.

Between the Spreadsheets is a practical and accessible guide to cleaning and classifying dirty data. It draws on my years of experience fixing real-world data problems and aims to demystify the process with straightforward advice, relatable examples, and a bit of humor. The book introduces my COAT framework (Consistent, Organized, Accurate, and Trustworthy) as a way to evaluate and improve data quality. It is designed for anyone who works with data, taking you through the whole process from the why to the how, whether you are a seasoned analyst or just starting out.

Why is data important for special librarians?

Special librarians are information professionals, and data is just another form of information, one that increasingly underpins decision-making, research, and operations. Clean, well-structured data helps librarians manage collections, assess usage, support evidence-based decisions, and even advocate for funding. Poor-quality data can lead to misinformation, missed opportunities, and inefficiencies. In today’s digital world, being data-savvy is a key skill for special librarians. 

Why did you decide to update the book?

Since the first edition was published in 2021, the data landscape has evolved dramatically, especially with the rise of generative AI, which did not even exist back then. Moreover, there is a growing awareness of data ethics, governance, and automation. I wanted the book to reflect these changes and stay relevant to both new and returning readers. I have learned a lot through my continued work and wanted to share those insights and case studies of projects I have worked on so others can learn from them. 

What is new in this second edition?

The second edition expands on the original content with deeper dives into data quality challenges, e.g., a new chapter dedicated to the impact of AI and automation, showing examples of where AI does and does not work. There is also a brand new chapter with two case studies where I walk through our whole process, and even some brand new data horror stories. The tone remains practical and engaging, but the scope is broader to reflect how the data world has grown. 

How is generative AI impacting data? How do you see it affecting data in the future?

Generative AI is both a blessing and a curse for data. Honestly, in many instances, it is making data dirtier! This happens when it has learned from unclean data sets and no one has checked the data before it was used for training.

There are, of course, areas where it is far more successful than others, and I cover this with examples in the new book. For example, it’s making it easier to automate processes, including some data cleansing and transformation; however, we are not there yet in the area of spend data classification. 

Looking ahead, I see AI becoming more integrated into everyday data work, with agentic AI tools helping manage and maintain data quality proactively. Human oversight will remain critical; AI is a tool, not a substitute for understanding your data’s context. 

Is there anything else you would like to share?

Yes! Make sure your data has its COAT on; it needs to be Consistent, Organized, Accurate, and Trustworthy. I always say data does not have to be boring, and whether you are a librarian, analyst, or business leader, clean data is empowering. It can drive better decisions, save time and money, and reduce frustration. So, embrace the messy spreadsheets; there is power in fixing them. If you need help, there’s a whole community of data professionals (including me!) ready to support you. 

Lauren Hays

Librarian Dr. Lauren Hays is an Associate Professor of Instructional Technology at the University of Central Missouri, and a frequent presenter and interviewer on topics related to libraries and librarianship. Please read Lauren’s other posts relevant to special librarians. Learn about Lucidea’s powerful integrated library system, SydneyDigital.

**Disclaimer: Any in-line promotional text does not imply Lucidea product endorsement by the author of this post.
