Interview with Susan Walsh:
Dirty Data, AI, and the 2nd Edition of “Between the Spreadsheets”
Lauren Hays
Susan Walsh, widely known as “The Classification Guru,” is the author of Between the Spreadsheets: Classifying and Fixing Dirty Data. Facet Publishing has just released a second edition of the book, which expands on her COAT framework for evaluating and managing data.
I first interviewed Susan in 2022 about the original edition, and we recently reconnected to discuss how the world of data classification has changed and why she felt it was time for an updated version of Between the Spreadsheets.
In the following interview, she shares what’s new in this edition, how generative AI is reshaping the field, and why clean data matters more than ever for information professionals, including special librarians.
Please introduce yourself to our readers.
I am Susan Walsh, also known as “The Classification Guru, Fixer of Dirty Data.” I am the founder and MD of The Classification Guru Ltd, where we specialize in cleaning, classifying, and transforming messy data. While we started working mainly with procurement, our services now span various departments from finance to marketing—whoever needs us. There is dirty data everywhere!
I am passionate about clean and accurate data, and I created the COAT framework (making your data Consistent, Organized, Accurate, and Trustworthy) to help organizations manage their data more effectively. I am also a global speaker, TEDx presenter, course trainer for Pluralsight and O’Reilly, and the author of Between the Spreadsheets: Classifying and Fixing Dirty Data, whose 2nd edition has just come out. A new book, Optimizing Sales & Marketing Data, is coming out in 2026.
Briefly summarize Between the Spreadsheets: Classifying and Fixing Dirty Data.
Between the Spreadsheets is a practical and accessible guide to cleaning and classifying dirty data. It draws on my years of experience fixing real-world data problems and aims to demystify the process with straightforward advice, relatable examples, and a bit of humor. The book introduces my COAT framework (Consistent, Organized, Accurate, and Trustworthy) as a way to evaluate and improve data quality. It is designed for anyone who works with data, whether a seasoned analyst or someone just starting out, and takes the reader through the whole process from the why to the how.
Why is data important for special librarians?
Special librarians are information professionals, and data is just another form of information, one that increasingly underpins decision-making, research, and operations. Clean, well-structured data helps librarians manage collections, assess usage, support evidence-based decisions, and even advocate for funding. Poor-quality data can lead to misinformation, missed opportunities, and inefficiencies. In today’s digital world, being data-savvy is a key skill for special librarians.
Why did you decide to update the book?
Since the first edition was published in 2021, the data landscape has evolved dramatically, especially with the rise of generative AI, which was not widely available back then. Moreover, there is a growing awareness of data ethics, governance, and automation. I wanted the book to reflect these changes and stay relevant to both new and returning readers. I have also learned a lot through my continued work and wanted to share those insights, along with case studies of projects I have worked on, so others can learn from them.
What is new in this second edition?
The second edition expands on the original content with deeper dives into data quality challenges, including a new chapter dedicated to the impact of AI and automation that shows examples of where AI does and does not work. There is also a brand new chapter with two case studies where I walk through our whole process, and even some brand new data horror stories. The tone remains practical and engaging, but the scope is broader to reflect how the data world has grown.
How is generative AI impacting data? How do you see it affecting data in the future?
Generative AI is both a blessing and a curse for data. Honestly, in many instances, it is making data dirtier! This happens when it has learned from unclean data sets that no person checked before they were used for training.
There are, of course, areas where it is far more successful than others, and I cover this with examples in the new book. For example, it’s making it easier to automate processes, including some data cleansing and transformation; however, we are not there yet in the area of spend data classification.
Looking ahead, I see AI becoming more integrated into everyday data work, with agentic AI tools helping manage and maintain data quality proactively. Human oversight will remain critical; AI is a tool, not a substitute for understanding your data’s context.
Is there anything else you would like to share?
Yes! Make sure your data has its COAT on; it needs to be Consistent, Organized, Accurate, and Trustworthy. I always say data does not have to be boring, and whether you are a librarian, analyst, or business leader, clean data is empowering. It can drive better decisions, save time and money, and reduce frustration. So, embrace the messy spreadsheets; there is power in fixing them. If you need help, there’s a whole community of data professionals (including me!) ready to support you.
Lauren Hays
Librarian Dr. Lauren Hays is an Associate Professor of Instructional Technology at the University of Central Missouri, and a frequent presenter and interviewer on topics related to libraries and librarianship.