Susan Walsh is the author of Between the Spreadsheets: Classifying and Fixing Dirty Data, available now from ALA Publications. My interview with her is below.
Susan is Founder and Managing Director of The Classification Guru, a specialist data classification, taxonomy customization and data cleansing consultancy.
Lauren: Briefly summarize Between the Spreadsheets: Classifying and Fixing Dirty Data for our readers.
Susan: Dirty data is a problem that costs businesses thousands, if not millions, every year. In organizations large and small across the globe, you will hear talk of data quality issues. What you will rarely hear about are the consequences, or how to fix the problem.
Between the Spreadsheets: Classifying and Fixing Dirty Data draws on classification expert Susan Walsh's decade of experience to present a foolproof method for cleaning and classifying your data.
The book covers everything from the very basics of data classification to normalization and taxonomies, and presents the author’s proven COAT methodology, helping ensure an organization’s data is Consistent, Organized, Accurate and Trustworthy.
Lauren: Why did you decide to write this book?
Susan: My goal is to make data relatable and easy to understand for everyone, not just data professionals. Clean data is the foundation of every successful business, and I believe everyone should have the tools to manage and maintain their data correctly.
Lauren: What changes have you seen occur with data over the years?
Susan: There's more dirty data out there than ever. Businesses hold more data and more people work with it, but there is still a major lack of training and awareness in this space. Thankfully, I'm here to save the day with my book and services! I also offer a huge range of resources online and on my social media channels to help people manage their data more effectively.
Lauren: What are two main points you hope readers take away?
Susan: After reading this book, regardless of your level of experience, not only will you be able to work with your data more efficiently, but you will also understand the impact of the work you do with it and how that work affects the rest of the organisation.
Lauren: How do you hope librarians use the book?
Susan: Whether it's categorising spend data or categorising books, it's still data! The COAT methodology is particularly helpful when you have multiple people doing the same job. It makes sure everything is Consistent, Organised, Accurate and Trustworthy, which means you can find the books! The information in 'Between the Spreadsheets' would also be incredibly helpful for managing a database of borrowers!
Lauren: For planning purposes, what tips do you have for librarians who want to avoid dirty data in the future?
Susan: Let me introduce you to the data COAT!
Just like a real coat, the data COAT protects your data from all the elements, but you have to keep it on all year round and look after it, or it won’t protect your data.
First of all, there’s C for Consistent.
Make sure your whole organization is using the same terms, standards and processes. This is a great start for minimizing errors, and if everyone understands their roles and responsibilities, it will be much easier to get everyone working to the same standards. Why not create a list of the dirty data issues relevant to your own work, so your team knows what to look out for?
Following that is O for Organized.
Organize your data however you need it, whether that's by country, region, division or business unit, and keep it categorized that way at all times, so you can find what you need quickly and efficiently without having to join data each time.
Then there’s A for Accurate.
This is vital. But depending on where you work and the type of data you use, the definition of accurate can differ, so agree on it with your team and set out the parameters.
And that takes us to T for Trustworthy.
Once you have consistent, organized and accurate data, you have trustworthy data. That means you can make better business decisions and report back to the business with confidence. And you can take the credit for it too; you have my permission…
Lauren: Is there anything else you would like to share?
Susan: Yes! You have to keep that data COAT on through regular maintenance! You need to check that things have been categorized correctly, as this can change over time. Because of updates and changes, before you know it your once neat and tidy data sets will contain unclassified data, incorrectly classified data, typos and cut-and-paste errors, to name but a few.
That is why, and I cannot emphasize this enough, it is so, so important to maintain your data. Specifically, keep checking your data for any errors that could have a knock-on effect on your bottom line, and follow the COAT process when updating it with new data!
Lauren Hays, PhD, is an Assistant Professor of Instructional Technology at the University of Central Missouri, and a frequent presenter and interviewer on topics related to libraries and librarianship. Her expertise includes information literacy, educational technology, and library and information science education.