An Introduction to Building Resilient Digital Collections
- The value of digital preservation and digital resiliency for museums.
- Tips for overcoming common technical and logistical challenges.
- How to build a digital resilience plan for your collections.
Thank you for joining us for today’s webinar with Rachel Cristine Woody. My name is Bradley and I will be your moderator for this webinar titled An Introduction to Building Resilient Digital Collections.
Before we start, I would like to take a moment to provide some information about our company and introduce today’s presenter.
Lucidea is a software development company specializing in museum and archival collections management solutions, as well as knowledge management and library automation systems. Our brands include Sydney, Presto, Argus, ArchivEra, Eloquent, and CuadraSTAR.
Now I would like to take a moment to introduce today’s presenter Rachel Cristine Woody. Rachel is the owner of Relicura and provides services to museums, libraries, and archives. She specializes in museum collections management systems, digitization technology, digital project management, and digital usership. During the course of her career, she has successfully launched multiple digital projects that include advanced digitization technology, collaborative portals, and the migration of collection information into collection management systems. She’s also a popular guest author for Lucidea’s Think Clearly blog and has provided us with many great webinars that are listed on our website, so please feel free to check those out after today’s session. Take it away, Rachel.
Great. Thank you so much, Bradley, for that introduction, and thank you to everybody joining us and to Lucidea for hosting us today.
We’ve got a dense topic for us today, and one that I find particularly interesting, and I hope you will as well. We’re gonna go a bit beyond what I would call just digital preservation. Preservation, of course, is important, but we’re gonna get into the actual concept of building longer-term and more sustainable digital resiliency.
So with that in mind, we’ll start with some of the basics. What do I mean when I say digital preservation and what do I mean when we talk about digital resiliency?
We’ll then get into what the technical and logistical challenges are when we’re thinking about managing a healthy digital collection.
Part of that is getting into digital file integrity and what we mean by that as well as digital file access.
File integrity and access are the two main pillars of creating digital resiliency.
And then we’re gonna wrap up with you building your own digital resiliency through actions like detection and repair, as well as how to protect your collections and building that resiliency plan. So we’ll go from start to finish essentially in terms of what do we mean, what are the challenges, what are we looking for, and then how can we address it.
So setting some of the basics up, as collection stewards, we are familiar with the word preservation usually first or primarily when thinking about our physical collections.
Preservation as a whole means to take care of the collection and do proactive activities to help maintain the health of our collection items.
When they are physical items, it’s usually very easy to visually see whether or not that item is healthy, if it’s starting to deteriorate, if it’s exhibiting any signs that would be a threat to its overall preservation.
But when we think about our digital content or collections, it has that air of mystery still. It’s not as easy for us to necessarily see those files and know that there might be a preservation issue there.
Sometimes yes, absolutely. And we’ll get into those, but it does tend to be a little bit harder to detect at least just as a human looking at our digital collection.
We also have some added layers of complexity to digital preservation, and that is ever-evolving technology, especially the rate at which our technology and how we use it is developing.
And with that, of course, as we use more improved software, we create more improved file types, etcetera.
Each of those changes poses challenges for what has come before, as well as for these new things whose longer-term behavior we don’t quite know how to predict.
There’s also the concept of digital precarity.
Now, of course, all collections, physical and digital, have some inherent fragility to them.
Digital precarity is the fact that our digital files are precarious in both the software and hardware aspects of being able to access, run, or save those files, as well as in the files themselves. So there is a multi-pronged fragility to those digital files that we wouldn’t necessarily run into when we’re thinking of physical objects.
And then also, just one more sort of nuance or difference is that digital preservation, or certainly the concept of digital collections, for us in the museum field at least, seems like it is always trendy. I swear, in every other American Alliance of Museums TrendsWatch there’s some aspect of digital files or digital collections.
And in fact, this year was no different. One of the TrendsWatch topics was focused on digital precarity as an issue. So it’s always in the zeitgeist, so to speak, and there are always resources or talks you can find about it, which is amazing.
But I think that just goes to reinforce the fact that we’re dealing with several different layered aspects of how and why our digital collections are tougher to do the preservation aspect of our work.
So what do I mean by digital preservation, and what do we typically mean? Some questions for us to think through as we consider our collections: what does preservation mean when we’re thinking of those digital files? Especially if you’re newer to the concept of digital preservation, know that, yes, part of it is saving one pristine file and one working copy file as a preservation best practice. Making sure it’s saved in multiple locations is another preservation practice. So just keep in mind that preservation is the proactive activities that help maintain or extend the life of an artifact and, in this case for us, a digital file.
What materials do we consider part of our digital collection? Now the answer to this question is going to be different for pretty much every museum.
For some museums, if they have actually collected digital art, then perhaps their digital art is their digital collection. Other museums, if all they have are digital files that are surrogates of their actual physical collection, do consider those part of their collection. It’s their digital collection.
So there can be a few different instances of what is considered a digital collection. For our purposes today, we’re going to focus on any of your digital files that are intended to be kept in perpetuity, essentially. So whether those are considered actual artifacts for your museum or they’re the surrogate records, for us, it’s gonna include all of them.
The next question, what does digital preservation look like? It will look different in every shop. Part of that is because the actual file collections you have will look slightly different.
More differences can occur depending on what preservation practices you have in place and what sort of tools or hardware you have available to you.
So the shape of it will look different in terms of detail, but as we go through and incorporate digital preservation practices as part of our resiliency plan, you’ll get to see what those look like for you.
And then, of course, the most important question perhaps, and especially as we think beyond preservation and into that resiliency piece is, is it enough?
Is what we’re doing in terms of digital preservation enough to actually and effectively preserve our digital collection? So sort of a perpetual question, if you will, for us in the field.
So what is digital resiliency? We’re moving from what I would call just preservation to a more sustainable, longer-term resiliency plan. When something is resilient, it means that it can withstand pressures or difficulties and quickly recover.
And so our work as collection stewards is to build a resilient environment and have resilient files so that when the inevitable happens in terms of digital decay or accidental loss, those collections can be quickly recovered. So it’s no longer just an act of “I’m preserving this file.” It is “I am building this environment for my digital collection so that, when things do happen that impact the health of my digital collection, I know that the environment I built is resilient enough that those files can be recovered.”
So hopefully that helps to illustrate that it’s not just preservation. We’re going a bit beyond it.
So what does it look like to build digital resiliency?
To get to the answer of that question I’m going to have us focus on challenges for a moment so that you’re fully aware of both the technical and logistical sides.
For the technical challenges, there are three that are most common. One is loss of access.
So loss of access to my digital file because maybe the file is corrupt. Loss of access to the digital file because maybe I don’t have that software program anymore.
Loss of access to the digital file because the hard drive it was on was corrupted. So maybe the file was fine, but the hardware I used to access it is no longer available. So a few different ways we can lose access to something, whether that is the file or the tool.
The second is loss of data, whether that is data within the file or data that is embedded in or comes with the file. There are a few aspects of data in our digital collections, as well as the data we create to accompany those digital files. The losses are similar: the data can become corrupted, it can become separated or divorced from the file it’s supposed to be with, or we can lose access to the actual piece of hardware where the data is stored. So it’s not just the file; the actual data is the loss we’re focusing on here.
And then the third one is loss caused by human-made or natural disasters. Humans can certainly be a weak point and create our own preservation issues, both physically and digitally, for our collections.
This could be accidental deletion. This could be not setting up our backups properly. This could be moving files and you know corrupting them in the process.
There are a number of ways in which we humans can make mistakes. In addition, of course, there are natural disasters, and we are well aware of how those can impact our physical collections and what we do to try and mitigate them.
Similarly, they will also impact our digital collections, perhaps with slightly different dangers or threats. So the risk is certainly still present, but the impact looks different from what we would expect for our physical collections.
And then the logistical challenges. This particular area is one that I wish was better covered in our field. Often we’ll very easily find coverage of all of the technical challenges we need to make sure we’re aware of.
Almost as important, and perhaps more so, are the logistical challenges. For us in museums, often a medium to larger size museum (well, I take that back, it can be any museum), we can have the logistical challenge of understanding. That comes up when working with staff, administration, and higher-ups on resources or decisions that are needed.
Those folks tend to have a harder time understanding digital collections as a whole as well as what sort of risks may be involved and what sort of resources are needed in order to not just have digital preservation but create digital resiliency.
So first challenge is just the understanding of those around us in terms of what are the risks and how can we actually properly address them and what resources those take.
The second challenge is staff may have an understanding. There is some knowledge there but perhaps there is a perception challenge.
For digital collections, it’s very easy for us to take for granted just how easy it is to access the file on the day to day. It is harder to think through what the long term repercussions are of not protecting or putting into place any sort of preservation measures for those digital files.
So in this piece, there’s the understanding, but the perception is not there in terms of accurately gauging those long-term risks.
Third is priorities.
So maybe there is understanding, maybe there is even some correct perception around digital collections and the need for digital resiliency, and yet it is not necessarily prioritized above other items. Usually big and bold exhibitions are favored in terms of taking up a lot of resources. It can be really hard for “let’s make sure we build great digital hygiene and have a great digital resiliency plan” to compete with that. It can be harder to advocate for.
So in that regard, it can be harder to prioritize.
And all three of these, of course, create the fourth logistical challenge of actually getting the resources and budget allocated for you to not just do digital preservation activities, but to actually build digital resiliency into your program. So these are very real challenges that can build on each other and are certainly things for us to be aware of as we move through this process of building our own digital resiliency and starting to strategize around how to address those challenges.
All right, with those challenges in place, we’re going to do a review of the digital file integrity and access pieces so that you’re aware of the technical aspects that we should then address as part of our resiliency plan. So our first piece here is digital file integrity, which is essentially the wholeness of the file. The first thing to look at is the visual: when I am opening and viewing the file, is the visual quality there? Are things sticking where they need to be in terms of formatting?
Essentially, is the visual there? Is the file as it should be to the human eye? So very similar to if we were checking out our physical collections, for example, a first cursory look to see if there is something going on with this particular artifact.
The second one is format, and this is meant in a perhaps more literal sense. As we’re looking at files, each of the different files will of course be in a different file format, so an Excel file is in a spreadsheet format.
A sort of manuscript or narrative document could be in a Word or PDF format, for example.
Those are more common and often because they are more common a little easier to manage in that regard.
However, we do also have files at the museum that can be more exotic. So three-dimensional files, of course, are a bit more exotic for us. We don’t have quite a handle on the behavior or long term expectations of those files as we do for the more common PDF.
And there are also more like exhibit design, publication, architectural files that could also be a part of our collection that could be in those more exotic file format types. And so while those are not necessarily bad, they do pose a bit more of a mystery to us in terms of the long term health and ultimately then the management that we need to have in place for the health of those files.
The next one to think through about the wholeness and integrity of the file is the versioning.
Now, if your particular digital collection is one that includes working files, then versioning is, of course, even more important for you because there will be the original file and there’ll be a working file. As you go through, if it is a record that evolves over a certain period of time, there are naturally going to be perhaps several dozen versions of that file over time.
For some amazing digital software or digital preservation software systems, versioning is something that they can very easily manage for you and it’s very easy to see the visual distinction of those different files being saved.
If you do not have that sort of software or tool in place, it’s not impossible but it does mean the control of versioning is going to be more manual, and of course something for us to keep in mind if that is a part of your digital collection.
And then the final thing to think about in terms of the integrity or wholeness of the file is what the digital preservation field refers to as checksums.
The name, at least for my brain, is sort of meaningless. So, if it’s helpful for you, I typically think about it as the file fingerprint.
When a file is assigned a checksum, it is essentially being assigned a fingerprint: these are the unique points of interest that only this file will have.
Once a file is assessed and assigned that fingerprint, from then on, as you check the health of your files, you will essentially be checking that record of the fingerprint against the file to see if the fingerprint has changed. Now, it shouldn’t. Our fingerprints should not change, and if the fingerprint has changed, that means something in the file itself has changed, which means that it’s no longer the file you are attempting to preserve. Something has happened.
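For readers who want to see what a file fingerprint looks like in practice, here is a minimal sketch in Python. The choice of SHA-256 is an assumption for illustration (the talk does not name a specific checksum algorithm), and the function name `file_fingerprint` is hypothetical.

```python
import hashlib


def file_fingerprint(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hex string.

    The file is read in chunks so that large images or videos
    do not have to fit in memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running the function twice on an untouched file returns the same string. If even a single byte of the file changes, the string will be different, which is what makes it useful as a health check.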
So these four put together are ways in which we can assess as the humans in charge of managing the collections, what is happening with the file and also to take a measure of health for the digital file integrity.
Now digital file access is that second pillar alongside the integrity piece, because the file can be as healthy and robust as ever, and yet if we lose access to it, it does not matter how healthy that file is, which can just be heartbreaking when it happens. So we want to treat access as being as important as file health and integrity.
So the access pieces can be a little bit more brass tacks, which can be nice. We want to think about the storage hardware that we’re using, making sure that it is quality storage hardware. This applies if you are saving things on computer desktops, on servers, on external drives, things that are tangible. Maybe not necessarily tangible to you on your desk, but they exist tangibly elsewhere, for example.
And thinking through regular replacement, we know that our computers and phones need to be replaced every so often. Otherwise, they stop working. The very same is true for all of the digital storage devices that we tend to use.
We also want to think about appropriate storage software.
For museums that can afford a digital asset management system (a DAMS) or a preservation management system, you may be lucky and have access to some great tools and appropriate software to manage quite a bit of the resiliency piece for you.
If you do not, at least be aware of the capabilities of your storage software, where the gaps are, and plan around them. So even if you have more basic software, you can still build a resilient digital collection.
And then, of course, working with hardware and software requires regular maintenance and updates. Just as we would expect for our phones (mine, at least, seems to update every other week), we need to make sure software updates are happening. Hardware maintenance, updates, and swapping things out as they become obsolete are just as important.
So with those in mind, thinking through digital file integrity and digital file access, we’re gonna get into the resilience pieces to put in place for, ultimately, a holistic resiliency plan.
So the first part, as we start to create this plan, is thinking through detection: what are the activities, processes, or tools we can put in place to help us detect when there is a health issue, whether that is the integrity of files being compromised or perhaps an access issue happening?
For detection, we can do things like monitoring the file format. Think back to when we were talking about more exotic file formats, or file formats that are perhaps a few decades old now, where maybe we don’t have the software as readily available.
We need to be aware of the files we have, the formats in place, and the areas that are more susceptible to obsolescence.
We also want to make sure that, once we have assigned or gathered those fingerprints for our files, we are regularly checking those fingerprints, or checking the checksums, to make sure that they have not changed. And remember, they should never change, so anything that does look off or doesn’t match up is a sign of a health issue.
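The regular fingerprint check described above can be sketched as a small script. This is an illustration under stated assumptions, not a replacement for a preservation system: SHA-256 is an assumed algorithm, and the function names `build_manifest` and `check_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of one file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a fingerprint for every file under root, keyed by relative path."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_manifest(root, manifest):
    """Compare the files under root against a previously recorded manifest."""
    current = build_manifest(root)
    return {
        # Same name, different fingerprint: the file's contents have changed.
        "changed": sorted(n for n in manifest if n in current and current[n] != manifest[n]),
        # Recorded earlier but no longer present.
        "missing": sorted(n for n in manifest if n not in current),
        # Present now but never fingerprinted.
        "unrecorded": sorted(n for n in current if n not in manifest),
    }
```

In practice the manifest would be saved somewhere outside the collection folder (for example as a JSON file) so that it is not itself counted as a collection file, and any name that shows up under "changed" or "missing" is a prompt to go into repair mode.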
And then, of course, there’s the more literal and perhaps more practical opening and viewing of the file. This is using our eyes to assess: are things opening? Is the formatting within the actual file intact? Is the text sticking where it’s supposed to?
This is particularly important for not as common file formats or files that are older where that stickiness of the actual format in the file can be an issue. So bringing sort of those health checks into place for detection.
Digital resilience via repair. So one thing is to detect; once we do detect a potential issue, we need to go into repair mode, if things are possible to repair. For us, repair could be using a backup system, which is part of our larger plan that we’ll get to in a moment.
Sometimes it’s just easier to use a backed-up version of the file from yesterday: because something happened overnight and the file today is no longer healthy, we can replace it. Another act of repair: if you have a preservation system in place, then there are similar tools that can either revert the file or help to address what is happening, depending on the preservation issue. Especially if it’s a formatting issue and you’ve got some fancy preservation software, there can be more options available to you there.
We may also want to consider a repair technician. Perhaps the issue is that the actual hard drive is not allowing you to access your files, so working with a repair technician first to get access to the hardware, or at least to transfer your files to something different, could be another avenue. So our goal for this is to know what repair actions are available to us, and then, depending on the issue that presents itself, one or more of these repair actions may be ones that you would want to take.
Now, digital resilience via protection, and this is where I think it touches back on the essence of preservation, in that we want to do proactive things. It is far easier to proactively protect things and then refer back to them if needed than it is to have nothing available to us. And so part of that resiliency piece is leveraging those proactive measures, and so getting to the best practice of having your digital files stored in two or three geographically separate locations.
And so that means, you know, you may be saving it to the cloud, so to speak. If you are saving it to a Silicon Valley option like Dropbox or Google Drive, they will typically have two to three different server farms that are geographically separate, which is something for us to think about because we’re not just saving it to the cloud. The cloud is actually real servers that exist in reality and are very tangible in these different geographic locations. So, if it is a cloud, know where it is eventually saving to and how many copies there are.
If it is just you and it’s not saving to the cloud, it’s saving to maybe the server in the office, and then maybe it has a backup somewhere else.
It’s a similar thing, just a bit of a different approach. We wanna make sure that there are two to three separate locations. If you can get them into different geographic regions, that is even better. But for those smaller to medium sized museums, even just getting them out of the same building is the first step. If you can do that, that is also helpful and important.
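As a rough illustration of the multiple-locations practice, the sketch below copies one file to several storage locations and confirms each copy matches the original by comparing checksums. The function name `replicate` and the idea of passing locations as folder paths (for instance a mounted office server and an external drive) are assumptions for this example; cloud services usually need their own sync tools instead.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of one file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(file_path, locations):
    """Copy a file into each location and report whether each copy is intact.

    Returns a dict mapping each copy's path to True when its checksum
    matches the original, or False when it does not.
    """
    file_path = Path(file_path)
    original = sha256_of(file_path)
    results = {}
    for location in locations:
        target_dir = Path(location)
        target_dir.mkdir(parents=True, exist_ok=True)
        copy_path = target_dir / file_path.name
        shutil.copy2(file_path, copy_path)  # copy2 also keeps timestamps
        results[str(copy_path)] = sha256_of(copy_path) == original
    return results
```

Any copy reported as False did not arrive intact and should be copied again before it is relied on as a backup.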
The next thing to think about for that protection piece is scheduling and taking partial and/or full backups at a regular cadence.
This depends on your digital collection, how often you may be working with files, and whether or not you are tracking that as part of your digital collection.
And so for some museums, it may just be that only digital art files are considered the digital collection. There are typically no changes to them, and so a periodic backup can happen quarterly, maybe, because the loss of data between one quarter and another is actually very minimal.
Whereas, if you are actively working on files that are considered part of your digital collection and that work is happening every day, then it makes sense to take a partial backup every night, for example, of the files that were changed, and/or to do those full backups on a cadence that makes sense for the rate at which you are creating or changing files.
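To make the idea of a partial backup concrete, here is a minimal sketch that copies only the files modified since the last backup ran. It assumes the file system's modification times are trustworthy, and the function name `partial_backup` and its parameters are hypothetical; off-the-shelf backup software handles many edge cases this does not.

```python
import shutil
from pathlib import Path


def partial_backup(source, destination, since_timestamp):
    """Copy files under source that were modified after since_timestamp.

    since_timestamp is seconds since the epoch, for example the time
    the previous backup finished. The folder structure is preserved
    under destination. Returns the relative paths that were copied.
    """
    source, destination = Path(source), Path(destination)
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > since_timestamp:
            target = destination / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(path.relative_to(source).as_posix())
    return copied
```

A full backup is the same loop without the time check. The destination should sit outside the source folder, ideally on different hardware.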
Now, even if you don’t have a digital preservation system, there is digital backup software that you can easily get off the shelf that can help you with scheduling these types of backups. It is completely doable in a DIY kind of way, so don’t despair if you do not have a fancy system in place.
Another thing to make sure of is that the backup is consistently run. So, if for some reason you are not able to even buy the thirty dollar off-the-shelf software to help with the scheduling of the backups, it is something that you can schedule and trigger manually as well.
It does take more manual work and time on your part, which I don’t love for you, but it is an option, if everything else is not an option for you.
Also, for that protection piece: regular hardware replacement. So thinking back to our file access, we need to make sure the hardware is maintained and/or replaced quite regularly.
Similarly for software, make sure you’re testing software and auditing materials regularly.
This is particularly important if your materials are in those more exotic or older file formats that are not as commonly used and created and monitored today.
So putting all of those together then into a digital resilience plan. So thinking through what our challenges are, how we’re addressing them, thinking through our understanding of the digital file integrity, as well as the access pieces, thinking through what our options are in terms of tools or process of doing the detection and repair actions as well as that protective piece.
So as you put together your plan and think through those, I recommend getting guidelines in place for acceptable file format types.
This is especially important if your digital collection regularly includes more exotic file types. So know and understand what those file types are and perhaps what the most desired file format, from a preservation perspective, may be for that final output.
For example, for architectural files, maybe it is actually preferred that those get saved for the purposes of preservation as PDF files, whereas they can remain architecture files when they are still like the everyday working files.
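One way to support a file format guideline is a simple inventory of what formats the collection actually contains. The sketch below counts files by extension and lists any extension that is not on a preferred list. The preferred list shown is purely a placeholder assumption, not a recommendation from the talk, and file extensions are only a rough proxy for the true format.

```python
from collections import Counter
from pathlib import Path

# Hypothetical example list; each museum would define its own.
PREFERRED_EXTENSIONS = {".pdf", ".tif", ".tiff", ".csv", ".txt"}


def format_inventory(root):
    """Count the files under root by lowercased file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def formats_needing_review(root, preferred=PREFERRED_EXTENSIONS):
    """List extensions found under root that are not on the preferred list."""
    return sorted(ext for ext in format_inventory(root) if ext not in preferred)
```

The review list is a starting point for the conversation about which exotic or older formats should get a preservation copy in a more common format.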
We also wanna make sure that we set up a backup system that’s scheduled, whether that is automatic, with a tool or software piece in place, or manual: it’s you, and at the end of every month you do a little bit of a manual resave or snapshot of your digital collection.
Running file integrity checks. There are great tools out there that are not even necessarily part of a whole digital asset management system that you can acquire fairly cheaply. And I think there are even some good free, open-source ones that will run integrity checks for you. There are some that will even assign those checksums, the fingerprints, and then go back and check those fingerprints for you. So we definitely have resources available when we think through those.
I recommend also scheduling periodic audits. So spot checking, using our eyes: can we open these files? Is the actual formatting within the file staying as it should?
Are things opening appropriately? Is my external hard drive still opening files, and still spinning up when I ask it to?
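A periodic audit can start with a random sample of files for a person to open and look at. The sketch below picks such a sample and does a very basic readability check first; the function names are hypothetical, and being able to read the first bytes of a file is not the same as confirming it displays correctly, so the human review still matters.

```python
import random
from pathlib import Path


def pick_audit_sample(root, sample_size=10, seed=None):
    """Pick a random sample of files under root for a manual spot check.

    Passing a seed makes the selection repeatable, which can help
    when documenting which files were audited.
    """
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    rng = random.Random(seed)
    return rng.sample(files, min(sample_size, len(files)))


def can_read(path):
    """Return True if the start of the file can be read from storage."""
    try:
        with open(path, "rb") as handle:
            handle.read(1024)
        return True
    except OSError:
        return False
```

Files that fail `can_read` point to an access problem, such as a failing drive, while files that open but look wrong to the eye point to an integrity problem.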
And then determining proactively how digital files will be repaired or replaced. So, in those past couple slides when we were talking about repair and protection actions, what of those will you incorporate into your own resiliency plan?
So that was quite a bit as we did a deep dive into not just digital preservation but what it means to build resilient digital collections.
And with that we discussed the difference between preservation and moving into resiliency.
With that we also reviewed technical and logistical challenges with an emphasis on how to address especially those logistical challenges.
We then got into the technical aspects of what digital file integrity means and what we need to look out for in terms of file health, including also that second pillar of file access, because it doesn’t matter how healthy a file is if you cannot actually access the file.
We then got into the digital resilience pieces and ultimately those pieces being a part of a fuller plan. So what does it look like to have detection of checking our digital files for integrity and access? What does it look like for different repair actions depending on what sort of preservation issue you run into?
And what does protection look like as we think about more of that proactive piece of making our digital collections resilient? Because good protection and proactive measures make it that much easier for any sort of detection and repair that may need to occur.
And then finally putting those pieces together for what does a resiliency plan look like? What do we need to consider? What sort of information do we need to identify?
What sort of processes or tools can we put into place to help make this resiliency holistic and address as much as we can with as many resources as we have available?
We have some additional reading where we go into a bit more detail, depending on which aspects of digital preservation you would like more information on. Each of these posts is on Lucidea’s Think Clearly blog. So you can do a search for digital preservation or, if you’re super into them, digital storage options. So feel free to check those out over on the blog and search the blog for the subject matter that you would like more information on.
And before I let you go, just sharing that Lucidea Press and I have the book The Discovery Game Changer: Museum Collections Data Enhancement available.
Lucidea Press is giving out free e-copies if you go to the link. So I hope you’re able to check it out. And if you enjoy data or have data work in your near future, it’s definitely a good guidebook to start out with.
And with that, I’ll hand it back over to you, Bradley.
Thank you, Rachel, for the wonderful presentation. And to our audience, if you’d like to learn more about our museum collections management system called Argus, please feel free to visit our website or reach out to us at sales@lucidea.com, and we’d be happy to have a chat with you.
If you have any questions on any of our software or our company, our contact details are listed on the screen, and please stay tuned for more webinars and content related to this series.
On behalf of the Lucidea team, I thank you all for attending today, and until next time. Thank you.