At Risk Records in 3rd Party Systems

From CURATEcamp

Notes taken during the session are below. Another set of notes is here:

Jeanne: How do we know about and capture records created outside the organization - in the cloud, on 3rd party sites?

Brandon: We focus on preserving congressional records. How can we do dynamic data preservation? How do you get data from these 3rd party systems? Do we pull data from these systems, or do we stand up virtualized instances of SharePoint?

At the NY Philharmonic, this year's Annual Report/Factbook is the first since 1842 that they couldn't preserve: dynamic content added via a 3rd party tool, closed off to standard approaches to archiving. The only way they could think of to preserve it was a video of clicking through every part of the publication. This is not sustainable. 'We weren't consulted' beforehand, but this is happening, so we have to live with it.

How do we preserve new compound/dynamic content when the organization is moving fast without consulting the archives?

NSF has a data management plan approach: you must submit a plan before you get your grant. How about making something similar part of the 'rules' for moving forward?

Large institutions vs small institutions.

Large organizations have lots of rules about what you can and cannot do on the official channels, but people go off and do their own thing. Great ideas that people want to act on.

What are the significant properties? What needs to be preserved - just the content you can extract? Or do you need the full experience? Sometimes the answer is yes - especially if the new format is a continuation of an existing series of records.

Maybe we need standards/guidelines for new content publication.

Keep both the original 1s and 0s so we can emulate later, AND extract content for short-term access. Problems that are unsolvable today are likely solvable by a smarter future.
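The two-track idea above can be sketched in a few lines: checksum and set aside the untouched bitstream for future emulation, and separately produce a crude access derivative. This is a minimal illustration, not a preservation system; the file name and the tag-stripping "extraction" are assumptions for the example.

```python
import hashlib
import re

def fixity_record(filename, data):
    """Record a checksum of the original bitstream so its integrity
    can be verified later, before handing the bytes to any future
    emulation environment."""
    return {
        "file": filename,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }

# Track 1: the original "1s and 0s" are preserved verbatim, with fixity.
original = b"<report><p>Annual Report 1842</p></report>"
record = fixity_record("annual_report.xml", original)

# Track 2: a crude content extraction (strip markup) serves
# short-term access while the original stays untouched.
extracted = re.sub(rb"<[^>]+>", b" ", original).split()
```

The point of the split is that the access copy can be lossy and regenerated at any time, while the checksummed original is the thing the "smarter future" gets to work with.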

There are consequences to not having and following a 'file plan'. Part of people's jobs has to be to follow the rules.

If you know what is going to be created, you can 'shame' people into handing over what they haven't yet. The NY Philharmonic keeps 'every scrap of paper'. This removes the need for judgement calls by individuals: everything goes to the archives.

Maybe we need to do a better job on fewer things? Be more selective rather than keeping everything (which is more than researchers could ever possibly access or use), and do a better job on what we keep.

How do organizations handle official social media channels?

Figure out who the 'bean counters' are and you have to convince them that this is important and an issue of risk.

"Collecting Evidence" vs "Archiving" may be more convincing to the general public (and lawyers).

How do you preserve the interconnections of web content? How does the experience change, with different simultaneous experiences on different social media platforms? ThinkUP, open source software that tackles this problem, was discussed earlier this week.

When should archivists be in the process? At the start - before content is created, before systems are created?

Keep the original data AND keep updated data. Document everything: data sources, processes applied.
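One lightweight way to "document everything" is a provenance sidecar: an append-only log recording where each dataset came from and what was done to it. A minimal sketch follows; the URL, file name, and process descriptions are purely illustrative.

```python
import json
from datetime import datetime, timezone

def provenance_entry(source, process, note=""):
    """One entry in a sidecar log: what data, where it came from,
    and what process was applied (names here are hypothetical)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "process": process,
        "note": note,
    }

# Each acquisition or transformation appends an entry, so the chain
# from original data to updated data stays reconstructible.
log = [
    provenance_entry("https://example.org/dataset.csv", "downloaded"),
    provenance_entry("dataset.csv", "normalized dates to ISO 8601"),
]
sidecar = json.dumps(log, indent=2)
```

Keeping the sidecar as plain JSON next to both the original and the updated data means a future reader needs no special tooling to see how one became the other.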

Do you go try to get records from the start of creation OR wait until the business is 'done' with them?

Different types of content:

  • big data, large datasets can be preserved by the creator
  • how do you document where you got external data sets?
  • how about data that changes over time? How easy will it be to get historical data sets?
  • do we care about the interactions?
  • need national/international level efforts to preserve large data sets that will be used by many

What do the records management policies look like for social media?

Only keep external reference content (including data sets) if the final product depends on/uses the content.