Common architecture for preserving/curating 3 types of content

From CURATEcamp

Many institutions are trying to manage different types of content in a common curation environment:

  • faculty papers, campus grey lit, traditional IR stuff
  • born digital special collections
  • locally digitized content

This material has different preservation policies, different ingest workflows, different change behavior over time. What are strategies to handle these differences while still taking advantage of common infrastructure?

Policies - be careful about what you're promising people when you take their stuff. Don't promise everything forever. Instead, list the formats you can preserve/migrate, and hand the bits back for everything else. CDL has a small list of formats they'll normalize and preserve more actively; for the rest they fall back on bitstream preservation.
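
A minimal sketch of that kind of policy, loosely modeled on the CDL approach (the format list and level names here are invented for illustration):

  ACTIVE_FORMATS = {"application/pdf", "image/tiff", "text/xml"}

  def service_level(mime_type: str) -> str:
      """Return the preservation commitment for a deposited file."""
      if mime_type in ACTIVE_FORMATS:
          return "full"       # normalize/migrate as the format ages
      return "bitstream"      # keep the bits intact, hand them back on request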

How to deal with content you know will change over time? Stanford expects content to change over time, so they version it. A data set gets deposited, but the researcher will continue to work on it for six months. Get clever about storing the different parts of it.
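
One way to model that, as a sketch (the class and field names are assumptions, not Stanford's actual model): each state of the evolving data set becomes a new version, and storing content digests rather than copies lets unchanged parts be stored once.

  from dataclasses import dataclass, field
  from datetime import datetime

  @dataclass
  class Version:
      number: int
      created: datetime
      files: dict[str, str]   # filename -> content digest; unchanged files share digests

  @dataclass
  class DepositedObject:
      object_id: str
      versions: list[Version] = field(default_factory=list)

      def add_version(self, files: dict[str, str]) -> Version:
          v = Version(len(self.versions) + 1, datetime.now(), files)
          self.versions.append(v)
          return v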

UCSD - take one copy of the data and put it into Chronopolis. Have lots of conversation about how they get the stuff from their partners. They also take a version and "objectify" it; as it's objectified and normalized, it becomes a different thing. Need to be clear about what's the container and what's the contained.
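
One way to keep the two distinct, sketched with hypothetical names (this is not UCSD's actual data model): the deposited package is preserved as-is, while the objectified/normalized result is a separate, linked thing.

  from dataclasses import dataclass

  @dataclass
  class Package:               # the container: what the partner handed over
      package_id: str
      payload_paths: list[str]

  @dataclass
  class CuratedObject:         # the contained: what objectification produced
      object_id: str
      source_package: str      # link back to the container it came from
      normalized_files: list[str]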

LOCKSS focuses on an archival unit (e.g., a volume of a journal). Describe speculatively what will go in that unit, even if it's not collected yet. The archival unit remains open until everything is gathered.
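
The archival-unit idea as a toy illustration (this is not LOCKSS code; the names are invented): describe the expected members up front, and keep the unit open until they have all arrived.

  from dataclasses import dataclass, field

  @dataclass
  class ArchivalUnit:
      name: str                          # e.g. "Journal X, vol. 12"
      expected: set[str]                 # speculative manifest of member items
      collected: set[str] = field(default_factory=set)

      def add(self, item: str) -> None:
          self.collected.add(item)

      @property
      def closed(self) -> bool:
          # stays open until everything described speculatively has been gathered
          return self.expected <= self.collected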

Many content management systems organize their content into a hierarchy: an abstract thing with versions that are linear or branched, plus a rendition layer.
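
That hierarchy, as a rough data model (illustrative only): an abstract work, versions of it, and a rendition layer of concrete formats per version. A parent pointer on each version is one way to allow branched as well as linear histories.

  from dataclasses import dataclass, field

  @dataclass
  class Rendition:
      mime_type: str             # e.g. "application/pdf"
      path: str

  @dataclass
  class WorkVersion:
      number: int
      parent: int | None         # None for v1; non-sequential parents allow branches
      renditions: list[Rendition] = field(default_factory=list)

  @dataclass
  class Work:                    # the abstract thing at the top of the hierarchy
      work_id: str
      versions: list[WorkVersion] = field(default_factory=list)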

You can donate, but we can't promise it will get processed right away. Basic preservation: in boxes, with A/C, off the floor. This is analogous to doing bit-level preservation until it can be processed.

Nobody wants to use a repository if it only takes "done" stuff.

Stanford is currently working on its 2nd-generation repository. It was liberating in their 1st-generation repository to realize the metadata didn't have to be done before ingest. METS reinforced this false assumption because everything is all wrapped together. They called it "zip and sip" at first, then used Bags.
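
The Bags here are BagIt bags. With the Library of Congress bagit-python library, making and checking one looks roughly like this (the path and metadata values are placeholders):

  import bagit

  # Turn a staged directory into a bag in place: the payload moves under
  # data/ and checksum manifests are written next to it.
  bag = bagit.make_bag("/tmp/deposit-0001", {"Source-Organization": "Example Library"})

  # Later, before ingest, verify the bag is still complete and uncorrupted.
  assert bag.is_valid()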

What metadata goes into the repository isn't necessarily the metadata of record; access systems might fulfill this purpose instead. Get enough into the repo to contextualize the object and help preserve it.

Workflows for dealing with metadata/objects getting created at different times:

  • At Stanford, they want to assemble a bag for the repository. Robots send items around until their pieces are done, and keep polling the file system.
  • At UCSD, most metadata is assembled before ingest. An access control system lets curators edit metadata in the repository; they're trying to decide whether they want PREMIS events for when metadata is changed (a sketch of such an event record follows below). Archivists give them METS, which they move to RDF; they can spit out METS again.
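
If UCSD did record PREMIS events for metadata edits, a minimal event record might look something like this (field names loosely follow PREMIS semantic units; all values are invented):

  from datetime import datetime, timezone

  premis_event = {
      "eventType": "metadata modification",        # from the LoC event-type vocabulary
      "eventDateTime": datetime.now(timezone.utc).isoformat(),
      "eventDetail": "curator edited descriptive metadata via the admin UI",
      "linkingObjectIdentifier": "ark:/99999/fk4example",  # hypothetical object id
      "linkingAgentIdentifier": "curator:jdoe",            # hypothetical agent id
  }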

The metadata differs across these 3 types; nobody in the room says they even try to normalize it. At Oregon, they ask for a Word doc manifest (the same way they used to want one on paper); she tries to model the paper environment as much as she can.

Erin O'Meara: the admin UI is important.

Stanford makes a distinction between deposit and ingest. Hydra is the central object core, with a registry on top of that and deposit systems on top of that. The end user will use a deposit interface; the robots take it from there.
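
A sketch of that deposit/ingest split (not Stanford's actual code; all names are invented): the deposit step only registers and stages content, and the robots pick staged items up asynchronously to do the real ingest work.

  import itertools
  import queue

  staged: queue.Queue = queue.Queue()   # packages waiting for the robots
  _ids = itertools.count(1)

  def register(package_path: str) -> str:
      """Stand-in for the registry layer: mint an identifier."""
      return f"obj-{next(_ids)}"

  def ingest(object_id: str, path: str) -> None:
      """Stand-in for the ingest pipeline the robots run."""
      print(f"ingesting {object_id} from {path}")

  def deposit(package_path: str) -> str:
      """End-user-facing step: register the object and stage the package."""
      object_id = register(package_path)
      staged.put((object_id, package_path))
      return object_id

  def run_robot_once() -> None:
      """One robot pass: pull a staged item and ingest it."""
      object_id, path = staged.get()
      ingest(object_id, path)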

CDL has different workflows, but the user decides which is best for them: UI or APIs. CDL doesn't do the ingest for them, just gives them the tools. Most content comes in through the APIs. Libraries are the ones doing the deposit; they haven't worked with individual faculty yet. The repo will be back up in a year. UCSD is planning to broker faculty content into CDL's repo.

How to make clear to faculty what preservation services you're providing? Get them to help you write it.

What documentation do you create to explain the different queues of content? Diagrams seem to help. Faculty don't seem to care; mostly the conversation happens after faculty content is made.

For library staff and partners, Stanford finds "framework" documents helpful: 1-1.5 pages, with use cases.

Lots of confusion reported among library staff between access and preservation.

Bess asks to what extent we (the tech people) should be doing more outreach. Stanford has been giving workshops to staff to give them the skills to prepare objects for the repository. There are very few places to learn "what is a digital object?"

Need for training of librarians. Declan suggests the CODE4LIB wiki "guide for the perplexed."

UCSD is trying to build expertise in project management (archivists, arts library people, etc., taking this on) and putting together a project manager's toolkit.

Exposure is really important - don't just have the conversations over coffee, get the vocab into the daily operation of the library.

Bess ran a training class/study group at UVa. The target audience was those who are involved in the DL workflow, but not centrally.