CURATEcamp DLF 2012 Discussion Ideas

Agenda

Contribution and Ingest: Lowering Barriers

just-in-case vs. just-in-time metadata
what's "good enough": department, creator, hit send?
how can we get digital stuff with the minimum amount of effort?
set up a pre-ingest staging area for review to make sure content is "repository worthy"
build in metadata and structure requirements as part of data creation - "data counseling" for reuse
your digital ingest backlog already exists, it's just distributed across your institution
should it be the content creators' job to deposit? mediated deposit yields better results
set minimum compliance requirements for researchers
this is a "knowledge translation" problem--it won't be easy
tempting to scale back to absolute minimums, but that's not a good long-term solution for reuse and discovery. Need a better balance.
Email and Dropbox submission: accounts/authentication already built in, why not use it?
Do mediated deposit and self deposit need to be mutually exclusive? How about "mediate on demand"?
Levels of Service: process and team to spec out how long ingest will take. Data prioritized by content type/project.
Help clean up existing content, and apply "lessons learned" to make templates for future data/metadata creation
Summary: long process with a lot of complexity - make small, incremental steps. It's OK to do what you can for now. Doing what you can is better than over-promising.
DataStaR from Cornell - staging area for repositories http://datastar.mannlib.cornell.edu/
recap of C4L 2011 discussion: why does ingest suck?
- promise of permanence sets up a barrier (forever is intimidating)
- perception that ingest makes objects less discoverable
- rights - need to be cleared before ingest?
- metadata - requires too much? need a way to ingest easily and augment later
- curation happens outside repository, preservation happens inside
- more content in staging area than the "actual" repository
- content creators don't have time to ingest
- lessons learned
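
A rough sketch of the "good enough" minimum plus a staging area, as discussed above. The required fields come from the "department, creator, hit send" idea; the function names and the two routing labels are hypothetical, not taken from any particular repository tool.

```python
# Hypothetical minimum for a "good enough" deposit (assumption: just
# department and creator; richer metadata can be added after ingest).
REQUIRED_FIELDS = ("department", "creator")


def missing_fields(metadata):
    """Return the required fields that are absent or blank."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def triage(metadata):
    """Route a deposit: straight to staging, or to a person for mediation."""
    return "needs-mediation" if missing_fields(metadata) else "staging"


if __name__ == "__main__":
    deposit = {"department": "Chemistry", "creator": "A. Researcher"}
    print(triage(deposit))
    print(missing_fields({"creator": "  "}))
```

This is one way to read "mediate on demand": self-deposit goes through when the minimum is met, and a person only steps in when it is not.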

Fighting the "One Tool to Rule Them All" Mindset

need to understand what each tool does well and its limitations
the only way to cope with the weaknesses in each tool is multiple access/use layers. interoperability is our job security
what is the usefulness of the comparison matrix? really have to install and use the tools to evaluate them, but sometimes that isn't possible
more useful to think about a "framework" than "tools" -- but it's hard to do anything without programmers. Is outsourcing/contracting really feasible?
compare to the times when we only had an OPAC and that was good enough--then came ERMs, discovery layers
managing any preservation repository takes resources, so it's difficult to have more than one. Have one repository with multiple access/view layers.
Managing/displaying many different types of descriptive metadata--do you need multiple tools to do this?
The discovery layer is going to be Google! (or other search engines) - use RDF/schema.org and focus on how to expose metadata as broadly and usefully as possible
cultural heritage interests are so specific--Google can actually work pretty well with basic information
How to do SEO? are the typical methods of improving relevancy rankings applicable to library content/metadata? the approach that's worked for us
Search engines are "dumb" - don't tolerate AJAX, JavaScript, etc. Sitemaps can help
schema.org extension project for books in the works
Getting things into Google Scholar - DSpace has tools that help
ETDs - need a national solution (repository, harvesting, metadata) for ETDs that doesn't involve proprietary licenses
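
As a rough illustration of the schema.org and sitemap suggestions above: a minimal sketch that writes a schema.org description as JSON-LD and a bare-bones sitemap for item pages. The record fields, the `CreativeWork` type choice, and the `repository.example.edu` URLs are assumptions for the example, and JSON-LD is only one of the syntaxes that can carry schema.org terms.

```python
import json
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def schema_org_jsonld(record):
    """Describe one item with basic schema.org terms, serialized as JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",  # assumption: a generic type for the example
        "name": record["title"],
        "creator": {"@type": "Person", "name": record["creator"]},
        "url": record["url"],
    }
    return json.dumps(doc, indent=2)


def sitemap_xml(urls):
    """Build a minimal sitemap listing each item page URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    record = {
        "title": "Sample thesis",
        "creator": "A. Student",
        "url": "https://repository.example.edu/items/1",  # hypothetical URL
    }
    print(schema_org_jsonld(record))
    print(sitemap_xml([record["url"]]))
```

The JSON-LD would typically be embedded in the item page itself, so a crawler that ignores JavaScript still sees the basic description.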

Funding Repositories and Showing Value

Know how to measure goals/outcomes (what do you want to measure and how) before you get started
Base your outcomes on your institutional values. Why are you preserving/curating these objects?
IT and Libraries need to be better at taking credit for the services they provide. You need to tell people how the work you do helps them. Why should people care? Tell stories, not numbers.
Communication channels: both inside and outside the library, materials that go out to alums
Infrastructure is a lot harder to build compelling stories about
Tools don't have quantitative statistics built in, and librarians don't often have the training/experience to do qualitative research
Capture quotes from constituents (satisfied customers)
Still need to manage all the "ordinary" stuff -- have baseline metrics for everything.
What was the original business case for preserving these things? Business case is not "this will turn a profit" but why are you doing this and what benefits will it bring?
Digital preservation is more expensive than keeping stuff on a shelf, so you have to make a stronger case. What is the decision-making process for keeping digital stuff? Being digital is not enough.
Have content creators tell you the significance/importance of their content and set priorities. This should be a mediated process.
Access is easier to sell--how do you sell the importance of long-term preservation without making up worst-case scenarios?
Ingest is half the cost of the overall digital preservation lifecycle--make sure what you ingest is worth it. The "collection development" aspect of physical collections isn't present with digital content in the same way.
Risk assessment - how do you measure the risks that you mitigate?

Long-term Preservation of Complex Objects

Rights management complicates preservation of linked objects
Memento protocol for crawling/harvesting web sites
How/when do you preserve fluid objects? We are used to preserving fixed objects.
Make project/discipline-based decisions on when data should be captured (raw vs. processed data)
Find ways to model structure (METS, RDF)--connect the dots (files). But implementation of highly flexible tools like METS is complicated.
PREMIS and Fedora Commons
Fedora object model with RDF
OAI-ORE: resource map - everything has an identifier and you lump identifiers together to make "things"
RDF triples with ARK identifiers. Would be good to have best practices/general use cases for use of ARKs and METS together
Ways to map METS to ORE
Samples of complex objects
- Hyperlinks from journal articles to external data; complexities of not having all parts of the objects under your control
- ArcGIS data: how much documentation/metadata do you need? License considerations?
  - See North Carolina Geospatial Data Archiving Project (http://www.digitalpreservation.gov/partners/documents/ncgdap_final_report.pdf)
- Archiving video games
- How to preserve complexity of special collections? Reorder with technology?
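
The OAI-ORE idea noted above (everything has an identifier, and identifiers are lumped together to make "things") can be sketched as a handful of RDF triples. This is a minimal illustration written out as N-Triples with plain string formatting. The ORE vocabulary terms are real, but the URIs, including the ARK-style ones, are made up for the example, and the helper names are hypothetical.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(subject, predicate, obj):
    """Format one triple of URIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def resource_map(rem_uri, aggregation_uri, part_uris):
    """Describe an aggregation (the "thing") and the parts it lumps together."""
    lines = [
        triple(rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        triple(rem_uri, ORE + "describes", aggregation_uri),
        triple(aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    lines += [triple(aggregation_uri, ORE + "aggregates", p) for p in part_uris]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical identifiers; 12345 is used here only as a placeholder NAAN.
    obj = "https://example.org/ark:/12345/x1"
    print(resource_map(
        "https://example.org/rem/x1",
        obj,
        [obj + "/page1.tif", obj + "/metadata.xml"],
    ))
```

A real resource map would also carry metadata about the map itself (creator, modification date); that is left out to keep the sketch short.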

User Experience

UI/UX development and reuse (how to do this, formal roles, community development)- usage of curation tools by users (vs. curators) - 16
Contextualize objects
How will users use/reuse objects?
Interfaces can get out of hand quickly when using modern interface design tools
Design interfaces that you can reuse/contribute. How do you balance custom/local needs with general reusability considerations?
jQuery UI
user-friendly APIs
modular development - make it easy to use just what you need
common data structures
accessibility considerations
usability studies: how do you analyze usability/gather feedback?
- feedback tends to be very specific about certain functions/features
- how you structure usability testing will set up your results
Developers need to watch users use the system!
Too much variability as you move across applications/collections: how to standardize without losing custom features?
People want repository tools to work like common internet tools (Google, Amazon, etc)
Moving to general conventions to help novice users can alienate specialized users
Beware field labels and library jargon
Librarians get sad when they feel like they aren't doing the rich data justice--but is it really serving the user?
- Brief vs. full metadata and "the tab for librarians"
- Users don't understand the term "metadata"
Copyright data is too verbose, threatening to users--but often featured on description pages
How can you focus on design/usability earlier in the development process?
Mobile front-ends/apps
- Simplifying for mobile can help with SEO - Velocity template on top of Solr
- Need a mobile version of a page turner

Community Development

API layers and allowing interaction with repositories
- Should there be APIs by content model/type of metadata?
For those who have custom preservation repositories, how do you create/repurpose front ends?
- Hydra without Fedora: repurpose Fedora API, try to remove some Fedora assumptions
How to make development more community oriented?
- Code on GitHub
- How to migrate custom local code into an open source project?
RESTful interface to Fedora: a standard for Create-Read-Update-Delete in HTTP
CMIS - content management interoperability services
ROpenSci creates a connection between the R stats package and various repositories: http://ropensci.org
Hydra and Islandora communities discussing how they could interoperate on the same Fedora
Non-programmers can contribute by providing user stories to guide API functionality
Having actively engaged users/domain experts in the development process can really speed up development
- Have developers and users in the same room on a frequent basis--constant user testing/interaction
- Open Planets Foundation: librarians/archivists and programmers in the same room. Happens outside of institutions to help remove local pressures
- Clear channels of communication with users to gather feedback and inform development
Importance of domain expertise and people who can talk to both users and programmers
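
To make the "Create-Read-Update-Delete in HTTP" point above concrete, here is a toy sketch of how the four operations usually map onto HTTP methods and status codes. It is an in-memory stand-in only: the `/objects` path, the `ObjectStore` class, and the chosen status codes are assumptions for illustration and do not describe Fedora's actual REST API.

```python
import itertools


class ObjectStore:
    """Toy in-memory repository showing a common CRUD-to-HTTP mapping."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)

    def handle(self, method, path, body=None):
        """Return a (status, payload) pair for a request."""
        # Create: POST to the collection mints a new identifier.
        if method == "POST" and path == "/objects":
            oid = str(next(self._ids))
            self._objects[oid] = body
            return 201, {"id": oid}

        prefix = "/objects/"
        if not path.startswith(prefix):
            return 404, None
        oid = path[len(prefix):]
        if oid not in self._objects:
            return 404, None

        if method == "GET":  # Read
            return 200, self._objects[oid]
        if method == "PUT":  # Update (replace the stored object)
            self._objects[oid] = body
            return 200, body
        if method == "DELETE":  # Delete
            del self._objects[oid]
            return 204, None
        return 405, None


if __name__ == "__main__":
    store = ObjectStore()
    print(store.handle("POST", "/objects", {"title": "Field notes"}))
    print(store.handle("GET", "/objects/1"))
    print(store.handle("DELETE", "/objects/1"))
```

A shared convention like this is part of what would let different front ends talk to the same repository without each one knowing its internals.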

Demos

Data Model from UCSD
- 3-month process to create a new data model for library
- -Data model blog link here-
- Data model diagram (relationships), data dictionary, user stories
- sample record converted to N-Triples

Distributed Search: Arctic Data Explorer
- Data only
- Search criteria: space, time, parameter
- How to let users search your data as well as relevant external data in one place?
- Designed for people who don't know what they're looking for--help them discover things they don't know about

ScholarSphere non-demo demo

IIIF image API

screencast of Stanford ETD - a Hydra app

Wrap-up Session

Wrap-up session: community-building: future of CURATEcamp, sustainability - 19

Topics

linked data (7)
digital curation
records management
metadata & authority control (10)
long-term preservation of complex objects (16)
data model from UCSD (17)
bootstrapping repository services (getting started with minimal resources) curation & preservation in the wild (sans repo) - 15
development trends
standards
data management tools & processes
Cylinders of Excellence: living with multiple systems (interoperability, one system to rule them all?) combatting "one tool" philosophy (three tools: DAMS for simple items, repo for authorial/ETD workflow, GIS data somethingsomethin'), how not to shoehorn everything (platform/layers vs. monolithic) - especially issues with multiple workflows - 18
expanding the value of library infrastructure/tools (business use, scholarship) - 12
Contribution/ingest - 20
Abstraction layer for repositories especially from early and/or bespoke systems - 2
community development (e.g. Hydra project on top of not Fedora) - 15
METS development - 1
UI/UX development and reuse (how to do this, formal roles, community development)- usage of curation tools by users (vs. curators) - 16
Has the digital realm affected our idea of what digital preservation means? selection (e.g. of content types) for digital preservation - are we saving too much? who decides? - 15
Now that the bits are preserved, how do we preserve behavior/experience - 7
multi-institutional repositories (UC, CIC, etc.)
Wrap-up session: community-building: future of CURATEcamp, sustainability - 19
PREMIS for preservation metadata (user feedback, requests) and changes coming in PREMIS 3 - 4
persistent identifiers, e.g., ARKs - 8
Gather round for demos at 3:30 - 25ish
service models for ingest: internal repos vs external or subject repos - 8
project is done, now what? - proving value of investment - ROI ALSO funding models for repository/curation services (grants, etc.)- 18
e-book preservation - 4

Timeline

09:00-09:40 Introductions
09:40-10:00 Break
10:00-10:45 Voting/Ranking
10:45-11:15 Session 1: Ingest Barriers
11:30-12:00 Session 2: One Tool
12:00-12:30 Session 3: ROI
12:30-2:00 Lunch
2:00-2:30 Session 4:
2:30-3:00 Session 5:
3:00-3:30 Break
3:30-4:00 Session 6:
4:00-4:30 Demos
4:30-5:00 Wrap-up/Future of Curate Camp
