CURATEcamp DLF 2012 Discussion Ideas

From CURATEcamp
Revision as of 23:39, 3 November 2012 by Tony Navarrete (talk | contribs) (Demos)

Agenda

Contribution and Ingest: Lowering Barriers

  • just-in-case vs. just-in-time metadata
  • what's "good enough": department, creator, hit send?
  • how can we get digital stuff with the minimum amount of effort?
  • set up a pre-ingest staging area for review to make sure content is "repository worthy"
  • build in metadata and structure requirements as part of data creation - "data counseling" for reuse
  • your digital ingest backlog already exists, it's just distributed across your institution
  • should it be the content creators' job to deposit? mediated deposit yields better results
  • set minimum compliance requirements for researchers
  • this is a "knowledge translation" problem--it won't be easy
  • tempting to scale back to absolute minimums, but that's not a good long-term solution for reuse and discovery. Need a better balance.
  • Email and Dropbox submission: accounts/authentication already built in, why not use it?
  • Do mediated deposit and self deposit need to be mutually exclusive? How about "mediate on demand"?
  • Levels of Service: process and team to spec out how long ingest will take. Data prioritized by content type/project.
  • Help clean up existing content, and apply "lessons learned" to make templates for future data/metadata creation
  • Summary: long process with a lot of complexity - make small, incremental steps. It's OK to do what you can for now. Doing what you can is better than over-promising.
  • DataStaR from Cornell - staging area for repositories http://datastar.mannlib.cornell.edu/
  • recap of C4L 2011 discussion: why does ingest suck?
    • promise of permanence sets up a barrier (forever is intimidating)
    • perception that ingest makes objects less discoverable
    • rights - need to be cleared before ingest?
    • metadata - requires too much? need a way to ingest easily and augment later
    • curation happens outside repository, preservation happens inside
    • more content in staging area than the "actual" repository
    • content creators don't have time to ingest
    • lessons learned
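
The "good enough" and staging-area ideas above could be sketched as a minimal pre-ingest check. This is only an illustration: the two required fields (department, creator) come from the notes, while the function names, the record layout, and the "ingest"/"staging" labels are assumptions made for the example.

```python
# Hypothetical minimum for deposit; the field names come from the notes above.
REQUIRED_FIELDS = ("department", "creator")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a submission."""
    return [f for f in required if not str(record.get(f) or "").strip()]


def triage(record):
    """Route a submission: ingest it now, or hold it in staging for review."""
    gaps = missing_fields(record)
    return ("ingest", gaps) if not gaps else ("staging", gaps)


if __name__ == "__main__":
    print(triage({"department": "Geology", "creator": "J. Doe"}))
    print(triage({"department": "Geology", "creator": " "}))
```

Holding an incomplete submission in staging rather than rejecting it fits the "ingest easily and augment later" point: nothing is lost, and a mediator can fill the gaps on demand.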

Fighting the "One Tool to Rule Them All" Mindset

  • need to understand what each tool does well and its limitations
  • the only way to cope with the weaknesses in each tool is multiple access/use layers. interoperability is our job security
  • what is the usefulness of the comparison matrix? really have to install and use the tools to evaluate them, but sometimes that isn't possible
  • more useful to think about a "framework" than "tools" -- but it's hard to do anything without programmers. Is outsourcing/contracting really feasible?
  • compare to the times when we only had an OPAC and that was good enough--then came ERMs, discovery layers
  • managing any preservation repository takes resources, so it's difficult to have more than one. Have one repository with multiple access/view layers.
  • Managing/displaying many different types of descriptive metadata--do you need multiple tools to do this?
  • The discovery layer is going to be Google! (or other search engines) - use RDF/schema.org and focus on how to expose metadata as broadly and usefully as possible
  • cultural heritage interests are so specific--Google can actually work pretty well with basic information
  • How to do SEO? are the typical methods of improving relevancy rankings applicable to library content/metadata? the approach that's worked for us
  • Search engines are "dumb" - don't tolerate AJAX, JavaScript, etc. sitemaps can help
  • schema.org extension project for books in the works
  • Getting things into Google Scholar - DSpace has tools that help
  • ETDs - need a national solution (repository, harvesting, metadata) for ETDs that doesn't involve proprietary licenses
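
As a rough illustration of exposing metadata to search engines, the sketch below builds a schema.org description as JSON-LD and a minimal sitemap. The function names, the choice of `CreativeWork` as the type, and the example URLs are assumptions; a real collection would pick types and properties that suit its content.

```python
import json
from xml.sax.saxutils import escape


def schema_org_jsonld(title, creator, url):
    """Build a schema.org description as a JSON-LD string for embedding in a page."""
    doc = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",  # assumed type; choose one that fits the item
        "name": title,
        "creator": {"@type": "Person", "name": creator},
        "url": url,
    }
    return json.dumps(doc, indent=2)


def sitemap(urls):
    """Build a minimal sitemap that lists each item page for crawlers."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    )


if __name__ == "__main__":
    print(schema_org_jsonld("Field notebook", "J. Doe", "https://example.org/items/1"))
    print(sitemap(["https://example.org/items/1"]))
```

The JSON-LD string would typically be placed in a `<script type="application/ld+json">` element on the item page, so the description is available without the crawler having to run any JavaScript.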

Funding Repositories and Showing Value

  • Know how to measure goals/outcomes (what do you want to measure and how) before you get started
  • Base your outcomes on your institutional values. Why are you preserving/curating these objects?
  • IT and Libraries need to be better at taking credit for the services they provide. You need to tell people how the work you do helps them. Why should people care? Tell stories, not numbers.
  • Communication channels: both inside and outside the library, materials that go out to alums
  • Infrastructure is a lot harder to build compelling stories about
  • Tools don't have quantitative statistics built in, and librarians don't often have the training/experience to do qualitative research
  • Capture quotes from constituents (satisfied customers)
  • Still need to manage all the "ordinary" stuff -- have baseline metrics for everything.
  • What was the original business case for preserving these things? Business case is not "this will turn a profit" but why are you doing this and what benefits will it bring?
  • Digital preservation is more expensive than keeping stuff on a shelf, so you have to make a stronger case. What is the decision-making process for keeping digital stuff? Being digital is not enough.
  • Have content creators tell you the significance/importance of their content and set priorities. This should be a mediated process.
  • Access is easier to sell--how do you sell the importance of long-term preservation without making up worst-case scenarios?
  • Ingest is half the cost of the overall digital preservation lifecycle--make sure what you ingest is worth it. The "collection development" aspect of physical collections isn't present with digital content in the same way.
  • Risk assessment - how do you measure the risks that you mitigate?

Long-term Preservation of Complex Objects

  • Rights management complicates preservation of linked objects
  • Memento protocol for time-based access to crawled/harvested web sites
  • How/when do you preserve fluid objects? We are used to preserving fixed objects.
  • Make project/discipline-based decisions on when data should be captured (raw vs. processed data)
  • Find ways to model structure (METS, RDF)--connect the dots (files). But implementation of highly flexible tools like METS is complicated.
  • PREMIS and Fedora Commons
  • Fedora object model with RDF
  • OAI-ORE: resource map - everything has an identifier and you lump identifiers together to make "things"
  • RDF triples with ARK identifiers. Would be good to have best practices/general use cases for use of ARKs and METS together
  • Ways to map METS to ORE
  • Samples of complex objects
    • Hyperlinks from journal articles to external data; complexities of not having all parts of the objects under your control
    • ArcGIS data: how much documentation/metadata do you need? License considerations?
    • Archiving video games
    • How to preserve complexity of special collections? Reorder with technology?
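
The OAI-ORE idea of lumping identifiers together to make "things" can be sketched as a few RDF triples. The example below is a minimal illustration that writes an ORE resource map as N-Triples by hand; the function names and the ARK-style URIs (under example.org, with a made-up NAAN) are assumptions, and a real implementation would more likely use an RDF library.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(s, p, o):
    """Serialize one triple of IRIs as an N-Triples line."""
    return f"<{s}> <{p}> <{o}> ."


def resource_map(rem_uri, aggregation_uri, part_uris):
    """Describe an aggregation and the resources it lumps together."""
    lines = [
        triple(rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        triple(rem_uri, ORE + "describes", aggregation_uri),
        triple(aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    lines += [triple(aggregation_uri, ORE + "aggregates", p) for p in part_uris]
    return "\n".join(lines)


if __name__ == "__main__":
    base = "https://example.org/ark:/12345/"  # hypothetical resolver and NAAN
    print(resource_map(base + "obj1.rem", base + "obj1",
                       [base + "obj1/page1.tif", base + "obj1/page2.tif"]))
```

Because every part has its own identifier, the same file can appear in more than one aggregation, which is one way to model objects whose parts are shared or not all under local control.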

User Experience

  • UI/UX development and reuse (how to do this, formal roles, community development)- usage of curation tools by users (vs. curators) - 16
  • Contextualize objects
  • How will users use/reuse objects?
  • Interfaces can get out of hand quickly when using modern interface design tools
  • Design interfaces that you can reuse/contribute. How do you balance custom/local needs with general reusability considerations?
  • jQuery UI
  • user-friendly APIs
  • modular development - make it easy to use just what you need
  • common data structures
  • accessibility considerations
  • usability studies: how do you analyze usability/gather feedback?
    • feedback tends to be very specific about certain functions/features
    • how you structure usability testing will set up your results
  • Developers need to watch users use the system!
  • Too much variability as you move across applications/collections: how to standardize without losing custom features?
  • People want repository tools to work like common internet tools (Google, Amazon, etc)
  • Moving to general conventions to help novice users can alienate specialized users
  • Beware field labels and library jargon
  • Librarians get sad when they feel like they aren't doing the rich data justice--but is it really serving the user?
    • Brief vs. full metadata and "the tab for librarians"
    • Users don't understand the term "metadata"
  • Copyright data is too verbose, threatening to users--but often featured on description pages
  • How can you focus on design/usability earlier in the development process?
  • Mobile front-ends/apps
    • Simplifying for mobile can help with SEO - Velocity template on top of Solr
    • Need a mobile version of a page turner

Community Development

  • API layers and allowing interaction with repositories
    • Should there be APIs by content model/type of metadata?
  • For those who have custom preservation repositories, how do you create/repurpose front ends?
    • Hydra without Fedora: repurpose Fedora API, try to remove some Fedora assumptions
  • How to make development more community oriented?
    • Code on GitHub
    • How to migrate custom local code into an open source project?
  • RESTful interface to Fedora: a standard for Create-Read-Update-Delete in HTTP
  • CMIS - content management interoperability services
  • ROpenSci creates a connection between the R stats package and various repositories: http://ropensci.org
  • Hydra and Islandora communities discussing how they could interoperate on the same Fedora
  • Non-programmers can contribute by providing user stories to guide API functionality
  • Having actively engaged users/domain experts in the development process can really speed up development
    • Have developers and users in the same room on a frequent basis--constant user testing/interaction
    • Open Planets Foundation: librarians/archivists and programmers in the same room. Happens outside of institutions to help remove local pressures
    • Clear channels of communication with users to gather feedback and inform development
  • Importance of domain expertise and people who can talk to both users and programmers
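
The Create-Read-Update-Delete mapping onto HTTP mentioned above can be illustrated with a toy dispatcher. This is a sketch under stated assumptions: `ObjectStore` is an in-memory stand-in invented for the example, not Fedora's actual API, and the status codes are one common convention rather than a requirement.

```python
class ObjectStore:
    """In-memory stand-in for a repository; not Fedora's actual API."""

    def __init__(self):
        self._objects = {}

    def handle(self, method, object_id, body=None):
        """Dispatch an HTTP-style request and return (status_code, payload)."""
        exists = object_id in self._objects
        if method == "POST":  # Create
            if exists:
                return 409, None
            self._objects[object_id] = body
            return 201, body
        if method == "GET":  # Read
            return (200, self._objects[object_id]) if exists else (404, None)
        if method == "PUT":  # Update (simplified: never creates)
            if not exists:
                return 404, None
            self._objects[object_id] = body
            return 200, body
        if method == "DELETE":  # Delete
            if not exists:
                return 404, None
            del self._objects[object_id]
            return 204, None
        return 405, None  # method not supported


if __name__ == "__main__":
    store = ObjectStore()
    print(store.handle("POST", "obj:1", {"title": "Draft"}))
    print(store.handle("PUT", "obj:1", {"title": "Final"}))
    print(store.handle("GET", "obj:1"))
    print(store.handle("DELETE", "obj:1"))
```

Keeping the interface to four verbs and an identifier is what makes it possible for different front ends to share a repository, since none of them need to know how objects are stored.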

Demos

  • Data Model from UCSD
    • 3-month process to create a new data model for library
    • -Data model blog link here-
    • Data model diagram (relationships), data dictionary, user stories
    • sample record converted to N-Triples
  • Distributed Search: Arctic Data Explorer
    • Data only
    • Search criteria: space, time, parameter
    • How to let users search your data as well as relevant external data in one place?
    • Designed for people who don't know what they're looking for--help them discover things they don't know about
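
The three search criteria (space, time, parameter) and the idea of searching local and external data in one place can be sketched as a filter over several sources. This is not how the Arctic Data Explorer is implemented; the `Dataset` fields, the bounding-box convention, and the sample records are assumptions made for illustration, and boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Dataset:
    title: str
    bbox: tuple  # (west, south, east, north) in degrees
    start: date
    end: date
    parameters: frozenset


def boxes_overlap(a, b):
    """True when two (west, south, east, north) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(sources, bbox, start, end, parameter):
    """Return datasets from every source that match all three criteria."""
    return [
        d
        for source in sources
        for d in source
        if boxes_overlap(d.bbox, bbox)
        and d.start <= end and start <= d.end
        and parameter in d.parameters
    ]


if __name__ == "__main__":
    local = [Dataset("Sea ice extent", (-180, 60, 180, 90),
                     date(2000, 1, 1), date(2010, 12, 31), frozenset({"sea_ice"}))]
    external = [Dataset("Tundra temperature", (-170, 65, -140, 72),
                        date(2005, 1, 1), date(2006, 1, 1), frozenset({"temperature"}))]
    hits = search([local, external], (-160, 66, -150, 70),
                  date(2005, 6, 1), date(2005, 7, 1), "temperature")
    print([d.title for d in hits])
```

Matching on overlap rather than containment is a deliberate choice here: it returns datasets that only partly cover the requested area or period, which suits users who do not yet know exactly what they are looking for.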

Wrap-up Session

  • Wrap-up session: community-building: future of CURATEcamp, sustainability - 19

Topics

  • linked data (7)
  • digital curation
  • records management
  • metadata & authority control (10)
  • long-term preservation of complex objects (16)
  • data model from UCSD (17)
  • bootstrapping repository services (getting started with minimal resources) curation & preservation in the wild (sans repo) - 15
  • development trends
  • standards
  • data management tools & processes
  • Cylinders of Excellence: living with multiple systems (interoperability, one system to rule them all?) combatting "one tool" philosophy (three tools: DAMS for simple items, repo for authorial/ETD workflow, GIS data somethingsomethin'), how not to shoehorn everything (platform/layers vs. monolithic) - especially issues with multiple workflows - 18
  • expanding the value of library infrastructure/tools (business use, scholarship) - 12
  • Contribution/ingest - 20
  • Abstraction layer for repositories especially from early and/or bespoke systems - 2
  • community development (e.g. Hydra project on top of something other than Fedora) - 15
  • METS development - 1
  • UI/UX development and reuse (how to do this, formal roles, community development)- usage of curation tools by users (vs. curators) - 16
  • Has the digital realm affected our idea of what digital preservation means? selection (e.g. of content types) for digital preservation - are we saving too much? who decides? - 15

  • Now that the bits are preserved, how do we preserve behavior/experience - 7
  • multi-institutional repositories (UC, CIC, etc.)
  • Wrap-up session: community-building: future of CURATEcamp, sustainability - 19
  • PREMIS for preservation metadata (user feedback, requests) and changes coming in PREMIS 3 - 4
  • persistent identifiers, e.g., ARKs - 8
  • Gather round for demos at 3:30 - 25ish
  • service models for ingest: internal repos vs external or subject repos - 8
  • project is done, now what? - proving value of investment - ROI ALSO funding models for repository/curation services (grants, etc.)- 18
  • e-book preservation - 4

Timeline

  • 09:00-09:40 Introductions
  • 09:40-10:00 Break
  • 10:00-10:45 Voting/Ranking
  • 10:45-11:15 Session 1: Ingest Barriers
  • 11:30-12:00 Session 2: One Tool
  • 12:00-12:30 Session 3: ROI
  • 12:30-2:00 Lunch
  • 2:00-2:30 Session 4:
  • 2:30-3:00 Session 5:
  • 3:00-3:30 Break
  • 3:30-4:00 Session 6:
  • 4:00-4:30 Demos
  • 4:30-5:00 Wrap-up/Future of Curate Camp