CURATEcamp OR11 Ideas


Feel free to use this space to share ideas for discussion at CURATEcamp #OR11.

NOTE: This page is for the Open Repositories '11 camp, not CURATEcamp 2011 Ideas.

Preserving faculty web output: Do you collect scholarly output native to the web? Web pages, blog posts, comments, tweets, winks, likes, zips, whistles? What are current policies, practices, plans, and pie-in-the-sky-semantic-web-dream-scenarios? + 2

Adam Field from the University of Southampton is working on this at the moment and would be happy to demonstrate it here. Let me know - Mahendra Mahey

Poolside is coolside: Let's test the theory that holding #curatecamp by a pool with margaritas leads to us solving all the digital curation problems.

Managing controlled vocabulary terms: How can we best incorporate terms from controlled vocabularies as subject keywords for improved searching/browsing? + 1

ORCID and other person IDs: Widely adopted person identifiers would solve many of our name authority control problems. How should we store them and how do we want repositories to interact with services like ORCID that provide them?

GUI batch editing system: DSpace has introduced a highly useful system for editing batches of records, but it still requires a bit of back-end-knowhow. Let's make some strides toward designing and developing a user interface for this feature. What are the requirements to make this useful in a variety of repositories?

Using a HAMR: HAMR is a (not-yet-functional) tool for comparing a locally held record to an authority record and easily applying changes. Will the current design work for your repository system? What features would you like to see in this tool?

Tools for knowing when an article is published: Wouldn't it be nice to have an automated notification that additional, more authoritative metadata is available for an item? Also has particular importance for repositories with embargo periods.

Statistics issues/discussion: How to improve what we display? What to report to users? How to measure impact? Standards for excluding robots, local users, etc. Will the DSpace stats setup work for other repositories?
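
One common way to exclude robots and local users from usage statistics is to filter each logged hit by user-agent string and by client IP range before counting. The sketch below is a minimal illustration of that idea, not the DSpace statistics implementation; the regular expression, the "local" network ranges, and the `Hit` record shape are all hypothetical placeholders. A production setup would draw its robot patterns from a maintained source such as the list kept by Project COUNTER.

```python
import ipaddress
import re
from dataclasses import dataclass

# Hypothetical robot pattern; a real deployment would load a maintained
# list (e.g. Project COUNTER's robots list) instead of this short regex.
ROBOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

# Hypothetical campus ranges treated as "local users" to be excluded.
LOCAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


@dataclass
class Hit:
    """One logged request for a repository item (hypothetical shape)."""
    ip: str
    user_agent: str
    item_id: str


def is_robot(hit: Hit) -> bool:
    """True if the user-agent matches any robot pattern."""
    return bool(ROBOT_PATTERN.search(hit.user_agent))


def is_local(hit: Hit) -> bool:
    """True if the client address falls inside a local network range."""
    addr = ipaddress.ip_address(hit.ip)
    return any(addr in net for net in LOCAL_NETWORKS)


def countable_hits(hits):
    """Keep only hits that are neither robots nor local users."""
    return [h for h in hits if not is_robot(h) and not is_local(h)]
```

Keeping the robot and local-user rules in separate functions makes it easy to report both the filtered and unfiltered counts, which helps when comparing figures across repositories that apply different exclusion rules.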

Requirements for reporting systems: What types of reports are most helpful for curators? Can we develop a standard set of reports that all repository platforms should support? One example is frequency reporting: What values are typical for a field? What is the range of values in this field?
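
A frequency report of the kind described above can be computed by counting every value of one field across all records, and a simple range report by taking the minimum and maximum value. The sketch below assumes records are already exported as Python dicts mapping a field name to a list of values (repeatable fields, as in Dublin Core); the DSpace-style field names such as `dc.type` are illustrative only, and the export step itself is not shown.

```python
from collections import Counter


def field_frequency(records, field, top=5):
    """Return the `top` most common values of `field` with their counts.

    Each record is assumed to be a dict mapping field names to a list
    of string values; values are stripped of surrounding whitespace.
    """
    counts = Counter()
    for record in records:
        for value in record.get(field, []):
            counts[value.strip()] += 1
    return counts.most_common(top)


def field_range(records, field):
    """Return (min, max) of a field's values, or None if no record has it.

    Comparison is plain string ordering, which works for ISO 8601 dates
    but not, for example, for free-text dates or unpadded numbers.
    """
    values = [v.strip() for r in records for v in r.get(field, [])]
    if not values:
        return None
    return min(values), max(values)
```

For example, with three records whose `dc.type` values are "Article", "Article", and "Thesis", `field_frequency(records, "dc.type")` returns `[("Article", 2), ("Thesis", 1)]`, which immediately shows the typical value and any outliers.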

Codified taxonomies vs. folksonomies: how to manage both

Identity & access management + 2

What repository software (DSpace, Fedora, etc.) does for data curation, and how to adapt it + 1

Building a brilliant new repository system (how to do that, how to start)

Distributed network (nationally, state-wide, campus-wide, inter-institution) for curation services + 4

Libraries working with IT / "Rogue IT"

Ways to automate/batch ingest (SWORD?) & dissemination + 1

Authority control in the IR + 1

Tiered preservation policies (selection, different metadata, etc.) + 2

Discovery of (science) data in repositories

Social feedback loops (comments, trackbacks) in repositories + 1

Incentivizing work among developers / Working with developers

How to scale up (e.g., research data, "Big Data", architectures) + 4

Usability of repository interfaces (how to serve diverse user audiences)

Standards (metadata, formats, etc.) for long-term data preservation

Workflow systems (e.g. ingest) + 1

Multimedia and web formats (capturing and describing audio) + 1

Integrating collections w/ GIS mapping, creative uses of GIS

Finding a repository package that is appropriate for your institution (scalability, training requirements, etc.)

How to deliver digital content physically (digital rights management) + 1

"Quick wins" for digital curation

Curation microservices

Models for institutional data curation services

RDF as a data model in repositories

Levels of curation, models for keeping curation consistent

DMPTool Online: a tool to help researchers create and edit data management plans. We will be able to demo the beta version during CURATEcamp.

10:00am: Authority control

  • What would a "global network" of authority control look like?
    • (e.g. authority data for researchers)
  • How can we cooperate around these authority data in our local contexts, and then abroad?
  • Linked data as infrastructure for this network, e.g. VIAF
  • Types of authority control differ: researchers, student data, names.
  • DataCite, a "neutral" (not institution-specific) site for data identifiers
  • PeopleFinder discussed at LOD-LAM
  • Using Mendeley data to display connections between researchers
  • There is no authority of authorities; there is only a web of authorities
    • How to model trust on the web? Which authorities do I choose, and why?
    • If there's no authority, you can be an authority.
  • Shibboleth as a piece of the puzzle

10:45am: Ingest

  • How do we scale up ingest? The web form is not always ideal (giant batches).
    • Archivematica handles it well. BagIt can also be used to bundle the data. UVA rebuilt it as RubyMatica
    • Curator's Workbench at UNC is a similar tool
  • Barrier to ingest: Staff training (who is doing ingest and are they trained to do so?)
  • Using collection development policy to generate action
  • Program-based approach (cross-departmental) helps with buy-in across the organization
  • Item-level description can slow down the ingest process
  • Content tends to accumulate in "staging areas" because the ingest process is expensive or otherwise a barrier to repository deposit
  • Ingest tied to publication can make folks gun-shy (trust issues, quality assurance issues)
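
BagIt, mentioned above as a way to bundle data for batch ingest, is a simple directory convention (specified in RFC 8493): a `bagit.txt` declaration file, a `data/` directory holding the payload, and a manifest listing a checksum for every payload file. The sketch below builds and verifies a minimal bag using only the Python standard library, assuming SHA-256 as the checksum algorithm; it omits optional parts of the spec such as `bag-info.txt` and tag manifests, and does not percent-encode unusual filenames. In practice a maintained implementation such as the Library of Congress's bagit-python (`bagit.make_bag`) would be the safer choice.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to handle large files."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(source_dir: str, bag_dir: str) -> Path:
    """Copy source_dir into bag_dir/data and write bagit.txt plus a manifest."""
    bag = Path(bag_dir)
    data = bag / "data"
    # copytree creates bag_dir if needed; it fails if bag_dir/data exists.
    shutil.copytree(Path(source_dir), data)
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # One "<checksum>  data/<path>" line per payload file, POSIX separators.
    lines = []
    for f in sorted(p for p in data.rglob("*") if p.is_file()):
        lines.append(f"{sha256_of(f)}  {f.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag


def verify_bag(bag_dir: str) -> bool:
    """Recompute every manifest checksum; False on the first mismatch.

    Only files listed in the manifest are checked; extra files in data/
    are not detected by this simplified check.
    """
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        checksum, rel = line.split(maxsplit=1)
        if sha256_of(bag / rel) != checksum:
            return False
    return True
```

Because the manifest travels with the payload, the receiving repository can run the same verification after transfer, which addresses some of the trust and quality-assurance concerns raised above without requiring item-level description at the moment of ingest.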

11:15am: Preservation policies

  • Who has preservation policies?
  • Preservation: how long to keep stuff around, if at all?
  • We primarily talk about bit-level preservation, though some (UT) have experience with emulation, for instance
  • Disk images (frozen in time), digital forensics
  • Multimedia formats and migration
    • U. of Minnesota has a separate (Islandora) repository for multimedia called UMedia
    • Originals may be purged while blessed formats are preserved
  • Discussion of media capabilities, the need for streaming, software that supports it
  • Why keep the original?
    • Access issues (preservation format may not be the most access-friendly)
    • Conversion may be lossy

1:20pm: Research data

  • What makes research data such a thorny issue?
    • Ill-defined: instrumental? observational?
    • Rights & access control, especially for sensitive data
  • Tension between "just save it" and "curate it for added context"
  • How much of a domain focus is needed?
  • Curation of research data begins before the data exists
  • Value is increased by researcher description -- possible to get to an "a-ha" moment where researchers "get" curation, but it takes investment
  • Why should researchers store data with libraries when there are good disciplinary repositories? Libraries do not necessarily have sufficient domain expertise.
  • Dryad focused on collecting the data, not so much on all the preservation bells and whistles.
  • A gap library-based repositories might bridge is preservation -- disciplinary repositories are not necessarily based on robust preservation platforms
  • Library-based repos could provide a preservation platform with APIs for access, re-use, and discovery
  • Preservation is potentially very expensive. "Just in case" preservation is not affordable by all.
  • Libraries are getting involved in the research data process earlier, while grants are being written, thanks to NSF's data management plan requirement.

1:55pm: Faculty web output

  • Adam Field from Southampton working on this (need more information)
  • Some folks getting mileage out of Archive-It
  • Why not let Internet Archive do it? They don't always capture what you're interested in.
  • How do you decide which web resources to capture? There are new tools and sites every year -- implies constant re-evaluation.

2:15pm: Scaling up

  • Technical scalability: storage, computing
  • Service/human resource scalability
  • Size of dataset not as relevant to scale as # of items (each of which needs description, identification, verification, etc.)
  • Automatic indexing may be necessary to scale up description

2:30pm: Libraries & IT

  • We speak different languages
  • Difficult to find IT personnel w/ time & interest to collaborate
  • Techies often left out of decision process
  • How do we get IT interested in our work? They might already be.
  • Collaboration, collaboration, collaboration
  • Budgetary implications
  • Different cultures, different career paths