CURATEcamp OR11 Ideas

From CURATEcamp
Revision as of 19:10, 6 June 2011 by Mjgiarlo

Feel free to use this space to share ideas for discussion at CURATEcamp #OR11.


Preserving faculty web output: Do you collect scholarly output native to the web? Web pages, blog posts, comments, tweets, winks, likes, zips, whistles? What are current policies, practices, plans, and pie-in-the-sky-semantic-web-dream-scenarios? + 2

Adam Field from the University of Southampton is working on this at the moment and would be happy to demonstrate it here. Let me know. - Mahendra Mahey

Poolside is coolside: Let's test the theory that holding #curatecamp by a pool with margaritas leads to us solving all the digital curation problems.

Managing controlled vocabulary terms: How can we best incorporate terms from controlled vocabularies as subject keywords for improved searching/browsing? + 1

ORCID and other person IDs: Widely adopted person identifiers would solve many of our name authority control problems. How should we store them and how do we want repositories to interact with services like ORCID that provide them?

GUI batch editing system: DSpace has introduced a highly useful system for editing batches of records, but it still requires a bit of back-end know-how. Let's make some strides toward designing and developing a user interface for this feature. What are the requirements to make this useful in a variety of repositories?

Using a HAMR: HAMR is a (not-yet-functional) tool for comparing a locally held record to an authority record and easily applying changes. Will the current design work for your repository system? What features would you like to see in this tool?

Tools for knowing when an article is published: Wouldn't it be nice to have an automated notification that additional, more authoritative metadata is available for an item? This is particularly important for repositories with embargo periods.

Statistics issues/discussion: How to improve what we display? What to report to users? How to measure impact? Standards for excluding robots, local users, etc. Will the DSpace stats setup work for other repositories?

Requirements for reporting systems: What types of reports are most helpful for curators? Can we develop a standard set of reports that all repository platforms should support? One example is frequency reporting: What values are typical for a field? What is the range of values in this field?
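To make the frequency-reporting idea concrete, here is a minimal Python sketch. It assumes each record is a dict mapping a field name to a list of string values; the function name `field_frequency_report`, the sample field names, and the sample records are all hypothetical and are not taken from any particular repository platform.

```python
from collections import Counter


def field_frequency_report(records, field):
    """Summarize the values seen in one metadata field across records.

    Assumes each record is a dict of field name -> list of string values.
    """
    values = []
    missing = 0
    for record in records:
        found = record.get(field) or []
        if not found:
            missing += 1  # the record has no value at all for this field
        values.extend(found)
    counts = Counter(values)
    return {
        "field": field,
        "records_missing_field": missing,
        "distinct_values": len(counts),
        "most_common": counts.most_common(3),  # the "typical" values
        "range": (min(values), max(values)) if values else None,
    }


# Hypothetical sample records, for illustration only.
records = [
    {"dc.language": ["en"], "dc.date.issued": ["2009"]},
    {"dc.language": ["en"], "dc.date.issued": ["2011"]},
    {"dc.language": ["fr"], "dc.date.issued": ["2010"]},
    {"dc.date.issued": ["2011"]},
]

language_report = field_frequency_report(records, "dc.language")
date_report = field_frequency_report(records, "dc.date.issued")
```

The range here is a plain string comparison, which works for four-digit years but would need real parsing for mixed date formats.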

Codified taxonomies vs. folksonomies: how to manage the two

Identity & access management + 2

What repository software does for data curation, and how to adapt it (DSpace, Fedora, etc.) + 1

Building a brilliant new repository system (how to do that, how to start)

Distributed network (nationally, state-wide, campus-wide, inter-institution) for curation services + 4

Libraries working with IT / "Rogue IT"

Ways to automate/batch ingest (SWORD?) & dissemination + 1

Authority control in the IR + 1

Tiered preservation policies (selection, different metadata, etc.) + 2

Discovery of (science) data in repositories

Social feedback loops (comments, trackbacks) in repositories + 1

Incentivizing work among developers / Working with developers

How to scale up (e.g., research data, "Big Data", architectures) + 4

Usability of repository interfaces (how to serve diverse user audiences)

Standards (metadata, formats, etc.) for long-term data preservation

Workflow systems (e.g. ingest) + 1

Multimedia and web formats (capturing and describing audio) + 1

Integrating collections w/ GIS mapping, creative uses of GIS

Finding a repository package that is appropriate for your institution (scalability, training requirements, etc.)

How to deliver digital content physically (digital rights management) + 1

"Quick wins" for digital curation

Curation microservices

Models for institutional data curation services

RDF as a data model in repositories

Levels of curation, models for keeping curation consistent


10:00am: Authority control

  • What would a "global network" of authority control look like?
    • (e.g. authority data for researchers)
  • How can we cooperate around these authority data in our local contexts, and then abroad?
  • Linked data as infrastructure for this network, e.g. VIAF
  • Types of authority control differ: researchers, student data, names.
  • DataCite (datacite.org), a "neutral" (not institution-specific) site for data identifiers
  • PeopleFinder discussed at LOD-LAM
  • Using Mendeley data to display connections between researchers
  • There is no authority of authorities; there is only a web of authorities
    • How to model trust on the web? Which authorities do I choose, and why?
    • If there's no authority, you can be an authority.
  • Shibboleth as a piece of the puzzle

10:45am: Ingest

  • How do we scale up ingest? The web form is not always ideal (giant batches).
    • Archivematica handles it well. BagIt can also be used to bundle the data. UVA rebuilt it as RubyMatica
    • Curator's Workbench at UNC is a similar tool
  • Barrier to ingest: Staff training (who is doing ingest and are they trained to do so?)
  • Using collection development policy to generate action
  • Program-based approach (cross-departmental) helps with buy-in across the organization
  • Item-level description can slow down the ingest process
  • Content tends to accumulate in "staging areas" because the ingest process is expensive or otherwise a barrier to repository deposit
  • Ingest tied to publication can make folks gun-shy (trust issues, quality assurance issues)
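
As a rough illustration of the BagIt bundling mentioned above, the following Python sketch builds a minimal bag with only the standard library and then re-checks its checksums. It assumes the BagIt 1.0 layout (a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt` file); the function names `make_bag` and `verify_bag` are hypothetical, and the sketch is not how Archivematica or any other tool named in these notes does it.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write minimal BagIt tag files."""
    bag = Path(bag_dir)
    payload = bag / "data"
    shutil.copytree(Path(source_dir), payload)  # fails if bag_dir/data exists

    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Manifest paths are relative to the bag root, with forward slashes.
        lines.append(f"{digest}  {path.relative_to(bag).as_posix()}")

    (bag / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8")
    return bag


def verify_bag(bag_dir):
    """Return payload paths whose checksum no longer matches the manifest."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    mismatches = []
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            mismatches.append(rel_path)
    return mismatches


# Small demonstration in a throwaway directory.
work = Path(tempfile.mkdtemp())
(work / "deposit").mkdir()
(work / "deposit" / "notes.txt").write_text("field notes", encoding="utf-8")
bag = make_bag(work / "deposit", work / "bag")
problems = verify_bag(bag)
```

The sketch reads each file fully into memory and does not handle files listed in the manifest but missing from disk, so a batch of large files would call for chunked hashing and extra checks.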

11:15am: Preservation policies

  • Who all has preservation policies?
  • Preservation: how long to keep stuff around, if at all?
  • We primarily talk about bit-level preservation, though some (UT) have experience with emulation, for instance
  • Disk images (frozen in time), digital forensics
  • Multimedia formats and migration
    • U. of Minnesota has a separate (Islandora) repository for multimedia called UMedia
    • Originals may be purged while blessed formats are preserved
  • Discussion of media capabilities, the need for streaming, software that supports it
  • Why keep the original?
    • Access issues (preservation format may not be the most access-friendly)
    • Conversion may be lossy

1:15pm: