From CURATEcamp

Questions About Crowd-sourced Metadata

  1. Who's using user-added metadata?
    • Successes and Failures.
    • Importing (Ingesting) or Exporting.
  2. What types of metadata fields are good candidates for crowd-sourced metadata?
  3. What effective incentives can be provided for metadata entry?
  4. Mechanical Turk.
    • Used by Amazon.
    • Output looks computer-generated but is created by humans.
  5. What possible legal issues might there be with crowd-sourced metadata?
  6. What quality control or authority control systems can be implemented?
    • What reputation systems might be employed to handle quality / authority issues?
  7. What methods of integration could there be with non-user-generated metadata?
  8. Controlled Vocabularies.
    • There must be some consideration of domain-specific vocabularies.
    • One size does not fit all.
  9. Awareness - Users must be aware of crowd-sourcing features.
    • Marketing.
    • Advertising.
    • Public Relations.

Examples of Systems Using Crowd-sourced Metadata

  1. Wikipedia (open-source)
  2. Amazon (corporate)
  3. GT ETDs "keywords"
  4. E-page "keywords via online forms - no schema"
  5. UT OPAC - Tags to any entry
  6. GT VuFind / Primo Central
  7. Emory ETDs "keywords" also selecting categories, soon digitized books.
  8. New York Public Library - menus project (this wiki rejects YouTube links as spam, but see Barbara Taranto's Crowd Sourcing Metadata link on this page)

Blue Sky

  1. Controlled Vocabularies (via something like LCSH)
  2. Search Options of Labeling Authority
    • Public (wide open)
    • Authority (domain experts)
    • Paid sources via harvest
      • Amazon
      • LibraryThing
      • 2 levels - uncontrolled and authorized - with an interface that suggests controlled terms (auto-complete, or a "did you mean?" prompt)
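
The suggestion idea in the last item above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not a description of any system named in these notes: the vocabulary list and the function name suggest_terms are hypothetical, and the standard library's difflib stands in for whatever matching a real interface would use.

```python
import difflib

# Hypothetical controlled vocabulary; a real system might load terms
# from something like LCSH instead of a hard-coded list.
CONTROLLED_TERMS = ["Metadata", "Digital libraries", "Crowdsourcing", "Cataloging"]

def suggest_terms(user_tag, vocabulary=CONTROLLED_TERMS, limit=3):
    """Return (prefix_matches, fuzzy_matches) for a user-entered tag."""
    needle = user_tag.strip().lower()
    lowered = {term.lower(): term for term in vocabulary}
    # Auto-complete: controlled terms that start with what was typed.
    prefix = [term for low, term in lowered.items()
              if needle and low.startswith(needle)][:limit]
    # "Did you mean?": controlled terms that are close to what was typed.
    close = difflib.get_close_matches(needle, list(lowered), n=limit, cutoff=0.6)
    fuzzy = [lowered[match] for match in close if lowered[match] not in prefix]
    return prefix, fuzzy
```

In this sketch the user's own tag is never rejected; the interface only offers controlled alternatives, which keeps the "uncontrolled" level open while nudging entries toward the "authorized" one.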

Significantly Relevant Concepts

  1. Critical Mass
  2. Key:Value Pairs vs. Tags/Labeling


  1. Leave tagging wide open to users
  2. The amount of data being gathered is growing at an increasing rate, so keeping metadata values up to date by hand will eventually become impossible.
  3. Paid services (unpaid services?) that harvest metadata from where it already exists, rather than generating it locally, may be necessary (Amazon album art, LibraryThing tags, etc.).
  4. Without controlled vocabularies, faceting becomes significantly less effective. For this reason, controlled vocabularies may be a requirement of future systems.
  5. It may become necessary to make researched decisions concerning controlled vocabularies before implementing systems.
    • There may be a need for supersets or subsets of vocabularies to extend within or beyond the original domain.
    • Can one size fit all with minimal accuracy and minimal effort while domain-specific vocabularies are maintained?
  6. It seems that there will be a significant need for an "Analytics tool for combined labeling systems" to expose usage for evaluation of labeling / metadata effectiveness.
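
The effect described in point 4 can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the sample records, the PREFERRED mapping, and the facet_counts helper are hypothetical, and the mapping stands in for a real controlled vocabulary.

```python
from collections import Counter

# Hypothetical mapping from user-entered variants to one preferred term.
PREFERRED = {
    "nyc": "New York (N.Y.)",
    "new york city": "New York (N.Y.)",
    "new york": "New York (N.Y.)",
}

def facet_counts(tag_lists, mapping=None):
    """Count how many records fall under each facet label."""
    counts = Counter()
    for tags in tag_lists:
        labels = set()  # count each label once per record
        for tag in tags:
            cleaned = tag.strip()
            if mapping:
                cleaned = mapping.get(cleaned.lower(), cleaned)
            labels.add(cleaned)
        counts.update(labels)
    return counts

# Three records tagged by different users for the same place.
records = [["NYC"], ["New York City", "menus"], ["new york"]]
uncontrolled = facet_counts(records)           # one facet per spelling
controlled = facet_counts(records, PREFERRED)  # variants merged into one facet
```

Without the mapping, the three place tags become three separate facets of one record each; with it, they collapse into a single facet covering all three records. Comparing the two counters is also a very small instance of the usage analytics mentioned in point 6.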