Crowdsourcing

Questions About Crowd-sourced Metadata

  1. Who's using user-added metadata?
    • Successes and Failures.
    • Importing (Ingesting) or Exporting.
  2. What types of metadata fields are good candidates for crowd-sourced metadata?
  3. What effective incentives can be provided for metadata entry?
  4. Mechanical Turk.
    • Used by Amazon.
    • Output looks computer-generated but is human-created.
  5. What possible legal issues might there be with crowd-sourced metadata?
  6. What quality control or authority control systems can be implemented?
    • What reputation systems might be employed to handle quality / authority issues? (A reputation-weighting sketch follows this list.)
  7. What methods of integration could there be with non-user-generated metadata?
  8. Controlled Vocabularies.
    • There must be some consideration of domain-specific vocabularies.
    • One size does not fit all.
  9. Awareness - Users must be aware of crowd-sourcing features.
    • Marketing.
    • Advertising.
    • Public Relations.
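
One way to approach the reputation question in item 6 is to weight each submitted tag by its contributor's standing and publish it only past a threshold. Below is a minimal Python sketch of that idea; the function names, scores, and threshold are all hypothetical, not drawn from any system named in these notes.

    from collections import defaultdict

    ACCEPT_THRESHOLD = 2.0  # combined reputation needed to publish a tag

    # Reputation per contributor (hypothetical scores; could be grown
    # from a user's history of accepted tags).
    reputation = defaultdict(lambda: 0.5)
    reputation.update({"curator_a": 3.0, "student_b": 0.5, "student_c": 0.8})

    # Tag submissions as (record_id, tag, contributor) triples.
    submissions = [
        ("etd-101", "civil engineering", "curator_a"),
        ("etd-101", "bridges", "student_b"),
        ("etd-101", "bridges", "student_c"),
    ]

    def accepted_tags(submissions, reputation, threshold=ACCEPT_THRESHOLD):
        """Publish a tag once the summed reputation of its submitters
        crosses the threshold: a trusted curator clears it alone, while
        low-reputation users must agree with each other first."""
        weight = defaultdict(float)
        for record_id, tag, user in submissions:
            weight[(record_id, tag)] += reputation[user]
        return {key: w for key, w in weight.items() if w >= threshold}

    print(accepted_tags(submissions, reputation))
    # {('etd-101', 'civil engineering'): 3.0} - the students' combined
    # weight (1.3) is not yet enough to publish 'bridges'

The same weights could also drive display order rather than a hard accept/reject cut, which keeps tagging "wide open" while still surfacing quality.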

Examples of Systems Using Crowd-sourced Metadata

  1. Wikipedia (open-source)
  2. Amazon (corporate)
  3. GT ETDs "keywords"
  4. E-page "keywords via online forms - no schema"
  5. UT OPAC - Tags to any entry
  6. GT VuFind / Primo Central
  7. Emory ETDs "keywords", plus category selection; soon digitized books as well.
  8. New York Public Library - menus project (this wiki won't take YouTube links because it flags them as spam, but see Barbara Taranto's Crowd Sourcing Metadata link on this page: http://www.cni.org/events/membership-meetings/past-meetings/fall-2011/)

Blue Sky

  1. Controlled Vocabularies (via something like LCSH)
  2. Search options by labeling authority
    • Public (wide open)
    • Authority (domain experts)
    • Paid sources via harvest
      • Amazon
      • LibraryThing
      • Two levels - uncontrolled and authorized - with the interface suggesting controlled terms (auto-complete, or a "did you mean?" prompt); see the sketch after this list.
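
The two-level idea in the last item can be sketched quickly: accept free text, but steer users toward controlled terms with prefix auto-complete and a "did you mean?" fallback. The vocabulary below is an illustrative stand-in, not LCSH itself.

    import difflib

    CONTROLLED_VOCAB = [
        "Civil engineering",
        "Civil rights",
        "Electrical engineering",
        "Environmental science",
    ]

    def autocomplete(prefix, vocab=CONTROLLED_VOCAB, limit=5):
        """Suggest controlled terms that start with the typed prefix."""
        p = prefix.lower()
        return [term for term in vocab if term.lower().startswith(p)][:limit]

    def did_you_mean(entry, vocab=CONTROLLED_VOCAB):
        """If a free-text entry is close to a controlled term, suggest
        that term; otherwise the entry stays as an uncontrolled tag."""
        matches = difflib.get_close_matches(entry, vocab, n=1, cutoff=0.8)
        return matches[0] if matches else None

    print(autocomplete("civ"))               # ['Civil engineering', 'Civil rights']
    print(did_you_mean("Civil enginering"))  # 'Civil engineering'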

Significantly Relevant Concepts

  1. Critical Mass
  2. Key:Value Pairs vs. Tags/Labeling (see the sketch below)
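
The distinction in item 2 is easiest to see side by side: flat tags are a bag of strings whose meaning is left to the reader, while key:value pairs bind each value to a field, which is what makes faceting and validation possible. The record below is invented for illustration (loosely inspired by the NYPL menus project above).

    # Tags/labeling: a bag of strings with no declared semantics.
    tags = {"1959", "menu", "new york", "dinner"}

    # Key:value pairs: the same facts, each bound to a named field.
    pairs = {
        "date": "1959",
        "document_type": "menu",
        "place": "New York",
        "meal": "dinner",
    }

    # With pairs, a facet like "everything from 1959" is a field lookup;
    # with flat tags, "1959" could be a date, a title, or a street number.
    print(pairs["date"])   # '1959'
    print("1959" in tags)  # True, but ambiguous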

Conclusions

  1. Leave tagging wide open to users
  2. The amount of data being gathered is growing at an increasing rate, so keeping up with metadata creation by hand will eventually become impossible.
  3. Paid services (or unpaid ones?) that harvest metadata from where it already exists, rather than generating it locally, may be necessary (Amazon album art, LibraryThing tags, etc.).
  4. Without controlled vocabularies, faceting becomes significantly less effective. For this reason, controlled vocabularies may be a requirement of future systems.
  5. It may become necessary to make researched decisions concerning controlled vocabularies before implementing systems.
    • Supersets or subsets of vocabularies may be needed to extend within or beyond the original domain.
    • Can one size fit all, with minimal accuracy and minimal effort, while domain-specific vocabularies are maintained?
  6. There will likely be a significant need for an "Analytics tool for combined labeling systems" to expose usage data for evaluating labeling / metadata effectiveness; a sketch follows below.
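
As a starting point for the analytics tool in item 6, even a small script that pools labels from several sources and reports usage counts would expose which vocabularies and tag sources are actually being used. The sample data and source names below are invented for illustration.

    from collections import Counter

    # (source, record_id, label) triples from a combined labeling system.
    labels = [
        ("user_tags", "rec1", "bridges"),
        ("user_tags", "rec2", "bridges"),
        ("lcsh", "rec1", "Bridges"),
        ("librarything", "rec2", "engineering"),
        ("user_tags", "rec3", "engineering"),
    ]

    def usage_report(labels):
        """Count label usage overall and per source, normalising case
        so 'Bridges' and 'bridges' roll up together."""
        overall = Counter(label.lower() for _, _, label in labels)
        by_source = Counter(
            (source, label.lower()) for source, _, label in labels
        )
        return overall, by_source

    overall, by_source = usage_report(labels)
    print(overall.most_common(2))               # [('bridges', 3), ('engineering', 2)]
    print(by_source[("user_tags", "bridges")])  # 2

Counts like these would feed directly back into the controlled-vocabulary decisions in items 4 and 5.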