Crowdsourcing
From CURATEcamp
Revision as of 08:12, 26 May 2012 by Laura Akerman
Questions About Crowd-sourced Metadata
- Who's using user-added metadata?
- Successes and Failures.
- Importing (Ingesting) or Exporting.
- What types of metadata fields are good candidates for crowd-sourced metadata?
- What effective incentives can be provided for metadata entry?
- Mechanical Turk.
- Used by Amazon.
- Results look computer-generated but are human-created.
- What possible legal issues might there be with crowd-sourced metadata?
- What quality control or authority control systems can be implemented?
- What reputation systems might be employed to handle quality / authority issues?
- What methods of integration could there be with non-user-generated metadata?
- Controlled Vocabularies.
- There must be some consideration of domain-specific vocabularies.
- One size does not fit all.
- Awareness - Users must be aware of crowd-sourcing features.
- Marketing.
- Advertising.
- Public Relations.
Examples of Systems Using Crowd-sourced Metadata
- Wikipedia (open-source)
- Amazon (corporate)
- GT ETDs "keywords"
- E-page "keywords via online forms - no schema"
- UT OPAC - Tags to any entry
- GT VuFind / Primo Central
- Emory ETDs "keywords" also selecting categories, soon digitized books.
- New York Public Library - menus project (see Barbara Taranto's Crowd Sourcing Metadata link on this page: http://www.cni.org/events/membership-meetings/past-meetings/fall-2011/)
Blue Sky
- Controlled Vocabularies (via something like LCSH)
- Search Options of Labeling Authority
- Public (wide open)
- Authority (domain experts)
- Paid sources via harvest
- Amazon
- LibraryThing
Significantly Relevant Concepts
- Critical Mass
- Key:Value Pairs vs. Tags/Labeling
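
To make the Key:Value Pairs vs. Tags/Labeling distinction concrete, here is a minimal sketch in Python. The item IDs, field names, and values are hypothetical and were not discussed in the session; the point is only that a tag is a bare label, while a key:value pair also says what kind of statement the label makes.

```python
# Hypothetical records for illustration only; none of these IDs or values
# come from a real system.

# Tags/labeling: each item carries an unordered set of free-form labels.
tags = {
    "item-1": {"menu", "1900s", "new york"},
    "item-2": {"thesis", "chemistry"},
}

# Key:value pairs: each label is attached to a named field.
key_values = {
    "item-1": {"type": "menu", "decade": "1900s", "place": "New York"},
    "item-2": {"type": "thesis", "subject": "chemistry"},
}


def find_by_tag(tag):
    """Return IDs of items carrying the tag, in whatever sense the tagger meant."""
    return sorted(item for item, labels in tags.items() if tag in labels)


def find_by_field(key, value):
    """Return IDs of items whose named field equals the value."""
    return sorted(
        item for item, fields in key_values.items() if fields.get(key) == value
    )
```

With tags, `find_by_tag("menu")` cannot tell an item that is a menu from an item that is about menus. With key:value pairs, `find_by_field("type", "menu")` can, at the cost of asking contributors to pick a field first.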
Conclusions
- Leave tagging wide open to users
- The amount of data being gathered is growing at an increasing rate, such that keeping up with metadata values by hand will eventually become infeasible.
- Paid services (unpaid services?) to harvest metadata from where it already exists, rather than generating it locally, may be necessary (Amazon album art, LibraryThing tags, etc.).
- Without controlled vocabularies, faceting becomes significantly less effective. For this reason, controlled vocabularies may be a requirement of future systems.
- It may become necessary to make researched decisions concerning controlled vocabularies before implementing systems.
- There may be a need for supersets or subsets of vocabularies to extend within or beyond the original domain.
- Can one size fit all with minimal accuracy and minimal effort while domain-specific vocabularies are maintained?
- It seems that there will be a significant need for an "Analytics tool for combined labeling systems" to expose usage for evaluation of labeling / metadata effectiveness.
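
The points above about faceting and about an analytics tool can be illustrated with a rough Python sketch. It assumes a small hand-made mapping from free-text variants to preferred terms; the `VOCABULARY` table, the function names, and the sample tags are all hypothetical and stand in for a real controlled vocabulary such as LCSH.

```python
from collections import Counter

# Hypothetical controlled vocabulary: maps lowercase free-text variants to a
# preferred term. A real system would load this from an authority source.
VOCABULARY = {
    "cookery": "Cooking",
    "cooking": "Cooking",
    "cookbooks": "Cooking",
    "chemistry": "Chemistry",
    "chem": "Chemistry",
}


def normalize(tag):
    """Return the preferred term for a tag, or None if it is not in the vocabulary."""
    return VOCABULARY.get(tag.strip().lower())


def facet_counts(user_tags):
    """Count facets three ways: raw tags, controlled terms, and unmatched tags."""
    raw = Counter(user_tags)
    controlled = Counter()
    unmatched = Counter()
    for tag in user_tags:
        term = normalize(tag)
        if term is None:
            unmatched[tag] += 1  # candidate for review or vocabulary extension
        else:
            controlled[term] += 1
    return raw, controlled, unmatched


# Made-up user tags for illustration.
user_tags = ["Cookery", "cooking", "cookbooks", "chem", "Chemistry", "yummy"]
raw, controlled, unmatched = facet_counts(user_tags)
```

In this made-up sample, the raw tags split into six facets of one item each, while the controlled terms collapse into two usable facets. The unmatched counter is the kind of usage data an analytics tool could expose when deciding whether a vocabulary needs extending.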