CURATEcamp - User contributions [en] (https://wiki.curatecamp.org/api.php?action=feedcontributions&user=Chris+Adams&feedformat=atom), 2024-03-29T12:21:55Z, MediaWiki 1.28.0
CURATEcamp Exhibition: Exhibition in and of the Digital Age (https://wiki.curatecamp.org/index.php?title=CURATEcamp_Exhibition:_Exhibition_in_and_of_the_Digital_Age&diff=2954), 2013-07-25T17:32:12Z<p>Chris Adams: </p>
<hr />
<div>There will be a one-day CURATEcamp following this year's [http://blogs.loc.gov/digitalpreservation/2013/05/curatecamp-exhibition-exhibition-in-and-of-the-digital-age/ DigitalPreservation 2013] conference in Arlington, VA. <br />
<br />
We are focusing this camp on exploring the idea of exhibition. You can read more about the topic below. The facilitators will be Trevor Owens, digital archivist at the Library of Congress, Sharon Leon, director of public projects at the Roy Rosenzweig Center for History and New Media, and Michael Edson, director of web and new media strategy at the Smithsonian Institution.<br />
<br />
== Registration ==<br />
<br />
Registration is closed. The camp is full, but email eengle at loc dot gov if you want to be added to the wait list.<br />
== Logistics ==<br />
<br />
*WHEN: '''Thursday July 25, 2013''' (9am - 4pm)<br />
*WHERE: [https://www.starwoodmeeting.com/StarGroupsWeb/booking/reservation?id=1304244232&key=BB50 Westin Alexandria], 400 Courthouse Square, Alexandria, VA 22314<br />
*COST: Free; there is no cost to register for the meeting <br />
*LOGISTICS: The hotel is about 0.7 miles from the [http://www.wmata.com/rail/station_detail.cfm?station_id=48 King Street Metro]. You can walk or take a taxi (about $5). The hotel also offers a complimentary shuttle that can be arranged with the concierge. If you're coming from out of town, more information about getting to the hotel is [http://www.digitalpreservation.gov/meetings/register/logistics.html here].<br />
*DISCUSSION: [https://twitter.com/#!/search/realtime/%23curatecamp #curatecamp on Twitter] and #curatecamp on irc.freenode.net<br />
<br />
== CURATEcamp Exhibition Theme ==<br />
<br />
An exhibition involves organizing, contextualizing and displaying collection items. As cultural heritage organizations increasingly make both digitized and born-digital materials available, we find a range of opportunities for exhibiting them. Thinking broadly about the idea of exhibition, everything from faceted browsing and visualizations to linear and non-linear modes of presenting materials is part of the interpretive framework through which users make sense of collection materials.<br />
<br />
=== Potential Session Topics ===<br />
<br />
*Open Authority and Curatorial Voice<br />
*Online Exhibition at Scale<br />
*Visualization as Exhibition<br />
*Exhibiting Born Digital Objects<br />
*Interpretation for Mobile Devices<br />
*Digital Storytelling and Cultural Heritage Collections<br />
*Collection Interfaces that Contextualize<br />
*Storytelling and Linked Data<br />
*Social Media as Exhibition<br />
*Citizen Curators<br />
*Blogs as Serialized Exhibits<br />
*Data Journalism as inspiration for Exhibition<br />
<br />
You can also share ideas you have for topics in the [http://blogs.loc.gov/digitalpreservation/2013/05/curatecamp-exhibition-exhibition-in-and-of-the-digital-age/ comments on the blog post announcement].<br />
<br />
== CURATEcamp Exhibition Schedule ==<br />
<br />
We will fill this in the morning of the conference. <br />
<br />
{| class="wikitable" border=1<br />
! Time<br />
! Edison D (this room)<br />
! Edison E (next door)<br />
! Bell <br />
! Whitney<br />
<br />
|-<br />
| 9am-9:50 || Intros and scheduling ||||||<br />
|-<br />
| 10am-10:50 || [[Blogging/Digital Storytelling]] || [https://docs.google.com/document/d/1XiQLTDcI5a0B1pxnC5xWCOnVy3kXp9OK_PrrqIlKNCk/edit Mobile!] || Digital Curation || [https://docs.google.com/document/d/1_47D8OVZivWEK58wi-P2__AHV3d7NTsS3xppnBHhWdE/edit Visualization as Exhibition] <br />
|-<br />
| 11am-11:50 || Social Media || [https://docs.google.com/document/d/1QdhVLjv_-EtpZqStnnEIbzfs25qFymLQeTJpO8C0Mc8/edit?usp=sharing Exhibiting Audiovisual & Non-textual Objects] || [https://docs.google.com/document/d/1x29l3_O0UiEThAwtmZpoajzX9KKu-vubDcee9Dr0DRg/edit Metadata Preservation (& Presentation)] || [https://docs.google.com/document/d/1egPqIfmrd33FGgDQxs9MchWCdjN_crxWCy5eC9dIL8c/edit?usp=sharing&pli=1 Omeka Best Practices]<br />
|-<br />
| 12pm-1:30 || Lunch on your own || || || <br />
|-<br />
| 1:30-1:55 || Lightning Talks || || || <br />
|-<br />
| 2pm-2:50 || Ask A Coder! || Defining Online Exhibits || Finding Allies Beyond GLAM || Games & Virtual Worlds as Exhibition <br />
|-<br />
| 3pm-4pm || User-generated Exhibits/Wikibition! || Next-gen Online Exhibits || Provenance Data for Online Objects || Metrics for Online Exhibitions<br />
|}<br />
<br />
==☇ Lightning Talks ☇ ==<br />
<br />
# Art of Google Books<br />
# WSLS Newsfilm Collection<br />
# A tool to visualize large amounts of historical data<br />
# Adding to an existing platform: “Archival project docs of World Bank”<br />
# “Permanent Exhibitions” for LAM<br />
# [https://dl.dropboxusercontent.com/u/360980/Presentations/LC%20-%20WDL%20Book%20Viewer%20Lightning%20Talk.pdf WDL's new Book Viewer]<br />
# SPEED</div>Chris Adamshttps://wiki.curatecamp.org/index.php?title=Batch_OCR_%26_Search&diff=1903Batch OCR & Search2012-07-26T18:57:35Z<p>Chris Adams: Import of Brendan's notes</p>
<hr />
<div>* Can an existing piece of text be mapped to the layout from an original image?<br />
** Tesseract can support this.<br />
** Very valuable for old manuscripts<br />
<br />
* The OCR error rate depends on the quality of the writing and the quality of the scan. <br />
<br />
* Some people are using Hadoop to distribute some of the OCR and analysis.<br />
<br />
* Possibility of author tracking (via handwriting style tracking) being integrated into mainstream tools?<br />
<br />
* Possibility of revision tracking built into the OCR’d content’s metadata?<br />
<br />
* How much OCR metadata to include?<br />
* Human input vs automation vs hybrid approach.<br />
* Linking OCR’d content back to catalog records is a straightforward and easy approach.<br />
<br />
* hOCR - Standard output format for OCR’d content. http://en.wikipedia.org/wiki/HOCR <br />
<br />
* OMR (optical music recognition) - http://en.wikipedia.org/wiki/Music_OCR <br />
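
As a rough illustration of how hOCR ties recognized text back to the page layout: hOCR output is HTML in which each recognized word is typically a <code>span</code> of class <code>ocrx_word</code>, with a <code>title</code> attribute carrying a <code>bbox x0 y0 x1 y1</code> box in pixel coordinates. The sketch below pulls out those word boxes using only the Python standard library. The sample markup and the names <code>HocrWordParser</code> and <code>extract_words</code> are made up for illustration, and real hOCR files may nest markup in ways this does not handle.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from hOCR word spans (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(n) for n in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes word spans do not contain further nested spans.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def extract_words(hocr_markup):
    """Return a list of (word, (x0, y0, x1, y1)) tuples in document order."""
    parser = HocrWordParser()
    parser.feed(hocr_markup)
    parser.close()
    return parser.words


# Invented sample, shaped like typical hOCR output.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 800 600'>
  <span class='ocr_line' title='bbox 10 20 300 50'>
    <span class='ocrx_word' title='bbox 10 20 120 50; x_wconf 93'>Dear</span>
    <span class='ocrx_word' title='bbox 130 20 300 50; x_wconf 88'>Sir</span>
  </span>
</div>
"""

for word, bbox in extract_words(SAMPLE):
    print(word, bbox)
```

With boxes like these, a search hit can be highlighted on the original page image, or a corrected transcription can be aligned word by word with the scan.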
<br />
<br />
Projects mentioned:<br />
<br />
* [http://mith.umd.edu/research/project/active-ocr/ ActiveOCR]: Corrections are fed back into the software, which learns to become more accurate.<br />
* [http://code.google.com/p/tesseract-ocr/ Tesseract]<br />
** Lots of languages<br />
** Skew on images can produce poor results<br />
* [http://code.google.com/p/ocropus/ OCRopus]</div>Chris Adamshttps://wiki.curatecamp.org/index.php?title=Processing_2012_Schedule&diff=1902Processing 2012 Schedule2012-07-26T18:52:58Z<p>Chris Adams: Batch OCR & Search link</p>
<hr />
<div>= Thursday, July 26th =<br />
<br />
We will fill this in the morning of the conference. <br />
<br />
{| class="wikitable" border=1<br />
! Time<br />
! South Ballroom<br />
! North 1<br />
! North 2<br />
! North 3<br />
<br />
|-<br />
| 9am-9:50 || Intros and scheduling ||||||<br />
|-<br />
| 10am-10:50 || [[Options for Repository Software]]: Despite the ubiquity of Fedora and DSpace, some have found these tools insufficient for their current needs. In this session we will discuss options for building repositories, including eXist, Solr & CouchDB, and, if possible, even more cutting-edge options for secure storage and retrieval of digital library collections (Doug Reside) || [[At Risk Records in 3rd Party Systems]]: How do you find, chase, harness, describe & preserve records created by members of your org in "external" systems? Thinking social media and cloud-based/3rd-party platforms, including GIS, infoviz, CoP group sites, etc. What to capture, and when? (Jeanne Kramer-Smyth, Brandon Hirsch) || [https://docs.google.com/document/d/1zlaGzX8fXep3L6gSZLsWx7vqhmdQSHxX0WiErW-3Qd8/edit Bare Minimum Processing or More Product/Less Process for Born Digital or What is efficient & Sufficient]: What is the bare minimum that should be done to process incoming born-digital records to maximize the number of records we can provide in usable form to the researcher? When your producers can supply very miscellaneous collections of formats & types, and these start coming in at massive scale, intensive processing of all records to guarantee each one is easily usable is not feasible (The Other Meg Mcaller) || Digital Divide for Practitioners: How can traditional archivists who got their MLS before there were classes/programs on electronic records/digital curation get their foot in the door? (Kathleen) <br />
|-<br />
| 11am-11:50 || Automating Review for Restrictions?: One of the most dramatic current bottlenecks in the flow of archival electronic records from producers to consumers (researchers) is the need to ensure that restricted content is not released. What are the tools & processes that can identify privacy info & more complicated restricted content (confidential business info, law enforcement sensitive, etc.) so humans don't have to look at every file? Who has tried this? Who is exploring it? (Meg Phillips) || [https://docs.google.com/document/d/1yf5sw3uQX-hxXMBtgNKXNtfXVYsSz0MOL-8Gqqr9Ojs/edit Entity Extractions for Descriptive Metadata + Metadata Preservation of Digitization Projects]: (Trevor Owens, Milland Schisler) || [[Historical Geographic Name Search Expansion]]: Is there a practical way to return search results based on alternate place names, e.g. Łódź/Litzmannstadt, based on an authority, e.g. LCSH, GEOnet, GeoNames? (Michael Levy) || Preservation of Digitization Projects: People often confuse the concept of digitization with the broader concepts of digital preservation. That said, how are the two tied? How do we execute digital preservation techniques on the kinds of digital projects being implemented in archives today? (Mitch Brodsky)<br />
|-<br />
| 12pm (lunch) || Lunch and Lightning Talks || || || <br />
|-<br />
| 1pm-1:50 || [https://docs.google.com/document/d/1q4lcJCMadi754RK6NUaZBuOBAikpECBZOwopye5YyRo/edit Disrespect des Fonds]: Rethinking digital order, arrangement & authenticity in digital archives (Jefferson Bailey) || Alternatives to message digests (MD5 etc.) for checking file integrity: What are the alternatives to using MD5, SHAx, etc. for checking file integrity, and what are their advantages and disadvantages? (Andrea Goethals) || Virtualization as a means for Preservation: In light of yesterday's intriguing discussions of preserving systems in their original state, how can we leverage virtualization for large-scale, robust preservation? (Brandon Hirsch) || [[How do you catch a cloud and pin it down]]: People want to know "how much stuff" we have. How do we translate files and records into books and letters and albums? (Kate Zwaard/Liz Madden) <br />
|-<br />
| 2pm-3:00 || [[Batch OCR & Search]]: Discuss how to bootstrap full-text search for large, mixed collections (Chris Adams) || Accessible Visualization (Rabia Gibbs, UT Libraries) || Defining & Extracting Essential Characteristics to support Preservation (Mark Evans) || Backup, Recovery & Preservation of CMS based websites (Nicole Scalessa) <br />
|}</div>Chris Adamshttps://wiki.curatecamp.org/index.php?title=Options_for_Repository_Software&diff=1889Options for Repository Software2012-07-26T16:28:10Z<p>Chris Adams: Tidied openstack section</p>
<hr />
<div>==Tools we are using==<br />
<br />
* Fedora<br />
* DSpace<br />
* Archivematica<br />
* OpenStack / SWIFT<br />
* LOCKSS<br />
* Ex Libris<br />
* Omeka<br />
* E.R.A.<br />
<br />
== What is a Repository ==<br />
<br />
Provisional definition: A file system with transactional ingest (lets you know that the file was copied properly), with regular backups and abstracted links to actual file stores, managed by a metadata system that may also include other metadata information.<br />
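
To make "transactional ingest" concrete, here is a minimal sketch of one way it could work on a plain file system: copy the file to a temporary name, compare checksums, and only then move it to its final name, so a failed or corrupted copy never shows up in the store. This is an illustration under stated assumptions and not something described in the session: the function names and the choice to name stored files by their SHA-256 digest are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(source_path, store_dir):
    """Copy source_path into store_dir and return (stored_path, checksum).

    The file appears under its final name only after the copy has been
    verified. Naming the stored file by its digest is an assumption made
    for this sketch.
    """
    os.makedirs(store_dir, exist_ok=True)
    expected = sha256_of(source_path)
    final_path = os.path.join(store_dir, expected)

    # Work under a temporary name in the same directory so the final
    # rename stays on one file system.
    fd, temp_path = tempfile.mkstemp(dir=store_dir, suffix=".partial")
    os.close(fd)
    try:
        shutil.copyfile(source_path, temp_path)
        if sha256_of(temp_path) != expected:
            raise IOError("checksum mismatch while ingesting %s" % source_path)
        os.replace(temp_path, final_path)  # commit step
    except BaseException:
        # Roll back: never leave a partial copy behind.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    return final_path, expected
```

The returned checksum is the piece a metadata system would record, which is what gives the "abstracted link" between a catalog entry and the actual file store.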
<br />
== Microservices needed by repository developers ==<br />
<br />
* File storage<br />
* Integrity checking<br />
* Metadata linking <br />
* Public access<br />
* Backups<br />
* Transaction based ingest<br />
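
The integrity-checking service in the list above is often built as a periodic fixity audit: record a digest for every file, then later recompute the digests and report what has gone missing or changed. The sketch below assumes a SHA-256 manifest held as a plain dictionary; the names <code>build_manifest</code> and <code>audit</code> are hypothetical, and a real repository would persist the manifest in its metadata system.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root, manifest):
    """Compare the files under root with a previously recorded manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name
            for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

Running <code>audit</code> on a schedule and alerting on any non-empty list is one simple way to notice silent corruption before the backups rotate out.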
<br />
(Lots of interest in combining OpenStack SWIFT with Solr)<br />
<br />
SWIFT [http://swift.openstack.org] provides <br />
* Open source object storage <br />
* Amazon S3-style simple semantics, REST-based?<br />
* Clustering for reliability<br />
* Large-scale production deployments: Rackspace, [https://cloud.sdsc.edu/ San Diego Supercomputer Center]</div>Chris Adamshttps://wiki.curatecamp.org/index.php?title=Historical_Geographic_Name_Search_Expansion&diff=1888Historical Geographic Name Search Expansion2012-07-26T16:26:30Z<p>Chris Adams: Link dump</p>
<hr />
<div>Links:<br />
<br />
* [http://openstreetmaps.org/ OpenStreetMap]<br />
* [http://geonames.org GeoNames]<br />
* [http://en.wikipedia.org/wiki/Wikipedia:How_to_add_geocodes_to_articles Wikipedia geocoding]<br />
* [http://developer.yahoo.com/geo/geoplanet/ Yahoo GeoPlanet & Where On Earth IDs]<br />
* [http://schema.org/Place schema.org Places] and [http://dev.w3.org/html5/md-LC/ HTML5 microdata]</div>Chris Adamshttps://wiki.curatecamp.org/index.php?title=Processing_2012_Schedule&diff=1887Processing 2012 Schedule2012-07-26T16:22:29Z<p>Chris Adams: Link for Historical Geographic Name Search session</p>
<hr />
<div>= Thursday, July 26th =<br />
<br />
We will fill this in the morning of the conference. <br />
<br />
{| class="wikitable" border=1<br />
! Time<br />
! South Ballroom<br />
! North 1<br />
! North 2<br />
! North 3<br />
<br />
|-<br />
| 9am-9:50 || Intros and scheduling ||||||<br />
|-<br />
| 10am-10:50 || [[Options for Repository Software]]: Despite the ubiquity of Fedora and DSpace, some have found these tools insufficient for their current needs. In this session we will discuss options for building repositories, including eXist, Solr & CouchDB, and, if possible, even more cutting-edge options for secure storage and retrieval of digital library collections (Doug Reside) || [[At Risk Records in 3rd Party Systems]]: How do you find, chase, harness, describe & preserve records created by members of your org in "external" systems? Thinking social media and cloud-based/3rd-party platforms, including GIS, infoviz, CoP group sites, etc. What to capture, and when? (Jeanne Kramer-Smyth, Brandon Hirsch) || [https://docs.google.com/document/d/1zlaGzX8fXep3L6gSZLsWx7vqhmdQSHxX0WiErW-3Qd8/edit Bare Minimum Processing or More Product/Less Process for Born Digital or What is efficient & Sufficient]: What is the bare minimum that should be done to process incoming born-digital records to maximize the number of records we can provide in usable form to the researcher? When your producers can supply very miscellaneous collections of formats & types, and these start coming in at massive scale, intensive processing of all records to guarantee each one is easily usable is not feasible (The Other Meg Mcaller) || Digital Divide for Practitioners: How can traditional archivists who got their MLS before there were classes/programs on electronic records/digital curation get their foot in the door? (Kathleen) <br />
|-<br />
| 11am-11:50 || Automating Review for Restrictions?: One of the most dramatic current bottlenecks in the flow of archival electronic records from producers to consumers (researchers) is the need to ensure that restricted content is not released. What are the tools & processes that can identify privacy info & more complicated restricted content (confidential business info, law enforcement sensitive, etc.) so humans don't have to look at every file? Who has tried this? Who is exploring it? (Meg Phillips) || Entity Extractions for Descriptive Metadata + Metadata Preservation of Digitization Projects: (Trevor Owens, Milland Schisler) || [[Historical Geographic Name Search Expansion]]: Is there a practical way to return search results based on alternate place names, e.g. Łódź/Litzmannstadt, based on an authority, e.g. LCSH, GEOnet, GeoNames? (Michael Levy) || Preservation of Digitization Projects: People often confuse the concept of digitization with the broader concepts of digital preservation. That said, how are the two tied? How do we execute digital preservation techniques on the kinds of digital projects being implemented in archives today? (Mitch Brodsky)<br />
|-<br />
| 12pm (lunch) || Lunch and Lightning Talks || || || <br />
|-<br />
| 1pm-1:50 || Disrespect des Fonds: Rethinking digital order, arrangement & authenticity in digital archives (Jefferson Bailey) || Alternatives to message digests (MD5 etc.) for checking file integrity: What are the alternatives to using MD5, SHAx, etc. for checking file integrity, and what are their advantages and disadvantages? (Andrea Goethals) || Virtualization as a means for Preservation: In light of yesterday's intriguing discussions of preserving systems in their original state, how can we leverage virtualization for large-scale, robust preservation? (Brandon Hirsch) || How do you catch a cloud and pin it down: People want to know "how much stuff" we have. How do we translate files and records into books and letters and albums? (Kate Zwaard/Liz Madden) <br />
|-<br />
| 2pm-3:00 || Batch OCR & Search: Discuss how to bootstrap full-text search for large, mixed collections (Chris Adams) || Accessible Visualization (Rabia Gibbs, UT Libraries) || Defining & Extracting Essential Characteristics to support Preservation (Mark Evans) || Backup, Recovery & Preservation of CMS based websites (Nicole Scalessa) <br />
|}</div>Chris Adamshttps://wiki.curatecamp.org/index.php?title=CURATEcamp_Processing_2012&diff=1867CURATEcamp Processing 20122012-07-26T13:51:24Z<p>Chris Adams: Added chat links</p>
<hr />
<div>Link to CURATEcamp [[Processing 2012 Schedule]]<br />
<br />
There will be a one-day CURATEcamp following this year's [http://blogs.loc.gov/digitalpreservation/2012/06/curatecamp-processing-processing-dataprocessing-collections/ DigitalPreservation 2012] conference in Arlington, VA.<br />
[https://docs.google.com/spreadsheet/viewform?fromEmail=true&formkey=dHhVU0xSbUJJbkZZZk4xSkViYnBxNlE6MQ Registration is open now], and space is limited. We are focusing this camp on the idea of processing, bringing together the computational sense of the word with the archival sense of it. We are particularly excited about bringing together archivists and curators with software developers and engineers to do some creative thinking and tinkering. You can read up on the topic below. The camp is being facilitated by Trevor Owens and Leslie Johnston from the Library of Congress; Meg Phillips, Electronic Records Lifecycle Coordinator at the National Archives and Records Administration; and Mark Matienzo, Digital Archivist at Yale University.<br />
<br />
*WHEN: '''Thursday July 26, 2012''' (9am - 3pm)<br />
*WHERE: Sheraton Pentagon City, 900 South Orme Street, Arlington, VA 22204<br />
*COST: Free; there is no cost to register for the meeting <br />
*LOGISTICS: The hotel will offer a continuous shuttle to transport guests from Pentagon City Metro between 8-9am and 2:30-3:30pm on July 26. Parking is $10 per day with discount sticker.<br />
*DISCUSSION: [https://twitter.com/#!/search/realtime/%23curatecamp #curatecamp on Twitter] and #curatecamp on irc.freenode.net<br />
<br />
'''Space is limited to 100 registrants''', so [https://docs.google.com/spreadsheet/viewform?fromEmail=true&formkey=dHhVU0xSbUJJbkZZZk4xSkViYnBxNlE6MQ reserve your spot while you can]<br />
<br />
Participants have already started sharing ideas on the camp announcement post. Please take a minute to [http://blogs.loc.gov/digitalpreservation/2012/06/curatecamp-processing-processing-dataprocessing-collections/ share ideas you have for the session on the announcement blog post]. <br />
<br />
'''CURATEcamp Processing: Processing Data/Processing Collections'''<br />
<br />
Processing means different things to an archivist and a software developer. To the former, processing is about taking custody of collections, preserving context, and providing arrangement, description, and accessibility. Processing, in its analog archival sense, also includes a lot of preservation (stabilization, preliminary conservation assessment, and the dreaded “re-housing”). To the latter, processing is about computer processing and has to do with how one automates a range of tasks through computation. When a cultural heritage organization’s work is organized around processing digital objects, these two notions of processing intermingle. This CURATEcamp unconference is intended to put these two notions of processing together in whatever ways can be imagined by the curators, archivists, librarians, scholars, software developers, computer engineers, and others who attend.<br />
<br />
'''Potential topics and considerations could include:'''<br />
*Automated inventorying and file characterization<br />
*Computational determination of hierarchical arrangement<br />
*Format validation & migrations<br />
*Automated metadata extraction<br />
*Potential roles for entity extraction in subject cataloging<br />
*Dynamically generated description<br />
*Malware scanning<br />
*Pattern & fuzzy searching for PII, SSNs, etc.<br />
*Automated access restrictions<br />
*Generating visualizations and using them as access tools<br />
*Human computation’s potential role in cultural heritage collections<br />
*Machine learning and digital collections<br />
*Using name authority linked data<br />
*Processes for georeferencing<br />
*Potential uses of facial recognition tools for identifying individuals in collection images</div>Chris Adams