https://wiki.curatecamp.org/api.php?action=feedcontributions&user=Courtney+C.+Mumma&feedformat=atomCURATEcamp - User contributions [en]2024-03-29T13:27:40ZUser contributionsMediaWiki 1.28.0https://wiki.curatecamp.org/index.php?title=Welcome_SAA_2013_Campers!&diff=2995Welcome SAA 2013 Campers!2013-08-13T13:59:45Z<p>Courtney C. Mumma: </p>
<hr />
<div>'''Howdy, Campers!''' <br />
<br />
Thanks for registering, and welcome to CURATEcamp SAA 2013, to be held Tuesday, August 13, 2013, from 10am to 4pm at Tulane University in New Orleans.<br />
<br />
*WHERE: Tulane University Lavin-Bernick Center, Suite 212, LBC, located at 201 Boggs <br />
**Transit: easy transit via streetcar, see directions from conference hotel here: http://goo.gl/maps/kHEp5<br />
<br />
We're looking forward to gathering the SAA community to discuss digital curation.<br />
<br />
CURATEcamp is based on the BarCamp or "unconference" model. Provided below is some information on what you can expect of CURATEcamp, what will be expected of the Campers, the overall theme for discussion topics, and some next steps for you.<br />
<br />
To learn more about CURATEcamp, visit http://curatecamp.org/about which includes information about past camps. There is also a Google Group and a wiki. Join the discussion and see what others have been talking about:<br />
<br />
*https://groups.google.com/forum/?fromgroups#!forum/digital-curation<br />
<br />
*http://wiki.curatecamp.org/index.php/Main_Page<br />
<br />
<br />
== CAMPING 101 ==<br />
<br />
*Bring a computing device if you can. It is optional, but it can be useful for demonstration and interaction purposes.<br />
<br />
*Get there a few minutes early to get situated and to grab your name tag. <br />
<br />
*We’ll start on Tuesday morning by gathering together in one space and going around the room to introduce ourselves. There will also be some housekeeping announcements and signing up for lightning talks. After this, we’ll jump into discussion. <br />
<br />
*We’ll follow the “open agenda” model, sitting in a roughly circular configuration. The meeting organizers will ask for proposed topics and record them for a group vote, and then we'll decide how to parcel out the topics for the rest of the day. We’ll allow some time to present your topic casually or via a 3-minute lightning talk, so come prepared if that’s your thing. Topics can include tools, theories, projects, processes, mysteries, conundrums, etc., related to any aspect of digital curation.<br />
<br />
*After discussing a few topics for the designated amount of time, we'll take a break for lunch. Lunch will be on your own, but we’ll suggest some locations on the wiki and you can ask your facilitators for directions. We encourage moving in packs so you can use lunchtime to interact with fellow campers and keep the conversations going.<br />
<br />
*The afternoon will be organized into breakout sessions based on the general topical discussions from the morning. The main purpose is to connect folks who are eager to share their perspectives and hear those of others. Late in the afternoon, we'll begin wrapping up the discussion.<br />
<br />
*For those who are interested, we can go to a local spot for a bite and drinks at the end of the day so we can discuss what we've heard and continue networking with peers!<br />
<br />
See [[CURATEcamp SAA 2013 Schedule]] for more details.<br />
<br />
<br />
== HOW TO PREPARE FOR CURATECAMP ==<br />
<br />
<br />
Start thinking about topics you would like to discuss. Here are a few ideas:<br />
<br />
* Workflows used for acquiring, processing, preserving and providing access to digital archives<br />
<br />
* Novel uses of tools for preserving born-digital material<br />
<br />
* Challenges you’re facing with hardware, software or legacy media<br />
<br />
* Emulation or migration strategies<br />
<br />
* Use cases on providing access to born-digital material<br />
<br />
* Demos of software<br />
<br />
* Integration of data curation into the archives workflow<br />
<br />
* How can archivists be actively involved in software/curation services development?<br />
<br />
<br />
Need more ideas? See the resources listed below.<br />
<br />
'''Resources:'''<br />
<br />
*http://curatecamp.org/ (for ideas, look at the past CURATEcamp pages listed on the left-hand side of the page)<br />
<br />
*http://www.digitalpreservation.gov/index.php<br />
<br />
*http://www.dcc.ac.uk/digital-curation/what-digital-curation<br />
<br />
*http://www.dpconline.org/<br />
<br />
*http://datacurationprofiles.org/ (see also Resources Page)<br />
<br />
<br />
<br />
== WHAT YOU CAN DO BETWEEN NOW AND THE CAMP ==<br />
<br />
First and foremost, think about some curation-related topics you'd like to discuss. When you're ready, you can add your ideas to the CURATEcamp wiki by simply requesting a login, then editing this page: [[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
If you have any questions or requests, send them to Courtney C. Mumma (courtney@artefactual.com) or Cristela Garcia-Spitz (cgarciaspitz@ucsd.edu). <br />
<br />
We're looking forward to seeing y'all in N'awlins!<br />
<br />
-CURATEcamp UnOrganizers<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Transportation_Info&diff=2994CURATEcamp SAA 2013 Transportation Info2013-08-13T13:59:07Z<p>Courtney C. Mumma: </p>
<hr />
<div>The event will be held at Tulane University's Lavin-Bernick Center, Suite 212, LBC, located at 201 Boggs. You'll find easy transit via streetcar. See directions from conference hotel here: http://goo.gl/maps/kHEp5<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Schedule&diff=2993CURATEcamp SAA 2013 Schedule2013-08-13T13:58:25Z<p>Courtney C. Mumma: </p>
<hr />
<div>'''CURATEcamp SAA 2013 Pre-conference'''<br/><br />
*WHEN: Tuesday, August 13, 2013, 10am - 4pm<br/><br />
*WHERE: Tulane University [http://tulane.edu/studentaffairs/lbc/ Lavin-Bernick Center], Suite 212, LBC, located at 201 Boggs <br />
**Transit: easy transit via streetcar, see directions from conference hotel here: http://goo.gl/maps/kHEp5<br/><br />
<br/><br />
{| class="wikitable"<br />
|+ '''Schedule'''<br />
|-<br />
| 9:45 - 10:00 am || Arrive early to plug in, get your nametag, and sign up for a lightning talk.<br />
|-<br />
| 10:00 - 10:30 || Welcome and introductions<br />
|-<br />
| 10:30 - 11:00 || Topic selection and session planning<br />
|-<br />
| 11:00 - 12:00 || Group discussion of topics (10 minutes each to help decide what to attend in the PM with allowance for 3-minute lightning talks to introduce topics)<br />
|-<br />
| 12:00 - 1:30 pm || Lunch on your own<br />
|-<br />
| 1:30 - 3:30 || Breakout sessions<br />
|-<br />
| 3:30 - 4:00 || Wrap-up (next steps, evaluations)<br />
|-<br />
| 5:30 - 8ish || Pub sessions (location [http://barcadianeworleans.com/ Barcadia], near the conference hotel, open to all)<br />
|-<br />
|}<br />
<br/><br />
For Twitter: #CURATEcamp<br />
For IRC chat: #curatecamp on irc.freenode.net <br />
<br/><br />
<br/><br />
<br />
== Lunch spot suggestions: ==<br />
<br />
'''Best to stay indoors:'''<br />
*Lavin-Bernick Center Foodcourt: http://www.diningservices.tulane.edu/locations/lbc.html<br />
<br />
(Open during summer): City Diner Express, WOW Cafe & Wingery, Byblos, Einstein Bros. Bagels, Panda Express, Sushi Nori<br />
<br />
<br />
'''Able to handle the heat:'''<br />
*Cafe Freret: [http://www.cafefreret.com website] [http://www.yelp.com/biz/cafe-freret-new-orleans review]<br />
*Crêpes à la Cart (to go): [http://www.crepecaterer.com/ website] [http://www.yelp.com/biz/cr%C3%AApes-%C3%A0-la-cart-new-orleans-2 review]<br />
*Boot: [http://www.thebootneworleans.com/ website] [http://www.yelp.com/biz/the-boot-new-orleans review]<br />
<br />
<br />
'''Glutton for the heat & Able to walk fast:'''<br />
*Babylon Cafe: [http://babyloncafe.biz/ website] [http://www.yelp.com/biz/babylon-cafe-new-orleans review]<br />
*Maple Street Cafe: [http://www.maplestreetcafenola.com/ website] [http://www.yelp.com/biz/maple-street-cafe-new-orleans review]<br />
*Satsuma Cafe: [http://satsumacafe.com website] [http://www.yelp.com/biz/satsuma-caf%C3%A9-new-orleans-4 review]<br />
*Favori Deli: [http://www.yelp.com/biz/favori-deli-new-orleans review]<br />
*Ba Chi Canteen: [https://www.facebook.com/bachicanteenla website] [http://www.yelp.com/biz/ba-chi-canteen-new-orleans review]<br />
<br />
For more on Maple Street: http://www.neworleansonline.com/tools/streets/maplestreet.html <br />
<br />
<br />
<br />
'''Caffeine?'''<br />
*PJ's: http://www.diningservices.tulane.edu/locations/pjs.html<br />
<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2992CURATEcamp SAA 20132013-08-13T13:57:43Z<p>Courtney C. Mumma: </p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Tulane University Lavin-Bernick Center, Suite 212, LBC, located at 201 Boggs <br />
**Transit: easy transit via streetcar, see directions from conference hotel here: http://goo.gl/maps/kHEp5<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit [http://curatecamp.org/pages/how-it-works CURATEcamp - How it works] or [[CURATEcamp SAA 2012]] for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular) <br />
:$39 / $89 <br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Discussion_Ideas&diff=2985CURATEcamp SAA 2013 Discussion Ideas2013-08-07T19:03:41Z<p>Courtney C. Mumma: </p>
<hr />
<div>Feel free to use this space to share ideas for discussion at CURATEcamp 2013. Request an account at the login screen.<br />
<br />
Topics can include tools, theories, projects, processes, mysteries, conundrums, etc., related to any aspect of digital curation.<br />
<br />
----<br />
<br />
'''Topic you are interested in''' (Your name): if you'd like, provide a sentence or three about your topic.<br />
<br />
* '''Curation workflows''' This was overwhelmingly the most requested topic. This can be general or specific, or both! Specifically, some of you mentioned digital forensics workflows, acquisition workflows, and data curation workflows.<br />
<br />
* '''Application of best practices and standards''' Many of you wanted to talk about how institutions stay current with the application of best practices and standards, and how much some repositories might stray from them and why.<br />
<br />
* '''Collisions in organizational culture''' This topic centers on the struggle in some organizations when IT and archives departments wrestle over who is responsible for which parts, or all, of the digital curation workflow.<br />
<br />
* '''Email preservation''' I think I just heard a collective groan in the community as I typed that title... meaning we should probably talk about it? Some of you thought so, too.<br />
<br />
* '''Preservation of business systems''' Very interesting, but hoping the person who suggested this topic will unpack it a bit for us.<br />
<br />
* '''How can archivists be directly involved in software/curation services development?''' This seems to be about how to be involved and have influence in community development of systems and micro-services for digital curation.<br />
<br />
* '''Digital humanities''' How does, or should, DH influence what archivists do?<br />
<br />
* '''Emulation, normalization, or both?''' (Courtney Mumma)<br />
<br />
* '''Should we be compressing AIPs?''' (Courtney Mumma): Compressing AIPs saves storage, but it raises issues of system sustainability in the worst-case scenario (i.e., if the information about the compression is stored in the AIPs and/or in the management system, but all you have is the AIPs, how do you unpack them in the apocalypse scenario?) as well as issues of authenticity. <br />
<br />
* '''Workarounds''' (Cristela Garcia-Spitz) Share any workarounds that you've developed (between tools or for different metadata, e.g. EAD to METS/MODS) from the most efficient to the most frustrating.<br />
<br />
* '''Dirty laundry''' (Courtney Mumma) What do we do in digital that we would never do in analog?<br />
<br />
* '''Buried records''' (Abby Adams) What happens to original born-digital materials within closed collections, particularly those with time seals of 25 years or more? Are dark archives a way to preserve them or are we simply digging them an early grave?<br />
<br />
<br />
----<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Schedule&diff=2979CURATEcamp SAA 2013 Schedule2013-08-01T18:58:24Z<p>Courtney C. Mumma: </p>
<hr />
<div>'''CURATEcamp SAA 2013 Pre-conference'''<br/><br />
*WHEN: Tuesday, August 13, 2013, 10am - 4pm<br/><br />
*WHERE: Tulane University [http://tulane.edu/studentaffairs/lbc/ Lavin-Bernick Center], Suite 218, LBC, located at 201 Boggs <br />
**Transit: easy transit via streetcar, see directions from conference hotel here: http://goo.gl/maps/kHEp5<br/><br />
<br/><br />
{| class="wikitable"<br />
|+ '''Schedule'''<br />
|-<br />
| 9:45 - 10:00 am || Arrive early to plug in, get your nametag, and sign up for a lightning talk.<br />
|-<br />
| 10:00 - 10:30 || Welcome and introductions<br />
|-<br />
| 10:30 - 11:00 || Topic selection and session planning<br />
|-<br />
| 11:00 - 12:00 || Group discussion of topics (10 minutes each to help decide what to attend in the PM with allowance for 3-minute lightning talks to introduce topics)<br />
|-<br />
| 12:00 - 1:30 pm || Lunch on your own<br />
|-<br />
| 1:30 - 3:30 || Breakout sessions<br />
|-<br />
| 3:30 - 4:00 || Wrap-up (next steps, evaluations)<br />
|-<br />
| 5:30 - 8ish || Pub sessions (location TBD, near the conference hotel, open to all)<br />
|-<br />
|}<br />
<br/><br />
For Twitter: #CURATEcamp<br />
For IRC chat: #curatecamp on irc.freenode.net <br />
<br/><br />
<br/><br />
<br />
== Lunch spot suggestions: ==<br />
<br />
'''Best to stay indoors:'''<br />
*Lavin-Bernick Center Foodcourt: http://www.diningservices.tulane.edu/locations/lbc.html<br />
<br />
(Open during summer): City Diner Express, WOW Cafe & Wingery, Byblos, Einstein Bros. Bagels, Panda Express, Sushi Nori<br />
<br />
<br />
'''Able to handle the heat:'''<br />
*Cafe Freret: [http://www.cafefreret.com website] [http://www.yelp.com/biz/cafe-freret-new-orleans review]<br />
*Crêpes à la Cart (to go): [http://www.crepecaterer.com/ website] [http://www.yelp.com/biz/cr%C3%AApes-%C3%A0-la-cart-new-orleans-2 review]<br />
*Boot: [http://www.thebootneworleans.com/ website] [http://www.yelp.com/biz/the-boot-new-orleans review]<br />
<br />
<br />
'''Glutton for the heat & Able to walk fast:'''<br />
*Babylon Cafe: [http://babyloncafe.biz/ website] [http://www.yelp.com/biz/babylon-cafe-new-orleans review]<br />
*Maple Street Cafe: [http://www.maplestreetcafenola.com/ website] [http://www.yelp.com/biz/maple-street-cafe-new-orleans review]<br />
*Satsuma Cafe: [http://satsumacafe.com website] [http://www.yelp.com/biz/satsuma-caf%C3%A9-new-orleans-4 review]<br />
*Favori Deli: [http://www.yelp.com/biz/favori-deli-new-orleans review]<br />
*Ba Chi Canteen: [https://www.facebook.com/bachicanteenla website] [http://www.yelp.com/biz/ba-chi-canteen-new-orleans review]<br />
<br />
For more on Maple Street: http://www.neworleansonline.com/tools/streets/maplestreet.html <br />
<br />
<br />
<br />
'''Caffeine?'''<br />
*PJ's: http://www.diningservices.tulane.edu/locations/pjs.html<br />
<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Schedule&diff=2978CURATEcamp SAA 2013 Schedule2013-08-01T18:56:58Z<p>Courtney C. Mumma: </p>
<hr />
<div>'''CURATEcamp SAA 2013 Pre-conference'''<br/><br />
*WHEN: Tuesday, August 13, 2013, 10am - 4pm<br/><br />
*WHERE: Tulane University [http://tulane.edu/studentaffairs/lbc/ Lavin-Bernick Center], Suite 218, LBC, located at 201 Boggs <br />
**Transit: easy transit via streetcar, see directions from conference hotel here: http://goo.gl/maps/kHEp5<br/><br />
<br/><br />
{| class="wikitable"<br />
|+ '''Schedule'''<br />
|-<br />
| 9:45 - 10:00 am || Arrive early to plug in, get your nametag, and sign up for a lightning talk.<br />
|-<br />
| 10:00 - 10:30 || Welcome and introductions<br />
|-<br />
| 10:30 - 11:00 || Topic selection and session planning<br />
|-<br />
| 11:00 - 12:00 || Group discussion of topics (10 minutes each to help decide what to attend in the PM with allowance for 3-minute lightning talks to introduce topics)<br />
|-<br />
| 12:00 - 1:30 pm || Lunch on your own<br />
|-<br />
| 1:30 - 3:30 || Breakout sessions<br />
|-<br />
| 3:30 - 4:00 || Wrap-up (next steps, evaluations)<br />
|-<br />
| 4:00 - 6:00 || Pub sessions (location TBD)<br />
|-<br />
|}<br />
<br/><br />
For Twitter: #CURATEcamp<br />
For IRC chat: #curatecamp on irc.freenode.net <br />
<br/><br />
<br/><br />
<br />
== Lunch spot suggestions: ==<br />
<br />
'''Best to stay indoors:'''<br />
*Lavin-Bernick Center Foodcourt: http://www.diningservices.tulane.edu/locations/lbc.html<br />
<br />
(Open during summer): City Diner Express, WOW Cafe & Wingery, Byblos, Einstein Bros. Bagels, Panda Express, Sushi Nori<br />
<br />
<br />
'''Able to handle the heat:'''<br />
*Cafe Freret: [http://www.cafefreret.com website] [http://www.yelp.com/biz/cafe-freret-new-orleans review]<br />
*Crêpes à la Cart (to go): [http://www.crepecaterer.com/ website] [http://www.yelp.com/biz/cr%C3%AApes-%C3%A0-la-cart-new-orleans-2 review]<br />
*Boot: [http://www.thebootneworleans.com/ website] [http://www.yelp.com/biz/the-boot-new-orleans review]<br />
<br />
<br />
'''Glutton for the heat & Able to walk fast:'''<br />
*Babylon Cafe: [http://babyloncafe.biz/ website] [http://www.yelp.com/biz/babylon-cafe-new-orleans review]<br />
*Maple Street Cafe: [http://www.maplestreetcafenola.com/ website] [http://www.yelp.com/biz/maple-street-cafe-new-orleans review]<br />
*Satsuma Cafe: [http://satsumacafe.com website] [http://www.yelp.com/biz/satsuma-caf%C3%A9-new-orleans-4 review]<br />
*Favori Deli: [http://www.yelp.com/biz/favori-deli-new-orleans review]<br />
*Ba Chi Canteen: [https://www.facebook.com/bachicanteenla website] [http://www.yelp.com/biz/ba-chi-canteen-new-orleans review]<br />
<br />
For more on Maple Street: http://www.neworleansonline.com/tools/streets/maplestreet.html <br />
<br />
<br />
<br />
'''Caffeine?'''<br />
*PJ's: http://www.diningservices.tulane.edu/locations/pjs.html<br />
<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=Welcome_SAA_2013_Campers&diff=2977Welcome SAA 2013 Campers2013-08-01T18:54:05Z<p>Courtney C. Mumma: Created page with "'''Howdy, Campers!''' Thanks for registering! & Welcome to CURATECamp SAA 2013, to be held Tuesday, August 13, 2013 from 10am to 4pm at Tulane University in New Orleans. *WHE..."</p>
<hr />
<div>'''Howdy, Campers!''' <br />
<br />
Thanks for registering, and welcome to CURATEcamp SAA 2013, to be held Tuesday, August 13, 2013, from 10am to 4pm at Tulane University in New Orleans.<br />
<br />
*WHERE: Tulane University Lavin-Bernick Center, Suite 218, LBC, located at 201 Boggs <br />
**Transit: easy transit via streetcar, see directions from conference hotel here: http://goo.gl/maps/kHEp5<br />
<br />
We're looking forward to gathering the SAA community to discuss digital curation.<br />
<br />
CURATEcamp is based on the BarCamp or "unconference" model. Provided below is some information on what you can expect of CURATEcamp, what will be expected of the Campers, the overall theme for discussion topics, and some next steps for you.<br />
<br />
To learn more about CURATEcamp, visit http://curatecamp.org/about which includes information about past camps. There is also a Google Group and a wiki. Join the discussion and see what others have been talking about:<br />
<br />
*https://groups.google.com/forum/?fromgroups#!forum/digital-curation<br />
<br />
*http://wiki.curatecamp.org/index.php/Main_Page<br />
<br />
<br />
== CAMPING 101 ==<br />
<br />
*Bring a computing device if you can. It is optional, but it can be useful for demonstration and interaction purposes.<br />
<br />
*Get there a few minutes early to get situated and to grab your name tag. <br />
<br />
*We’ll start on Tuesday morning by gathering together in one space and going around the room to introduce ourselves. There will also be some housekeeping announcements and signing up for lightning talks. After this, we’ll jump into discussion. <br />
<br />
*We’ll follow the “open agenda” model, sitting in a roughly circular configuration. The meeting organizers will ask for proposed topics and record them for a group vote, and then we'll decide how to parcel out the topics for the rest of the day. We’ll allow some time to present your topic casually or via a 3-minute lightning talk, so come prepared if that’s your thing. Topics can include tools, theories, projects, processes, mysteries, conundrums, etc., related to any aspect of digital curation.<br />
<br />
*After discussing a few topics for the designated amount of time, we'll take a break for lunch. Lunch will be on your own, but we’ll suggest some locations on the wiki and you can ask your facilitators for directions. We encourage moving in packs so you can use lunchtime to interact with fellow campers and keep the conversations going.<br />
<br />
*The afternoon will be organized into breakout sessions based on the general topical discussions from the morning. The main purpose is to connect folks who are eager to share their perspectives and hear those of others. Late in the afternoon, we'll begin wrapping up the discussion.<br />
<br />
*For those who are interested, we can go to a local spot for a bite and drinks at the end of the day so we can discuss what we've heard and continue networking with peers!<br />
<br />
See [[CURATEcamp SAA 2013 Schedule]] for more details.<br />
<br />
<br />
== HOW TO PREPARE FOR CURATECAMP ==<br />
<br />
<br />
Start thinking about topics you would like to discuss. Here are a few ideas:<br />
<br />
* Workflows used for acquiring, processing, preserving and providing access to digital archives<br />
<br />
* Novel uses of tools for preserving born-digital material<br />
<br />
* Challenges you’re facing with hardware, software or legacy media<br />
<br />
* Emulation or migration strategies<br />
<br />
* Use cases on providing access to born-digital material<br />
<br />
* Demos of software<br />
<br />
* Integration of data curation into the archives workflow<br />
<br />
* How can archivists be actively involved in software/curation services development?<br />
<br />
<br />
Need more ideas? See the resources listed below.<br />
<br />
'''Resources:'''<br />
<br />
*http://curatecamp.org/ (for ideas, look at the past CURATEcamp pages listed on the left-hand side of the page)<br />
<br />
*http://www.digitalpreservation.gov/index.php<br />
<br />
*http://www.dcc.ac.uk/digital-curation/what-digital-curation<br />
<br />
*http://www.dpconline.org/<br />
<br />
*http://datacurationprofiles.org/ (see also Resources Page)<br />
<br />
<br />
<br />
== WHAT YOU CAN DO BETWEEN NOW AND THE CAMP ==<br />
<br />
First and foremost, think about some curation-related topics you'd like to discuss. When you're ready, you can add your ideas to the CURATEcamp wiki by simply requesting a login, then editing this page: [[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
If you have any questions or requests, send them to Courtney C. Mumma (courtney@artefactual.com) or Cristela Garcia-Spitz (cgarciaspitz@ucsd.edu). <br />
<br />
We're looking forward to seeing y'all in N'awlins!<br />
<br />
-CURATEcamp UnOrganizers<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Transportation_Info&diff=2835CURATEcamp SAA 2013 Transportation Info2013-07-16T20:10:27Z<p>Courtney C. Mumma: </p>
<hr />
<div>The event will be held at Tulane University's Lavin-Bernick Center, Suite 218, LBC, located at 201 Boggs. You'll find easy transit via streetcar. See directions from conference hotel here: http://goo.gl/maps/kHEp5<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Transportation_Info&diff=2834CURATEcamp SAA 2013 Transportation Info2013-07-16T20:10:14Z<p>Courtney C. Mumma: </p>
<hr />
<div>The event will be held at Tulane University's Lavin-Bernick Center, Suite 218, LBC, located at 201 Boggs. You'll find easy transit via streetcar. See directions from conference hotel here: http://goo.gl/maps/kHEp5<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Transportation_Info&diff=2833CURATEcamp SAA 2013 Transportation Info2013-07-16T20:09:50Z<p>Courtney C. Mumma: </p>
<hr />
<div><br />
The event will be held at Tulane University's Lavin-Bernick Center, Suite 218, LBC, located at 201 Boggs. You'll find easy transit via streetcar. See directions from conference hotel here: http://goo.gl/maps/kHEp5<br />
<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Schedule&diff=2832CURATEcamp SAA 2013 Schedule2013-07-16T20:08:20Z<p>Courtney C. Mumma: </p>
<hr />
<div>'''CURATEcamp SAA 2013 Pre-conference'''<br/><br />
*WHEN: Tuesday, August 13, 2013, 10am - 4pm<br/><br />
*WHERE: Tulane University Lavin-Bernick Center, Suite 218, LBC, located at 201 Boggs <br />
**Transit: easy transit via streetcar, see directions from conference hotel here: http://goo.gl/maps/kHEp5<br/><br />
<br />
<br />
<br/><br />
{| class="wikitable"<br />
|+ '''Schedule'''<br />
|-<br />
| 9:45 - 10:00 am || Arrive early to plug in, get your nametag, and sign up for a lightning talk.<br />
|-<br />
| 10:00 - 10:30 || Welcome and introductions<br />
|-<br />
| 10:30 - 11:00 || Topic selection and session planning<br />
|-<br />
| 11:00 - 12:00 || Group discussion of topics (10 minutes each to help decide what to attend in the PM with allowance for 3-minute lightning talks to introduce topics)<br />
|-<br />
| 12:00 - 1:30 pm || Lunch on your own<br />
|-<br />
| 1:30 - 3:30 || Breakout sessions<br />
|-<br />
| 3:30 - 4:00 || Wrap-up (next steps, evaluations)<br />
|-<br />
| 4:00 - 6:00 || Pub sessions (location TBD)<br />
|-<br />
|}<br />
<br/><br />
For Twitter: #CURATEcamp<br />
<br/><br />
<br/><br />
<br />
Lunch spot suggestions:<br />
<br />
*<br />
*<br />
*<br />
<br />
For more options, see:<br />
<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=Welcome_SAA_2013_Campers!&diff=2831Welcome SAA 2013 Campers!2013-07-16T20:07:38Z<p>Courtney C. Mumma: </p>
<hr />
<div>'''Howdy, Campers!''' <br />
<br />
Thanks for registering, and welcome to CURATEcamp SAA 2013, to be held Tuesday, August 13, 2013, from 10am to 4pm at Tulane University in New Orleans.<br />
<br />
*WHERE: Tulane University Lavin-Bernick Center, Suite 218, LBC, located at 201 Boggs <br />
**Transit: easy transit via streetcar, see directions from conference hotel here: http://goo.gl/maps/kHEp5<br />
<br />
We're looking forward to gathering the SAA community to discuss digital curation.<br />
<br />
CURATEcamp is based on the BarCamp or "unconference" model. Provided below is some information on what you can expect of CURATEcamp, what will be expected of the Campers, the overall theme for discussion topics, and some next steps for you.<br />
<br />
To learn more about CURATEcamp, visit http://curatecamp.org/about, which includes information about past camps. There is also a Google group and a wiki. Join the discussion and see what others have been talking about:<br />
<br />
*https://groups.google.com/forum/?fromgroups#!forum/digital-curation<br />
<br />
*http://wiki.curatecamp.org/index.php/Main_Page<br />
<br />
<br />
== CAMPING 101 ==<br />
<br />
*Bring a computing device if you like. It is optional, but it can be useful for demonstrations and interaction.<br />
<br />
*Get there a few minutes early to get situated and to grab your name tag. <br />
<br />
*We’ll start on Tuesday morning by gathering together in one space and going around the room to introduce ourselves. There will also be some housekeeping announcements and signing up for lightning talks. After this, we’ll jump into discussion. <br />
<br />
*We’ll follow the “open agenda” model, sitting in a roughly circular configuration. The meeting organizers will ask for proposed topics and record them for a group vote, and then we'll decide how to parcel out the topics for the rest of the day. We’ll allow some time to present your topic casually or via a 3-minute lightning talk, so come prepared if that’s your thing. Topics can include tools, theories, projects, processes, mysteries, conundrums, etc., related to any aspect of digital curation.<br />
<br />
*After discussing a few topics for the designated amount of time, we'll take a break for lunch. Lunch will be on your own, but we’ll suggest some locations on the wiki and you can ask your facilitators for directions. We encourage moving in packs so you can use lunchtime to interact with fellow campers and keep the conversations going.<br />
<br />
*The afternoon will be organized into breakout sessions based on the general topical discussions from the morning. The main purpose is to connect folks who are eager to share their perspectives and hear those of others. Late in the afternoon, we'll begin wrapping up the discussion.<br />
<br />
*For those who are interested, we can go to a local spot for a bite and drinks at the end of the day so we can discuss what we've heard and continue networking with peers!<br />
<br />
See [[CURATEcamp SAA 2013 Schedule]] for more details.<br />
<br />
<br />
== HOW TO PREPARE FOR CURATECAMP ==<br />
<br />
<br />
Start thinking about topics you would like to discuss. Here are a few ideas:<br />
<br />
* Workflows used for acquiring, processing, preserving and providing access to digital archives<br />
<br />
* Novel uses of tools for preserving born-digital material<br />
<br />
* Challenges you’re facing with hardware, software or legacy media<br />
<br />
* Emulation or migration strategies<br />
<br />
* Use cases on providing access to born-digital material<br />
<br />
* Demos of software<br />
<br />
* Integration of data curation into the archives workflow<br />
<br />
* How can archivists be actively involved in software/curation services development?<br />
<br />
<br />
Need more ideas? See the resources listed below.<br />
<br />
'''Resources:'''<br />
<br />
*http://curatecamp.org/ (look at past CURATEcamp pages, linked on the left-hand side of the page, for ideas)<br />
<br />
*http://www.digitalpreservation.gov/index.php<br />
<br />
*http://www.dcc.ac.uk/digital-curation/what-digital-curation<br />
<br />
*http://www.dpconline.org/<br />
<br />
*http://datacurationprofiles.org/ (see also Resources Page)<br />
<br />
<br />
<br />
== WHAT YOU CAN DO BETWEEN NOW AND THE CAMP ==<br />
<br />
First and foremost, think about some curation-related topics you'd like to discuss. When you're ready, you can add your ideas to the CURATEcamp wiki by requesting a login, then editing this page: [[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
If you have any questions or requests, send them to Courtney C. Mumma (courtney@artefactual.com) or Cristela Garcia-Spitz (cgarciaspitz@ucsd.edu). <br />
<br />
We're looking forward to seeing y'all in N'awlins!<br />
<br />
-CURATEcamp UnOrganizers<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2830CURATEcamp SAA 20132013-07-16T20:06:47Z<p>Courtney C. Mumma: </p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Tulane University Lavin-Bernick Center, Suite 218, LBC, located at 201 Boggs <br />
**Transit: easy transit via streetcar, see directions from conference hotel here: http://goo.gl/maps/kHEp5<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit [http://curatecamp.org/pages/how-it-works CURATEcamp - How it works] or [[CURATEcamp SAA 2012]] for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular) <br />
:$39 / $89 <br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2829CURATEcamp SAA 20132013-07-16T20:05:25Z<p>Courtney C. Mumma: </p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Tulane University Lavin-Bernick Center, Suite 218, LBC, located at 201 Boggs <br />
**Transit: easy transit via streetcar, see directions from conference hotel here: http://goo.gl/maps/kHEp5<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit [http://curatecamp.org/pages/how-it-works CURATEcamp - How it works] or [[CURATEcamp SAA 2012]] for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular) <br />
:$39 / $89 <br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Discussion_Ideas&diff=2821CURATEcamp SAA 2013 Discussion Ideas2013-06-25T21:10:11Z<p>Courtney C. Mumma: </p>
<hr />
<div>Feel free to use this space to share ideas for discussion at CURATEcamp 2013. Request an account at the login screen.<br />
<br />
Topics can include tools, theories, projects, processes, mysteries, conundrums, etc., related to any aspect of digital curation.<br />
<br />
----<br />
<br />
'''Topic you are interested in''' (Your name): if you'd like, provide a sentence or three about your topic.<br />
<br />
* '''Emulation, normalization, or both?''' (Courtney Mumma): <br />
<br />
* '''Should we be compressing AIPs?''' (Courtney Mumma): Compressing AIPs saves storage, but it raises questions about the sustainability of the system in the worst-case scenario (i.e., if information about the compression is stored in the AIPs and/or in the management system, but all you have is the AIPs, how do you unpack them in an apocalypse scenario?) as well as questions of authenticity. <br />
<br />
* '''Workarounds''' (Cristela Garcia-Spitz)<br />
<br />
* '''Dirty laundry''' (Courtney Mumma): What do we do in digital that we would never do in analog?<br />
<br />
<br />
----<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Discussion_Ideas&diff=2820CURATEcamp SAA 2013 Discussion Ideas2013-06-24T18:57:37Z<p>Courtney C. Mumma: </p>
<hr />
<div>Feel free to use this space to share ideas for discussion at CURATEcamp 2013. Request an account at the login screen.<br />
<br />
----<br />
<br />
'''Topic you are interested in''' (Your name): if you'd like, provide a sentence or three about your topic.<br />
<br />
* '''Emulation, normalization, or both?''' (Courtney Mumma): <br />
<br />
* '''Should we be compressing AIPs?''' (Courtney Mumma): Compressing AIPs saves storage, but it raises questions about the sustainability of the system in the worst-case scenario (i.e., if information about the compression is stored in the AIPs and/or in the management system, but all you have is the AIPs, how do you unpack them in an apocalypse scenario?) as well as questions of authenticity. <br />
<br />
* '''Workarounds''' (Cristela Garcia-Spitz)<br />
<br />
* '''Dirty laundry''' (Courtney Mumma): What do we do in digital that we would never do in analog?<br />
<br />
<br />
----<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Schedule&diff=2819CURATEcamp SAA 2013 Schedule2013-06-24T18:52:49Z<p>Courtney C. Mumma: </p>
<hr />
<div>'''CURATEcamp SAA 2013 Pre-conference'''<br/><br />
Tuesday, August 13, 2013, 10am - 4pm<br/><br />
Tulane University, New Orleans (details, maps and directions will be available soon)<br/><br />
<br />
<br />
<br/><br />
{| class="wikitable"<br />
|+ '''Schedule'''<br />
|-<br />
| 9:45 - 10:00 am || Arrive early to plug in, get your nametag, and sign up for a lightning talk.<br />
|-<br />
| 10:00 - 10:30 || Welcome and introductions<br />
|-<br />
| 10:30 - 11:00 || Topic selection and session planning<br />
|-<br />
| 11:00 - 12:00 || Group discussion of topics (10 minutes each to help decide what to attend in the PM with allowance for 3-minute lightning talks to introduce topics)<br />
|-<br />
| 12:00 - 1:30 pm || Lunch on your own<br />
|-<br />
| 1:30 - 3:30 || Breakout sessions<br />
|-<br />
| 3:30 - 4:00 || Wrap-up (next steps, evaluations)<br />
|-<br />
| 4:00 - 6:00 || Pub sessions (location TBD)<br />
|-<br />
|}<br />
<br/><br />
For Twitter: #CURATEcamp<br />
<br/><br />
<br/><br />
<br />
Lunch spot suggestions:<br />
<br />
*<br />
*<br />
*<br />
<br />
For more options, see:<br />
<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2818CURATEcamp SAA 20132013-06-24T18:51:26Z<p>Courtney C. Mumma: </p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Tulane University, New Orleans (easy transit via streetcar)<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit [http://curatecamp.org/pages/how-it-works CURATEcamp - How it works] for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular) <br />
:$39 / $89 <br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Schedule&diff=2817CURATEcamp SAA 2013 Schedule2013-06-21T21:41:35Z<p>Courtney C. Mumma: </p>
<hr />
<div>'''CURATEcamp SAA 2013 Pre-conference'''<br/><br />
Tuesday, August 13, 2013, 10am - 4pm<br/><br />
Location TBD<br/><br />
<br />
<br />
<br/><br />
{| class="wikitable"<br />
|+ '''Schedule'''<br />
|-<br />
| 9:45 - 10:00 am || Arrive early to plug in, get your nametag, and sign up for a lightning talk.<br />
|-<br />
| 10:00 - 10:30 || Welcome and introductions<br />
|-<br />
| 10:30 - 11:00 || Topic selection and session planning<br />
|-<br />
| 11:00 - 12:00 || Group discussion of topics (10 minutes each to help decide what to attend in the PM with allowance for 3-minute lightning talks to introduce topics)<br />
|-<br />
| 12:00 - 1:30 pm || Lunch on your own<br />
|-<br />
| 1:30 - 3:30 || Breakout sessions<br />
|-<br />
| 3:30 - 4:00 || Wrap-up (next steps, evaluations)<br />
|-<br />
| 4:00 - 6:00 || Pub sessions (location TBD)<br />
|-<br />
|}<br />
<br/><br />
For Twitter: #CURATEcamp<br />
<br/><br />
<br/><br />
<br />
Lunch spot suggestions:<br />
<br />
*<br />
*<br />
*<br />
<br />
For more options, see:<br />
<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Schedule&diff=2816CURATEcamp SAA 2013 Schedule2013-06-21T21:41:01Z<p>Courtney C. Mumma: </p>
<hr />
<div>'''CURATEcamp SAA 2013 Pre-conference'''<br/><br />
Tuesday, August 13, 2013, 10am - 4pm<br/><br />
Location TBD<br/><br />
<br />
<br />
<br/><br />
{| class="wikitable"<br />
|+ '''Schedule'''<br />
|-<br />
| 9:45 - 10:00 am || Arrive early to plug in, get your nametag, and sign up for a lightning talk.<br />
|-<br />
| 10:00 - 10:30 || Welcome and introductions<br />
|-<br />
| 10:30 - 11:00 || Topic selection and session planning<br />
|-<br />
| 11:00 - 12:00 || Group discussion of topics (10 minutes each to help decide what to attend in the PM with allowance for 3-minute lightning talks to introduce topics)<br />
|-<br />
| 12:00 - 1:30 pm || Lunch on your own<br />
|-<br />
| 1:30 - 3:30 || Breakout sessions<br />
|-<br />
| 3:30 - 4:00 || Wrap-up (next steps, evaluations)<br />
|-<br />
| 4:00 - 6:00 || Pub sessions<br />
|-<br />
|}<br />
<br/><br />
For Twitter: #CURATEcamp<br />
<br/><br />
<br/><br />
<br />
Lunch spot suggestions:<br />
<br />
*<br />
*<br />
*<br />
<br />
For more options, see:<br />
<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2815CURATEcamp SAA 20132013-06-21T21:39:15Z<p>Courtney C. Mumma: /* Description */</p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: New Orleans, Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit [http://curatecamp.org/pages/how-it-works CURATEcamp - How it works] for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular) <br />
:$39 / $89 <br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2814CURATEcamp SAA 20132013-06-21T21:38:38Z<p>Courtney C. Mumma: /* Description */</p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: New Orleans, Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit [http://curatecamp.org/pages/how-it-works CURATEcamp - How it works] or [[CURATEcamp SAA 2013]] for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular) <br />
:$39 / $89 <br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Schedule&diff=2813CURATEcamp SAA 2013 Schedule2013-06-21T21:37:53Z<p>Courtney C. Mumma: </p>
<hr />
<div>'''CURATEcamp SAA 2013 Pre-conference'''<br/><br />
Tuesday, August 13, 2013, 10am - 4pm<br/><br />
Location TBD<br/><br />
<br />
<br />
<br/><br />
{| class="wikitable"<br />
|+ '''Schedule'''<br />
|-<br />
| 9:45 - 10:00 am || Arrive early to plug in, get your nametag, and sign up for a lightning talk.<br />
|-<br />
| 10:00 - 10:30 || Welcome and introductions<br />
|-<br />
| 10:30 - 11:00 || Topic selection and session planning<br />
|-<br />
| 11:00 - 12:00 || Group discussion of topics (10 minutes each, with allowance for 3-minute lightning talks to introduce topics, to help you decide which afternoon sessions to attend)<br />
|-<br />
| 12:00 - 1:30 pm || Lunch on your own<br />
|-<br />
| 1:30 - 3:30 || Breakout sessions<br />
|-<br />
| 3:30 - 4:00 || Wrap-up (next steps, evaluations)<br />
|-<br />
| 4:00 - 6:00 || Pub sessions<br />
|-<br />
|}<br />
<br/><br />
For Twitter: #CURATEcamp<br />
<br/><br />
<br/><br />
<br />
Lunch spot suggestions:<br />
<br />
*<br />
*<br />
*<br />
<br />
For more options, see:<br />
<br />
<br />
<br />
----<br />
<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Discussion_Ideas&diff=2812CURATEcamp SAA 2013 Discussion Ideas2013-06-21T21:22:49Z<p>Courtney C. Mumma: </p>
<hr />
<div>Feel free to use this space to share ideas for discussion at CURATEcamp 2013. Request an account at the log in screen.<br />
<br />
----<br />
<br />
'''Topic you are interested in''' (Your name): A sentence or three about your topic.<br />
<br />
* '''Emulation, normalization, or both?''' (Courtney Mumma): <br />
<br />
* '''Should we be compressing AIPs?''' (Courtney Mumma) <br />
<br />
* '''Workarounds''' (Cristela Garcia-Spitz)<br />
<br />
* '''Dirty laundry''' (Courtney Mumma) What do we do in digital that we would never do in analog?<br />
<br />
<br />
----<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2794CURATEcamp SAA 20132013-04-24T23:57:06Z<p>Courtney C. Mumma: </p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: New Orleans, Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit [http://curatecamp.org/pages/how-it-works CURATEcamp - How it works] for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular)<br />
:$39 / $89<br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2793CURATEcamp SAA 20132013-04-24T23:44:38Z<p>Courtney C. Mumma: </p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit [http://curatecamp.org/pages/how-it-works CURATEcamp - How it works] for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular)<br />
:$39 / $89<br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Discussion_Ideas&diff=2792CURATEcamp SAA 2013 Discussion Ideas2013-04-24T23:42:32Z<p>Courtney C. Mumma: </p>
<hr />
<div>Feel free to use this space to share ideas for discussion at CURATEcamp 2013. Request an account at the log in screen.<br />
<br />
----<br />
<br />
'''Topic you are interested in''' (Your name): A sentence or three about your topic.<br />
<br />
<br />
<br />
<br />
----<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2791CURATEcamp SAA 20132013-04-24T23:33:10Z<p>Courtney C. Mumma: /* Description */</p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit [http://curatecamp.org/pages/how-it-works CURATEcamp - How it works] for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular)<br />
:$39 / $89<br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2790CURATEcamp SAA 20132013-04-24T23:31:51Z<p>Courtney C. Mumma: /* Description */</p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit http://curatecamp.org/pages/how-it-works for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular)<br />
:$39 / $89<br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2789CURATEcamp SAA 20132013-04-24T23:31:23Z<p>Courtney C. Mumma: </p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit http://curatecamp.org/pages/how-it-works for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
Attendance is limited to 40.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular)<br />
:$39 / $89<br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2788CURATEcamp SAA 20132013-04-24T23:31:06Z<p>Courtney C. Mumma: </p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can: [http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&&InvID_W=2692 SAA 2013 CURATEcamp registration]<br />
<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit http://curatecamp.org/pages/how-it-works for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
Attendance is limited to 40.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular)<br />
:$39 / $89<br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2787CURATEcamp SAA 20132013-04-24T23:28:46Z<p>Courtney C. Mumma: /* Moderators */</p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can:<br />
http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&Time=523633104&InvID_W=2303<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit http://curatecamp.org/pages/how-it-works for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
Attendance is limited to 40.<br />
<br />
== Moderators ==<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular)<br />
:$39 / $89<br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2786CURATEcamp SAA 20132013-04-24T23:28:28Z<p>Courtney C. Mumma: </p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can:<br />
http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&Time=523633104&InvID_W=2303<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit http://curatecamp.org/pages/how-it-works for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
Attendance is limited to 40.<br />
<br />
== Moderators ==<br />
<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members (Advance/Regular)<br />
:$39 / $89<br />
<br />
:Employees of Member Institutions (Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers (Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=Main_Page&diff=2785Main Page2013-04-24T23:26:20Z<p>Courtney C. Mumma: /* CURATEcamp SAA 2013 */</p>
<hr />
<div>== IS&T Archiving Conference CURATEcamp 2013 ==<br />
* [[IS&T Archiving Conference CURATEcamp 2013]]<br />
<br />
== AVPres CURATEcamp 2013 ==<br />
* [http://wiki.curatecamp.org/index.php/CURATEcamp_AVpres_2013 AVPres CURATEcamp 2013]<br />
<br />
== ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2013 CURATEcamp ==<br />
* [[ACM/IEEE Joint Conference on Digital Libraries 2013 CURATEcamp]]<br />
<br />
== CURATEcamp SAA 2013 ==<br />
* [[CURATEcamp SAA 2013]]<br />
**[[Welcome SAA 2013 Campers!]]<br />
**[[CURATEcamp SAA 2013 Schedule]]<br />
**[[CURATEcamp SAA 2013 Transportation Info]]<br />
**[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
**[[CURATEcamp SAA 2013 Notes]]<br />
<br />
== CURATEcamp Workspaces ==<br />
* [[Bootstrapping Repositories]]<br />
* [[Ideas for Future Curate Camps]]<br />
<br />
== Other Pages ==<br />
* CURATEcamp main page http://curatecamp.org/<br />
* [[Past CURATEcamp pages]]<br />
<br />
== If you want an account, just request one at the log in screen. We're getting to them pretty quickly! ==</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013_Discussion_Ideas&diff=2784CURATEcamp SAA 2013 Discussion Ideas2013-04-24T23:25:29Z<p>Courtney C. Mumma: Created page with "Feel free to use this space to share ideas for discussion at CURATEcamp 2013. ---- '''Topic you are interested in''' (Your name): A sentence or three about your topic. ---..."</p>
<hr />
<div>Feel free to use this space to share ideas for discussion at CURATEcamp 2013.<br />
<br />
----<br />
<br />
'''Topic you are interested in''' (Your name): A sentence or three about your topic.<br />
<br />
<br />
<br />
<br />
----<br />
[[CURATEcamp SAA 2013]]<br />
<br />
[[Welcome SAA 2013 Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_SAA_2013&diff=2783CURATEcamp SAA 20132013-04-24T23:20:12Z<p>Courtney C. Mumma: Created page with "There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registrat..."</p>
<hr />
<div>There will be a one-day pre-conference CURATEcamp at this year's [http://www2.archivists.org/conference/2013/new-orleans Society of American Archivists Annual Meeting]. Registration is open now, and space is limited.<br />
<br />
'''CURATEcamp SAA 2013 Pre-conference''' <br />
<br />
*WHEN: Tuesday, August 13th, 2013 (10am - 4pm)<br />
*WHERE: Offsite, TBD<br />
*COST: $39 (SAA Members), $69 (Non-Members) in advance*<br />
<br />
Space is limited to 40 registrants, so reserve your spot while you can:<br />
http://saa.archivists.org/4DCGI/events/eventdetail.html?Action=Events_Detail&Time=523633104&InvID_W=2303<br />
<br />
== Description ==<br />
<br />
This workshop is an unconference-style event at which participants will engage in discussions related to data curation and digital archives. It’s an unconventional format, with participants in charge of determining learning objectives by choosing the topics and driving the discussion. And it’s an opportunity to brainstorm on current topics, explore ideas in progress or tough concepts, and share best practices.<br />
<br />
This open forum allows for discussions with a diverse group of professionals in a setting in which topics develop organically throughout the day. Visit http://curatecamp.org/pages/how-it-works for more information.<br />
<br />
One of the core goals of CURATEcamp is that everyone engages in peer-to-peer learning, collaboration, and creativity to broaden the digital curation community. Most of all, you’ll be in a position to propose topics, ask questions, get answers, and make connections with your peers in a welcoming environment. There are no spectators at CURATEcamp...only participants!<br />
<br />
:'''Who should attend?''' Anyone who touches digital records and wants to participate and learn in this new format.<br />
<br />
:'''What should you already know?''' You should have a basic understanding of digital collections and data sets.<br />
<br />
Attendance is limited to 40.<br />
<br />
== Moderators ==<br />
<br />
<br />
:'''Cristela Garcia-Spitz'''<br />
:Digital Library Program Project Manager<br />
:University of California, San Diego<br />
<br />
<br />
:'''Courtney C. Mumma'''<br />
:Systems Analyst and Archivematica Product Manager<br />
:Artefactual Systems, Inc., Vancouver, Canada<br />
<br />
== Registration ==<br />
<br />
:Members(Advance/Regular) <br />
:$39 / $89 <br />
<br />
:Employees of Member Institutions(Advance/Regular)<br />
:$59 / $109<br />
<br />
:Nonmembers(Advance/Regular)<br />
:$69 / $119<br />
<br />
<br />
----<br />
[[Welcome 2013 SAA Campers!]]<br />
<br />
[[CURATEcamp SAA 2013 Schedule]]<br />
<br />
[[CURATEcamp SAA 2013 Transportation Info]]<br />
<br />
[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
<br />
[[CURATEcamp SAA 2013 Notes]]</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=Main_Page&diff=2782Main Page2013-04-24T23:11:29Z<p>Courtney C. Mumma: </p>
<hr />
<div>== IS&T Archiving Conference CURATEcamp 2013 ==<br />
* [[IS&T Archiving Conference CURATEcamp 2013]]<br />
<br />
== AVPres CURATEcamp 2013 ==<br />
* [http://wiki.curatecamp.org/index.php/CURATEcamp_AVpres_2013 AVPres CURATEcamp 2013]<br />
<br />
== ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2013 CURATEcamp ==<br />
* [[ACM/IEEE Joint Conference on Digital Libraries 2013 CURATEcamp]]<br />
<br />
== CURATEcamp SAA 2013 ==<br />
* [[CURATEcamp SAA 2013]]<br />
**[[Welcome 2013 SAA Campers!]]<br />
**[[CURATEcamp SAA 2013 Schedule]]<br />
**[[CURATEcamp SAA 2013 Transportation Info]]<br />
**[[CURATEcamp SAA 2013 Discussion Ideas]]<br />
**[[CURATEcamp SAA 2013 Notes]]<br />
<br />
== CURATEcamp Workspaces ==<br />
* [[Bootstrapping Repositories]]<br />
* [[Ideas for Future Curate Camps]]<br />
<br />
== Other Pages ==<br />
* CURATEcamp main page http://curatecamp.org/<br />
* [[Past CURATEcamp pages]]<br />
<br />
== If you want an account, just request one at the log in screen. We're getting to them pretty quickly! ==</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2443CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-11-17T08:03:15Z<p>Courtney C. Mumma: /* Summary */</p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Summary=<br />
<br />
At the end of the day, we got A LOT done!<br />
<br />
Thanks to participants! @mopennock @WilliamKilbride @GaryM03062 @anjacks0n @carusb @benfinoradin @peshkira @petemay @Britpunk80 @HeatherBowden @pjvangarderen @jordanheit and everyone else (please add anyone who is missing)<br />
<br />
* @GaryM03062 discovered a bug in JHOVE while testing FITS: https://sourceforge.net/tracker/?func=detail&aid=3587890&group_id=221311&atid=1052190 <br />
* File corpus! https://github.com/openplanets/format-corpus/commit/b0971e1c32b2df7a9bceafe1f00d81f49cb45990 <br />
* (@benfinoradin) Kept all #fileidhack tweets today [pic.twitter.com/7QI1DfmD]<br />
* (@mopennock, @anjacks0n, @petemay et al) British Library team worked on eBook format identification <br />
* (@peshkira) OpenFITS compiles<br />
* (@petemay) Tika signatures for PDB, Kindle AZW and LRF files created, re-testing over sample file set #fileidhack #eBook<br />
* (@mopennock) Added 7 new eBook signatures to Tika this morning <br />
* Encouraged pinging PRONOM (@Britpunk80) to create/test/submit: [http://test.linkeddatapronom.nationalarchives.gov.uk/sigdev/index.htm]<br />
* @Britpunk80 handed some DROID signature files to @anjacks0n on rocketbook, epub, and ibooks.<br />
* @HeatherBowden shared some Quark and InDesign files. <br />
* @GaryM03062: New commit of OpenFITS allows setting max no. of threads in fits.xml [https://github.com/gmcgath/openfits]<br />
* @benfinoradin shared resource on RIFF/RIFX [http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html]<br />
* The Quicktime motherlode, by @mistydemeo - Quicktime videos [https://github.com/openplanets/format-corpus/tree/master/video/Quicktime]<br />
* OpenFITS : [[FITS#Improving_JHOVE_performance_within_FITS]]<br />
* @mistydemeo: Created @machomebrew formula for fidget to make file ID signatures for #fileidhack [https://github.com/mistydemeo/homebrew-formulae]<br />
* #openarchives chat /nick artefactualmtgroom #fileidhack pic.twitter.com/1Ffp1v6Y<br />
* @anjacks0n: new [https://github.com/anjackson/percipio/downloads Percipio] and [https://github.com/openplanets/format-corpus/downloads Fidget] available for development and feedback <br />
* @pjvangarderen @archivematica: Artefactual picks up #fileidhack baton. OpenFits debian package for testing [https://launchpad.net/~archivematica/+archive/externals-dev/+build/3989642]<br />
* @GaryM03062 uploaded source changes to JHOVE. [https://sourceforge.net/projects/jhove/]<br />
* @jordanheit testing OpenFITS [[FITS]]<br />
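<br />
Several of the items above involve format signatures, and one points to a RIFF/RIFX resource. As a rough illustration of what a header check for that container family looks like, here is a minimal Python sketch. It assumes only the widely documented layout (a 4-byte ASCII tag, a 4-byte size that is little-endian for RIFF and big-endian for RIFX, then a 4-byte form type). The function name <code>parse_riff_header</code> and the sample bytes are made up for this example and are not taken from Tika, DROID, or Fidget.<br />

```python
import struct


def parse_riff_header(data: bytes):
    """Return (container, payload_size, form_type), or None if not RIFF/RIFX.

    Hypothetical helper for illustration only.
    """
    if len(data) < 12:
        return None
    tag = data[0:4]
    if tag == b"RIFF":
        byte_order = "<"  # RIFF stores sizes little-endian
    elif tag == b"RIFX":
        byte_order = ">"  # RIFX stores sizes big-endian
    else:
        return None
    (size,) = struct.unpack(byte_order + "I", data[4:8])
    form_type = data[8:12].decode("ascii", errors="replace")
    return tag.decode("ascii"), size, form_type


# Synthetic 12-byte headers built in memory (not real sample files).
wave_header = b"RIFF" + struct.pack("<I", 36) + b"WAVE"
rifx_header = b"RIFX" + struct.pack(">I", 36) + b"WAVE"

print(parse_riff_header(wave_header))       # ('RIFF', 36, 'WAVE')
print(parse_riff_header(rifx_header))       # ('RIFX', 36, 'WAVE')
print(parse_riff_header(b"%PDF-1.4\n%..."))  # None
```

A real signature also has to cope with truncated files and with form types it has never seen, which is where the shared test corpus comes in.<br />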
<br />
=Background=<br />
One break-out session at the CURATEcamp iPRES 2012 was affectionately branded "file id confessional" where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases<br />
*We're all hooked on that FITS.xml but [[FITS]] needs performance optimization ASAP (also, is Harvard up for extra dev?)<br />
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How much of our file id requirements can it in fact cover?<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* also, CURATEcamp iPres participant Paul Wheatley has since posted: [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as link to [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to Twitter discussion between @pjvangarderen @anjacks0n @prwheatley about this 24 hr hackathon event.<br />
<br />
==What==<br />
<br />
A 24-hour+ live hackathon event in which teams in multiple time zones work on common technical projects related to the CURATEcamp iPres 2012 file id discussions. <br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
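<br />
To see how long that window actually is, here is a small worked example in Python. It assumes that "start the day" means 00:00 on Nov 16 at GMT +12:00 and that "midnight-ish" means 00:00 on Nov 17 at GMT -8:00. Both clock times are assumptions made for the arithmetic, not an agreed schedule, and <code>hack_window</code> is just an illustrative name.<br />

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets as written on this page (assumed; no daylight-saving rules applied).
NZ = timezone(timedelta(hours=12))
PST = timezone(timedelta(hours=-8))


def hack_window(start_local, end_local):
    """Convert both ends of the event to UTC and return (start, end, length)."""
    start_utc = start_local.astimezone(timezone.utc)
    end_utc = end_local.astimezone(timezone.utc)
    return start_utc, end_utc, end_utc - start_utc


start = datetime(2012, 11, 16, 0, 0, tzinfo=NZ)   # assumed kickoff
end = datetime(2012, 11, 17, 0, 0, tzinfo=PST)    # assumed wrap-up
start_utc, end_utc, length = hack_window(start, end)

print(start_utc.isoformat())            # 2012-11-15T12:00:00+00:00
print(end_utc.isoformat())              # 2012-11-17T08:00:00+00:00
print(length.total_seconds() / 3600)    # 44.0
```

Under those assumptions the calendar day of Nov 16 stretches across 44 hours worldwide, which is why the event is billed as 24hr+.<br />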
<br />
==Why==<br />
* Because we'll probably get some useful stuff done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp way<br />
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Logistics=<br />
<br />
==When: '''Fri Nov 16'''==<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
==How==<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]], make it public and share the link<br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
**[[Collecting_format_ID_test_files|Format ID Test Files Project]]'s [[Collecting_format_ID_test_files#Via_Google_Drive|Google Drive]]<br />
* IRC: The chat room is on the irc.OFTC.net server, and the room name is #openarchives [irc://#openarchives@irc.OFTC.net|irc://#openarchives@irc.OFTC.net]<br />
** Chat room help and browser chat option: [https://www.archivematica.org/wiki/Chat_room https://www.archivematica.org/wiki/Chat_room]<br />
* GitHub: get those pull requests going<br />
** [[Collecting_format_ID_test_files|Format ID Test Files Project]]'s [[Collecting_format_ID_test_files#Via_Google_Drive|Git repo]]<br />
<br />
<br />
==Who ([[Special:UserLogin|Sign up]])==<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock), PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* '''GMT -5:00''' [[User:Greg Jansen|Greg Jansen]] @gregj, [[User:Ben Pennell|Ben Pennell]] @pennellben<br />
* '''GMT -5:00''' [[User:Heather Bowden|Heather Bowden]] @heatherbowden - will help when/where I can. Happy to help US East Coasters and Artefactual Team, or whomever. Contact me if you need an extra hand.<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
** Maybe incorporate [http://www.ace.net.nz/tech/TechFileFormat.html "Almost Every file format in the world!"]<br />
* [[Collecting format ID test files]]<br />
** [[Creating an artificial test set using emulation]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id /characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.<br />
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.<br />
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated in to a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?<br />
** @dericed - Yes ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
*** I'm willing to put a fork of FITS on Github if a couple of people say they want it. --[[User:Gary McGath|Gary McGath]] 13:27, 11 November 2012 (PST)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
* [[User:Misty De Meo|Misty De Meo]] Just a thought... it strikes me that the basic functionality of FITS is not super complicated. As well, in my experience, most users are using a fairly minimal set of features. Given some of the problems we're having with FITS, it may be worth doing a minimal rewrite of FITS (in, say, Python or C) with a focus on a) speed, and b) maintainability. This is more than a day's work but could get a start if this is something other people would be interested in. Things I'd want to see would include:<br />
** Don't vendor tools - just recommend versions, but draw from whatever tools the user has installed.<br />
** Implement better AV support (with all the caveats listed above)<br />
** Possibly restrict the number of tools?<br />
** +1 to FITS refactoring --[[User:Greg Jansen|Greg Jansen]] 11:33, 15 November 2012 (PST)<br />
** Implement only the configuration options most people use, and let those be specified on the commandline instead of via XML.<br />
*** [[User:Gary_McGath|Gary McGath]] on IRC points out that because FITS uses external tools, each scanned file is loaded from disk independently by multiple tools, introducing unneeded I/O overhead. This could be fixed in FITS itself.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
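<br />
For the AV proposal above, the structured output that @dericed mentions can be sketched like this in Python. The <code>-print_format json</code>, <code>-show_format</code> and <code>-show_streams</code> options are standard ffprobe options; the helper names (<code>ffprobe_command</code>, <code>summarize</code>, <code>identify</code>) and the sample dictionary are hypothetical, hand-written for illustration, and this is not how FITS or Tika integrate any tool.<br />

```python
import json
import subprocess


def ffprobe_command(path):
    """Build an ffprobe call that prints container and stream info as JSON."""
    return ["ffprobe", "-v", "error", "-print_format", "json",
            "-show_format", "-show_streams", path]


def summarize(probe):
    """Reduce parsed ffprobe JSON to the container name plus (type, codec) per stream."""
    fmt = probe.get("format", {})
    streams = probe.get("streams", [])
    return {
        "container": fmt.get("format_name"),
        "streams": [(s.get("codec_type"), s.get("codec_name")) for s in streams],
    }


def identify(path):
    """Run ffprobe on a file (requires ffprobe on PATH) and summarize the result."""
    result = subprocess.run(ffprobe_command(path),
                            capture_output=True, text=True, check=True)
    return summarize(json.loads(result.stdout))


# Hand-written sample shaped like ffprobe's JSON output (not captured from a real run).
sample = {
    "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2"},
    "streams": [
        {"codec_type": "video", "codec_name": "h264"},
        {"codec_type": "audio", "codec_name": "aac"},
    ],
}
print(summarize(sample))
```

The point of keeping the per-stream list is the one made in the discussion: for AV material the codecs and contained bitstreams matter at least as much as the container, so a wrapper that reports only the container name would lose the interesting part.<br />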
<br />
==Preparation TODO==<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Set up Archivematica instances to test FPR<br />
* Easier signature development tools and/or signature contribution tracking, now partially complete, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, c.f. [[Collecting format ID test files]]<br />
<br />
=Results=<br />
'''Nov 17. 07:30 UTC -- 30 hours later'''<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
Peter Van Garderen @pjvangarderen<br />
Proud to lead 24hr real time R&D cycle. Thanks #fileidhack people for your passion RT @jordanheit: testing OpenFITS wiki.curatecamp.org/index.php/FITS<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: Testing FITS led me to discover a bug in JHOVE, so #fileidhack is worth something. sourceforge.net/tracker/?func=…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @anjacks0n: Thanks for the files, @carusb github.com/openplanets/fo… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack live link? RT @benfinoradin: Archiving all #fileidhack tweets today pic.twitter.com/7QI1DfmD<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks BL! #fileidhack RT @mopennock: BL team are working on eBook format identification today for #fileidhack - @anjacks0n @petemay et al<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @peshkira: OpenFITS current status: It compiles! #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @petemay: Tika sigs for PDB, Kindle AZW and LRF files created, re-testing over sample file set #fileidhack #eBook<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @mopennock: We've added 7 new eBook signatures to Tika this morning #fileidhack. Great work all!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Everyone ping PRONOM pls! RT @Britpunk80: #fileidhack if you want to create/test/submit your own: …keddatapronom.nationalarchives.gov.uk/sigdev/index.h…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @Britpunk80: I've handed some droid sig files to @anjacks0n on rocketbook, epub, and ibooks. #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @HeatherBowden: @euanc @anjacks0n I have some Quark and InDesign files. You interested? #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Nov1614:00UTC RT @pjvangarderen: Wazzup! West Coast in da fileidhacking house! RT @declan: good morning #fileidhack!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @peshkira: #fileidhack Current status: FITS mavenized. PullRequest/Wiki \w explanation follow. /cc @GaryM03062<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: New commit of OpenFITS allows setting max no. of threads in fits.xml #fileidhack github.com/gmcgath/openfi…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Great work on OpenFITS! Lets keep this alive RT @GaryM03062: Calling a day for #fileidhack. Great working with everyone!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @Snarkivist<br />
#fileidhack team - just catching up on your work today - was internetless - look for summary in the morning<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @pjvangarderen: Nov1601:17UTC @euanc (Perth) #fileidhack IRC - Nov1704:16UTC @archivematica crew still hacking #24hrs+<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @benfinoradin: Good resource on RIFF/RIFX: johnloomis.org/cpe102/asgn/as…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Holy cow, the Quicktime motherload! RT @mistydemeo: Have some Quicktime videos, #fileidhack github.com/openplanets/fo…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: Another update for OpenFITS. Please read the wiki: wiki.curatecamp.org/index.php/FITS… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @mistydemeo: Created @MacHomebrew formula for fidget to make file ID signatures for #fileidhack github.com/mistydemeo/hom…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @pjvangarderen: @mistydemeo meet #openarchives /nick artefactualmtgroom #fileidhack pic.twitter.com/1Ffp1v6Y<br />
<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @archivematica: Artefactual picks up #fileidhack baton. OpenFits debian package launchpad.net/~archivematica… test time!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @anjacks0n: @benfinoradin tweaked your sig, now identifies all test files you sent github.com/openplanets/fo… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: As a side effect of #fileidhack, I've been uploading source changes to JHOVE. sourceforge.net/projects/jhove/<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Thanks GMT! RT @mopennock: It's all go this morning for the #fileidhack! wiki.curatecamp.org/index.php/CURA…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @WilliamKilbride: It's #dpc #ff follow friday. look at #fileidhack Better still, get involved wiki.curatecamp.org/index.php/CURA…<br />
<br />
<br />
</pre></div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2442CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-11-17T08:01:01Z<p>Courtney C. Mumma: </p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Summary=<br />
<br />
At the end of the day, we got A LOT done!<br />
<br />
Thanks to participants! @mopennock @WilliamKilbride @GaryM03062 @anjacks0n @carusb @benfinoradin @peshkira @petemay @Britpunk80 @HeatherBowden @pjvangarderen @jordanheit and everyone else (please add anyone who is missing)<br />
<br />
* @GaryM03062 discovered a bug in JHOVE while testing FITS: https://sourceforge.net/tracker/?func=detail&aid=3587890&group_id=221311&atid=1052190 <br />
* File corpus! https://github.com/openplanets/format-corpus/commit/b0971e1c32b2df7a9bceafe1f00d81f49cb45990 <br />
* (@benfinoradin) Kept all #fileidhack tweets today pic.twitter.com/7QI1DfmD<br />
* (@mopennock, @anjacks0n, @petemay et al) British Library team worked on eBook format identification <br />
* (@peshkira) OpenFITS compiles<br />
* (@petemay) Tika signatures for PDB, Kindle AZW and LRF files created, re-testing over sample file set #fileidhack #eBook<br />
* (@mopennock) Added 7 new eBook signatures to Tika this morning <br />
* Encouraged pinging PRONOM (@Britpunk80) to create/test/submit: [http://test.linkeddatapronom.nationalarchives.gov.uk/sigdev/index.htm]<br />
* @Britpunk80 handed some DROID signature files to @anjacks0n on rocketbook, epub, and ibooks.<br />
* @HeatherBowden shared some Quark and InDesign files. <br />
* @GaryM03062: New commit of OpenFITS allows setting max no. of threads in fits.xml [https://github.com/gmcgath/openfits]<br />
* @benfinoradin shared resource on RIFF/RIFX [http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html]<br />
* The Quicktime motherlode, by @mistydemeo - Quicktime videos [https://github.com/openplanets/format-corpus/tree/master/video/Quicktime]<br />
* OpenFITS : [[FITS#Improving_JHOVE_performance_within_FITS]]<br />
* @mistydemeo: Created @machomebrew formula for fidget to make file ID signatures for #fileidhack [https://github.com/mistydemeo/homebrew-formulae]<br />
* #openarchives chat /nick artefactualmtgroom #fileidhack pic.twitter.com/1Ffp1v6Y<br />
* @anjacks0n: new [https://github.com/anjackson/percipio/downloads Percipio] and [https://github.com/openplanets/format-corpus/downloads Fidget] available for development and feedback <br />
* @pjvangarderen @archivematica: Artefactual picks up #fileidhack baton. OpenFits debian package for testing [https://launchpad.net/~archivematica/+archive/externals-dev/+build/3989642]<br />
* @GaryM03062 uploaded source changes to JHOVE. [https://sourceforge.net/projects/jhove/]<br />
* @jordanheit testing OpenFITS [[FITS]]<br />
<br />
=Background=<br />
One break-out session at the CURATEcamp iPRES 2012 was affectionately branded "file id confessional" where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases<br />
*We're all hooked on that FITS.xml but [[FITS]] needs performance optimization ASAP (also, is Harvard up for extra dev?)<br />
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How much of our file id requirements can it in fact cover?<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* also, CURATEcamp iPres participant Paul Wheatley has since posted: [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as link to [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to Twitter discussion between @pjvangarderen @anjacks0n @prwheatley about this 24 hr hackathon event.<br />
<br />
==What==<br />
<br />
A 24-hour+ live hackathon event in which teams in multiple time zones work on common technical projects related to the CURATEcamp iPres 2012 file id discussions. <br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
<br />
==Why==<br />
* Because we'll probably get some useful stuff done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp way<br />
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Logistics=<br />
<br />
==When: '''Fri Nov 16'''==<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
==How==<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]], make it public and share the link<br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
**[[Collecting_format_ID_test_files|Format ID Test Files Project]]'s [[Collecting_format_ID_test_files#Via_Google_Drive|Google Drive]]<br />
* IRC: The chat room is on the irc.OFTC.net server, and the room name is #openarchives [irc://#openarchives@irc.OFTC.net|irc://#openarchives@irc.OFTC.net]<br />
** Chat room help and browser chat option: [https://www.archivematica.org/wiki/Chat_room https://www.archivematica.org/wiki/Chat_room]<br />
* GitHub: get those pull requests going<br />
** [[Collecting_format_ID_test_files|Format ID Test Files Project]]'s [[Collecting_format_ID_test_files#Via_Google_Drive|Git repo]]<br />
<br />
<br />
==Who ([[Special:UserLogin|Sign up]])==<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock), PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* '''GMT -5:00''' [[User:Greg Jansen|Greg Jansen]] @gregj, [[User:Ben Pennell|Ben Pennell]] @pennellben<br />
* '''GMT -5:00''' [[User:Heather Bowden|Heather Bowden]] @heatherbowden - will help when/where I can. Happy to help US East Coasters and Artefactual Team, or whomever. Contact me if you need an extra hand.<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
** Maybe incorporate [http://www.ace.net.nz/tech/TechFileFormat.html "Almost Every file format in the world!"]<br />
* [[Collecting format ID test files]]<br />
** [[Creating an artificial test set using emulation]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id /characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.<br />
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.<br />
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated in to a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?<br />
** @dericed - Yes ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
*** I'm willing to put a fork of FITS on Github if a couple of people say they want it. --[[User:Gary McGath|Gary McGath]] 13:27, 11 November 2012 (PST)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
* [[User:Misty De Meo|Misty De Meo]] Just a thought... it strikes me that the basic functionality of FITS is not super complicated. As well, in my experience, most users are using a fairly minimal set of features. Given some of the problems we're having with FITS, it may be worth doing a minimal rewrite of FITS (in, say, Python or C) with a focus on a) speed, and b) maintainability. This is more than a day's work but could get a start if this is something other people would be interested in. Things I'd want to see would include:<br />
** Don't vendor tools - just recommend versions, but draw from whatever tools the user has installed.<br />
** Implement better AV support (with all the caveats listed above)<br />
** Possibly restrict the number of tools?<br />
** +1 to FITS refactoring --[[User:Greg Jansen|Greg Jansen]] 11:33, 15 November 2012 (PST)<br />
** Implement only the configuration options most people use, and let those be specified on the commandline instead of via XML.<br />
*** [[User:Gary_McGath|Gary McGath]] on IRC points out that the use of external tools means that in FITS scanned files are independently loaded from disk by multiple tools, introducing unneeded IO overhead. Could be fixed in FITS itself.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
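A rough sketch of the structured-output idea from the ffprobe/avprobe thread above, assuming ffprobe is installed and on the PATH. The helper names (build_ffprobe_command, run_ffprobe, summarize) are made up for illustration and are not part of FITS or Tika; the flags are the -print_format option mentioned above plus -show_format and -show_streams.<br />

```python
import json
import subprocess


def build_ffprobe_command(path):
    """Return an ffprobe argument list that requests JSON output."""
    return [
        "ffprobe",
        "-v", "error",            # keep log noise out of the output
        "-print_format", "json",  # structured output, as discussed above
        "-show_format",           # container-level metadata
        "-show_streams",          # per-track (bitstream) metadata
        path,
    ]


def run_ffprobe(path):
    """Run ffprobe (assumed to be on PATH) and return the parsed JSON report."""
    result = subprocess.run(
        build_ffprobe_command(path),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarize(report):
    """Reduce a report to the container name plus (type, codec) per stream."""
    container = report.get("format", {}).get("format_name")
    streams = [
        (s.get("codec_type"), s.get("codec_name"))
        for s in report.get("streams", [])
    ]
    return {"container": container, "streams": streams}
```

Something like summarize(run_ffprobe("example.mov")) would give a wrapper tool both the container and the codec of each track, which is the track/bitstream information asked for in the thread. Whether wrapping ffprobe this way is efficient enough for FITS is still the open question raised above.<br />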
<br />
==Preparation TODO==<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Set up Archivematica instances to test FPR<br />
* Easier signature development tools and/or signature contribution tracking, now partially complete, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, c.f. [[Collecting format ID test files]]<br />
<br />
=Results=<br />
'''Nov 17. 07:30 UTC -- 30 hours later'''<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
Peter Van Garderen @pjvangarderen<br />
Proud to lead 24hr real time R&D cycle. Thanks #fileidhack people for your passion RT @jordanheit: testing OpenFITS wiki.curatecamp.org/index.php/FITS<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: Testing FITS led me to discover a bug in JHOVE, so #fileidhack is worth something. sourceforge.net/tracker/?func=…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @anjacks0n: Thanks for the files, @carusb github.com/openplanets/fo… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack live link? RT @benfinoradin: Archiving all #fileidhack tweets today pic.twitter.com/7QI1DfmD<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks BL! #fileidhack RT @mopennock: BL team are working on eBook format identification today for #fileidhack - @anjacks0n @petemay et al<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @peshkira: OpenFITS current status: It compiles! #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @petemay: Tika sigs for PDB, Kindle AZW and LRF files created, re-testing over sample file set #fileidhack #eBook<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @mopennock: We've added 7 new eBook signatures to Tika this morning #fileidhack. Great work all!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Everyone ping PRONOM pls! RT @Britpunk80: #fileidhack if you want to create/test/submit your own: …keddatapronom.nationalarchives.gov.uk/sigdev/index.h…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @Britpunk80: I've handed some droid sig files to @anjacks0n on rocketbook, epub, and ibooks. #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @HeatherBowden: @euanc @anjacks0n I have some Quark and InDesign files. You interested? #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Nov1614:00UTC RT @pjvangarderen: Wazzup! West Coast in da fileidhacking house! RT @declan: good morning #fileidhack!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @peshkira: #fileidhack Current status: FITS mavenized. PullRequest/Wiki \w explanation follow. /cc @GaryM03062<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: New commit of OpenFITS allows setting max no. of threads in fits.xml #fileidhack github.com/gmcgath/openfi…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Great work on OpenFITS! Lets keep this alive RT @GaryM03062: Calling a day for #fileidhack. Great working with everyone!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @Snarkivist<br />
#fileidhack team - just catching up on your work today - was internetless - look for summary in the morning<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @pjvangarderen: Nov1601:17UTC @euanc (Perth) #fileidhack IRC - Nov1704:16UTC @archivematica crew still hacking #24hrs+<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @benfinoradin: Good resource on RIFF/RIFX: johnloomis.org/cpe102/asgn/as…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Holy cow, the Quicktime motherload! RT @mistydemeo: Have some Quicktime videos, #fileidhack github.com/openplanets/fo…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: Another update for OpenFITS. Please read the wiki: wiki.curatecamp.org/index.php/FITS… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @mistydemeo: Created @MacHomebrew formula for fidget to make file ID signatures for #fileidhack github.com/mistydemeo/hom…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @pjvangarderen: @mistydemeo meet #openarchives /nick artefactualmtgroom #fileidhack pic.twitter.com/1Ffp1v6Y<br />
<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @archivematica: Artefactual picks up #fileidhack baton. OpenFits debian package launchpad.net/~archivematica… test time!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @anjacks0n: @benfinoradin tweaked your sig, now identifies all test files you sent github.com/openplanets/fo… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: As a side effect of #fileidhack, I've been uploading source changes to JHOVE. sourceforge.net/projects/jhove/<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Thanks GMT! RT @mopennock: It's all go this morning for the #fileidhack! wiki.curatecamp.org/index.php/CURA…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @WilliamKilbride: It's #dpc #ff follow friday. look at #fileidhack Better still, get involved wiki.curatecamp.org/index.php/CURA…<br />
<br />
<br />
</pre></div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2441CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-11-17T08:00:15Z<p>Courtney C. Mumma: /* Summary */</p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Summary=<br />
At the end of the day, we got A LOT done!<br />
<br />
Thanks to participants! @mopennock @WilliamKilbride @GaryM03062 @anjacks0n @carusb @benfinoradin @peshkira @petemay @Britpunk80 @HeatherBowden @pjvangarderen @jordanheit and everyone else (please add who is missing)<br />
<br />
* @GaryM03062 discovered a bug in JHOVE while testing FITS: https://sourceforge.net/tracker/?func=detail&aid=3587890&group_id=221311&atid=1052190 <br />
* File corpus! https://github.com/openplanets/format-corpus/commit/b0971e1c32b2df7a9bceafe1f00d81f49cb45990 <br />
* (@benfinoradin) Kept all #fileidhack tweets today pic.twitter.com/7QI1DfmD<br />
* (@mopennock, @anjacks0n, @petemay et al) British Library team worked on eBook format identification <br />
* (@peshkira) OpenFITS compiles<br />
* (@petemay) Tika signatures for PDB, Kindle AZW and LRF files created, re-testing over sample file set #fileidhack #eBook<br />
* (@mopennock) Added 7 new eBook signatures to Tika this morning <br />
* Encouraged pinging PRONOM (@Britpunk80) to create/test/submit: [http://test.linkeddatapronom.nationalarchives.gov.uk/sigdev/index.htm]<br />
* @Britpunk80 handed some droid signature files to @anjacks0n on rocketbook, epub, and ibooks.<br />
* @HeatherBowden shared some Quark and InDesign files. <br />
* @GaryM03062: New commit of OpenFITS allows setting max no. of threads in fits.xml [https://github.com/gmcgath/openfits]<br />
* @benfinoradin shared resource on RIFF/RIFX [http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html]<br />
* the Quicktime motherload! by @mistydemeo - Quicktime videos [https://github.com/openplanets/format-corpus/tree/master/video/Quicktime]<br />
* OpenFITS : [[FITS#Improving_JHOVE_performance_within_FITS]]<br />
* @mistydemeo: Created @machomebrew formula for fidget to make file ID signatures for #fileidhack [https://github.com/mistydemeo/homebrew-formulae]<br />
* #openarchives chat /nick artefactualmtgroom #fileidhack pic.twitter.com/1Ffp1v6Y<br />
* @anjacks0n: new [https://github.com/anjackson/percipio/downloads Percipio] and [https://github.com/openplanets/format-corpus/downloads Fidget] available for development and feedback <br />
* @pjvangarderen @archivematica: Artefactual picks up #fileidhack baton. OpenFits debian package for testing [https://launchpad.net/~archivematica/+archive/externals-dev/+build/3989642]<br />
* @GaryM03062 uploaded source changes to JHOVE. [https://sourceforge.net/projects/jhove/]<br />
* @jordanheit testing OpenFITS [[FITS]]<br />
<br />
=Background=<br />
One break-out session at the CURATEcamp iPRES 2012 was affectionately branded "file id confessional" where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases<br />
*We're all hooked on that FITS.xml but [[FITS]] needs performance optimization ASAP (also, is Harvard up for extra dev?)<br />
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How much of our file id requirements can it in fact cover?<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* also, CURATEcamp iPres participant Paul Wheatley has since posted: [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as link to [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to Twitter discussion between @pjvangarderen @anjacks0n @prwheatley about this 24 hr hackathon event.<br />
<br />
==What==<br />
<br />
24hour+ live hackathon event where multi-time zone teams work on common technical projects related to the CURATEcamp iPres 2012 file id discussions. <br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
<br />
==Why==<br />
* Because we'll probably get some useful stuff done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp way<br />
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Logistics=<br />
<br />
==When: '''Fri Nov 16'''==<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
==How==<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]], make it public and share the link<br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
**[[Collecting_format_ID_test_files|Format ID Test Files Project]]'s [[Collecting_format_ID_test_files#Via_Google_Drive|Google Drive]]<br />
* IRC: The chat room is on the irc.OFTC.net server, and the room name is #openarchives [irc://#openarchives@irc.OFTC.net|irc://#openarchives@irc.OFTC.net]<br />
** Chat room help and browser chat option: [https://www.archivematica.org/wiki/Chat_room https://www.archivematica.org/wiki/Chat_room]<br />
* GitHub: get those pull requests going<br />
** [[Collecting_format_ID_test_files|Format ID Test Files Project]]'s [[Collecting_format_ID_test_files#Via_Google_Drive|Git repo]]<br />
<br />
<br />
==Who ([[Special:UserLogin|Sign up]])==<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock), PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* '''GMT -5:00''' [[User:Greg Jansen|Greg Jansen]] @gregj, [[User:Ben Pennell|Ben Pennell]] @pennellben<br />
* '''GMT -5:00''' [[User:Heather Bowden|Heather Bowden]] @heatherbowden - will help when/where I can. Happy to help US East Coasters and Artefactual Team, or whomever. Contact me if you need an extra hand.<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
** Maybe incorporate [http://www.ace.net.nz/tech/TechFileFormat.html "Almost Every file format in the world!"]<br />
* [[Collecting format ID test files]]<br />
** [[Creating an artificial test set using emulation]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id /characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.<br />
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.<br />
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated in to a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?<br />
** @dericed - Yes ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
*** I'm willing to put a fork of FITS on Github if a couple of people say they want it. --[[User:Gary McGath|Gary McGath]] 13:27, 11 November 2012 (PST)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
* [[User:Misty De Meo|Misty De Meo]] Just a thought... it strikes me that the basic functionality of FITS is not super complicated. As well, in my experience, most users are using a fairly minimal set of features. Given some of the problems we're having with FITS, it may be worth doing a minimal rewrite of FITS (in, say, Python or C) with a focus on a) speed, and b) maintainability. This is more than a day's work but could get a start if this is something other people would be interested in. Things I'd want to see would include:<br />
** Don't vendor tools - just recommend versions, but draw from whatever tools the user has installed.<br />
** Implement better AV support (with all the caveats listed above)<br />
** Possibly restrict the number of tools?<br />
** +1 to FITS refactoring --[[User:Greg Jansen|Greg Jansen]] 11:33, 15 November 2012 (PST)<br />
** Implement only the configuration options most people use, and let those be specified on the commandline instead of via XML.<br />
*** [[User:Gary_McGath|Gary McGath]] on IRC points out that the use of external tools means that in FITS scanned files are independently loaded from disk by multiple tools, introducing unneeded IO overhead. Could be fixed in FITS itself.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
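For anyone new to the topic, here is a minimal sketch of the "sample x bytes from the head and tail" style of identification mentioned in the AV thread above. The three signatures are well-known magic numbers chosen only for illustration; real tools take theirs from PRONOM or the Tika MIME definitions, and the function names here are hypothetical.<br />

```python
import os

# Illustrative signatures only, not taken from PRONOM or Tika.
# Each entry: (label, bytes expected at the start, bytes expected in the tail or None).
SIGNATURES = [
    ("PDF", b"%PDF-", b"%%EOF"),
    ("PNG", b"\x89PNG\r\n\x1a\n", None),
    ("RIFF container", b"RIFF", None),
]


def sample_head_tail(path, size=512):
    """Read up to `size` bytes from the start and from the end of a file."""
    length = os.path.getsize(path)
    with open(path, "rb") as handle:
        head = handle.read(size)
        handle.seek(max(0, length - size))
        tail = handle.read(size)
    return head, tail


def identify(head, tail):
    """Return the first label whose head (and optional tail) bytes match, else None."""
    for label, head_magic, tail_magic in SIGNATURES:
        if not head.startswith(head_magic):
            continue
        if tail_magic is not None and tail_magic not in tail:
            continue
        return label
    return None
```

Note that a match on "RIFF" only identifies the container. It says nothing about the codecs inside, which is the limitation for AV collections raised in the thread above.<br />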
<br />
==Preparation TODO==<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Set up Archivematica instances to test FPR<br />
* Easier signature development tools and/or signature contribution tracking, now partially complete, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, c.f. [[Collecting format ID test files]]<br />
<br />
=Results=<br />
'''Nov 17. 07:30 UTC -- 30 hours later'''<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
Peter Van Garderen @pjvangarderen<br />
Proud to lead 24hr real time R&D cycle. Thanks #fileidhack people for your passion RT @jordanheit: testing OpenFITS wiki.curatecamp.org/index.php/FITS<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: Testing FITS led me to discover a bug in JHOVE, so #fileidhack is worth something. sourceforge.net/tracker/?func=…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @anjacks0n: Thanks for the files, @carusb github.com/openplanets/fo… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack live link? RT @benfinoradin: Archiving all #fileidhack tweets today pic.twitter.com/7QI1DfmD<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks BL! #fileidhack RT @mopennock: BL team are working on eBook format identification today for #fileidhack - @anjacks0n @petemay et al<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @peshkira: OpenFITS current status: It compiles! #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @petemay: Tika sigs for PDB, Kindle AZW and LRF files created, re-testing over sample file set #fileidhack #eBook<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @mopennock: We've added 7 new eBook signatures to Tika this morning #fileidhack. Great work all!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Everyone ping PRONOM pls! RT @Britpunk80: #fileidhack if you want to create/test/submit your own: …keddatapronom.nationalarchives.gov.uk/sigdev/index.h…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @Britpunk80: I've handed some droid sig files to @anjacks0n on rocketbook, epub, and ibooks. #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @HeatherBowden: @euanc @anjacks0n I have some Quark and InDesign files. You interested? #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Nov1614:00UTC RT @pjvangarderen: Wazzup! West Coast in da fileidhacking house! RT @declan: good morning #fileidhack!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @peshkira: #fileidhack Current status: FITS mavenized. PullRequest/Wiki \w explanation follow. /cc @GaryM03062<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: New commit of OpenFITS allows setting max no. of threads in fits.xml #fileidhack github.com/gmcgath/openfi…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Great work on OpenFITS! Lets keep this alive RT @GaryM03062: Calling a day for #fileidhack. Great working with everyone!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @Snarkivist<br />
#fileidhack team - just catching up on your work today - was internetless - look for summary in the morning<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @pjvangarderen: Nov1601:17UTC @euanc (Perth) #fileidhack IRC - Nov1704:16UTC @archivematica crew still hacking #24hrs+<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @benfinoradin: Good resource on RIFF/RIFX: johnloomis.org/cpe102/asgn/as…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Holy cow, the Quicktime motherload! RT @mistydemeo: Have some Quicktime videos, #fileidhack github.com/openplanets/fo…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: Another update for OpenFITS. Please read the wiki: wiki.curatecamp.org/index.php/FITS… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @mistydemeo: Created @MacHomebrew formula for fidget to make file ID signatures for #fileidhack github.com/mistydemeo/hom…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @pjvangarderen: @mistydemeo meet #openarchives /nick artefactualmtgroom #fileidhack pic.twitter.com/1Ffp1v6Y<br />
<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @archivematica: Artefactual picks up #fileidhack baton. OpenFits debian package launchpad.net/~archivematica… test time!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @anjacks0n: @benfinoradin tweaked your sig, now identifies all test files you sent github.com/openplanets/fo… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: As a side effect of #fileidhack, I've been uploading source changes to JHOVE. sourceforge.net/projects/jhove/<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Thanks GMT! RT @mopennock: It's all go this morning for the #fileidhack! wiki.curatecamp.org/index.php/CURA…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @WilliamKilbride: It's #dpc #ff follow friday. look at #fileidhack Better still, get involved wiki.curatecamp.org/index.php/CURA…<br />
<br />
<br />
</pre><br />
<br />
==Summary==</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2440CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-11-17T07:58:56Z<p>Courtney C. Mumma: /* Results */</p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Summary=<br />
<br />
Thanks to participants! @mopennock @WilliamKilbride @GaryM03062 @anjacks0n @carusb @benfinoradin @peshkira @petemay @Britpunk80 @HeatherBowden @pjvangarderen @jordanheit<br />
* @GaryM03062 discovered a bug in JHOVE while testing FITS: https://sourceforge.net/tracker/?func=detail&aid=3587890&group_id=221311&atid=1052190 <br />
* File corpus! https://github.com/openplanets/format-corpus/commit/b0971e1c32b2df7a9bceafe1f00d81f49cb45990 <br />
* (@benfinoradin) Kept all #fileidhack tweets today pic.twitter.com/7QI1DfmD<br />
* (@mopennock, @anjacks0n, @petemay et al) British Library team worked on eBook format identification <br />
* (@peshkira) OpenFITS compiles<br />
* (@petemay) Tika signatures for PDB, Kindle AZW and LRF files created, re-testing over sample file set #fileidhack #eBook<br />
* (@mopennock) Added 7 new eBook signatures to Tika this morning <br />
* Encouraged pinging PRONOM (@Britpunk80) to create/test/submit: [http://test.linkeddatapronom.nationalarchives.gov.uk/sigdev/index.htm]<br />
* @Britpunk80 handed some droid signature files to @anjacks0n on rocketbook, epub, and ibooks.<br />
* @HeatherBowden shared some Quark and InDesign files. <br />
* @GaryM03062: New commit of OpenFITS allows setting max no. of threads in fits.xml [https://github.com/gmcgath/openfits]<br />
* @benfinoradin shared resource on RIFF/RIFX [http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html]<br />
* the Quicktime motherload! by @mistydemeo - Quicktime videos [https://github.com/openplanets/format-corpus/tree/master/video/Quicktime]<br />
* OpenFITS : [[FITS#Improving_JHOVE_performance_within_FITS]]<br />
* @mistydemeo: Created @machomebrew formula for fidget to make file ID signatures for #fileidhack [https://github.com/mistydemeo/homebrew-formulae]<br />
* #openarchives chat /nick artefactualmtgroom #fileidhack pic.twitter.com/1Ffp1v6Y<br />
* @anjacks0n: new [https://github.com/anjackson/percipio/downloads Percipio] and [https://github.com/openplanets/format-corpus/downloads Fidget] available for development and feedback <br />
* @pjvangarderen @archivematica: Artefactual picks up #fileidhack baton. OpenFits debian package for testing [https://launchpad.net/~archivematica/+archive/externals-dev/+build/3989642]<br />
* @GaryM03062 uploaded source changes to JHOVE. [https://sourceforge.net/projects/jhove/]<br />
* @jordanheit testing OpenFITS [[FITS]]<br />
<br />
=Background=<br />
One break-out session at the CURATEcamp iPRES 2012 was affectionately branded "file id confessional" where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases<br />
*We're all hooked on that FITS.xml but [[FITS]] needs performance optimization ASAP (also, is Harvard up for extra dev?)<br />
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How much of our file id requirements can it in fact cover?<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* also, CURATEcamp iPres participant Paul Wheatley has since posted: [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as link to [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to Twitter discussion between @pjvangarderen @anjacks0n @prwheatley about this 24 hr hackathon event.<br />
<br />
==What==<br />
<br />
A 24-hour+ live hackathon in which teams across multiple time zones work on common technical projects related to the CURATEcamp iPres 2012 file id discussions.<br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
<br />
==Why==<br />
* Because we'll probably get some useful stuff done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp way<br />
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Logistics=<br />
<br />
==When: '''Fri Nov 16'''==<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
==How==<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]], make it public and share the link<br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
**[[Collecting_format_ID_test_files|Format ID Test Files Project]]'s [[Collecting_format_ID_test_files#Via_Google_Drive|Google Drive]]<br />
* IRC: The chat room is on the irc.OFTC.net server, and the room name is #openarchives [irc://#openarchives@irc.OFTC.net irc://#openarchives@irc.OFTC.net]<br />
** Chat room help and browser chat option: [https://www.archivematica.org/wiki/Chat_room https://www.archivematica.org/wiki/Chat_room]<br />
* GitHub: get those pull requests going<br />
** [[Collecting_format_ID_test_files|Format ID Test Files Project]]'s [[Collecting_format_ID_test_files#Via_Google_Drive|Git repo]]<br />
<br />
<br />
==Who ([[Special:UserLogin|Sign up]])==<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock), PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* '''GMT -5:00''' [[User:Greg Jansen|Greg Jansen]] @gregj, [[User:Ben Pennell|Ben Pennell]] @pennellben<br />
* '''GMT -5:00''' [[User:Heather Bowden|Heather Bowden]] @heatherbowden - will help when/where I can. Happy to help US East Coasters and Artefactual Team, or whomever. Contact me if you need an extra hand.<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
** Maybe incorporate [http://www.ace.net.nz/tech/TechFileFormat.html "Almost Every file format in the world!"]<br />
* [[Collecting format ID test files]]<br />
** [[Creating an artificial test set using emulation]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id / characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.<br />
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.<br />
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated into a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?<br />
** @dericed - Yes ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
*** I'm willing to put a fork of FITS on Github if a couple of people say they want it. --[[User:Gary McGath|Gary McGath]] 13:27, 11 November 2012 (PST)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
* [[User:Misty De Meo|Misty De Meo]] Just a thought... it strikes me that the basic functionality of FITS is not super complicated. As well, in my experience, most users are using a fairly minimal set of features. Given some of the problems we're having with FITS, it may be worth doing a minimal rewrite of FITS (in, say, Python or C) with a focus on a) speed, and b) maintainability. This is more than a day's work but could get a start if this is something other people would be interested in. Things I'd want to see would include:<br />
** Don't vendor tools - just recommend versions, but draw from whatever tools the user has installed.<br />
** Implement better AV support (with all the caveats listed above)<br />
** Possibly restrict the number of tools?<br />
** +1 to FITS refactoring --[[User:Greg Jansen|Greg Jansen]] 11:33, 15 November 2012 (PST)<br />
** Implement only the configuration options most people use, and let those be specified on the commandline instead of via XML.<br />
*** [[User:Gary_McGath|Gary McGath]] on IRC points out that the use of external tools means that, in FITS, scanned files are independently loaded from disk by multiple tools, introducing unneeded I/O overhead. This could be fixed in FITS itself.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
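As a rough sketch of the structured-output idea discussed by @kvanmalssen and @dericed above, the following Python wraps ffprobe's <code>-print_format json</code> option and reduces the result to the container name plus one codec entry per stream. The function names (<code>build_ffprobe_command</code>, <code>summarize</code>, <code>probe_file</code>) are made up for illustration, the code assumes ffprobe is on the PATH, and nothing here reflects how FITS or Tika actually integrate external tools.<br />

```python
import json
import subprocess


def build_ffprobe_command(path):
    """Return an ffprobe argument list that requests JSON output."""
    return [
        "ffprobe",
        "-v", "error",            # suppress the banner and non-error logging
        "-print_format", "json",  # short form: -of json
        "-show_format",           # container-level information
        "-show_streams",          # one entry per contained bitstream
        path,
    ]


def summarize(probe):
    """Reduce parsed ffprobe JSON to the container name and per-stream codecs."""
    container = probe.get("format", {}).get("format_name")
    streams = [
        (s.get("codec_type"), s.get("codec_name"))
        for s in probe.get("streams", [])
    ]
    return {"container": container, "streams": streams}


def probe_file(path):
    """Run ffprobe on path (requires ffprobe on PATH) and summarize the result."""
    result = subprocess.run(
        build_ffprobe_command(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize(json.loads(result.stdout))
```

On a machine with ffprobe installed, <code>probe_file("example.mov")</code> (a hypothetical file name) would return a small dictionary that a wrapper tool could map into its own output schema. Reporting each stream separately is what keeps the codec and bitstream information visible rather than only the container format.<br />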
<br />
==Preparation TODO==<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Set up Archivematica instances to test FPR<br />
* Easier signature development tools and/or signature contribution tracking, now partially complete, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, c.f. [[Collecting format ID test files]]<br />
<br />
=Results=<br />
'''Nov 17. 07:30 UTC -- 30 hours later'''<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
Peter Van Garderen @pjvangarderen<br />
Proud to lead 24hr real time R&D cycle. Thanks #fileidhack people for your passion RT @jordanheit: testing OpenFITS wiki.curatecamp.org/index.php/FITS<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: Testing FITS led me to discover a bug in JHOVE, so #fileidhack is worth something. sourceforge.net/tracker/?func=…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @anjacks0n: Thanks for the files, @carusb github.com/openplanets/fo… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack live link? RT @benfinoradin: Archiving all #fileidhack tweets today pic.twitter.com/7QI1DfmD<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks BL! #fileidhack RT @mopennock: BL team are working on eBook format identification today for #fileidhack - @anjacks0n @petemay et al<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @peshkira: OpenFITS current status: It compiles! #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @petemay: Tika sigs for PDB, Kindle AZW and LRF files created, re-testing over sample file set #fileidhack #eBook<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @mopennock: We've added 7 new eBook signatures to Tika this morning #fileidhack. Great work all!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Everyone ping PRONOM pls! RT @Britpunk80: #fileidhack if you want to create/test/submit your own: …keddatapronom.nationalarchives.gov.uk/sigdev/index.h…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @Britpunk80: I've handed some droid sig files to @anjacks0n on rocketbook, epub, and ibooks. #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @HeatherBowden: @euanc @anjacks0n I have some Quark and InDesign files. You interested? #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Nov1614:00UTC RT @pjvangarderen: Wazzup! West Coast in da fileidhacking house! RT @declan: good morning #fileidhack!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @peshkira: #fileidhack Current status: FITS mavenized. PullRequest/Wiki \w explanation follow. /cc @GaryM03062<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: New commit of OpenFITS allows setting max no. of threads in fits.xml #fileidhack github.com/gmcgath/openfi…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Great work on OpenFITS! Lets keep this alive RT @GaryM03062: Calling a day for #fileidhack. Great working with everyone!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @Snarkivist<br />
#fileidhack team - just catching up on your work today - was internetless - look for summary in the morning<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @pjvangarderen: Nov1601:17UTC @euanc (Perth) #fileidhack IRC - Nov1704:16UTC @archivematica crew still hacking #24hrs+<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @benfinoradin: Good resource on RIFF/RIFX: johnloomis.org/cpe102/asgn/as…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Holy cow, the Quicktime motherload! RT @mistydemeo: Have some Quicktime videos, #fileidhack github.com/openplanets/fo…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: Another update for OpenFITS. Please read the wiki: wiki.curatecamp.org/index.php/FITS… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @mistydemeo: Created @MacHomebrew formula for fidget to make file ID signatures for #fileidhack github.com/mistydemeo/hom…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @pjvangarderen: @mistydemeo meet #openarchives /nick artefactualmtgroom #fileidhack pic.twitter.com/1Ffp1v6Y<br />
<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @archivematica: Artefactual picks up #fileidhack baton. OpenFits debian package launchpad.net/~archivematica… test time!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @anjacks0n: @benfinoradin tweaked your sig, now identifies all test files you sent github.com/openplanets/fo… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: As a side effect of #fileidhack, I've been uploading source changes to JHOVE. sourceforge.net/projects/jhove/<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Thanks GMT! RT @mopennock: It's all go this morning for the #fileidhack! wiki.curatecamp.org/index.php/CURA…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @WilliamKilbride: It's #dpc #ff follow friday. look at #fileidhack Better still, get involved wiki.curatecamp.org/index.php/CURA…<br />
<br />
<br />
</pre><br />
<br />
==Summary==</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2439CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-11-17T07:55:19Z<p>Courtney C. Mumma: </p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Results=<br />
<br />
Thanks to participants! @mopennock @WilliamKilbride @GaryM03062 @anjacks0n @carusb @benfinoradin @peshkira @petemay @Britpunk80 @HeatherBowden<br />
-- @GaryM03062 discovered a bug in JHOVE while testing FITS: https://sourceforge.net/tracker/?func=detail&aid=3587890&group_id=221311&atid=1052190<br />
-- File corpus! https://github.com/openplanets/format-corpus/commit/b0971e1c32b2df7a9bceafe1f00d81f49cb45990 <br />
-- (@benfinoradin) Kept all #fileidhack tweets today pic.twitter.com/7QI1DfmD<br />
-- (@mopennock, @anjacks0n, @petemay et al) British Library team worked on eBook format identification <br />
-- (@peshkira) OpenFITS compiles<br />
-- (@petemay) Tika signatures for PDB, Kindle AZW and LRF files created, re-testing over sample file set #fileidhack #eBook<br />
-- (@mopennock) Added 7 new eBook signatures to Tika this morning <br />
-- Encouraged pinging PRONOM (@Britpunk80) to create/test/submit: [http://test.linkeddatapronom.nationalarchives.gov.uk/sigdev/index.htm]<br />
-- @Britpunk80 handed some droid signature files to @anjacks0n on rocketbook, epub, and ibooks.<br />
-- @HeatherBowden shared some Quark and InDesign files.<br />
-- @GaryM03062: New commit of OpenFITS allows setting max no. of threads in fits.xml [https://github.com/gmcgath/openfits]<br />
-- @benfinoradin shared resource on RIFF/RIFX [http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html]<br />
-- the Quicktime motherload! by @mistydemeo - Quicktime videos [https://github.com/openplanets/format-corpus/tree/master/video/Quicktime]<br />
-- OpenFITS : [[FITS#Improving_JHOVE_performance_within_FITS]]<br />
-- @mistydemeo: Created @machomebrew formula for fidget to make file ID signatures for #fileidhack [https://github.com/mistydemeo/homebrew-formulae]<br />
-- #openarchives chat /nick artefactualmtgroom #fileidhack pic.twitter.com/1Ffp1v6Y<br />
-- @anjacks0n: new [https://github.com/anjackson/percipio/downloads Percipio] and [https://github.com/openplanets/format-corpus/downloads Fidget] available for development and feedback<br />
-- @pjvangarderen @archivematica: Artefactual picks up #fileidhack baton. OpenFits debian package for testing [https://launchpad.net/~archivematica/+archive/externals-dev/+build/3989642]<br />
-- @GaryM03062 uploaded source changes to JHOVE. [https://sourceforge.net/projects/jhove/]<br />
-- @jordanheit testing OpenFITS [[FITS]]<br />
<br />
=Background=<br />
One break-out session at the CURATEcamp iPRES 2012 was affectionately branded "file id confessional" where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases<br />
*We're all hooked on that FITS.xml but [[FITS]] needs performance optimization ASAP (also, is Harvard up for extra dev?)<br />
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How many of our file id requirements can it in fact cover?<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* Also, CURATEcamp iPres participant Paul Wheatley has since posted [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as a link to an [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to a Twitter discussion between @pjvangarderen, @anjacks0n and @prwheatley about this 24 hr hackathon event.<br />
<br />
==What==<br />
<br />
A 24-hour+ live hackathon in which teams across multiple time zones work on common technical projects related to the CURATEcamp iPres 2012 file id discussions.<br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
<br />
==Why==<br />
* Because we'll probably get some useful stuff done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp way<br />
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Logistics=<br />
<br />
==When: '''Fri Nov 16'''==<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
==How==<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]], make it public and share the link<br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
**[[Collecting_format_ID_test_files|Format ID Test Files Project]]'s [[Collecting_format_ID_test_files#Via_Google_Drive|Google Drive]]<br />
* IRC: The chat room is on the irc.OFTC.net server, and the room name is #openarchives [irc://#openarchives@irc.OFTC.net irc://#openarchives@irc.OFTC.net]<br />
** Chat room help and browser chat option: [https://www.archivematica.org/wiki/Chat_room https://www.archivematica.org/wiki/Chat_room]<br />
* GitHub: get those pull requests going<br />
** [[Collecting_format_ID_test_files|Format ID Test Files Project]]'s [[Collecting_format_ID_test_files#Via_Google_Drive|Git repo]]<br />
<br />
<br />
==Who ([[Special:UserLogin|Sign up]])==<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock), PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* '''GMT -5:00''' [[User:Greg Jansen|Greg Jansen]] @gregj, [[User:Ben Pennell|Ben Pennell]] @pennellben<br />
* '''GMT -5:00''' [[User:Heather Bowden|Heather Bowden]] @heatherbowden - will help when/where I can. Happy to help US East Coasters and Artefactual Team, or whomever. Contact me if you need an extra hand.<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
** Maybe incorporate [http://www.ace.net.nz/tech/TechFileFormat.html "Almost Every file format in the world!"]<br />
* [[Collecting format ID test files]]<br />
** [[Creating an artificial test set using emulation]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id / characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.<br />
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.<br />
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated into a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?<br />
** @dericed - Yes ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
*** I'm willing to put a fork of FITS on Github if a couple of people say they want it. --[[User:Gary McGath|Gary McGath]] 13:27, 11 November 2012 (PST)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
* [[User:Misty De Meo|Misty De Meo]] Just a thought... it strikes me that the basic functionality of FITS is not super complicated. As well, in my experience, most users are using a fairly minimal set of features. Given some of the problems we're having with FITS, it may be worth doing a minimal rewrite of FITS (in, say, Python or C) with a focus on a) speed, and b) maintainability. This is more than a day's work but could get a start if this is something other people would be interested in. Things I'd want to see would include:<br />
** Don't vendor tools - just recommend versions, but draw from whatever tools the user has installed.<br />
** Implement better AV support (with all the caveats listed above)<br />
** Possibly restrict the number of tools?<br />
** +1 to FITS refactoring --[[User:Greg Jansen|Greg Jansen]] 11:33, 15 November 2012 (PST)<br />
** Implement only the configuration options most people use, and let those be specified on the commandline instead of via XML.<br />
*** [[User:Gary_McGath|Gary McGath]] on IRC points out that the use of external tools means that, in FITS, scanned files are independently loaded from disk by multiple tools, introducing unneeded I/O overhead. This could be fixed in FITS itself.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
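To make the "sampling x bytes from the head and tail" approach mentioned by @dericed a little more concrete, here is a minimal Python sketch. The signature table is a tiny hand-written illustration (RIFF, RIFX and PDF magic bytes), not PRONOM or Tika data, and the function names are hypothetical rather than taken from any existing tool.<br />

```python
import io

# Illustrative signature table: (offset, magic bytes, label).
# A real tool would load these from a registry instead of hard-coding them.
SIGNATURES = [
    (0, b"RIFF", "RIFF (little-endian)"),
    (0, b"RIFX", "RIFX (big-endian)"),
    (0, b"%PDF-", "PDF"),
]


def read_head_tail(stream, size=16):
    """Read up to `size` bytes from the start and from the end of a seekable stream."""
    stream.seek(0, io.SEEK_END)
    length = stream.tell()
    stream.seek(0)
    head = stream.read(size)
    stream.seek(max(length - size, 0))
    tail = stream.read(size)
    return head, tail


def identify(head):
    """Return the label of the first signature matching the head bytes, or None."""
    for offset, magic, label in SIGNATURES:
        if head[offset:offset + len(magic)] == magic:
            return label
    return None


def riff_form_type(head):
    """For RIFF/RIFX data, return the 4-byte form type (e.g. b'WAVE'), else None."""
    if head[:4] in (b"RIFF", b"RIFX") and len(head) >= 12:
        return head[8:12]
    return None
```

A match of this kind only identifies the container. It says nothing about the codecs or bitstreams inside, which is the limitation for AV collections raised in the discussion above.<br />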
<br />
==Preparation TODO==<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Set up Archivematica instances to test FPR<br />
* Easier signature development tools and/or signature contribution tracking, now partially complete, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, c.f. [[Collecting format ID test files]]<br />
<br />
=Results=<br />
'''Nov 17. 07:30 UTC -- 30 hours later'''<br />
<pre style="white-space: pre-wrap; <br />
white-space: -moz-pre-wrap;<br />
white-space: -pre-wrap;<br />
white-space: -o-pre-wrap; <br />
word-wrap: break-word"><br />
<br />
Peter Van Garderen @pjvangarderen<br />
Proud to lead 24hr real time R&D cycle. Thanks #fileidhack people for your passion RT @jordanheit: testing OpenFITS wiki.curatecamp.org/index.php/FITS<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: Testing FITS led me to discover a bug in JHOVE, so #fileidhack is worth something. sourceforge.net/tracker/?func=…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @anjacks0n: Thanks for the files, @carusb github.com/openplanets/fo… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack live link? RT @benfinoradin: Archiving all #fileidhack tweets today pic.twitter.com/7QI1DfmD<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks BL! #fileidhack RT @mopennock: BL team are working on eBook format identification today for #fileidhack - @anjacks0n @petemay et al<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @peshkira: OpenFITS current status: It compiles! #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @petemay: Tika sigs for PDB, Kindle AZW and LRF files created, re-testing over sample file set #fileidhack #eBook<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @mopennock: We've added 7 new eBook signatures to Tika this morning #fileidhack. Great work all!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Everyone ping PRONOM pls! RT @Britpunk80: #fileidhack if you want to create/test/submit your own: …keddatapronom.nationalarchives.gov.uk/sigdev/index.h…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @Britpunk80: I've handed some droid sig files to @anjacks0n on rocketbook, epub, and ibooks. #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @HeatherBowden: @euanc @anjacks0n I have some Quark and InDesign files. You interested? #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Nov1614:00UTC RT @pjvangarderen: Wazzup! West Coast in da fileidhacking house! RT @declan: good morning #fileidhack!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @peshkira: #fileidhack Current status: FITS mavenized. PullRequest/Wiki \w explanation follow. /cc @GaryM03062<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: New commit of OpenFITS allows setting max no. of threads in fits.xml #fileidhack github.com/gmcgath/openfi…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Great work on OpenFITS! Lets keep this alive RT @GaryM03062: Calling a day for #fileidhack. Great working with everyone!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @Snarkivist<br />
#fileidhack team - just catching up on your work today - was internetless - look for summary in the morning<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @pjvangarderen: Nov1601:17UTC @euanc (Perth) #fileidhack IRC - Nov1704:16UTC @archivematica crew still hacking #24hrs+<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @benfinoradin: Good resource on RIFF/RIFX: johnloomis.org/cpe102/asgn/as…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Holy cow, the Quicktime motherload! RT @mistydemeo: Have some Quicktime videos, #fileidhack github.com/openplanets/fo…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: Another update for OpenFITS. Please read the wiki: wiki.curatecamp.org/index.php/FITS… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @mistydemeo: Created @MacHomebrew formula for fidget to make file ID signatures for #fileidhack github.com/mistydemeo/hom…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @pjvangarderen: @mistydemeo meet #openarchives /nick artefactualmtgroom #fileidhack pic.twitter.com/1Ffp1v6Y<br />
<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @archivematica: Artefactual picks up #fileidhack baton. OpenFits debian package launchpad.net/~archivematica… test time!<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @anjacks0n: @benfinoradin tweaked your sig, now identifies all test files you sent github.com/openplanets/fo… #fileidhack<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @GaryM03062: As a side effect of #fileidhack, I've been uploading source changes to JHOVE. sourceforge.net/projects/jhove/<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack Thanks GMT! RT @mopennock: It's all go this morning for the #fileidhack! wiki.curatecamp.org/index.php/CURA…<br />
<br />
<br />
Peter Van Garderen @pjvangarderen<br />
Thanks #fileidhack RT @WilliamKilbride: It's #dpc #ff follow friday. look at #fileidhack Better still, get involved wiki.curatecamp.org/index.php/CURA…<br />
<br />
<br />
</pre><br />
<br />
==Summary==</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2333CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-11-12T21:26:42Z<p>Courtney C. Mumma: /* How */</p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Background=<br />
One break-out session at the CURATEcamp iPRES 2012 was affectionately branded "file id confessional" where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases
*We're all hooked on that FITS.xml but FITS needs performance optimization ASAP (also, is Harvard up for extra dev?)
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How much of our file id requirements can it in fact cover?
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* Also, CURATEcamp iPres participant Paul Wheatley has since posted [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as a link to the [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to a Twitter discussion between @pjvangarderen, @anjacks0n, and @prwheatley about this 24 hr hackathon event.<br />
<br />
=What=<br />
<br />
A 24-hour+ live hackathon event where teams in multiple time zones work on common technical projects related to the CURATEcamp iPres 2012 file id discussions.<br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
<br />
=When: '''Fri Nov 16'''=<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
=How=<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]] <br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
* IRC: the chat room is #openarchives on the irc.OFTC.net server: [irc://irc.oftc.net/#openarchives irc://irc.oftc.net/#openarchives]
* GitHub: get those pull requests going<br />
<br />
=Why=<br />
* Because we'll probably get some useful stuff done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp way
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Who ([[Special:UserLogin|Sign up]])=<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock) PeteC, PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
** Maybe incorporate [http://www.ace.net.nz/tech/TechFileFormat.html "Almost Every file format in the world!"]<br />
* [[Collecting format ID test files]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id/characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated into a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?
** @dericed - Yes ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
*** I'm willing to put a fork of FITS on Github if a couple of people say they want it. --[[User:Gary McGath|Gary McGath]] 13:27, 11 November 2012 (PST)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
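The ffprobe/avprobe thread above notes that the -print_format (-of) option can emit json. As a rough sketch only, here is how a wrapper might request that output and reduce it to a container name plus per-stream codecs, which is the track/bitstream detail asked for when processing mixed collections. The function names and the sample JSON are hypothetical illustrations, not part of FITS, Tika, or ffprobe itself, and the flags shown are ffprobe's; avprobe's options may differ.

```python
import json
import subprocess


def ffprobe_command(path):
    """Build an ffprobe call that emits container and stream metadata as JSON."""
    return ["ffprobe", "-v", "error", "-print_format", "json",
            "-show_format", "-show_streams", path]


def summarize(probe):
    """Reduce parsed ffprobe JSON to the container name plus one entry per stream."""
    container = probe.get("format", {}).get("format_name", "unknown")
    streams = [(s.get("codec_type", "unknown"), s.get("codec_name", "unknown"))
               for s in probe.get("streams", [])]
    return {"container": container, "streams": streams}


def probe_file(path):
    """Run ffprobe on a file (assumes ffprobe is on PATH) and summarize the result."""
    result = subprocess.run(ffprobe_command(path), capture_output=True,
                            text=True, check=True)
    return summarize(json.loads(result.stdout))


# Hand-written sample shaped like ffprobe's JSON output (illustrative values only).
SAMPLE = """{"streams": [{"codec_type": "video", "codec_name": "h264"},
 {"codec_type": "audio", "codec_name": "aac"}],
 "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2"}}"""

if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```

Keeping the parsing step separate from the subprocess call means the summary logic can be tried against saved output without ffprobe installed, and a FITS- or Tika-style wrapper could map the resulting fields onto its own schema.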
<br />
=Preparation TODO=<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Set up Archivematica instances to test FPR<br />
* Easier signature development tools and/or signature contribution tracking, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, c.f. [[Collecting format ID test files]]<br />
* Prep Archivematica dev VMs (incl Tika checkout), spin up & grant IPs/SSH to Hackfest participants upon request (Artefactual: [http://artefactual.com/austin-trask.html Austin])</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2332CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-11-12T21:26:05Z<p>Courtney C. Mumma: /* How */</p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Background=<br />
One break-out session at the CURATEcamp iPRES 2012 was affectionately branded "file id confessional" where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases
*We're all hooked on that FITS.xml but FITS needs performance optimization ASAP (also, is Harvard up for extra dev?)
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How much of our file id requirements can it in fact cover?
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* Also, CURATEcamp iPres participant Paul Wheatley has since posted [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as a link to the [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to a Twitter discussion between @pjvangarderen, @anjacks0n, and @prwheatley about this 24 hr hackathon event.<br />
<br />
=What=<br />
<br />
A 24-hour+ live hackathon event where teams in multiple time zones work on common technical projects related to the CURATEcamp iPres 2012 file id discussions.<br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
<br />
=When: '''Fri Nov 16'''=<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
=How=<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]] <br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
* IRC: the chat room is #openarchives on the irc.OFTC.net server: [irc://irc.oftc.net/#openarchives irc://irc.oftc.net/#openarchives]
* GitHub: get those pull requests going<br />
<br />
=Why=<br />
* Because we'll probably get some useful stuff done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp way
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Who ([[Special:UserLogin|Sign up]])=<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock) PeteC, PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
** Maybe incorporate [http://www.ace.net.nz/tech/TechFileFormat.html "Almost Every file format in the world!"]<br />
* [[Collecting format ID test files]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id/characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated into a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?
** @dericed - Yes ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
*** I'm willing to put a fork of FITS on Github if a couple of people say they want it. --[[User:Gary McGath|Gary McGath]] 13:27, 11 November 2012 (PST)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
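For contrast with the demuxing approach, @dericed's remark that many digipres tools identify files by sampling bytes from the head can be illustrated with a RIFF/RIFX header check. This is a minimal sketch assuming the standard 12-byte layout (four-byte magic, four-byte chunk size, four-byte form type); the function name is hypothetical, and it is not how DROID, FIDO, or Tika are actually implemented.

```python
import struct


def identify_riff(header):
    """Return RIFF/RIFX header details from the first 12 bytes, or None if no match."""
    if len(header) < 12:
        return None
    magic = header[0:4]
    if magic == b"RIFF":
        # RIFF stores its integers little-endian.
        order, size_fmt = "little-endian", "<I"
    elif magic == b"RIFX":
        # RIFX is the big-endian variant of the same layout.
        order, size_fmt = "big-endian", ">I"
    else:
        return None
    (chunk_size,) = struct.unpack(size_fmt, header[4:8])
    form_type = header[8:12].decode("ascii", errors="replace")
    return {"byte_order": order, "chunk_size": chunk_size, "form_type": form_type}


if __name__ == "__main__":
    # Synthetic header bytes, not taken from a real file.
    print(identify_riff(b"RIFF" + struct.pack("<I", 36) + b"WAVE"))
```

A check like this identifies only the container (for example WAVE or AVI as the form type); it says nothing about the codecs inside, which is the gap the ffprobe/avprobe discussion points at.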
<br />
=Preparation TODO=<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Set up Archivematica instances to test FPR<br />
* Easier signature development tools and/or signature contribution tracking, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, c.f. [[Collecting format ID test files]]<br />
* Prep Archivematica dev VMs (incl Tika checkout), spin up & grant IPs/SSH to Hackfest participants upon request (Artefactual: [http://artefactual.com/austin-trask.html Austin])</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2328CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-11-07T20:28:27Z<p>Courtney C. Mumma: /* Why */</p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Background=<br />
One break-out session at the CURATEcamp iPRES 2012 was affectionately branded "file id confessional" where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases
*We're all hooked on that FITS.xml but FITS needs performance optimization ASAP (also, is Harvard up for extra dev?)
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How much of our file id requirements can it in fact cover?
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* Also, CURATEcamp iPres participant Paul Wheatley has since posted [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as a link to the [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to a Twitter discussion between @pjvangarderen, @anjacks0n, and @prwheatley about this 24 hr hackathon event.<br />
<br />
=What=<br />
<br />
A 24-hour+ live hackathon event where teams in multiple time zones work on common technical projects related to the CURATEcamp iPres 2012 file id discussions.<br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
<br />
=When: '''Fri Nov 16'''=<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
=How=<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]] <br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
* IRC: use existing channel or create one just for event?<br />
* GitHub: get those pull requests going<br />
<br />
=Why=<br />
* Because we'll probably get some useful stuff done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp way
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Who ([[Special:UserLogin|Sign up]])=<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock) PeteC, PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
** Maybe incorporate [http://www.ace.net.nz/tech/TechFileFormat.html "Almost Every file format in the world!"]<br />
* [[Collecting format ID test files]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id/characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated into a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?
** @dericed - Yes ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
<br />
=Preparation TODO=<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Set up Archivematica instances to test FPR<br />
* Easier signature development tools and/or signature contribution tracking, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, c.f. [[Collecting format ID test files]]<br />
* Prep Archivematica dev VMs (incl Tika checkout), spin up & grant IPs/SSH to Hackfest participants upon request (Artefactual: [http://artefactual.com/austin-trask.html Austin])</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2324CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-11-06T23:28:28Z<p>Courtney C. Mumma: /* Preparation TODO */</p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Background=<br />
One break-out session at the CURATEcamp iPRES 2012 was affectionately branded "file id confessional" where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases
*We're all hooked on that FITS.xml but FITS needs performance optimization ASAP (also, is Harvard up for extra dev?)
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How much of our file id requirements can it in fact cover?
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* Also, CURATEcamp iPres participant Paul Wheatley has since posted: [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as a link to [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to a Twitter discussion between @pjvangarderen, @anjacks0n and @prwheatley about this 24 hr hackathon event.<br />
<br />
=What=<br />
<br />
A 24-hour+ live hackathon event where teams in multiple time zones work on common technical projects related to the CURATEcamp iPres 2012 file id discussions.<br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
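As a rough check on the "24hour+" label, the span described above can be worked out from the two UTC offsets. This is only a sketch: it assumes the event runs from 00:00 on Nov 16 at GMT +12:00 to midnight at the end of Nov 16 at GMT -8:00, and the function name <code>event_span_hours</code> is made up for illustration.<br />

```python
from datetime import date, datetime, timedelta, timezone


def event_span_hours(day, start_offset_hours, end_offset_hours):
    """Hours from 00:00 on `day` in the first zone to 24:00 on `day` in the last zone."""
    start_zone = timezone(timedelta(hours=start_offset_hours))
    end_zone = timezone(timedelta(hours=end_offset_hours))
    # Assumption: the first team starts at local midnight.
    start = datetime(day.year, day.month, day.day, tzinfo=start_zone)
    # Assumption: the last team wraps up at local midnight ending the same calendar day.
    end = datetime(day.year, day.month, day.day, tzinfo=end_zone) + timedelta(days=1)
    return (end - start) / timedelta(hours=1)


span = event_span_hours(date(2012, 11, 16), 12, -8)
print(span)
```

Under those assumptions the window comes to 44 hours, so there is plenty of overlap for hand-offs between time zones.<br />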
<br />
=When: '''Fri Nov 16'''=<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
=How=<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]] <br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
* IRC: use existing channel or create one just for event?<br />
* GitHub: get those pull requests going<br />
<br />
=Why=<br />
* Because we'll probably get some useful shit done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp type of way<br />
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Who ([[Special:UserLogin|Sign up]])=<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock) PeteC, PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
** Maybe incorporate [http://www.ace.net.nz/tech/TechFileFormat.html "Almost Every file format in the world!"]<br />
* [[Collecting format ID test files]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id / characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.<br />
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.<br />
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated into a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?<br />
** @dericed - Yes, ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
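The ffprobe/avprobe exchange above mentions structured output via <code>-print_format</code>. A minimal sketch of what a wrapper might look like follows. The helper names <code>ffprobe_command</code> and <code>summarize</code> and the sample report are made up for illustration; only the ffprobe options and the <code>format</code> / <code>streams</code> keys of its JSON report are taken from ffprobe itself. It keeps the container and the per-stream codecs apart, which is the distinction raised for AV collections.<br />

```python
import json


def ffprobe_command(path):
    """Build an ffprobe call that reports container and stream info as JSON."""
    return [
        "ffprobe",
        "-v", "error",            # keep log chatter out of the way
        "-print_format", "json",  # structured output, as discussed above
        "-show_format",           # container-level information
        "-show_streams",          # one entry per contained bitstream
        path,
    ]


def summarize(ffprobe_json):
    """Reduce an ffprobe JSON report to the container plus per-stream codecs."""
    report = json.loads(ffprobe_json)
    return {
        "container": report.get("format", {}).get("format_name"),
        "streams": [
            (stream.get("codec_type"), stream.get("codec_name"))
            for stream in report.get("streams", [])
        ],
    }


# Hand-written sample in the shape of an ffprobe report (not real tool output).
SAMPLE = json.dumps({
    "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2"},
    "streams": [
        {"codec_type": "video", "codec_name": "h264"},
        {"codec_type": "audio", "codec_name": "aac"},
    ],
})

print(ffprobe_command("example.mov"))
print(summarize(SAMPLE))
```

With ffprobe installed, the command list could be passed to <code>subprocess.run(..., capture_output=True, text=True)</code> and its standard output handed to <code>summarize</code>. Whether this belongs inside FITS or Tika is the open question from the discussion above.<br />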
<br />
=Preparation TODO=<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Set up Archivematica instances to test FPR<br />
* Easier signature development tools and/or signature contribution tracking, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, cf. [[Collecting format ID test files]]<br />
* Prep Archivematica dev VMs (incl Tika checkout), spin up & grant IPs/SSH to Hackfest participants upon request (Artefactual: [http://artefactual.com/austin-trask.html Austin])</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2323CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-11-06T23:27:06Z<p>Courtney C. Mumma: /* Project Proposals */</p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Background=<br />
One break-out session at CURATEcamp iPRES 2012 was affectionately branded the "file id confessional", where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases<br />
*We're all hooked on that FITS.xml but FITS needs performance optimization ASAP (also, is Harvard up for extra dev?)<br />
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How much of our file id requirements can it in fact cover?<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* Also, CURATEcamp iPres participant Paul Wheatley has since posted: [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as a link to [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to a Twitter discussion between @pjvangarderen, @anjacks0n and @prwheatley about this 24 hr hackathon event.<br />
<br />
=What=<br />
<br />
A 24-hour+ live hackathon event where teams in multiple time zones work on common technical projects related to the CURATEcamp iPres 2012 file id discussions.<br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
<br />
=When: '''Fri Nov 16'''=<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
=How=<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]] <br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
* IRC: use existing channel or create one just for event?<br />
* GitHub: get those pull requests going<br />
<br />
=Why=<br />
* Because we'll probably get some useful shit done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp type of way<br />
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Who ([[Special:UserLogin|Sign up]])=<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock) PeteC, PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
** Maybe incorporate [http://www.ace.net.nz/tech/TechFileFormat.html "Almost Every file format in the world!"]<br />
* [[Collecting format ID test files]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id / characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.<br />
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.<br />
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated into a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?<br />
** @dericed - Yes, ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
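The proposal above to limit identification results by time period can be sketched roughly as follows. Everything here is hypothetical: the format names, the year ranges and the function names <code>formats_available</code> and <code>narrow</code> are placeholders, since the real data would have to come from documented software input and output formats.<br />

```python
# Hypothetical era table: (format name, first year it could be written,
# last year it could be written, or None if still writable).
FORMAT_ERAS = [
    ("format-A", 1985, 1995),
    ("format-B", 1990, None),
    ("format-C", 2001, None),
]


def formats_available(eras, year):
    """Names of formats that some software could have written in `year`."""
    return [
        name
        for name, first, last in eras
        if first <= year and (last is None or year <= last)
    ]


def narrow(candidates, eras, year):
    """Keep only the candidate identifications that were creatable in `year`."""
    allowed = set(formats_available(eras, year))
    return [candidate for candidate in candidates if candidate in allowed]


# An ambiguous identification result, narrowed by the file's creation year.
ambiguous = ["format-A", "format-B", "format-C"]
print(narrow(ambiguous, FORMAT_ERAS, 1992))
```

The filter is only as trustworthy as the creation date it is given, so in practice it would probably rank or flag candidates instead of discarding them outright.<br />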
<br />
=Preparation TODO=<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Easier signature development tools and/or signature contribution tracking, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, cf. [[Collecting format ID test files]]<br />
* Prep Archivematica dev VMs (incl Tika checkout), spin up & grant IPs/SSH to Hackfest participants upon request (Artefactual: [http://artefactual.com/austin-trask.html Austin])</div>Courtney C. Mummahttps://wiki.curatecamp.org/index.php?title=CURATEcamp_24_hour_worldwide_file_id_hackathon_Nov_16_2012&diff=2161CURATEcamp 24 hour worldwide file id hackathon Nov 16 20122012-10-29T17:18:38Z<p>Courtney C. Mumma: /* Background */</p>
<hr />
<div>[[Main Page]] > CURATEcamp iPRES 2012 > CURATEcamp and Open Planets Foundation 24 hour file id hackathon Nov 16 2012<br />
<br />
=Background=<br />
One break-out session at CURATEcamp iPRES 2012 was affectionately branded the "file id confessional", where we commiserated on the state of our file id tools and processes. We also talked about:<br />
<br />
*We can do a better job specifying and documenting our file id requirements / use cases<br />
*We're all hooked on that FITS.xml but FITS needs performance optimization ASAP (also, is Harvard up for extra dev?)<br />
*Apache Tika is a very actively supported and useful tool for file id and content extraction. How much of our file id requirements can it in fact cover?<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] use case (see also [http://actionplan.fcla.edu/ DAITSS action plans])<br />
* Jason Scott's "[http://ascii.textfiles.com/archives/3645 Let's Just Solve the Problem]" campaign to boldly catalog as much file format info as possible in the month of November.<br />
* Also, CURATEcamp iPres participant Paul Wheatley has since posted: [http://www.openplanetsfoundation.org/blogs/2012-10-19-practitioners-have-spoken-we-need-better-characterisation We Need Better Characterization] as well as a link to [http://willsworld.blogs.edina.ac.uk/2012/10/18/online-hack-event/ Online Hack Event]. This led to a Twitter discussion between @pjvangarderen, @anjacks0n and @prwheatley about this 24 hr hackathon event.<br />
<br />
=What=<br />
<br />
A 24-hour+ live hackathon event where teams in multiple time zones work on common technical projects related to the CURATEcamp iPres 2012 file id discussions.<br />
<br />
Project proposals can be made by anyone.<br />
<br />
We will start the day with New Zealand (GMT +12:00) and end with North America West Coast wrapping up project(s), hopefully with one or two solid deliverables by 12 midnight-ish PST (GMT -8:00).<br />
<br />
=When: '''Fri Nov 16'''=<br />
<br />
* Friday, November 16, 2012<br />
** [http://wiki.opf-labs.org/display/KB/2012-11-13+OPF+Hackathon+-+Emulation%2C+learn+from+the+experts OPF Emulation Hackathon] is Nov 13-15. Freiburg, Germany. Sorry, Nov 16th was chosen somewhat haphazardly. We didn't mean to compete with OPF Hackathon event. But emulation needs file characterization too? Maybe OPF Emulation Hackathon can hand off some "File Id for Emulation" use cases to the Nov 16 24 hr Hackathon...or better yet, extend the Freiburg event to include participation in the Nov 16 24 hr worldwide #fileidhack event. Great way to cap off their Hackathon week! --[[User:PeterVG|PeterVG]] 11:48, 23 Oct 2012 (PDT)<br />
* <strike>Friday, November 23, 2012</strike><br />
** RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation<br />
<br />
=How=<br />
* Twitter: [https://twitter.com/search/realtime?q=%23fileidhack #fileidhack] (made it shorter)<br />
* CURATEcamp Mediawiki: [[Special:UserLogin|Log-in]] and please help update this page<br />
<br />
Let's put together a schedule, tasklist, & volunteers to road-test these tools for Nov 16:<br />
* Google Hangout: [[Google Hangout for CURATEcamp|fire up a webcam]] <br />
* GoogleDocs: we can live edit any docs we feel the urge to produce<br />
* IRC: use existing channel or create one just for event?<br />
* GitHub: get those pull requests going<br />
<br />
=Why=<br />
* Because we'll probably get some useful shit done<br />
* Because it's fun to work with CURATEcamp people in a CURATEcamp type of way<br />
* Because doing a 24hr+ worldwide hack with real time collaboration tools is cool<br />
<br />
=Who ([[Special:UserLogin|Sign up]])=<br />
* '''GMT +12:00''' Digital Preservation Practical Implementers Guild (@DP_PIG)<br />
* ?<br />
* '''GMT +7:00''' [[User:Euan_Cochrane|Euan Cochrane]] (@euanc)<br />
* ?<br />
* '''GMT +2:00''' [[User:Maurice_de_Rooij|TechMaurice]] (NANETH)<br />
* '''GMT +1:00''' [[User:Nicholas_Clarke|Nicholas Clarke]] (@nclarkedk) - netarkivet.dk<br />
* '''GMT +0:00''' [[User:Andy_Jackson|Andy Jackson]] (@anjacks0n), Paul Wheatley (@prwheatley), BL digital preservation team - Maureen (@mopennock) PeteC, PeterM, Lynn, William, and maybe more...; [[User:David Underdown|David Underdown]] (@davidunderdown9) and maybe some more TNA folk<br />
* ?<br />
* '''GMT -5:00''' Kara Van Malssen (@kvanmalssen), Dave Rice (@dericed), Ben Fino-Radin (@benfinoradin), Gary McGath (@Garym03062), @anarchivist<br />
* '''GMT -5:00''' @lljohnston @blefurgy et al!<br />
* ?<br />
* '''GMT -8:00''' [http://artefactual.com/team Artefactual]: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan...plus any VanCity people wanting to participate from [http://artefactual.com/contact.html Artefactual office].<br />
<br />
=Project Proposals=<br />
* Document file id requirements / use cases<br />
* ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?<br />
* [[Improving format ID coverage]]<br />
* [[Collecting format ID test files]]<br />
* [[Improving identification methods]]<br />
** Develop a Format ID [http://digitalcontinuity.org/post/7327791836/emulation-workbench-for-digital-object-format-analysis "Emulation Workbench"] for format analysis<br />
** Document software input and output formats to use in limiting the option set for files of a particular time period (if we know all formats that were creatable during a period when a file was created then we can limit results to only those formats), and for use in [http://digitalcontinuity.org/post/7325561455/mining-application-documentation-for-file-format format intelligence mining].<br />
* Archivematica / Tika integration <br />
** @archivematica team & volunteers<br />
* Archivematica [https://www.archivematica.org/wiki/Format_policy_registry_requirements Format Policy Registry] testing<br />
** @archivematica team & volunteers<br />
* @kvanmalssen Improved file id / characterization support for AV files in existing tools like Tika and FITS. An update of Exiftool and inclusion of MediaInfo would be a good start. Or maybe test applicability of ffprobe/avprobe for this task.<br />
** @dericed This is exactly what ffprobe/avprobe does. Whereas many of the digipres tools do identification by sampling x bytes from the head and tail, ffprobe/avprobe incorporate one of the many extensive demuxing libraries to manage identification of the contents.<br />
** @kvanmalssen - Yes, so can we get avprobe to output in a structured way? And could it be incorporated into a tool like FITS or Tika so that we can have a file id tool that supports mixed collections?<br />
** @dericed - Yes, ffprobe/avprobe have the -print_format (-of) option so you can get json, xml, csv, or others. There's also an xsd published for the output. I suppose ffprobe could be incorporated into FITS but not sure if this is an efficient idea. The premise of FITS seems to put all preservation metadata considerations on the container (file format) but in AV collections the codecs and contained bitstreams are far more significant to consider.<br />
** @kvanmalssen - Issue is we need AV support (including track/bitstream support) in these general tools so people can process mixed collections. That's what I'd like to figure out.<br />
** See also [[Improving identification methods]], which could perhaps be split into two or three and one of which merged with the above tweet discussion? [[User:Andy Jackson|Andy Jackson]] 15:20, 22 October 2012 (PDT)<br />
* FITS or Tika bugfix marathon (e.g. [https://issues.apache.org/jira/browse/TIKA-539 this one]).<br />
** Perhaps consider refactoring FITS to re-use existing dependency management tools like Maven and apt/yum/etc instead of manual dependency management? [[User:Andy Jackson|Andy Jackson]] 05:16, 23 October 2012 (PDT)<br />
* [[User:Maurice_de_Rooij|TechMaurice]]: Replace container identification function of [https://github.com/openplanets/fido FIDO] using PRONOM container signature.<br />
''Should we take a poll a day in advance to select 2 or 3 projects or should we just let everyone work on whatever proposal they wish?''<br />
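For contrast with the demuxing approach, the head-and-tail sampling mentioned in the discussion above can be sketched in a few lines. This is an illustration only, not how any particular tool is implemented. The two-entry signature table and the function name <code>identify</code> are assumptions made for the example, and the byte patterns are the well-known PNG and PDF markers.<br />

```python
# Tiny illustrative signature table: (label, bytes expected at the head,
# bytes expected somewhere in the tail window).
SIGNATURES = [
    ("PNG image", b"\x89PNG\r\n\x1a\n", b"IEND\xae\x42\x60\x82"),
    ("PDF document", b"%PDF-", b"%%EOF"),
]


def identify(data, window=1024):
    """Match head and tail samples of `data` against the signature table."""
    head = data[:window]   # first `window` bytes
    tail = data[-window:]  # last `window` bytes (whole input if it is shorter)
    for label, start, end in SIGNATURES:
        if head.startswith(start) and end in tail:
            return label
    return None


print(identify(b"%PDF-1.4\n(body omitted)\n%%EOF\n"))
print(identify(b"plain text"))
```

A check like this only sees the container markers. It says nothing about the codecs or bitstreams inside an AV wrapper, which is the limitation raised in the discussion above.<br />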
<br />
=Preparation TODO=<br />
* GitHub How To<br />
** Set up temporary FITS and/or Tika forks that we can work on?<br />
* Easier signature development tools and/or signature contribution tracking, as outlined in [[Improving format ID coverage]]<br />
* Example file contribution How To document, cf. [[Collecting format ID test files]]<br />
* Prep Archivematica dev VMs (incl Tika checkout), spin up & grant IPs/SSH to Hackfest participants upon request (Artefactual: [http://artefactual.com/austin-trask.html Austin])</div>Courtney C. Mumma