CURATEcamp 24 hour worldwide file id hackathon Nov 16 2012

Revision as of 17:54, 20 October 2012 by PeterVG


Background

One breakout session at CURATEcamp iPRES 2012 was affectionately branded the "file id confessional", where we commiserated about the state of our file id tools and processes. We also talked about:

  • We can do a better job of specifying and documenting our file id requirements / use cases
  • We're all hooked on that FITS.xml, but FITS needs performance optimization ASAP (also, is Harvard up for extra dev work?)
  • Apache Tika is a very actively supported and useful tool for file id and content extraction. How many of our file id requirements can it actually cover?
  • Archivematica Format Policy Registry use case
  • Jason Scott's "Let's Just Solve the Problem" campaign to boldly catalog as much file format info as possible in the month of November.
  • Also, CURATEcamp iPRES participant Paul Wheatley has since posted "We Need Better Characterization", which led to a Twitter discussion between @pjvangarderen, @anjacks0n and @prwheatley about this hackathon event.

What

A 24-hour+ live hackathon event where teams in multiple time zones work on common technical projects related to the CURATEcamp iPRES 2012 file id discussions.

Project proposals can be made by anyone.

We will start the day in New Zealand (GMT +12:00) and end with the North American West Coast wrapping up the project(s), hopefully with one or two solid deliverables by midnight-ish PST (GMT -8:00).

When

  • Friday, November 16, 2012
  • Friday, November 23, 2012
    • RT @declan: @pjvangarderen neat idea! You know that date is the day after US Thanksgiving, right? people might be on vacation

How

Let's put together a schedule, a task list, and volunteers to road-test these tools ahead of Nov 16:

  • Google Hangout: fire up a webcam
  • GoogleDocs: we can live edit any docs we feel the urge to produce
  • IRC: use an existing channel or create one just for the event?
  • GitHub: get those pull requests going

Why

  • Because we'll probably get some useful shit done
  • Because it's fun to work with CURATEcamp people in a CURATEcamp type of way
  • Because doing a 24hr+ worldwide hack with real time collaboration tools is cool

Who (Sign up)

  • GMT +12:00 Euan Cochrane (@euanc)
  •  ?
  • GMT +0:00 Andy Jackson (@anjacks0n), Paul Wheatley (@prwheatley)
  • GMT -5:00 Kara Van Malssen (@kvanmalssen)
  •  ?
  • GMT -8:00 Artefactual: peter (@pjvangarderen), courtney (@snarkivist), evelyn, joseph, mikeC (@mcantelon), mikeG, austin, dan... plus any VanCity people who want to participate from the Artefactual office.

Project Proposals

  • Document file id requirements / use cases
  • ArchiveTeam "Just Solve the Problem" wiki scraping -> structured data (CSV?, XML?, RDF?); as an ongoing service?
  • Tika test cases
  • Tika signature enhancements
    • RT @anjacks0n: @pjvangarderen @prwheatley Will spend time beforehand on means to pool and test new signatures, and track progress.
  • Archivematica / Tika integration
  • Archivematica Format Policy Registry testing
  • RT @prwheatley: @anjacks0n @pjvangarderen We should be as inclusive as possible, so File, Droid or Tika as lined up at bottom here
    • RT @anjacks0n: @prwheatley @pjvangarderen Unfortunately, making and testing DROID magic is very difficult. The sig you submit is not that you test with.
    • RT @beet_keeper: @anjacks0n @prwheatley @pjvangarderen Do you know about the DROID signature development utility: GTRI submitted their...
    • RT @beet_keeper: @anjacks0n @prwheatley @pjvangarderen work with supporting XML output from this and it made testing at TNA end easier. It's quite useful.
    • RT @anjacks0n: @beet_keeper @prwheatley @pjvangarderen thanks, that helps. Will re-read Jay's guide and see if I can help collect and test new sigs.


Should we take a poll a day in advance to select 2 or 3 projects, or should we just let everyone work on whatever proposal they wish?

Preparation TODO

  • GitHub How To
  • Prep Archivematica dev VMs (incl. Tika checkout); spin them up and grant IPs/SSH access to Hackfest participants upon request (Artefactual: Austin)