Association of Moving Image Archivists & Digital Library Federation Hack Day 2015

From CURATEcamp
Revision as of 22:38, 28 November 2015 by Ashley Blewer! (talk | contribs) (ffmprovsr)

PROJECT SIGN UP: https://docs.google.com/spreadsheets/d/1xLw1KHVW5fD2_p4Jx1hf758NIy8DVUqwX6YQGUU_aAM/edit?usp=sharing

>>> When, Where, What time?

  • Date: Wednesday, November 18, 2015
  • Time: ~9am-5pm (with option of continued work projects throughout the conference in our Developer Lounge, Parlor A)
  • Location: Hilton Portland & Executive Tower 921 SW 6th Ave, Portland, OR 97204
  • hashtag: #AVhack15
  • github account: AMIA Open Source github
  • IRC: #curatecamp_avpres_1 If using an IRC client the server is chat.freenode.net, or you can use your browser and connect to webchat.freenode.net. If you are unfamiliar with IRC, take a look at this ☞ brief introduction.
  • Codes of Conduct: AMIA Code of Conduct and DLF Code of Conduct


How can I participate?

Sign up! As this will be a highly participatory event, registration is limited to those willing to get their hands dirty, so no onlookers please.

REMOTE PARTICIPATION will be available this year! Check in using the IRC channel and stay tuned to sign up for a project. You can then work offline/independently with your group in whatever way is easiest (video chat, chat, etc).

If you are unsure whether you can or want to participate in the hack day itself, you can still see the results by attending the AMIA closing plenary, where hack day projects will be presented, and the audience will have an opportunity to vote on their favorites.

What will be the format of the event?

In advance of the hack day, project ideas and edit-a-thon topics will be collected through the registration form and the event wiki, and participants will review and discuss the submitted ideas. We’ll then break into groups of technologists, practitioners, and Wikipedia editors, each selecting an idea or topic(s) to work on together for the day and (if desired) throughout the AMIA conference in the developers lounge.

The day itself will be structured something like this. Coffee/tea will be provided. Lunch is on your own.

9am – Welcome, introductions

9:30 - noon - Hacking & doc editing. Coffee and minimal snacks.

Noon-1pm – Lunch on your own.

1 - 4:30 - Hacking & doc editing. Coffee and minimal snacks.

4:30 - 5 - Wrap up.

Judging & presenting projects

At least one representative from each project group needs to be present for the demonstration of Hack Day projects to jurors on Friday, November 20, at 6:45 pm in the Hack Day Lounge (Parlor A), immediately following the Fair Use open session. Judging typically lasts approximately 1.5 to 2 hours. Snacks and drinks are encouraged.

Projects will be presented to attendees in a session on Saturday, 11:30am-12:00pm in Galleria South, where the jury will announce winners and attendees will vote on their favorite project.

Summary

In association with the annual conference, the Association of Moving Image Archivists will host its 3rd annual hack day on November 18th in Portland, OR. The event will be a unique opportunity for practitioners and managers of digital audiovisual collections to join with developers and engineers for an intense day of collaboration to develop solutions for digital audiovisual preservation and access. This year, we will be holding a concurrent Wikipedia Edit-a-thon[1] for those interested in adding to the knowledge pool about audiovisual preservation and access. It will be fun and practical.

AMIA is once again thrilled to partner with the Digital Library Federation in organizing the hack day.

What if I’m not a developer?

Content managers and preservation practitioners are as central to the success of the event as having keen developers. YOU will be responsible for setting the agenda and the outcomes. The goal is to foster collaboration between audiovisual preservation specialists and technologists, to solve problems together and share expertise.

There is also a HACK THE DOCS stream, which includes a Wikipedia Edit-a-thon, creating or updating tool documentation, or reviewing and improving policy or procedural documentation. So even if you're not a developer, nor feel compelled to lend your digital preservation ideas to software and code development, you can contribute to creating new or updated content for the benefit of our community! You can read all about Wikipedia edit-a-thon events here.

Background

What is a hack day?

A hack day or hackathon is an event that brings together computer technologists and practitioners for an intense period of problem solving through computer programming. Within digital preservation and curation communities, hack days provide an opportunity for archivists, collection managers, and others to work together with technologists to develop software solutions for digital collections management needs. Hack days have been held independently by groups such as the Open Planets Foundation, as well as in association with preservation and access oriented conferences including Open Repositories and Museums and the Web.

The manifesto of a recent event at the Open Repositories conference framed the benefits this way: “Transparent, fun, open collaboration in diversely constituted teams...The creation of new professional networks over the ossification of old ones. Effective engagement of non-developers (researchers, repository managers) in development...Work done at the conference over presentation of something prepared earlier.”

Our Manifesto

Manifesto:

  • Transparent, fun, open collaboration in diversely constituted teams over individual brilliance and/or groups of like individuals in cut-throat competition.
  • The creation of new professional networks over the ossification of old ones
  • Effective engagement of non-developers (researchers, repository managers) in development over purely developer driven projects.
  • Work done at the conference over presentation of something prepared earlier (meaning not working on a project you are working on during your day job)

Hack Day Projects

Please update this project list with your TEAM NAME and summary project.

Below are loose ideas for projects, drawn from the initial suggestions of registrants. If you have a new project idea, or are interested in one of the project stubs below, sign up for a wiki login and add your thoughtful comments or possible starting points to the proposal, or contact the proposer via twitter.

TMS mySQL to RDF mapping

I would like to try to map and then transform MySQL tables (hopefully from TMS, but another source could be used) into RDF triples, documenting which steps need to be taken, what information is required to make a triple, which RDF format (Turtle, XML, etc.) might be the easiest to use, and so on. – @laurensx
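As a starting point for discussion, here is a minimal sketch of what "making a triple" from one table row could look like. It is not the proposer's method. It assumes a row has already been fetched as a Python dict, and the base URI, table name, and column names are all hypothetical. N-Triples is used only because it is a simple line-based subset of Turtle; a real mapping would likely pick an established vocabulary for the predicates.

```python
from urllib.parse import quote

# Hypothetical namespace; a real project would choose its own base URI.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape the characters that N-Triples requires escaping in a literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(table, key_column, row):
    """Turn one row (a dict of column -> value) into N-Triples lines.

    Subject: built from the table name and the row's primary key.
    Predicate: built from the column name.
    Object: the cell value as a plain string literal (NULLs are skipped).
    """
    subject = f"<{BASE}{quote(table)}/{quote(str(row[key_column]))}>"
    lines = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue
        predicate = f"<{BASE}vocab/{quote(column)}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines
```

Each row therefore needs three things: a stable key for the subject, a column-to-predicate mapping, and a value for the object.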

QC Script

- Charlotte Johnson / Jessica Storm

Currently we are scanning a lot of audio reels, doing minimal clean up and ingesting into our DAM system. The files are delivered by title, in folders with 5-15 .wav files on average.

We're in need of a script, or tool that can help us validate audio .wav files, as well as help us expedite prepping the material for ingest. Right now, all we can do is run an MD5 and compare that with what the vendor sent us. We'd like to take it a step further, but opening and checking every file individually is out of the question.

Ideally the script/tool would:

  • Be executable at a folder level
  • Be able to exclude certain file types (e.g., MD5, ptx, pdf)
  • Make sure each file is playable
  • Look for dropouts and/or pops in each wav
  • Get the sample rate (kHz) and bit rate of each wav
  • Output a "QC" report with results and metadata
  • Create MD5s for each wav (and possibly compare those MD5s with the provided MD5s)


Any help we can get quickly validating that the content is "good" would be helpful. The tools we've experimented using, and scripts we've tried, just aren't doing what we need.
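A rough sketch of part of that wish list, using only the Python standard library, might look like the following. It is an illustration under stated assumptions, not a finished tool: the excluded extensions are guesses taken from the list above, "readable" only means that the `wave` module could parse the header (which covers uncompressed PCM WAV, not every WAV variant), and detecting dropouts or pops is not attempted here.

```python
import hashlib
import wave
from pathlib import Path

# Extensions to skip; this default set is an assumption based on the wish list.
EXCLUDED = {".md5", ".ptx", ".pdf"}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def qc_folder(folder, excluded=EXCLUDED):
    """Build one report row (a dict) per file in `folder`, non-recursively."""
    report = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() in excluded:
            continue
        row = {"file": path.name, "md5": md5_of(path)}
        try:
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
                row.update(
                    readable=True,
                    sample_rate_hz=rate,
                    bit_depth=wav.getsampwidth() * 8,
                    channels=wav.getnchannels(),
                    duration_s=wav.getnframes() / rate if rate else 0.0,
                )
        except (wave.Error, EOFError) as error:
            # The header could not be parsed as PCM WAV; flag for manual review.
            row.update(readable=False, error=str(error))
        report.append(row)
    return report
```

The rows could then be written out with `csv.DictWriter` as the "QC" report, and each `md5` value compared against the vendor-supplied checksum.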

Scripts for using wget that retain organizational metadata

-@DaleLore Developing a script that uses the wget command line to archive webpages and includes important metadata for organization.
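One possible shape for such a script is sketched below. It only builds the wget argument list and does not run it. The naming scheme and the `collection` label are hypothetical, and the choice of flags (`--page-requisites`, `--adjust-extension`, `--warc-file`, `--warc-header`, `--output-file`) is an assumption about what "important metadata for organization" could mean, not the proposer's design.

```python
from datetime import date
from urllib.parse import urlparse


def wget_archive_command(url, collection, captured_on):
    """Build a wget command that captures one page plus a WARC and a log.

    The file stem combines a collection label, the host name, and the
    capture date so the outputs stay grouped on disk (hypothetical scheme).
    """
    host = urlparse(url).netloc.replace(":", "_") or "page"
    stem = f"{collection}_{host}_{captured_on.isoformat()}"
    return [
        "wget",
        "--page-requisites",    # also fetch images, CSS, and scripts
        "--adjust-extension",   # save HTML with an .html suffix
        f"--warc-file={stem}",  # write a WARC alongside the files
        f"--warc-header=collection: {collection}",  # extra warcinfo field
        f"--output-file={stem}.log",                # keep wget's own log
        url,
    ]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine with wget installed.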

DPX header metadata

Our project is about better editing/writing of DPX header metadata. We would like to focus on:

    • enabling more complete metadata editing of all header fields in the DPX standard. Commercially available tools like Pomfort DPX Header Editor do not have this ability.
    • batch editing of these header fields
    • enable embedding of individual file checksum values into a/some header fields of those individual DPX files

We think this would be a great project for any archive working (or planning on working) with DPX files. Hopefully the project could be the beginnings of a tool such as BWF MetaEdit

— @hbmcd4 @jasmynrc

Exporting OpenRefine clusters & TIDY: Tool for Improving Data Yourself

OpenRefine with TIDY add-on (Mac release) available here

TIDY beta visualisation here


GROUP: Data Detox (the makers of TIDY) Kathryn Gronsbell | Cora Johnson-Roberson | Michelle Roell | Caleb Sayan

OBJECTIVE: Export OpenRefine clusters for review and normalisation opportunities

OVERVIEW: Add export cluster feature to OpenRefine to work with suggested data outside of the system. Create graphic representation of suggested cluster reconciling opportunities using the foundational JSON export. Harnesses the power of the OpenRefine algorithms for whatever you need! This would allow you to review the recommended clusters and be better informed / make decisions when choosing between terms from a messy data set. The output allows you to manipulate the data however you want or need, so that you can move through all "similar" data according to OpenRefine's super powerful algorithms. We:

-Added “Export Clusters” to the Cluster & Edit screen in an OpenRefine project. This feature exports suggested cluster information (JavaScript objects) as JSON, which includes:

    • Project Name
    • Column Name
    • Timestamp
    • Method
    • Keying function
    • JavaScript objects (clusters)

as output filename formatted: clusters_[projectname]_[columnname]_[timedate].json


Screenshot of added export button: https://drive.google.com/file/d/0B7Vrvqrwpk98c1k3dzRoSDBSRUE/view?usp=sharing

Sample cluster export: https://www.dropbox.com/s/uedgsw3t4gc7857/clusters_OHHLA_artist_2015-11-18T19-28-08.412Z.json?dl=0
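For anyone scripting against these exports, the filename pattern can be reproduced as follows. This is a sketch inferred from the pattern and the sample filename above, not code from the TIDY add-on itself; the function name is hypothetical, and the timestamp layout (UTC, colons replaced with hyphens, millisecond precision) is read off the sample.

```python
from datetime import datetime, timezone


def cluster_export_filename(project, column, when):
    """Format clusters_[projectname]_[columnname]_[timedate].json.

    `when` must be a timezone-aware datetime; it is converted to UTC.
    """
    utc = when.astimezone(timezone.utc)
    stamp = utc.strftime("%Y-%m-%dT%H-%M-%S.") + f"{utc.microsecond // 1000:03d}Z"
    return f"clusters_{project}_{column}_{stamp}.json"
```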


scratch/working doc

Non-verbal search engine

Using icons from the Noun project website to create a non verbal icon based search engine. - @textilehive

File name generator

I see pretty awful file names for projects and there must be a way to build a small tool to help suggest a consistent, clear naming convention project by project. - Michelle Roell

MiniDV integration into last year's hackday project, vrecord

Being able to extend vrecord's capabilities beyond BlackMagic by adding support for firewire-based media connections would help on-the-fly Mac-based migration stations. A caveat to this being a functional project is that we'd need a working MiniDV deck and I'm not willing to tote one across the country with me. But I love you. Kinda. — @ablwr

Re-equalizing WAVs

I was recently given a Windows application that re-equalizes wav files that were transferred from their original magnetic audio carriers at the wrong speed (sometimes necessary). It's 32-bit only, though, so I can't use it. Came with the original Forth source code, so my project idea would be to either make a 64-bit version (which I could probly learn on my own) or port it to something more, uh, widely used so that maybe it could find a wider audience. Originally developed by Jay McKnight, formerly of Ampex, now Magnetic Reference Labs. — @CoatesBrendan

PBCore/SIP Comparison

A PBCore/SIP comparison. Basically using a PBCore XML record to build a manifest of files and compare the technical data of the metadata record with the actual files. This could use MediaInfo on the files listed in the XML.  - Henry Borchers
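The manifest-and-compare idea could start from something like the sketch below. It makes two assumptions that a real record may not satisfy: that each `instantiationIdentifier` holds the file name, and that `instantiationFileSize` holds a size in bytes. It checks only presence and size; running MediaInfo for deeper technical comparison is left out. Namespace wildcards (`{*}`) need Python 3.8 or later.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def manifest_from_pbcore(xml_text):
    """Map file name -> expected size in bytes (or None) from a PBCore record."""
    root = ET.fromstring(xml_text)
    manifest = {}
    for inst in root.findall(".//{*}pbcoreInstantiation"):
        name = inst.findtext("{*}instantiationIdentifier")
        size = inst.findtext("{*}instantiationFileSize")
        if name:
            has_size = bool(size) and size.strip().isdigit()
            manifest[name.strip()] = int(size) if has_size else None
    return manifest


def compare_with_folder(manifest, folder):
    """Return (file name, problem) pairs for mismatches between record and SIP."""
    folder = Path(folder)
    problems = []
    for name, expected in manifest.items():
        path = folder / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif expected is not None and path.stat().st_size != expected:
            problems.append((name, f"size {path.stat().st_size} != {expected}"))
    # Files present on disk but never mentioned in the record.
    extras = {p.name for p in folder.iterdir() if p.is_file()} - set(manifest)
    problems.extend((name, "not in record") for name in sorted(extras))
    return problems
```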

Updated PBCore Validator

Continue the work on last year's PBCore Tools project, incorporating PBCore 2.1 and improving documentation and usability.

--@rhfraim

ffmprovsr

Team Name: ffmprovsr / ffmpegged

Project Objective: To facilitate better understanding of ffmpeg through collaborative sharing of useful scripts and detailed flag-level description of how each script works so archivists can copy-paste and produce their own scripts but also understand how and why they work.

Project Proposal:

OK OK OK, just putting out some feelers here. I worked on ffmpeg documentation [2] during the first hack day and last year I hastily built an app that exists as a guide/command line generator for ffmpeg [3] and I think it'd be fun to combine and continue to build up these two projects into something better because ffmpeg continues to live on as a mysterious but necessary component of a/v archival practice. This project would be mostly R&D with some basic front-end web development skills (building forms). I feel this is a little out of the scope of hack day (and those greedy for rewards may seek refuge elsewhere) in that it's more of a REMIX project and a mostly-hack-the-docs-with-some-coding project, but if there is interest (there was last year, for ffmprovisr) -- we will build the hell outta this! -- @ablwr

Members: Ashley Blewer, Rebecca Fraimow, Rebecca Dillmeier, Reto Kromer, Jonathan Farbowitz, Catriona Schlosser, Ben Turkus, Kelly Haydon, Samuel Gutterman, Eddy Colloton, Nicole Martin

HACK THE DOCS: OAIS Edit-a-thon

— @shirapeltzman

Team Name: OIA-YES

Project Objective: To review and evaluate (from an AV perspective) the OAIS Reference Model with the aim to contribute to the DPC-hosted wiki evaluating it. We identify areas of the standard that could be improved upon, clarified, or expanded to better encompass moving images and other complex digital objects.


Project Proposal: I'd like to propose an OAIS review/revise-a-thon for Hack Day, wherein a group of us could contribute to the OAIS Community Forum Wiki hosted by the DPC.

Paul Wheatley has been updating the active topics for discussion page, and this is a great place for any participants to get involved, and respond to some of the latest proposals and points for discussion. They can of course also start new topics and add them here:

http://wiki.dpconline.org/index.php?title=Active_Topics_and_News

Since its approval in 2002 as an ISO standard (14721), the OAIS reference model has become a--if not the--foundational text for the majority of digital preservation research and resource development. Since then the digital preservation community has grown significantly, sparking an expanded understanding of what precisely constitutes "digital preservation". The Digital Preservation Coalition has responded to these shifts by issuing an open call to "review and reform" the OAIS standard in advance of its upcoming ISO review in 2017. Contributing to this process gives us a unique opportunity to ensure our voices and concerns are heard as moving image archivists and to make an impact on OAIS' next iteration.

Members: Shira Peltzman, Erwin Verbruggen, Julia Kim

-- I'm interested in participating in this project (remotely): @kvanmalssen

HACK THE DOCS: AMPAS Film & TV Wikipedia Updates

— Michelle Roell

Edit-a-thon specifically about Film/TV at AMPAS. Perhaps some of these topics are relevant?