Association of Moving Image Archivists & Digital Library Federation Hack Day 2015
>>> When, Where, What time?
- Date: Wednesday, November 18, 2015
- Time: ~9am-5pm (with option of continued work projects throughout the conference in our Developer Lounge, Parlor A)
- Location: Hilton Portland & Executive Tower 921 SW 6th Ave, Portland, OR 97204
- hashtag: #AVhack15
- github account: AMIA Open Source github
- IRC: #curatecamp_avpres_1 If using an IRC client the server is chat.freenode.net, or you can use your browser and connect to webchat.freenode.net. If you are unfamiliar with IRC, take a look at this ☞ brief introduction.
- Codes of Conduct: AMIA Code of Conduct and DLF Code of Conduct
- 1 How can I participate?
- 2 What will be the format of the event?
- 3 Judging & presenting projects
- 4 Summary
- 5 Background
- 6 Hack Day Projects
- 6.1 Scripts for using wget that retain organizational metadata
- 6.2 DPX header metadata
- 6.3 Exporting OpenRefine clusters & TIDY: Tool for Improving Data Yourself
- 6.4 Non-verbal search engine
- 6.5 File name generator
- 6.6 MiniDV integration into last year's hackday project, vrecord
- 6.7 Re-equalizing WAVs
- 6.8 PBCore/SIP Comparison
- 6.9 Updated PBCore Validator
- 6.10 ffmprovisr
- 6.11 HACK THE DOCS: OAIS Edit-a-thon
- 6.12 HACK THE DOCS: AMPAS Film & TV Wikipedia Updates
How can I participate?
Sign up! As this will be a highly participatory event, registration is limited to those willing to get their hands dirty, so no onlookers please.
REMOTE PARTICIPATION will be available this year! Check in using the IRC channel and stay tuned to sign up for a project. You can then work offline/independently with your group in whatever way is easiest (video chat, chat, etc).
If you are unsure whether you can or want to participate in the hack day itself, you can still see the results by attending the AMIA closing plenary, where hack day projects will be presented, and the audience will have an opportunity to vote on their favorites.
What will be the format of the event?
In advance of the hack day, project ideas and edit-a-thon topics will be collected through the registration form and the event wiki, and participants will review and discuss the submitted ideas. We’ll then break into groups of technologists, practitioners, and Wikipedia editors, each selecting an idea or topic(s) to work on together for the day and (if desired) throughout the AMIA conference in the Developer Lounge.
The day itself will be structured something like this. Coffee/tea will be provided. Lunch is on your own.
9am – Welcome, introductions
9:30 - noon - Hacking & doc editing. Coffee and minimal snacks.
Noon-1pm – Lunch on your own.
1 - 4:30 - Hacking & doc editing. Coffee and minimal snacks.
4:30 - 5 - Wrap up.
Judging & presenting projects
At least one representative from each project group needs to be present for the demonstration of Hack Day projects to jurors on Friday, November 20, at 6:45 pm in the Hack Day Lounge (Parlor A), immediately following the Fair Use open session. Judging typically lasts approximately 1.5-2 hours. Snacks and drinks are encouraged.
Projects will be presented to attendees in a session on Saturday, 11:30am-12:00pm in Galleria South, where the jury will announce winners and attendees will vote on their favorite project.
Summary
In association with the annual conference, the Association of Moving Image Archivists will host its 3rd annual hack day on November 18th in Portland, OR. The event will be a unique opportunity for practitioners and managers of digital audiovisual collections to join with developers and engineers for an intense day of collaboration to develop solutions for digital audiovisual preservation and access. This year, we will be holding a concurrent Wikipedia Edit-a-thon for those interested in adding to the knowledge pool about audiovisual preservation and access. It will be fun and practical.
AMIA is once again thrilled to partner with the Digital Library Federation in organizing the hack day.
Background
What if I’m not a developer?
Content managers and preservation practitioners are as central to the success of the event as having keen developers. YOU will be responsible for setting the agenda and the outcomes. The goal is to foster collaboration between audiovisual preservation specialists and technologists, to solve problems together and share expertise.
There is also a HACK THE DOCS stream, which includes a Wikipedia Edit-a-thon, creating or updating tool documentation, or reviewing and improving policy or procedural documentation. So even if you're not a developer and don't feel compelled to lend your digital preservation ideas to software and code development, you can contribute to creating new or updated content for the benefit of our community! You can read all about Wikipedia edit-a-thon events here.
What is a hack day?
A hack day or hackathon is an event that brings together computer technologists and practitioners for an intense period of problem solving through computer programming. Within digital preservation and curation communities, hack days provide an opportunity for archivists, collection managers, and others to work together with technologists to develop software solutions for digital collections management needs. Hack days have been held independently by groups such as the Open Planets Foundation, as well as in association with preservation and access oriented conferences including Open Repositories and Museums and the Web.
The manifesto of a recent event at the Open Repositories conference framed the benefits this way: “Transparent, fun, open collaboration in diversely constituted teams...The creation of new professional networks over the ossification of old ones. Effective engagement of non-developers (researchers, repository managers) in development...Work done at the conference over presentation of something prepared earlier.”
- Transparent, fun, open collaboration in diversely constituted teams over individual brilliance and/or groups of like individuals in cut-throat competition.
- The creation of new professional networks over the ossification of old ones
- Effective engagement of non-developers (researchers, repository managers) in development over purely developer driven projects.
- Work done at the conference over presentation of something prepared earlier (meaning not working on a project you are working on during your day job)
Hack Day Projects
Please update this project list with your TEAM NAME and project summary.
Below are loose ideas for projects, drawn from the initial suggestions of registrants. If you have a new project idea, or are interested in one of the project stubs below, sign up for a wiki login and add your thoughtful comments or possible starting points to the proposal, or contact the proposer via twitter.
TMS MySQL to RDF mapping
I would like to try to map and then transform MySQL tables (hopefully TMS, but another could be used) into RDF triples, documenting what steps need to be taken, what information is required to make a triple, what RDF format (Turtle, XML, etc.) might be the easiest to use, and so on.
- Charlotte Johnson / Jessica Storm
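As a starting point for the question of what a triple needs, here is a minimal Python sketch that turns one database row into N-Triples lines (which are also valid Turtle). It assumes three inputs: a key column to build the subject URI, a base URI, and a column-to-predicate mapping. The column names, the `example.org` base URI, and the choice of Dublin Core predicates are all hypothetical; the real TMS schema would need its own mapping.

```python
def escape_literal(text):
    """Escape the characters that N-Triples does not allow raw in a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(row, key_column, base_uri, predicates):
    """Turn one row (a dict of column -> value) into N-Triples lines.

    subject   = base_uri + the row's key value
    predicate = looked up per column in `predicates`
    object    = the column value as a plain string literal
    """
    subject = f"<{base_uri}{row[key_column]}>"
    triples = []
    for column, predicate in predicates.items():
        value = row.get(column)
        if value is None:  # NULL columns produce no triple
            continue
        triples.append(f'{subject} <{predicate}> "{escape_literal(str(value))}" .')
    return triples


# Hypothetical column names and mapping, for illustration only.
EXAMPLE_PREDICATES = {
    "Title": "http://purl.org/dc/terms/title",
    "Medium": "http://purl.org/dc/terms/medium",
}
EXAMPLE_ROW = {"ObjectID": 42, "Title": 'Reel "A"', "Medium": None}
EXAMPLE_TRIPLES = row_to_triples(
    EXAMPLE_ROW, "ObjectID", "http://example.org/object/", EXAMPLE_PREDICATES
)
```

Rows could come from any MySQL client that returns dictionaries; the sketch deliberately leaves the database connection out.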
Currently we are scanning a lot of audio reels, doing minimal clean up and ingesting into our DAM system. The files are delivered by title, in folders with 5-15 .wav files on average.
We're in need of a script, or tool that can help us validate audio .wav files, as well as help us expedite prepping the material for ingest. Right now, all we can do is run an MD5 and compare that with what the vendor sent us. We'd like to take it a step further, but opening and checking every file individually is out of the question.
Ideally the script/tool would:
- Be executable at a folder level
- Be able to exclude certain file types (i.e., MD5, ptx, pdf..)
- Make sure each file is playable
- Look for drop outs and/or pops in each wav
- Get the sample rate / kHz & bit rate of each wav
- Output a "QC" report with results and metadata
- Create MD5s for each wav (and possibly compare those MD5s with the provided MD5s)
Any help we can get quickly validating that the content is "good" would be helpful. The tools we've experimented using, and scripts we've tried, just aren't doing what we need.
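A minimal Python sketch of part of this wish list, using only the standard library: it walks one folder, skips excluded extensions, checks that each WAV parses and that its audio data matches the length declared in the header, records sample rate and bit depth, computes an MD5, and writes a CSV report. The function names, report columns, and skip list are assumptions for illustration. Detecting drop outs or pops is not attempted here, and Python's `wave` module reads only uncompressed PCM, so other WAV flavors would be reported as unreadable.

```python
import csv
import hashlib
import wave
from pathlib import Path

SKIP_EXTENSIONS = {".md5", ".ptx", ".pdf"}  # assumption: sidecar types to ignore
REPORT_FIELDS = ["file", "md5", "readable", "sample_rate_hz", "bit_depth",
                 "channels", "duration_s", "error"]


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_wav(path):
    """Collect checksum and basic technical metadata for one file."""
    row = {"file": path.name, "md5": md5_of(path), "readable": False}
    try:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            declared = wav.getnframes()
            frame_bytes = wav.getsampwidth() * wav.getnchannels()
            read = 0
            while True:  # read all audio data to catch truncated files
                data = wav.readframes(65536)
                if not data:
                    break
                read += len(data) // frame_bytes
            row.update(
                sample_rate_hz=rate,
                bit_depth=wav.getsampwidth() * 8,
                channels=wav.getnchannels(),
                duration_s=round(declared / rate, 3) if rate else 0,
            )
            if read == declared:
                row["readable"] = True
            else:
                row["error"] = f"header declares {declared} frames, read {read}"
    except (wave.Error, EOFError) as error:
        row["error"] = str(error)
    return row


def qc_folder(folder):
    """Check every non-excluded file directly inside `folder`."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() not in SKIP_EXTENSIONS:
            rows.append(check_wav(path))
    return rows


def write_report(rows, report_path):
    """Write the QC rows to a CSV file; missing values are left blank."""
    with open(report_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=REPORT_FIELDS, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_report(qc_folder("path/to/title_folder"), "qc_report.csv")` would produce one report per delivered title folder; comparing the `md5` column against the vendor's values is left as a follow-up step.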
Scripts for using wget that retain organizational metadata
-@DaleLore Developing a script that uses the wget command line to archive webpages and includes important metadata for organization.
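One way such a script could look, sketched in Python: it builds a wget command for a single page and records organizational metadata in a JSON sidecar next to the capture. The folder layout (collection/host/timestamp), the sidecar file name, and the metadata fields are assumptions made for illustration; the wget options used (`--page-requisites`, `--convert-links`, `--adjust-extension`, `--directory-prefix`, `--warc-file`) are standard ones, with WARC output requiring wget 1.14 or later.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse


def plan_capture(url, collection, root, now=None):
    """Return (wget command, metadata dict) for archiving one page.

    Nothing is downloaded here; the command is only assembled.
    """
    now = now or datetime.now(timezone.utc)
    host = urlparse(url).netloc
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    dest = Path(root) / collection / host / stamp  # hypothetical layout
    command = [
        "wget",
        "--page-requisites",    # also fetch images, CSS, and scripts
        "--convert-links",      # rewrite links for offline browsing
        "--adjust-extension",   # add .html etc. where missing
        f"--directory-prefix={dest}",
        f"--warc-file={dest / 'capture'}",  # also write a WARC file
        url,
    ]
    metadata = {
        "url": url,
        "host": host,
        "collection": collection,
        "captured_at": stamp,
        "destination": str(dest),
        "command": command,
    }
    return command, metadata


def write_sidecar(metadata):
    """Create the destination folder and save the metadata beside the capture."""
    dest = Path(metadata["destination"])
    dest.mkdir(parents=True, exist_ok=True)
    sidecar = dest / "capture-metadata.json"
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar
```

After `write_sidecar(metadata)` has created the folder, the capture itself could be run with `subprocess.run(command, check=True)`.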
DPX header metadata
Our project is about better editing/writing of DPX header metadata. We would like to focus on:
- enabling more complete metadata editing of all header fields in the DPX standard. Commercially available tools like Pomfort DPX Header Editor do not have this ability.
- batch editing of these header fields
- enabling the embedding of individual file checksum values into one or more header fields of those individual DPX files
We think this would be a great project for any archive working (or planning on working) with DPX files. Hopefully the project could be the beginnings of a tool such as BWF MetaEdit.
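To make the idea concrete, here is a rough Python sketch that reads a few text fields from the DPX generic file header, overwrites one of them in place, and embeds an MD5 of the image data in a chosen field. The byte offsets and field lengths are taken from my reading of the SMPTE 268M file information header and should be verified against the standard before use. Which field should carry a checksum is an open question; the `project` field and the `md5:` prefix are placeholders, not a recommendation.

```python
import hashlib
import struct

GENERIC_HEADER_SIZE = 768  # assumed size of the file information header

# Field name -> (byte offset, length). Assumed from SMPTE 268M; verify first.
TEXT_FIELDS = {
    "filename": (36, 100),
    "created": (136, 24),
    "creator": (160, 100),
    "project": (260, 200),
    "copyright": (460, 200),
}


def byte_order(header):
    """Return the struct byte-order prefix indicated by the magic number."""
    magic = header[:4]
    if magic == b"SDPX":
        return ">"  # big-endian
    if magic == b"XPDS":
        return "<"  # little-endian
    raise ValueError("not a DPX file (bad magic number)")


def read_header(path):
    """Read the image data offset and the text fields listed above."""
    with open(path, "rb") as handle:
        header = handle.read(GENERIC_HEADER_SIZE)
    if len(header) < GENERIC_HEADER_SIZE:
        raise ValueError("file too short to hold a DPX file header")
    order = byte_order(header)
    (image_offset,) = struct.unpack_from(order + "I", header, 4)
    fields = {"image_offset": image_offset}
    for name, (offset, length) in TEXT_FIELDS.items():
        raw = header[offset:offset + length]
        fields[name] = raw.split(b"\0", 1)[0].decode("ascii", "replace")
    return fields


def set_text_field(path, name, value):
    """Overwrite one fixed-length text field in place, padded with NULs."""
    offset, length = TEXT_FIELDS[name]
    data = value.encode("ascii")
    if len(data) > length:
        raise ValueError(f"{name} holds at most {length} bytes")
    with open(path, "r+b") as handle:
        handle.seek(offset)
        handle.write(data.ljust(length, b"\0"))


def embed_image_md5(path, field="project"):
    """Hash everything from the image data offset onward; store it in `field`."""
    image_offset = read_header(path)["image_offset"]
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        handle.seek(image_offset)
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    set_text_field(path, field, "md5:" + digest.hexdigest())
    return digest.hexdigest()
```

Hashing only the bytes from the image data offset onward means the stored value stays valid after the header itself is edited, which a whole-file checksum would not. Batch editing would then be a loop over a folder of frames calling `set_text_field`.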
— @hbmcd4 @jasmynrc
Exporting OpenRefine clusters & TIDY: Tool for Improving Data Yourself
GROUP: Data Detox (the makers of TIDY) Kathryn Gronsbell | Cora Johnson-Roberson | Michelle Roell | Caleb Sayan
OBJECTIVE: Export OpenRefine clusters for review and normalisation opportunities
OVERVIEW: Add export cluster feature to OpenRefine to work with suggested data outside of the system. Create graphic representation of suggested cluster reconciling opportunities using the foundational JSON export. Harnesses the power of the OpenRefine algorithms for whatever you need! This would allow you to review the recommended clusters and be better informed / make decisions when choosing between terms from a messy data set. The output allows you to manipulate the data however you want or need, so that you can move through all "similar" data according to OpenRefine's super powerful algorithms. We:
- Project Name
- Column Name
- Keying function
as output filename formatted: clusters_[projectname]_[columnname]_[timedate].json
Screenshot of added export button: https://drive.google.com/file/d/0B7Vrvqrwpk98c1k3dzRoSDBSRUE/view?usp=sharing
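The output filename pattern given above could be produced along these lines. This is only an illustrative Python sketch, not the code added to OpenRefine; the timestamp format and the rule for cleaning up project and column names are assumptions.

```python
import re
from datetime import datetime


def cluster_export_filename(project, column, when=None):
    """Build clusters_[projectname]_[columnname]_[timedate].json."""
    when = when or datetime.now()

    def slug(text):
        # Assumption: replace runs of unsafe characters with a single hyphen.
        return re.sub(r"[^A-Za-z0-9-]+", "-", text.strip()).strip("-")

    stamp = when.strftime("%Y%m%d-%H%M%S")  # assumed timedate format
    return f"clusters_{slug(project)}_{slug(column)}_{stamp}.json"
```

Keeping underscores out of the cleaned names means the filename can later be split back into its parts on `_`.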
Non-verbal search engine
Using icons from the Noun Project website to create a non-verbal, icon-based search engine. - @textilehive
File name generator
I see pretty awful file names for projects and there must be a way to build a small tool to help suggest a consistent, clear naming convention project by project. - Michelle Roell
MiniDV integration into last year's hackday project, vrecord
Being able to extend vrecord's capabilities beyond BlackMagic by adding support for firewire-based media connections would help on-the-fly Mac-based migration stations. A caveat to this being a functional project is that we'd need a working MiniDV deck and I'm not willing to tote one across the country with me. But I love you. Kinda. — @ablwr
Re-equalizing WAVs
I was recently given a Windows application that re-equalizes wav files that were transferred from their original magnetic audio carriers at the wrong speed (sometimes necessary). It's 32-bit only, though, so I can't use it. Came with the original Forth source code, so my project idea would be to either make a 64-bit version (which I could probably learn on my own) or port it to something more, uh, widely used so that maybe it could find a wider audience. Originally developed by Jay McKnight, formerly of Ampex, now Magnetic Reference Labs.
PBCore/SIP Comparison
A PBCore/SIP comparison. Basically using a PBCore XML record to build a manifest of files and compare the technical data of the metadata record with the actual files. This could use MediaInfo on files listed in the XML. - Henry Borchers
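A small Python sketch of the manifest-and-compare idea, limited to what the standard library can check. It assumes that each `pbcoreInstantiation` names its file in `instantiationIdentifier` and records its size in bytes in `instantiationFileSize`; real records may use other identifiers or units. Comparing further technical data (duration, codec, and so on) would need MediaInfo output, which is not called here.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
NS = {"pb": PBCORE_NS}


def manifest_from_pbcore(xml_path):
    """Map file name -> recorded size in bytes (None if not recorded)."""
    root = ET.parse(xml_path).getroot()
    manifest = {}
    for inst in root.iter(f"{{{PBCORE_NS}}}pbcoreInstantiation"):
        name = inst.findtext("pb:instantiationIdentifier", namespaces=NS)
        size = inst.findtext("pb:instantiationFileSize", namespaces=NS)
        if name:
            has_size = bool(size) and size.strip().isdigit()
            manifest[name.strip()] = int(size) if has_size else None
    return manifest


def compare_with_files(manifest, folder):
    """Return file name -> status string for every file in the manifest."""
    results = {}
    for name, recorded in manifest.items():
        path = Path(folder) / name
        if not path.is_file():
            results[name] = "missing"
        elif recorded is None:
            results[name] = "size not recorded"
        elif path.stat().st_size != recorded:
            actual = path.stat().st_size
            results[name] = f"size mismatch (record {recorded}, file {actual})"
        else:
            results[name] = "ok"
    return results
```

Files present in the folder but absent from the record are not reported by this sketch; a set difference between the folder listing and the manifest keys would cover that case.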
Updated PBCore Validator
Continue the work on last year's PBCore Tools project, incorporating PBCore 2.1 and improving documentation and usability.
ffmprovisr
Team Name: ffmprovsr / ffmpegged
Project Objective: To facilitate better understanding of ffmpeg through collaborative sharing of useful scripts and detailed flag-level description of how each script works so archivists can copy-paste and produce their own scripts but also understand how and why they work.
OK OK OK, just putting out some feelers here. I worked on ffmpeg documentation during the first hack day and last year I hastily built an app that exists as a guide/command line generator for ffmpeg and I think it'd be fun to combine and continue to build up these two projects into something better because ffmpeg continues to live on as a mysterious but necessary component of a/v archival practice. This project would be mostly R&D with some basic front-end web development skills (building forms). I feel this is a little out of the scope of hack day (and those greedy for rewards may seek refuge elsewhere) in that it's more of a REMIX project and a mostly-hack-the-docs-with-some-coding project, but if there is interest (there was last year, for ffmprovisr) -- we will build the hell outta this! -- @ablwr
Members: Ashley Blewer, Rebecca Fraimow, Rebecca, Reto Kromer, Jonathan Farbowitz, Catroina, Ben Turkus, Kelly Haydon, Sam-the-DPF-winna, Eddy Colloton, Nicole Martin
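As an illustration of the flag-level description the project objective talks about, here is a small Python sketch that stores one ffmpeg recipe as (fragment, explanation) pairs, so the copy-paste command and its annotations always come from the same source. The recipe itself (an H.264/AAC access copy) and the file names are just an example, not taken from the project; the ffmpeg options shown (`-i`, `-c:v`, `-pix_fmt`, `-c:a`) are standard ones.

```python
# Each entry pairs a fragment of the command with a plain-language note.
RECIPE = [
    ("ffmpeg", "start the ffmpeg program"),
    ("-i input.mov", "path of the source file (placeholder name)"),
    ("-c:v libx264", "encode the video stream with the libx264 H.264 encoder"),
    ("-pix_fmt yuv420p", "use 4:2:0 chroma subsampling, which most players expect"),
    ("-c:a aac", "encode the audio stream as AAC"),
    ("output.mp4", "path of the new access copy (placeholder name)"),
]


def command(recipe):
    """Join the fragments into a single copy-paste command line."""
    return " ".join(fragment for fragment, _ in recipe)


def explain(recipe):
    """Return one aligned 'fragment  note' line per entry."""
    width = max(len(fragment) for fragment, _ in recipe)
    return [f"{fragment.ljust(width)}  {note}" for fragment, note in recipe]
```

Printing `command(RECIPE)` followed by the lines from `explain(RECIPE)` gives a script an archivist can run alongside a breakdown of why each flag is there.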
HACK THE DOCS: OAIS Edit-a-thon
Team Name: OIA-YES
Project Objective: To review and evaluate (from an AV perspective) the OAIS Reference Model with the aim to contribute to the DPC-hosted wiki evaluating it. We identify areas of the standard that could be improved upon, clarified, or expanded to better encompass moving images and other complex digital objects.
Project Proposal: I'd like to propose an OAIS review/revise-a-thon for Hack Day, wherein a group of us could contribute to the OAIS Community Forum Wiki hosted by the DPC.
Paul Wheatley has been updating the active topics for discussion page, and this is a great place for any participants to get involved, and respond to some of the latest proposals and points for discussion. They can of course also start new topics and add them here:
Since its approval in 2002 as an ISO standard (14721), the OAIS reference model has become a--if not the--foundational text for the majority of digital preservation research and resource development. Since then the digital preservation community has grown significantly, sparking an expanded understanding of what precisely constitutes "digital preservation". The Digital Preservation Coalition has responded to these shifts by issuing an open call to "review and reform" the OAIS standard in advance of its upcoming ISO review in 2017. Contributing to this process gives us a unique opportunity to ensure our voices/concerns are heard as moving image archivists and make an impact on OAIS' next iteration.
-- I'm interested in participating in this project (remotely): @kvanmalssen
Members: Erwin Verbruggen, Julia Kim
HACK THE DOCS: AMPAS Film & TV Wikipedia Updates
— Michelle Roell
Edit-a-thon specifically about Film/TV at AMPAS. Perhaps some of these topics are relevant?