AMIA/DLF Hack Day 2016!!!

SIGN UP to participate!

>>> When, Where, What time?

Date: Wednesday, November 9, 2016
Time: ~9am-5pm (with the option of continuing work on projects throughout the conference in our Hack Day Lounge, the Riverboat room on the William Penn Level)

---Fri, November 11 at 6:30pm (present projects to judges in Riverboat)

---Sat, November 12 at 4:45pm (prizes! in Allegheny)

Location: Omni William Penn, Pittsburgh, PA
hashtag: #AVhack16
github account: AMIA Open Source github
Project sign up
NEW Slack channel this year for project collaboration & remote participation: avhack.slack.com. If you aren't already signed up and want to be, send an email to kara at avpreserve dot com.
IRC: #curatecamp_avpres_1. If you are using an IRC client, the server is chat.freenode.net; otherwise, you can connect through your browser at webchat.freenode.net. If you are unfamiliar with IRC, take a look at this ☞ brief introduction.
Codes of Conduct: AMIA Code of Conduct and DLF Code of Conduct. Please review both codes and direct any questions about them to the event organizers, or use the #AVHack16 hashtag.

How do I sign up/participate?

Sign up! As this will be a highly participatory event, registration is limited to those willing to get their hands dirty, so no onlookers please.

REMOTE PARTICIPATION will be available this year! Check in using the IRC channel and stay tuned to sign up for a project. You can then work offline/independently with your group in whatever way is easiest (video chat, chat, etc).

If you are unsure whether you can or want to participate in the hack day itself, you can still see the results by attending the AMIA closing plenary, where hack day projects will be presented, and the audience will have an opportunity to vote on their favorites.

What will be the format of the event?

Before the hack day, project ideas and edit-a-thon topics will be collected through the registration form and the event wiki, and participants will review and discuss the submitted ideas. On the day, we'll break into groups of technologists, practitioners, and Wikipedia editors, each selecting an idea or topic(s) to work on together for the day and (if desired) throughout the AMIA conference in the Hack Day Lounge.

The day itself will be structured something like this. Coffee/tea will be provided. Lunch is on your own.

9am – Welcome, introductions

9:30am–noon – Hacking & doc editing. Coffee and minimal snacks.

Noon–1pm – Lunch on your own.

1pm–4:30pm – Hacking & doc editing. Coffee and minimal snacks.

4:30pm–5pm – Wrap up.

Judging & presenting projects

At least one representative from each project group needs to be present for the demonstration of Hack Day projects to jurors on Friday, November 11 at 6:30pm in the Riverboat room. Judging typically lasts approximately 1.5–2 hours. Snacks and drinks are encouraged.

Projects will be presented to attendees in a session on Saturday, November 12 at 4:45pm in Allegheny, where the jury will announce winners and attendees will vote on their favorite project.

Summary

In association with its annual conference, the Association of Moving Image Archivists will host its 4th annual hack day on November 9th in Pittsburgh, PA. The event will be a unique opportunity for practitioners and managers of digital audiovisual collections to join with developers and engineers for an intense day of collaboration to develop solutions for digital audiovisual preservation and access. This year, we will also hold a concurrent Wikipedia Edit-a-thon[1] for those interested in adding to the pool of knowledge about audiovisual preservation and access. It will be fun and practical.

AMIA is once again thrilled to partner with the Digital Library Federation in organizing the hack day.

What if I’m not a developer?

Content managers and preservation practitioners are as central to the success of the event as having keen developers. YOU will be responsible for setting the agenda and the outcomes. The goal is to foster collaboration between audiovisual preservation specialists and technologists, to solve problems together and share expertise.

Background

What is a hack day?

A hack day or hackathon is an event that brings together computer technologists and practitioners for an intense period of problem solving through computer programming. Within digital preservation and curation communities, hack days provide an opportunity for archivists, collection managers, and others to work together with technologists to develop software solutions for digital collections management needs. Hack days have been held independently by groups such as the Open Planets Foundation, as well as in association with preservation and access oriented conferences including Open Repositories and Museums and the Web.

The manifesto of a recent event at the Open Repositories conference framed the benefits this way: “Transparent, fun, open collaboration in diversely constituted teams...The creation of new professional networks over the ossification of old ones. Effective engagement of non-developers (researchers, repository managers) in development...Work done at the conference over presentation of something prepared earlier.”

Our Manifesto

Transparent, fun, open collaboration in diversely constituted teams over individual brilliance and/or groups of like individuals in cut-throat competition.
The creation of new professional networks over the ossification of old ones.
Effective engagement of non-developers (researchers, repository managers) in development over purely developer-driven projects.
Work done at the conference over presentation of something prepared earlier (i.e., not a project you already work on in your day job).

Hack Day Projects

Please update this list with your name and a summary of a project idea or problem you'd like to solve. Below are the loose project ideas proposed so far. If you have a new project idea, or are interested in one of the project stubs below, sign up for a wiki login and add your thoughtful comments or possible starting points to the proposal, or contact the proposer via Twitter.

Need inspiration? Check out last year's projects.

DOCUMENTATION: WikiData for Digital Preservation

🏆 2016 WINNER of BEST FULFILLMENT OF HACK DAY MANIFESTO!

Join the effort to make Wikidata useful for digital preservation by using it to describe software, file formats, virtual and emulated computing environments, and the hardware that is virtualized or emulated in those environments.

Project: Data Curation File Formats

Inspiration: Wikidata as a digital preservation knowledgebase

Here is a link to the WikiProject about file formats: https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/File_formats

PRONOM-to-Wikidata property mapping: https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/File_formats/PRONOM

This query [2] returns 905 file types; we could narrow these down to the formats whose MIME type begins with audio/ or video/.
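One way to do that narrowing is a prefix check on the MIME type. This sketch assumes the query results have been exported as (format label, MIME type) pairs, with a missing MIME type represented as None; the function name and sample rows are hypothetical placeholders, not actual query output.

```python
from typing import Iterable, List, Optional, Tuple

# Each row is (format label, MIME type); the MIME type may be missing (None).
Row = Tuple[str, Optional[str]]

AV_PREFIXES = ("audio/", "video/")


def audiovisual_formats(rows: Iterable[Row]) -> List[Row]:
    """Keep only rows whose MIME type starts with audio/ or video/."""
    kept = []
    for label, mime in rows:
        # MIME types are case-insensitive, so compare in lowercase.
        if mime and mime.strip().lower().startswith(AV_PREFIXES):
            kept.append((label, mime))
    return kept


if __name__ == "__main__":
    # Illustrative sample rows, not real query results.
    sample = [
        ("Matroska", "video/x-matroska"),
        ("WAVE", "audio/wav"),
        ("PDF", "application/pdf"),
        ("Unknown format", None),
    ]
    for label, mime in audiovisual_formats(sample):
        print(f"{label}\t{mime}")
```

Rows without a MIME type are dropped here. Those formats might still be audiovisual, so they could be worth reviewing by hand.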

Participants

Shira Peltzman (speltzman)
Sarah Romkey (sromkey)
Erwin Verbruggen (verwinv)
Kate Barbera (kmbarbera)
Jana Grazley (Jgrazley)
Katherine Thornton - remote (YULdigitalpreservation)

Results

Google doc with more working notes & links [3]

We decided that the right place to start would be to compare the data points in the best-known and most-used format registries (namely PRONOM and LoCFDD), crosswalking them against each other and against the existing format descriptions in Wikidata. Our efforts are being added to the Wikidata File Formats project pages and are also captured in this Google Doc: [4]

There is a process in place for proposing new properties in Wikidata, so we have made recommendations that the larger project group can consider. For example, some existing properties map well to data kept in PRONOM or LoCFDD, while other properties could be added. We felt that some data points are out of scope for a resource like Wikidata.
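A crosswalk like this becomes easier to review when each registry field is recorded with one of the three outcomes described above: an existing property, a proposed new property, or out of scope. The sketch below assumes that record shape. The class and function names, the field names, and the mapping decisions in the sample are hypothetical placeholders, not the team's actual recommendations.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# The three possible outcomes for a registry field in the crosswalk.
EXISTING = "existing property"
PROPOSED = "propose new property"
OUT_OF_SCOPE = "out of scope"


@dataclass(frozen=True)
class Mapping:
    registry: str                   # e.g. "PRONOM" or "LoCFDD"
    field: str                      # field name in the source registry
    outcome: str                    # one of EXISTING, PROPOSED, OUT_OF_SCOPE
    wikidata: Optional[str] = None  # label of the matching Wikidata property, if any


def group_by_outcome(mappings: List[Mapping]) -> Dict[str, List[str]]:
    """Group 'registry:field' names by crosswalk outcome for review."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for m in mappings:
        if m.outcome not in (EXISTING, PROPOSED, OUT_OF_SCOPE):
            raise ValueError(f"unknown outcome for {m.field}: {m.outcome}")
        groups[m.outcome].append(f"{m.registry}:{m.field}")
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    rows = [
        Mapping("PRONOM", "MIME type", EXISTING, "MIME type"),
        Mapping("LoCFDD", "sustainability factors", PROPOSED),
        Mapping("PRONOM", "internal signature byte sequence", OUT_OF_SCOPE),
    ]
    for outcome, fields in group_by_outcome(rows).items():
        print(outcome, fields)
```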

DOCUMENTATION: Loggr | Artifact Logging Environmental Scan/Recommendations

-- Kathryn Gronsbell (h/t to Kelly Haydon for the idea), combined with Charles Hosale's AVAA idea | Standards, literature, and resource review (and recommendations?) for A/V artifact logging language and format. Documentation practices vary enormously between organizations, between vendors, and even between QA/QC workflows within a single organization. The results could be added to the AV Artifact Atlas.

Team Loggr includes: Kathryn Gronsbell, Charles Hosale, Savannah Campbell, Ethan Gates, Kristin MacDonough, Erica Titkemeyer, Ben Turkus. Special thanks to Kelly Haydon, Lisa Barrier.

Team Loggr is surveying anonymized audiovisual artifact logging samples from vendors and collecting organizations that show how inconsistent error and artifact metadata can be. We will create a template for logging artifacts that outputs simple, clear, delimited data and draws its vocabulary from community resources like the AV Artifact Atlas. This data could be used to prioritize QC procedures.

Problem: Practices for logging artifacts in audiovisual material are inconsistent, so errors and artifacts cannot be parsed (and therefore reported on) across collections. We gathered sample QC and transfer records and separated out the language describing artifact, frequency, severity, and duration to create a prototype.

Prototype: Basic schema and template for creating delimited, parseable records (e.g. in CSV) that represent the basic profile of an audiovisual artifact. This includes:

- Artifact type (name)
- Artifact severity (how much the issue impacts the clarity of the signal)
- Artifact frequency (how often the issue occurs in a given file)

The simplified Loggr template recommendation aims to:

- use authoritative vocabularies to describe artifact characteristics (e.g. AV Artifact Atlas)
- increase clarity of communication between metadata source and reviewer
- streamline delimited information into parseable fields

Expected outcome: By adopting the 3-element model, organizations could use the resulting information to prioritize quality control and assurance procedures. For example, two files with the same artifact type (“Ghosting”) could receive different priority ratings depending on the severity and frequency of the artifact in each.
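The priority idea can be sketched as a score that combines severity and frequency. Several details are assumptions for illustration only: the `priority` function, the 1–5 severity scale, the frequency terms and their weights, and the rule of multiplying the two. The draft schema specifies only a numeric severity scale and a controlled frequency list.

```python
# Hypothetical controlled frequency list mapped to weights (assumption).
FREQUENCY_WEIGHTS = {
    "once": 1,
    "intermittent": 2,
    "frequent": 3,
    "continuous": 4,
}


def priority(severity: int, frequency: str) -> int:
    """Return a QC priority score; a higher score means review sooner.

    Assumes severity is on a 1-5 scale (1 = barely noticeable, 5 = severe).
    """
    if not 1 <= severity <= 5:
        raise ValueError(f"severity out of range: {severity}")
    try:
        weight = FREQUENCY_WEIGHTS[frequency.lower()]
    except KeyError:
        raise ValueError(f"frequency not in controlled list: {frequency}") from None
    return severity * weight


if __name__ == "__main__":
    # Same artifact type, different priorities.
    print("Ghosting, severity 2, once:", priority(2, "once"))
    print("Ghosting, severity 4, continuous:", priority(4, "continuous"))
```

A multiplicative score is the simplest choice that lets either factor raise the priority. An organization might instead use a lookup table if certain combinations should always be reviewed first.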

DRAFT ARTIFACT SCHEMA

Artifact

Type <controlled list>

Severity <numeric scale>

Frequency <controlled list>
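One way to turn the draft schema into delimited, parseable records is to write one CSV row per logged artifact. The sketch below makes several assumptions: the `file` column, the sample vocabulary terms, the 1–5 severity range, and the names `Artifact`, `to_csv`, and `from_csv`. In practice, the type list would be drawn from a community source such as the AV Artifact Atlas.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Placeholder controlled lists (assumptions). A real template would draw
# artifact types from a community vocabulary such as the AV Artifact Atlas.
ARTIFACT_TYPES = {"Ghosting", "Dropout", "Head Clog", "Tracking Error"}
FREQUENCIES = {"once", "intermittent", "frequent", "continuous"}
SEVERITY_SCALE = range(1, 6)  # hypothetical 1-5 numeric scale

# CSV columns: the three schema elements plus the file they were logged in.
FIELDS = ["file", "type", "severity", "frequency"]


@dataclass
class Artifact:
    file: str
    artifact_type: str
    severity: int
    frequency: str

    def validate(self) -> None:
        """Raise ValueError if any element falls outside its list or scale."""
        if self.artifact_type not in ARTIFACT_TYPES:
            raise ValueError(f"unknown artifact type: {self.artifact_type}")
        if self.severity not in SEVERITY_SCALE:
            raise ValueError(f"severity out of range: {self.severity}")
        if self.frequency not in FREQUENCIES:
            raise ValueError(f"unknown frequency: {self.frequency}")


def to_csv(artifacts: List[Artifact]) -> str:
    """Validate each record and write all of them as CSV text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for a in artifacts:
        a.validate()
        writer.writerow({"file": a.file, "type": a.artifact_type,
                         "severity": a.severity, "frequency": a.frequency})
    return buf.getvalue()


def from_csv(text: str) -> List[Artifact]:
    """Parse CSV text back into validated Artifact records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        a = Artifact(row["file"], row["type"], int(row["severity"]), row["frequency"])
        a.validate()
        records.append(a)
    return records


if __name__ == "__main__":
    log = [Artifact("tape_001.mkv", "Ghosting", 2, "intermittent")]
    text = to_csv(log)
    print(text, end="")
    assert from_csv(text) == log  # the record survives a round trip
```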

Possible future steps

1. Expand the Loggr vocabulary on the AV Artifact Atlas (create a dynamic feed?)
2. Create a simplified submission process for the AV Artifact Atlas
3. Aid easy identification of video artifacts by streamlining/bolstering the AV Artifact Atlas according to identified needs:

- Choose hosting/repository for submitted sample files
- Identify place where tasks can be assigned or logged
- Identify best mechanism to review and discuss sample files
- Enable faceted search of non-jargon keywords to create alternate access point for general audiences

Linked Film Description Framework

Participants:

Edward Anderson @anderson_edw
David Newbury @workergnome
Kara Van Malssen @kvanmalssen

Description: A linked open data driven web resource for facilitating retrieval of descriptive content about film titles from a variety of web resources. Descriptions include synopses, reviews, and marketing materials. This site is intended to support user stories such as: "As a content provider, I would like to find a context appropriate description of a film based on my user's need and platform."

Features include:

Autocomplete search on film titles
Return tombstone metadata for each film including title, director, year of release, country of origin
Using resources about a given film listed in WikiData, extract and display relevant descriptions in real time
Display the type (e.g. Synopsis, Review) and source (e.g. IMDB) for each description
Display descriptions in multiple languages
Dynamically generate a linked data graph of results, using schema.org and other standard vocabularies
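As an illustration of the graph-generation feature, the results could be emitted as JSON-LD using schema.org's Movie type. The sketch assumes the tombstone fields and descriptions have already been retrieved. The `film_graph` function, its input shape, and the sample values are hypothetical. The property choices (director, datePublished, countryOfOrigin, description, review) are one plausible mapping, not the project's actual output.

```python
import json
from typing import Any, Dict, List


def film_graph(film: Dict[str, Any]) -> Dict[str, Any]:
    """Build a schema.org Movie node (JSON-LD) from a film record.

    Hypothetical input keys: title, director, year, country, and
    descriptions (a list of dicts with type, source, lang, text).
    """
    node: Dict[str, Any] = {
        "@context": "https://schema.org",
        "@type": "Movie",
        "name": film["title"],
        "director": {"@type": "Person", "name": film["director"]},
        "datePublished": str(film["year"]),
        "countryOfOrigin": {"@type": "Country", "name": film["country"]},
    }
    synopses: List[Dict[str, str]] = []
    reviews: List[Dict[str, Any]] = []
    for d in film.get("descriptions", []):
        if d["type"] == "Synopsis":
            # Language-tagged values let several languages coexist.
            synopses.append({"@value": d["text"], "@language": d["lang"]})
        elif d["type"] == "Review":
            reviews.append({
                "@type": "Review",
                "reviewBody": d["text"],
                "inLanguage": d["lang"],
                "publisher": {"@type": "Organization", "name": d["source"]},
            })
    if synopses:
        node["description"] = synopses
    if reviews:
        node["review"] = reviews
    return node


if __name__ == "__main__":
    # Hypothetical record for illustration; not real BFI data.
    sample = {
        "title": "Example Film",
        "director": "Jane Doe",
        "year": 1958,
        "country": "United Kingdom",
        "descriptions": [
            {"type": "Synopsis", "source": "Example Archive", "lang": "en",
             "text": "A short plot summary."},
            {"type": "Synopsis", "source": "Example Archive", "lang": "fr",
             "text": "Un court résumé."},
            {"type": "Review", "source": "Example Weekly", "lang": "en",
             "text": "A glowing review."},
        ],
    }
    print(json.dumps(film_graph(sample), indent=2, ensure_ascii=False))
```

In this mapping, synopsis sources are not carried through. A fuller version might model each description as its own node so that both its type and its source survive.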

This demo instance uses a test dataset from the British Film Institute.

In the future, additional functionality can be built into the tool to display descriptions appropriate to specific platforms and devices (e.g. short description for Apple TV).
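Platform-aware selection could be as simple as picking the longest description of the requested type and language that fits a platform's character budget. The `pick_description` function, the platform names, and the character limits below are all invented for illustration; they are not published limits for Apple TV or any other device.

```python
from typing import Dict, List, Optional

# Hypothetical character budgets per platform (illustrative only).
PLATFORM_LIMITS: Dict[str, int] = {
    "tv": 200,      # short blurb for a TV interface
    "mobile": 400,
    "web": 2000,
}


def pick_description(descriptions: List[Dict[str, str]], platform: str,
                     desc_type: str = "Synopsis",
                     lang: str = "en") -> Optional[Dict[str, str]]:
    """Return the longest matching description that fits the platform budget."""
    limit = PLATFORM_LIMITS[platform]
    candidates = [
        d for d in descriptions
        if d["type"] == desc_type and d["lang"] == lang and len(d["text"]) <= limit
    ]
    # Prefer the most detailed description that still fits; None if nothing does.
    return max(candidates, key=lambda d: len(d["text"]), default=None)
```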

Checksumthing: A Checksum Crosswalk Python Script #checksumthing

🏆 2016 WINNER of BEST SOLUTION TO THE STATED PROBLEM!

--Proposed by Morgan Morel | Ultimate checksum script! A crosswalk for different kinds of checksum sidecar files. The GitHub repo for the project is at checksumthing

Project Summary

Different pieces of software used to produce checksums create sidecar files (typically files ending in .md5 or .sha1) with very different formatting, which creates headaches for archivists (we love standards and loathe disorder). Checksumthing is a Python script that addresses this problem by allowing users to transform the data inside sidecar files into whichever standardized format is most convenient for them. For example, you can:

Append text before or after the checksum value (like the path to the file or the filename)
Change the checksum text to all caps or all lowercase
Search for checksum files and transform files through a nested directory structure

Checksumthing currently supports MD5, SHA1, and SHA256 checksums. Right now the program only supports plaintext sidecar files; in the future, we hope to support CSV and other file types.
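The core idea of such a crosswalk is to parse whichever sidecar layout is present and re-emit it in one chosen layout. The sketch handles two common plaintext layouts: the GNU coreutils style (`<digest>  <filename>`) and the BSD style (`MD5 (<filename>) = <digest>`). It infers the algorithm from the hex digest length. This is an independent illustration, not Checksumthing's actual code, and the function names are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Iterator, Optional, Tuple

# Hex digest length -> algorithm name.
ALGORITHMS = {32: "MD5", 40: "SHA1", 64: "SHA256"}

# GNU coreutils style: "<digest>  <filename>" (a '*' instead of the second
# space marks binary mode).
GNU_LINE = re.compile(r"^([0-9A-Fa-f]+) [ *](.+)$")
# BSD / --tag style: "MD5 (<filename>) = <digest>".
BSD_LINE = re.compile(r"^(MD5|SHA1|SHA256) \((.+)\) = ([0-9A-Fa-f]+)$")


def parse_sidecar_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (algorithm, filename, digest), or None if the line is not recognized."""
    line = line.strip()
    bsd = BSD_LINE.match(line)
    if bsd:
        algo, name, digest = bsd.groups()
    else:
        gnu = GNU_LINE.match(line)
        if not gnu:
            return None
        digest, name = gnu.groups()
        algo = ALGORITHMS.get(len(digest))
    # Reject digests whose length doesn't fit a supported (or the named) algorithm.
    if algo is None or ALGORITHMS.get(len(digest)) != algo:
        return None
    return algo, name, digest


def normalize_line(line: str, uppercase: bool = False,
                   name_first: bool = False, sep: str = "  ") -> Optional[str]:
    """Re-emit a sidecar line in one chosen layout, or None if unparseable."""
    parsed = parse_sidecar_line(line)
    if parsed is None:
        return None
    _, name, digest = parsed
    digest = digest.upper() if uppercase else digest.lower()
    return f"{name}{sep}{digest}" if name_first else f"{digest}{sep}{name}"


def find_sidecars(root: Path) -> Iterator[Path]:
    """Yield .md5, .sha1 and .sha256 sidecar files anywhere under root."""
    for pattern in ("*.md5", "*.sha1", "*.sha256"):
        yield from root.rglob(pattern)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: normalize_sidecars.py DIRECTORY")
    else:
        for sidecar in find_sidecars(Path(sys.argv[1])):
            for raw in sidecar.read_text(encoding="utf-8").splitlines():
                out = normalize_line(raw)
                if out is not None:
                    print(f"{sidecar}: {out}")
```

The `name_first` and `sep` options cover the "text before or after the checksum" case, and `uppercase` covers the case change. `find_sidecars` walks a nested directory structure.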

Project Team Members

Morgan Morel
Jonathan Farbowitz
Joshua Ng
Henry Borchers
Reto Kromer
Crystal Sanchez

Project Hopper

See a project that you want to work on? The projects below have been suggested by folks but have no current captain. They are up for grabs!

NLE Color Correction Presets

--Dino Everett | Shareable color correction presets for Adobe Premiere, Avid, Final Cut, etc., for red-faded film, which small archives can use as a starting point to tweak and color-correct films in their collections when they don't have a color timer on staff.

Refactoring code in ltopers

--Reto Kromer | Code refactoring of ltopers on amiaopensource

The main idea is to add a script for data migration (see [5]). This may be considered for 2017, but it requires at least one LTO deck on-site for testing purposes.

RGB integration for FFV1

--Reto Kromer | Integration of tools for the new RGB 16-bit capabilities of FFV1.

Improve vrecord interface (COMPLETED)

--Savannah Campbell | ~~Add audio level monitors to vrecord interface.~~ Since this idea was first proposed, audio level monitors have been added to vrecord.

Audio calibration tools

--Andrew Weaver | Work on digital tools for live audio signal analysis to aid with machine calibration and QC (possibly an ffplay-style interface with filters such as spectrum, Lissajous, etc.).
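A basic building block for such a tool is computing peak and RMS levels for a block of samples. The sketch assumes floating-point samples normalized so that full scale is 1.0, and it uses a generated 1 kHz test tone instead of live input. The function names are hypothetical. Levels are reported in dBFS using the plain 20·log10 convention, under which a full-scale sine reads about -3.01 dB RMS. Some meters, following AES17, instead calibrate RMS so that a full-scale sine reads 0 dBFS.

```python
import math
from typing import List, Sequence, Tuple


def to_dbfs(value: float) -> float:
    """Convert a linear level (full scale = 1.0) to dBFS; silence is -inf."""
    return 20 * math.log10(value) if value > 0 else float("-inf")


def levels_dbfs(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (peak, rms) levels in dBFS for one block of samples."""
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_dbfs(peak), to_dbfs(rms)


def sine(freq_hz: float, amplitude: float, sample_rate: int = 48000,
         seconds: float = 1.0) -> List[float]:
    """Generate a test tone standing in for live input."""
    n = int(sample_rate * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


if __name__ == "__main__":
    # 1 kHz at half amplitude: peak ~ -6.02 dBFS, RMS ~ -9.03 dBFS.
    peak_db, rms_db = levels_dbfs(sine(1000, 0.5))
    print(f"peak {peak_db:.2f} dBFS, rms {rms_db:.2f} dBFS")
```

For live use, the same function could be applied to successive short blocks from an audio input to drive a meter display.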

DOCUMENTATION: Improve Cable Bible

-- Ethan Gates | Add power and/or network cable documentation to The Cable Bible

DOCUMENTATION: Add preservation info to media format wikipedia pages

--Charles Hosale | Add preservation information (or links to preservation information) to Wikipedia pages about common media formats

DOCUMENTATION: OAIS comments

--Kara Van Malssen | The OAIS review period is in full swing right now. I'd like to pick up on this topic where it was left off last year, taking a look at the comments that have been submitted to date, and identifying some specific feedback on the standard from the AV community.

Matroska [TBD]

--Idea by Dave Rice | Matroska specification work

Web Archiving + Metadata

--Idea by Lorena Ramirez Lopez | DIY web archiving that embeds more metadata (an open-source alternative to Archive-It)
