Association of Moving Image Archivists & Digital Library Federation Hack Day 2014

From CURATEcamp
Revision as of 12:44, 8 October 2014 by Mark Bussey (talk | contribs) (Video characterization tool analyzer)

When, Where, What time?

  • Date: Wednesday, October 8, 2014
  • Time: ~9am-5pm (with option of continued work projects throughout the conference in our Developer Lounge TBA location)
  • Location: Hyatt Regency Savannah, Savannah Room
  • hashtag: #AVhack14
  • IRC: #curatecamp_avpres_1 If using an IRC client the server is chat.freenode.net, or you can use your browser and connect to webchat.freenode.net. If you are unfamiliar with IRC, take a look at this brief introduction.

How can I participate?

Sign up! As this will be a highly participatory event, registration is limited to those willing to get their hands dirty, so no onlookers please.

If you are unsure whether you can or want to participate in the hack day itself, you can still see the results by attending the AMIA closing plenary, where hack day projects will be presented, and the audience will have an opportunity to vote on their favorites.

What will be the format of the event?

In advance of the hack day, project ideas and edit-a-thon topics will be collected through the registration form and the event wiki, and participants will review and discuss the submitted ideas. We’ll then break into groups of technologists, practitioners, and Wikipedia editors, each selecting an idea or topic(s) to work on together for the day and (if desired) throughout the AMIA conference in the developers lounge.

The day itself will be structured something like this. Coffee/tea will be provided. Lunch is on your own.

9am – Welcome, introductions

9:30 - noon - Hacking & Wikipedia editing. Snacks and coffee to be served.

Noon-1pm – Lunch on your own.

1 - 4:30 - Hacking & Wikipedia editing. Snacks and coffee will be served.

4:30 - 5 - Wrap up.

Closing plenary & prizes

Projects will be presented towards the end of the conference. Projects will be judged by a panel as well as by conference attendees.

Summary

In association with its annual conference, the Association of Moving Image Archivists will host its 2nd annual hack day on October 8, 2014 in Savannah, GA. The event will be a unique opportunity for practitioners and managers of digital audiovisual collections to join with developers and engineers for an intense day of collaboration to develop solutions for digital audiovisual preservation and access. This year, we will be holding a concurrent Wikipedia Edit-a-thon[1] for those interested in adding to the knowledge pool about audiovisual preservation and access. It will be fun and practical.

AMIA is again partnering with the Digital Library Federation in organizing the hack day. A robust and diverse community of practitioners who advance research, teaching and learning through the application of digital library research, technology and services, DLF brings years of experience creating and hosting events designed to foster collaboration and develop shared solutions for common challenges.

What if I’m not a developer?

Content managers and preservation practitioners are as central to the success of the event as having keen developers. YOU will be responsible for setting the agenda and the outcomes. The goal is to foster collaboration between audiovisual preservation specialists and technologists, to solve problems together and share expertise.

The day will also include a Wikipedia Edit-a-thon. So even if you're not a developer and don't feel compelled to lend your digital preservation ideas to software and code development, you can contribute by creating or updating content on Wikipedia for the benefit of our community! You can read all about Wikipedia edit-a-thon events here.

Background

What is a hack day?

A hack day or hackathon is an event that brings together computer technologists and practitioners for an intense period of problem solving through computer programming. Within digital preservation and curation communities, hack days provide an opportunity for archivists, collection managers, and others to work together with technologists to develop software solutions for digital collections management needs. Hack days have been held independently by groups such as the Open Planets Foundation, as well as in association with preservation and access oriented conferences including Open Repositories and Museums and the Web.

The manifesto of a recent event at the Open Repositories conference framed the benefits this way: “Transparent, fun, open collaboration in diversely constituted teams...The creation of new professional networks over the ossification of old ones. Effective engagement of non-developers (researchers, repository managers) in development...Work done at the conference over presentation of something prepared earlier.”

Our Manifesto

Manifesto:

  • Transparent, fun, open collaboration in diversely constituted teams over individual brilliance and/or groups of like individuals in cut-throat competition.
  • The creation of new professional networks over the ossification of old ones
  • Effective engagement of non-developers (researchers, repository managers) in development over purely developer driven projects.
  • Work done at the conference over presentation of something prepared earlier (meaning not working on a project you are working on during your day job)

Hack Day Project proposals

Below are loose ideas for projects to hack on! If you're interested in one of the project stubs below, sign up for a wiki login and add your thoughtful comments or possible starting points to the proposal, or contact the proposer via twitter or email. As the Hack Day approaches, we'll brainstorm further and consolidate like-minded projects.

Hacking on video capture via ffmpeg + qctools + decklink sdk

  •  @dericed
  • Interested? Your name + any comments/initial ideas
    • Ashley Blewer! - @ablwr -- dericed is my hero so I will follow him to the ends of the earth (and also hack video capture and purple dinosaurs)

PBCore XML Record Generator (data submitted via a form, which spits out PBCore XML)? Updated PBCore Record Validator?

  • casey_davis [at] wgbh [dawt] org @CaseyEDavis1
  • Interested? Your name + any comments/initial ideas
  • Crystal Sanchez- I would love to work on this- I think if we could have a dropdown with a few kinds of XML to spit out- that would be great. I am looking for XMP data in addition to PBCore and I can bring a template of fields. (we support XMP mapping for ingesting video to our DAMS). What other XML schemas? and a tool like this does not exist yet?- maybe even add Exiftool in the mix here to optionally write to supported file types?
  • Mark Bussey - have you looked at extracting the code from HydraDAM xml exporter? I'd be happy to take a stab at this this week - more generally have you looked at Oxygen?
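As a starting point for discussion, here is a minimal sketch of the form-to-XML step in Python. It assumes the form arrives as a plain dictionary; the function name `build_pbcore_record` and the dictionary keys (`identifier`, `identifier_source`, `title`, `description`) are hypothetical, and the output is not validated against the PBCore schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by PBCore 2.x documents.
PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"


def build_pbcore_record(form):
    """Turn a dict of submitted form values into a PBCore XML string.

    The dict keys are hypothetical names for form fields.
    """
    # Serialize PBCore as the default namespace (no prefix on elements).
    ET.register_namespace("", PBCORE_NS)
    root = ET.Element(f"{{{PBCORE_NS}}}pbcoreDescriptionDocument")

    def add(tag, text, **attrs):
        child = ET.SubElement(root, f"{{{PBCORE_NS}}}{tag}", attrs)
        child.text = text
        return child

    # Identifier, title, description: written in this order.
    add("pbcoreIdentifier", form["identifier"],
        source=form.get("identifier_source", "local"))
    add("pbcoreTitle", form["title"])
    add("pbcoreDescription", form.get("description", ""))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_pbcore_record({
        "identifier": "example-0001",
        "title": "Example program",
        "description": "Placeholder description typed into the form.",
    }))
```

Supporting a dropdown of output schemas, as suggested above, could mean one such builder function per schema behind a shared form.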

Development of a UUID (universally unique identifier - String or Number) system for moving image physical/digital elements

There will be a UUID registrar. The registrar server would hold the UUID and a pointer to metadata/item information. This would allow a wide range of possible usages, from access information to relational trees. Because we do not want to rule out assignment for elements where there is no internet access, there will be a system similar to MAC addresses/UPCs, where a registered archivist/lab/individual could be given UUID blocks for assignment offline and then register them later online without collision. The UUID could be made into a 1D or 2D bar code or human-readable marking on the element for instant access through the server pointer to metadata on the content and physical item.

  • tommy [at!] videofilmsolutions [dawt] com @VideoFilmSol
  • Interested? Your name + any comments/initial ideas
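To make the block-assignment idea concrete, here is a rough Python sketch of a registrar that reserves a block of identifiers for an owner and accepts registrations later. The `Registrar` class, its method names, and the example URL are all hypothetical, and everything is held in memory; a real registrar would need persistent storage and authentication.

```python
import uuid


class Registrar:
    """In-memory stand-in for the proposed registrar server (hypothetical)."""

    def __init__(self):
        self._reserved = {}  # UUID -> owner the block was issued to
        self._records = {}   # UUID -> pointer to metadata/item information

    def issue_block(self, owner, size):
        """Reserve `size` fresh UUIDs for offline assignment by `owner`."""
        block = [uuid.uuid4() for _ in range(size)]
        for item_id in block:
            self._reserved[item_id] = owner
        return block

    def register(self, owner, item_id, pointer):
        """Attach a metadata pointer to a UUID that was issued to `owner`."""
        if self._reserved.get(item_id) != owner:
            raise PermissionError("UUID was not issued to this owner")
        if item_id in self._records:
            raise ValueError("UUID is already registered")
        self._records[item_id] = pointer

    def resolve(self, item_id):
        """Return the metadata pointer for a registered UUID."""
        return self._records[item_id]


if __name__ == "__main__":
    registrar = Registrar()
    block = registrar.issue_block("lab-a", 3)           # done while online
    reel_id = block[0]                                  # assigned offline
    registrar.register("lab-a", reel_id, "https://example.org/items/1")
    print(reel_id, "->", registrar.resolve(reel_id))
```

Because the registrar reserves every issued UUID up front, a later registration cannot collide with an identifier handed to someone else.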


Video characterization tool analyzer

I would like to continue working on a project I began at the 2014 Open Repositories conference, a video characterization tool analyzer. The tool runs multiple command line video characterization applications on a given file/set of files and outputs the results in a format that is easy for comparative analysis. The aim of the tool is to identify differences in the outputs of these common applications, with the goal of submitting reports to their developers and eventually improving them. Read about the work started at OR2014

  • @kvanmalssen
  • Interested? Your name + any comments/initial ideas
  • Mark Bussey: in the Hydra community we use [2] as a wrapper for FITS, FFMPEG, and other tools of your choosing - if you're into ruby & rails, it might be interesting. The only Hydra dependency is that it's in the name and being maintained by the community, but you don't really need to be running Hydra to use it. There are a couple of repos that are using it in production and beginning to amass some technical metadata generated this way.
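The comparison step could look something like the following Python sketch. It assumes each tool's output has already been parsed and normalized into a flat dictionary of field names and string values, which is the hard part and is not shown; the function name, the tool labels, and the sample values are illustrative only, not real tool output.

```python
def find_disagreements(reports):
    """Return the fields on which the tools report different values.

    `reports` maps a tool name to a dict of {field: value}. A field that a
    tool does not report is recorded as None so that omissions show up too.
    """
    fields = sorted({field for report in reports.values() for field in report})
    disagreements = {}
    for field in fields:
        values = {tool: report.get(field) for tool, report in reports.items()}
        if len(set(values.values())) > 1:
            disagreements[field] = values
    return disagreements


if __name__ == "__main__":
    # Made-up values for illustration.
    sample = {
        "tool_a": {"width": "720", "height": "486", "frame_rate": "29.970"},
        "tool_b": {"width": "720", "height": "486", "frame_rate": "30000/1001"},
        "tool_c": {"width": "720", "height": "480"},
    }
    for field, values in find_disagreements(sample).items():
        print(field, values)
```

In the sample, `frame_rate` is flagged because the two tools use different notations for the same rate, so a real comparison would need a normalization step before this function to avoid false positives of that kind.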

Broadcast Wave header support/testing

  • Further investigation of software support for Broadcast Wave header information (Audacity customization?) – justinkovar [at] utexas [dawt] edu @KovarSound
  • Interested? Your name + any comments/initial ideas
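For testing what a piece of software actually wrote, it helps to read the `bext` chunk directly. The Python sketch below walks the RIFF chunks of a WAV file held in memory and decodes the fixed-size leading fields of the chunk (description, originator, originator reference, origination date and time, time reference, version) as laid out in EBU Tech 3285. The function names are hypothetical, and the UMID, loudness fields, and coding history are deliberately skipped.

```python
import struct

# Fixed-size leading fields of the bext chunk (little-endian, 348 bytes):
# Description, Originator, OriginatorReference, OriginationDate,
# OriginationTime, TimeReference (64-bit), Version.
BEXT_HEAD = struct.Struct("<256s32s32s10s8sQH")


def _text(raw):
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace")


def parse_bext(body):
    """Decode the leading fields of a bext chunk body."""
    if len(body) < BEXT_HEAD.size:
        raise ValueError("bext chunk is too short")
    desc, orig, ref, date, time, time_ref, version = BEXT_HEAD.unpack_from(body)
    return {
        "description": _text(desc),
        "originator": _text(orig),
        "originator_reference": _text(ref),
        "origination_date": _text(date),
        "origination_time": _text(time),
        "time_reference": time_ref,
        "version": version,
    }


def read_bext(data):
    """Find and parse the bext chunk in WAV bytes; return None if absent."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"bext":
            return parse_bext(data[pos + 8:pos + 8 + size])
        pos += 8 + size + (size & 1)  # chunks are padded to even length
    return None
```

Reading a file is then `read_bext(open(path, "rb").read())`, which is fine for a test harness but would load large files entirely into memory.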


Video thumbnail summaries as metadata

I've been interested in using video preview thumbnails as a way to provide summarized access to digitized video that is unlikely to get further description. You can read more about what I've done here: http://ronallo.com/blog/a-plugin-for-mediaelement-js-for-preview-thumbnails-on-hover-over-the-time-rail/ I could use help improving that JavaScript plugin or in turning the production of video thumbnails and the metadata track file into a service of some sort. I'm also happy to help as a developer on another project.

  • jronallo [at] gmail [dawt] com / @ronallo
  • Interested? Your name + any comments/initial ideas
    • Ashley Blewer! - @ablwr -- B-) Open Source Report Card once called me "a distinguished JavaScripter."
    • Nicholas Zoss - Servicizing the processing seems interesting. I'm interested in helping on this project as I'm able.
    • Jay Brown - sounds interesting and will help out as I can
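One piece of a thumbnail service would be generating the metadata track file. The Python sketch below writes a WebVTT track in which each cue points at a region of a sprite image using an `#xywh=` media fragment. Whether this matches the exact cue format the plugin expects is an assumption, and the function names, sprite layout, and file name are hypothetical.

```python
def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_thumbnail_track(duration, interval, sprite_url, thumb_w, thumb_h, columns):
    """Build a WebVTT track with one cue per thumbnail in a sprite sheet.

    Thumbnails are assumed to be laid out left to right, top to bottom,
    `columns` per row, one for every `interval` seconds of video.
    """
    if interval <= 0 or columns <= 0:
        raise ValueError("interval and columns must be positive")
    lines = ["WEBVTT", ""]
    index = 0
    start = 0.0
    while start < duration:
        end = min(start + interval, duration)
        x = (index % columns) * thumb_w
        y = (index // columns) * thumb_h
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(f"{sprite_url}#xywh={x},{y},{thumb_w},{thumb_h}")
        lines.append("")
        index += 1
        start = end
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_thumbnail_track(12, 5, "sprite.jpg", 160, 90, 2))
```

Producing the sprite image itself (for example with FFmpeg) would be a separate step of the service.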

Disk usage pie chart

Disk usage pie chart! I've been looking for a software tool that would allow us to calculate which projects are using the most server disk space in our collections, how old files are, and when they were last accessed, and then throw all of that data into visual form – like charts, graphs, and especially pie charts! I developed a web tool that just shows individual project sizes and how data is added or deleted from day to day, but it only shows a list of projects and their sizes. To convince my supervisors that certain projects are taking up too much room (and are never accessed), I have to create visuals using Excel or other programs, which takes me hours but could easily be automated.

  • martinn [at] hrw [dawt] org
  • Interested? Your name + any comments/initial ideas
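The data-gathering half of this is small enough to sketch in Python. The example assumes each project is a top-level folder under one root directory, which may not match the real server layout; the function names are hypothetical. It returns total bytes, the oldest modification time, and the most recent access time per project, plus each project's share of the total, which is what a pie chart needs.

```python
import os


def summarize_projects(root):
    """Collect size and age information for each top-level project folder."""
    summary = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total, oldest, last_access = 0, None, None
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    info = os.stat(os.path.join(dirpath, name))
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                total += info.st_size
                oldest = info.st_mtime if oldest is None else min(oldest, info.st_mtime)
                last_access = (info.st_atime if last_access is None
                               else max(last_access, info.st_atime))
        summary[entry.name] = {
            "bytes": total,
            "oldest_modified": oldest,
            "last_accessed": last_access,
        }
    return summary


def pie_shares(summary):
    """Return each project's fraction of the total bytes (pie slice sizes)."""
    grand_total = sum(project["bytes"] for project in summary.values())
    if grand_total == 0:
        return {name: 0.0 for name in summary}
    return {name: project["bytes"] / grand_total
            for name, project in summary.items()}
```

The shares can be handed to any charting library. Note that access times are only meaningful if the file system records them; volumes mounted with access-time updates disabled will report stale values.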

Video metadata wrangler

I would love to see a simple tool for writing metadata to video files. Most of the people I work with are not highly proficient in technology, so they work predominantly with Adobe products. I would love something like a form where users can type in their information to set fields, which then uses Exiftool to write to the files and/or creates side-car XML files (if the file cannot be written to). I envision supporting Dublin Core and being able to create various XML schemas (we use XMP). That way, collection managers would not need to have Adobe products (Premiere and Bridge are mostly what we use) to be able to manage metadata for their files as a part of their collections processing. They could use this free, easy tool!

  • @cristalyze
  • Interested? Your name + any comments/initial ideas
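One way the form could hand its values to Exiftool is sketched below in Python. It only builds the argument list, using ExifTool's `-GROUP:TAG=VALUE` assignment syntax with the `XMP-dc` group, and does not run anything. The function name and the set of accepted fields are hypothetical, and whether ExifTool can write to a particular video format would need to be checked before relying on this instead of a side-car file.

```python
# Dublin Core elements accepted from the form (a hypothetical subset).
ALLOWED_FIELDS = {"Title", "Creator", "Description", "Date", "Rights", "Subject"}


def exiftool_write_args(path, fields):
    """Build an argument list that asks exiftool to set XMP Dublin Core tags.

    `fields` maps Dublin Core element names to the values typed into the form.
    """
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    args = ["exiftool"]
    for name, value in fields.items():
        args.append(f"-XMP-dc:{name}={value}")
    args.append(path)
    return args


if __name__ == "__main__":
    print(exiftool_write_args("interview.mov", {
        "Title": "Oral history interview",
        "Creator": "Example Archive",
    }))
```

Passing the list to `subprocess.run(args, check=True)` avoids shell quoting problems with titles that contain spaces or quotation marks.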


SMIL playlists

Harvard Library is beginning a migration plan for SMIL playlists. SMIL (Synchronized Multimedia Integration Language) is a W3C-recommended XML structure for containing structural/technical metadata, time-tagging/excerpting, and relationships between multimedia files, also facilitating access to media files across servers. Harvard formerly used SMIL to aid in delivery of audio formats through RealPlayer but is moving away from that system. However, crucial metadata contained in the SMIL playlists needs to be extracted and paired with the original Audio Decision Lists to create a new XML based on the AES-60 convention. While a suite of tools exists for extracting header information from the SMIL files, there may be information contained in the body of the file that is still important to maintain with the more current metadata file, and the software needs to be built out to accommodate this.

  • Based on the fairly Harvard-centric nature of this project (and the environment for implementing the tools) I am thinking of withdrawing this idea but welcome any interest or parallel issues from others so as to make it more agnostic!
  • joeygheinen.jh [at] gmail [dawt] com
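As an illustration of pulling information out of the body of a SMIL file, the Python sketch below lists every media element with its source and clip boundaries. It accepts both the SMIL 1.0 attribute spelling (`clip-begin`, `clip-end`) and the SMIL 2.0 spelling (`clipBegin`, `clipEnd`). The function names and the sample playlist are made up, and the actual Harvard playlists may carry other attributes worth keeping.

```python
import xml.etree.ElementTree as ET

MEDIA_TAGS = {"audio", "video", "ref"}


def local_name(tag):
    """Strip an XML namespace, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_clips(smil_text):
    """Return src, title, and clip boundaries for each media element."""
    root = ET.fromstring(smil_text)
    clips = []
    for element in root.iter():
        if local_name(element.tag) not in MEDIA_TAGS:
            continue
        clips.append({
            "src": element.get("src"),
            "title": element.get("title"),
            "begin": element.get("clipBegin") or element.get("clip-begin"),
            "end": element.get("clipEnd") or element.get("clip-end"),
        })
    return clips


if __name__ == "__main__":
    sample = """<smil>
      <body>
        <seq>
          <audio src="rtsp://example.org/tape1.rm" title="Side A"
                 clip-begin="0s" clip-end="120s"/>
          <audio src="rtsp://example.org/tape1.rm" title="Side B"
                 clip-begin="120s"/>
        </seq>
      </body>
    </smil>"""
    for clip in extract_clips(sample):
        print(clip)
```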

Improving the SMIL application that adds timecode to TEI-encoded transcripts. Right now, you can edit the timecode but not the actual text. See attachment. It'd be great to be able to edit the text as you go. As it is now, you have to export the XML file, make the change in an XML editor and then re-import it into the SMIL tool again. It's not very fluid.

  • @kcarani (twitter.com/kcariani)
  • Interested? Your name + any comments/initial ideas

ArchivesSpace plugins for audio / visual materials

e.g., PBCore import / export; embed HTML5 video player for mp4 files.

  •  brianjhoffman [at] gmail [dawt] com
  • Interested? Your name + any comments/initial ideas


File format monitors/QA

  • My proposal is for building a File Format Obsolescence Analysis Engine. The purpose of the engine would be to provide information about--and options for--migrating and transcoding obsolete media file formats through simple and intuitive user interactions. The user provides the engine with an arbitrary file, which the engine then analyzes using any number of metadata forensics and validation tools (MediaInfo, JHOVE and DROID to name a few). The engine then decides whether the file needs to be migrated, or whether the current format can be considered "preservation ready". For the purpose of this proposal, "preservation ready" means that the file meets a list of minimum requirements, such as being stable and being supported by certain playback systems. However, determining a comprehensive list of these criteria is outside of the scope of this project. The output of the engine is twofold. First, it will generate a report about the input file. This report will contain the most salient aspects of the file and its technical metadata in a format that is human readable, but can be easily parsed by a computer in order to facilitate scripting and automation. Second, the engine will move the input file to user-designated output folders according to its state (needs migration or preservation ready). These folders can be used simply to organize the files, or they may function as watch folders for transcoding engines or any other automation systems (which are out of the scope of this proposal). The two most important features of this engine are as follows: 1) The input and output should be as simple and intuitive as possible. The idea is to disseminate the engine as a general tool for the preservation community at large. Due to the wide range in technical skills available to potential users in this community, it is critical that the tool be seen as "easy to use". 2) The engine needs to be built in a way that is extensible and easily updated.
Due to the time constraints of this event, building a comprehensive analysis engine is out of the scope of this proposal. However, the engine's utility would be greatly enhanced if the framework is built in such a way that members of the community can easily update and add support for various file formats without compromising the previously mentioned usability. Thus, the idea would be to build a baseline that the community could then expand upon in the future.
  • @av_morgan
  • Interested? Your name + any comments/initial ideas
  • Plato is an open-source tool for instituting a preservation planning process for digital objects and integrating services for content characterisation, preservation action and automatic object comparison. Harvard Library is investigating use of this tool to help develop policies for file format migration. Ideally Plato would integrate along with a file format identification/characterization tool (FITS, DROID) and perhaps also to a designated migration tool if the requirements match (e.g. ImageMagick, ffmpeg) and a QA tool that may also communicate back to Plato so as to conform to the overall policy. At this stage we are merely investigating the functionality of this tool and are open to other ideas as to how Plato could be instituted into a broader preservation plan (though ideally with migration in mind). Taverna is another tool worth exploring that is similarly used for monitoring workflows. Any file format oddities are welcome from other institutions so as to experiment with implementing a more complex digital migration policy.
  • joeygheinen.jh [at] gmail [dawt] com
  • Interested? Your name + any comments/initial ideas
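The decision and reporting steps of the proposed engine could be sketched as follows in Python. The sketch assumes the characterization tools have already produced a small dictionary of technical metadata for the file. The `POLICY` contents are placeholders rather than a recommended list of preservation-ready formats (the proposal leaves those criteria out of scope), and all function names and field names are hypothetical.

```python
import json

# Placeholder policy; a real one would be maintained by the community.
POLICY = {
    "container": {"Matroska", "MXF"},
    "video_codec": {"FFV1", "JPEG 2000"},
}


def assess(path, metadata, policy=POLICY):
    """Classify a file as preservation ready or needing migration."""
    reasons = []
    for field, accepted in policy.items():
        value = metadata.get(field)
        if value not in accepted:
            reasons.append(f"{field} {value!r} is not in the accepted list")
    return {
        "file": path,
        "status": "needs_migration" if reasons else "preservation_ready",
        "reasons": reasons,
        "technical_metadata": metadata,
    }


def render_report(assessment):
    """Serialize the assessment as indented JSON: readable and parseable."""
    return json.dumps(assessment, indent=2, sort_keys=True)


def destination_folder(assessment, folders):
    """Pick the user-designated output folder for the file's status."""
    return folders[assessment["status"]]


if __name__ == "__main__":
    result = assess("tape042.avi", {"container": "AVI", "video_codec": "DV"})
    print(render_report(result))
    print(destination_folder(result, {
        "needs_migration": "/watch/migrate",
        "preservation_ready": "/watch/ready",
    }))
```

Keeping the policy as plain data, separate from the code, is one way to meet the extensibility goal: adding support for a format means editing a list, not the engine.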

FFmpeg GUIs

I would love to see a good GUI for FFmpeg. Super is nice, but it doesn't have all the formats and codecs that FFmpeg has, and I think a program that transcodes into any format and produces more than one format at the same time would be a great benefit to small-budget archives.

  • srdbx [at] netvision [dawt] net [dawt] il

HTML front end for the Public Media Platform

The Public Media Platform is quickly becoming the most powerful way of getting access to content from all U.S. public television and radio stations, plus independent producers and more. The PMP is intended as an uber-API, allowing producers to store and share their content, and users to query the PMP data store via keyword, media type, date range, geo, etc. In response to a query, the PMP returns content in JSON format. This is ideal for JavaScript developers who can build a web front end by using JS to convert the JSON to HTML. But many potential users of the PMP aren't developers, and won't have access to the skills needed to use the JSON data.

This project has a simple deliverable: A small JavaScript application to convert the JSON output of the PMP into HTML5. So a single story or media archive returned in JSON format from the PMP would be available as either an HTML <section> or <article> with the appropriate headings etc. Even better if we can include ARIA landmarks, title and alt attributes, and classes and IDs for styling. But styling isn't part of this proposed project; let's solve the JSON-to-HTML part, and designers can take it from there.

  • Brian Hoffman: I'm interested in this. Perhaps it would be a good case for using AngularJS or another SPA framework.
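The conversion logic itself is simple enough to prototype before choosing a framework. The sketch below is in Python purely for illustration; the proposal calls for JavaScript, and the same steps would port directly. The JSON shape it reads (an `attributes` object with `title`, `teaser`, and `published`) is an assumption about the PMP response, and the function name and CSS class are hypothetical.

```python
import json
from html import escape


def doc_to_html(doc):
    """Render one story document as an HTML5 <article> fragment."""
    attrs = doc.get("attributes", {})
    title = escape(attrs.get("title", "Untitled"))
    teaser = escape(attrs.get("teaser", ""))
    published = escape(attrs.get("published", ""))

    # The class name is a hook for designers; no styling is applied here.
    parts = ['<article class="pmp-story">', f"  <h1>{title}</h1>"]
    if published:
        parts.append(f'  <time datetime="{published}">{published}</time>')
    if teaser:
        parts.append(f"  <p>{teaser}</p>")
    parts.append("</article>")
    return "\n".join(parts)


if __name__ == "__main__":
    response = json.loads("""{
      "attributes": {
        "title": "Storms & flooding along the coast",
        "teaser": "A placeholder teaser for the story.",
        "published": "2014-10-08T09:00:00+00:00"
      }
    }""")
    print(doc_to_html(response))
```

Escaping every value before it goes into the markup matters here because the text comes from many independent producers.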

Wikipedia Edit-a-thon topic proposals

IRC: #curatecamp_avpres_2 If using an IRC client the server is chat.freenode.net, or you can use your browser and connect to webchat.freenode.net. If you are unfamiliar with IRC, take a look at this brief introduction.

We’ll be hosting a concurrent Wikipedia edit-a-thon, which will focus on topics related to digital preservation & access for audiovisual materials. While we encourage non-engineers to participate in the hack day portion, there’s a lot of work to be done to describe topics relevant to our community on Wikipedia as well. (via AMIA Announcement) Below are loose ideas for Wiki projects or topics to edit! If you're interested in one of the project stubs below, sign up for a wiki login and add your thoughtful comments, or contact the proposer via twitter or email.

Haven't edited a Wiki before? No problem! We will have a brief crash course early in the day, with help available anytime! It's easy to learn, we promise.

Have a new idea? Use this format:

  • Topic: [link to existing Wikipedia page goes here, or new topic]
  • Interested?
  • Sign up to edit this topic (OK to sign up for multiple topics):

AV Artifact Atlas

  • Interested?
    • Kristin MacDonough
    • Help make the AVAA more user friendly! The AVAA is intended for experts and non-experts, and while the resource provides a lot of useful information, it could benefit from more user input and feedback. I will be present to help make or make note of changes. --kristinmac @super_kmac
  • Sign up to edit this topic (OK to sign up for multiple topics):

Digital Preservation

  • Interested?
    • Kathryn Gronsbell
    • Check out the great efforts to maintain and expand the Digital Preservation wiki page! "The scope of this project is to reorganize and revise the content of the current Digital Preservation article so that it reflects the current state of the field and is better suited to ongoing updating and editing. We will also review related articles to determine their content and relationship to the main article. A further goal of this effort is to include links to relevant standards and best practices in the field of digital preservation." Thanks to Lauren Sorenson and Andrea Goethals for suggesting. -- kgronsbell @k_grons
    • To Do/Done list of topics
  • Sign up to edit this topic (OK to sign up for multiple topics):

FFmpeg Guides

  • Interested?
    • rhfraim
    • Layperson-understandable documentation for ffmpeg! The up-to-date information out there is mostly targeted at developers; there's not a ton out there that's both up-to-date and designed for a user community.
  • Sign up to edit this topic (OK to sign up for multiple topics):

Technical characteristics of tape format chart

  • When digitizing video tapes, it is important to preserve technical characteristics of the tape formats. I was thinking that it might be useful to have a resource (like a chart) that lists different tape formats and their important specifications (e.g. 8-bit or 10-bit, PAR, subsampling scheme...) so that one selects the right options when digitizing. – @ng_yvonne
  • Interested? Your name + any comments/initial ideas