Association of Moving Image Archivists & Digital Library Federation Hack Day 2013

From CURATEcamp

Revision as of 18:00, 6 November 2013

When, Where, What time?

  • Date: November 6, 2013
  • Time: ~9am-5pm (with the option of continuing project work throughout the conference in our Developer Lounge at the Richmond Marriott, Apple Boardroom, available all day Thursday and Friday)
  • Location: Salon B at the Crowne Plaza Richmond Downtown in Richmond, VA
  • hashtag: #AVhack13
  • IRC: #curatecamp_avpres_1. If you are using an IRC client, the server is chat.freenode.net; you can also connect from your browser at webchat.freenode.net. If you are unfamiliar with IRC, take a look at this brief introduction.
  • Light breakfast, snacks and coffee will be provided throughout the day!

How can I participate?

Sign up! As this will be a highly participatory event, registration is limited to those willing to get their hands dirty, so no onlookers please.

If you are unsure whether you can or want to participate in the hack day itself, you can still see the results by attending the AMIA closing plenary, where hack day projects will be presented, and the audience will have an opportunity to vote on their favorites.

What will be the format of the event?

In advance of the hack day, project ideas will be collected through the registration form and the event wiki, and participants will review and discuss the submitted ideas. On the day, we'll break into groups of technologists and practitioners, each selecting an idea to work on together for the day and (if desired) throughout the AMIA conference in the Developer Lounge.

The day itself will be structured something like this. Breakfast, coffee/tea, and snacks will be provided. Lunch is on your own.

9am – Welcome, introductions, and breakfast

9:30 - noon - Hacking. Snacks and coffee to be served.

Noon-1pm – Lunch on your own.

1 - 4:30 - Hacking. Snacks and coffee will be served.

4:30 - 5 - Wrap up.

Closing plenary & prizes

Projects will be presented during the conference closing plenary, Saturday November 9 at 9:30am. Projects will be judged by a panel as well as by conference attendees.

Summary

In association with the annual conference, the Association of Moving Image Archivists will host its first ever hack day on November 6, 2013 in Richmond, VA. The event will be a unique opportunity for practitioners and managers of digital audiovisual collections to join with developers and engineers for an intense day of collaboration to develop solutions for digital audiovisual preservation and access. It will be fun and practical…and there will be prizes!

This year's hack day is a partnership between AMIA and the Digital Library Federation. A robust and diverse community of practitioners who advance research, teaching and learning through the application of digital library research, technology and services, DLF brings years of experience creating and hosting events designed to foster collaboration and develop shared solutions for common challenges.

What if I’m not a developer?

Content managers and preservation practitioners are as central to the success of the event as having keen developers. YOU will be responsible for setting the agenda and the outcomes. The goal is to foster collaboration between audiovisual preservation specialists and technologists, to solve problems together and share expertise.

Background

What is a hack day?

A hack day or hackathon is an event that brings together computer technologists and practitioners for an intense period of problem solving through computer programming. Within digital preservation and curation communities, hack days provide an opportunity for archivists, collection managers, and others to work together with technologists to develop software solutions for digital collections management needs. Hack days have been held independently by groups such as the Open Planets Foundation, as well as in association with preservation and access oriented conferences including Open Repositories and Museums and the Web.

The manifesto of a recent event at the Open Repositories conference framed the benefits this way: “Transparent, fun, open collaboration in diversely constituted teams...The creation of new professional networks over the ossification of old ones. Effective engagement of non-developers (researchers, repository managers) in development...Work done at the conference over presentation of something prepared earlier.”

Why an AMIA hack day?

An audiovisual preservation-themed CURATEcamp was held in April 2013, drawing over 120 registrants from at least 3 continents for a day of great conversations and lightning talks. CURATEcamp is a series of unconference-style events focused on connecting practitioners and technologists interested in digital curation. The event generated a lot of documentation and articulated many shared concerns. Topics covered included digitization of video, film scanning, digital storage strategies, proprietary digital video files in collections, and technical metadata for preservation. The participants agreed that more work needed to be done and action taken, so the idea for an AMIA hack day was born.

Discussions between managers of audiovisual collections and solutions developers provided a fruitful starting point for hack day project ideas, including:

  • Simple fixity tools to use when transferring files from one storage medium to another
  • Technical metadata extraction and making use of these reports (MediaInfo, ffprobe)
  • Simple cataloging tools for AV, with an eye towards contemporary frameworks/schemas
  • Discovery tools/UX for audiovisual collections, access at scale
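
As an illustration of the first idea in the list above, a simple fixity check for file transfers might look like the following. This is only a minimal sketch using the Python standard library; the function names `file_checksum` and `copy_with_fixity` are made up for this example and do not refer to any existing tool.

```python
import hashlib
import shutil


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(source, destination, algorithm="md5"):
    """Copy a file, then confirm the copy's checksum matches the source's."""
    before = file_checksum(source, algorithm)
    shutil.copy2(source, destination)
    after = file_checksum(destination, algorithm)
    if before != after:
        raise IOError(f"Fixity mismatch for {source}: {before} != {after}")
    return after
```

Calling `copy_with_fixity("tape01.mov", "/backup/tape01.mov")` would return the checksum on success and raise an error on a mismatch; the file paths here are placeholders.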

Project proposals

Please register for the hack day (we're currently at capacity, but forming a wait list) and we will start adding your ideas here for voting in advance of the Hack Day!

Possible topics that projects could touch on: fixity checking; transcoding; metadata validation; automating file movement; altering fdupes so that it shows the user the md5 checksum hash; altering the Archivematica 1.0 code to bypass zipping the AIP.

Loose metadata project ideas: Segmentation and time-based annotation of video segments on the web (maybe leveraging Media Fragments?); XSLT mapping; turning CSV fields into PREMIS XML; using geolocation information to facilitate new access pathways to video; RDFing PBCore, potentially to leverage in Fedora 4

Loose non-code project ideas: Editing/adding Wikipedia pages, creating a manual for a tool or a workflow, creating a webpage
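
For the time-based annotation idea mentioned above, the W3C Media Fragments syntax addresses a segment of a video by appending a temporal fragment such as `#t=10,20` (seconds) to its URL. The following is a rough sketch of how annotations could be turned into such links; the helper names and the example URL are assumptions made for illustration.

```python
def _seconds(value):
    """Format a number of seconds without trailing zeros (10.0 -> '10')."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def media_fragment_url(base_url, start_seconds, end_seconds=None):
    """Build a temporal Media Fragments URI such as video.mp4#t=10,20."""
    if start_seconds < 0:
        raise ValueError("start must not be negative")
    if end_seconds is not None and end_seconds <= start_seconds:
        raise ValueError("end must be greater than start")
    fragment = f"t={_seconds(start_seconds)}"
    if end_seconds is not None:
        fragment += f",{_seconds(end_seconds)}"
    return f"{base_url}#{fragment}"


# Hypothetical annotations: (start, end, label) for one video.
annotations = [(10, 20, "Opening remarks"), (95.5, 130, "Interview")]
for start, end, label in annotations:
    print(label, media_fragment_url("http://example.org/video.mp4", start, end))
```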

Please submit your project ideas using the format below. Remember, the more specific the better. Have a look at the project descriptions from Open Repositories 2013 for inspiration.

Project Sign Up Sheet

Sign up for projects you are interested in here

Signing up in advance does not mean you are committed to work on that project. And it does not mean these are the only projects. There will still be an opportunity to add additional projects on the day of the event and sign up for those as well.

The projects below were discussed during a Google Hangout on November 1, 2013. For more information, please see the notes from that conversation.

1. Timebased transcript/caption display

Two proposals have merged into one:

Extraction of EIA-608/line 21 closed caption information: Ability to extract and reuse closed caption information from NTSC video.

+

Interactive Video/Transcript Streaming: This project would use the open source Interactive Video/Transcript viewer package as a baseline for streaming video and transcripts. This package has weak support and is becoming increasingly difficult to maintain. The hope is to come up with an approach to build or improve upon the existing system to reliably stream video files with their time coded transcripts across multiple browser and OS types.
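
One possible bridge between the two proposals is to write extracted caption or transcript cues out as WebVTT, which browsers can display alongside HTML5 video. The sketch below assumes the cues are already available as (start, end, text) tuples in seconds; it does not perform any line 21 extraction itself, and the function names are invented for this example.

```python
def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def cues_to_webvtt(cues):
    """Serialize (start, end, text) tuples as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Made-up sample cues for illustration only.
print(cues_to_webvtt([(0, 2.5, "Good evening."), (2.5, 6, "Here is the news.")]))
```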

Notes

Notes from the Nov 1 planning call

Possible starting points

Maybe: http://ccextractor.sourceforge.net/ Also: http://dev.w3.org/html5/webvtt/

+

The original IVT package is here: IVT.zip

Data set required

Uncompressed video files that contain line 21 closed caption information

Sample Data: CanadaVideoTranscripts.zip

The existing IVT player is running here: Live Site

Submitted by

Steven Villereal

+

Chris McNeave

Interested team members/participant roles

Who wants to work on this project?

2. Integration of mediainfo generated metadata into a forensic imaging workflow

The goal is to generate MediaInfo key/value pairs and include them in the DFXML for forensic disk images that contain audio or video files. This could be accomplished through the FIWalk utility's DGI interface.
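
To make the idea concrete, the sketch below attaches key/value pairs to a DFXML-style `fileobject` element. It is not a fiwalk plugin: the namespace URI, the `mediainfo` and `field` element names, and the sample values are all hypothetical, and a real integration would need to follow the DFXML schema and take its values from actual MediaInfo output.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the embedded MediaInfo values; not part of DFXML.
MEDIAINFO_NS = "http://example.org/mediainfo-sketch"


def add_mediainfo(fileobject, pairs):
    """Append key/value pairs to a DFXML-style <fileobject> element."""
    container = ET.SubElement(fileobject, f"{{{MEDIAINFO_NS}}}mediainfo")
    for key, value in pairs.items():
        field = ET.SubElement(container, f"{{{MEDIAINFO_NS}}}field", name=key)
        field.text = str(value)
    return container


# Made-up file entry and values, for illustration only.
fileobject = ET.Element("fileobject")
ET.SubElement(fileobject, "filename").text = "interviews/tape01.mov"
add_mediainfo(fileobject, {"Format": "MPEG-4", "Duration": "00:12:31.200"})
print(ET.tostring(fileobject, encoding="unicode"))
```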

+

Reconciling filenames with embedded technical metadata/named parameters: I'd like to explore whether it would be possible to compare embedded technical metadata (file/MIME type/external signature) to existing media filenames, to ensure that all files in a given directory are what their extensions say they are. Messages or a report could be produced if any files do not match your named parameters.

Potential User Story: As a CONTENT MANAGER, I need to verify that files with a "mov" extension in a named directory (*.mov) are QuickTime files so that I can ensure filenames accurately represent embedded technical metadata.

Pre-conditions: Specifications of files are already determined (i.e. all access files are QuickTime-wrapped .mov), and the associated utilities are available to read metadata

Post-conditions: Filenames include an accurate extension; the content manager is delivered a report of any/all inaccurately named files in the directory.
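
The user story above could be prototyped by comparing each file's extension with its leading bytes. The following is a rough heuristic sketch only: the `CHECKS` table covers just two extensions, the atom and brand tests are simplified, and a tool such as MediaInfo or a PRONOM-based identifier would give more reliable answers. All function names are invented for this example.

```python
from pathlib import Path

# Top-level atom types commonly found at the start of older QuickTime files.
LEGACY_QT_ATOMS = {b"moov", b"mdat", b"wide", b"free", b"skip"}


def looks_like_quicktime(header):
    """Rough test: 'ftyp' atom with the 'qt  ' brand, or a legacy first atom."""
    atom = header[4:8]
    if atom == b"ftyp":
        return header[8:12] == b"qt  "
    return atom in LEGACY_QT_ATOMS


def looks_like_wave(header):
    """RIFF container whose form type is WAVE."""
    return header[:4] == b"RIFF" and header[8:12] == b"WAVE"


# Extension -> signature test; extend as needed.
CHECKS = {".mov": looks_like_quicktime, ".wav": looks_like_wave}


def report_mismatches(directory):
    """Return names of files whose leading bytes do not match their extension."""
    mismatches = []
    for path in sorted(Path(directory).iterdir()):
        check = CHECKS.get(path.suffix.lower())
        if check is None or not path.is_file():
            continue
        with open(path, "rb") as handle:
            header = handle.read(12)
        if not check(header):
            mismatches.append(path.name)
    return mismatches
```

The returned list is the "report" from the post-conditions: an empty list means every checked file matched its extension.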

Notes

Notes from the Nov 1 planning call

A PDF of what DFXML looks like, plus mocked-up MediaInfo output: https://docs.google.com/file/d/0B1hVT_M0h1f_VnVqZnV4R0J1amc/edit

FFprobe output description: http://stackoverflow.com/questions/3199489/meaning-of-ffmpeg-output-tbc-tbn-tbr

Desired fields: https://docs.google.com/a/avpreserve.com/spreadsheet/ccc?key=0AusBkYeQJnetdGdyRXVYMVVsUlR6YS1OTjEwWkUyT0E&usp=sharing

Possible starting points

Registries for extension associations (ex. PRONOM: http://www.nationalarchives.gov.uk/PRONOM/Default.aspx)

MediaInfo: http://mediaarea.net/en/MediaInfo

Exiftool: http://www.sno.phy.queensu.ca/~phil/exiftool/

Georgetown University Lib File Analyzer?: https://github.com/Georgetown-University-Libraries/File-Analyzer

http://www.sleuthkit.org/sleuthkit/
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.5362&rep=rep1&type=pdf
https://raw.github.com/dfxml-working-group/dfxml_schema/v1.1.0/dfxml.xsd
http://mediaarea.net/en/MediaInfo

FFMPEG: http://www.ffmpeg.org/download.html

Submitted by

Donald Mennerich

+

Kathryn Gronsbell

Data set required

Forensic disk images containing audio and video files

Interested team members/participant roles

Jason Evans Groth (NCSU, jevansg@ncsu.edu) / TBD (could help some from either context)

3. RDFing PBCore

Let's see if we can come up with an RDF expression for PBCore. It could be useful for things like the upcoming Fedora 4.
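
As a starting point for discussion, the sketch below expresses a few PBCore elements as RDF triples in N-Triples syntax. The namespace URI is a placeholder, the one-property-per-element mapping is an assumption rather than an agreed vocabulary, and the sample record is made up.

```python
# Placeholder namespace; an agreed RDF vocabulary for PBCore is not assumed here.
PBCORE = "http://example.org/pbcore-rdf-sketch#"


def literal(value):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(subject_uri, record):
    """Emit one triple per value, using the PBCore element name as the property."""
    lines = []
    for element, values in record.items():
        for value in values:
            lines.append(f"<{subject_uri}> <{PBCORE}{element}> {literal(value)} .")
    return "\n".join(lines)


# Made-up record using real PBCore element names.
record = {
    "pbcoreTitle": ["Evening News"],
    "pbcoreDescription": ['Anchor says "good evening".'],
}
print(record_to_ntriples("http://example.org/asset/1", record))
```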

Notes

Notes from the Nov 1 planning call

Possible starting points

http://pbcore.org/index.php
http://www.w3.org/TR/REC-rdf-syntax/
http://dublincore.org/documents/dc-rdf/
Bawstun app from WGBH...can output PBCore XML from EBUCore RDF...could be reverse engineered? https://github.com/curationexperts/bawstun/tree/master/app/models

Related skillsets

Any of: knowledge of pbcore, XML/RDF, OWL, metadata schema in general

Data set required

Sample PBCore (to be provided)

Submitted by

Kara Van Malssen (idea by Karen Cariani)

Interested team members/participant roles

Who wants to work on this project?

4. Format/codec evaluation/selection tool

"What format should I use when digitizing my videos?" This is probably the question that video archiving consultants hear most often. But the possible answers are complicated and highly context-dependent, which is often as frustrating for the non-specialist practitioners asking as it is for the consultants. For a possible hack day project, see the description of the submitted idea for a digitization workflow development tool below. A format/codec evaluation/selection tool could be part of, or a first element of, this bigger tool.

Notes

Notes from the Nov 1 planning call

Possible starting points

See the idea for a digitization workflow development tool below.

Related skillsets

See the idea for a digitization workflow development tool below.

Data set required

See the idea for a digitization workflow development tool below.

Submitted by

Yves Niederhäuser

Interested team members/participant roles

Who wants to work on this project?

5. Metadata schema developing and mapping tool

As metadata becomes more and more important for most aspects of video archiving (conservation, management, access, etc.) while there is little help for non-specialist practitioners, an easy-to-use tool with a simple graphical interface could be a valuable element. The project could be to develop a tool for editing existing (or self-developed) metadata schemas/standards, with export functions producing useful formats (XML, stylesheets, or similar) usable in widespread programs for collection management and description (FileMaker, Excel, Access, or similar). An additional part of such a tool could be a mapping and data transformation element, allowing users to map an existing schema (in file formats such as XML or CSV) to a target schema (like EBUCore) and transform existing data.

And here again, an online version of such a tool could collect and disseminate edited schemas, crosswalks, mapping schemas, etc. and serve as an exchange platform.
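
The mapping and transformation element could start as something as small as a crosswalk table applied to CSV rows. The sketch below is illustrative only: the column names and the target field names in `CROSSWALK` are hypothetical and are not taken from EBUCore or any other schema.

```python
import csv
import io

# Hypothetical crosswalk: source column name -> target field name.
CROSSWALK = {
    "Title": "title",
    "Date Created": "dateCreated",
    "Tape Format": "format",
}


def map_rows(csv_text, crosswalk):
    """Rename CSV columns according to the crosswalk, skipping empty values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapped = []
    for row in reader:
        mapped.append({
            target: row[source].strip()
            for source, target in crosswalk.items()
            if row.get(source)
        })
    return mapped


# Made-up sample data.
sample = "Title,Date Created,Tape Format\nEvening News,1985-03-02,U-matic\n"
print(map_rows(sample, CROSSWALK))
```

Keeping the crosswalk as plain data means it could be edited, shared, and collected separately from the code, in line with the exchange-platform idea.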

Possible starting points

  • Any interest in creating mappings to allow DPLA (dp.la/info/about/faq/) to expose richer metadata about sound/moving image content? DPLA crosswalks here, or more info as needed...

Related skillsets

No idea, sorry…

Data set required

Different metadata schemas and datasets

AVPreserve is setting up an instance of MINT for use during the Hack Day if needed. Unfortunately, the source code does not appear to be available.

Submitted by

Yves Niederhäuser

Interested team members/participant roles

Who wants to work on this project?

Meghan Fitzgerald (Turner Broadcasting, meghan.fitzgerald@turner.com) / content manager/SME

6. Creating a Sample METS (Addressing METS Specification) for Digitization Project of Analog Audiovisual Collections

Several sets of specifications are already available for creating a METS schema, but I have not heard of a complete METS example that is boilerplated to work for a real digitization project. After several attempts to find an existing schema to piggyback on, the University of Michigan is currently creating an example METS for outsourced digitization projects that can be used from end to end. The application programmer at the Digital Library Production Service department created it from the existing audio METS XML, VideoMD, and other spreadsheets that U of M has been using as an interim measure. Several of the people involved are now discussing and examining that sample section by section. I would like to know if a group can sit down, investigate the current sample, and give comments and feedback about its possible limitations, errors, and issues so that a better version can be made. If the whole sample is too big to work on in a day, I propose reviewing only the process history/provenance section, since it could be the most challenging section to tackle because of the complexity of the video digitization process itself. If we can come up with something that works as a sample, it can be shared, distributed, and used in this standard-less age.

Notes

Notes from the Nov 1 planning call

Day of notes

Possible starting points

Here is a very rough draft that the UM programmer created. There are many notes and it does not look complete, but I believe it can be a starting point. More than anything, we need outsiders who can review it with fresh eyes and a range of different experiences.

Both the video process history schema and example METS are located in the directory: http://www-personal.umich.edu/~grosscol/vprocesshistory/

Related skillsets

Knowledge of video/audio metadata, familiarity with audiovisual digitization projects

Data set required

Existing metadata sets created from digitization projects at each institution

Submitted by

JungYun Oh

Interested team members/participant roles

Who wants to work on this project?

7. Produce easy-to-follow documentation for the installation and use of FFMPEG transcoding software

Specific usage topics might include batch transcoding, metadata extraction, common output profiles, and FFMPEG version upgrades. Evaluation of available GUIs might also be included as a secondary goal.
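
Documentation for batch transcoding and metadata extraction could include a small wrapper like the one below. It is a sketch under stated assumptions: it only builds the command lines, it assumes an FFmpeg build that includes the libx264 and aac encoders, and the H.264/AAC MP4 profile is just one example of an access-copy profile, not a recommendation. The function names are invented for this example.

```python
from pathlib import Path


def transcode_command(source, output_dir):
    """Build an ffmpeg command for an H.264/AAC MP4 access copy (example profile)."""
    target = Path(output_dir) / (Path(source).stem + ".mp4")
    return [
        "ffmpeg", "-n",          # -n: never overwrite an existing output file
        "-i", str(source),
        "-c:v", "libx264", "-crf", "23", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        str(target),
    ]


def probe_command(source):
    """Build an ffprobe command that reports container and stream metadata as JSON."""
    return ["ffprobe", "-v", "error", "-show_format", "-show_streams",
            "-of", "json", str(source)]


def batch_commands(input_dir, output_dir, pattern="*.mov"):
    """Return one transcode command per matching file in input_dir."""
    return [transcode_command(path, output_dir)
            for path in sorted(Path(input_dir).glob(pattern))]
```

Each command list can be passed to `subprocess.run(command, check=True)` to execute it; building the list first makes it easy to print and review the commands before running a whole batch.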

Possible starting points

http://www.ffmpeg.org/

http://avanti.arrozcru.com/

http://sourceforge.net/projects/ffmpeg-gui/

Check also: http://www.reto.ch/training/2013/20130503/ (it's in German, but commands are commands...)

Kathryn Gronsbell: Helpful hints for basic FFMPEG from Kelly Haydon https://docs.google.com/document/d/1zbThoqnEl50Yw_fG9prHSptlIjo6tdteieVq4XP4K_E/edit?usp=sharing

Related skillsets

Windows/Mac/Linux operating systems, document writing, digital media transcoding

Data set required

Sample media files for transcode tests

Submitted by

Nash Bly

Interested team members/participant roles

Software Testers, Media Transcoders, Document Writers - Who wants to work on this project?

Working Documents

ffmpeg hackday notes - https://docs.google.com/document/d/1RFlXJGXChbIwNXs3Ka01sHj-RXNEAt1h9yWPpFvZUJ4/edit?usp=sharing

8. CURATEcamp-style discussion

For those who are more interested in meeting up with other folks for discussion and brainstorming on specific topics, we are setting aside an area for CURATEcamp-style "unconference" breakout groups. Folks interested should come prepared with potential topics for discussion. These will be gathered on the morning of the event and voted on by the registrants in the CURATEcamp stream. For more information, please visit the CURATEcamp website, and see the documentation from CURATEcamp AVpres 2013, held in April 2013.

Please note that while discussion groups are not discouraged, these groups will not be eligible for awards.

Deprecated topics:

Merged with Timecoded transcripts and FFMPEG documentation: Moving Image Research Collections Digital Video Repository

Several potential ideas for improving this DVR that can hopefully be integrated into other sites…
- Timecode-based tagging in videos or other ways to allow for user-generated metadata
- A way to connect related video material
- scripts for transcoding video (modifying an existing script)
- Issues in XACML restrictions / easy way to make records public/non-public

Possible starting points

DVR: http://mirc.sc.edu
Git: https://github.com/DGI-USC

Related skillsets

Drupal knowledge, Fedora/Islandora, ffmpeg, Python

Data set required

Video files, records, scripts, the DVR itself? (Providable.)

Submitted by

Ashley Blewer

Replaced by Format/Codec selection tool: Digitization workflow development tool

Where do I start once I have decided that digitization is the right thing to do for my video collection? How do I decide whether to build up infrastructure and know-how in-house or to outsource digitization? How do I need to prepare analog tapes for the best results and minimal risk? What information do I need, and which requirements do I have to ask for, in a call for tenders? What do I have to do to control the quality of digitization, and how? How do I store the new archive masters and access copies? Which codecs/formats are best in my case? A small stand-alone or online tool for video collections and non-specialist practitioners, perhaps something like an interactive flow chart or decision path that helps to ask the right questions and produces an automatic report after running it, could be a big help for many non-video-specialist collection managers. It could serve as a starting point for consultations, evaluating tenders, convincing decision makers, etc. A possible online version of such a tool could integrate a "similar projects" function, pointing collection managers to other projects and people with experience in similar cases and thereby building up and strengthening a network for exchange. I think there is still great potential in bringing people in this field together!

Possible starting points

There are tons of online survey tools that could perhaps be used as a technical starting point; the right set of questions could be collected, prioritized, and structured during the hack day.

Related skillsets

The technical/developer skills needed are unknown; some video digitization and collection management expertise is needed for this project.

Data set required

None.

Submitted by

Yves Niederhäuser