Creating an artificial test set using emulation

From CURATEcamp
Revision as of 14:06, 13 November 2012 by Euan Cochrane (talk | contribs)

Main Page > CURATEcamp iPRES 2012 > CURATEcamp 24 hour worldwide file id hackathon Nov 16 2012

Files for testing format identification (ID) tools are most useful when both their source and their content are known. Test files that meet both conditions and are also free to use can be hard to find. A simple way to solve this problem would be to create the files with the original software, running it under emulation or virtualisation software.

Software used for creating files

Parameters to use (formats to create)

Content to include (significant properties)

Ideally, every possible type of content should be included in each file, with multiple instances of each type in different configurations. This would allow comprehensive testing. It would also be useful to ensure that no content is repeated within the test files, so that it is easy to identify where in a file a given piece of content came from. A list of potential types of content is included in Appendix 1 of the Rendering Matters Report.
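One way to keep content from being repeated is to give every piece of content its own marker string. The Python sketch below is only an illustration under stated assumptions: the content types, configurations, marker format and function names are hypothetical and are not taken from the Rendering Matters Report. It builds one unique marker per combination of type, configuration and instance, then looks up which markers appear in a piece of text extracted from a test file.

```python
from itertools import product

# Hypothetical content types and configurations, for illustration only;
# a real list would come from Appendix 1 of the Rendering Matters Report.
CONTENT_TYPES = ["heading", "footnote", "table"]
CONFIGURATIONS = ["plain", "bold", "italic"]


def make_markers(content_types, configurations, instances=2):
    """Return one unique marker string per (type, configuration, instance)."""
    markers = {}
    for ctype, config, n in product(content_types, configurations,
                                    range(1, instances + 1)):
        markers[(ctype, config, n)] = f"TEST-{ctype}-{config}-{n:02d}".upper()
    return markers


def find_marker(markers, text):
    """Return the keys of every marker that occurs in the given text."""
    return [key for key, marker in markers.items() if marker in text]


markers = make_markers(CONTENT_TYPES, CONFIGURATIONS)

# Every marker is distinct, so a match points back to exactly one source item.
assert len(set(markers.values())) == len(markers)

print(find_marker(markers, "xx TEST-TABLE-BOLD-02 yy"))
```

Typing each marker into the matching element when the file is created in the emulated application would then let a tester trace any string found in the saved file back to the element it was entered in.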

Plan for the Hackathon day

  1. Pick a software application and parameter to use to create files and add the details here (link to Google spreadsheet to be added), e.g. WordStar 7 for DOS, "default save parameter (WordStar 7.0 format)", plus details of the user doing the work.
  2. Set up emulation environment and install software.
  3. Get content to include: copy text to a text file, structured data to a CSV or tab-delimited file, and images to BMP/TIFF files, then add these to a virtual disk file that can be attached to the emulated environment.