IS&T Archiving Conference CURATEcamp 2013/formats

From CURATEcamp

Problem of Unidentifiable formats (e.g. research data from sensors)

  • Researchers don’t consult archive staff before buying sensors
  • Some things you can do about it
    • Expose problem files to tool developers
    • More of a knowledge base problem than a tool problem - contribute to format registries
  • Identifying binary and proprietary formats is a hard problem - not many people are working on it
  • Undocumented formats - not much hope
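
The "knowledge base" point above can be illustrated with a minimal sketch of signature-based identification, assuming formats are recognized by their leading bytes. The `SIGNATURES` table and the `identify` and `identify_file` helpers are hypothetical names for illustration, not the API of any existing tool; the table is only as good as the entries contributed to it, which is why an undocumented sensor format falls through to "unidentified".

```python
# Hypothetical, deliberately tiny signature table.
# Real format registries hold far more entries and richer matching rules.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xff\xd8\xff", "JPEG"),
]


def identify(data: bytes) -> str:
    """Return a format label based on the leading bytes, or 'unidentified'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unidentified"


def identify_file(path: str, header_size: int = 16) -> str:
    """Read only the first few bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(header_size))
```

Adding a new row to the table is the code-level equivalent of contributing a signature to a format registry: the matching logic stays the same, only the knowledge grows.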

Normalization - good or bad?

  • Pro: Reduce number of formats
  • Con: Lose authenticity?
  • Depends on the content (e.g. email)
    • Need tools to identify it and normalize it (not possible for some scientific data)
    • When to consider it
      • Content warrants it (expensive to produce)
      • Format warrants it
        • Email - Difficult to preserve native formats long-term
  • Loss during transformation
    • We might be losing information already
      • RAW to other formats (DNG is an attempt to normalize RAW formats)
      • DNG can embed the original RAW file
      • Inherent problem with RAW - it needs to be interpreted
      • The embedded original is nice to have when you want to go back to the RAW file
  • Transformations in general - are they lossless?
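
One way to make the losslessness question concrete is a round-trip test: transform the content, transform it back, and compare checksums with the original. The sketch below is an illustration under stated assumptions, not a method discussed in the session. The helper names (`digest`, `is_lossless_round_trip`, `strip_high_bit`) are hypothetical, zlib compression stands in for a reversible transformation, and dropping the top bit of every byte stands in for a lossy one.

```python
import hashlib
import zlib


def digest(data: bytes) -> str:
    """SHA-256 checksum used to compare byte streams."""
    return hashlib.sha256(data).hexdigest()


def is_lossless_round_trip(original: bytes, forward, backward) -> bool:
    """True if backward(forward(original)) reproduces the original bytes."""
    return digest(backward(forward(original))) == digest(original)


def strip_high_bit(data: bytes) -> bytes:
    """Stand-in for a lossy normalization: clears the top bit of each byte."""
    return bytes(b & 0x7F for b in data)


sample = b"sensor reading 42 \xff\x80"

# Reversible transformation: compression followed by decompression.
print(is_lossless_round_trip(sample, zlib.compress, zlib.decompress))

# Lossy transformation with no way back: identity is used as the "inverse".
print(is_lossless_round_trip(sample, strip_high_bit, lambda d: d))
```

A passing round trip only shows that the original bytes are recoverable. It says nothing about whether the normalized file is rendered or interpreted the same way, which is the harder question for formats like RAW.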