Version control of "digital objects": options discussion (Giarlo)
Reverse directory (RED) objects were discussed a year ago at CurateCamp.
Giarlo has a prototype built on Git/GitHub using bags (BagIt). It didn't require much code.
It provides a lot of functionality for free: diffs between changes, and versioning of the bagged content.
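To illustrate how a bag's own manifest supports this kind of change tracking, here is a minimal sketch, assuming a BagIt-style `manifest-sha256.txt` with one `checksum  data/path` line per payload file. It is not the prototype's code, and the function names are hypothetical. It compares two versions of a manifest and classifies payload files as added, removed, or modified.

```python
def parse_manifest(text):
    """Parse BagIt-style manifest lines '<checksum>  <path>' into {path: checksum}."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            # The checksum is the first token; the rest of the line is the path.
            checksum, rel_path = line.split(maxsplit=1)
            entries[rel_path.strip()] = checksum
    return entries


def diff_manifests(old, new):
    """Classify changes between two {path: checksum} dicts."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Same path in both versions, but the checksum changed.
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return {"added": added, "removed": removed, "modified": modified}


v1 = "aaa  data/report.txt\nbbb  data/scan.tif\n"
v2 = "aaa  data/report.txt\nccc  data/scan.tif\nddd  data/notes.txt\n"
print(diff_manifests(parse_manifest(v1), parse_manifest(v2)))
# {'added': ['data/notes.txt'], 'removed': [], 'modified': ['data/scan.tif']}
```

Because a manifest is plain text with one line per file, Git's ordinary line diff of `manifest-sha256.txt` already surfaces these changes; the sketch only classifies them.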
What can you use remotes for in Git? What about branches? Using git fetch and pull for object replication sounds sexy.
What methods do your institutions use for dealing with versions?
When running Fedora on top of iRODS, versioning decisions come up at both layers; where do you make them?
Kam Woods prefers Bazaar to Git. Subversion handles renames badly (it records a delete when renaming), and renames can be an issue with Git as well.
Bazaar has a cross-platform GUI and is easy to work with.
Git is process- and RAM-intensive.
Randy Fischer keeps the original and the latest (best) copies of documents.
Propylon has a project with the State of Kansas developing a repository, KLISS, built on Subversion; it lets you see the state of documents at any point in time.
How many people are satisfied with their current methods? None.
Mark Matienzo: when looking at multiple versions, do we need a metaframework for versioning?
Line-based diffing doesn't work well with binary files. We need tools to understand the changes across versions.
Mike Giarlo: agreed. A framework could apply source control systems as plugins as needed, e.g. Boar (on Google Code) for binary files and something else for text.
We need to know what the difference actually is, not just that the files differ. What would a binary difference look like?
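One partial answer to "what would a binary difference look like?" is a byte-level diff. The sketch below uses Python's standard `difflib.SequenceMatcher` over raw bytes to report which byte ranges changed; the function name `binary_changes` is hypothetical. It shows where bytes differ, not what they mean (a changed codec parameter and a changed pixel look alike at this level), and SequenceMatcher can be slow on large files, so treat it as a small-object illustration only.

```python
from difflib import SequenceMatcher


def binary_changes(old: bytes, new: bytes):
    """List the byte ranges that differ between two binary blobs.

    Each entry is (operation, old_start, old_end, new_start, new_end),
    where operation is 'replace', 'delete', or 'insert'.
    """
    # bytes are sequences of ints, so SequenceMatcher can compare them directly;
    # autojunk=False stops it from ignoring frequently repeated byte values.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, i1, i2, j1, j2)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(binary_changes(b"HEADER\x00\x01\x02PAYLOAD", b"HEADER\x00\xff\x02PAYLOAD"))
# [('replace', 7, 8, 7, 8)]
```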
Is there any difference too small to version?
How do you visualize some obscure bit of a video codec?
You will never have a tool that can manage all those variations of binary files.
The reverse image search tool TinEye can be used to find a high-resolution version of an image.
images.google.com also supports searching by image; you can put in an image of your face.
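TinEye's matching method is not described in these notes. As an illustration of how near-duplicate images can be matched across resolutions, here is a minimal "average hash", a common perceptual-hashing technique, not TinEye's algorithm. To stay dependency-free it assumes images are already decoded to grayscale lists of rows with values 0 to 255, with dimensions that are multiples of the hash size; the function names are hypothetical.

```python
def average_hash(pixels, size=8):
    """Compute a size*size-bit average hash of a grayscale image.

    pixels: list of rows of 0-255 ints. Assumes height and width are
    multiples of `size` so the image splits evenly into blocks.
    """
    height, width = len(pixels), len(pixels[0])
    if height % size or width % size:
        raise ValueError("dimensions must be multiples of size")
    bh, bw = height // size, width // size
    # Downscale: average each bh x bw block into one cell.
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [
                pixels[y][x]
                for y in range(by * bh, (by + 1) * bh)
                for x in range(bx * bw, (bx + 1) * bw)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes (0 = same appearance)."""
    return bin(a ^ b).count("1")
```

A low-resolution copy and a 4x enlargement of it hash identically here, so a small Hamming distance flags a candidate "hi-res version" worth inspecting by eye.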
Fuzzy hashes (a technique from digital forensics) can be used to compare files and find similarities.
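Forensic fuzzy hashing tools such as ssdeep use context-triggered piecewise hashing. The sketch below is a simplified relative of that idea, not ssdeep itself: a rolling sum over a small window picks chunk boundaries from the content, each chunk is hashed, and two files are scored by the overlap of their chunk-hash sets. Because boundaries depend on local content rather than absolute offsets, an insertion disturbs only nearby chunks. The window, modulus, and function names are illustrative assumptions.

```python
import hashlib


def chunk_hashes(data: bytes, window=16, modulus=64):
    """Split data at content-defined boundaries and hash each chunk.

    A boundary falls where the sum of the last `window` bytes is
    congruent to modulus - 1, once the chunk is at least `window` long.
    """
    hashes = set()
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # keep only the last `window` bytes
        if i + 1 - start >= window and rolling % modulus == modulus - 1:
            hashes.add(hashlib.sha1(data[start:i + 1]).hexdigest()[:8])
            start = i + 1
    if start < len(data):
        hashes.add(hashlib.sha1(data[start:]).hexdigest()[:8])  # trailing chunk
    return hashes


def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity (0.0 to 1.0) of the two files' chunk-hash sets."""
    ha, hb = chunk_hashes(a), chunk_hashes(b)
    if not ha and not hb:
        return 1.0
    return len(ha & hb) / len(ha | hb)
```

A plain cryptographic hash would report only "different" for a file with a few inserted bytes; this score stays close to 1.0 for such edits and near 0.0 for unrelated files.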
Automated FRBRing? Jerry McDonough has done some by hand.
One use case is offering users a choice of renderings. For a normalization strategy...
Peter Van Garderen: use things for what they're good at.
Mike Giarlo developed their Git-based prototype (up on GitHub) for Django. He will test it on large files.
Every object is its own repository, and an object is whatever the archivist says it is.
At Penn State, they sit in the library but report to IT.
It would be helpful to know which portion of an image has been altered, or whether all of it has.
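To make "which portion, or all of it" concrete, here is a minimal sketch assuming two same-sized images already decoded into lists of pixel rows (decoding real formats would need an imaging library, which is left out here); the function names are hypothetical. It reports the bounding box of changed pixels and the fraction of pixels that changed.

```python
def changed_region(old, new):
    """Return the bounding box (top, left, bottom, right) of differing pixels.

    old/new: same-sized images as lists of rows of pixel values.
    Bottom and right are inclusive; returns None if the images are identical.
    """
    if len(old) != len(new) or any(len(a) != len(b) for a, b in zip(old, new)):
        raise ValueError("images must have the same dimensions")
    diffs = [
        (y, x)
        for y, (row_a, row_b) in enumerate(zip(old, new))
        for x, (pa, pb) in enumerate(zip(row_a, row_b))
        if pa != pb
    ]
    if not diffs:
        return None
    ys = [y for y, _ in diffs]
    xs = [x for _, x in diffs]
    return (min(ys), min(xs), max(ys), max(xs))


def changed_fraction(old, new):
    """Fraction of pixels that differ (1.0 means the whole image changed)."""
    total = sum(len(row) for row in old)
    diff = sum(pa != pb for ra, rb in zip(old, new) for pa, pb in zip(ra, rb))
    return diff / total if total else 0.0


before = [[0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[1][2] = 9
after[2][3] = 9
print(changed_region(before, after), changed_fraction(before, after))
# (1, 2, 2, 3) 0.125
```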
GitHub, MediaWiki, and Google Docs are the three versioning tools out there.
History Flow has an interesting visualization of changes to Wikipedia over time.
Branches may correspond to different renditions.
Remotes are for syncing separate repositories.
Mike Giarlo will start scalability testing on GitHub this fall.