DLF 2011 - Preparing data for the Linked Data environment

From CURATEcamp

The XC (eXtensible Catalog) poster at the DLF Forum covered this topic:

  • XC does data normalization
  • it starts to think beyond "records" toward a more FRBRized structure (FRBR entities as subjects of triples)
  • poster is on extensiblecatalog.org/DLF2011
  • XC has an XML schema for FRBRized RDA data, though not yet Linked Data

Perception that there's a high technical barrier to implementing XC. XC would love to work with partners to make this better.

To move towards Linked Data, you need to understand enough about both the RDF data model and which RDF properties to use. It's OK if not every decision about properties is perfect for now; you can pick some and refine them over time.

Big variations in what people understand.

A big challenge is understanding what the subject of any given triple really is. FRBR might be able to help here, but not in all cases (e.g. a MARC 500 note about an author).
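To make the "what is the subject?" problem concrete, here is a minimal sketch that splits one flat record into statements about separate FRBR-style entities. The `http://example.org/` namespace, the property names, and the record fields are all hypothetical, chosen only for illustration; this is not the XC schema or any published vocabulary.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; every property name below is made up for this sketch.
EX = "http://example.org/"


def record_to_triples(record: Dict[str, str]) -> List[Triple]:
    """Split one flat record into statements about distinct entities."""
    work = EX + "work/" + record["id"]
    manifestation = EX + "manifestation/" + record["id"]
    triples = [
        # Title and creator are treated here as statements about the Work.
        Triple(work, EX + "title", record["title"]),
        Triple(work, EX + "creator", record["author"]),
        # Publisher is treated as a statement about the Manifestation.
        Triple(manifestation, EX + "embodies", work),
        Triple(manifestation, EX + "publisher", record["publisher"]),
    ]
    # A general note has no obvious subject (work? manifestation? author?),
    # so this sketch leaves it attached to the record rather than guessing.
    if record.get("note"):
        record_uri = EX + "record/" + record["id"]
        triples.append(Triple(record_uri, EX + "note", record["note"]))
    return triples
```

The fallback for the note field is a design choice, not a recommendation: it keeps the data without asserting something about an entity the note may not describe.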

Concern about putting rough data "out there", perhaps without the best properties used.

You don't have to jump all the way into the deep end to do Linked Data. Remember that LD doesn't necessarily have to be RDF, though using commonly implemented RDF properties is a benefit.

UC San Diego uses an RDF triple store under the hood, but can take in and spit out METS.

RDF is based on an "open world assumption." Diane suggests this means QC work needs to focus more heavily on what's coming in (from others) than on what we publish.
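One way to picture inbound QC is a simple screening pass over incoming triples. The sketch below is an assumption about what such a check might look like, not something described in the session: the function name and the idea of an accept-list of predicates are hypothetical. Under an open world, a missing statement is not an error, so the check looks only at the statements that are actually present.

```python
from typing import Iterable, List, Set, Tuple

TripleT = Tuple[str, str, str]


def screen_incoming(
    triples: Iterable[TripleT], known_predicates: Set[str]
) -> Tuple[List[TripleT], List[TripleT]]:
    """Separate incoming triples into accepted and flagged-for-review."""
    accepted: List[TripleT] = []
    flagged: List[TripleT] = []
    for subject, predicate, obj in triples:
        # Accept only predicates we already understand, with a non-empty value.
        if predicate in known_predicates and obj.strip():
            accepted.append((subject, predicate, obj))
        else:
            flagged.append((subject, predicate, obj))
    return accepted, flagged
```

Flagged triples are kept rather than discarded, so a metadata specialist can review them instead of having the tool decide silently.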

Catalogers have historically seen conversion between formats as something tools do. Perhaps the future is for catalogers to be a more active part of these sorts of data migrations, but this would require more cataloger/metadata specialist input into how the tools are built. There's a need for a continuum of tools that can process 100,000 triples, or 1.

What specific tools can help us?

  • XC toolkit
  • custom XSLT stylesheets
  • see also Karen Coyle's Code4Lib journal article from July 2011 http://journal.code4lib.org/articles/5468

A downside of XSLT processing for this use case is that you lose the (LD-based) provenance claims on the data. Is it useful to have triples that cite the original data and the stylesheet used to transform it? There is also a danger of losing clear assumptions when moving from a closed- to an open-world model.
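As a rough illustration of triples that cite the original data and the stylesheet, the sketch below emits two provenance statements alongside a transformed resource. The `http://example.org/` namespace and the `derivedFrom` and `transformedWith` property names are hypothetical placeholders, not terms from an established vocabulary.

```python
from typing import List, Tuple

# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/"


def provenance_triples(
    output_uri: str, source_uri: str, stylesheet_uri: str
) -> List[Tuple[str, str, str]]:
    """Describe where a transformed resource came from and how it was made."""
    return [
        # The output was produced from this original record.
        (output_uri, EX + "derivedFrom", source_uri),
        # This is the stylesheet that performed the transformation.
        (output_uri, EX + "transformedWith", stylesheet_uri),
    ]


statements = provenance_triples(
    EX + "resource/1",
    EX + "source/record-1.xml",
    EX + "stylesheets/to-rdf.xsl",
)
```

A real deployment would more likely reuse a shared provenance vocabulary so that consumers can interpret the statements without local documentation.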

To really "do" Linked Data, libraries need to be consumers of data in addition to producers of data.

Blog with some info on one person working to expose LD: http://library.caltech.edu/laura/?tag=cit-lod (cit-lod is the tag for this activity)

It's easy to throw up some data on the web, harder to do it in a persistent manner that could be used for production services.

Ad hoc experimentation is being done; prototyping is the way to go to overcome the fact that most legacy systems can't support Linked Data.