Leuven2012:WG Linked Data

From IntereditionWiki

Auto-suggested OAC Annotations for TEI

  • Thu., Jan 12
    • Cached the GeoNames list of cities with a population of more than 15,000 (~26,000 items)
    • Given a list of names, the tool looks each one up in GeoNames and automatically generates an unconstrained OAC annotation with data about that city.
  • Fri., Jan 13
    • Made GeoNames annotation creation a callable service that takes its input from Dirk's Distiller
      • What the Distiller does: take a set of XML files and build an index of every word, listing each of its occurrences as an XPath expression augmented with the word's sequence number within the element that the XPath selects.
      • Distiller refinement: working over a collection of documents, it determines the frequency of each word, computes the average frequency, defines a threshold frequency as a factor times that average, and leaves out any word whose frequency exceeds the threshold.
      • Yesterday I finished the standalone desktop Distiller in Perl.
      • Today I reworked it into a service, still a Perl script, that runs under Apache via CGI.
        • You can post a newline-separated list of URIs to it; it dereferences them to fetch the TEI files (which should all be in the same language), generates the word index, and outputs it as a tab-delimited list of word, URI, and xpath,word_number.
      • Wrote a find_charpos_by_word_number method in Perl that computes the character position of the n-th word in an element's content, not counting material in sub-elements.
      • Translated this function into equivalent Ruby (setting: Metafoor café, '60s and '70s music, a few Trappist beers, a table with four hard-working fellow bootcampers)
    • Made minor changes to the OAC server to support Moritz's and Jim's work
    • Refactored constraints to be generic: a constraint type plus a constraint blob
    • Worked on rendering TEI files with XPath-constrained annotations into HTML
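The GeoNames step (look names up in the cached city list, emit one unconstrained OAC annotation per hit) can be sketched as follows. This is a minimal Python illustration, not the service's actual code: it assumes the cache is a dict keyed by lower-cased city name (for example built from the GeoNames cities15000 dump), assumes the OAC beta namespace shown, and uses the GeoNames feature URI as the annotation body. "Unconstrained" here means the target is the whole document, with no constraint attached.

```python
import uuid

# Assumed namespace of the OAC (beta) data model; check against the spec version in use.
OAC = "http://www.openannotation.org/ns/"


def geonames_uri(geonameid: str) -> str:
    """GeoNames semantic-web URI for a feature id."""
    return f"http://sws.geonames.org/{geonameid}/"


def suggest_annotations(names, city_cache, target_uri):
    """Look each name up in the cached city list and build one unconstrained
    annotation per hit: the target is the whole document, with no constraint."""
    annotations = []
    for name in names:
        city = city_cache.get(name.strip().lower())
        if city is None:
            continue  # not one of the cached cities, so no suggestion
        annotations.append({
            "@id": f"urn:uuid:{uuid.uuid4()}",
            "@type": OAC + "Annotation",
            OAC + "hasBody": geonames_uri(city["geonameid"]),
            OAC + "hasTarget": target_uri,
            # Copy the cached city record so a client can display it directly.
            "city": dict(city),
        })
    return annotations


# Hypothetical cache entry; a real cache would be loaded from the GeoNames dump.
cache = {"leuven": {"geonameid": "0000000", "name": "Leuven"}}
for annotation in suggest_annotations(["Leuven", "Atlantis"], cache, "http://example.org/letter.xml"):
    print(annotation[OAC + "hasBody"])
```

Names that are not in the cache are silently skipped, so the output is a list of suggestions rather than a complete mapping.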
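The Distiller's index (every word, located by an XPath plus its sequence number in that element, emitted as word, URI, xpath,word_number) might look like this in Python with ElementTree. It is a sketch under stated assumptions, not a port of the Perl script: a "word" is taken to be a run of Unicode word characters, word numbers are 1-based and counted only in an element's own text (words inside child elements are indexed under the child), and paths use bare local names with positional predicates, so evaluating them against namespaced TEI would need a prefix mapping. Fetching the posted URIs is left out.

```python
import re
import xml.etree.ElementTree as ET

WORD = re.compile(r"\w+")  # assumed word definition: runs of Unicode word characters


def local_name(tag):
    """Drop a '{namespace}' prefix: '{http://www.tei-c.org/ns/1.0}p' -> 'p'."""
    return tag.rsplit("}", 1)[-1]


def own_text(elem):
    """Text directly inside elem (its .text plus each child's .tail);
    material inside sub-elements is indexed under the sub-element instead."""
    return (elem.text or "") + "".join(child.tail or "" for child in elem)


def index_document(uri, xml_text):
    """Yield (word, uri, xpath, word_number) for every word, with 1-based
    word numbers counted within each element's own text."""
    root = ET.fromstring(xml_text)

    def walk(elem, path):
        for n, match in enumerate(WORD.finditer(own_text(elem)), start=1):
            yield match.group(), uri, path, n
        seen = {}  # positional index per child name, as in /body[1]/p[2]
        for child in elem:
            name = local_name(child.tag)
            seen[name] = seen.get(name, 0) + 1
            yield from walk(child, f"{path}/{name}[{seen[name]}]")

    yield from walk(root, f"/{local_name(root.tag)}[1]")


def to_tsv(entries):
    """One line per occurrence: word <TAB> uri <TAB> xpath,word_number."""
    return "\n".join(f"{w}\t{u}\t{x},{n}" for w, u, x, n in entries)


doc = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
       '<p>Leuven is <hi>very</hi> old</p></body></text></TEI>')
print(to_tsv(index_document("http://example.org/a.xml", doc)))
```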
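The Distiller's frequency cut-off works over the whole collection: count each word, take the average count per distinct word, and drop every word whose count exceeds factor × average. A small Python sketch of that rule, assuming index entries shaped as (word, uri, xpath, word_number) and case-sensitive counting; the notes do not give the factor the Distiller uses, so the default of 2.0 below is arbitrary.

```python
from collections import Counter


def filter_frequent_words(entries, factor=2.0):
    """Drop every occurrence of words whose collection-wide frequency exceeds
    factor * (average word frequency). Entries are (word, uri, xpath, word_number)."""
    entries = list(entries)
    freq = Counter(entry[0] for entry in entries)  # case-sensitive counts
    if not freq:
        return []
    average = sum(freq.values()) / len(freq)  # mean occurrences per distinct word
    threshold = factor * average
    return [entry for entry in entries if freq[entry[0]] <= threshold]
```

For example, with "the" occurring 6 times and two other words once each, the average is 8/3; at factor 2.0 the threshold is about 5.3, so every "the" is dropped, while at factor 3.0 (threshold 8) everything is kept.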
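find_charpos_by_word_number converts an index entry's word number back into a character offset inside the element's content, skipping material in sub-elements. The original is Perl (and then Ruby); the following is only an illustrative Python equivalent under assumptions that the notes do not fix: word numbers are 1-based, offsets are 0-based, a word is a run of Unicode word characters, and the element's own content is its leading text followed directly by the tails of its children.

```python
import re
import xml.etree.ElementTree as ET

WORD = re.compile(r"\w+")  # assumed word definition


def own_text(elem):
    """Concatenate the element's direct text: its .text and each child's .tail.
    Text inside sub-elements is skipped, so offsets ignore that material."""
    return (elem.text or "") + "".join(child.tail or "" for child in elem)


def find_charpos_by_word_number(elem, n):
    """0-based character offset of the n-th (1-based) word in elem's own text,
    or None if it has fewer than n words."""
    if n < 1:
        raise ValueError("word numbers start at 1")
    for i, match in enumerate(WORD.finditer(own_text(elem)), start=1):
        if i == n:
            return match.start()
    return None


p = ET.fromstring("<p>Leuven is <hi>very</hi> old</p>")
print(find_charpos_by_word_number(p, 3))  # 11: "very" is skipped, so "old" is word 3
```

The same word definition has to be used here and in the indexer; otherwise the word numbers and the offsets drift apart.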
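Making constraints generic (a constraint type plus a constraint blob) means the server only stores the pair, and code that knows a given type interprets the blob. A hedged Python sketch of that split: the type name "xpath-word" and the idea that its blob reuses the Distiller's "xpath,word_number" form are assumptions made for illustration, not the OAC server's actual types.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Constraint:
    constraint_type: str  # identifies how the blob is to be read
    blob: str             # opaque payload, stored by the server as-is


# Per-type interpreters; adding a new kind of constraint means registering one here,
# with no change to how constraints are stored.
HANDLERS: Dict[str, Callable[[str], object]] = {}


def handler(constraint_type):
    """Decorator that registers an interpreter for one constraint type."""
    def register(fn):
        HANDLERS[constraint_type] = fn
        return fn
    return register


@handler("xpath-word")  # hypothetical type name
def parse_xpath_word(blob):
    """Blob 'xpath,word_number' -> (xpath, int); split on the last comma,
    so commas inside the XPath itself are preserved."""
    xpath, word_number = blob.rsplit(",", 1)
    return xpath, int(word_number)


def interpret(constraint):
    fn = HANDLERS.get(constraint.constraint_type)
    if fn is None:
        raise ValueError(f"no handler for constraint type {constraint.constraint_type!r}")
    return fn(constraint.blob)


print(interpret(Constraint("xpath-word", "/TEI[1]/text[1]/body[1]/p[1],3")))
```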

TEI -> RDF

  • Thu., Jan 12
    • We'll be extracting RDF relationships from TEI-encoded pages using XSLT. We've built a Drupal framework for demoing, but the transformations will be implementable in any language. A trivial example is working now; details will be filled in over the next few days.
    • We've continued work on our data models, focusing on the header. Progress is being made with منتخب صوان الحكمة (Muntakhab Ṣiwān al-Ḥikma).
  • Fri., Jan 13
    • Developing further the XSLT knocked together yesterday so that it picks out more of the useful TEI elements. We have also taken a sample TEI file (a very simple, short version of منتخب صوان الحكمة) and specified the target output for the transform.
    • The next stage (after lunch) is to adapt the XSLT to produce the target output. Also, rather than concentrating on one TEI file produced by one person, we will look at customised TEI schemas. There is a TEI schema called "absolute-bare-bones", which is as minimal as TEI gets; ideally our transform would be able to produce RDF for documents using this schema.
    • End of day: the XSLT from yesterday is very close to generating the target output from the test TEI file. It can extract all the information in the teiHeader, to be stored as information about the document using the Dublin Core ontology, and most of the information on resources within the document. Now we are looking at fully mapping a small subset of TEI to RDF equivalents. The subset is TEI-Bare[1], the minimal necessary subset of TEI, and a demonstrative TEI-Bare test file has been generated. For tomorrow: finish the XSLT for the original test document, specify the target output for TEI-Bare elements and run it on the test file (and look for other TEI-Bare files to test on, for the demo).
  • Sat., Jan 14
    • We have the test file working and are now looking at TEI-Bare. All kinds of bizarre edge cases are causing complications, but we've mapped each element in TEI-Bare to RDF using the Dublin Core vocabulary. We are now ironing out the problems caused by these edge cases so the transform is fully robust on TEI-Bare. The ultimate plan is to create an XSLT transformation tool gadget.
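The core of the header mapping (teiHeader fields become Dublin Core statements about the document) can be illustrated outside XSLT. The following is a Python/ElementTree sketch of the idea, swapped in for the group's XSLT, emitting N-Triples lines. The four-entry mapping is a hypothetical subset chosen for illustration, not the group's actual mapping, and element text is whitespace-normalised.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical, deliberately small mapping: teiHeader path (relative to <TEI>) -> DC element.
HEADER_MAP = [
    (f"{TEI}teiHeader/{TEI}fileDesc/{TEI}titleStmt/{TEI}title", "title"),
    (f"{TEI}teiHeader/{TEI}fileDesc/{TEI}titleStmt/{TEI}author", "creator"),
    (f"{TEI}teiHeader/{TEI}fileDesc/{TEI}publicationStmt/{TEI}publisher", "publisher"),
    (f"{TEI}teiHeader/{TEI}fileDesc/{TEI}publicationStmt/{TEI}date", "date"),
]


def literal(text):
    """Quote text as an N-Triples string literal (backslash, quote, LF and CR escaped)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def header_to_ntriples(doc_uri, xml_text):
    """Describe the document itself with Dublin Core statements taken from its teiHeader."""
    root = ET.fromstring(xml_text)
    triples = []
    for path, dc_element in HEADER_MAP:
        for elem in root.findall(path):
            value = " ".join("".join(elem.itertext()).split())  # normalise whitespace
            if value:
                triples.append(f"<{doc_uri}> <{DC}{dc_element}> {literal(value)} .")
    return triples


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>
  <titleStmt><title>A short   sample</title><author>Anonymous</author></titleStmt>
  <publicationStmt><publisher>Example Press</publisher></publicationStmt>
  <sourceDesc><p>Born digital.</p></sourceDesc>
</fileDesc></teiHeader><text><body><p>Text.</p></body></text></TEI>"""
for triple in header_to_ntriples("http://example.org/doc", sample):
    print(triple)
```

Header fields that are absent (here, the date) simply produce no statement. That is one of the simpler edge cases a minimal schema such as TEI-Bare makes common.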

Annotating Client

Constraint Format | Annotation Body Format

  • Thu., Jan 12
    • Defining functional requirements
    • Tests on how to achieve a purely client-side, JavaScript-based solution
    • Started implementing display of the target resource
    • Started implementing client/server interaction
    • Fetching annotations
  • Fri., Jan 13
    • Render/Create annotations within the GWT client (Marco)
    • Create annotation bodies and render annotated spans in plaintext targets (Moritz)
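Rendering annotated spans in a plaintext target amounts to turning character-offset annotations into markup. The following is a minimal Python sketch, not the client's own code. It assumes each annotation resolves to a 0-based, end-exclusive (start, end) offset pair, and the "annotated" class name and data-annotations attribute are hypothetical. Splitting the text at every span boundary lets overlapping annotations render without nested or crossing tags. Each segment lists every annotation that covers it.

```python
import html


def render_annotated(text, spans):
    """spans: list of (start, end, annotation_id), 0-based and end-exclusive.
    The text is cut at every span boundary; each piece is wrapped in a <span>
    naming all annotations that cover it, and the text itself is HTML-escaped."""
    cuts = {0, len(text)}
    for start, end, _ in spans:
        if not 0 <= start <= end <= len(text):
            raise ValueError(f"span ({start}, {end}) lies outside the text")
        cuts.update((start, end))
    bounds = sorted(cuts)
    out = []
    for a, b in zip(bounds, bounds[1:]):
        segment = html.escape(text[a:b])
        ids = [aid for s, e, aid in spans if s <= a and b <= e]
        if ids:
            label = html.escape(" ".join(ids))
            out.append(f'<span class="annotated" data-annotations="{label}">{segment}</span>')
        else:
            out.append(segment)
    return "".join(out)


print(render_annotated("Leuven is old", [(0, 6, "a1"), (3, 9, "a2")]))
```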

Collation Interface

  • Thu., Jan 12
    • Toying with a new front-end interface for CollateX. -- Doug
    • End of day: wasted a lot of time looking for a good JS-only version of FileUpload from a form. Going with PHP now.

JS Exhibit