Würzburg2011/Live Minutes

From IntereditionWiki

Revision as of 12:29, 8 October 2011 by Grant.dickie (Talk | contribs)

Projects around the table

  • Interedition (of course)
  • Faust Digital Edition
  • TypeWrite
  • Wikipedia
  • eLaborate
  • Tile
  • Münster NGT Project
  • Annotation of Google Books data
  • T-PEN

Project Descriptions

Gregor Middell on goal of the Bootcamp

  • Interedition has one primary use case until now: CollateX
  • Perception that Interedition = CollateX, but in fact it's about interoperability of tools for textual scholarship
  • Want to turn that a little by concentrating on a transcription/annotation use case too
  • So let's figure out what everybody is doing and see how we could connect
    • Federico: interesting topic is how projects could use each others' textual resources

Annotation of Google Books data

  • Use Case: Women Writers
  • Administration module for type of annotations
  • Possibility to select books of Google Corpus to 'selective/working library'
  • Have annotations as regions on images (idea is to deduce actual text from OCR)
    • Marco: how do you get to the books (no Google API)
      • Cesar: needed to be a little subversive to get to it, since text access seems pretty random and Google does not have a proper text service. Project leaders are therefore eager to move to a more open domain
    • Federico: format of annotations?
      • Cesar: annotations stored as an 'annotation Java object' in the Google DataStore
    • Marco: How about the limitations of Google's Java programming interface?
      • Cesar: using 'Objectify'
    • Fede: object in Google Store is not very interoperable?
      • Cesar: we would provide a service to such annotations

Students crowd sourcing project Nachlass Franz Brümmer (Gregor Middell)

Nachlass Franz Brümmer (literary legacy of Franz Brümmer; a who's who of (auto)biographies of authors of 19th-century German literature). Students are transcribing these. They tag other names and information in the biographies to create a network of biographic information. CKEditor, using HTML for a very simple annotation model. Z39.50 links to authority files for tagged names (to have an unambiguous reference for the network). LAMP, no formal text repository.

18th Connect::TypeWrite

(Nick Laiacona) Low-quality 18th-century facsimiles, line-by-line transcription, crowdsourced on the basis of getting a free copy of the works edited.

Implementing a button for crowdsourcers to report particularly bad/idiotic OCR'd pages, to improve automated learning approaches.

XML-TEI based information on the boxes/areas identifying lines. Very limited correction possibilities for wrongly recognized boxes.


Grant: going open to public? Interesting to get feedback on quality of OCR. (Nick: yes, but not live yet, beta November 2011, live probably spring 2012).


(Nick Laiacona) Transcription tool for transcribing manuscripts of Herman Melville. (Image and area based, XML output in a purpose-built schema.)


Huygens ING (Bram Buitendijk). Transcription/annotation tool with managed editor rights. Based on a hierarchy of pages. Categories of annotations. Arbitrary annotations on transcribed text. No intrinsic link between text and image. No image annotation.


(Marco Petris) Catma, a tool for the not so technical user. Desktop tool. Easy markup, indicated with colors (underlines, which may overlap). Contains a query tool to construct and run queries; results can be shown as frequency and distribution charts. Uses TEI feature structure tags to express literary phenomena in text, because literary scholars found few tags of use to them.

Stand-off markup; the nasty bit is in the character offsets of the ranges. Ranges pertain to the in-memory text (so that PDF, txt etc. can all be generated from that). No handling of changes in the base text yet, maybe by way of a proxy document (they don't want to touch the original text file).
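To make the offset problem concrete, here is a minimal sketch of stand-off ranges over an in-memory text. It is not Catma's implementation: `TagRange`, `mark`, `covered`, `tags_at`, the tag names and the sample sentence are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRange:
    start: int  # inclusive character offset into the in-memory text
    end: int    # exclusive character offset
    tag: str    # hypothetical tag name


def mark(base: str, phrase: str, tag: str) -> TagRange:
    """Create a range for the first occurrence of phrase in base."""
    start = base.index(phrase)
    return TagRange(start, start + len(phrase), tag)


def covered(base: str, r: TagRange) -> str:
    """Return the stretch of text a range points at."""
    return base[r.start:r.end]


def tags_at(ranges: list[TagRange], offset: int) -> list[str]:
    """All tags whose range covers the offset; ranges may overlap."""
    return sorted(r.tag for r in ranges if r.start <= offset < r.end)


base = "He waited ten years. The telling takes one line."
ranges = [
    mark(base, "He waited ten years", "event"),
    mark(base, "ten years", "narrated_time"),
]

print(covered(base, ranges[1]))
print(tags_at(ranges, base.index("ten")))

# Editing the base text silently shifts what the stored offsets point at.
edited = "Note: " + base
print(covered(edited, ranges[1]))
```

The last line shows why changes to the base text are the hard part: the stored offsets are still valid numbers, but they no longer cover the tagged phrase.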

Various applications, e.g.: annotation/tag scheme for narrated time and narration time, extensive tag set for narratology analysis.

Open Annotation Collaboration

(J. Grant Dickie) Goal is to come up with an annotation model that works across domains and formats. MITH is working on an implementation for a streaming video annotation application. As in TILE, users draw areas on frames and annotate these (but with metadata to indicate which frames are targeted). Based on MITHGrid by Jim Smith (also MITH): a data store acting as an object repository. The data store is client side, the needed data is filtered out, and the client provides faceted browsing.

They are planning this for large scale data, but haven't tested that yet.

Münster NGT Project

Troy Griffitts: "For hundreds of years people have tried to determine what the New Testament actually says". Project to get 1500 witnesses transcribed, aligned including indexed images etc.

Basically a pretty full-fledged data model, with a three-tier framework expressed in MySQL, Java and REST, publishing to a website, mobile, and (e.g.) Open Social.

Doesn't want to redo the tools that exist, so wants to see what can be reused and how. E.g. by using the Open Social approach.

Asaf concludes: so if we can agree on a publisher/subscriber vocabulary we can all use the same tools and any we like in any combination on our own data store. Like MicroServices it's implementation agnostic.


(I'm too fuzzy brained for a write up I'm afraid)

What would we want to do

  • Nick: Interested in Open Annotation and Open Social
  • Asaf: Interested in that as well; maybe we can have a concept implementation of one server with a text source and a completely separate resource for (its) annotations?
  • Bram: Very interested in the Open Social, how that could work for eLaborate e.g. But also in the Text Repositories model that Gregor put in the use case for the CFP.
  • Marco: Certainly interested in Open Social, but mostly in repositories (how to model objects that go into such repositories, and formats and data standards to be used; things underpinning Open Social i.e.)
  • Grant: Really into Open Social, wants to start hacking on that
  • Patrick: There are clear overlaps between the projects, but also many differences in actual implementation; maybe we should focus much less on standardization of what we're doing, and more on standardization of how we are approaching these tasks, so models for solving annotation/transcription needs.
  • Federico: interested in a list of all the things these transcription/annotation solutions have in common and whether we can come up with a more generic model.
  • Troy: certainly interested in a list of shared/common functionalities/features of transcription/annotation solutions. ("There's so much I want to steal from you guys."). Maybe we can see if we can componentize some of these tools/functionalities?
  • Joaquin/Cesar: interested in such, but can we also come up with a standard model for annotation?
  • Gregor: put his wishes in the CFP, but doesn't want to force an a priori solution to the textual model for transcription/annotation onto the group, as there are so many around.

From the Interedition perspective

What would be great is an implementation of any kind that shows that an OSDC (Open Source Development Community) is actually able to support the textual scholarship domain better in a common effort. An Open Social type of interface that is functional to a/any scholar and that is composed of several services created from existing components or tools that individuals in this group have already built, would really show the value of reuse and interoperability, and the point of creating sustainability within the tools, data and research and development communities that way. At the same time we show that institutional stakes are balanced by having the possibility to 'brand' solutions.

Break-off Groups

Annotation Group


  • Create a repository that accepts new annotations and returns a URI for an annotation

Specs of program:

  1. Submit Anno - get URI
  2. Submit content for Anno Bodies - get URI
  3. Query a target URI and get a collection of annos
  4. Dereference URI (giving back appropriate RDF/JSON)
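The four operations in the spec could be prototyped in memory before any server exists. The following is only a sketch under assumptions: the class `AnnotationStore`, its method names, the `http://example.org/` base URI and the JSON field names are all hypothetical, and it submits the body before the annotation so the annotation can refer to the body's URI. RDF output is left out.

```python
import json
import uuid


class AnnotationStore:
    """In-memory stand-in for the annotation repository (hypothetical API)."""

    def __init__(self, base="http://example.org/"):  # placeholder base URI
        self.base = base
        self.resources = {}

    def _mint(self, kind):
        # Every stored resource gets its own URI.
        return f"{self.base}{kind}/{uuid.uuid4().hex}"

    def submit_body(self, content, mime_type="text/plain"):
        """Spec item 2: store body content, return its URI."""
        uri = self._mint("body")
        self.resources[uri] = {
            "id": uri, "type": "Body", "format": mime_type, "content": content,
        }
        return uri

    def submit_annotation(self, body_uri, target_uri):
        """Spec item 1: store an annotation, return its URI."""
        if body_uri not in self.resources:
            raise KeyError(f"unknown body: {body_uri}")
        uri = self._mint("anno")
        self.resources[uri] = {
            "id": uri, "type": "Annotation",
            "body": body_uri, "target": target_uri,
        }
        return uri

    def query(self, target_uri):
        """Spec item 3: all annotations pointing at a target URI."""
        return [
            r for r in self.resources.values()
            if r["type"] == "Annotation" and r["target"] == target_uri
        ]

    def dereference(self, uri):
        """Spec item 4: return the stored resource serialized as JSON."""
        return json.dumps(self.resources[uri], sort_keys=True)


store = AnnotationStore()
body = store.submit_body("Compare this reading with the other witness.")
anno = store.submit_annotation(body, "http://example.org/text/1")
print(store.dereference(anno))
```

Keeping bodies and annotations as separate resources with their own URIs mirrors the spec's two separate submit steps.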

  • Figure out how to create/display/register constraints of a body or target

  1. Given a URI, provide a checksum of the content to register what section is being linked
  2. For text and images

  • Front End Client + Sample Data for testing

  1. Can query the service for Annotations and render them
  2. Supports usage of image and text mime-types
  3. Sample data exercises different scenarios for usage (posting, querying, dereferencing, validations against the service)
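The checksum idea from the annotation group's constraints item might look like the sketch below for the text case. It assumes the content behind the URI has already been fetched as a string and that a section is a character range; the function names, dictionary keys and the choice of SHA-256 are assumptions, not something the group settled on.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 hex digest of a piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_constraint(content: str, start: int, end: int) -> dict:
    """Register which section of the content an annotation links to."""
    return {
        "start": start,
        "end": end,
        "section_sha256": digest(content[start:end]),
        "document_sha256": digest(content),
    }


def check_constraint(content: str, constraint: dict) -> str:
    """Compare the current content against a registered constraint."""
    if digest(content) == constraint["document_sha256"]:
        return "unchanged"
    section = content[constraint["start"]:constraint["end"]]
    if digest(section) == constraint["section_sha256"]:
        return "section intact"
    return "stale"


text = "In the beginning was the Word"
start = text.index("Word")
constraint = make_constraint(text, start, start + len("Word"))

print(check_constraint(text, constraint))
print(check_constraint(text + ", and more", constraint))
print(check_constraint("Now: " + text, constraint))
```

A client could run such a check when rendering annotations and flag those whose target section no longer matches.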