Difference between revisions of "TheHague112012Minutes"

From IntereditionWiki

m (Gregor's presentation)

Revision as of 12:36, 24 November 2012

21/11/2012 Morning

For names, refer to the list of participants.

Discussion on text modelling

Two main alternatives to XML:

  • range based model (for annotations)
  • variant graph (for collation)
    • Tara is using (with CollateX) a graph-based model more complex than Schmidt's.

Paolo is interested in a text model that can represent textual layers (graphical, alphabetical, linguistic; see Orlandi, Informatica testuale)

  • Tara: we can stretch Schmidt's graph model to represent Orlandi's different layers. They're still graphs.
  • Gregor: variant graphs are not suitable for this, because the layers are not variants. LMNL (a range-based model), instead, seems to fit Orlandi's "textual layers".

Gregor's presentation

Gregor describes his implementation of a range-based (LMNL-based) textual model, used so far for the back-end of the Faust Project.

  • the implementation is in Java
  • annotations have a name and are namespaced (this comes from XML)
  • text is a sequence of events
  • layers have names like "TEIW" or "Europeana" and contain texts
  • layers have anchors, so one text can point to another or to ranges of another text
  • one pointer can point to multiple anchors. E.g. the layer 'alignment' points to two different anchors (and aligns them)
  • layers can contain arbitrary data (JSON, an XML file, etc.)
  • a TextRepository is just a collection of those layers. It's something I can query
  • you can create a graph of the layers existing in a text repository
  • TextStream
    • Gregor's model differs from XML
      • the SAX API works with XML trees
      • you can walk through the tree there
      • a range-based model, instead, does not have such easy stacks
      • How do you transform XML into range-model? Any element (with opening and closing tags) becomes a range
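
The last point (every element with opening and closing tags becomes a range) can be illustrated with a minimal Python sketch. This is not Gregor's Java implementation: the `Range` class, the `xml_to_ranges` function and the sample string are assumptions made up for illustration, and attributes and namespaces are ignored.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Hypothetical range annotation: a name plus character offsets."""
    name: str   # element name, reused as the annotation name
    start: int  # offset of the first character covered
    end: int    # offset just past the last character covered


def xml_to_ranges(xml_source: str) -> tuple[str, list[Range]]:
    """Flatten an XML document into plain text plus one range per element."""
    chunks: list[str] = []
    ranges: list[Range] = []
    offset = 0

    def visit(element: ET.Element) -> None:
        nonlocal offset
        start = offset
        if element.text:                 # text before the first child
            chunks.append(element.text)
            offset += len(element.text)
        for child in element:
            visit(child)
            if child.tail:               # text between this child and the next
                chunks.append(child.tail)
                offset += len(child.tail)
        # The closing tag has been reached: the element becomes a range.
        ranges.append(Range(element.tag, start, offset))

    visit(ET.fromstring(xml_source))
    return "".join(chunks), ranges


# Sample input, invented for this sketch.
text, ranges = xml_to_ranges("<l>Habe nun, <hi>ach!</hi> Philosophie</l>")
for r in ranges:
    print(r.name, r.start, r.end, repr(text[r.start:r.end]))
```

The tree is walked with an explicit recursion here precisely because XML guarantees proper nesting; once the ranges exist, nothing forces them to nest any more.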

Setting the agenda

A round-up on the interests of each of the participants:

  • practicality: what can we build on top of e.g. a range-based model (from the datastore to the presentation layer)
  • query/search functions on top of a text model
  • variant graph vs. range-based models
  • processing (equivalent to XSLT?), querying (equivalent to XPath/XQuery)
  • variant graph: traversal patterns?
  • interfaces, APIs, JS libraries
  • problem of variation and how it is handled on different (conceptual) layers of a text
  • common model? can we find a generalized model incorporating features from the variant graph and a range-based model
  • bridge the gap!
  • integration scenarios; import of existing data, multiple use cases on top of those (what is the smallest thing that could possibly work? -- how do we get there)

Division of labour

See TheHague112012DivisionOfLabour

21/11/2012 Afternoon

Bootcamp's agenda and goals

Interoperability is best achieved through web services. Gregor's Java system (which he showcased this morning) works this way, and CollateX now does so too.

Juxta today shows its underlying range-based model only if you use the API. For the standard user using juxtacommons.org, it's just XML-in/XML-out.

Moritz and Gregor discuss the JavaScript querying prototype they have already built for the Java system. Should it run client-side or server-side?

Ronald: is a DSL (domain-specific language) the way for us to go?

Bootcamp's goals:

  • We want to use the TEI palette (vocabulary) to build a non-XML range-model-based annotation of plain text
  • We want to build tools (a Domain-Specific Language) for texts annotated this way, so as to foster the development of this kind of text encoding/annotation model (an alternative to XML)
  • Our goal is to make a console on a website, rather than simply an API
    • so the user can test the potential of a querying language that queries texts marked with a range-based model

The point of doing this is:

  • overcoming the problem of overlapping markup
  • exploiting the recursive potential of LMNL (annotating annotations)
    • there's a prototype (with a console) at http://www.piez.org/
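
Both points can be made concrete with a small Python sketch. It is only an illustration under assumed names (`Range`, `Annotation`, `properly_overlap`) and an invented sample text; it does not reproduce LMNL syntax or any existing implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical annotation; it may carry annotations of its own."""
    name: str
    value: str = ""
    annotations: list[Annotation] = field(default_factory=list)


@dataclass
class Range:
    """Hypothetical named range over a plain text (start inclusive, end exclusive)."""
    name: str
    start: int
    end: int
    annotations: list[Annotation] = field(default_factory=list)


def properly_overlap(a: Range, b: Range) -> bool:
    """True when a and b intersect but neither contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


text = "one two three four"
line = Range("line", 0, 7)       # covers "one two"
phrase = Range("phrase", 4, 13)  # covers "two three"

# Two ranges that cannot both be elements of a single XML tree.
print(properly_overlap(line, phrase))

# Annotating an annotation: who made the claim, and how certain they are.
phrase.annotations.append(
    Annotation("resp", "editor", [Annotation("certainty", "low")])
)
```

In a range model the two overlapping ranges simply coexist, whereas an XML encoding would need milestones or fragmentation to express them.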

We should create a web interface. Gregor sets out an example of its workflow:

  • the text repository: textrepo.net
    • POST
    • <xml/>
  • creation of a new text: textrepo.net/12
    • GET
    • PUT → 2201 creator
  • creation of an annotation to text 12: textrepo.net/13
    • (but bear in mind that annotations are texts in their own right)
  • a query on text 13: textrepo.net/13?q=(and...())
  • a query on the whole repository (all texts in the repository)
    • textrepo.net/?q=...
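
The workflow above can be mimicked in memory with a short Python sketch. It involves no HTTP and is not the planned service: the `TextRepository` and `Text` classes and their methods are hypothetical, the ids 12 and 13 merely echo the example, and a plain Python predicate stands in for the query language, whose syntax the notes leave open.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count
from typing import Callable, Optional


@dataclass
class Text:
    """Hypothetical stored text; an annotation is a text that targets another text."""
    id: int
    content: str
    target: Optional[int] = None  # id of the annotated text, if any
    start: int = 0                # annotated range in the target (start inclusive)
    end: int = 0                  # annotated range in the target (end exclusive)


class TextRepository:
    """In-memory stand-in for the repository behind the web interface."""

    def __init__(self, first_id: int = 1) -> None:
        self._ids = count(first_id)
        self._texts: dict[int, Text] = {}

    def post(self, content: str, target: Optional[int] = None,
             start: int = 0, end: int = 0) -> int:
        """Store a new text (or annotation) and return its id."""
        text = Text(next(self._ids), content, target, start, end)
        self._texts[text.id] = text
        return text.id

    def get(self, text_id: int) -> Text:
        return self._texts[text_id]

    def query(self, matches: Callable[[Text], bool],
              text_id: Optional[int] = None) -> list[Text]:
        """Query one text, or the whole repository when text_id is None."""
        if text_id is None:
            candidates = list(self._texts.values())
        else:
            candidates = [self._texts[text_id]]
        return [t for t in candidates if matches(t)]


repo = TextRepository(first_id=12)
base = repo.post("Habe nun, ach! Philosophie")                   # becomes text 12
note = repo.post("interjection", target=base, start=10, end=14)  # becomes text 13

on_one_text = repo.query(lambda t: "inter" in t.content, text_id=note)
on_everything = repo.query(lambda t: t.target == base)
```

Storing annotations in the same collection as texts is what lets a single query function serve both the per-text and the whole-repository case.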

At the Faust project they're keying/transcribing a MS twice: one transcription for the diplomatic layer, one transcription for the 'linguistic' (regularised) layer.

Issue: how do you align the two texts? They do it now by collating them.

Open issue: how will they eventually store (e.g. in XML) this alignment? They've been collaborating with the TEI SIG on genetic editions on this, but they have not yet reached a solution for alignment at word-level granularity, so it is still unclear how to store the alignment in XML/TEI. I propose one XML file for each text (encoded at 'w'/word level), and a third XML file linking the words in XML transcription file 1 (diplomatic) to the words in XML transcription file 2 ('regularised').
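
As a rough illustration of the proposal, the Python sketch below aligns two token lists and returns index pairs that a third file could record as links between word identifiers. It is not the Faust project's collation procedure: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for a collation tool, the `align_words` function is hypothetical, and the sample tokens are invented.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def align_words(diplomatic: list[str], regularised: list[str]) -> list[tuple[int, int]]:
    """Pair word positions of the two transcriptions.

    Identical runs are paired directly. A replaced run is paired one-to-one
    only when both sides have the same number of words; anything else
    (insertions, deletions, uneven replacements) is left unlinked.
    """
    links: list[tuple[int, int]] = []
    matcher = SequenceMatcher(a=diplomatic, b=regularised, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and same_length):
            links.extend(zip(range(i1, i2), range(j1, j2)))
    return links


# Invented sample tokens: historical spelling vs. regularised spelling.
diplomatic = "Das muß seyn".split()
regularised = "Das muss sein".split()

for d, r in align_words(diplomatic, regularised):
    print(f"d{d} ({diplomatic[d]}) -> r{r} ({regularised[r]})")
```

Keeping the links as bare position pairs leaves both transcription files untouched, which is the main attraction of storing the alignment separately.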

Search for related projects

We look for other (open source) DH projects that may already be doing what we want to do, so we can build on what they have done.

  • Textus
    • connected with Europeana (RDF for cultural artifacts)
    • they let you attach arbitrary annotations (through XPath-like-identified ranges) to any webpage
  • Is Standoff Markup Editor doing what we want to do?
  • Marco is working on CATMA