MinutesWG3Birmingham


NB: Due to flu, Working Group leader Fotis Jannidis could not attend the meeting. Susan Schreibman was found willing to stand in as chair of the meeting. Some preliminary remarks from Fotis were put on the agenda as 'item 3'.

  1. Opening
  2. The goal of the meeting is to collect information and ideas, and to concentrate these into a 'draft' visionary write-up: what would 'interoperability' mean or imply for digital editions?
  3. Using the classification from the paper Fotis Jannidis sent around the day before the meeting, we could conclude that we are looking for ways to make interoperability possible on various levels:

    • Syntactic interoperability
      This should be the ability of systems to process a syntax string and recognise it (and initiate actions) as an identifier, even if more than one such syntax occurs in the systems. Example: URI. This ties in with Peter Robinson's concept of the UTI (Universal Text Identifier), with PIDs (Persistent Identifiers), etc. (A small sketch of these first two layers follows after this list.)
    • Semantic interoperability
      The ability of systems to determine if two identifiers denote precisely the same referent; and if not, how the two referents are related. Example: Ontologies like CIDOC.
    • Community interoperability
      The ability of systems to collaborate and communicate using identifiers whilst respecting any rights and restrictions on usage of data associated with those identifiers in the systems. Note that these three form dependent layers: community interoperability is only possible if semantic interoperability is ensured; semantic interoperability is only possible if syntactic interoperability is ensured.
    • Development interoperability
      Collaboration in developing the tools.
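
    A minimal sketch, in Python and with invented identifiers, of what the first two layers could mean in practice: the syntactic layer recognises that a string is a well-formed identifier (here a URI/URN), while the semantic layer decides whether two different identifiers denote the same referent, via a lookup table standing in for an authority file or an ontology such as CIDOC.

      from urllib.parse import urlparse

      # Syntactic interoperability: recognise that a string is a well-formed
      # identifier (here: an absolute URI/URN), whichever system emitted it.
      def looks_like_uri(candidate: str) -> bool:
          parsed = urlparse(candidate)
          return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)

      # Semantic interoperability: decide whether two identifiers denote the
      # same referent. The mapping is a hypothetical stand-in for an authority
      # file or an ontology such as CIDOC CRM.
      SAME_AS = {
          "http://example.org/editions/hamlet/1604": "hamlet-q2",
          "urn:example:hamlet:second-quarto": "hamlet-q2",
      }

      def same_referent(id_a: str, id_b: str) -> bool:
          return SAME_AS.get(id_a) is not None and SAME_AS.get(id_a) == SAME_AS.get(id_b)

      print(looks_like_uri("urn:example:hamlet:second-quarto"))   # True
      print(same_referent("http://example.org/editions/hamlet/1604",
                          "urn:example:hamlet:second-quarto"))    # True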

  4. A first step for the members of Working Group 3 would be to determine what is already out there (given standards, formats, tools, frameworks etc.) to realise our ideas. This should probably be done in conjunction with Working Group 1. A next step would be to agree to focus on some central or prioritized ideas and to get some work done on them. The suggested evaluation and aggregation of Working Group 1's survey results is deemed not relevant at the moment (the survey has not actually run yet). Working Group 4 will decide in its upcoming meeting on more specific planning at the project level.
  5. NOTE: Agenda items 5 "Discussion on methodological aspects of digital scholarly editing" and 6 "Discussion on conditions (like formats, standards, protocols) to be applied to prototypes and/or derived applications and products" are combined.

    Dino Buzetti wants to add that the interactive, collaborative and international aspects are important. This is agreed, and it should be tended to within Working Group 1. Dino further thinks it is important to also research why the field is so slow to take up making digital editions and using digital tools. Ideas may be found in papers by Peter and by Tito Landi. One of the bigger problems is the processing of digital material; there do not yet seem to be enough possibilities. From Dino's point of view the main question would be how to make texts processable.

    Andrea Scotti adds that many presentations about digital editions ignore the machine, the technology aspect. The technical aspect should be part of any curriculum on (digital) editing as well. At least some knowledge of the technical aspect is needed to eventually diminish the major differences between current projects. Peter thinks this brings in the problem of interoperability of cultures. Joris concludes that these aspects should be surveyed and researched, and that they should be put on the checklist for the Roadmap (Working Group 4 action).

    Andrea thinks the transparency of formats is essential. How would we otherwise go about representing datasets on the web, and how would we have other people process them?

    Susan wants to know which kind of interoperability (syntactic, semantic or community interoperability) is the most important and should be focused on first.

    Joris wants to add that the process of producing any edition should become far more online and collaborative: many more digital editions are made than are visible on the web. Projects terminate before publishing, e.g. people move on or technology dies. If editions were born digital as a matter of principle, the danger of losing information to a slowly dying offline machine would be far smaller. Even more so if a redundancy-promoting strategy were applied to the problem of durability.

    James points out the problems of copyright and rights management, e.g. by libraries.

    Dino sketches his ideas about a feasible infrastructure. It should include an authentication system: only when people can trust their material to be safe (and private) will they want to make use of any infrastructure. It would also need a sufficiently large group of scholars who want to work like this, to get it started.

    Peter states that libraries will keep on being the problem. Dino answers that this is precisely why Working Group 1 should approach them.

    Terje would want literary scholars to also define what they aim for with a certain scholarly edition. What are the goal, intended use and intended audience of a digital edition? This might also have consequences for the kind and form of output an edition project should produce; this could very well be a book rather than the internet.

    James wonders about the applicability of tools. What kind of tools belong to a digital edition? Is a tool aligning passages from different editions of the same text a tool for digital editions? Susan answers that such issues come up often around scholarly editions. So should Working Group 3 include work on answering the question "What is a digital edition?" At least the roadmap should have some point of view on it, so yes. Federico points to the criteria Peter and Edward have devised. Edward will look them up.

    Susan wants to know if we could define what a prototype would need to adhere to in order to qualify in our own eyes. Keywords that we come up with are:

    • Identifiable
    • Quotable
    • Processable
    • Interoperable
    • Reusable
    • Preservable

    Susan thinks some metadata description at some level is necessary. E.g. from a legacy point of view: "this is a representation of..."

    Peter adds that being 'critical' is key for a scholarly edition. Susan would rather say that critical (or "scholarly", for that matter) denotes phases, different states of an edition. We should not decide already that only one phase or stage is to be put forward as the primary form of a digital scholarly edition.

    Malte would want to leave the choice of a representation to the user. Does there need to be a clear distinction or separation between data and representation? Dino adds that this brings in the difficulty of visualization and multiple representations; do these need their own interoperable tools? If yes, should we also define that? Susan concludes that there is a clear need for flexibility of output. Federico wonders if this is in fact possibly a form of self-management.
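
    A minimal sketch of this separation of data and representation, built on an invented toy data model: one master record is rendered once as a plain reading text and once with its variant readings shown, and neither presentation is baked into the data itself.

      # A toy master record: base text plus variant readings keyed by token
      # position. Field names and content are invented for illustration.
      master = {
          "tokens": ["the", "quick", "brown", "fox"],
          "variants": {1: ["quikc", "quicke"]},   # alternative readings for token 1
      }

      def reading_text(record: dict) -> str:
          """Reading edition: just the base text, no apparatus."""
          return " ".join(record["tokens"])

      def critical_text(record: dict) -> str:
          """Critical-style rendering: each base reading followed by its variants."""
          out = []
          for i, token in enumerate(record["tokens"]):
              variants = record["variants"].get(i)
              out.append(f"{token} [{'; '.join(variants)}]" if variants else token)
          return " ".join(out)

      print(reading_text(master))   # the quick brown fox
      print(critical_text(master))  # the quick [quikc; quicke] brown fox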

    Marin wonders if the Action and Roadmap should concern editions which are digitised, born-digital editions, or both. Joris thinks the same tools should apply, so the history and source of production should be irrelevant. Susan objects that it is much more complicated than just plainly "the same tools apply": existing texts and editions have their own history of production, versions and propagation. Joris admits that this is true, but fails to see how metadata could not cover these aspects.

    Dino wonders whether the aim of the Action should eventually be a modern Gutenberg project, or whether it should strive to enhance digital scholarship. Peter adds that ASCII is still the most interoperable form of text, so whatever we do, we will in all likelihood not do much better than Project Gutenberg.

    Federico asks if we should consider our relation to projects such as Google Books, Google Scholar etc. Joris: we cannot compete with Google in any way for production, and we do not need or want to. Google is doing any and every document. We are certainly aiming at a scholars' niche as to content, and we want to add interoperable functionality to the text far beyond Google's perception of 'making available'. If this results in anything that grows in applicability beyond our specific content focus, that would be a secondary benefit, but not a goal per se. This is difficult: how to precisely define what we are not doing?

    Susan wants to know if it is important to have metadata about the integrity and reliability of an edition, and about the presumptions, conditions and circumstances under which it was produced. James answers that this would at least be useful for e.g. a teacher deciding whether the material is good enough to use in class. Susan adds that such things depend on trust. Joris remarks that he would not know how to implement something like trust on a technical level. Of course there is rights management and authentication, but formalizing the trust one could put in the scholarly quality of a digital edition is another matter. Susan points to the approach of NINES: we could give an edition some sort of imprimatur. This would not be formalized in a technical sense, but in a social sense: a stamp serving as proof of quality issued by a certain governing body. Joris would in that case rather opt for a stamp of approval by the whole of the community, something like "Ten out of twelve registered scholarly editors rated this edition as excellent". Susan wonders about the problem of granularity: at what level do we want to give these evaluations? Joris adds that Peter wants such formal guarantees right down to the level of the individual word.

    Terje: how many different versions will be put online?

    Dino expresses his scepticism about the idea Fotis refers to, put forward in an article whose author wants to solve the problem of semantic interoperability with an ontology. Susan counters that there could be levels of semantic interoperability that could work, e.g. to help find things or get started on a subject. Dino agrees, but that level is not enough for text analysis; in that case you want to know precisely what is there and what the context is. The abstraction level of an ontology is too coarse for semantic analysis. In any case Andrea thinks that it is technically impossible to do any useful analysis based on ontologies. In all cases an explicit representation of the textual content is needed. Joris wonders what ontologies are actually used for. Andrea will later give some examples. Joris adds that he finds a lot of fixed ontologies, and what he would like to see is dynamic ontologies generated from the material: bottom-up rather than top-down ontologies.
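
    As an illustration only (not a real ontology-learning method), a bottom-up starting point could be as simple as linking terms that co-occur in the same sentence, so that the relations are derived from the material rather than imposed on it:

      import re
      from collections import defaultdict
      from itertools import combinations

      def cooccurrence_graph(text: str, min_length: int = 4) -> dict:
          """Link terms that co-occur within a sentence: a crude, bottom-up
          starting point for relations derived from the material itself."""
          graph = defaultdict(set)
          for sentence in re.split(r"[.!?]", text.lower()):
              terms = {t for t in re.findall(r"[a-z]+", sentence) if len(t) >= min_length}
              for a, b in combinations(sorted(terms), 2):
                  graph[a].add(b)
                  graph[b].add(a)
          return graph

      sample = "Hamlet speaks to the ghost. The ghost appears on the battlements."
      for term, related in sorted(cooccurrence_graph(sample).items()):
          print(term, "->", sorted(related))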

    Jacob asks if a digital edition should contain text. Susan suggests that the main formal qualifier should be: 'is it processable?'. But do we want to state that it must be textual? Several attendants do. Edward adds that the moment an editor includes textual mark-up the source becomes text.

    Marin Dacos gives a short presentation on revues.org (http://cleo.cnrs.fr/index791.html / http://www.revues.org). Susan remarks that such an open environment could be used to put e.g. Word editions into a more sustainable environment (Word is converted to TEI). James mentions that the TEI initiative is producing a tool to convert Word to TEI as well (and the other way round); it should be available by the end of 2008. Attendants wish for a page on the wiki with links to interesting tools.

    Susan opens the discussion on what standards or schemas we could agree on. Peter suggests that Dublin Core could be used as an interchange format. Marin remarks on the problematic fact that everybody uses Dublin Core differently. In this respect any (XML) schema only adds a layer of ambiguity and unclarity, because different usages result in different meanings of (possibly the same) XML form.
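
    A hypothetical illustration of the ambiguity just described: both records below are formally valid Dublin Core, but they use dc:creator and dc:date differently, so an aggregator cannot merge them meaningfully without an agreed application profile. The element names are real Dublin Core terms; the records themselves are invented.

      # Project A puts the modern editor in dc:creator and the publication
      # date of the digital edition in dc:date ...
      record_a = {
          "dc:title":   "Hamlet, Second Quarto (digital edition)",
          "dc:creator": "Jane Editor",
          "dc:date":    "2008",
      }
      # ... while project B puts the original author and the date of the witness there.
      record_b = {
          "dc:title":   "Hamlet, Second Quarto",
          "dc:creator": "William Shakespeare",
          "dc:date":    "1604",
      }

      # An aggregator that naively treats dc:creator as "author" silently
      # conflates editors and authors, hence the need to agree on who goes
      # in dc:creator and which date dc:date refers to.
      for record in (record_a, record_b):
          print(record["dc:creator"], "/", record["dc:date"])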

    Susan wonders if we are at least agreed on XML. Malte says no: we do not know what the situation will be at the end of this project, when we finalize the Roadmap, so we cannot opt a priori for XML as the preferred solution. Susan: on this item the prototyping Working Group 2 will probably have something to say. Peter gives a sneak preview: we might not want to prescribe certain standards, but instead prescribe what we want (others) to be able to do with the code. Joris adds that we may never agree on the definition of 'model', but we might get agreement on how we want to model.

    Another problem is presented: named entities having different spellings or forms (e.g. mistakes, or differing transcriptions from a certain language).

    Terje adds the problem of text structure to the list (i.e. how to cope with many different structures, different granularities of structure, multiple hierarchies, etc.).

    James remarks that people may simply not be willing to do this (i.e. serve up interoperable texts). They may have (given their context) viable reasons to opt for non-interoperable text. Susan answers that the description of rights management may be part of a solution. Edward adds that it is mainly a 'social thing', about getting the credit. Susan remarks that funders are starting to demand that research results be available in open access. Edward answers that this may be so, but we are still obliged to publish editions on CD-ROMs because then they can have an ISBN, which makes sure the author/editor will get the credit according to current academic rules. Joris suggests that some of the Action's IT recommendations can state the necessity of solving these kinds of social interoperability problems.

    Susan recapitulates: the idea is that material will be out there on servers anywhere and that we are now thinking of ways to get it from there for our own use?

    Susan: what we probably will not talk about is preservation. Do we need to, if we already talk about reusability? Maybe only when material is in danger of actually disappearing? Joris suggests that there are perfectly viable technical solutions to the problem of durability that are self-balancing, self-managing etc., for example redundancy-reliant (peer-to-peer) data networks. Susan: so we need an extra requirement? Replicating incoming material? Because there are real dangers of materials disappearing. Dino states that we cannot take care of preservation directly. We should recommend to the makers of digital editions what they should do for sustainability. Susan adds that for this reason also we must bring libraries into this Action, because they are much better at preservation. (Federico: Manfred Thaller is working in this area.) Karina points out that identification is tied in with discovery and durability. There is a lot of (national and international) work going on on unique identifiers, e.g. for now-living authors. The same may eventually be done for geographical names and historical person and place names. We will have to keep an eye on what is going on in that area. Andrea announces a new tool for object querying. He will publish this in the coming November (2008). The code will be available through SourceForge. Federico adds that the Bibliotheca Europeana will also make their code available.
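
    A minimal sketch, using only the Python standard library, of one way the identification and redundancy ideas mentioned here could fit together: a content-derived (hash-based) identifier is independent of any single server, so any node in a redundant network can verify and advertise its copy. The node names and the registry are hypothetical.

      import hashlib

      def content_id(data: bytes) -> str:
          # A content-derived identifier: any node holding the same bytes
          # computes the same identifier, independent of file name or location.
          return "sha256:" + hashlib.sha256(data).hexdigest()

      # Hypothetical registry mapping identifiers to the nodes holding a copy.
      registry: dict[str, set[str]] = {}

      def register_copy(data: bytes, node: str) -> str:
          cid = content_id(data)
          registry.setdefault(cid, set()).add(node)
          return cid

      edition = b"<TEI>...encoded edition...</TEI>"
      cid = register_copy(edition, "node-a.example.org")
      register_copy(edition, "node-b.example.org")
      # Durability check: how many independent copies are known for this identifier?
      print(cid, "copies:", len(registry[cid]))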

    Susan: Do we need to describe what we want for the prototyping Working Group 2? E.g. what kind of test edition, which language, etc.? There may be technical and political elements playing a role. James: Or a text that appears in multiple witnesses, to make it more complex. Joris: this item will have to be added to the agenda of Working Group 2's upcoming meeting. Working Group 2 will also have to decide on the themes for the prototypes for the next few years. For the first year we decided on collation, but the themes for the following years, and their order in time, will have to be decided. It is unclear how we will have to decide on this: making a list and sending out an e-mail asking people to react with their suggestions or wishes? After a short discussion it is agreed that the attendants of the Working Group 2 meeting should make a list, and then decide on how to proceed.

    Ronald wants to add that he is not sure that developers need a set of guidelines for working on the prototypes. They need to experiment in all possible freedom. Developers do need functional requirements: what do you as a user want to be able to do? Joris: So, let the programmers decide on the programming tools, languages etc. to use. The scholars will be responsible for giving the necessary functional requirements and information in the form of use cases etc. Federico wants to know if this does not risk that the programmers develop something quite different from what the humanities experts want. Ronald answers that this is a real problem, but that there are several approaches that can be used to minimize the risks. Joris adds: these are for example automatic acceptance tests and unit tests. Such tests cover the use cases dictated by the scholars (and possibly users of the software). Such automated tests guarantee the proper working of functionality, even over time as redevelopment takes place.
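
    A minimal sketch of this division of labour: the scholars' use case ("collating two witnesses must report where they differ") is written down as an automated acceptance test, and the developers are then free to implement collation however they like, as long as the test keeps passing. The collate function below is a deliberately naive placeholder, not a real collation algorithm.

      import unittest

      def collate(witness_a: str, witness_b: str) -> list:
          """Deliberately naive placeholder: report (position, token_a, token_b)
          for every position where the two witnesses differ."""
          tokens_a, tokens_b = witness_a.split(), witness_b.split()
          return [(i, a, b) for i, (a, b) in enumerate(zip(tokens_a, tokens_b)) if a != b]

      class CollationUseCase(unittest.TestCase):
          """Acceptance tests derived from the scholars' use case, not from the
          implementation: they only state what the tool must be able to do."""

          def test_reports_a_known_variant(self):
              variants = collate("the quick brown fox", "the quikc brown fox")
              self.assertEqual(variants, [(1, "quick", "quikc")])

          def test_identical_witnesses_have_no_variants(self):
              self.assertEqual(collate("lorem ipsum", "lorem ipsum"), [])

      if __name__ == "__main__":
          unittest.main()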

    Andrea asks whether we can upload documents to the website (wiki); Joris will see to it that (registered) users will be able to do so. Susan wants to put the list of topics on the website as well, with some key information about why they are interesting. Dino: perhaps someone could write a short text or statement on interoperability?

  6. (Combined with prior item)

  7. Actions

    • Letting Working Group 4 decide in its upcoming meeting on more specific planning at the project level. (Action Chair)
    • Assess to what extent Working Group 1 could also survey the existing or de facto standards applicable to interoperability in the domain of Digital Humanities. (Working Group 1 Leader)
    • Include libraries in the survey, especially with regard to rights management. (Working Group 1 Leader)
    • Put the criteria a 'digital edition' should adhere to onto the Wiki (based on prior theoretical work of Peter Robinson and Edward Vanhoutte) (Edward Vanhoutte)
    • Put up a Wiki page for links to interesting tools. (Action Chair)
    • Put Wiki up and open for contribution by Action partners. (Action Chair)
    • Draft short text on what we mean by interoperability. (Action Chair)
    • List of relevant topics on the wiki and why they are interesting. (Action Chair, based on agenda items 5 & 6)

  8. Next meeting (deferred to Working Group 4 meeting)
  9. No other business
  10. Closing

Summary of High Level Specifications for the System:

  • Data should be processable
  • Modeling data
  • Depository/ digital archiving & preservation services
  • Collaboration with archives and libraries
  • Take into account born-digital material
  • flexibility of output (one master file with a variety of outputs such as a critical edition, a reading edition, etc.)
  • Editions should be interoperable
  • Data should be reusable
  • Data should be in formats that are preservable
  • Data should be quotable
  • Editions and objects in the edition should be separately identifiable / traceable / contain information for giving credit to the creators of the resource
  • Editions should have permanent identifiers
  • Editions should have built into them features of versioning; they should be self-regulating so that it is clear when changes are made
  • Editions should be critical / scholarly
  • Editions / objects should contain documentation (metadata) about their creation
  • Editions should be of high quality and be trustable (they should be what they purport to be)
  • Editions should have integrity
  • Editions will need community approval (NINES model)
  • Traceability and trust should be at all levels, from the edition to the object
  • Ability to pull objects and fragments of objects in from outside sources and mash them with other objects from other sources
  • Conversion tools from Word and other programmes into XML
  • Searchability / Discoverability
  • Multiple visualisation layers
  • Collaborative editing environment
  • Analytical and linguistic tools
  • Issues of copyright and sharing must be accounted for
  • There should be a level of transparency in the editing process
  • Ontologies for discoverability
  • Ontologies should be self-creating

High level requirements, actions derived from 5 & 6:

  • We need a mechanism to retrieve the underlying XML file from an HTML file published on the web. The tools are out there (meta/link tags in HTML, etc.); we just need to agree on using them. (See the sketch after this list.)
  • We need a way to retrieve the licence the text is using.
  • We need a registry of services.
  • We need a common format for documenting the services (it seems the TAPOR project has done work on this aspect).
  • We need some way of storing information about named entities and an interface for retrieving (harvesting) them. (It is suggested that new TEI modules should be derived and used for persons, etc.)
  • We need attention to interaction, collaboration and the international perspective (Working Group 1)
  • Research into the problem of uptake of tools (Peter/Tito Landi)
  • [JZ20090611: Curriculum should contain an introduction to digital editions and methodology. -> checklist Roadmap, item to be addressed.]
  • [JZ20090209: Andrea suggests transparency of format. I: But the one-format solution maximizes the applicability of the data, whereas we cannot know what kind of (web) services future users need. So it might be more practical to wrap any format with a metadata wrapper that states what kind of format it contains and how the data may be processed.]
  • [JZ20090209: Difficult one. They are all important and needed. But syntactic interoperability ties in with identification. There is no interoperability without identification and discovery. So this at least should be suggested/prototyped etc. Community is important, so that comes second; semantics could be dealt with later on.]
  • [JZ20090309: Aspect of durability over P2P redundancy networks should be in Roadmap contents.]
  • [JZ20090309: Aspect of rights management should be in Roadmap contents.]
  • Authentication/Authorisation checklist Roadmap [JZ20090309: These are 2 aspects: authentication (which ties in with rights management) and community building; sufficiently large for uptake and impact] [supported by DCR]
  • Libraries should be in
  • User aspect, goal and purpose of edition (Daniel, Terje) [disadvised by DCR]
  • Peter and Edward: ask for their digital edition requirements (presumably they have them)
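
  A minimal sketch of the first two requirements above, using only the Python standard library: the published HTML page advertises its underlying TEI/XML source and its licence through ordinary <link> relations, and a harvester reads them back. The page and its URLs are invented.

    from html.parser import HTMLParser

    # Illustrative HTML as an edition might publish it; the rel values
    # ("alternate" for the XML source, "license" for the licence) are
    # ordinary HTML link relations, but the URLs are invented.
    PAGE = """
    <html><head>
      <link rel="alternate" type="application/tei+xml" href="/editions/hamlet-q2.xml"/>
      <link rel="license" href="https://creativecommons.org/licenses/by/4.0/"/>
    </head><body>...</body></html>
    """

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "link":
                self.links.append(dict(attrs))

    collector = LinkCollector()
    collector.feed(PAGE)
    for link in collector.links:
        if link.get("type") == "application/tei+xml":
            print("underlying XML:", link.get("href"))
        if link.get("rel") == "license":
            print("licence:", link.get("href"))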

Keywords for a digital edition:

  • Identifiable
  • Quotable [JZ20090309 Is this not the same as Identifiable?]
  • Processable
  • Interoperable
  • Reusable
  • Preservable (instead of the term "stable", which is confusing; "self-managing", "self-versioning"?)
  • [JZ20090309 This might be part of a core statement/manifest.]
  • [JZ20090309 Susan, essentially on versioning. Repeated point; metadata wrapper.]
  • Phases, stages of text (critical, etc.) should be marked and registered (also in the metadata wrapper?)
  • Separation of text, text model and visualization/representation; leave representation/visualization to users
  • Physical texts vs. born-online editions (there are differences of transmission; can this all be captured using the same tools?)
  • [JZ20090309 The (results of the) Action should strive to deliver tools that enhance scholarship, not just representation/dissemination of texts. But the enhancement cannot evolve without the representation of text.]
  • This is difficult: how to precisely define what we're not doing?
  • [JZ20090309 Multiple-level or highly granular possibility of evaluation / imprimatur / endorsement, e.g. on versions and fragments. The stream of publications will not be auditable even for a large community of peers. Auto-peering, social peering, something like that.]
  • [JZ20090309: As things stand: all possible versions come online, self-managing.]
  • [JZ20090309 This is an important aspect to address; adding metadata possibly adds unclarity. Things like TEI and Dublin Core are inherently used ambiguously. What value has a standard (format) if it isn't stable?] ---> *[JZ20090309 This is a principle: state what functional requirements are (to be) covered, which use cases are (to be) served. Formats and standards are far less interesting and possibly paralyzing.]
  • [JZ20090309 Is ambiguity and alternative spelling of named entities an interoperability inhibitor? It is a problem, agreed, but not one that we need to focus on right away. It is not a major risk.]
  • [JZ20090309 Is a problem, but not one that is solvable by forbidding or endorsing (certain kinds of) structure. Better to accept and deal with the different kinds. TextGrid does so by high-level TEI compliance, for example.]
  • [JZ20090309 reg. Open Access, ISBN needs, DRM, IPR, status and credits. This is not a problem Interedition can *solve*. Interedition may do something to identify this as a problem and may raise awareness. Interedition should create a proof of concept and proof of viability of the concepts, but it should refrain from establishing yet another endorsement of de facto standards.]
  • [JZ20090309: Given distributed materials and editions. Yes, this is discoverability, we would want to support that very much.]
  • [JZ20090309: Identification, durability and discovery: these are important issues that we should address. They are connected and intertwined, which makes matters complex. Luckily there seems to be a simple and resilient technique available that ties in discovery, identification and durability (redundancy-reliant peer-to-peer, registries of data and services, distributed infrastructure). It seems that, contrary to what is suggested by some of the group, there are quite feasible and reliable solutions to these problems. Trouble: institutions, and certainly libraries, are *not* using these techniques.]
  • Put a survey / ethnographic study on the impact and failure of digital tools and methodology on the research 'to do' agenda. (Action Chair)
  • Put the curriculum and culture aspect on the research 'to do' agenda; what should be a common level of technical knowledge for digital humanists? (Action Chair)

Agreements

  • [Agreement: let the programmers decide on the programming tools, languages etc. to use. The scholars will be responsible for giving the necessary functional requirements and information in the form of use cases etc.]