Micro services are small, cloud-deployed web services that each support a specific task in a digital work flow serving a larger scholarly task. They are the small, reusable, web-published building blocks of digital scholarly tools. As such they form the basis of the interoperability and sustainability that Interedition is striving for in digital scholarly tools.
In its simplest form a Micro Service
- is deployed as a cloud based application
- accepts a JSON formatted set of input data
- returns a JSON formatted set of output data
- follows REST conventions for request/response actions
- provides information on usage in response to a GET request for /doc
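The list above can be illustrated with a minimal sketch in Python, using only the standard library. The service shown (a whitespace tokenizer), the field names `text` and `tokens`, the `USAGE` structure and the port are all assumptions made for illustration; they are not taken from any Interedition service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical usage description, returned for GET /doc.
USAGE = {
    "service": "example-tokenizer",
    "POST /": {"input": {"text": "string"}, "output": {"tokens": ["string"]}},
}

def handle(method, path, body=b""):
    """Route one request; return (HTTP status, JSON-serialisable dict)."""
    if method == "GET" and path == "/doc":
        return 200, USAGE
    if method == "POST" and path == "/":
        try:
            text = json.loads(body)["text"]
            if not isinstance(text, str):
                raise TypeError("text must be a string")
        except (ValueError, KeyError, TypeError):
            return 400, {"error": 'expected JSON like {"text": "..."}'}
        return 200, {"tokens": text.split()}
    return 404, {"error": "not found; see GET /doc"}

class Handler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: JSON in, JSON out."""

    def _respond(self, method):
        length = int(self.headers.get("Content-Length") or 0)
        status, payload = handle(method, self.path, self.rfile.read(length))
        out = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def do_GET(self):
        self._respond("GET")

    def do_POST(self):
        self._respond("POST")

def serve(port=8000):
    """Run the service locally until interrupted (port is an assumption)."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

Calling `serve()` starts the service locally. Keeping the routing in the plain function `handle` means the JSON contract can be checked without opening a network socket.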
A Micro Service can be decorated to e.g.
- provide a (minimal) GUI (Graphical User Interface)
- provide multiple input/output formats (TEI, XML, txt as likely candidates)
- provide error messages in case of failure
Composite Micro Services
In all likelihood Micro Services will not often be used as stand-alone services, though they can be. A usual pattern of usage will involve piping a JSON (or otherwise formatted) set of data through several Micro Services to produce a useful digital scholarly work flow. For example, the so-called Gothenburg Model for collation defines a number of distinguishable steps in the process of collating variant texts. Each of these steps may be served by one or more micro services. As a proof of concept, Interedition now provides a prototype work flow for collation, combined from several distinct microservices.
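The piping pattern described above might be sketched as follows in Python. Every service is modelled as a callable that takes and returns a JSON-style dictionary. The helper names `remote` and `pipe`, the two local stand-in services and the field names are hypothetical; real services would define their own input and output formats.

```python
import json
from functools import reduce
from urllib.request import Request, urlopen

def remote(url, timeout=10):
    """Wrap a JSON-over-HTTP service at `url` as a dict -> dict callable."""
    def call(data):
        # Supplying `data` makes urllib issue a POST request.
        req = Request(url, data=json.dumps(data).encode("utf-8"),
                      headers={"Content-Type": "application/json"})
        with urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    return call

def pipe(*services):
    """Compose services left to right: each output is the next input."""
    return lambda data: reduce(lambda acc, svc: svc(acc), services, data)

# Local stand-ins so the sketch runs without network access.
def tokenize(data):
    return {"tokens": data["text"].split()}

def lowercase(data):
    return {"tokens": [t.lower() for t in data["tokens"]]}

workflow = pipe(tokenize, lowercase)
result = workflow({"text": "Dat swert"})
```

Because local functions and `remote(...)` wrappers share the same dict-in, dict-out shape, any step in `pipe` could be swapped for a call to a deployed service without changing the rest of the work flow.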
Gothenburg Model and Implementation
The Gothenburg Model defines any basic collation process as a chain of distinct steps in a work flow, as shown in Figure 1. It is important to note that this is a description of the functional steps in the work flow; an individual step may be served by a combination of multiple microservices.
Figure 1: Gothenburg model of collation
For example, the tokenization process in the model above may be adapted to different contexts by putting different micro services in place. The implementation depicted in Figure 2 shows a tokenization process that comprises a plain text tokenizer and a normalizer that abstracts away from spelling differences between tokens (e.g. the semantically equivalent 'swert' and 'sward' are normalized to one token form). Similarly, a morphological analyzer for classical Greek could replace the normalizer to adapt the process to classical Greek texts.
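A rough sketch of such a swappable tokenization step, in Python, is given here. It is not the code of the deployed tokenizer or fuzzy normalizer: the similarity measure (`difflib.SequenceMatcher`), the threshold of 0.55 and the output fields `token` and `normal` are assumptions chosen only to make the 'swert'/'sward' example concrete.

```python
from difflib import SequenceMatcher

def tokenize(text):
    """Plain text tokenizer: split on whitespace."""
    return text.split()

def exact_normalizer(tokens):
    """Baseline normalizer: only case differences are abstracted away."""
    return [{"token": t, "normal": t.lower()} for t in tokens]

def fuzzy_normalizer(threshold=0.55):
    """Build a normalizer that maps each token to the first earlier-seen
    form whose similarity ratio is at least `threshold` (assumed value)."""
    def normalize(tokens):
        forms, out = [], []
        for tok in tokens:
            key = tok.lower()
            match = next(
                (f for f in forms
                 if SequenceMatcher(None, f, key).ratio() >= threshold),
                None,
            )
            if match is None:
                # No similar form seen yet: this token becomes a new form.
                forms.append(key)
                match = key
            out.append({"token": tok, "normal": match})
        return out
    return normalize

def tokenization_stage(normalizer):
    """Assemble the tokenization step from the tokenizer plus a normalizer."""
    return lambda text: normalizer(tokenize(text))

middle_dutch = tokenization_stage(fuzzy_normalizer())
tokens = middle_dutch("dat swert sward")
```

Here `tokenization_stage` accepts any normalizer with the same signature, so a different component (for instance a morphological analyzer) could be passed in without touching the tokenizer.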
Figure 2: Middle Dutch specific implementation of the collation model
The above implementation has been deployed in the cloud as an actual workflow, as a proof of concept. The plain text tokenizer is available at http://tlatokenizer.appspot.com; the fuzzy normalizer can be found at http://furious-wind-27.heroku.com/doc; the alignment service resides at http://gregor.middell.net/collatex/api/collate. These are all micro services with minimal or no GUI; they expect programmatic, AJAX-level interaction instead. For convenience, an overarching interface to the workflow is needed that combines access to the services at user level. A proof of concept of such an interface can be found at http://interedition-tools.appspot.com/
Note that reusable, interoperable and exchangeable micro services allow work flows to be networked, so that many work flows can benefit from the same ready-made micro services. Figure 3 demonstrates this.
Figure 3: Three work flows, utilizing networked micro services
Interedition has created a flexible and cooperative model that both serves institutional development and tends to the research needs of individual scholars. The cooperative model relies upon a microservice architecture to create large and specific workflows from very small components. This leverages the applicability and usefulness of tools developed within institutions, making them available to a wider audience of researchers. It also, and crucially, empowers individual researchers and developers to use and add to the existing cloud of microservices. The model also boosts the sustainability of the tools developed—microservices are much more easily maintained, scaled and replaced than large, complex, integrated applications. They are also easily replicated in multiple locations; this allows for resilience of the workflows that rely upon them, by allowing ‘fallbacks’ to identical services published elsewhere.
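The 'fallback' idea mentioned above can be sketched in Python as a wrapper that tries identical services in turn. The helper name `with_fallbacks` and the two stand-in replicas are hypothetical, and the set of exceptions treated as a failed replica is an assumption.

```python
def with_fallbacks(*replicas):
    """Try identical services in order; return the first successful result."""
    def call(data):
        errors = []
        for svc in replicas:
            try:
                return svc(data)
            except (OSError, ValueError) as exc:
                # Network failures are OSError subclasses; malformed JSON
                # responses raise ValueError. Both mean: try the next replica.
                errors.append(exc)
        raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")
    return call

# Stand-ins for two deployments of the same service.
def unreachable(data):
    raise ConnectionError("replica unreachable")

def mirror(data):
    return {"tokens": data["text"].split()}

resilient = with_fallbacks(unreachable, mirror)
answer = resilient({"text": "dat swert"})
```

A work flow that calls `resilient` keeps working as long as at least one replica responds, which is the kind of resilience the paragraph above describes.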
Microservices (and possibly Microrepositories) are also cloud-based solutions. We think that cloud solutions, as opposed to grid solutions, are essential to digital solutions for the humanities. Grid computing focuses on huge data capacity, storage and batched analysis operations. The key aspect of humanities research, however, is real-time interactive *engagement* with the research material and data, a way of working that is far better served by cloud computing (which provides modest-sized chunks of CPU and storage in real time, rather than huge but asynchronous storage and computing power).
Micro services are a good solution for achieving reusable components, bottom-up tool development, and sustainable and interoperable tools in textual scholarship. However, the model is a 'proof of concept' at best for now, and a number of problems and challenges are readily identified:
- brokerage of services
- persistent identification of services
- hosting of services (or: business model)
- interaction model
We intend to present at least sensible approaches and solutions in principle to these challenges in the final deliverable of this Action (the 'Roadmap').
- Development: Development
- CollateX: http://collatex.sourceforge.net/
- Normalizer: https://github.com/jorisvanzundert/fuzzy_normalizer
- Tokenizer: https://github.com/tla/Interedition