User:Dirk Roorda

From IntereditionWiki

Revision as of 09:24, 11 January 2012 by Dirk Roorda

I work for Data Archiving and Networked Services (DANS) in the Netherlands. Driven by data, DANS ensures that access to digital research data keeps improving, through its services and by taking part in national & international projects and networks. DANS is an institute of the Royal Netherlands Academy of Arts and Sciences (KNAW) and the Netherlands Organisation for Scientific Research (NWO).

My home page at DANS

Dirk Roorda Phone: +31 6 13665023 Email:

Introduction for Bootcamp Leuven 2012

  • what you are working on
    Archiving data in general, processes for ingest, standards for metadata, ways of disseminating data. In particular: textual data.
  • what you hope to achieve in the bootcamp
    We have a dataset, a linguistic representation of the Hebrew Old Testament, queryable with a special front-end. I am looking for ways to make this data available for the long term. Can we export the data to RDF and do meaningful things with it?
  • the sort of data you use in your work
    Recently we received a primitive digitisation of the correspondence of Descartes, which had to be converted to TEI; it contains mathematical formulas. I did the conversion using Perl and TeX.
  • the sort of data you produce in your work
    We transform and curate data.
  • how you might envision your tool(s) hooking together with other tools
    My main question is: if you have a specialised tool to do text analysis, how do you preserve the results of your analysis in such a way that it can be redone, even if you cannot preserve the original analysis tool?
  • what sort of text analysis you are interested in
    All kinds, and none in particular: it is the long-term archiving and use that I am interested in.
  • how you envision tools working together to achieve this
    Analysis tools are optimised to bring lots of data together and perform intensive and intricate computation on them.
    A long-term solution should not try to do the computation. It should store the results of the computations that have been done, and interlink them with the source material.
    So I am looking for efficient ways to store and link large portions of computation results.
    Search indexes are a good example.
    And RDF might be a good vehicle.
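The idea of interlinking stored analysis results with their source material could be sketched in RDF. The following is a minimal illustration, not an existing schema: all URIs, property names, and the sample morphological result are hypothetical, and the triples are emitted in N-Triples syntax with plain Python:

```python
# Minimal sketch: store one analysis result (e.g. a morphological tag)
# as RDF triples that link it back to a word token in the source text.
# Every URI and vocabulary term below is a made-up example.

def ntriple(s, p, o):
    """Serialise one triple in N-Triples syntax.

    Subjects and predicates are URIs; the object is treated as a URI
    if it looks like one, otherwise as a quoted literal.
    """
    obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
    return f"<{s}> <{p}> {obj} ."

BASE = "http://example.org/"                 # hypothetical namespace
word = BASE + "text/Genesis/1/1/word/3"      # a word token in the source
result = BASE + "analysis/morph/42"          # one stored analysis result

triples = [
    ntriple(result, BASE + "vocab/analyses", word),        # result -> source link
    ntriple(result, BASE + "vocab/partOfSpeech", "noun"),  # the computed value
    ntriple(result, BASE + "vocab/producedBy", "parser-x v1.0"),  # provenance
]

for t in triples:
    print(t)
```

Because the triples record provenance alongside the value, the analysis stays interpretable even after the tool that produced it is gone, which is the point of archiving the results rather than the computation.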