The READ research environment for Indic texts

sujato · June 23, 2017, 12:11am

Mark Allon and Ian McCrabb at Sydney Uni, together with Andrew Glass and others, have been involved in developing a text system, initially for Gandhari texts, which is called READ (which has to be one of the most comprehensively ungoogleable names ever!). I’m not aware of any live instances of it; I think it is still in development. It will be used as the new back end for Gandhari.org, and probably elsewhere, too. News about the project is found through their blog.

But we can keep up with progress here.

Here is a description of the project from current presentations.

READ and READ Workbench together provide an integrated research environment, publishing platform and corpus development framework for ancient Sanskrit and Prakrit texts, a model that can be expanded to other writing systems.

Rationale: The READ project commenced in 2013 with funding from a consortium consisting of the University of Munich (LMU), Germany, the University of Washington (UW), Seattle, the University of Lausanne, Switzerland, the University of Sydney (USYD) and Prakaś Foundation, Sydney. These universities are all engaged in the study and publication of ancient Buddhist documents preserved in the Gāndhārī language that originate from Afghanistan and Pakistan. The academic lead for the project is Stefan Baums (LMU), and the development team comprises Andrew Glass from Microsoft as software architect, Stephen White (ex Microsoft and USYD) as system developer and Ian McCrabb as analyst/designer and project manager (USYD).

READ is the result of the convergence of two streams: the work of Baums and Glass on gandhari.org, and data modelling undertaken in support of McCrabb’s PhD dissertation at USYD. The project brief for READ was to develop a comprehensive research environment and publishing platform to support the transcription, translation and analysis of ancient Sanskrit and Prakrit texts: manuscripts, inscriptions, coins and other documents. A critical element of the brief was that READ be based on open source software, support the TEI standard and provide an API for integration with related systems.

READ is complementary to existing textual repositories and integrated with existing dictionaries. Whatever format existing transcriptions were developed in, they can be consumed, elaborated upon, analyzed, and then published as research output in TEI. The data remains open source and can be exported as a full XML archive. In summary, READ has been designed to function as:

a linked repository of images, transcriptions, translations, metadata, and annotations;
a content management system encompassing multi-user editing, maintenance and version control;
a collaboration platform with comprehensive access and visibility control;
a research environment with access to a dictionary, catalog of texts, glossaries and bibliographies;
a publishing platform for individual transcription renditions or full scholarly editions;
the kernel of an integrated research network interfacing with GIS, data visualization and image analysis systems.

AndyL · June 23, 2017, 12:44am

That’s really interesting. The way it’s described makes it sound like a thing that exists, but I sure can’t find anything about it.

sujato · June 23, 2017, 12:50am

Yes, I too have little info. I’ve added some links above. Presumably they’ll update their blog when they’re ready.

Gabriel · June 23, 2017, 4:47am

which gives me an idea for my new spiritual project: AND (Advanced Non-Duality)

sujato · June 23, 2017, 8:28am

That sounds like it would go well with my Buddhist Unified Texts project: BUT.

stfnbms · February 7, 2018, 11:33pm

Just a few clarifications. Development of the ‘Research Environment for Ancient Documents’ software is primarily funded by the Bavarian Academy of Sciences and Humanities, through the Buddhist Manuscripts from Gandhāra project:

http://www.gandhara.indologie.lmu.de

The programming is done by our colleague Stephen White in Venice. The initial feature design of READ was my own as part of planning this long‐term research project, and I would also like to mention my Munich colleague Andrea Schlosser who made many important contributions to UI design and debugging. We make the source code of the software available on the following GitHub page:

https://github.com/readsoftware/read

which also contains further description of the project and some screenshots of it being used by us in the Bavarian Academy project to edit Gāndhārī manuscripts. We now also use READ to present editorial work in progress on our ‘workshop’ page, under the above URL.

Since we make the software available under an open‐source license, it is also being used and modified by a number of other researchers in their own separate projects (including at the University of Sydney and several other institutions).

Andrew Glass and I do plan to use the READ software to improve the presentation of our complete corpus of Gāndhārī texts on my website Gandhari.org, as described in my blog post that you link to. Overall, however, I intend to keep the look of Gandhari.org very close to what it is now and what people have become accustomed to.

Last but not least, since I came up with the name of this software back in 2013, a brief defense of it: The full name is of course ‘Research Environment for Ancient Documents,’ which is fairly unique and googlable. The acronym READ, on the other hand, I never intended to be unique (and in fact it is not), only to provide a handy abbreviation in running text.

Thank you for your interest in our software. Please do check it out in action at the website of the Bavarian Academy project, and of course feel free to download a copy of the source code from our GitHub repository (though you may want to wait until we have written proper documentation and installation instructions).

sujato · February 8, 2018, 12:26am

Thanks so much for the clarification and update. And thank you to yourself and all who are working on the Gandhari texts. I know it is painstaking and difficult work.

The web interface for READ is great. Just to note, though, for some reason the first example (British‐Library‐Sammlung Fragment 4) keeps crashing my browser. The others are fine.

May I make one request? I use the dictionaries often, and it would be really nice to have a proper URL for each entry. It’s awkward to not be able to share the results for a specific word.

stfnbms · February 8, 2018, 1:03am

Thank you for your thanks! Yes, this material is a lot of work, but quite satisfying when things come together. I also hope to make things a bit easier for the next generation with the Dictionary etc. that I write with Andrew and our complete source corpus.

Viewing BL 4 on our Munich server works fine here. Maybe it has to do with the browser you use? READ at this point is only tested against Firefox, but also works in Chrome by and large. This is not ideal, obviously, since many web users on Windows and macOS have other browsers, and will be addressed when we have time. My main focus right now though is working on TEI support with Steve, so we can use READ in combination with other software and reduce the idiosyncrasies of its data model.

Or maybe it is a memory issue. I use one single, very large image in my presentation of BL 4, where the other texts (by Andrea) are subdivided into sections.

Concerning links to articles in Andrew’s and my Dictionary, this is on the radar, and I am also trying to coordinate this with colleagues here in Germany who maintain other relevant dictionary collections. We had a way already to refer to articles in our Dictionary of Gāndhārī, but it is currently broken and on the list of things to fix.

Also keep in mind, however, that even after sixteen years of work, Andrew’s and my Dictionary is still very much work in progress. We do not even guarantee that the spelling of headwords will not change (since we use an artificially standardized orthography for those that I am still refining). The safest way to refer to a word for now might be to use its actual spelling in a given text (which is searchable in our Dictionary, as I explain in the Preface).

sujato · February 8, 2018, 2:23am

Oh, absolutely, these kinds of resources remain useful for a long time.

Just to confirm, Firefox does indeed work fine. The crash occurs using Chrome 64 on Ubuntu. I agree, it seems to have something to do with the large image loading; it seems to churn, get slow, and then freeze.

Excellent, thanks.

I understand. In dictionary work, as in the Dhamma, there is gradual practice, gradual progress with no sudden penetration to final truth!