Ven. Anandajoti made me aware of this very interesting project:
(I know, the visualization of the tables is dreadful!)
The GitHub repository for it is here:
This repository contains the code and input data for the calculation of possible quotations and similar passages within the gretil corpus based on SIF-weighted averages of word vectors. The output is both a set of tables as well as the visual representation above.
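For anyone unfamiliar with the technique: SIF ("smooth inverse frequency", from Arora, Liang and Ma's 2017 sentence-embedding paper) weights each word by a / (a + p(w)), where p(w) is the word's relative frequency in the corpus. Very common words therefore count for little, and a passage is represented by the weighted average of its word vectors. Passages whose vectors point in nearly the same direction (high cosine similarity) are candidate quotations or parallels. Below is a minimal pure-Python sketch of that idea. It assumes the passages are already tokenized and the word vectors are available as a plain dict. The function names, the threshold of 0.9 and a = 1e-3 are my own illustrative choices, not taken from the repository. The full SIF method also removes the first principal component from the passage vectors, and this sketch leaves that step out.

```python
import math
from collections import Counter
from itertools import combinations

# Minimal SIF sketch (hypothetical helper names, NOT the repository's code).
# Assumes passages are lists of tokens and `vectors` maps token -> list of floats.
# Omits the principal-component removal step of the full SIF method.


def sif_weights(passages, a=1e-3):
    """Weight each token by a / (a + p(w)), with p(w) its relative corpus frequency."""
    counts = Counter(tok for passage in passages for tok in passage)
    total = sum(counts.values())
    return {w: a / (a + c / total) for w, c in counts.items()}


def passage_vector(tokens, vectors, weights):
    """SIF-weighted average of the word vectors of one passage."""
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    used = 0
    for tok in tokens:
        if tok not in vectors or tok not in weights:
            continue  # skip tokens that have no vector or no weight
        for i, x in enumerate(vectors[tok]):
            acc[i] += weights[tok] * x
        used += 1
    return [x / used for x in acc] if used else acc


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def candidate_parallels(passages, vectors, threshold=0.9, a=1e-3):
    """Naively compare every pair of passages and keep those scoring >= threshold."""
    weights = sif_weights(passages, a)
    embs = [passage_vector(p, vectors, weights) for p in passages]
    hits = []
    for i, j in combinations(range(len(embs)), 2):
        score = cosine(embs[i], embs[j])
        if score >= threshold:
            hits.append((i, j, score))
    # Best-scoring pairs first
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

Called as `candidate_parallels(passages, vectors)`, it returns `(i, j, score)` tuples for pairs of passages that look like possible parallels. In this naive form every passage is compared with every other one, which is exactly where the computing cost piles up.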
We could do the same thing for Pali texts, and possibly between Pali and Sanskrit texts. It would surface a large number of possible parallels and connections between suttas.
This would be of substantial benefit to research on parallels. There is only one very big catch: it needs a great deal of computing power, far more than I have. I can probably adapt the program to work with the Pali sources, but running it would be a very different matter.
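A rough illustration of why the computing cost grows so fast: if every passage is compared with every other one (an assumption for illustration; the actual code may use a smarter index to avoid this), n passages give n(n-1)/2 comparisons. The corpus sizes below are hypothetical round numbers, not counts of the Pali canon.

```python
def pair_count(n_passages: int) -> int:
    """Distinct passage pairs in a naive all-against-all comparison."""
    return n_passages * (n_passages - 1) // 2


# Hypothetical corpus sizes, chosen only to show how the pair count grows
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} passages -> {pair_count(n):,} pairs")
```

Ten times as many passages means roughly a hundred times as many comparisons, and a Pali-to-Sanskrit comparison would multiply the two corpus sizes together.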
So I guess the first step is to visit this guy in Hamburg.
Another very interesting project this guy has written is an alignment between Sanskrit and Tibetan texts: