Buddhavacana Citations, Nelson's "cut-and-paste" & SuttaCentral functionality

SuttaCentral has its “own” (?) style of citation (which I quite enjoy), based on another system that is at least similar to the one used at AccessToInsight. (I am really showing off my Pāli ignorance here: I am sure the “MN 7 style” of citation has a name, I am just unsure of it.)

In addition to this, there is an older style used by the Pāli Text Society (?) that looks like: DN ii 72.

In addition to these two systems, there is, I think, a third that is more precise, in that it cites the lines of a sutta themselves (similar to how one cites Shakespeare: IV.ii.56–57, to the exact line).

Does such a system exist, and does SuttaCentral make use of it, and if such a system exists in standardized format, does SuttaCentral eventually plan to make use of such precise citations?


Yes, as you said there are a number of styles. The PTS style, based on the volume and page number of a specific—old, outdated, and often not very good—paper edition is still, unfortunately, the norm for citation. It is an embarrassment to the field and should be dispensed with in all cases. What, I have opinions!

Clearly in texts such as these, which exist and have always existed in multiple recensions, any sane referencing system must be based on meaningful semantic divisions in the texts themselves, rather than the arbitrary constraints of a specific publication.

The PTS made some efforts to introduce such a system, notably in DN and the Vinaya, but it was never applied consistently. Ven Nyanamoli developed a system for MN, which was retained in Ven Bodhi’s edition. However, these are still not very granular.

Since SC is based fundamentally on the notion of sutta parallels, we based our IDs on the idea of a “sutta”. This is fairly obvious in many cases, such as the Majjhima, but less so elsewhere. Is the Dhammapada, for example, a “sutta”? To a certain degree, such choices must be arbitrary.

  • Generally speaking, the numbering we use for Pali is based on our Pali text, the Mahasangiti edition, although adjusted in the cases of AN and SN to agree with Ven Bodhi’s edition.
  • The numbers for Chinese texts, based on the Taisho edition, are mostly either found in the texts or inferred.
  • For Tibetan texts we rely on the Derge edition.
  • For Sanskritic texts, our numbering system is essentially an arbitrary list, as these texts are not part of any organized collection.

As for more precise “chapter and verse” citations, these do not exist in any edition. In the Taisho, as you know, precise citation is possible, but only based on line number, not on the semantic divisions of the text (sentences, etc.).

Ideally a system would be based on the smallest meaningful textual element; either a simple sentence, or in the case of more complex sentences, a clause.

Our new version will introduce such a system for the first time, beginning with the 4 nikayas and the Vinaya. Each text will be divided into semantic segments, each of which will have an addressable and stable ID number.

These IDs will not replace older systems, but will add granularity. So, for example, MN 8#3 will still refer to the same thing as it does now, based on Ven Bodhi’s edition. However, you will also be able to specify MN 8#3.4, i.e. the fourth segment in the third section of the eighth discourse in the Majjhima Nikaya.
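To make the shape of such an ID concrete, here is a minimal sketch of how a reference like MN 8#3.4 could be split into its parts. The pattern, the `SegmentRef` type, and the `parse_ref` function are all hypothetical illustrations, not SuttaCentral's actual code, and the pattern only covers the simple "MN 8#3.4" shape used in the example above.

```python
import re
from typing import NamedTuple, Optional


class SegmentRef(NamedTuple):
    """Hypothetical container for the parts of a reference such as 'MN 8#3.4'."""
    collection: str          # e.g. "MN"
    sutta: int               # e.g. 8
    section: Optional[int]   # e.g. 3 (the existing section number)
    segment: Optional[int]   # e.g. 4 (the new, finer-grained segment)


# Assumed pattern: letters, a number, then an optional "#section" and ".segment".
_REF = re.compile(r"^([A-Za-z]+)\s*(\d+)(?:#(\d+)(?:\.(\d+))?)?$")


def parse_ref(text: str) -> SegmentRef:
    """Split a reference string into collection, sutta, section and segment."""
    match = _REF.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised reference: {text!r}")
    collection, sutta, section, segment = match.groups()
    return SegmentRef(
        collection.upper(),
        int(sutta),
        int(section) if section else None,
        int(segment) if segment else None,
    )


print(parse_ref("MN 8#3"))    # section only, as citations work today
print(parse_ref("MN 8#3.4"))  # section plus segment
```

The point of the sketch is that the coarser reference (MN 8#3) stays valid: the segment number is an optional refinement rather than a replacement.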

While our main focus will be on developing such a system for the core EBTs, ultimately we would like to have it for all our texts.


[quote=“sujato, post:2, topic:5760”]
Ideally a system would be based on the smallest meaningful textual element; either a simple sentence, or in the case of more complex sentences, a clause.
[/quote]Does this imply that whenever someone types Buddhavacana, even in passing such as references to definitions of right view, it will appear as a link, à la MN 7 etc, like this:

[quote]Pretend I am just typing any text in any post here and then suddenly I reference that sometimes right view that is affected by taints and then continue talking[/quote]Is this the dynamic system you are describing, which would link right to what was referenced for original context?

I’m not sure exactly what you mean. If you are asking, will our reference system here on Discourse be updated to recognize the numbers of these segments and send you straight to them, then yes, that’s the plan.

If you’re asking, will writing a piece of text do the same thing, then no: text is not unique and can’t be addressed this way.

[quote=“sujato, post:4, topic:5760”]
If you’re asking, will writing a piece of text do the same thing, then no: text is not unique and can’t be addressed this way.
[/quote]I figured it would be hard to implement, but what you were describing (or rather my misunderstanding of you) reminded me of Werner Herzog’s interview with Ted Nelson from Herzog’s recent work Lo and Behold, Reveries of the Connected World, which concerns itself with the origins and development of the internet itself.

I don’t assume that the TV viewing habits of a practicing monk are at all conducive to having necessarily encountered this newer documentary film, but when you said:[quote=sujato]
Ideally a system would be based on the smallest meaningful textual element; either a simple sentence, or in the case of more complex sentences, a clause.
[/quote]I got to thinking that perhaps you happen to be familiar with Ted Nelson’s work, which predates Herzog’s newer documentary considerably.

He (Nelson) describes the “original intention” behind cut-and-paste functionality on the proto-internet (a system of networked computers he was experimenting with in the 1960s) in the maximalist, maverick manner to be expected from a man of his time: very interesting, even if a little vague.

I don’t claim to completely understand the ramifications of Nelson’s ideas, or how they would ever be practically implemented, but Nelson describes a “cut-and-paste” where any text is traceable to its (internet) source immediately by selecting that text, any text at all. You really would have to see the functionality to get what I am trying to say, but I will try to explain it as it appears in Nelson’s demonstrations of it in the film.

Nelson’s idea/hope/dream seems to be that the internet would be a contextualization engine (as opposed to the decontextualization engine it frequently becomes, being, at times, an incomprehensible “sea of data” as it were, ever difficult to navigate).

Any text would be traceable to its initial appearance on the internet immediately (why on earth he calls this connectivity-functionality “cut-and-paste” I do not know; I assume it is a holdover from terminology popular in the 60s). It goes deeper than this, but I know almost nothing about computers, and even less about the internet. Still, Nelson’s focus on text specifically, and on contextualizing text, might be of interest (although possibly only as theory) to the aims of SuttaCentral.

Please do not feel the need to read any more if I am wasting your time. This is Herzog interviewing Nelson concerning his ideas about the implementation of the internet, starting at 11:01 of Lo and Behold, Reveries of the Connected World:[quote]Herzog: [narration] Back to early times of speculative concepts of a connected world in the early 60s, many years before the first Apple personal computer, a young thinker, Ted Nelson, had his own ideas about creating a computer network.

The web, as we know it, took a different route, but Nelson’s ideas are still dormant.

Nelson: It was an experience of water and interconnection. I was with my parents in a rowboat in Chicago, so I must have been five years old. I was trailing my hand through the water, and I thought about how the water was moving around my fingers, opening on one side and closing on the other, and that changing system of relationships, where everything was kind of similar, kind of the same, and yet different, that was so difficult to visualize and express, and just generalizing that to the entire universe that the world is a system of ever-changing relationships and structures struck me as a vast truth, which it is, so expressing that interconnection has been the centre of all of my thinking, and all of my computer work has been about expressing and representing and showing the interconnection among writings especially.

And writing is the process of reducing a tapestry of interconnection to a narrow sequence. This is, in a sense, illicit. This is a wrongful compression of what should spread out. In today’s computers they have betrayed that because there’s no system for decent cut-and-paste, and they’ve changed the meaning of the words “cut-and-paste”, and pretended it was the same thing. So a guy named Larry Tesler, who I consider to be a good friend, nevertheless changed those words and I consider that to be a crime against humanity and he doesn’t understand why, because humanity has no decent writing tools.

In any case, this is the problem: interconnection and representation and sequentialization all [are] similar to the issue of water.

[the film jumps ahead (cutting Nelson off, oddly enough) to show Nelson demonstrating “authentic” cut-and-paste]

Nelson: So here we have a parallel presentation that shows the quotation connected to its original context: “In the beginning God created heaven and earth,” and where is that from? [the entire computer screen (I mean window?) moves to the left to reveal a screen behind it, with selected text that matches the selected text on the previous screen] That is from the King James Bible.

So we can step down to the next quotation: “Adam and Lilith began to fight,” and that is from The Alphabet of ben Sira. And so as we pull back we can see successive pages coming up that connect with their sources [he shows a complicated series of text boxes floating behind each other] or with their linked contents.

Herzog: [voice-over] His vision of links never materialized.[/quote]It would seem that he imagined the entire internet as completely interlinked, as in, every single instance of quotations of a source text would be reachable from said source text and vice-versa, so that, theoretically, nothing could ever be out of context, because context would be rendered “super searchable”, it seems.

I do not know if this idea of functionality would ever appeal to SuttaCentral in addressing Buddhavacana (which is something intertextual and interconnected in and of itself), but that is what I thought you were talking about before, if that helps to contextualize.

EDIT: I can see Nelson’s ideas potentially having merit in the arrangement of intertextual parallels, specifically, on SuttaCentral.

PS If interested, here is a probably-better explanation of Nelson’s work: https://qz.com/778747/an-early-internet-pioneer-says-the-construction-of-the-web-is-crippling-our-thinking/, where he is given more time to explain himself thoroughly than what Herzog allows for.


From the linked article, contextualizing Nelson’s strange usage of “cut-and-paste”:[quote]Nelson, who was featured in Werner Herzog’s latest film, Lo and Behold, believes that instead of the existing formats we use online, where text often mirrors the constraints of paper, we should have a system of two-way links that would allow readers to see the context of any quotation. It’s a complex idea, best explained by Nelson in the video below: [video here]

There are a few offline examples, such as the Talmud and the Rosetta Stone, where text is read side-by-side. Nelson believes this is how online documents should be constructed.

“As far as I’m concerned, this is the way literature should develop,” he says. “I don’t consider this technology, I think it’s literature. Being able to see visible connections between pages seems to me absolutely fundamental.”

Nelson says this setup would be the ideal format for reading annotations, additional details, correspondence, and disagreements: “It’s essentially a different genre of writing.”

As Nelson sees it, our current use of online documents is very limiting. He’s particularly disturbed by how we use the words cut and paste. When the Macintosh was introduced in 1984, cut came to mean “hide this piece that I’ve just marked in an invisible place,” and paste became “plug whatever’s in this invisible place to where I’m pointing.”

“To me that was an outrage because no one has yet got a decent re-arrangement system that allows you to see all the parts of the arrangement as you’re writing,” Nelson says. “Those words meant something entirely different until 1984. Balzac, the French novelist, carried a razor blade around his neck for cutting up his manuscript. Tolstoy would cut up his manuscripts and leave all the pieces around the floor. This is true cut-and-paste, where you’re re-arranging on a large scale and able to see the relationships between parts.”

Ironically, Nelson is friends with Larry Tesler—the man responsible for our modern use of cut and paste—but still calls their mislabeling “a crime against humanity.”

Our ability to see connections between texts, with free movement of non-sequential text, would be transformative. Nelson hopes that one day, his vision for a document that allows such relationships between texts will be realized, and eventually commonplace. “There are precedents throughout the literary world,” he says. “But the notion of making it a fundamental format seems to have been left to me.”[/quote]

O! I love Project Xanadu, it’s an amazing idea by an amazing man. And I have indeed thought of it in our context. Problem is, of course, it doesn’t work. But you can play with a demo here:


Note the way they code their text references:

span: http://hyperland.com/xuCambDemo/WelcXu-D1y,start=590,length=41

So you have a URL to the document, then define a span more precisely inside that document by noting the number of characters before the span begins, and its length. This is essentially the approach of “standoff properties”, which you can read a number of discussions about here. As we evolve, we are moving closer to using standoff properties, which has the great advantage of decluttering the base text.

One of the core problems of this approach is how to deal with fluidity. If the text changes even by a single character, you’re stuffed. This is where the segments come in. By dividing the text into much smaller pieces, any change in a segment will only affect that segment, which is relatively easy to calculate. We can then further specify text within the segment, if we wish, through counting character offsets. This can be used, for example, for variant readings, notes, and so on.
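Here is a rough sketch of the two ideas just described: resolving a start/length span, and why scoping offsets to a segment limits the damage from edits. The `Span` type, the function names, the placeholder text, and the segment IDs "x1.1" and "x1.2" are all made up for illustration; only the span string itself comes from the Xanadu demo quoted above.

```python
from typing import Dict, NamedTuple


class Span(NamedTuple):
    """A standoff reference: where the text lives, plus start and length."""
    source: str  # a document URL, or a segment ID in the second example
    start: int   # number of characters before the span begins
    length: int  # number of characters in the span


def parse_span(ref: str) -> Span:
    """Parse 'span: <source>,start=<n>,length=<n>' into a Span."""
    body = ref.split("span:", 1)[-1].strip()
    source, start_part, length_part = body.rsplit(",", 2)
    return Span(
        source,
        int(start_part.split("=", 1)[1]),
        int(length_part.split("=", 1)[1]),
    )


def resolve(text: str, span: Span) -> str:
    """Return the characters of `text` that the span points at."""
    return text[span.start : span.start + span.length]


demo = parse_span("span: http://hyperland.com/xuCambDemo/WelcXu-D1y,start=590,length=41")

# Placeholder text split into two hypothetical segments.
segments: Dict[str, str] = {"x1.1": "First sentence. ", "x1.2": "Second sentence."}

# 1. Offsets counted from the start of the whole document are fragile.
whole = "".join(segments.values())
doc_span = Span("whole-document", 16, 6)
before = resolve(whole, doc_span)           # "Second"
after = resolve("The " + whole, doc_span)   # shifted: no longer "Second"

# 2. Offsets counted from the start of a segment survive edits elsewhere.
seg_span = Span("x1.2", 0, 6)
segments["x1.1"] = "The first sentence. "   # edit a different segment
still_ok = resolve(segments[seg_span.source], seg_span)  # still "Second"
```

An edit inside segment "x1.2" itself would still invalidate `seg_span`, but only spans anchored to that one segment would need recalculating.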

As far as transcluding text goes, this could be done on the basis of ID numbers. You could mention here, for example, MN 6#5.8, and the text for that could be drawn in. This is not too dissimilar to what Discourse does already with Oneboxes. It could even be internationalized, so you see the quote in your preferred language!
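As a sketch of what ID-based transclusion could look like, the snippet below expands references in a post using a lookup table keyed by segment ID and language. Everything here is an assumption for illustration: the `TEXTS` table, the `transclude` function, the language codes, and the placeholder strings standing in for the real text. It is not how Discourse Oneboxes or SuttaCentral actually work.

```python
import re
from typing import Dict

# Hypothetical store: segment ID -> language code -> text.
# The values are placeholders, not actual sutta text.
TEXTS: Dict[str, Dict[str, str]] = {
    "MN 6#5.8": {
        "pli": "<Pali text of MN 6#5.8>",
        "en": "<English text of MN 6#5.8>",
    },
}

# Assumed reference shape: two capital letters, sutta, '#', section, '.', segment.
REF = re.compile(r"\b[A-Z]{2} \d+#\d+\.\d+\b")


def transclude(post: str, lang: str, fallback: str = "pli") -> str:
    """Append the referenced text after each known reference in `post`.

    Uses the reader's preferred language when available, otherwise the
    fallback language; unknown references are left untouched.
    """
    def expand(match: "re.Match[str]") -> str:
        ref = match.group(0)
        versions = TEXTS.get(ref)
        if versions is None:
            return ref
        text = versions.get(lang) or versions.get(fallback)
        return ref if text is None else f'{ref} ("{text}")'

    return REF.sub(expand, post)


print(transclude("Compare MN 6#5.8 on this point.", lang="en"))
print(transclude("Compare MN 6#5.8 on this point.", lang="de"))  # falls back to Pali
```

Because the lookup is keyed by ID rather than by wording, the same reference can be rendered in whichever language the reader prefers.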

But you couldn’t do it with text as such, since, like I said, it is (often) not unique, so it doesn’t go back to one source.
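A toy illustration of the uniqueness problem, with made-up IDs and placeholder text: looking up a phrase can return several segments, whereas looking up an ID returns exactly one. The `find_sources` helper is hypothetical.

```python
from typing import Dict, List


def find_sources(phrase: str, segments: Dict[str, str]) -> List[str]:
    """Return the IDs of every segment whose text contains `phrase`."""
    needle = phrase.casefold()
    return [seg_id for seg_id, text in segments.items() if needle in text.casefold()]


# Placeholder segments with invented IDs; a stock phrase recurs in two of them.
segments = {
    "A 1#1.1": "a stock phrase appears here",
    "B 2#3.2": "the same stock phrase appears here too",
    "C 3#1.1": "something else entirely",
}

matches = find_sources("stock phrase", segments)
print(matches)  # more than one ID, so the phrase alone cannot pick a single source
```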
