Stefan Karpik on Pali

Stefan Karpik has just published a follow-up to his 2019 paper on the history of Pali:

Karpik, Stefan. (2023). “Light on Epigraphic Pali: More on the Buddha Teaching in Pali.” JOCBS 23: 41–89.

I wonder what the Pali-wallahs make of his linguistic arguments regarding Pali, Magadhi, and the Asoka inscriptions. My sense is that he’s onto something in terms of the language of the Pali texts, but that his pitching it in terms of “the language the Buddha spoke” will more or less ensure that he is not taken seriously by professional scholars (outside of the OCBS anyway).


Thanks for sharing; I was awaiting further work of his. I am sure Bryan Levman (or someone else) is going to respond one way or the other.

[…] his pitching it in terms of “the language the Buddha spoke” will more or less ensure that he is not taken seriously by professional scholars […].

Let’s await his argumentation. There are and always have been reputable scholars in the field who argued that Pāḷi was spoken by the Buddha. I have summarized some of the main scholarly opinions in the introduction to my Pāḷi grammar (Māgadhabhāsā (Pāḷi) – A Compendious Grammar on the Language of Pāḷi Buddhism, Second Edition, Revised); here is the relevant portion:

Pāḷi – What is it?

The Handbook of Comparative and Historical Indo-European Linguistics (Klein et al., 2017: 318) states: “It is generally accepted that Pāli as known from the Theravāda texts was a lingua franca, not a single individual language particular to one dialect area.” However, the scholarly discussions on the subject matter that have been consulted are of course somewhat more nuanced than that generalizing statement in its depiction of the status quo. They state, more specifically, that Pāḷi is either (a) some form of either a lingua franca, [f.n. 1] koine [f.n. 2] or standard dialect (Geiger, 1916/1956: 4–6; Karpik, 2019: 67; Oberlies, 2007: 183; Roth, 1980: 78; Wynne, 2019: 9–10), (b) some form of a vernacular (Childers, 1875: xiv; Roth, 1980: 78; Warder, 1970/2000: 294) or (c) based upon one of these (Levman, 2019: 64–5, n. 1; Lüders, as quoted by Waldschmidt in Lüders, 1954: 8; Norman, 1980: 66; Rhys Davids, 1911: 53–4). There is also a dissensus as to whether Pāḷi predominantly constitutes an artificially crafted language (Gombrich, 2018: 84–5; [f.n. 3] Norman: 65; von Hinüber, 1996: 5 [f.n. 4]) or developed mainly by natural means (Pischel, 1957: 5). It also has to be noted that the first-mentioned views under (a) above premise some actually spoken basis underlying the Pāḷi language, having been significantly morphed or superseded by contrived structures in the course of time – at least in part – and that the second-mentioned view does not assume that the language was safe from any form of change as it relates to redaction, transmission errors etc. Not one text-critically involved scholar, as far as I am aware, is of the opinion that the Pāḷi as we know it has undergone no changes whatsoever.

The above-presented traditional accounts, reporting the language as found in the texts of the Pāḷi Buddhist tradition to be māgadhabhāsā etc., are by and large considered incorrect by modern scholars. They adduce, inter alia, the peculiar features of the Māgadhī dialect proper as inferred from the Aśokan inscriptions and the medieval descriptions of it by the Indian grammarians and determine these features to be (a) l instead of r (e.g. lāja for rāja), (b) a-stems in e for o (e.g. lāje for rājo) and (c) palatal ś for dental s. However, based upon inscriptional and other evidence, Norman (1980: 68–9) demonstrated that these features were found merely within a relatively restricted area and that it is feasible to regard the home of Pāḷi as being outside the region where the true Māgadhī was spoken but still within Magadha, somewhat in the center of the east-Indian region, not far from Kaliṅga. He considers it feasible that Māgadhī – as depicted within the aṭṭhakathā tradition as the language of the tipiṭaka – is a variant of the Māgadhī dialect proper and that the Buddhist tradition can thus be correct. Winternitz (1908/1981: 40), seeing the Māgadhī dialect proper at the base of Pāḷi, had already come to similar conclusions, as had Geiger (1916/1956: 4), to quote the latter:

“A consensus of opinion regarding the home of the dialect on which Pāli is based has therefore not been achieved. Windisch therefore falls back on the old tradition – and I am also inclined to do the same – according to which Pāli should be regarded as a form of Māgadhī, the language in which Buddha himself had preached.” What emerges from the above is that the traditional narrative should not be and has not been dismissed outright.

[…] Surely, Geiger (1916/1956: 4–6) must have based his deliberations to some extent upon the exegeses of the aṭṭhakathā, ṭīkā and grammatical traditions showcased throughout this section when he wrote:

“[…] Pāli should be regarded as a form of Māgadhī […]. Such a lingua franca naturally contained elements of all the dialects […]. I am unable to endorse the view, which has apparently gained much currency at present, that the Pāli canon is translated from some other dialect (according to Lüders, from old Ardha-Māgadhī). The peculiarities of its language may be fully explained on the hypothesis of (a) a gradual development and integration of various elements from different parts of India, (b) a long oral tradition extending over several centuries, and (c) the fact that the texts were written down in a different country. I consider it wiser not to hastily reject the tradition altogether but rather to understand it to mean that Pāli was indeed no pure Māgadhī, but was yet a form of the popular speech which was based on Māgadhī and which was used by Buddha himself.”

Whatever the case may be when it comes to the nature of Pāḷi, perhaps Bodhi (2020: 3) is right when suggesting: “If by some unexpected miracle transcripts of the original discourses should turn up in the exact language(s) in which they were delivered, one who knows Pāli well would be able to read them with perhaps 90 percent accuracy.” In this manner, the scope of modern scholarly assessments concerning the nature of Pāḷi partially extends […].


  1. Merriam Webster (“Lingua franca,” n.d.): “[A]ny of various languages used as common or commercial tongues among peoples of diverse speech.”
  2. Merriam Webster (“Koine,” n.d.): “[A] dialect or language of a region that has become the common or standard language of a larger area.”
  3. Gombrich holds that the Buddha was the progenitor of the Pāḷi language or at least a principal figure as it relates to its creation.
  4. Commenting on von Hinüber’s assessment of Pāḷi as an artificial language, Prof. Oberlies remarks: “The ‘artificial language’ of Mr. von Hinüber goes too far also for me” – “Die ‘Kunstsprache’ von Herrn von Hinüber geht auch mir zu weit” (personal communication, May 3, 2020).


  • Bodhi (2020). Reading the Buddha’s discourses in Pali: A practical guide to the language of the ancient Buddhist canon. Wisdom Publications.
  • Childers, R. C. (1875). A dictionary of the Pali language. Trübner & Co.
  • Geiger, W. (1956). Pali literature and language (B. Ghosh, Trans.; 2nd ed.). University of Calcutta (original work published 1916).
  • Gombrich, R. F. (2018). Buddhism and Pali. Mud Pie Books.
  • von Hinüber, O. (1996). A handbook of Pāli literature. Walter de Gruyter.
  • Karpik, S. (2019). The Buddha taught in Pāli: A working hypothesis. The Journal of the Oxford Centre for Buddhist Studies, 16, 10–86.
  • Klein, J., Joseph, B. & Fritz, M. (Eds.) (2017). Handbook of comparative and historical Indo-European linguistics. De Gruyter Mouton.
  • Levman, B. G. (2019). The language the Buddha spoke. Journal of the Oxford Centre for Buddhist Studies, 17, 64–108.
  • Lüders, H. (1954). Beobachtungen über die Sprache des buddhistischen Urkanons (E. Waldschmidt, Ed.). Akademie Verlag.
  • Norman, K. R. (1980). The dialects in which the Buddha preached. In H. Bechert (Ed.), Die Sprache der ältesten buddhistischen Überlieferung – The language of the earliest Buddhist tradition (pp. 61–77). Vandenhoeck & Ruprecht.
  • Oberlies, T. (2007). Aśokan Prakrit and Pāli. In D. Jain & G. Cardona (Eds.), The Indo-Aryan languages (pp. 161–203). Routledge.
  • Pischel, R. (1957). Comparative grammar of the Prākrit languages (S. Jhā, Trans.). Motilal Banarsidass.
  • Roth, G. (1980). Particular features of the language of the Ārya-Mahāsāṃghika-Lokottaravādins and their importance for early Buddhist tradition. In H. Bechert (Ed.), Die Sprache der ältesten buddhistischen Überlieferung – The language of the earliest Buddhist tradition (pp. 78–100). Vandenhoeck & Ruprecht.
  • Rhys Davids, T. W. (1911). Buddhist India. T. Fisher Unwin.
  • Warder, A. K. (2000). Indian Buddhism. Motilal Banarsidass (original work published 1970).
  • Winternitz, M. (1981). A history of Indian literature (Vol. I) (V. S. Sarma, Trans.). Motilal Banarsidass (original work published 1908).
  • Wynne, A. (2019). Once more on the language of the Buddha. The Journal of the Oxford Centre for Buddhist Studies, 8–10.

I find the traditional account fits some of the modern theories quite well that say that Pāḷi was kind of a supra-regional language:

Commentaries, Sub-Commentaries and Pāḷi Grammatical Literature

The aṭṭhakathā and ṭīkā traditions take the language of Magadha (māgadhabhāsā) to be a natural language – a delightful language indeed (Sv-pṭ: 6). As presented already above, the Samantapāsādikā vinaya aṭṭhakathā (Sp IV: 23) proffers the following annotation of the phrase sakāya niruttiyā as used by two Brahmins in an incident recorded in the vinaya that is cardinal for linguistic questions, where they, still attached to things Vedic, complain about the manner or language whose adoption was spoiling the Buddha’s teaching: “[…] ‘own tongue’ means the common speech belonging to Magadha (māgadhiko vohāro) in the manner spoken (vuttappakāro) by the Perfectly Enlightened One.”

The 12th/13th-century CE Vimativinodanīṭīkā (Vmv: 125) interprets the relevant portion of the episode thus: “They ruin (dūsenti) the word of the Buddha with their own language (sakāya niruttiyā) as it relates to the canon (pāḷi): ‘Surely, those of inferior birth who have learned [memorized; i.e. the buddhavacana] corrupt the language of Magadha (māgadhabhāsāya) to be spoken by all with ease (sabbesaṃ vattuṃ sukaratāya)’ – this is the meaning.” The Vinayālaṅkāraṭīkā (Pālim-nṭ: 180) from the 1600s CE in turn glosses sakāya niruttiyā, as succinctly as possible, as māgadhabhāsā, the “language of Magadha.”

The Samantapāsādikā (Sp I: 94), on another occasion, equates māgadhabhāsā seemingly with the Aryan language as a whole, thereby possibly referring to a supra-regional language. The indigenous Pāḷi grammars basically concur with the above. The Padarūpasiddhi, for example, mentions explicitly that the Buddha spoke a tongue belonging to Magadha (māgadhika), as recorded in the tipiṭaka (Rūp: 32) – for a detailed discussion concerning themes related to the last-mentioned point, see Gornall (2014). The above is, as we have already seen at the beginning of this chapter, a sensible account of what language the Buddha employed, at least primarily.

In this connection, it appears relevant to mention that the aṭṭhakathā tradition is not just an alternative scholarly opinion but rather constitutes strong additional evidence (cf. Karpik, 2019: 74), as Norman (1983: 119) spelled it out:

[…] some parts of the commentaries are very old, perhaps even going back to the time of the Buddha, because they afford parallels with texts which are regarded as canonical by other sects, and must therefore pre-date the schisms between the sects. As has already been noted, some canonical texts include commentarial passages, while the existence of the Old Commentary in the Vinaya-piṭaka and the canonical status of the Niddesa prove that some sort of exegesis was felt to be needed at a very early stage of Buddhism.

Furthermore, Buddhaghosa’s Samantapāsādikā contains over 200 quotations of earlier material, according to the indigenous tradition harkening back in parts to the first council (paṭhamasaṅgīti) held shortly after the demise of the Buddha (von Hinüber, 1996: 104).

Abbreviations/References (Primary)

  • Pālim-nṭ: Vinayālaṅkāraṭīkā
  • Rūp: Padarūpasiddhi
  • Sv-pṭ: Sumaṅgalavilāsinīpurāṇaṭīkā
  • Sp: Samantapāsādikā
  • Vmv: Vimativinodanīṭīkā

References (Secondary)

  • Gornall, A. (2014). How many sounds are in Pāli? Schism, identity and ritual in the Theravāda saṅgha. Journal of Indian Philosophy, 42(5), 511–550.
  • von Hinüber, O. (1996). A handbook of Pāli literature. Walter de Gruyter.
  • Karpik, S. (2019). The Buddha taught in Pāli: A working hypothesis. The Journal of the Oxford Centre for Buddhist Studies, 16, 10–86.
  • Norman, K. R. (1983). Pāli literature: Including the canonical literature in Prakrit and Sanskrit of all the Hīnayāna schools of Buddhism. Otto Harrassowitz.

Unfortunately Karpik does not argue for this position. It is merely an assumption that he makes. Rather his argument is all about the relationship between Pāli and the language of the Asoka inscriptions. And as I say, I find him persuasive on this topic. I wait with interest to see what scholars of Middle Indic make of his arguments, but I suspect we’ll have to change the textbooks.

My attitude to “reputable scholars” is unprintable. But let’s just say that the whole point of Karpik’s argument is that “reputable scholars”, including unimpeachable giants like Roy Norman, are wrong all the time.

As far as the “language of the Buddha” goes, Karpik tacitly assumes (1) the absolute historicity of the Buddha and (2) that the Pāli canon is straightforwardly an historical record of things said by that Buddha. He doesn’t even attempt to address the deep divisions in Buddhist Studies over these issues. I imagine this will go down well with Theravādins, but he’s arguing against people like Oskar von Hinüber who, I think, is unlikely to even read a paper framed in terms of “what language the Buddha spoke”.

It comes down to an argument over what kind of evidence the Pāli texts are. As far as I can see, on one side are a small group of Theravādins (Sujato et al.) and a handful of Theravāda-adjacent scholars (like my old mentor Gombrich and some of his protégés) who believe in the two assumptions enough to publish apologetics for them. On the other side are the rest of us, who don’t believe these assumptions are helpful or at least have serious doubts about them. At the very least, a scholar who relies on these assumptions should explain why, for example, they accept the historicity of the Buddha without any caveats.

Unfortunately, Karpik has only published in JOCBS to date. The current editor is a well-known advocate of the same two assumptions. So I doubt Karpik met any serious challenges to his assumptions in the review process. I’ve written to Karpik, encouraging him to drop the “language of the Buddha” schtick and to write something about the history of Pāli for the PTS Journal, of which von Hinüber is a co-editor with Rupert Gethin (and a hard man to please). We shall see. I hope he manages to break out of the OCBS bubble. I’m very glad that I did. Being exposed to a wider range of academic editors and reviewers has made my writing a lot better.

I’m not Theravadin. :pray:


Interesting paper. Let’s take the first example @StefanK uses:

Suganaṁ raje raño Gāgīputasa Visadevasa
pauteṇa Gotiputasa Āgarajusa puteṇa
Vāchiputena Dhanabhūtina kāritaṁ toranāṁ
silākaṁmaṁto ca upaṁno.

I transliterate the same into standard Sanskrit (using the very arguments he uses via his footnotes 20-32 - as the arguments apply equally well to Sanskrit):

Śuṅgāṇāṁ rājye rājño Gārgīputrasya Viśvadevasya
pautreṇa Gauptīputrasya Āṅgāradyutaḥ putreṇa
Vātsiputreṇa Dhanabhūtinā kāritaṁ toraṇaṁ
śilākarmāntaḥ ca utpannaḥ

Moreover, he is apparently not able to treat the language of the Ashokan edicts as Pāli (he says the language underlying the Ashokan inscriptions simply vanished after some decades as it was Ashoka’s personal language, which isn’t credible in the least, as Aśoka did not have a unique language of his own). But even the Aśokan edicts are readable as Sanskrit if we apply orthographic principles similar to the above.

First Rock Edict of Ashoka:
iyaṃ dhaṃma-lipī Devānaṃpriyena Priyadasinā rāña lekhāpitā
idha na kiṃci jīvaṃ ārabhitpā prajūhitavyaṃ
na ca samājo katavyo
bahukaṃ hi dosaṃ samājamhi pasati Devānaṃpriyo Priyadasi rājā
asti pi tu ekacā samājā sādhu-matā Devānaṃpriyasa Priyadasino rāño
pura mahānasamhi Devānaṃpriyasa Priyadasino rāño anudivasaṃ bahūni prāṇa-sata-sahasrāni ārabhisu sūpāthāya
se aja yadā ayaṃ dhaṃma-lipī likhitā tī eva prāṇā ārabhare sūpāthāya dvo morā eko mago so pi mago na dhruvo
ete pi trī prāṇā pachā na ārabhisare

In standard Sanskrit:

iyaṃ dharma-lipī Devānāṃpriyeṇa Priyadarśinā rājñā lekhitā
iha na kiṃcid jīvaṃ ālabhya prahotavyam
na ca samājaḥ kartavyaḥ
bahavaḥ hi dosāḥ samāje paśyati Devānāṃpriyaḥ Priyadarśī rājā
asti api tu kecit samājāḥ sādhu-matāḥ Devānāṃpriyeṇa Priyadarśinā rājñā
purā mahānase Devānāṃpriyasya Priyadarśino rājño 'nudivasaṃ bahūni prāṇi-śata-sahasrāṇi ālabhyanta sūpārthām
te adya yadā iyaṃ dharma-lipī likhitā trayaḥ eva prāṇinaḥ ālambhyante sūpārthaṃ - dvau mayūrau eko mṛgaḥ, so’pi mṛgo na dhruvam
ete’pi trayaḥ prāṇinaḥ paścād na ālapsyante

So as per the above, one can use the same arguments to claim that the early inscriptions are in Epigraphic Sanskrit (written with orthographic irregularities in the early period of writing) and were not a separate language of their own. That would also explain why the language of later inscriptions gradually became standard Sanskrit, just as Buddhists themselves started converting their canonical and other literature to standard Sanskrit.

Let’s now see another claim he relies on. He quotes Lüders as saying “Early Brāhmī script does not indicate double consonants”.

Is it true that Early Brāhmī script does not indicate two consonants next to one another, i.e. consonant clusters (like the one in the canonical Pāli word ‘dukkha’)? Let’s see from the Ashokan edict I have quoted above:

  1. pri in Devānaṃpriyena, Priyadasinā
  2. tpā in ārabhitpā
  3. pra & vya in prajūhitavyaṃ
  4. vyo in katavyo
  5. mhi in samājamhi & mahānasamhi
  6. prā in prāṇa & prāṇā
  7. srā in sahasrāni
  8. dvo in dvo
  9. dhru in dhruvo

So what do we do with evidence (like the above) that doesn’t fit his theory?

None of the examples you listed are double consonant clusters in the sense ‘like the Pāli word dukkha.’ There is no gemination, only clusters with a glide following a consonant. ‘tpā’ being an exception, though still not two of the same consonant. So are there clusters? Yes. Are they of the kind that represents a long consonant of the same type (gemination or ‘double consonant’)? No, if not counting the use of niggahīta.
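
To make the distinction concrete, here is a minimal Python sketch. It assumes IAST romanization, treats each aspirate (kh, gh, ...) as a single consonant, and treats the niggahīta and vowels as cluster boundaries. The function names and the three labels are my own, purely for illustration; they do not come from Karpik or Lüders. Whether the "stop + own aspirate" type (as in dukkha) should count as gemination is a matter of definition, so it gets its own label rather than being folded into either side.

```python
import unicodedata

# Assumption: IAST romanization. Two-letter aspirates count as ONE consonant.
ASPIRATES = ["kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh", "ḷh"]
SIMPLE = list(unicodedata.normalize("NFC", "kgṅcjñṭḍṇtdnpbmyrlvsśṣhḷ"))
CONSONANTS = set(ASPIRATES) | set(SIMPLE)


def tokenize(word):
    """Split a romanized word into tokens, matching aspirates first."""
    word = unicodedata.normalize("NFC", word.lower())
    tokens, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ASPIRATES:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def clusters(word):
    """Return every run of two or more adjacent consonants in the word."""
    run, found = [], []
    for tok in tokenize(word) + [""]:  # empty sentinel flushes the last run
        if tok in CONSONANTS:
            run.append(tok)
        else:
            if len(run) > 1:
                found.append(tuple(run))
            run = []
    return found


def classify(cluster):
    """Label a cluster; the labels are illustrative, not standard terms."""
    if len(set(cluster)) == 1:
        return "identical"              # e.g. tt, ññ
    if len(cluster) == 2 and cluster[1] == cluster[0] + "h":
        return "stop + own aspirate"    # e.g. k + kh in dukkha
    return "heterogeneous"              # e.g. p + r, t + p, dh + r


if __name__ == "__main__":
    for w in ["dukkha", "priyena", "ārabhitpā", "dhruvo", "vyatto", "nāññatra"]:
        print(w, [("".join(c), classify(c)) for c in clusters(w)])
```

Running it over the edict words and a few canonical Pāli words shows which clusters fall under which label, without settling which label deserves the name "double consonant".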

Double consonants means two consonants next to each other without an intervening vowel (i.e. a consonant cluster).

k & kh are different consonants in Pali - they are not the same consonant appearing twice (eg. in the word ‘dukkha’).

ka & kha are written as 𑀓 & 𑀔 respectively in Brāhmī (if the Brāhmī characters don’t show on your computer you will need a font like ‘Segoe UI Historic’ or Noto Sans Brahmi installed to display them). Nobody who read Brāhmī in the 3rd century BCE would have pretended that they represented the repetition of the same consonant - because they are not the same consonant. Their pronunciations are not identical.

Therefore the examples I’ve provided above are of double-consonants (or consonant clusters) - and that is the case for dukkha as well.

The point is that consonant clusters could and did exist in Old-Brāhmī, and where such clusters appear (as in the above Ashokan edict and other early edicts), the words in which they appear coincide with the Sanskrit spellings and not with the Pali spellings, thereby evidencing that the writer thought they were writing Sanskrit (other orthographic irregularities/simplifications notwithstanding).

If even dissimilar consonant clusters were written in Old-Brāhmī (as shown in my prior post above), there is no reason why similar consonant clusters couldn’t be written (as in Pali). That similar consonant clusters were not written is because Pali didn’t exist as a distinct language in the time of Ashoka. The Pali of the Pali canon follows the orthographic conventions established by the Early-Brahmi inscriptions; it does not precede them. This doesn’t, however, mean the contents of the canon are all later than the inscriptional evidence.

In the canon, it was much more difficult to change the language of the versified Pali for metrical reasons (even though there too similar orthographic conventions were applied to the extent possible), so the verses generally show a language relatively older (and more original) than the orthographically standardized prose Pali.

I read it as a geminated aspirated velar plosive, i.e. /k^h:/. Not sure of the various orthographic conventions for phonological glossing.

Not sure what you are talking about. k is not an aspirated sound, kh is an aspirated sound. They are two different consonants. There is no gemination here.

The same is the case for

  • c & ch
  • j & jh
  • ṭ & ṭh
  • ḍ & ḍh (or their respective allophones ḷ & ḷh)
  • t & th
  • d & dh
  • p & ph
  • b & bh

Those pairs are independent consonants. One immediately following the other in the same word does not amount to gemination.

No idea what that means.

In any case, canonical Pali also uses dissimilar conjunct consonants (so if early Brāhmī didn’t use dissimilar conjuncts, it couldn’t have been used to write Pali accurately either). For example, I found these in a random search of the Pali canon; I am sure there are many more examples:

  1. tra & sta in tatra, utrastamidaṃ, anutrastaṃ & nāññatra
  2. tri & dri in nāññatrindriyasaṃvarā
  3. tri in tatridaṃ, tatrime & tatrimāni
  4. tru in citrupāhanaṃ, tatrupāyāya & tatruppattiyā
  5. tva in tvaṃ, tatvassa
  6. gya in ārogyaṃ, manussadobhagyaṃ, agyantarāyo
  7. sya in ālasyaṃ, anālasyaṃ
  8. bra in bravitūti, brahantaṃ, brahā, brahāraññaṃ, brahāvane, brahma
  9. dra in udrabheyyuṃ, gadrabhaṃ, dukkhudrayaṃ, dudrabhītipi, devadudrabhi, bhadraṃ, saudrayā
  10. vya in vyatto, havyaṃ, vyattūpasevī
  11. tre in lokacitresu, aññatreva, tatreva
  12. smi in abhibhosmi, panasmi, kasmiñci, ummattosmi, pismi, kismiṃ
  13. snā in asnātha
  14. sne in sasnehaṃ, sneho, snehapareto
  15. ste in anuddhastena, uddhaste
  16. sta in biḷārabhastaṃ, odhastapatodo
  17. kri in kriyavādā, kriyā
  18. kru in akrubbaṃ, krubbetha, vikrubbato
  19. pla in plavanti, uplava, uplaveyyāti, uplavissati
  20. dva, dvi & dve in advayaṃ, vākyadvayam, dvinnaṃ, dvidhā, dvipadā, dve, dvedhā, dveḷhakajātā
  21. kya in vākyam, sakya, mālukyaputto, abhinanduntivākyaṃ

Dear Ven. Sujato, good to know that you are not Theravadin. :pray:

A rose by any other name…