Correcting the CPED

Is it now?
I am at the hangout.

No, in one hour! Is that still okay for you?

oops yes

@ElissaJ and @Russell, thanks so much for the hangout earlier. Look forward to working with you both.

Elissa, the term to start with is atikhīṇa.


Dear Bhante @sujato,

You’re very welcome. I’m very glad to be able to give back by assisting in any way I can to propagate the Dhamma. Likewise, looking forward to working with you and @ElissaJ.

with reverence, respect, and gratitude,



There are symbols used that aren’t in the abbreviations list. Maybe they are some dictionary standard:

  1. a check mark
  2. a double-s
  3. <

Also, about halfway down, is (senti cāpātikhīṇā va;…

I’m not sure what senti or the va mean.

As far as I can tell, the only definition listed is “scattered”. Right?



You can normally ignore everything in the etymology section, i.e. everything between the [square brackets]. So none of these symbols affect us. But for the record:

  1. Not sure what you mean by “check mark”. There’s an *asterisk, which means “hypothesized but not actually found anywhere”, and a √, which means “root”.
  2. double s: this is the “section sign” §
  3. the < here seems to mean “derived from”, but I’m not entirely sure.

Normally you can ignore these examples, the only thing that’s of interest to us is to see whether they include any references to the EBTs. In this case, there is a reference to Dhp, so it is in the EBTs.

Right. Clearer typography would be really useful. You shouldn’t have to hunt and squint in a dictionary entry to find the basic meaning of the word!

Great, thanks for all the info. I didn’t realize the square brackets were the etymology. I am more than happy to ignore that part. :slightly_smiling:

So, I did the first 10 on the EJ Test tab and highlighted the rows in yellow. How about if what I did looks good, you can change the highlight to green.

I put in some question marks on some.

I added a column for my reference for Page #. There is one that is nf for not found but I’m thinking now that’s because it is not the regular form, right?

Also, I added column K for you to put any notes about my ?

One question: some entries have no EBT or other reference, so I think that means to include them and not put the 0 in the ebt column? I’m guessing that if an entry includes references to texts, it means all texts have been searched and only those used are included. And if there are no references to texts at all, then it is a widely used word and should be included.

My last question, let me know if this is cluttering up the discussion board too much and I can send via email next time.

Thank you Bhante. I am learning a lot and hope my contributions back will be helpful.


P.S. I plan to pick up the pace when computer is fully revived and this initial education part gets me rolling.

Okay, looking good so far. Consider it greened!

Good idea.

That’s correct. How you’ve handled this is fine.

That’s the safe option, yes.

Not quite. In this case atiga is listed, but only occurs as part of a compound. So the entry lists the compounded forms under which the word appears. In such a case it is fine to do as you have done.

It would also be fine to simply omit atiga, and only give the compounded forms. In that case, the compounded forms would have the compounded term + atigacchati as the “regular form”.

However, as I said above, the safe option is simply to include the entry in ambiguous cases.

Not at all, that’s what it’s for. We normally do most of our development discussion here, so people who are not in on the original exchange can still see what’s happening.


Here is the check mark I was talking about and it’s in the middle of an entry.


It means “root”. You can ignore it.

So now I am having trouble copy/pasting from the pdf to the google doc. I’ve tried several programs to view the pdf and it still doesn’t copy the characters correctly.

Any suggestions?


Dear @ElissaJ,

Perhaps this will help you a bit. I found that the google doc is hard for me to tweak to make the work as easy as possible. I downloaded the document itself onto my drive and it’s been working fine, as it allows me to do all the fun things I can do with a spreadsheet :grin: If you haven’t downloaded LibreOffice, I highly suggest you do because it’s really cool. Did I mention it’s free? :heart_eyes:

(I also noticed that when I left the document open in google drive, my PC’s CPU was going nuts).

in mettā,



Dear Bhante @Sujato;

I have sent you a spreadsheet via your gmail :grin: I did it that way since I noticed that working on the spreadsheet via google took a lot out of my CPU and it was really slow.

At your convenience, please take a look at it. I started from the beginning of “G”. I welcome your recommendations.

with respect, reverence, and gratitude,


Thanks Russell. Are you able to copy from the PDF and paste to the Spreadsheet and it gets all the unicode characters correctly?

The pdf is an image file. You can’t copy from it. However the pdf has an embedded OCR layer (which is why you can search it). I’ll send you the text extracted from that, which might be of some help.

Fine, just make sure it’s synced properly.

Good to know!

Alas, the text file didn’t get all the characters. I downloaded a trial of an ocr program where you can set the language, but of course it doesn’t list Pali. It doesn’t list Sanskrit either. Does anyone know what more common or modern language would have the needed characters?

Russell. How is your progress? I got stuck on matching up the DOP entries to the spreadsheet list, and so was wanting to be able to copy from the pdf and search on the spreadsheet. What is your method?

I doubt very much if you’ll find an OCR that does a much better job, but good luck. I can do it with the open source Tesseract, which does include Sanskrit. I just used it to OCR the Monier Williams Sanskrit Dictionary: it took a full 12 hours running at 100% of one of my CPUs, so I’m not eager to do it unless it’s going to be really useful! I didn’t test out the Sanskrit side of things; it probably assumes you are using Devanagari characters so will not be of any use.

Perhaps you can explain to me exactly what you want to achieve and maybe there’s an easier way.

Is the problem that you can’t input the diacritical marks? If so, we have ways of doing this in all major operating systems here:

But I’m not sure how useful this will be, as you don’t need to use diacritical marks to search on Google Docs, it does fuzzy searching by default.

In the majority of cases you just have to visually scan the document to find the entries.
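
For anyone working on a downloaded copy of the spreadsheet rather than in Google Docs, the same diacritic-insensitive lookup can be approximated with a small script. This is only a sketch under stated assumptions: the function names `strip_diacritics` and `find_matches` are made up for illustration, the entries are assumed to have been exported as a plain list of strings, and it is not a description of how Google's search actually works.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed, e.g. 'atikhīṇa' -> 'atikhina'."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_matches(query: str, entries: list[str]) -> list[str]:
    """Return the entries equal to the query once diacritics and case are ignored."""
    key = strip_diacritics(query).lower()
    return [entry for entry in entries if strip_diacritics(entry).lower() == key]


if __name__ == "__main__":
    # Hypothetical sample of spreadsheet entries, taken from words in this thread.
    sample = ["atiga", "atikhīṇa", "gaṇa"]
    print(find_matches("atikhina", sample))
```

Typing the plain-ASCII spelling is then enough to locate the entry, so the diacritics that the OCR text gets wrong stop being an obstacle to searching.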


Dear @ElissaJ,

Like you, I had the same trouble pasting the words from the .pdf to the spreadsheet. If you have the Pali app for your keyboard, you will be able to type Pali :smiley:

My solution for the issue was to copy what is already available on the spreadsheet and tack on the additional letters (e.g. for gaṇati, I copy gaṇa, which already exists in the spreadsheet, and type in “ti”).

When a word’s root is not already on the spreadsheet, I just insert a new cell and type it in completely, since I have no choice and can’t afford to waste precious time. I also insert the new word in ascending order, as it would show in the CDoP, so that anyone who’s looking at it will get a sense that they are all related terms.

Hope this helps! Okay, going back to the spreadsheet!

in mettā,