SuttaCentral

Volunteer wanted! Help collect author/translator information


#21

Great, Bhante @sujato, that all sounds clear.

Just a few more questions. Is this the complete list of authors we should compile info on? sc-data/author_edition.json (master branch of suttacentral/sc-data on GitHub)

And in the GitHub link above, is the short_name the same as what we’ll use for the slug?

In terms of our own process, and SC norms and preferences: is it best for @robbie and me to communicate here or on some other SC channel/platform, or is personal email OK?


#22

Yes.

No, the one you want is called uid. (The “short name” is used in buttons and similar contexts where space is at a premium. Actually I think that due to changes earlier this year we may not even use the short name at all.)

Note that author_edition includes both “authors” (mainly translators) and editions, which apply to root texts in Pali, etc. We are primarily looking for author info at the moment, but any edition info you find would be great, too. If you come up with anything, maybe keep a separate spreadsheet for that.


#23

Perfect! I sent an edit request.

By the way, I developed something of a pipeline (in R) to convert SuttaCentral’s JSON data into a data frame (so that the data can be copy-pasted into a Google spreadsheet). I will add the author_edition data as soon as I can edit.

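install.packages("RJSONIO")  # one-time installation; skip on later runs
library(RJSONIO)  # unlike require(), library() stops with an error if the package is missing
json_data <- fromJSON("https://raw.githubusercontent.com/suttacentral/sc-data/master/additional-info/author_edition.json", encoding = "UTF-8")
fields <- unique(unlist(lapply(json_data, names)))  # every key used by any record
json_data <- lapply(json_data, function(x) {
  x <- as.list(x)
  sapply(fields, function(f) if (is.null(x[[f]])) NA_character_ else as.character(x[[f]]))  # same fields in the same order for every record; JSON null or a missing key becomes NA
})
df <- as.data.frame(do.call("rbind", json_data), stringsAsFactors = FALSE)
View(df)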

> In the case of collaborative translations, each author is also listed independently.

I recently learned about tidy data from Wickham (2014): https://vita.had.co.nz/papers/tidy-data.pdf.

> This paper tackles a small, but important, component of data cleaning: data tidying. Tidy datasets are easy to manipulate, model and visualise, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.

It seems like there might be some value in distinguishing singular slugs from compound slugs (that is, treating them as different observational units). Fields like DoB and DoD are not relevant for a compound slug; they are only relevant for the associated individual translators. In the current author_edition dataset, the names of multiple authors are put in one cell (which makes the data messy), e.g.

33 | author | aung-rhysdavids | Aung … | Shwe Zan Aung, C.A.F. Rhys Davids

I think this could be prevented by having one table which lists the associated authors for a compound slug, e.g.

1 | aung-rhysdavids | aung | rhysdavids

and then another table (based on the Google spreadsheet) with the author data, e.g.

1 | aung | Shwe Zan Aung | pli | en | . . .
2 | rhysdavids | C.A.F. Rhys Davids | pli | en | . . .
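A minimal R sketch of this two-table layout, for illustration only. It assumes hypothetical column names (compound_uid, member_uid, root_lang, transl_lang) and stores the link table in long form (one row per collaborator) rather than as the wide row above, so a collaboration of any size needs no extra columns. The old comma-separated cell can then be rebuilt on demand instead of being stored.

```r
# Table 1 (hypothetical column names): one row per (compound uid, member uid) pair,
# i.e. the long form of "1 | aung-rhysdavids | aung | rhysdavids".
collaborators <- data.frame(
  compound_uid = c("aung-rhysdavids", "aung-rhysdavids"),
  member_uid   = c("aung", "rhysdavids"),
  stringsAsFactors = FALSE
)

# Table 2: one row per individual author; DoB, DoD, etc. would live only here.
authors <- data.frame(
  uid         = c("aung", "rhysdavids"),
  long_name   = c("Shwe Zan Aung", "C.A.F. Rhys Davids"),
  root_lang   = c("pli", "pli"),
  transl_lang = c("en", "en"),
  stringsAsFactors = FALSE
)

# Join when needed: each member row picks up that author's details.
members <- merge(collaborators, authors, by.x = "member_uid", by.y = "uid")

# Rebuild the old combined cell from the tidy tables.
rebuilt <- aggregate(long_name ~ compound_uid, data = members,
                     FUN = paste, collapse = ", ")
print(rebuilt)
```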

#24

@Robbie I don’t follow 100%, but it sounds great :slight_smile:

If I understand correctly, you will populate the “slug name” and “long name(s)” columns, and identify and populate whatever other info we are pulling from the current author_edition dataset.

Once that’s completed, will you tag me in this thread? Then I can start helping with the research.


#25

And I’ve just okayed it.

Okay, so that’s awesome. Note that Bilara i/o offers this natively, but the relevant data is not yet on Bilara, so I’m not sure exactly how it would work.

Indeed, that sounds great. We should, in fact, end up with three distinct JSON files:

translator.json
edition.json
translation_collaborators.json

Translation collaborators should have only a slug, short name, and long name. (Sometimes the long name is not inferable from the individual names, e.g. “T.W. & C.A.F. Rhys Davids”.)
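A rough R sketch of how the existing rows could be split into the three JSON files listed above, before serialising them. It assumes an author_edition data frame with type, uid and long_name columns, an "edition" type value, and a hypothetical collaborators link table that identifies compound uids; the sample values are illustrative only, and short_name would be carried along the same way as long_name.

```r
# Hypothetical author_edition rows (column names and the "edition" value are assumptions).
author_edition <- data.frame(
  type      = c("author", "author", "author", "edition"),
  uid       = c("aung", "rhysdavids", "aung-rhysdavids", "example-edition"),
  long_name = c("Shwe Zan Aung", "C.A.F. Rhys Davids",
                "Shwe Zan Aung, C.A.F. Rhys Davids", "Example Edition"),
  stringsAsFactors = FALSE
)

# Hypothetical link table: which uids are collaborations, and who their members are.
collaborators <- data.frame(
  compound_uid = c("aung-rhysdavids", "aung-rhysdavids"),
  member_uid   = c("aung", "rhysdavids"),
  stringsAsFactors = FALSE
)

is_compound <- author_edition$uid %in% collaborators$compound_uid
is_edition  <- author_edition$type == "edition"

translator <- author_edition[!is_compound & !is_edition, ]
edition    <- author_edition[is_edition, ]

# Collaborations keep only name fields; the long name is stored explicitly
# because it cannot always be derived from the members' names.
translation_collaborators <- author_edition[is_compound, c("uid", "long_name")]
```

Detecting compound slugs through the link table, rather than by looking for a hyphen in the uid, avoids misclassifying an individual whose own slug happens to contain a hyphen.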

Of course this will only affect a few cases.


#26

@sgns I have imported the data from the author_edition dataset! I created new UIDs (“collaborator UIDs”) for individual authors who previously appeared only in a compound UID (e.g. walton for Jessica Walton). All new UIDs are marked with an asterisk (*) in column A. (I have backed up this table, so there’s no risk of data loss. I have also made a table that lists the collaborator UIDs for each compound UID.)
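Finding authors who so far appear only inside a compound UID amounts to a set difference between the member UIDs and the existing individual UIDs. A small R sketch of that step follows; the data is hypothetical (including the compound uid example-walton and its partner example), and appending "*" merely mimics the marking in column A.

```r
# Hypothetical data: individual uids that already have their own row...
existing_uids <- c("aung", "rhysdavids")

# ...and a link table of compound uids and their members.
collaborators <- data.frame(
  compound_uid = c("aung-rhysdavids", "aung-rhysdavids",
                   "example-walton", "example-walton"),
  member_uid   = c("aung", "rhysdavids", "example", "walton"),
  stringsAsFactors = FALSE
)

# Members that never appeared on their own need a new individual uid.
new_uids <- setdiff(unique(collaborators$member_uid), existing_uids)

# Mimic the spreadsheet convention: flag new uids with an asterisk.
column_a <- c(existing_uids, paste0(new_uids, "*"))
print(column_a)
```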


#27

Great, @Robbie!
I will dig in soon!