SuttaCentral

Indicating publication status on Bilara via git rather than json


#1

See previous discussions and proposals:

In the current proposal we are to indicate the publication status of a document in a JSON file included in the directory. I wonder whether we should, rather, do this using git branches.

Current proposal

Have a JSON file with publication data, including publication status. This would sit at the top folder level in bilara-data. The advantage is that all publication data is kept in one place and can act as a master reference for all SC’s publications. However, I fear it may have some problems:

  • How do we handle complex cases where a translator wishes to publish certain texts in a nikaya and not others? They might have to make a list of hundreds of sutta UIDs in JSON. This would be brittle and error-prone.
  • The publication status of a particular text is not evident in the GitHub UI. Someone looking at the repo wouldn’t know what is published and what isn’t unless they read the JSON file.

Git proposal

Distinguishing between “published” and “unpublished” branches is of course one of the basic functions of Git. Rather than reinventing the wheel, why not use this for texts, just as we do for code?

  • Under this model, when a translator begins work on Bilara, their work is opened on an unpublished branch in Bilara. Anyone looking at the Bilara repo immediately knows what is what.
  • When ready to publish, that branch is merged into the published branch. In this scenario I think it is better to call this branch published rather than master, a name that is not really meaningful here.
  • Anything on the published branch is “ready to consume” by any app that wants it. Typically we’d expect that apps would mirror the published content and serve it from their own repo.
  • Note that in some cases, apps will want to transform the data in different ways. For example, on SC we want to turn markdown emphasis into HTML <em> tags, whereas for a voice app they may be removed.
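
A minimal sketch of the last point, showing how two consuming apps might transform the same published segment in different ways. It assumes emphasis is marked with single asterisks; the function names and the sample segment are made up for illustration and are not taken from SC or Voice code.

```python
import re

# Matches *emphasised text* written with single asterisks. This is an
# assumption about the markup; the proposal only says "markdown emphasis".
EMPHASIS = re.compile(r"\*([^*\n]+)\*")


def to_html(segment: str) -> str:
    """Web-style transform: turn *text* into <em>text</em>."""
    return EMPHASIS.sub(r"<em>\1</em>", segment)


def to_plain(segment: str) -> str:
    """Voice-style transform: drop the emphasis markers, keep the words."""
    return EMPHASIS.sub(r"\1", segment)


segment = "This is the *middle* way."  # hypothetical sample segment
print(to_html(segment))   # This is the <em>middle</em> way.
print(to_plain(segment))  # This is the middle way.
```

Either way, the data on the published branch stays the same; each app applies its own transform after mirroring it.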

How would this be handled in terms of UI? I’m thinking that work in Bilara is always committed to unpublished. There is a PUBLISH button in the Bilara toolbar. When you hit it, it opens a modal dialog or something similar. The data for publication.json is entered via this modal, or generated automatically. Thus publication.json would not normally be hand-edited (which is a change from my original proposal).

In the following mockup, I use italics for text that will be autopopulated.

This action will publish your translation publicly to SuttaCentral.net, as well as to any services that use SuttaCentral’s texts.

Before publishing, ensure that texts are proofread and all corrections are made.

Text

Root title: Majjhima Nikāya
Translated title: Middle Discourses
ID: mn

Translator(s)

Author UID: sujato
Full name: Bhikkhu Sujato
Short name: Sujato

Publication action

☐ Publish all changes to collection Middle Discourses.
☑ Publish only this text (MN 1) to collection Middle Discourses.

Unpublish

☐ Unpublish all texts from collection Middle Discourses.
☐ Unpublish only this text (MN 1) from collection Middle Discourses.

Publication status

☐ Draft: this work is incomplete and will be revised.
☑ Completed: the main work on this text is complete, but corrections and improvements are ongoing.
☐ Final: no further revision is anticipated.

Edition

☐ Initial publication.
☐ Minor update. Use this for simple corrections and minor revisions. This update will not be recorded in the publication data, but is retained in GitHub.
☑ Revised edition. Use this for a substantially revised new edition. This update will be recorded in the publication data.

  • text field for revised editions: “Describe the nature of the revision (required)”.

License

All publications made on Bilara must use a Creative Commons Zero license. The author or authors dedicate the work to the Public Domain. This dedication is irrevocable.

Check these details are correct

Publish a completed version of a revised edition of MN 1 in Middle Discourses by Bhikkhu Sujato.

Confirm

Then the relevant data would be automatically added to publication.json.

Let’s break this down.

Text

The root text title and ID are set up in a separate process at the start of the project. The root text is basically what would normally be considered a “book”, i.e. a nikaya, a book of the Khuddaka, one of the Vibhangas or Khandhakas, a book of the Abhidhamma, etc.

Translator

Like the title info, the translator data is already set up at the start of the project.

Publication action

The Publication Action is by default set to the whole collection.

However, a translator may want to work progressively. Imagine, for example, that someone wishes to translate a Samyutta sutta every day. They would set the publication action to “this text only”. Then:

  • If the collection is not yet published, it is created in publication.json and the specific text ID added.
  • If the collection is already published, the ID is added.
  • If the text is already published, but is revised, it is simply pushed.
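
The three cases above can be sketched roughly as follows. This is only an illustration: the `publications` dict stands in for the parsed content of publication.json, and the `texts` key, the returned labels, and the sample IDs are hypothetical rather than an agreed schema.

```python
def publish_text(publications: dict, collection_id: str, text_id: str) -> str:
    """Record text_id as published in collection_id and report what happened.

    `publications` stands in for the parsed publication.json; the "texts"
    key and the returned labels are hypothetical.
    """
    collection = publications.get(collection_id)
    if collection is None:
        # Collection not yet published: create it with this one text.
        publications[collection_id] = {"texts": [text_id]}
        return "collection created"
    if text_id not in collection["texts"]:
        # Collection already published: just add the new ID.
        collection["texts"].append(text_id)
        return "text added"
    # Text already published: nothing new to record, the revision is pushed.
    return "revision pushed"


publications = {}
print(publish_text(publications, "sn", "sn1.1"))  # collection created
print(publish_text(publications, "sn", "sn1.2"))  # text added
print(publish_text(publications, "sn", "sn1.1"))  # revision pushed
```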

A translator should also have the ability to unpublish work.

Publication status

This is a note to let readers know the current status of the project. It might be used, for example, to trigger a “draft” flag on the website.

Edition

We want to be able to note substantial changes. However, we need not record every corrected comma in publication.json: that’s what git is for. The only way to tell the two apart is to rely on the translator.

Initially, “initial publication” is automatically checked and other options are disabled. Once the commit is made, the situation reverses: Initial publication is disabled, and “minor update” is checked. The translator can choose to check “revised edition”.

  • “Initial publication”: this creates the relevant entry in publication.json.
  • “Minor update”: changes will be logged in git with a generic commit message, but not recorded in publication.json.
  • “Revised edition”: the date and commit number will be added as a new edition to publication.json. In addition, the translator will have to add some text describing the revised edition. This will be the commit message, and will also be added to publication.json.
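
As a rough sketch of these three options, the function below updates one publication entry and returns the commit message to use. The `editions` key, the `kind` values, and the sample dates and commit IDs are all placeholders invented for illustration, not a settled format for publication.json.

```python
from typing import Optional


def record_edition(entry: dict, kind: str, date: str, commit: str,
                   description: Optional[str] = None) -> str:
    """Update a publication entry and return the git commit message.

    `entry` stands in for one collection's record in publication.json; the
    "editions" key and the `kind` values are hypothetical names.
    """
    editions = entry.setdefault("editions", [])
    if kind == "initial":
        if editions:
            raise ValueError("initial publication has already been made")
        editions.append({"date": date, "commit": commit})
        return "Initial publication"
    if not editions:
        raise ValueError("publish the initial edition first")
    if kind == "minor":
        # Logged in git only; the publication entry is left untouched.
        return "Minor update"
    if kind == "revised":
        if not description:
            raise ValueError("a revised edition needs a description")
        editions.append({"date": date, "commit": commit,
                         "description": description})
        # The description doubles as the commit message.
        return description
    raise ValueError(f"unknown edition kind: {kind}")


entry = {}
record_edition(entry, "initial", "2020-01-01", "abc1234")
record_edition(entry, "minor", "2020-02-01", "def5678")
record_edition(entry, "revised", "2020-03-01", "0123abc", "Revised terminology")
print(len(entry["editions"]))  # 2
```

Only the initial publication and the revised edition leave a trace in the entry; the minor update exists in git history alone.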

Check the details

We’ll want to make sure everything is right.

Details added by system

Certain details can be left out of this dialog for simplicity, and will be added automatically to publication.json. These will include:

  • Publication number: scpub1, scpub2
  • Source URL
  • CC0 license
  • Edition number (if needed)
  • Publication date/time
    • If individual suttas are added to a collection over a period of time, the date/time records only the first and last times as a range.
  • Publisher name “SuttaCentral”
  • Publication type and URL. In some cases these may have to be added by hand later, eg. for books.
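
For the date/time range mentioned above, a minimal sketch might look like this. It assumes each text's publication time is stored as an ISO 8601 string; the `first` and `last` keys and the sample timestamps are hypothetical.

```python
from datetime import datetime
from typing import List


def publication_range(timestamps: List[str]) -> dict:
    """Collapse per-text publication times into a first/last range.

    Timestamps are assumed to be ISO 8601 strings; the "first" and "last"
    keys are hypothetical.
    """
    if not timestamps:
        raise ValueError("at least one publication time is required")
    parsed = sorted(datetime.fromisoformat(t) for t in timestamps)
    return {"first": parsed[0].isoformat(), "last": parsed[-1].isoformat()}


# Three suttas added on different days, in no particular order.
times = ["2020-03-02T09:00:00", "2020-03-01T09:00:00", "2020-03-03T09:00:00"]
print(publication_range(times))
# {'first': '2020-03-01T09:00:00', 'last': '2020-03-03T09:00:00'}
```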

#2

Thank you, Bhante, this sounds fantastic and would perfectly suit my needs, for example! 🙏 🎉 🐬


#3

This sounds reasonable.

So you want two branches: unpublished for “work in progress”, and published. GitHub allows setting a default branch (what people see by default when they visit the repo). Which of the two would be the default branch?

The scheme of adding publication files to the published branch does have some ramifications. In git the normal flow is to do a git merge to bring content from one branch to another, but it’s not easy to merge the changes pertaining to just one file. So instead, an approach like this would probably be used:

git checkout published
# copy the file from unpublished into the working tree and index
git checkout unpublished -- translation/en/sujato/mn1_translation-en-sujato.json
git add translation/en/sujato/mn1_translation-en-sujato.json
git commit -m "... something descriptive ... "

(The two git checkout lines do completely different things: the first changes the working branch, while the second yanks files out of another branch without changing the working branch.)

This kind of approach simply adds the file whole cloth with no commit history, so published has a very minimal history (basically just publication and revision events), while unpublished has the full messy history of individual translation strings and edits. That is probably a good thing.


#4

I’d say “published” should be the default. People usually don’t want their mess to be public!

True. I did a little research on this beforehand, and there are various methods possible. But I’ll leave the technicalities for you. I agree, having a nice clean commit history of the published version, and full details on unpublished, would be great.


#5

Bhante, this is great and is almost exactly what we are doing with sc-voice/bilara-data. The _publication.json in master specifies the production state and is actively read by production Voice code. For example, the Vinaya is not shown in Voice.

We have also found it convenient to branch individual suttas into their own branches for the duration of actual translation. For example, Anagarika @Sabbamitta will be working in the dn33_de_sabbamitta branch for months, given the amount of time required. This allows us to separate book publication from sutta publication.

I would also suggest that the server code be configurable to choose a particular branch for publication status. For example, the production code should look at “master”, and the staging code could look at “staging”. This would permit review and testing of staging content prior to publication.


#6

Thanks Karl! We are still finalizing details, and your experience with Voice is important for this.

Subsequent to this post, following discussions with Blake, we will probably change some of the details. Rather than publishing directly to the site (as I envisaged originally), it’s probably better for the “published” branch in bilara-data to signify “ready to be consumed by apps” rather than “push it live already!”. So it would be pushed from there to the sc-data or Voice or whatever other app wants to use it. That way the apps themselves are kept independent, and can apply the publishing model that suits them.

What do you think?


#7

Excellent. This is a “production pull model” and is what Voice implements. Our admins have to manually pull the latest from bilara-data master, which always has the latest to share. This is perfect. Thank you!


#8

I’ll update the OP to reflect this.