Database and name/tree files

alex_stx · December 8, 2020, 2:50pm

From our discussion:

Some files in the name folder have the following structure:

{
  "<uid>": "<empty_string>"
}

However, the same uid also appears in super-name.json, and when it does, we use the name from there.
Example:

jp-name.json :

{
  "jp": ""
}

super-name.json :

{
  ...
  "jp": "Jñānaprasthāna",  // This will be used as name in the database.
  ...
}

The same behavior applies to files with the following structures:

{
  "pr": {}
}

// and

{
  "mvs": {
    "t1545.98": "阿毘達磨大毘婆沙論"
  }
}

We can find these uids in super-name.json and use their names:

{
  ...
  "pr": "Prajnaptiśāstra",
  ...  
  "mvs": "Mahāvibhāṣā Śāstra",
  ...
}
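
To make the fallback described above concrete, here is a minimal sketch in Python. The function name `resolve_name` and the way the files are passed in as already-parsed dicts are my own assumptions for illustration, not the actual loader code. It also assumes that a non-empty string in the name file is used as-is.

```python
def resolve_name(uid, name_file_data, super_names):
    """Return a name for uid, falling back to the super-name.json data.

    name_file_data: parsed content of a single name file (hypothetical argument).
    super_names: parsed content of super-name.json (hypothetical argument).
    """
    value = name_file_data.get(uid)
    # Assumption: a non-empty string in the name file is used directly.
    if isinstance(value, str) and value:
        return value
    # Empty string, empty object, or nested object: look in super-name.json.
    return super_names.get(uid)


# Values taken from the examples in this post.
super_names = {
    "jp": "Jñānaprasthāna",
    "pr": "Prajnaptiśāstra",
    "mvs": "Mahāvibhāṣā Śāstra",
}

jp_name = resolve_name("jp", {"jp": ""}, super_names)
pr_name = resolve_name("pr", {"pr": {}}, super_names)
mvs_name = resolve_name("mvs", {"mvs": {"t1545.98": "阿毘達磨大毘婆沙論"}}, super_names)
```

If the uid is missing from super-name.json as well, this sketch returns `None`, so the caller would still have to decide what to store in that case.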

What should we do with the other-group document, which is a child of the sutta document in super-tree.json? It prevents the tree from being created in the database, because the documents collection has no such key, and the links between documents are built from exactly those keys.

The file super_extra_info.json has several documents with spaces in the uid field. I personally found two of them: a document with uid: sutta and a document with uid: dk. You can find them with a file search. It’s not a big problem right now, but it could cause trouble later, so it would be good if someone fixed it.
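
Instead of a manual file search, something like the following could list the affected uids. This is only a sketch: I am assuming the file parses to a list of records that each have a uid field, and the sample records (including where exactly the space sits) are made up for illustration.

```python
def uids_with_whitespace(records):
    """Return every uid that contains at least one whitespace character."""
    found = []
    for record in records:
        uid = record.get("uid")
        if isinstance(uid, str) and any(ch.isspace() for ch in uid):
            found.append(uid)
    return found


# Hypothetical records; the position of the space is a guess.
sample_records = [
    {"uid": "sutta "},
    {"uid": "dk "},
    {"uid": "vinaya"},
]

bad_uids = uids_with_whitespace(sample_records)
```

For the real file, the records would come from `json.load` on super_extra_info.json rather than from the sample list.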

The logic of setting the fields acronym, volpage and biblio_uid

From what we have discussed, the logic is as follows.

To add these fields to the document we use the files super_extra_info.json and text_extra_info.json. First, in super_extra_info.json we look for a record whose uid field matches the uid of the document currently being processed. If such a record is found, we take the required values from it. If there is no such record in super_extra_info.json, we look for the information in text_extra_info.json (the search logic is the same).
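
The lookup order just described could be sketched like this. The function names are hypothetical, and I am assuming both files parse to lists of records with a uid field. The sample values are placeholders, not real data.

```python
FIELDS = ("acronym", "volpage", "biblio_uid")


def find_record(records, uid):
    """Return the first record whose uid matches, or None."""
    for record in records:
        if record.get("uid") == uid:
            return record
    return None


def extra_fields(uid, super_extra_info, text_extra_info):
    """Look in super_extra_info first, then fall back to text_extra_info."""
    record = find_record(super_extra_info, uid)
    if record is None:
        record = find_record(text_extra_info, uid)
    if record is None:
        return {}
    # Fields missing from the record come back as None.
    return {field: record.get(field) for field in FIELDS}


# Placeholder records for illustration only.
super_extra_info = [{"uid": "a", "acronym": "A", "volpage": "", "biblio_uid": ""}]
text_extra_info = [
    {"uid": "a", "acronym": "ignored", "volpage": "v", "biblio_uid": "b"},
    {"uid": "a1", "acronym": "A 1", "volpage": "v1", "biblio_uid": "b1"},
]

from_super = extra_fields("a", super_extra_info, text_extra_info)
from_text = extra_fields("a1", super_extra_info, text_extra_info)
```

Note that the fallback here is decided by whether a record exists, not by whether its fields are filled in, so a record in super_extra_info.json with blank fields still wins over text_extra_info.json.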

language and root_lang fields

It would be good to restate the purpose of each of these fields and which of them we use in the new data collection. Currently we use the root_lang field; its value is taken from language.json. What about the language field?

Last but not least, what should we do with the queries that will break once the structure of the documents in our data collection changes? At present everything is tied to the root and root_edges collections (the tree links are defined there). root_names is also used occasionally, but at this point it is not clear what for.

In other words, as soon as we add an updated data collection based on the name files, build the links from the tree files, and remove the root and root_edges collections, we will also have to adjust every database query that is currently written against root and root_edges (and some against root_names).

sujato · December 8, 2020, 8:14pm

Hey Alex, okay let’s go.

Yes, so this is the situation where we only have a title and UID for a text, but no further information. (This usually applies to very obscure texts that are mentioned in the parallels or are otherwise included for completion.)

I have investigated this case, and as I suspected the reference t1545.98 is there for the parallels. In parallels.json this number occurs a few times. What’s happening is that mvs is a very large text that is mostly not contained on SC (since it is a late commentary). However, it does quote from some early text that we do include, so it is in the parallels.

Because it is such an unusual case, it missed the proper processing, and ended up with nested json, which should not happen. (FYI, the reason we avoid using nested json for name is that Bilara relies on flat json.)

I have now corrected this and it works properly in Bilara. If you look for t1545.98 in bilara-data you’ll see that it is translated already!

I’m not sure what you mean by “the documents collection”. Can you point exactly to it for me?

other-group is found in super-tree, super-name, and super-extra-info, so that is good. Where else does it need to be listed?

I’m not sure what is going on here. The content of the navigation should be determined by super-tree alone. Whether there is a document by that name is irrelevant: not every node in the navigation has a corresponding document.
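
As a rough illustration of deriving the navigation links from the tree alone, here is a sketch in Python. It assumes super-tree parses to nested lists and objects, with plain strings as leaves, which may not match the real file exactly. The function name and the child uid "example-text" are made up.

```python
def tree_edges(node, parent=None):
    """Yield (parent_uid, child_uid) pairs from a nested tree structure."""
    if isinstance(node, str):
        # Leaf node: link it to its parent, if it has one.
        if parent is not None:
            yield (parent, node)
    elif isinstance(node, dict):
        for uid, children in node.items():
            if parent is not None:
                yield (parent, uid)
            yield from tree_edges(children, uid)
    elif isinstance(node, list):
        for child in node:
            yield from tree_edges(child, parent)


# Hypothetical fragment; only "sutta" and "other-group" come from this thread.
tree = [{"sutta": [{"other-group": ["example-text"]}]}]

edges = list(tree_edges(tree))
```

Nothing here checks whether a document exists for a given uid, so a node like other-group gets its links either way.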

I’ve fixed this. Let us know if there are any similar issues.

That’s correct.

And just to repeat what I said then, currently these fields are blank in super_extra_info but they may not remain so; we are always adding new data. So to be on the safe side, do as you say.

The language in the navigation is needed so that we can inform the user what the root language of that text or collection is. You can see this at work on the old site if you open up the sidebar and open a few navigation levels. There are little icons that indicate the root language.

https://suttacentral.net/

(In the current design on staging we do not do this. The reason is that the root language information is included in the blurb. I’m not sure whether it is necessary to add it elsewhere as well. In any case, it’s good to have the data available in case we do use it.)

Okay, so the main thing here is that all the lang data must be sourced from language.json so we don’t end up with clashing source files.

There you can see that we have a Boolean field is_root.

The language for navigation purposes is always root so we can probably just keep it simple and use root_lang.

That should be enough, I think? Ignore the alternative language field, I assume it is not necessary.

Indeed, that will have to be resolved. This is beyond my expertise, so it will be up to you and @hongda to implement the needed queries and ensure that outdated queries and logic are removed.

HongDa · December 9, 2020, 5:47am

Yes, I think that once a new data source is taken to replace the root and root_edges collections, queries designed to reference root and root_edges need to be updated accordingly.