I’ve been working on our process for importing unsegmented texts, both legacy translations and root texts. The main reason for this is to reduce CPU load and generally improve the code quality. This part of the code base hasn’t been revisited for several years and is affecting the reliability of the website.
I’ve come across an inconsistency in our handling of volume-and-page markers for our texts and would like some feedback before I try to fix it. So, have a look at these two API responses:
(Sorry about the JSON, I’m not familiar with the front end code)
The first has been segmented by @cdpatton while the second hasn’t.
Each has a “volpages” attribute, “T i 421a12” and “T i 485b19” respectively.
Each then has a list of translations, with the root text first in the list (OK, not actually a translation, that’s just how we do things). These in turn have their own “volpage” attribute: null and “T 0485b21” respectively. The rest of the translations, which are actual translations, have null for their volpage attributes.
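To make the inconsistency concrete, here’s a rough sketch of the two response shapes described above. The field names follow the post; everything else is trimmed and hypothetical, not the actual API payload.

```python
# Hypothetical, trimmed shapes of the two responses; the real responses
# contain many more fields.
segmented = {
    "volpages": "T i 421a12",
    "translations": [
        {"lang": "lzh", "volpage": None},  # root text: no volpage set
        {"lang": "en", "volpage": None},
    ],
}
unsegmented = {
    "volpages": "T i 485b19",
    "translations": [
        # root text: volpage extracted from the HTML at load time
        {"lang": "lzh", "volpage": "T 0485b21"},
        {"lang": "en", "volpage": None},
    ],
}
```

The inconsistency is in the first item of each "translations" list: null in one case, a populated volpage in the other.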
I can see why this is happening. When we load unsegmented texts and the language code is ‘lzh’, the volpage is extracted from the HTML. If we just set the volpage to null for every item in the translations list, we can delete the volpage extraction code and reduce the CPU load.
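A minimal sketch of the current behaviour versus the proposed simplification. The function names, the loader shape, and the HTML marker format are all placeholders, not the actual SC code:

```python
import re

def extract_volpage_from_html(html):
    # Stand-in for the current extraction step; the real marker format
    # in the HTML is a guess here.
    m = re.search(r'id="(T\d{4}[abc]\d+)"', html)
    return m.group(1) if m else None

def load_volpage_current(lang, html):
    # Current behaviour as described: only 'lzh' texts get a volpage,
    # scraped out of the HTML at load time.
    if lang == "lzh":
        return extract_volpage_from_html(html)
    return None

def load_volpage_proposed(lang, html):
    # Proposed behaviour: always null, so the extraction step (and its
    # CPU cost) can be deleted.
    return None
```

With the proposed version, the per-text HTML parse disappears entirely; the trade-off is that the per-translation volpage data disappears along with it.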
Sorry for the technical nature; I hope our users understand what I’m talking about.
If that’s all too complicated, I can just make the optimisation. Should be a pretty small blast radius if I’ve got it wrong. Just keep an eye on Volume and Page details when I’m done. I’ll let youse know.
I’m not sure what effect it would have on anything; I know nothing about the programming of SC. I did notice, while poking around in my attempt to update the structure tree for the Dirgha Agama, that unsegmented texts have a data file in the structure folder called “text_extra_info.json” that holds data like the volpage. I’m guessing that would probably be the source to use for the volpage data instead of the actual root text files … But it is just a guess. I didn’t program the website.
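If that file does map texts to their volpage data, the lookup could be as simple as something like this. The file name comes from the post above; the internal layout (a list of records with "uid" and "volpage" keys) is purely an assumption:

```python
import json
from pathlib import Path

def volpage_for(uid, structure_dir):
    # Read the per-text extra info from the structure folder instead of
    # parsing the root-text HTML. The record layout here is a guess.
    path = Path(structure_dir) / "text_extra_info.json"
    records = json.loads(path.read_text())
    for record in records:
        if record.get("uid") == uid:
            return record.get("volpage")
    return None
```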
Bhante, give me some time. Legacy text processing already existed before I took over the development of SC, so there are some details I need to look at in the code.