Deploy new fonts

I’ve just enabled our new fonts for Thai and Devanagari. You can see the Thai here: http://suttacentral.net/th/mn10 and the Devanagari here: http://suttacentral.net/hi/mn10

These fonts, like our main Roman font, are supplied by Rosetta, the leading font design firm for international scripts. The Devanagari is the award-winning extension of Skolar, while the Thai is Lumen. Lumen also comes with a lovely Burmese script, which, however, is still in beta.

I think the Buddhist texts are the most beautiful words ever spoken, and they deserve to be presented in the best typefaces available.

These fonts are not yet available in the Pali text. I have, however, made adjustments to the CSS to make this happen, using the ISO 15924 standard for scripts.

For @blake: to enable these, see /text/common/scss. We need to append the ISO script code to the lang attribute on the <div id="text" lang="pi">, so we will have:

<div id="text" lang="pi-Thai">
<div id="text" lang="pi-Sinh">
<div id="text" lang="pi-Mymr">
<div id="text" lang="pi-Deva">

If I am not mistaken, that should do it.
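
The CSS side can then use the “ends with” attribute selector to match the script subtag; a minimal sketch (the Sinhala family name is an assumption, the other names appear elsewhere in this thread):

[lang$="-Thai"] { font-family: 'Lumen Thai'; }
[lang$="-Deva"] { font-family: 'Skolar Devanagari'; }
[lang$="-Mymr"] { font-family: 'Noto Sans Myanmar'; }
[lang$="-Sinh"] { font-family: 'Noto Sans Sinhala'; }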

I have also enabled @font-face fonts, mostly Noto, for the following complex scripts. Since these don’t have an equivalent of small-caps, we use bold for headings. If anyone notices any problems with these, please let us know.

  • Korean
  • Burmese
  • Sinhala
  • Japanese

@blake, the new Rosetta fonts have disappeared. They had gone entirely from the fonts folder; I have now replaced them, but can’t get them to show up on the site. They work fine locally. See the /hi and /th texts.

@sujato
Because we don’t add the nonfree fonts to the git repository (the nonfree folder is .gitignored), they need to be sent to the server manually. When you are happy with the state of your nonfree font folder, you can deploy the files with these commands:

invoke deploy.staging.nonfree_fonts
invoke deploy.production.nonfree_fonts --force

Note that as a rule all font file names should have a version suffix. I also suggest renaming them using the convention established by JohnN, that is, all lower case, hyphen separated, with a version number:
LumenThai-Bold would become lumen-thai-bold-1

When any changes are made to the site, because of Cloudflare caching, it might take up to 4 hours for the changes to generally appear. The quickest way to see a fresh version of a page is to append a query string like ?foo=1 to the URL (where foo is a random string), e.g. http://suttacentral.net/th/mn10?foo=1, creating a unique URL which Cloudflare hasn’t seen before. From the Cloudflare control panel you can also clear the cache or enable development mode (which temporarily bypasses the cache but does not clear it).

I did this, although I didn’t use --force. Is that necessary?

Yes: can I leave that to you? We still need to rejig the Roman fonts to use the latest version and deprecate the dedicated small-caps fonts in favor of OpenType via CSS. I was basically just trying to get things working and see how they were. When you get some time, could you go through them? I’m still not really clear how the whole Sass font setup works, which is why I’d be more comfortable if you did the finalizing.

You mean suffix, right? I did that too. Now I can see the Thai (Lumen), but still not the Deva.

For whatever reason JohnN made invoke deploy.production.nonfree_fonts perform a dry run; when you use --force it actually transfers the files.

I thought it was uploading mighty fast…

I’ve now done it using --force, and have restarted. But the same as before: I get the Lumen and Skolar Devanagari working locally, but only Lumen working online. As far as I can see the CSS is identical.

Okay, I can see why it’s not working. On Linux there are file permissions. Nginx is set to serve static content from the suttacentral/static folder; however, if those files don’t have read permission for non-owners, then Nginx can’t read them and returns a 403 ‘Forbidden’ error. Since Nginx runs under the ‘www-data’ user, not the ‘sc-production’ or ‘sc-staging’ users which run the CherryPy servers, it can only see sc-production’s static files if the permissions are set correctly.

It works on your local development machine because you don’t have Nginx set up, so the files are served directly by CherryPy, which runs as the sujato user, the same user who owns the files; the owner can read, others can’t = no problem.

rsync (which is used by the deploy fonts invoke task) will faithfully reproduce your permissions onto the server.

In this case, for some reason the fonts/nonfree folder on your machine didn’t have ‘x’ permission for others (on a directory, ‘execute’ essentially means ‘can be opened’). Once that permission setting had been rsynced to the server, Nginx could no longer see inside the nonfree folder and no longer served any fonts from it - you just had some of them in the browser cache already, so they still seemed to be working.

You can just do chmod o+x static/fonts/nonfree to fix this problem on your computer, or run invoke fonts.download_nonfree to synchronize from the server.
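
A quick way to check the permission bits before and after (the output lines here are illustrative):

ls -ld static/fonts/nonfree
# drwxr-x--- ...   no ‘x’ for others, so www-data gets 403 on anything inside
chmod o+x static/fonts/nonfree
# drwxr-x--x ...   nginx can now traverse the directory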

Now that it can see inside the folder, it also returns 404 errors for the .woff2 fonts, since those files do not in fact exist; however, the browser then proceeds to download the .woff fonts and is happy.

Technically speaking, hooly dooly! Thanks for working this out.

@blake, another detail on font usage. I’ve just noticed that the correct fonts are not being used in the metaarea. I haven’t investigated all the languages, but I guess the situation is similar. We need a way to ensure that the language-specific font is used for the auto-generated ToC. In addition, the metadata may be in the specific language, in English, or both. So I guess we should use the language-specific font as default, overridden by en where necessary.

Ah yes, I suppose it wouldn’t be, as the sidebar is not a child of the div#text.

We should probably try to think of a clever way to minimize the language CSS to make using the right font as easy as possible. I think it would be beneficial to lower the specificity of the rules, getting rid of #text and other context selectors wherever possible, and using something a lot like this as a starting point:

main {
    /* Default font for contents of page (<main> excludes header and footer) */
    font-family: 'Skolar Sutta Web';
}

/* Language Fonts */
[lang=my] {
    font-family: 'Noto Sans Myanmar';
}

[lang=th] {
    font-family: 'Lumen Thai';
}

…

[lang=en] {
    /* en is the default, so this rule is for en text embedded in another language */
    font-family: 'Skolar Sutta Web';
}

Ideally, wherever possible, it should be possible to override the font for some text as easily as putting <span lang="en">, which means having very low specificity in the first place so overriding is easy.

Incidentally, what’s the difference between [lang=my] and :lang(my)? If you use it on my/dn1 as a standalone selector, [lang=my] will select 1 element (the div#text) while :lang(my) will select 566 elements (the div#text and every single one of its descendants). The pseudo-class selector certainly has its uses, but in this specific context I think the attribute selector matches the intention better: to set the font properties on one high-level element and have those properties inherited by descendants, rather than setting the property on every element.
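
To illustrate with a minimal sketch of the markup (shape assumed from the text pages):

<div id="text" lang="my">   <!-- [lang=my] matches only this element -->
  <p>…</p>                  <!-- :lang(my) also matches this descendant -->
</div>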

Something else to consider would be using a class like .latin-script as something more generic than [lang=en] and more explicit than having latin as the implicit default. The .latin-script class would be added selectively at the template level based on the language iso code, and a rule like this could be used:

.latin-script {
    h1,h2,h3,h4,h5,h6 {
        @include serif-small-caps;
    }
}

Then if a language like th is used, the latin-script class is not added, and the simple [lang=th] rule is applied everywhere, with no need for complex overrides to revert the smallcaps rule. In other words, the rule is applied only where it is applicable, ideally only where it is definitely applicable.
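
At the template level this could be as simple as something like the following (variable names hypothetical; the real template context will differ):

{# hypothetical: add .latin-script for languages written in the Latin script #}
<div id="text" lang="{{ lang.uid }}"
     class="{% if lang.uid in latin_script_langs %}latin-script{% endif %}">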

A rethink of how fonts are applied, with dramatic simplification, would tie in well with using a font loader to avoid FOUT. Due to the horrible, horrible limitations of browsers, font loaders need to do some pretty funky tricks to determine when a font has actually been loaded, and the long and the short of it is that you end up having to explicitly specify in CSS what fonts to use.

The Web Font Loader sets classes on the <html> element for when all fonts are loaded (or failed) and when specific fonts are loaded (or failed):

.wf-loading
.wf-active
.wf-inactive
.wf-<familyname>-<fvd>-loading
.wf-<familyname>-<fvd>-active
.wf-<familyname>-<fvd>-inactive

So you would then set some rules like this:

/* Hide page contents while fonts are loading */
.wf-loading main {
    visibility: hidden;
}

/* Yay! Font goodness! */
.wf-active {
    main {
        font-family: 'Skolar Sutta Web';
    }

    [lang=my] {
       font-family:  'Noto Sans Myanmar';
    }
    
    [lang=th] {
        font-family: 'Lumen Thai';
    }
    …
}

.wf-inactive {
    main {
        /* let the browser choose fonts; unquoted, so it means the generic family */
        font-family: serif;
    }
}

You could also set rules for specific fonts, like .wf-skolar_sutta_web-n4-loading, which would mainly be useful if some fonts might fail to download because they are really big or something. .wf-active only gets set once every font has finished loading, which normally is what you want, but you might want some elements of the page (the header, for example) to be able to render before every font has loaded.

Web Font Loader gives you a lot of control, but you need to be quite explicit: you tell it exactly what fonts you want loaded for the page instead of relying on the built-in logic of @font-face to decide what fonts to load.
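
For reference, the explicit configuration for self-hosted fonts uses Web Font Loader’s custom module and looks something like this (the family list here is illustrative):

WebFont.load({
    custom: {
        families: ['Skolar Sutta Web', 'Lumen Thai', 'Noto Sans Myanmar']
    }
});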

That level of explicitness does mean it can lead to a blow-out of rules, so if we were to use Web Font Loader, we would also want to simplify the font style rules in general, which ties in well with my suggestions earlier in this post.

Note that it is possible to both use the Web Font Loader and have @font-face fonts work even without it (i.e. on a device with JavaScript disabled). In this usage Web Font Loader isn’t responsible for loading the fonts; it simply monitors the loading progress, and you can use .wf-loading to avoid FOUT and .wf-inactive to provide explicit fallbacks if loading takes too long. Avoiding FOUT is easy; all you actually need is a rule like this:
.wf-loading #text { visibility: hidden }
However, providing fallbacks is not any simpler in this usage, as you’d still need a heap of rules for .wf-inactive.

If we were to go with a Web Font Loader, we would want to have a separate manifest file (probably JSON) mapping language codes to font families. If a language only uses default fonts (i.e. Skolar) it would not need an entry. If we wanted to use something like the .latin-script idea, then this file could also define things like that.
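
A sketch of what such a manifest might contain (keys and structure hypothetical):

{
    "th": {"family": "Lumen Thai"},
    "my": {"family": "Noto Sans Myanmar"},
    "hi": {"family": "Skolar Devanagari", "latin-script": false}
}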

Sounds good.

Wow, I never knew that.

We’re already using, or should be using, ISO 15924 to define scripts, although this is not working yet. Would it not be better to stick with one ISO-approved way of defining scripts, rather than introducing a new CSS class for the same thing?

Apart from this, if you want to implement a simpler way of describing the font use, go nuts.

Another thing to consider: @Vimala has started working on Farsi and Hebrew, which are right-to-left scripts, and that means we will need support for Unicode bidi.
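
At the CSS level, I gather that support starts with something like this (a minimal sketch; fa and he assumed as the language codes):

/* rtl scripts such as Farsi and Hebrew */
[lang=fa], [lang=he] {
    direction: rtl;
    unicode-bidi: isolate;
}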

As for using the Web Font Loader, I’m not that enthused about it. FOUT is not a bad problem, so unless it does more than that it strikes me as a complex set of hacks for a tiny problem, one which will, in all likelihood, go away in the next couple of years as browser handling of @font-face improves.

Let me know how this should be working or how you want it to work. I see there are ways of defining script, from the wiki:

This way one could differentiate, for example, between Serbian written in the Cyrillic (`sr-Cyrl`) or Latin (`sr-Latn`) script, or mark romanized text as such.

If you diligently went around and used ‘fully qualified’ language codes, you could then use the attribute selector, specifically the ‘ends with’ variant:

[lang$="-Latn"] {
    h1,h2,h3,h4,h5,h6 {
        @include serif-small-caps;
    }
}

If we were going to do this, we’d need to use the fully qualified language codes everywhere, although we could “upgrade” the normal language codes on the fly before delivering a page to the browser.
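
A sketch of such an on-the-fly upgrade (mapping and names hypothetical):

# hypothetical: upgrade bare language codes to fully qualified
# BCP 47 codes with ISO 15924 script subtags
DEFAULT_SCRIPT = {'th': 'Thai', 'my': 'Mymr', 'si': 'Sinh', 'hi': 'Deva'}

def qualify(lang_code):
    script = DEFAULT_SCRIPT.get(lang_code)
    return '{}-{}'.format(lang_code, script) if script else lang_code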

We are currently, as per my OP in this topic, going to use the script selector to define scripts for the Pali texts. If we are to define scripts elsewhere, it would seem to make sense to use the same method. So we would use the fully qualified language codes everywhere that we would have used a CSS selector in the method you proposed. Which would probably be on every page, I guess; why not? It would help keep everything explicit.

So, further adventures in Font Land…

I’ve noticed that the Skolar fonts such as SkolarDevanagari contain both the regular Skolar glyphs and the specialized Devanagari glyphs.

It is possible to subset it to remove the glyphs available in regular Skolar, and then use a font stack like this:

[lang=hi] {
    font-family: "Skolar Devanagari", "Skolar Sutta Web", serif;
}

The way it works in CSS (according to the 2.1 spec, and also in practice in most browsers) is that for each character (not only each element), the browser searches the font stack for the first font which contains the required glyph. So the above CSS works as you would hope in the case of Roman text embedded in Devanagari text: if the browser can’t find a glyph in Skolar Devanagari, it looks for it in Skolar Sutta Web. (Of course, ideally you would wrap any foreign text with the proper lang attribute, making the issue moot.)

Because the Devanagari glyphs are pretty big, the savings aren’t huge: it reduces the file size from 456kb to 316kb (for TTF), which is about a 30% reduction.

Another subsetting example: the Korean font ‘KoPubBatang’ contains around 17,000 glyphs, but on SuttaCentral there are only 960 unique Unicode codepoints in the Korean texts (excluding those codepoints contained in Skolar, mainly found in English/Pali snippets and such).
KoPubBatang weighs in at about 6.7MB; subsetted, that becomes 525KB (these numbers are for TTF). It seems to me that downloading and subsetting the font would be advantageous.
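
For reference, one way to produce such a subset, assuming the fontTools pyftsubset utility (file names illustrative):

# keep only the codepoints actually used in the Korean texts
pyftsubset KoPubBatang.ttf \
    --text-file=korean-text.txt \
    --output-file=kopubbatang-regular-subset-1.ttf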

I have been implementing a font compilation step in asset compilation which automates font management. The basic steps are:

  1. Generate subsets of language-specific fonts, using Skolar as the default master font.
  2. Protect these fonts with an obfuscation mechanism which prevents use as a system font; this would be in line with our terms of use agreement for Skolar. Disabled on the development server for ease of debugging.*
  3. Generate a menu-specific font containing only those glyphs required to render language names.
  4. Create a base64 version of the font required to render “:wheel_of_dharma: SuttaCentral” to avoid FOUT.*
  5. Automatically generate cache-busting font names, e.g. skolar-devanagari-regular-{XYZ}.woff, allowing the font files to be un-suffixed. This can also be used for obfuscation by removing the prefix.
  6. Automatically create a WOFF2 version for further size reduction on supporting software.
  7. Generate the required SCSS using a Jinja2 template, in a way which unifies the @font-face declarations (programmatically generating the CSS is required to make sure the cache-busting names are inserted correctly).

(*) low priority to implement

Essentially the font compilation step examines the contents of the free and nonfree font folders, generates new font files in the various flavors, and generates the appropriate CSS to @font-face those font files. Having this fully automated ought to be less error-prone than manually subsetting.
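
The generated @font-face declarations would presumably come out looking something like this (the hashed name is illustrative):

@font-face {
    font-family: 'Skolar Devanagari';
    src: url('/fonts/skolar-devanagari-regular-a1b2c3.woff2') format('woff2'),
         url('/fonts/skolar-devanagari-regular-a1b2c3.woff') format('woff');
    font-weight: normal;
    font-style: normal;
}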

I am still considering which font formats to use. WOFF2 and WOFF seem good. As it happens, nearly all tools use TTF/OTF as the base format - even the tool which generates WOFF2 fonts wants to receive a TTF, and if you want to create a WOFF2 from a WOFF you have to first convert the WOFF to TTF. So it makes sense to use TTF as the master format even if we don’t serve it. The utility which generates WOFF from TTF, called webify, will also cheerfully generate EOT and SVG (in fact it has to be told not to!), so including these formats is not a big deal.
One argument for delivering only WOFF2 and WOFF is having a browser support cutoff: anything too ancient to support WOFF just falls back on system fonts, and such an ancient device probably has an easier time rendering its system fonts anyway. A good example is that Opera (pre-Blink) looks a lot better if you let it use its own fonts, because for some reason it just doesn’t know how to render Skolar.

We tried this a couple of years ago for the smallcaps fonts and browser support wasn’t great; but perhaps it has improved.

When relying on this, remember that the way this kind of fallback works differs between languages. In particular, the CJK, and possibly other, languages use Roman for Indic terms, whereas Brahmi-descended scripts (Thai, Myanmar, etc.) don’t. It would be good to check up on some other non-Roman scripts (Cyrillic, Hebrew, Arabic) and see what they do. So falling back from Devanagari to Skolar poses no problems.

However, what happens in Japanese? The font has its own Roman glyphs, so currently it probably uses these and then falls back to the next font in the stack for the diacritical glyphs, resulting in just the kind of yuk we want to avoid. So fine, subset out the Roman glyphs and use Skolar; except then, does that actually fit stylistically, or cause any line-height problems? Perhaps we should, in these cases, fall back on Noto, not Skolar.

And, BTW, don’t forget that as we make these changes we should also change our handling of smallcaps to use the CSS OpenType method (not font-variant) rather than a distinct smallcaps font.
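
That is, if I understand it right, something like this (smcp being the standard OpenType small-caps feature tag):

/* small caps via the OpenType feature rather than a separate font */
h1, h2 {
    font-feature-settings: "smcp";
}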

Definitely worthwhile, especially since internet connections in much of India aren’t great.

I don’t know much about Korean, but I think this is because the bulk of the glyphs are in fact Chinese, which in Korean, as in Japanese, are used together with the native glyphs. Surprisingly, while Google has made Brahmi (!), Kharoshthi (!!), and cuneiform (!!!) available as early access fonts, they still don’t have Noto Chinese or Korean. Perhaps we’d be best off subsetting our own Noto CJK for Korean. (The Noto CJK fonts, in particular, are fantastic: extremely well designed, extensive glyph set, stylistic variants for the local languages, suitable for screen…)

And BTW, can we have a Brahmi option for the Pali texts? That would be so cool!

I’ve been doing this with my texts, or at least using <i></i>.

Actually, according to caniuse, currently only 0.07% of users are still on IE6, so the “old browser” thing is not so important. IE8 is the only significant (1.45%) WOFF-less PC browser, and we shouldn’t support it. Almost all WOFF-less browsers are in fact mobile: Opera Mini (5.37%, though only 1.15% on SC) and old Android (4-ish%). Opera Mini doesn’t support @font-face anyway, so that’s irrelevant. Since turnover of phones is fast, and there isn’t the old problem of institutional browser lock-in at workplaces and so on keeping users on old systems, old Android will fade pretty fast. Also, old phones probably don’t want to download much data, so there’s that. All in all, I’d stick with WOFF and WOFF2.

Not just Skolar; Opera’s font handling sucks in general. John N put in place an Opera-specific fallback for just this reason: this should probably be deprecated now, yes?

Finally, bear in mind that we’re still waiting for the final version of Lumen Myanmar from Rosetta.

After all that, let me just say: your process sounds great, please go ahead!

Yeah, I have observed the yuk problem; for example, in one of the Korean texts (/ko/mn10) we have 열반 (涅槃, nibbāna) 의 실현을.

That is one kind of snippet, and I don’t think it has to be stylistically compatible with the text around it, because it is alien; in a sense it is not a part of the flow (when it is important that it be part of the flow, I would guess they would use the special fullwidth English alphabet characters). When I look at cases like the Hebrew, which has Pali in the titles, to my eyes using Noto for the Pali doesn’t look especially compatible when compared with Liberation (the browser’s choice) or Skolar - the glyphs are just so extremely different that all the fonts seem equally mismatched.

The other kind of embedded language is a whole sentence or paragraph; for example, in the Thai metadata we have:

သာသနာရေး ဦးစီးဌာနမှ ညွန်ကြားရေးမှူး ဦးဝင်းနိုင်က ပုံနှိပ်ထုတ်ဝေသည်။

Prepared for SuttaCentral by နမြေိုးဝငျး and John Nishinaga.

At the moment the proper font isn’t used in the metadata - but that will be fixed. If the whole thing is in Lumen Thai, then the “Prepared for…” bit is stylistically the same as the Thai text, but that English text is then completely at odds with English text everywhere else on the site; it’s a lot heavier, and to my eyes it looks like the wrong font is being used.

So my argument would be that the same font should always be used when rendering English or Pali, because this will provide a higher level of consistency across the site and simplify the CSS rules. Of course, for Pali snippets in non-Roman scripts we would want to use the specific script variants such as “pi-Thai” in the lang tag.

Hah, the existing transliteration code is the work of a mad genius, so I don’t know.

I did manage to find a font based on the Edicts of Asoka; it’s called “Imperial Brahmi”.
Page: https://sites.google.com/site/brahmiscript/
That page contains badly broken links, so here are the corrected ones:
Font: https://sites.google.com/site/brahmiscript/ImperialBrahmi.ttf
Editor: https://sites.google.com/site/brahmiscript/BrahmiLipi.zip

Unfortunately it’s not a Unicode font, and the code is in C#, but it is something.

The main thing is that it should be at least sane, which would mean stripping the Roman glyphs from the Korean font and using a consistent fallback. CJK fonts have notoriously bad Roman glyphs; you see it all over the place in Taiwan, in signage and text. Part of the problem is the monospace thing. But at least with Noto we have reasonable not-screamingly-badness. That’s basically the reason I’m pushing for it as a universal fallback: not that it’s the best for every language, but that it is pretty good, and solves a bunch of problems like this.

Fair enough, but that’s a different situation: those titles are meant to contrast. Certainly it should be fixed from what it is now: it needs text-transform: lowercase, and probably our usual sans applied.

But generally speaking the same logic applies: regardless of what other fonts look like, we can be reasonably confident that Noto will match with Noto.

Umm, that’s Burmese, but whatever!

But in the i18n version most people won’t see the rest of the site. The critical thing is that the typesetting in any one page works well. If it is different in different pages, that’s okay. The main priority is the experience of the person who is reading a sutta. Everything else, including uniformity of the site as a whole, is secondary to that.

The main purpose of a text body font is to not call attention to itself for the person who is reading it. Harmonizing the body font is part of that. In many cases, for example, the Noto exotic script font is a sans, and Skolar will strongly stand out against that. This detracts from readability: it draws the eye to the thing that’s different, and this should be avoided.

The typesetting in the sidebar is not so critical. My sense would be to keep Source Sans for the Roman text and Noto Sans for the exotic scripts. Bear in mind that in the i18n version the tabs and so on will also be in the exotic script. However, this creates a problem of download size if we’re calling, say, Lumen for body text and then another Burmese font for the sidebar. It’s a fairly marginal case, because it’s only these two languages, but I’d err on the side of just keeping Lumen everywhere. (Unless, I guess, we can call the Noto Sans fonts only when and if the sidebar is opened.) In Burmese and Thai, I don’t think we’ll see Roman glyphs used in the body text at all, so we can strip these and use Source Sans as usual in the sidebar.

So I’d propose the following (a rough CSS sketch follows the list):

  • In an exotic script context, generally strip the Roman glyphs (I believe this is already the case for the Noto fonts).

  • If the body text font is Noto, we should use Noto as fallback for body text, either Sans or Serif as appropriate, but continue to use Source Sans in the sidebar.

  • If the font is Devanagari, use Skolar as fallback for body text (obviously).

  • If it is Thai or Burmese, use Lumen everywhere for the main script, with Source Sans as fallback.
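
In CSS terms, the stacks I have in mind would look roughly like this (family names as used in this thread; ‘Source Sans Pro’ assumed as the registered name, and exact Noto variants still to be decided):

[lang=th] { font-family: 'Lumen Thai', 'Source Sans Pro', sans-serif; }
[lang=my] { font-family: 'Lumen Myanmar', 'Source Sans Pro', sans-serif; }
[lang=hi] { font-family: 'Skolar Devanagari', 'Skolar Sutta Web', serif; }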

Note also as a general rule that we should use the serif and sans versions of Noto where they are available, in the same way as in English, i.e. serif for body text and most headings, sans for sidebar and UI elements, and “superheadings” (above the <h1>). This is, of course, as long as the download is not prohibitive.

Well, we just need another mad genius, then, don’t we? Where might we find one of those, I wonder ….

You must have missed the bit where I said that Google has in fact released Noto Brahmi (and also Kharoshthi). And if it takes a mad genius, someone over at sanskritdictionary.com must be one, because they already have a Roman>Deva>Brahmi converter.

I just noticed that they’ve released Noto Sans Tibetan:

https://www.google.com/get/noto/#sans-tibt

This should replace our current Tibetan font.