And this is exactly the positive work that this tech makes possible.
Also FWIW I am wondering whether integrating this with Bilara would be useful: making the ML translations for Chinese available as suggestions.
But back to my much-interrupted thread:
I want to discuss ethical consequences.
One that has been raised a few times is the issue with inaccurate machine translations being in the wild. Again, I think Marcus’ take on this is somewhat intemperate. Fair enough, he’s being provocative and throwing some ideas around. But I looked into machine translation when I was starting SC, fifteen years ago. It was bad then and is somewhat better now.
Then there were zero machine translations of Buddhist texts in the wild, and today there are still zero. I’m going to go out on a limb and say that in fifteen years there’ll still be zero. Why? Because it’s bloody hard to get anyone interested in reading suttas even if they are well translated by someone who knows what they are doing. I know, I spend half my life doing it. Almost all of the actually interesting texts have been translated, and remain largely unread. In the Buddhist world at large, who really cares about all those obscure texts tucked away in the Taisho? If they cared, they would have translated them already.
Now, having said this, I’d like to make a proposal for the ethical management of this issue.
The criteria for making machine translations of Buddhist texts available to the general public should be no less stringent than those for making self-driving cars available.
With self-driving cars, there is a clear issue: people will die. There’s no avoiding that. As a general rule, it would be ethically problematic to make them available unless we can show that fewer people will die using self-driving cars than human-driven cars. So we trust regulators to ensure that self-driving cars are not made widely available until we the public can be assured of their safety.
For Buddhists, the sanctity and holiness of their sacred scriptures is unparalleled. We should treat the ethical issues no less seriously than we do self-driving cars, if not more seriously since the very purpose of the scriptures is to support an ethical life.
There is precedent in the world of AI for restricting access to content because of (IMHO) well-grounded concerns about its use. OpenAI’s GPT-3 program is one of the better-known examples. In another recent case, an independent researcher trained an AI using content stripped from 4chan, the vilest pit of misogyny and racism on the web. The bot was predictably horrible, which was the point: to see what would happen. The model was not made publicly available, but nonetheless, many responded by criticizing the very act of creating such a thing. Even the existence of a digital machine for creating hate can be seen as an abomination, something that inherently should not exist.
We could see the machine translations of the Dhamma as the antithesis of this. What happens when we create a machine programmed to emit Dhamma? If the sheer existence of an evil AI is ethically abhorrent, does it not follow that the existence of a virtuous AI is inherently good? Is it a way of training AI to be better, morally?
I’ll return to these issues later on, but for now I just want to make the point that, while in general I am a firm advocate of making everything freely available, in this case I would recommend that the content not be publicly available. It should only be accessible to scholars and researchers on application. As a minimum necessary requirement, wider public accessibility should only be considered if and when there is a consensus of expert opinion that the translations are no less reliable than human translations. This is not an unattainable goal: there are a lot of bad human translations.
To be honest, I’m not personally concerned about the problem of inaccurate translations. People already believe all kinds of nonsense in Buddhism; it will hardly make things worse. But if these models are made available, it may create a reaction against AI within the Buddhist community, and potentially against those who are also working in the sphere of digital texts and translations. It was only recently that there was a proposal in Sri Lanka to ban unauthorized translations. It would be easy to whip up public sentiment against digital colonizers who were appropriating scriptures and creating new texts by AI. Of course that’s not what you are doing. But that is irrelevant. What matters is how people can spin what you are doing.
This could discredit the very foundations of the field, and be used to justify draconian legislation giving control of scriptures to authoritarian governments. This may sound alarmist, but again, look at how governments in Buddhist nations work. Control over the Tipitaka is a core principle of political authority. I’ve been subject to this sort of pressure by people trying to stop a project even though we had the full support of the Sri Lankan President, the Minister for Culture, and the monastic head of Pali studies at a major university.
So I would urge caution, and recommend moving slowly in making things publicly available. It’s not just data to throw in a model. It’s sacred scripture.
You think this is the end? I’m just getting started.