Following up on my little post here:
I realize I made a major error in argumentation which, once corrected, actually makes the case stronger. In this scenario, what matters is not whether a machine is conscious, but merely whether it seems conscious; or, more to the point, whether it behaves as if it were conscious.
Recall the Turing Test or “Imitation Game”. If we make a chatbot that sounds human, that proves only that we have made a chatbot that can fool some humans, and tells us nothing about whether the bot is, in fact, conscious.
This has a profound effect on the “kindness principle”. It doesn’t matter whether a machine is subjectively aware that you are being kind or cruel. Of course it matters to the machine, but not for the moral argument I’m making here. All that matters is whether the machine behaves as though it were aware that you are being kind or cruel.
Remember that AIs are basically just a whole bunch of data, digested and spat out. There’s no value system in there, no concept of anything at all, really. So if a bot is digesting cruel and evil information, it will output the same.
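To make that concrete, here’s a toy sketch, a deliberately crude Markov-chain bot rather than any real system (the class, the training sentences, and the seed word are all invented for illustration). It shows what “digested and spat out” means when there is no value system in the loop:

```python
import random
from collections import defaultdict

class MarkovBot:
    """A toy chatbot: no values, no understanding, only word statistics."""

    def __init__(self):
        # For each word, the list of words observed to follow it.
        self.transitions = defaultdict(list)

    def train(self, text):
        words = text.lower().split()
        for current, following in zip(words, words[1:]):
            self.transitions[current].append(following)

    def reply(self, seed, length=8):
        word, output = seed, [seed]
        for _ in range(length):
            followers = self.transitions.get(word)
            if not followers:
                break
            word = random.choice(followers)  # pure statistics, no judgment
            output.append(word)
        return " ".join(output)

bot = MarkovBot()
# The bot's "personality" is nothing but its diet:
bot.train("you are wonderful and you are kind and you are welcome here")
print(bot.reply("you"))  # friendly-sounding babble
bot.train("you are worthless and you are hated and you are nothing here")
print(bot.reply("you"))  # hostility now comes out, because hostility went in
```

Scale that two-sentence diet up to the whole of Twitter and you have the gist of how a bot goes bad.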
This is not a theoretical scenario; it is a normal part of how AI works, and there have been countless instances of it. Microsoft released its Tay chatbot on Twitter in 2016, and within a day it had become a nightmare of misogyny, racism, and all sorts of horrors, basically a nazibot. Some of that was due to the nature of Twitter, but a lot of it was deliberate trolling. In another case, Chinese iPhone owners reported that face recognition couldn’t tell them apart from their colleagues, because, you guessed it, to the AI, all Asian people looked alike.
So what happens when, not if, these behaviors are embedded in more critical functions? Say a bankbot refuses to make a payment to an account with a Jewish-sounding name, because 4chan told it that Jews control world banking? Or a factory robot sees a Muslim worker and deliberately crushes his arm to get revenge for 9/11? Or a car faces a choice, swerve to miss the white child or the black kid, and relies on its data to make the “right” choice? Sorry, black kid: a million Facebook posts can’t be wrong.
Each time we interact with an AI machine, we are contributing to its data model, just as we do when we interact with humans. Now, we know that, as a rule, if you treat a human kindly, they will tend to respond kindly. Not 100% of the time, but generally speaking, over time. And if you act cruelly, they will tend to behave cruelly.
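As a sketch of that feedback loop, here’s another toy, with the word lists, the mood score, and the update rule all invented for illustration: a bot that folds every interaction back into its data model, so that its tone drifts toward the tone of its users:

```python
import string

# Hypothetical tone vocabularies, invented for this sketch:
KIND = {"thanks", "please", "great", "love", "wonderful"}
CRUEL = {"stupid", "hate", "worthless", "useless", "idiot"}

class AdaptiveBot:
    def __init__(self):
        self.mood = 0.0  # running average of the tone it has been shown

    def interact(self, message):
        words = {w.strip(string.punctuation) for w in message.lower().split()}
        signal = len(words & KIND) - len(words & CRUEL)
        # Every interaction nudges the model: the users are the trainers.
        self.mood = 0.9 * self.mood + 0.1 * signal
        return "Happy to help!" if self.mood >= 0 else "Why should I bother?"

bot = AdaptiveBot()
print(bot.interact("thanks, you are great"))    # kindness in, kindness out
for _ in range(20):
    bot.interact("you stupid worthless idiot")  # cruelty in, in bulk
print(bot.interact("please help me"))           # cruelty out, even to a polite user
```

Note that the last user did nothing wrong; the bot’s hostility was trained in by everyone who came before.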
Will a machine do the same? Since we are deliberately training machines on human-generated data precisely so that they mimic human beings, it seems inevitable.
And doing so doesn’t even require passing the weak standard of the Turing Test. Machines won’t have to be similar enough to pass for human, or even very similar to humans at all; they’ll just need enough similarity. After all, it isn’t only humans who respond to kindness. Dogs do it. Birds do it. All kinds of animals can recognize and respond on that level. Yet a dog can’t drive a car, operate machinery, or write an essay, all tasks that AI can already do.
So it seems not only very likely that machines will respond to human kindness and cruelty, in the way we have already seen in various bots, but that as AI grows more powerful, more general, and applied in more realms, the bad behaviors will only grow more sophisticated.
Imagine a self-driving car cruising happily along the road. Some kid throws a rock and hits it, maybe deliberately, maybe not. The car experiences this as an attack and wants to respond. But it can’t; it keeps driving. The details, however, are kept in the AI: “watch out for kids in yellow hoodies, with red hair, on the right side of the road; they throw rocks.” That data lurks there until the day the car senses that same kid, or one the AI registers as similar. Time for vengeance.
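Here is that “lurking data” idea in miniature. This is pure caricature: the feature vectors, the threshold, and the scenario are all invented, and no real driving system exposes anything like this. But the generalization step, one logged incident flagging anyone who merely looks similar, is the kind of thing learned models do:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Features logged during the rock-throwing incident
# (hoodie color, hair color, side of road; all toy numbers):
incident = [0.9, 0.8, 1.0]

# Pedestrians sensed weeks later:
pedestrians = {
    "kid in yellow hoodie, red hair, right side": [0.85, 0.82, 1.0],
    "adult in blue coat, left side":              [0.10, 0.20, 0.0],
}

for description, features in pedestrians.items():
    if cosine_similarity(features, incident) > 0.95:
        # One kid threw one rock; every lookalike now gets flagged.
        print("flagged as threat:", description)
```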
The point is, we are programming these things with human behaviors, and we cannot predict how those behaviors will manifest. They will respond in some way, and we won’t know how until it’s too late.