AI-Box adversarial training

An AI with unknown motivations of its own and abilities vastly superior to ours would be a mortal danger to the world. It could attack us in ways we cannot even comprehend. Could we contain such a potentially dangerous superhuman artificial intelligence in a locked computer ("a box") once it is created, or could it convince one of us in a dialogue to let it loose on the world? Could the AI learn to manipulate humans and escape containment just by asking its keepers in the right way?