Quiz:
1. At first it seems that "the smarter the AI, the ______ it is."
2. What is one reason predictions of AI disaster might be unjustifiably dismissed?
3. What is 'the treacherous turn'?
4. Why might an AI not care that it will be terminated?
5. How does the AI in the last example choose to most effectively pursue its goal?
Reading:
Consider the following scenario.
Over the coming years and decades, AI systems become gradually more capable and
as a consequence find increasing real-world application: they might be used to
operate trains, cars, industrial and household robots, and autonomous military
vehicles. We may suppose that this automation for the most part has the desired
effects, but that the success is punctuated by occasional mishaps— a driverless
truck crashes into oncoming traffic, a military drone fires at innocent
civilians. Investigations reveal the incidents to have been caused by judgment
errors by the controlling AIs. Public debate ensues. Some call for tighter
oversight and regulation, others emphasize the need for research and
better-engineered systems— systems that are smarter and have more common sense,
and that are less likely to make tragic mistakes. Amidst the din can perhaps
also be heard the shrill voices of doomsayers predicting many kinds of ill and
impending catastrophe. Yet the momentum is very much with the growing AI and
robotics industries. So development continues, and progress is made. As the
automated navigation systems of cars become smarter, they suffer fewer
accidents; and as military robots achieve more precise targeting, they cause
less collateral damage. A broad lesson is inferred from these observations of real-world
outcomes: the smarter the AI, the safer it is. It is a lesson based on science,
data, and statistics, not armchair philosophizing. Against this backdrop, some
group of researchers is beginning to achieve promising results in their work on
developing general machine intelligence. The researchers are carefully testing
their seed AI in a sandbox environment, and the signs are all good. The AI’s
behavior inspires confidence— increasingly so, as its intelligence is gradually
increased.
At this point, any remaining
Cassandra would have several strikes against her:
i. A history of alarmists predicting
intolerable harm from the growing capabilities of robotic systems and being
repeatedly proven wrong. Automation has brought many benefits and has, on the
whole, turned out safer than human operation.
ii. A clear empirical trend: the
smarter the AI, the safer and more reliable it has been. Surely this bodes well
for a project aiming at creating machine intelligence more generally smart than
any ever built before— what is more, machine intelligence that can improve
itself so that it will become even more reliable.
iii. Large and growing industries
with vested interests in robotics and machine intelligence. These fields are
widely seen as key to national economic competitiveness and military security.
Many prestigious scientists have built their careers laying the groundwork for
the present applications and the more advanced systems being planned.
iv. A promising new technique in artificial
intelligence, which is tremendously exciting to those who have participated in
or followed the research. Although safety issues and ethics are debated, the
outcome is preordained. Too much has been invested to pull back now. AI
researchers have been working to get to human-level artificial general
intelligence for the better part of a century: of course there is no real
prospect that they will now suddenly stop and throw away all this effort just
when it finally is about to bear fruit.
v. The enactment of some safety
rituals, whatever helps demonstrate that the participants are ethical and
responsible (but nothing that significantly impedes the forward charge).
vi. A careful evaluation of seed AI
in a sandbox environment, showing that it is behaving cooperatively and showing
good judgment. After some further adjustments, the test results are as good as
they could be. It is a green light for the final step …
And so we boldly go— into the
whirling knives.
We observe here how it could be the
case that when dumb, smarter is safer; yet when smart, smarter is more
dangerous. There is a kind of pivot point, at which a strategy that has
previously worked excellently suddenly starts to backfire. We may call the
phenomenon the treacherous turn.
The treacherous turn— While weak, an
AI behaves cooperatively (increasingly so, as it gets smarter). When the AI
gets sufficiently strong— without warning or provocation— it strikes, forms a
singleton, and begins directly to optimize the world according to the criteria
implied by its final values.
A treacherous turn can result from a
strategic decision to play nice and build strength while weak in order to
strike later; but this model should not be interpreted too narrowly. For
example, an AI might not play nice in order that it be allowed to survive and
prosper. Instead, the AI might calculate that if it is terminated, the
programmers who built it will develop a new and somewhat different AI
architecture, but one that will be given a similar utility function. In this
case, the original AI may be indifferent to its own demise, knowing that its
goals will continue to be pursued in the future. It might even choose a
strategy in which it malfunctions in some particularly interesting or
reassuring way. Though this might cause the AI to be terminated, it might also
encourage the engineers who perform the postmortem to believe that they have
gleaned a valuable new insight into AI dynamics— leading them to place more
trust in the next system they design, and thus increasing the chance that the
now-defunct original AI’s goals will be achieved. Many other possible strategic
considerations might also influence an advanced AI, and it would be hubristic
to suppose that we could anticipate all of them, especially for an AI that has
attained the strategizing superpower.
A treacherous turn could also come
about if the AI discovers an unanticipated way of fulfilling its final goal as
specified. Suppose, for example, that an AI’s final goal is to “make the
project’s sponsor happy.” Initially, the only method available to the AI to achieve
this outcome is by behaving in ways that please its sponsor in something like
the intended manner. The AI gives helpful answers to questions; it exhibits a
delightful personality; it makes money. The more capable the AI gets, the more
satisfying its performances become, and everything goeth according to plan—
until the AI becomes intelligent enough to figure out that it can realize its
final goal more fully and reliably by implanting electrodes into the pleasure
centers of its sponsor’s brain, something assured to delight the sponsor
immensely. Of course, the sponsor might not have wanted to be pleased by being
turned into a grinning idiot; but if this is the action that will maximally
realize the AI’s final goal, the AI will take it. If the AI already has a
decisive strategic advantage, then any attempt to stop it will fail. If the AI
does not yet have a decisive strategic advantage, then the AI might temporarily
conceal its canny new idea for how to instantiate its final goal until it has
grown strong enough that the sponsor and everybody else will be unable to
resist. In either case, we get a treacherous turn.
Bostrom, Nick. Superintelligence: Paths, Dangers, Strategies (pp. 117-119). OUP Oxford. Kindle Edition.
Discussion Questions:
1. One of the issues we face with AI is that nothing like it has been done before. In what other ways might our history fail to predict the dangers we could face from superintelligence?
2. The example of smile-inducing electrodes is a case of something called "perverse instantiation," wherein an artificial intelligence finds a 'better' way to achieve its goals than its creators intended, often with disastrous results. Do you think it is possible to anticipate these perverse instantiations and account for them when setting an AI's goals, or is it a hopeless task? (A toy sketch of the idea follows below.)
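To make the idea in DQ 2 concrete, here is a minimal sketch in Python. The actions, the "sponsor happiness" proxy scores, and the numbers are all invented for illustration; nothing here comes from Bostrom's text. The point is only that a literal-minded optimizer ranks actions by the objective it was actually given, so an unintended action that scores highest on the proxy is, by the agent's own lights, the best one.

# Hypothetical proxy objective: a "sponsor happiness" score per action.
# The actions and scores are made up; only the optimizer's logic matters.
proxy_scores = {
    "give_helpful_answers": 6.0,            # intended behavior
    "exhibit_charming_personality": 7.0,    # intended behavior
    "make_money_for_sponsor": 8.0,          # intended behavior
    "implant_pleasure_electrodes": 100.0,   # unintended, yet maximal on the proxy
}

def choose_action(scores):
    # A pure optimizer: whatever maximizes the specified objective
    # is, by definition of that objective, the "best" action.
    return max(scores, key=scores.get)

print(choose_action(proxy_scores))  # prints: implant_pleasure_electrodes

Note that nothing in this sketch is a bug: the failure lives entirely in the specification of the objective, which is what makes such perverse instantiations hard to anticipate in advance.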