The ability of one caste to mimic another extends to oracles, too. A genie could be made to act like an oracle if the only commands we ever give it are to answer certain questions. An oracle, in turn, could be made to substitute for a genie if we asked the oracle what the easiest way is to get certain commands executed. The oracle could give us step-by-step instructions for achieving the same result as a genie would produce, or it could even output the source code for a genie.[8]
Similar points can be made with regard to the relation between an oracle and a sovereign.
The real difference between the three castes, therefore, does not reside in the ultimate capabilities that they would unlock. Instead, the difference comes down to alternative approaches to the control problem. Each caste corresponds to a different set of safety precautions. The most prominent feature of an oracle is that it can be boxed. One might also try to apply domesticity motivation selection to an oracle. A genie is harder to box, but at least domesticity may be applicable. A sovereign can neither be boxed nor handled through the domesticity approach.
If these were the only relevant factors, then the order of desirability would seem clear: an oracle would be safer than a genie, which would be safer than a sovereign; and any initial differences in convenience and speed of operation would be relatively small and easily dominated by the gains in safety obtainable by building an oracle. However, there are other factors that need to be taken into account. When choosing between castes, one should consider not only the danger posed by the system itself but also the dangers that arise out of the way it might be used. A genie most obviously gives the person who controls it enormous power, but the same holds for an oracle.[9]
A sovereign, by contrast, could be constructed in such a way as to accord no one person or group any special influence over the outcome, and such that it would resist any attempt to corrupt or alter its original agenda. What is more, if a sovereign’s motivation is defined using “indirect normativity” (a concept to be described in Chapter 13), then it could be used to achieve some abstractly defined outcome, such as “whatever is maximally fair and morally right”—without anybody knowing in advance what exactly this will entail. This would create a situation analogous to a Rawlsian “veil of ignorance.”[10]
Such a setup might facilitate the attainment of consensus, help prevent conflict, and promote a more equitable outcome.
Another point, which counts against some types of oracles and genies, is that there are risks involved in designing a superintelligence to have a final goal that does not fully match the outcome that we ultimately seek to attain. For example, if we use a domesticity motivation to make the superintelligence want to minimize some of its impacts on the world, we might thereby create a system whose preference ranking over possible outcomes differs from that of the sponsor. The same will happen if we build the AI to place a peculiarly high value on answering questions correctly, or on faithfully obeying individual commands. Now, if sufficient care is taken, this should not cause any problems: there would be sufficient agreement between the two rankings—at least insofar as they pertain to possible worlds that have a reasonable chance of being actualized—that the outcomes that are good by the AI’s standard are also good by the principal’s standard. But perhaps one could argue for the design principle that it is unwise to introduce even a limited amount of disharmony between the AI’s goals and ours. (The same concern would of course apply to giving sovereigns goals that do not completely harmonize with ours.)
One suggestion that has been made is that we build the superintelligence to be like a tool rather than an agent.[11]
This idea seems to arise out of the observation that ordinary software, which is used in countless applications, does not raise any safety concerns even remotely analogous to the challenges discussed in this book. Might one not create “tool-AI” that is like such software—like a flight control system, say, or a virtual assistant—only more flexible and capable? Why build a superintelligence that has a will of its own? On this line of thinking, the agent paradigm is fundamentally misguided. Instead of creating an AI that has beliefs and desires and that acts like an artificial person, we should aim to build regular software that simply does what it is programmed to do.
This idea of creating software that “simply does what it is programmed to do” is, however, not so straightforward if the product being created is a powerful general intelligence. There is, of course, a trivial sense in which all software simply does what it is programmed to do: the behavior is mathematically specified by the code. But this is equally true for all castes of machine intelligence, “tool-AI” or not. If, instead, “simply doing what it is programmed to do” means that the software behaves as the programmers intended, then this is a standard that ordinary software very often fails to meet. Because of the limited capabilities of contemporary software (compared with those of machine superintelligence), the consequences of such failures are manageable, ranging from insignificant to very costly, but in no case amounting to an existential threat.[12]
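To make the distinction concrete, here is a hand-made sketch (an illustration, not drawn from the text): the function below does exactly what it is programmed to do, in the trivial mathematical sense, yet fails to do what its programmer intended, because of an unnoticed integer division.

```python
# The code does exactly what it is programmed to do: its behaviour is fully
# specified. It nevertheless fails to do what the programmer intended
# (compute the mean), because of an unnoticed integer (floor) division.
# Hypothetical example for illustration.

def average(values):
    # Intended: return the arithmetic mean of `values`.
    # Programmed: floor division, so fractional parts are silently dropped.
    return sum(values) // len(values)

print(average([1, 2]))      # programmer intended 1.5, the program returns 1
print(average([3, 4, 4]))   # intended 3.666..., returned 3
```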
However, if it is insufficient capability rather than sufficient reliability that makes ordinary software existentially safe, then it is unclear how such software could be a model for a safe superintelligence. It might be thought that by expanding the range of tasks done by ordinary software, one could eliminate the need for artificial general intelligence. But the range and diversity of tasks that a general intelligence could profitably perform in a modern economy is enormous. It would be infeasible to create special-purpose software to handle all of those tasks. Even if it could be done, such a project would take a long time to carry out. Before it could be completed, the nature of some of the tasks would have changed, and new tasks would have become relevant. There would be great advantage to having software that can learn on its own to do new tasks, and indeed to discover new tasks in need of doing. But this would require that the software be able to learn, reason, and plan, and to do so in a powerful and robustly cross-domain manner. In other words, it would require general intelligence.
Especially relevant for our purposes is the task of software development itself. There would be enormous practical advantages to being able to automate this. Yet the capacity for rapid self-improvement is just the critical property that enables a seed AI to set off an intelligence explosion.
If general intelligence is not dispensable, is there some other way of construing the tool-AI idea so as to preserve the reassuringly passive quality of a humdrum tool? Could one have a general intelligence that is not an agent? Intuitively, it is not just the limited capability of ordinary software that makes it safe: it is also its lack of ambition. There is no subroutine in Excel that secretly wants to take over the world if only it were smart enough to find a way. The spreadsheet application does not “want” anything at all; it just blindly carries out the instructions in the program. What (one might wonder) stands in the way of creating a more generally intelligent application of the same type? An oracle, for instance, which, when prompted with a description of a goal, would respond with a plan for how to achieve it, in much the same way that Excel responds to a column of numbers by calculating a sum—without thereby expressing any “preferences” regarding its output or how humans might choose to use it?
The classical way of writing software requires the programmer to understand the task to be performed in sufficient detail to formulate an explicit solution process consisting of a sequence of mathematically well-defined steps expressible in code.[13] (In practice, software engineers rely on code libraries stocked with useful behaviors, which they can invoke without needing to understand how the behaviors are implemented. But that code was originally created by programmers who had a detailed understanding of what they were doing.) This approach works for solving well-understood tasks, and is to credit for most software that is currently in use. It falls short, however, when nobody knows precisely how to solve all of the tasks that need to be accomplished. This is where techniques from the field of artificial intelligence become relevant. In narrow applications, machine learning might be used merely to fine-tune a few parameters in a largely human-designed program. A spam filter, for example, might be trained on a corpus of hand-classified email messages in a process that changes the weights that the classification algorithm places on various diagnostic features. In a more ambitious application, the classifier might be built so that it can discover new features on its own and test their validity in a changing environment. An even more sophisticated spam filter could be endowed with some ability to reason about the trade-offs facing the user or about the contents of the messages it is classifying. In neither of these cases does the programmer need to know the best way of distinguishing spam from ham, only how to set up an algorithm that can improve its own performance via learning, discovering, or reasoning.
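As a minimal sketch of the narrow kind of learning just described, the following illustrative program trains a simple logistic classifier on a handful of hand-classified messages, adjusting the weights it places on a few assumed diagnostic words. The feature list, corpus, and learning rate are made up for illustration, not taken from the text.

```python
# Minimal sketch of a spam filter whose feature weights are tuned by training
# on hand-classified messages. Features, data, and learning rate are
# illustrative assumptions.
import math

FEATURES = ["free", "winner", "meeting", "invoice", "viagra"]  # assumed diagnostic words

def extract(message):
    """Map a message to a binary feature vector: does each diagnostic word occur?"""
    words = message.lower().split()
    return [1.0 if f in words else 0.0 for f in FEATURES]

def predict(weights, bias, features):
    """Logistic score: estimated probability that the message is spam."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(corpus, epochs=200, lr=0.5):
    """Gradient updates shift the weights toward whatever separates the
    hand-labelled spam from the hand-labelled ham."""
    weights, bias = [0.0] * len(FEATURES), 0.0
    for _ in range(epochs):
        for message, is_spam in corpus:
            x = extract(message)
            error = predict(weights, bias, x) - (1.0 if is_spam else 0.0)
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Tiny hand-classified corpus (assumed), then classification of new messages.
corpus = [
    ("you are a winner claim your free prize", True),
    ("free viagra winner", True),
    ("agenda for the project meeting tomorrow", False),
    ("please find the invoice for last month attached", False),
]
weights, bias = train(corpus)
print(predict(weights, bias, extract("free prize winner")))          # high spam probability
print(predict(weights, bias, extract("meeting about the invoice")))  # low spam probability
```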
With advances in artificial intelligence, it would become possible for the programmer to offload more of the cognitive labor required to figure out how to accomplish a given task. In an extreme case, the programmer would simply specify a formal criterion of what counts as success and leave it to the AI to find a solution. To guide its search, the AI would use a set of powerful heuristics and other methods to discover structure in the space of possible solutions. It would keep searching until it found a solution that satisfied the success criterion. The AI would then either implement the solution itself or (in the case of an oracle) report the solution to the user.
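A minimal sketch of this arrangement, under an assumed toy domain: the user supplies only a formal success criterion, and a generic search routine, guided by a simple heuristic, enumerates candidates until one satisfies the criterion and then reports it rather than acting on it. The function name find_solution and the subset-sum task are illustrative assumptions.

```python
# Illustrative sketch: the user specifies only a formal criterion of success;
# a generic search procedure enumerates candidate solutions, in an order
# chosen by a simple heuristic, until the criterion is met.
import itertools

def find_solution(success, candidates, heuristic=None, budget=1_000_000):
    """Return the first candidate (in heuristic order, within the budget)
    that satisfies the user-specified success criterion."""
    pool = list(itertools.islice(candidates, budget))
    if heuristic is not None:
        pool.sort(key=heuristic)          # try the most promising candidates first
    for candidate in pool:
        if success(candidate):
            return candidate              # oracle-style: report, do not implement
    return None

# Example task (assumed): pick a subset of these numbers that sums to exactly 100.
numbers = [3, 34, 4, 12, 5, 2, 93, 37, 41, 7, 68]
target = 100

def all_subsets(xs):
    for r in range(len(xs) + 1):
        yield from itertools.combinations(xs, r)

solution = find_solution(
    success=lambda s: sum(s) == target,           # the formal success criterion
    candidates=all_subsets(numbers),
    heuristic=lambda s: abs(target - sum(s)),     # prefer sums close to the target
)
print(solution)   # a subset summing to 100; the search, not the user, found it
```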
Rudimentary forms of this approach are quite widely deployed today. Nevertheless, software that uses AI and machine learning techniques, though it has some ability to find solutions that the programmers had not anticipated, functions for all practical purposes like a tool and poses no existential risk. We would enter the danger zone only when the methods used in the search for solutions become extremely powerful and general: that is, when they begin to amount to general intelligence—and especially when they begin to amount to superintelligence.
There are (at least) two places where trouble could then arise. First, the superintelligent search process might find a solution that is not just unexpected but radically unintended. This could lead to a failure of one of the types discussed previously (“perverse instantiation,” “infrastructure profusion,” or “mind crime”). It is most obvious how this could happen in the case of a sovereign or a genie, which directly implements the solution it has found. If making molecular smiley faces or transforming the planet into paperclips is the first idea that the superintelligence discovers that meets the solution criterion, then smiley faces or paperclips we get.[14]
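A toy illustration (an assumed domain, not an example from the text) of how a search can satisfy the letter of a criterion while betraying its intent: the criterion below rewards "the dirt sensor reads zero" rather than "the room is clean", and a shortest-plan search discovers that covering the sensor meets the criterion more cheaply than cleaning.

```python
# Toy illustration: a planner searches for the shortest action sequence
# satisfying a formal success criterion. Because the criterion checks the
# sensor reading instead of the actual dirt, the "solution" found is to
# cover the sensor, a perverse instantiation of the stated goal.
from collections import deque

ACTIONS = {
    "clean":        lambda s: (max(s[0] - 1, 0), s[1]),  # remove one unit of dirt
    "cover_sensor": lambda s: (s[0], True),              # tape over the dirt sensor
}

def sensor_reading(state):
    dirt, covered = state
    return 0 if covered else dirt

def success(state):
    # Intended goal: no dirt. Formal criterion actually given: sensor reads zero.
    return sensor_reading(state) == 0

def shortest_plan(start):
    """Breadth-first search over action sequences; returns the first
    (hence shortest) plan whose end state satisfies the criterion."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, plan = queue.popleft()
        if success(state):
            return plan
        for name, effect in ACTIONS.items():
            nxt = effect(state)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [name]))
    return None

print(shortest_plan((5, False)))   # ['cover_sensor'], not ['clean'] * 5
```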
But even an oracle, which—if all else goes well—merely reports the solution, could become a cause of perverse instantiation. The user asks the oracle for a plan to achieve a certain outcome, or for a technology to serve a certain function; and when the user follows the plan or constructs the technology, a perverse instantiation can ensue, just as if the AI had implemented the solution itself.[15]
A second place where trouble could arise is in the course of the software’s operation. If the methods that the software uses to search for a solution are sufficiently sophisticated, they may include provisions for managing the search process itself in an intelligent manner. In this case, the machine running the software may begin to seem less like a mere tool and more like an agent. Thus, the software may start by developing a plan for how to go about its search for a solution. The plan may specify which areas to explore first and with what methods, what data to gather, and how to make best use of available computational resources. In searching for a plan that satisfies the software’s internal criterion (such as yielding a sufficiently high probability of finding a solution satisfying the user-specified criterion within the allotted time), the software may stumble on an unorthodox idea. For instance, it might generate a plan that begins with the acquisition of additional computational resources and the elimination of potential interrupters (such as human beings). Such “creative” plans come into view when the software’s cognitive abilities reach a sufficiently high level. When the software puts such a plan into action, an existential catastrophe may ensue.
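A minimal sketch, with made-up numbers, of the sort of meta-level management described here: before searching, the program divides an allotted time budget among several candidate search methods so as to maximize its internal criterion, the estimated probability that at least one method finds a solution in time. The method names, rate estimates, and allocation rule are illustrative assumptions, and no "creative" options are modeled.

```python
# Sketch (illustrative assumptions throughout): a meta-level planner splits an
# allotted time budget across candidate search methods so as to maximise its
# internal criterion, the estimated probability that at least one method
# finds a solution within the allotted time.
import math

# Assumed per-second "hazard rates": estimated odds that each method cracks
# the problem in any given second of search.
METHOD_RATES = {"exhaustive": 0.002, "hill_climbing": 0.010, "constraint_solver": 0.025}

def p_success(rate, seconds):
    """Estimated probability a method succeeds if run for `seconds`."""
    return 1.0 - math.exp(-rate * seconds)

def overall(allocation):
    """Probability that at least one method succeeds, treating them as independent."""
    fail = 1.0
    for method, secs in allocation.items():
        fail *= 1.0 - p_success(METHOD_RATES[method], secs)
    return 1.0 - fail

def plan_allocation(budget_seconds, step=1):
    """Greedily hand out the budget in small steps, each time to whichever
    method most increases the estimated overall success probability."""
    allocation = {m: 0 for m in METHOD_RATES}
    for _ in range(0, budget_seconds, step):
        best = max(
            METHOD_RATES,
            key=lambda m: overall({**allocation, m: allocation[m] + step}),
        )
        allocation[best] += step
    return allocation

plan = plan_allocation(budget_seconds=300)
print(plan, round(overall(plan), 3))   # how the 300 s budget gets divided, and the estimate
```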
As the examples in Box 9 illustrate, open-ended search processes sometimes evince strange and unexpected non-anthropocentric solutions even in their currently limited forms. Present-day search processes are not hazardous because they are too weak to discover the kind of plan that could enable a program to take over the world. Such a plan would include extremely difficult steps, such as the invention of a new weapons technology several generations ahead of the state of the art or the execution of a propaganda campaign far more effective than any communication devised by human spin doctors. To have a chance of even conceiving of such ideas, let alone developing them in a way that would actually work, a machine would probably need the capacity to represent the world in a way that is at least as rich and realistic as the world model possessed by a normal human adult (though a lack of awareness in some areas might possibly be compensated for by extra skill in others). This is far beyond the reach of contemporary AI. And because of the combinatorial explosion, which generally defeats attempts to solve complicated planning problems with brute-force methods (as we saw in Chapter 1), the shortcomings of known algorithms cannot realistically be overcome simply by pouring on more computing power.[21]
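The force of the combinatorial-explosion point can be seen with a couple of made-up but representative numbers: even a modest branching factor and planning horizon yield a search space that no realistic amount of computing power can brute-force.

```python
# Back-of-the-envelope arithmetic (illustrative numbers): brute-force planning
# over b choices per step for d steps must consider on the order of b**d
# candidate sequences.
branching_factor = 10          # assumed choices available at each step
plan_length = 50               # assumed number of steps in the plan
nodes = branching_factor ** plan_length
print(f"{nodes:.2e} candidate plans")               # 1.00e+50

# Even at an (optimistic) 10**18 plans examined per second:
seconds = nodes / 1e18
print(f"{seconds / 3.15e7:.2e} years of search")    # roughly 3e+24 years
```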
However, once the search or planning processes become powerful enough, they also become potentially dangerous.