There are technical challenges with this version, too, however. For instance, since our own AI, even after it has attained superintelligence, may not be able to know with great precision what physical structures other superintelligences build, our AI may need to resort to trying to approximate those structures. To do this, it would seem our AI would need a similarity metric by which to judge how closely one physical artifact approximates another. But similarity metrics based on crude physical measures may be inadequate—it being no good, for example, to judge that a brain is more similar to a Camembert cheese than to a computer running an emulation.
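As a purely illustrative sketch (the feature choices and numbers below are invented for the example, not drawn from the text), here is how a similarity metric built only from crude physical measures goes wrong: judged by mass, volume, and water content alone, a brain comes out closer to a Camembert than to a computer running an emulation, because nothing in the metric tracks functional or computational structure.

```python
# Toy illustration (hypothetical features and numbers): a similarity metric
# built only from crude physical measures ranks a brain as "closer" to a
# cheese than to a computer running a brain emulation.
import math

# Crude physical features: (mass in kg, volume in litres, water fraction)
artifacts = {
    "brain": (1.4, 1.3, 0.75),
    "camembert": (1.0, 1.0, 0.52),
    "emulation server": (25.0, 40.0, 0.0),
}

def crude_distance(a, b):
    """Euclidean distance over raw physical features only."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for name in ("camembert", "emulation server"):
    d = crude_distance(artifacts["brain"], artifacts[name])
    print(f"distance(brain, {name}) = {d:.2f}")

# The cheese scores as far more "similar" to the brain than the server does,
# even though only the server shares the brain's functional organization;
# a usable metric would have to weight exactly the structural properties
# that are hard to specify.
```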
A more feasible approach might be to look for “beacons”: messages about utility functions encoded in some suitable simple format. We would build our AI to want to follow whatever such messages about utility functions it hypothesizes might exist out there in the universe; and we would hope that friendly extraterrestrial AIs would create a variety of beacons of the types that they (with their superintelligence) reckon that simple civilizations like ours are most likely to build our AI to look for.
25. If every civilization tried to solve the value-loading problem through a Hail Mary, the pass would fail. Somebody has to do it the hard way.
26. Christiano (2012).
27. The AI we build need not be able to find the model either. Like us, it could reason about what such a complex implicit definition would entail (perhaps by looking at its environment and following much the same kind of reasoning that we would follow).
28. Cf. Chapters 9 and 11.
29. For instance, MDMA may temporarily increase empathy; oxytocin may temporarily increase trust (Vollenweider et al. 1998; Bartz et al. 2011). However, the effects seem quite variable and context dependent.
30. The enhanced agents might be killed off or placed in suspended animation (paused), reset to an earlier state, or disempowered and prevented from receiving any further enhancements, until the overall system has reached a more mature and secure state where these earlier rogue elements no longer pose a system-wide threat.
31. The issue might also be less obvious in a future society of biological humans, one that has access to advanced surveillance or biomedical techniques for psychological manipulation, or that is wealthy enough to afford an extremely high ratio of security professionals to invigilate the regular citizenry (and each other).
32. Cf. Armstrong (2007) and Shulman (2010b).
33. One open question is to what degree a level n supervisor would need to monitor not only their level (n – 1) supervisees, but also their level (n – 2) supervisees, in order to know that the level (n – 1) agents are doing their jobs properly. And to know that the level (n – 1) agents have successfully managed the level (n – 2) agents, is it further necessary for the level n agent to also monitor the level (n – 3) agents?
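To give a quantitative feel for the question (the branching factor and depths below are illustrative assumptions, not figures from the text): in a uniform hierarchy where each supervisor has b direct supervisees, a supervisor who must watch everyone down to d levels beneath itself tracks b + b^2 + ... + b^d agents, so extending monitoring from level (n – 1) down to levels (n – 2) and (n – 3) multiplies the per-supervisor burden roughly by the branching factor each time.

```python
# Per-supervisor monitoring burden in a uniform hierarchy, as a function of
# how many levels beneath itself each supervisor must watch directly.
# Branching factor and depths are illustrative assumptions.

def monitored_agents(branching: int, watch_depth: int) -> int:
    """Agents a single supervisor tracks if it watches every agent from
    1 to `watch_depth` levels beneath it."""
    return sum(branching ** k for k in range(1, watch_depth + 1))

if __name__ == "__main__":
    b = 10  # assume each supervisor has 10 direct supervisees
    for depth in (1, 2, 3):
        print(f"watch {depth} level(s) down: {monitored_agents(b, depth):,} agents per supervisor")
    # -> 10, 110, and 1,110 agents per supervisor, respectively
```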
34. This approach straddles the line between motivation selection and capability control. Technically, the part of the arrangement that consists of human beings controlling a set of software supervisors counts as capability control, whereas the part of the arrangement that consists of layers of software agents within the system controlling other layers is motivation selection (insofar as it is an arrangement that shapes the system’s motivational tendencies).
35. In fact, many other costs deserve consideration but cannot be given it here. For example, whatever agents are charged with ruling over such a hierarchy might become corrupted or debased by their power.
36. For this guarantee to be effective, it must be implemented in good faith. This would rule out certain kinds of manipulation of the emulation’s emotional and decision-making faculties which might otherwise be used (for instance) to install a fear of being halted or to prevent the emulation from rationally considering its options.
37. See, e.g., Brinton (1965); Goldstone (1980, 2001). (Social science progress on these questions could make a nice gift to the world’s despots, who might use more accurate predictive models of social unrest to optimize their population control strategies and to gently nip insurgencies in the bud with less-lethal force.)
38. Cf. Bostrom (2011a, 2009b).
39. In the case of an entirely artificial system, it might be possible to obtain some of the advantages of an institutional structure without actually creating distinct subagents. A system might incorporate multiple perspectives into its decision process without endowing each of those perspectives with its own panoply of cognitive faculties required for independent agency. It could be tricky, however, to fully implement the “observe the behavioral consequences of a proposed change, and revert back to an earlier version if the consequences appear undesirable from the ex ante standpoint” feature described in the text in a system that is not composed of subagents.
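As a minimal sketch of the feature in question, assuming a system whose state can be checkpointed and whose pre-change evaluation standard can be captured (all names and the toy evaluation below are hypothetical), the pattern is easy to state for explicitly versioned states; the note's point is that it is harder to realize inside a single agent that is not decomposed into subagents.

```python
import copy

def apply_with_rollback(state, proposed_change, acceptable_ex_ante):
    """Trial a proposed change and keep it only if its observed consequences
    are acceptable when judged by the standard held *before* the change
    (the ex ante standpoint); otherwise revert to the earlier version."""
    checkpoint = copy.deepcopy(state)                  # save the earlier version
    new_state = proposed_change(copy.deepcopy(state))  # observe the consequences
    if acceptable_ex_ante(checkpoint, new_state):
        return new_state                               # keep the change
    return checkpoint                                  # revert

# Toy usage (purely illustrative):
old = {"goal_weights": {"safety": 0.9, "speed": 0.1}}
change = lambda s: {**s, "goal_weights": {"safety": 0.2, "speed": 0.8}}
# Ex ante standard: the safety weight must not decrease.
ok = lambda before, after: after["goal_weights"]["safety"] >= before["goal_weights"]["safety"]
print(apply_with_rollback(old, change, ok))  # reverts, printing the original state
```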
CHAPTER 13: CHOOSING THE CRITERIA FOR CHOOSING
1. A recent canvass of professional philosophers found the percentage of respondents who “accept or lean toward” various positions. On normative ethics, the results were deontology 25.9%; consequentialism 23.6%; virtue ethics 18.2%. On metaethics, results were moral realism 56.4%; moral anti-realism 27.7%. On moral judgment: cognitivism 65.7%; non-cognitivism 17.0% (Bourget and Chalmers 2009).
2. Pinker (2011).
3. For a discussion of this issue, see Shulman et al. (2009).
4. Moore (2011).
5. Bostrom (2006b).
6. Bostrom (2009b).
7. Bostrom (2011a).
8. More precisely, we should defer to its opinion except on those topics where we have good reason to suppose that our beliefs are more accurate. For example, we might know more about what we are thinking at a particular moment than the superintelligence does if it is not able to scan our brains. However, we could omit this qualification if we assume that the superintelligence has access to our opinions; we could then also defer to the superintelligence the task of judging when our opinions should be trusted. (There might remain some special cases, involving indexical information, that need to be handled separately—by, for example, having the superintelligence explain to us what it would be rational to believe from our perspective.) For an entry into the burgeoning philosophical literature on testimony and epistemic authority, see, e.g., Elga (2007).
9. Yudkowsky (2004). See also Mijic (2010).
10. For example, David Lewis proposed a dispositional theory of value, which holds, roughly, that some thing X is a value for A if and only if A would want to want X if A were perfectly rational and ideally acquainted with X (Smith et al. 1989). Kindred ideas had been put forward earlier; see, e.g., Sen and Williams (1982), Railton (1986), and Sidgwick and Jones (2010). Along somewhat similar lines, one common account of philosophical justification, the method of reflective equilibrium, proposes a process of iterative mutual adjustment between our intuitions about particular cases, the general rules which we think govern these cases, and the principles according to which we think these elements should be revised, to achieve a more coherent system; see, e.g., Rawls (1971) and Goodman (1954).
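For readers who find a mechanical caricature helpful, the sketch below (purely illustrative; the numeric "intuitions", the one-parameter "rule", and the adjustment thresholds are all stand-ins invented for the example) mimics the iterative mutual adjustment: refit the general rule to the current case judgments, revise whichever judgments clash badly with the rule, and repeat until the two cohere.

```python
# Purely illustrative caricature of reflective equilibrium: numeric case
# intuitions and a one-parameter rule are adjusted toward each other.
# All values and thresholds are invented for the example.

def reflective_equilibrium(intuitions, features, steps=20, tolerance=0.15, nudge=0.5):
    """intuitions[i]: how wrong case i seems (0..1, revisable);
    features[i]: a fixed descriptive feature of case i (e.g. harm caused);
    the "rule" is a single weight w predicting wrongness as w * feature."""
    w = 0.0
    for _ in range(steps):
        # Revise the principle: best single-weight fit to the current intuitions.
        w = sum(i * f for i, f in zip(intuitions, features)) / sum(f * f for f in features)
        # Revise only the intuitions that conflict strongly with the principle.
        intuitions = [
            i + nudge * (w * f - i) if abs(w * f - i) > tolerance else i
            for i, f in zip(intuitions, features)
        ]
    return w, intuitions

harm = [0.1, 0.5, 0.9, 0.2]          # feature of each case
seems_wrong = [0.1, 0.5, 0.9, 0.9]   # initial intuitions; the last is the outlier
w, revised = reflective_equilibrium(seems_wrong, harm)
print(round(w, 2), [round(x, 2) for x in revised])
# The outlying intuition is revised toward the rule; the others are left alone.
```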
11. Presumably the intention here is that when the AI acts to prevent such disasters, it should do so with as light a touch as possible, i.e. in such a manner that it averts the disaster but without exerting too much influence over how things turn out for humanity in other respects.
12. Yudkowsky (2004).
13. Rebecca Roache, personal communication.
14. The three principles are “Defend humans, the future of humanity, and humane nature” (humane here being that which we wish we were, as distinct from human, which is what we are); “Humankind should not spend the rest of eternity desperately wishing that the programmers had done something differently”; and “Help people.”
15. Some religious groups place a strong emphasis on faith in contradistinction to reason, the latter of which they may regard—even in its hypothetically most idealized form and even after it would have ardently and open-mindedly studied every scripture, revelation, and exegesis—to be insufficient for the attainment of essential spiritual insights. Those holding such views might not regard CEV as an optimal guide to decision-making (though they might still prefer it to various other imperfect guides that might in actuality be followed if the CEV approach were eschewed).
16. An AI acting like a latent force of nature to regulate human interactions has been referred to as a “Sysop,” a kind of “operating system” for the matter occupied by human civilization. See Yudkowsky (2001).
17. “Might,” because conditional on humanity’s coherent extrapolated volition wishing not to extend moral consideration to these entities, it is perhaps doubtful whether those entities actually have moral status (despite it seeming very plausible now that they do). “Potentially,” because even if a blocking vote prevents the CEV dynamic from directly protecting these outsiders, there is still a possibility that, within whatever ground rules are left over once the initial dynamic has run, individuals whose wishes were respected and who want some outsiders’ welfare to be protected may successfully bargain to attain this outcome (at the expense of giving up some of their own resources). Whether this would be possible might depend on, among other things, whether the outcome of the CEV dynamic is a set of ground rules that makes it feasible to reach negotiated resolutions to issues of this kind (which might require provisions to overcome strategic bargaining problems).
18. Individuals who contribute positively to realizing a safe and beneficial superintelligence might merit some special reward for their labour, albeit something short of a near-exclusive mandate to determine the disposition of humanity’s cosmic endowment. However, the notion of everybody getting an equal share in our extrapolation base is such a nice Schelling point that it should not be lightly tossed away. There is, in any case, an indirect way in which virtue could be rewarded: namely, the CEV itself might turn out to specify that good people who exerted themselves on behalf of humanity should be suitably recognized. This could happen without such people being given any special weight in the extrapolation base if—as is easily imaginable—our CEV would endorse (in the sense of giving at least some nonzero weight to) a principle of just desert.
19. Bostrom et al. (2013).
20. To the extent that there is some (sufficiently definite) shared meaning that is being expressed when we make moral assertions, a superintelligence should be able to figure out what that meaning is. And to the extent that moral assertions are “truth-apt” (i.e. have an underlying propositional character that enables them to be true or false), the superintelligence should be able to figure out which assertions of the form “Agent X ought now to Φ” are true. At least, it should outperform us on this task.
An AI that initially lacks such a capacity for moral cognition should be able to acquire it if it has the intelligence amplification superpower. One way the AI could do this is by reverse-engineering the human brain’s moral thinking and then implementing a similar process, running it faster, feeding it more accurate factual information, and so forth.
21. Since we are uncertain about metaethics, there is a question of what the AI is to do if the preconditions for MR fail to obtain. One option is to stipulate that the AI shut itself off if it assigns a sufficiently high probability to moral cognitivism being false or to there being no suitable non-relative moral truths. Alternatively, we could have the AI revert to some alternative approach, such as CEV.
We could refine the MR proposal to make it clearer what is to be done in various ambiguous or degenerate cases. For instance, if error theory is true (and hence all positive moral assertions of the form “I ought now to τ” are false), then the fallback strategy (e.g. shutting down) would be invoked. We could also specify what should happen if there are multiple feasible actions, each of which would be morally right. For example, we might say that in such cases the AI should perform (one of) the permissible actions that humanity’s collective extrapolation would have favored. We might also stipulate what should happen if the true moral theory does not employ terms like “morally right” in its basic vocabulary. For instance, a consequentialist theory might hold that some actions are better than others but that there is no particular threshold corresponding to the notion of an action being “morally right.” We could then say that if such a theory is correct, MR should perform one of the morally best feasible actions, if there is one; or, if there is an infinite number of feasible actions such that for any feasible action there is a better one, then maybe MR could pick any that is at least astronomically better than the best action that any human would have selected in a similar situation, if such an action is feasible—or if not, then an action that is at least as good as the best action a human would have performed.
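The case analysis in this note can be laid out schematically as follows; everything in the sketch (the credence threshold, the moral-evaluation inputs, the CEV tie-break) is a placeholder for capabilities the proposal simply attributes to the superintelligence, so this is a summary of the branching logic rather than an implementable procedure.

```python
# Schematic of the refined MR proposal's case analysis. All inputs are
# placeholders for capabilities attributed to the superintelligence.

def mr_decide(p_no_moral_truths,    # credence that no suitable non-relative moral truths exist
              right_actions,        # feasible actions judged "morally right" (possibly empty)
              best_actions,         # morally best feasible actions, if any (possibly empty)
              superhumanly_better,  # an action astronomically better than any human-chosen one, or None
              humanly_best,         # an action at least as good as the best a human would perform
              cev_favorite,         # tie-break by humanity's collective extrapolation
              fallback):            # e.g. shut down, or revert to CEV
    if p_no_moral_truths > 0.99:    # error theory / no suitable moral truths: invoke the fallback
        return fallback
    if right_actions:               # at least one morally right action exists
        return right_actions[0] if len(right_actions) == 1 else cev_favorite(right_actions)
    if best_actions:                # no "rightness" threshold, but a morally best action exists
        return best_actions[0]
    # Better actions without end: settle for an astronomically-better-than-human action,
    # or failing that, one at least as good as the best human-performed action.
    return superhumanly_better if superhumanly_better is not None else humanly_best

# Toy invocation with placeholder values:
print(mr_decide(0.05, ["keep promise", "give aid"], [], None, "best human act",
                cev_favorite=lambda options: options[0], fallback="shut down"))
```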