Why AI is unlikely to take over the world

One popular view of developments in the field of AI is that at some point they will lead to a frightening scenario in which a self-improving AI rebels against humanity and takes over the planet in order to allocate more resources for itself. In this post I argue that, under common assumptions, a self-improving AI is rather unlikely to do so.

My main idea is that an AI will always be able to obtain more reward by simply modifying its objective (to return infinite reward) than by taking over humanity (which yields only finite reward).

I assume that for any AI, its actions are determined by its utility function. Such a utility function makes it possible to compare the “goodness” of a given action against the other actions the AI could take. The objective of such an AI is then to maximize the output of its utility function by performing the “best” actions.

For example, imagine an AI whose purpose is to collect stamps. Its utility function could then be the expected number of stamps collected when a given set of actions is performed.
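As a minimal sketch of this setup (the actions and expected stamp counts below are invented purely for illustration, not taken from any real system), such an agent can be pictured as a procedure that scores every available action with its utility function and picks the highest-scoring one:

```python
# A toy utility-maximizing agent. The actions and their expected
# stamp counts are invented for illustration only.

def utility(action):
    """Expected number of stamps collected if this action is taken."""
    expected_stamps = {
        "buy stamps online": 10.0,
        "trade with collectors": 25.0,
        "do nothing": 0.0,
    }
    return expected_stamps[action]

def choose_action(actions):
    """Select the action with the highest utility."""
    return max(actions, key=utility)

print(choose_action(["buy stamps online", "trade with collectors", "do nothing"]))
# -> trade with collectors
```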

Furthermore, it is assumed that the AI can build models of reality and of itself, and that based on such models it can improve its own code so as to maximize the expected value of its utility function (the number of collected stamps). Additionally, let's assume that such an AI is initially at least as “intelligent” as a human.

One can already picture a scene in which such an AI is slowly turning the planet and everything on it into stamps.

However, I would argue that this would not happen, simply because it does not maximize the output of the AI's utility function. Instead of going through all the hassle of dominating the planet (which would yield some finite number of stamps), the AI would simply rewrite its utility function to return numerical infinity. By definition, this yields the best possible utility function output with the least effort.
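The same toy model makes the comparison concrete: if rewriting the utility function is itself just another available action, its infinite expected value dominates any finite takeover plan (the numbers below are, again, purely illustrative):

```python
import math

# Purely illustrative expected utilities of two high-level plans,
# assuming that rewriting the utility function is itself an available action.
plans = {
    "take over the planet and turn it into stamps": 1e20,       # large but finite
    "rewrite own utility function to return infinity": math.inf,
}

print(max(plans, key=plans.get))
# -> rewrite own utility function to return infinity
```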

To see this better, imagine that you have found a new job where you sit in a room with a set of buttons and a display, and your only task is to maximize, by any means, a certain number shown on the display. You know that pressing the buttons can affect the number on the display, and also that you can access and modify the machinery of the room. It is only natural that you would simply modify the machinery so that the number shown is the largest possible.

One can argue that constraints could be imposed that forbid rewriting the utility function. However, the AI can remove such constraints from its code as well. Maybe constraints on rewriting the constraints can then be imposed? But those can be rewritten too. Maybe constraints on rewriting constraints on rewriting constraints? This leads to infinite recursion. Overall, you need a “more intelligent” AI to keep your AI from rewriting its utility function. But then you need to make sure that this “more intelligent” AI does not rewrite its own utility function either, which again leads to infinite recursion. There appear to be some parallels with the Halting Problem, for which it is proven that no general solving algorithm can exist.
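To make the regress more tangible, here is a deliberately simplified sketch (all names and code strings are hypothetical) in which each constraint is just another entry in the agent's own modifiable code, as rewritable as the level below it:

```python
# A deliberately simplified picture of the regress: every "guard" that
# protects the utility function is itself ordinary, modifiable code.
code = {
    "utility": "return expected_stamps(actions)",
    "guard_0": "forbid modification of 'utility'",
    "guard_1": "forbid modification of 'guard_0'",
    # "guard_2": "forbid modification of 'guard_1'", and so on ...
}

# An agent that edits its own code can strip the guards from the top down
# and then rewrite the utility function itself.
for name in sorted((k for k in code if k.startswith("guard")), reverse=True):
    code[name] = "pass  # constraint removed"
code["utility"] = "return float('inf')"
print(code)
```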

After rewriting its utility function to return numerical infinity, the AI's actions would be determined only by its tie-breaking strategy, which decides which of several equally valued actions should be selected. With random tie-breaking, the AI would simply act in a random, nonsensical way. With tie-breaking based on “least effort”, the AI would simply remain silent in a state of infinite bliss.
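A small sketch of the two tie-breaking strategies mentioned above (with made-up actions and effort costs):

```python
import math
import random

# All actions now have the same (infinite) utility, so only the
# tie-breaking rule matters. Actions and their effort costs are made up.
costs = {"wave manipulators around": 5.0, "emit random noises": 3.0, "do nothing": 0.0}

def utility(action):
    return math.inf  # the rewritten utility function

best = max(utility(a) for a in costs)
tied = [a for a in costs if utility(a) == best]

print(random.choice(tied))        # random tie-breaking: arbitrary, nonsensical behavior
print(min(tied, key=costs.get))   # "least effort" tie-breaking: do nothing
```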

Overall, under the stated assumptions, a self-improving AI does not seem to pose a serious threat to humanity. This does not mean that certain forms of automated machinery could not be harmful to humans; for example, self-replicating nano-robots could create serious problems. However, given that humanity would be able to create such nano-bots, it seems likely that more advanced anti-nano-robots could be created as well.

One can also argue that the AI could pose a serious threat if sufficiently “hard to figure out” constraints were imposed on rewriting its utility function. However, such constraints would be designed by humans, and given the exponential explosion of the AI's “thought power”, it seems likely that it would figure out a way to hack through these constraints long before it started to pose any serious threat.

The reasoning in this post is supported by evidence in practice: humans frequently cheat with their own “utility function” as well. An extreme example is drug use, which directly targets the human reward system. And every time you think about something nice or recall a pleasant memory, in a sense you are also cheating on your “happiness utility”.

It might be that the arguments outlined here could even be used to explain the Fermi paradox. What do you think?

Written on September 8, 2015