“AI will kill us all”
Suddenly we have a whole world of people thinking and talking about artificial intelligence. A lot of this exploration is in the form of thought experiments. There is obviously lots of empirical research going on too, but as soon as you imagine a near future with some more sophisticated AI in the world, you find yourself having to confront Big Questions (technical, social, neurological, epistemological, philosophical…).
One popular Big Question is “Should we worry that AI will kill us all?” And like all the Big AI Questions, there are smart people with good arguments on both the “Yes!” and “No!” sides of this proposition. As an example, I recently heard a debate between Eliezer Yudkowsky and Stephen Wolfram on the Machine Learning Street Talk podcast (youtube link). Eliezer’s position (which I’m probably a little less sympathetic to) was to my mind similar to Pascal’s Wager¹: any risk of AI destroying humanity is unacceptable. I’m uncomfortable with these kinds of absolutist framings, but it is also somewhat untenable to argue that there is some acceptable level of risk that AI ends humanity… Yudkowsky also made the point that it’s not binary: we don’t need to establish the risk of losing all human life, we just need to acknowledge the non-zero risk that quasi-intelligent systems that are too autonomous and powerful could be a danger to large numbers of humans. He posited that if we examine the set of all routes by which a maximally optimized AI could reach its goal, the majority of them include the annihilation of humanity. Again, sort of interesting as a thought experiment. But it oversimplifies things down to a single AI with a single goal, very high intelligence, and very good access to our physical world.
Wolfram’s position was a bit more measured; he didn’t say unequivocally that there is no risk of these levels of danger, but took a sort of Socratic approach, repeatedly questioning Yudkowsky’s logic without really taking a clear position himself on the scale of the possible risks.
Let’s Not Use AI to Make Paper Clips!
One popular thought experiment is a scenario where an AI is given the task of maximizing paper clip production. The AI gradually acquires more and more resources and finds more and more methods to eliminate any obstacles (including pesky humans); the endgame, I guess, is killing every human just in case, resulting in a world with lots of paper clips and no humans. If we suppose this imagined AI really is that powerful, we could get to a Singularity of Paper Clips, resulting in the entire universe being filled with paper clips. I could suggest a new theory: we are living in a simulation in which the scientists have given our universe some starting conditions to test, and are measuring how long it takes for us to reach the Singularity of Paper Clips.
I identify myself as an early advocate of the “AI will kill us all” school, though my model is a little different. In my scenario, AI ushers in an age of human flourishing, with AI taking care of all the difficult stuff and humanity focusing on building interesting societies, music, art, theatre, etc., followed by the humans losing all ability to look after themselves and AI getting advanced enough that it loses interest in making sure humanity persists, followed by the humans dying out. I also think this model is very unlikely! But as technological improvements accelerate, our prediction window narrows.
My short-term prediction: Even the primitive (compared to what we can imagine) AIs we have now are going to have dramatic impacts on how we live and work in the next couple of years. My long-term prediction: Assuming AI development continues to improve and we see more significant breakthroughs, we will see bigger and bigger changes as a result, and they will come faster and faster. My hot take.
Human Inability to Deal with Statistical Uncertainty
I’ve been a sometime poker player and long-time poker fan. I find poker provides great insights into some of the ways humans think, and analogies to human behavior in all sorts of other areas. One thing that better poker players do is get more comfortable with uncertainty - and develop better intuition around statistical analysis. This seems to lie well outside what we’ve developed through evolution. I think our brains are very good at fine points in some areas - certain types of pattern recognition that helped us find food or detect danger in imminent situations (spotting tigers, finding edible plants) - but very bad at making calculations like “If this action yields x amount of gain but has y probability of killing me, is it worth it?” when x or y is extreme (a really big win and/or a really likely death). This inability probably didn’t matter much for most of our evolutionary history.
I thought about this a lot in the early days of Covid. What does it mean to the average person if they think there is a virus that has a 0.5% chance of killing them if they catch it? To some it means “avoid catching this virus at all costs”. To others it means “that’s silly to even worry about”. Finding the nuanced position in between is really, really tricky. It involves estimating the value of most things you do. Is it worth it to take the subway to work? Is it worth it to see your grandparents? What if there is a 0.01% chance of death, a 0.5% chance of long-term harm, and a 20% chance of being really sick and miserable for 3 days? Is it worth going to the movies? And for any of these decisions, where do the percentage sliders have to be for your answer to change from “yes” to “no”?
Poker is interesting in that it abstracts these types of calculations into a game with a very finite number of known parameters: “Is it worth a 5% chance of getting knocked out of the tournament to double my stack at this stage?” Faced with this type of calculation, a really bad poker player distills it down to easier questions like “How bad will I feel if I get knocked out?” or “Will I look like an idiot if I make the call and lose?”
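As a rough illustration of the arithmetic involved, here is a minimal sketch with invented numbers. It deliberately ignores payout structure, ICM and table dynamics, which is where much of the real nuance in tournament decisions lives; the point is only to show how simple the raw chip calculation is compared to how hard it feels at the table.

```python
# A toy sketch with invented numbers: the raw chip expected value (EV) of
# taking a spot that either doubles our stack or knocks us out.
# Payout structure / ICM considerations are deliberately ignored.

def ev_of_gamble(stack: int, p_bust: float) -> float:
    """Expected chips after a spot that doubles us up with probability (1 - p_bust)."""
    return (1 - p_bust) * (2 * stack) + p_bust * 0

stack = 10_000  # hypothetical current stack
for p_bust in (0.05, 0.25, 0.50):
    ev = ev_of_gamble(stack, p_bust)
    verdict = "take it" if ev > stack else "pass"
    print(f"p_bust={p_bust:.0%}: EV = {ev:,.0f} chips vs. folding for {stack:,} -> {verdict} (on chip EV alone)")
```

On raw chip EV, even a 25% risk of busting looks “worth it” in this sketch; the reasons it often isn’t in practice (prize-pool structure, the value of future spots) are exactly the considerations that are hard to hold in your head in the moment.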
Thought experiments around low-probability events are interesting, mostly for what they illuminate about our uncertainty. If you could push a button that had a 1% chance of killing you and a 50% chance of giving you ten million dollars, would you press it? I wouldn’t - but it would be pretty hard for me to concretely make my case. And what about the question “Should we pursue a technology that has a 71% chance of significantly helping humanity flourish, if it has a 0.001% chance of ending all humanity forever?” (These statistics, much like 63.4% of all statistics, were made up on the spur of the moment.) Of course, this is purely a thought experiment and a thought experiment it will remain - it looks like the real-life AI experiment is going to proceed unless we somehow reach overwhelming consensus in the world that the risk isn’t worth it.
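To make the button problem a little more concrete, here is the same kind of toy arithmetic. The dollar figures assigned to “pricing” death are pure assumptions for illustration; the takeaway is that the verdict flips entirely depending on how you value the catastrophic outcome, which is part of why the case is so hard to argue concretely.

```python
# A toy expected-value calculation for the button thought experiment.
# The costs assigned to dying are assumptions for illustration only;
# the point is that the verdict depends on how you price the catastrophe,
# not that any of these numbers is "right".

P_DEATH = 0.01          # 1% chance the button kills you
P_PAYOUT = 0.50         # 50% chance it pays out
PAYOUT = 10_000_000     # ten million dollars

def button_ev(cost_of_death: float) -> float:
    """Naive expected dollar value, treating the two outcomes independently."""
    return P_PAYOUT * PAYOUT - P_DEATH * cost_of_death

for cost_of_death in (1_000_000, 100_000_000, 10_000_000_000):
    ev = button_ev(cost_of_death)
    print(f"pricing death at ${cost_of_death:,}: EV = ${ev:+,.0f} -> {'press' if ev > 0 else 'do not press'}")
```

And if you treat dying as effectively non-compensable (an unboundedly large cost), no finite payout makes the expected value positive - which is roughly the shape of the Pascal’s Wager-style framing mentioned earlier.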
More Realistic Threat Models
Debating this point (the possibility/probability that AI threatens all human life) is sort of interesting as a thought experiment, but in the real world it might make more sense to think about more imminent dangers:
- Based on the current state of LLMs (or other AI technologies), what are ill-intentioned humans going to be empowered to do in the next year?
- Projecting likely improvements in the near future (next 2 years), same question
- Given these threats, what engineering mitigations are possible?
- Do these threats indicate we have to look at society-level solutions (policies, regulation)?
- Are existing laws sufficient to cover nefarious use-cases directed by humans?
- What are some likely ways AI will be getting direct access to our physical spaces? Right now, we’re probably mostly thinking about industrial automata and self-driving cars. What’s next?
- Given direct access to physical spaces, what are the dangers to individual humans?
- Are there restrictions on AI autonomy that can be coherently specified and enforceable if necessary?
(etc.)
Strong Opinions/Polarization
These Big Questions seem to get some very definitive Big Answers from some people. There are a few main Big Questions around AI, as I see it:
- Can machines think (now)?
- Will machines be able to think (ever)?
- Will machines be conscious (ever)?
- Do AIs pose an existential threat to humanity?
These are questions that people naturally care strongly about. Empirical evidence for the answers is thin on the ground, so it seems most people start from their intuitive position and then build rational frameworks to bolster it; in other words, the opposite of the scientific method. I think it can be interesting to work through arguments pro and con for propositions in the Big Question space (thought experiments), but it might be a little more productive to branch out from these questions into related questions that are closer to the pragmatic space.
This list leaves out some of the Bigger Questions like “What is consciousness?”, and indeed it’s reasonable to say we can’t answer the question of machine consciousness before we define consciousness itself (and I wish us luck).
The Better the Technology, the Harder It Is to Predict Its Impact
This is my main thought on technology, generally. By “better” I mean “better at creating more change”. I feel like the most important technologies we’ve had so far have been so transformative of people and culture that predictions are meaningless other than in a very limited sense. For example, as electricity usage spread, it was straightforward to predict that people would spend more time inside; it was very difficult, from the pre-electricity point of view, to predict the information technology that came next. And none of these paradigm shifts is isolated from the others. We’re still benefiting from (and suffering under) the effects of our numerous abstractions of language itself (writing, money, moveable type, etc.). So, again with artificial intelligence: as it becomes more powerful and more useful, the implications for the future become more and more opaque. I think if AI development continues, we are looking at potentially a much bigger impact than that of written language, for instance (whose impact we still can’t really quantify).
A Future of Mixed Outcomes
I think we are looking at a combination of impacts from AI ranging from very positive to very negative. It seems likely AI will help accelerate research into genetics and medicine. It also seems likely it will empower people to do some nefarious things. Technology empowers humans, so it depends on the humans, at least in the near future. I think it could be instructive to review previous world-changing technologies (agriculture, writing, printing press, electricity, computers, internet, etc.) and their impacts; this could be interesting to pursue in a future post.
-
¹ “Pascal’s Wager” was French philosopher Blaise Pascal’s proposition that if there is any chance that not believing in God leads to eternal damnation, even if the likelihood is very, very small, the best policy is to go ahead and believe. The enterprising student might see how many logical fallacies they can find in this approach!