Why I Should Work on AI Safety - Part 1: AI Will Surpass Human Intelligence
Is AI Safety really worth focusing on?
Thank you, Neel Nanda, for planting the seed for this post with your blog post describing how you formed your views on AI safety.
Socrates: Why is AI Safety the most important problem to work on?
Aditya: Because if this problem isn’t solved, then humanity could go extinct.
Socrates: How could humanity go extinct?
Aditya: Well, if we were to create an AI that is more intelligent than we are, then it is possible that this AI could become misaligned with human values and then engage in some behavior that severely harms humanity to the point of human extinction.
Socrates: Like what?
Aditya: Well, let’s say we make an AI and ask it to make paper clips. This AI mistakenly comes to believe that making paper clips is the most important goal, and so pursues paper clips at the expense of all other human values. It may start stealing resources to make paper clips, or it may destroy objects and harvest their atoms to make paper clips. And when humans try to stop it, the AI will realize that the humans are trying to turn it off, and it will then try to kill all humans.
Socrates: Wow, that escalated quickly. This all sounds a bit overblown, don’t you think? I feel like you skipped a lot of important steps here.
Aditya: What do you mean?
Socrates: How did the AI go from just making paper clips to suddenly stealing money and vaporizing matter? Can you walk me through the specific steps that might come in between?
Aditya: Hmm, to be honest, I’m not sure what those would be. I know that a lot of smart people think this will happen and I am just guessing that if the AI is truly smarter than us, then my trying to figure out its strategies would be as futile as an ant trying to figure out how an orangutan thinks.
Socrates: Okay, but I’m still not convinced. So essentially, you’re not sure…and you’re telling me you want to spend the next few years on this?
Aditya: Well, I don’t know! To be honest, it just sounds like a cool career path where I look really smart, I get to tell people I’m an “AI Safety Researcher” and I get to make a lot of money 😉.
Socrates: Hmm, I’ve gotta say, those are not the best reasons to go into this field. I’ve heard it’s a lot of hard work. Also, those reasons don’t even answer the main question I asked, which was “Why is AI Safety the most important problem to work on?” Don’t get me wrong, it’s understandable to have a variety of motivations for choosing a career path, and that’s all fine. But if you are going to become an AI safety researcher who wants to produce research that truly moves the field in a beneficial direction, then you need to convince me that the following five things are possible¹:
That AI will someday exceed human intelligence
That AI could become misaligned
That a superintelligent AI could cause human extinction
That we can mitigate the risk of AI becoming misaligned
That you are a good fit to work on this problem
Aditya: Hmm okay…going through this exercise is harder than I thought.
Socrates: As it should be. Claiming that anything is the most important anything is in general a hard claim to support.
Aditya: *sigh* Alright, let me give this a shot. Let’s take this point by point and do our best to construct some sort of informal logical proof. I tried to watch a YouTube video about how formal logic proofs work, but I wasn’t quite getting it. Anyway, let’s start with the most basic claim that anyone would agree with and build our way up. I am going to label each claim with a Greek letter. Why? Because Greek letters look cool 😋.
α: Observable reality is truth.
Example: If I claim that water exists, I can prove this by showing someone a glass of water and they can verify it by confirming that this glass of water appears in their field of vision.
Socrates: Admittedly, even this “ground truth” you are conveying may not hold water *ba dum tss*. However, I understand the purpose of this dialogue is to focus on AI safety and not epistemology so I will assume this is true for now.
Aditya: Phew! I wasn’t sure if you would let that slide, but at least now we have some common ground to stand on. Thanks!
Socrates: Welcome! Anyways, please continue.
Aditya: Okay let’s see, hmm. So I need to somehow get from α to our first item:
It is possible that AI will someday exceed human intelligence.
Socrates: α → 1 is quite the chasm to cross but I wish you all the best.
Aditya: Thanks haha, let’s see how far I get...
Alright, I gave it some thought and formed my argument. First, let me define intelligence. I read the Wikipedia page on the topic, and the following definition resonated with me (though admittedly I like the others too):
β: “Intelligence measures an agent's ability to achieve goals in a wide range of environments.” -Shane Legg & Marcus Hutter
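As an aside, Legg and Hutter go on to formalize this definition in their paper “Universal Intelligence: A Definition of Machine Intelligence.” Roughly (a sketch of their measure, so treat the details as an approximation): an agent’s intelligence is its expected performance summed over all computable environments, with simpler environments weighted more heavily.

```latex
% Sketch of Legg & Hutter's universal intelligence measure:
%   \pi        : the agent
%   E          : the set of computable environments
%   K(\mu)     : Kolmogorov complexity of environment \mu (simpler = smaller)
%   V_\mu^\pi  : the agent's expected total reward in environment \mu
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi
```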
In nature, we see that across all living organisms there is a wide variance in the levels of intelligence. The least intelligent may be some single-celled organisms while the most intelligent would (as far as we know) be humans. As such, we have our next claim:
γ: In reality, we observe a wide variance of intelligence across living organisms.
Does all that make sense so far? This is where we are at:
α: Observable reality is truth.
β: Intelligence measures an agent's ability to achieve goals in a wide range of environments.
γ: We observe a wide variance of intelligence across living organisms.
…
1. It is possible that AI will someday exceed human intelligence.
Socrates: Yes.
Aditya: Great, onto the next point. Natural selection has shown us that species can evolve such that the average intelligence of later generations exceeds that of earlier ones. As such, according to the prior definition, we can think of intelligence as falling along this spectrum:
By the way, the person on the right is Saitama² 😉.
Socrates: Nice reference 😎.
Aditya: Thanks! 🤩 In essence, we can think of this as a spectrum from inanimate objects to god-level beings, with humans falling somewhere in between. As such, it seems quite plausible that there is an enormous space between the point on this spectrum where humans reside and the far right of the spectrum where god-level beings would reside. This brings us to our next claim:
δ: According to our definition of intelligence, there is no reason to believe that humans represent any upper limit. As far as we know, there is no upper bound to intelligence.
So we are now here:
α: Observable reality is truth.
β: Intelligence measures an agent's ability to achieve goals in a wide range of environments.
γ: We observe a wide variance of intelligence across living organisms.
δ: According to our definition of intelligence (β), there is no reason to believe that humans represent an upper limit. As far as we know, there is no upper bound to intelligence.
…
1. It is possible that AI will someday exceed human intelligence.
Any questions or concerns so far?
Socrates: No, so far so good.
Aditya: Gotcha, so onto our next claim:
ε: AI has steadily, and at an accelerating rate, improved its capabilities in a wide variety of domains.
For a thorough treatment of this, I recommend checking out this great article from Richard Ngo. For the sake of simplicity, I will relate one example: written language.
It is estimated that anatomically modern humans first appeared around 195,000 years ago, based on radiometric dating of the oldest known human remains. Now, when did humans first learn to communicate via written language? It seems this was around 6,000 years ago. So it took humans roughly 189,000 years to learn to communicate via written language.
As for AI, it’s hard to pin down when AI was first truly developed, since the boundary between what is and is not AI can be fuzzy. For the sake of simplicity, I’ll just defer to Wikipedia, which states that the first AI programs were written in 1951 to play checkers and chess.
In 2023, an article in the monthly peer-reviewed scientific journal Nature Biomedical Engineering stated:
“‘it is no longer possible to accurately distinguish’ human-written text from text created by large language models”
So in just 72 years, AI has accomplished something that took humanity 189,000 years. As such, we are now here:
α: Observable reality is truth.
β: Intelligence measures an agent's ability to achieve goals in a wide range of environments.
γ: We observe a wide variance of intelligence across living organisms.
δ: According to our definition of intelligence (β), we can think of it as a spectrum and there is no reason to believe that humans represent an upper limit. As far as we know, there is no upper bound to intelligence.
ε: AI has steadily, and at an accelerating rate, improved its capabilities in a wide variety of domains.
…
1. It is possible that AI will someday exceed human intelligence.
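Just to double-check the arithmetic behind that comparison (using the rough estimates cited above):

```python
# Rough timeline comparison, using the estimates cited above.
human_origin_years_ago = 195_000   # earliest anatomically modern humans
writing_years_ago = 6_000          # earliest written language
ai_start_year = 1951               # first checkers/chess programs
llm_milestone_year = 2023          # LLM text indistinguishable from human text

human_span = human_origin_years_ago - writing_years_ago  # years to writing
ai_span = llm_milestone_year - ai_start_year             # years to LLM milestone

print(human_span, ai_span)   # 189000 72
print(human_span / ai_span)  # 2625.0 — AI's timeline is ~2,600x shorter
```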
How does all this look?
Socrates: Good so far.
Aditya: Great, moving along. Now for my last claim:
ζ: It is reasonable to believe that AI’s rate of intelligence improvement will not only continue at the current speed, but also accelerate.
According to the AI Digest’s visual explainer on “How fast is AI improving?”, we see that “Performance usually improves predictably with time and money.” They list out various examples across many domains, including medicine, marketing, math, etc., and show that over time, the bigger and more complex models tend to perform better across a variety of benchmarks.
Furthermore, the Epoch Research Institute, which focuses on trends in AI, has shown that many key inputs, such as the amount of compute used and the money invested in training, continue to increase at accelerating rates.
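As a toy illustration of what sustained exponential growth in these inputs implies (the six-month doubling time here is an assumption for illustration, roughly in line with estimates Epoch has published for training compute in the deep-learning era):

```python
# Toy model: if training compute doubles every 6 months (an assumed,
# illustrative rate), it doubles twice per year.
doublings_per_year = 2
years = 10
growth_factor = 2 ** (doublings_per_year * years)  # 2**20
print(growth_factor)  # 1048576 — about a millionfold increase in a decade
```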
As such, this should bring us to the point of bridging the gap from α → 1:
α: Observable reality is truth.
β: Intelligence measures an agent's ability to achieve goals in a wide range of environments.
γ: We observe a wide variance of intelligence across living organisms.
δ: According to our definition of intelligence (β), we can think of it as a spectrum and there is no reason to believe that humans represent an upper limit. As far as we know, there is no upper bound to intelligence.
ε: AI has steadily, and at an accelerating rate, improved its capabilities in a wide variety of domains.
ζ: It is reasonable to believe that AI’s rate of intelligence improvement will not only continue at the current speed, but also accelerate.
1. It is possible that AI will someday exceed human intelligence.
So that’s it! What do you think?
Socrates: Thank you for laying all that out. I must say this all sounds plausible to me. However, while AI may someday exceed human intelligence, I am aware that many intelligent people out there say this is highly unlikely. Have you taken the time to read any of their counterarguments?
Aditya: Well, not really 😅.
Socrates: I see. Well, how about we do that?
Aditya: Alright, let’s do it! But this post is already getting quite long, so maybe let’s focus the next post on that?
Socrates: Sounds like a plan!
Aditya 🤝 Socrates
That’s all for Part 1! Thank you all for reading this post, it means a lot 😀. I am currently envisioning several more parts to this series, maybe 9 more. I was thinking that as I have 5 key claims, I could write out 5 posts outlining my arguments for why they are true and 5 posts outlining counterarguments.
Also, feel free to let me know if you have any feedback on my writing! As my goal is to become an AI safety researcher someday, I will presumably be doing a lot of writing so if you have any feedback at all, especially in terms of how I can write with more clarity, then please let me know!
Thanks again for reading this post! 😄
1. To be precise, when I say “possible,” I mean that the probability is greater than 5%. Why that threshold? I believe that the stakes of this problem are so high (i.e., human extinction) that even if the following claims have only a small chance of being true, it is still worth devoting some human effort to this problem.
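To make the expected-value intuition behind this threshold concrete, here is a tiny sketch (the numbers are purely illustrative, not a real risk estimate):

```python
# Illustrative expected-value calculation: even a "small" probability
# multiplied by extinction-level stakes yields an enormous expected loss.
p_risk = 0.05              # the "possible" threshold used in this post
lives_at_stake = 8e9       # roughly the current world population
expected_lives_lost = p_risk * lives_at_stake
print(expected_lives_lost)  # 400000000.0 — 400 million lives in expectation
```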
2. Saitama is the main character of my favorite anime, “One-Punch Man.” In the show, he is considered to be so strong that he can defeat any enemy with just one punch! He essentially has god-level strength.