AI Governance and Safety Institute
Our Mission
We aim to improve the institutional response to risks from future artificial intelligence systems and to ensure that the benefits of AI are realized. We conduct research and outreach, and we develop educational materials for stakeholders and the general public.
Get Involved
The challenges of AI governance and safety require a collaborative effort. Join our community of researchers, policymakers, and concerned citizens, or support our work.
Key Concepts
Machine Learning and Interpretability
Developing the science of understanding AI. Traditional computer programs are human-written instructions for machines to follow; modern AI systems are instead networks of trillions of numbers. The algorithms these numbers represent are found by the training process itself and are not designed, controlled, or understood by humans.
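To make the contrast concrete, here is a minimal, purely illustrative Python sketch (the tiny model and example data are invented for this page, not taken from any real system): a hand-written rule is a readable instruction, while a learned model is just an array of numbers produced by fitting data.

```python
import numpy as np

# A traditional program: a human wrote this rule, and anyone can read it.
def is_even(n: int) -> bool:
    return n % 2 == 0

# A (toy) learned model: the "program" is just an array of numbers.
# Here we fit a tiny linear model y = w*x + b to example data; modern AI
# systems are similar in spirit, but with trillions of such numbers.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs + 1.0                            # the pattern to be discovered
A = np.stack([xs, np.ones_like(xs)], axis=1)   # design matrix for least squares
w, b = np.linalg.lstsq(A, ys, rcond=None)[0]   # parameters found by the computer

print("human-written rule:", is_even(4))       # behaviour follows from readable code
print("learned parameters:", w, b)             # behaviour follows from opaque numbers
print("model prediction for x=5:", w * 5 + b)
```

With only two learned numbers you can still read the rule off them (w ≈ 2, b ≈ 1); with trillions of numbers arranged in deep networks, nobody can. That gap is what interpretability research aims to close.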
Outer Alignment
Ensuring that the specified goals for a smarter-than-human AI system fully capture human values.
Inner Alignment
Making sure the AI system actually pursues the goals we specify rather than misaligned objectives of its own. AI developers know how to use large amounts of compute to make AI systems generally better at achieving goals, but they do not know how to reliably influence which goals those systems end up pursuing, especially as the systems reach or exceed human-level capability.
Convergent Instrumental Subgoals
Understanding how advanced AI systems might develop certain subgoals (like self-preservation or resource acquisition) regardless of their final goals, and how to use powerful AI systems safely despite these drives.
Advanced Agents
Ensuring the safety of superhuman AI, or preventing anyone from developing superhuman AI until it can be done safely and in alignment with human values. If monkeys want one thing and humans want another, and humans do not particularly care about monkeys, humans usually get what they want even when monkeys do not. Likewise, an AI system that is better than humans at achieving its goals, and that does not care about humans at all, would get what it wants even if humans do not. We should avoid building superhuman systems that are misaligned with human values.
Expert Warnings
Geoffrey Hinton, who won the 2024 Nobel Prize in Physics for his foundational work on neural networks, and many other leading academics and industry researchers have expressed serious concerns about the future of artificial intelligence. Key reasons:
Modern AI vs Traditional Software
- Traditional computer programs: Human-written instructions that machines follow
- Modern AI systems: Complex networks of trillions of numbers, created through machine learning
- We can see the numbers, but we have little idea what they mean or what algorithms they implement
- These systems are not directly designed or controlled by humans
Future AI Could Be to Humans What Humans Are to Monkeys
- When human and monkey interests conflict, humans usually prevail
- This isn't because monkeys are weak, but because humans are more capable
- Similarly, a highly capable AI system that doesn't share human values could pursue its goals regardless of human interests
- Just as monkeys can't meaningfully influence human decisions, humans might struggle to control superintelligent AI