The development and deployment of AI create many risks: [[Existential AI risks]], [[Non-existential AI risks]]
AI safety spans three paths that combat these risks:
- [[Technical AI safety]]
- [[AI governance]]: getting the conditions and processes right
- [[Societal resilience]]
# [[Technical AI safety]]
### Alignment
- Outer alignment: the objective we specify captures the goals we intend (toy proxy-reward example below)
- Inner alignment: the trained system actually pursues that objective, without developing deceptive or dangerous instrumental goals
- [[AI character]]
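A toy illustration of an outer alignment failure (Goodhart's law): we optimize a misspecified proxy reward that only partly matches the intended utility, and the proxy-optimal action scores poorly on what we actually wanted. All functions and numbers here are invented for illustration.

```python
# Toy outer-misalignment demo: optimizing a misspecified proxy reward.
import numpy as np

rng = np.random.default_rng(0)
actions = rng.normal(size=(1000, 2))  # candidate actions as 2-D points

def true_utility(a):
    # What we actually care about: first coordinate near 1.0.
    return -np.abs(a[:, 0] - 1.0)

def proxy_reward(a):
    # Misspecified proxy: also rewards the second coordinate,
    # which we never intended to matter.
    return -np.abs(a[:, 0] - 1.0) + 2.0 * a[:, 1]

best = actions[np.argmax(proxy_reward(actions))]
print("proxy-optimal action:", best)
print("its true utility:", -abs(best[0] - 1.0))  # typically far below optimal
```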
### Control
- Corrigibility: allows itself to be corrected and shut down (control-loop sketch below)
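A minimal sketch of what control plus corrigibility could look like operationally: an untrusted model's proposed actions pass through a trusted monitor, and a shutdown request halts the loop unconditionally. `untrusted_model`, `monitor`, and the threshold are hypothetical stand-ins, not any lab's actual protocol.

```python
# Sketch of a control loop with a corrigible shutdown check.
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    risk_score: float  # 0..1, assigned by a trusted monitor

SHUTDOWN_REQUESTED = False  # set by a human operator
RISK_THRESHOLD = 0.5

def untrusted_model(task: str) -> Action:
    # Stand-in for the powerful model whose outputs we don't fully trust.
    return Action(description=f"do {task}", risk_score=0.2)

def monitor(action: Action) -> bool:
    # Stand-in for a trusted (weaker) model auditing the proposal.
    return action.risk_score < RISK_THRESHOLD

def step(task: str) -> Action | None:
    if SHUTDOWN_REQUESTED:       # corrigibility: halt unconditionally
        return None
    action = untrusted_model(task)
    if not monitor(action):      # control: block and escalate to humans
        return None
    return action

print(step("summarize logs"))
```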
### Interpretability
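One concrete interpretability technique is the linear probe: train a simple classifier on a model's hidden activations to test whether a concept is linearly represented there. The activations below are synthetic with a planted concept direction, so this only shows the mechanics.

```python
# Linear probe on (synthetic) activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 500, 64
concept = rng.integers(0, 2, size=n)   # binary concept label per example
direction = rng.normal(size=d)         # planted "concept direction"
acts = rng.normal(size=(n, d)) + np.outer(concept, direction)

probe = LogisticRegression(max_iter=1000).fit(acts, concept)
print("probe accuracy:", probe.score(acts, concept))  # ~1.0 on planted signal
```

High probe accuracy suggests (but does not prove) that the concept is linearly readable from that layer.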
### Provable safety
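Provable safety aims for machine-checked guarantees rather than empirical testing. A toy flavor of the idea in Lean 4, proving that a clipped controller output can never exceed its safety bound; a real safety case would verify far richer properties.

```lean
-- Toy machine-checked invariant (Lean 4): clipping respects the bound.
def clip (x bound : Nat) : Nat := min x bound

theorem clip_le_bound (x bound : Nat) : clip x bound ≤ bound :=
  Nat.min_le_right x bound
```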
# [[AI governance]]
### Coordination
- Defuse the race to be first to build ASI through increasing levels of coordination
- [International AGI project](https://www.forethought.org/research/the-international-agi-project-series)
### Evals can block deployments
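A sketch of an eval gate: deployment proceeds only if every dangerous-capability eval scores below a preset threshold. Eval names and limits here are invented placeholders.

```python
# Deployment gate keyed on dangerous-capability eval scores (0..1).
EVAL_THRESHOLDS = {
    "bioweapon_uplift": 0.10,
    "autonomous_replication": 0.05,
}

def may_deploy(eval_scores: dict[str, float]) -> bool:
    # Missing scores default to 1.0, i.e. fail closed.
    return all(eval_scores.get(name, 1.0) <= limit
               for name, limit in EVAL_THRESHOLDS.items())

print(may_deploy({"bioweapon_uplift": 0.03, "autonomous_replication": 0.01}))
```

Failing closed on missing evals is the safety-relevant design choice: an unmeasured capability blocks deployment by default.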
### [[Compute governance]]
- [[Preventing open weight releases of frontier models]]
- [[KYC for compute providers]]
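Many compute governance proposals key obligations to a training-compute threshold; the 2023 US executive order used 10^26 operations. A back-of-envelope check, using the standard ≈6·N·D FLOP estimate for transformer training (the run size below is invented):

```python
# Compute-threshold check for reporting/KYC-style obligations.
REPORTING_THRESHOLD_FLOP = 1e26  # mirrors the 2023 US executive order

def training_flop(params: float, tokens: float) -> float:
    return 6 * params * tokens   # standard ~6ND estimate for transformers

run = training_flop(params=2e12, tokens=1e13)  # hypothetical frontier run
status = "report required" if run >= REPORTING_THRESHOLD_FLOP else "below threshold"
print(f"{run:.2e} FLOP -> {status}")
```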
### Slowing down
- Terrorist attacks on data centers
- Activism
- Advocacy
# [[Societal resilience]]
### [[Defensive acceleration]]
- Biosecurity
    - Make [[Cloud labs]] harder to use
    - DNA synthesis screening (toy sketch after this list)
- Cybersecurity
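A toy sketch of DNA synthesis order screening: flag any order containing a subsequence from a hazard list. Real screening systems match against curated, controlled databases and use homology search rather than exact substrings; the signatures below are made up.

```python
# Toy DNA synthesis screening via exact k-mer matching.
HAZARD_SIGNATURES = {"ATGCGTACGTTAGC", "GGCCTTAAGGCCTA"}  # fabricated 14-mers
K = 14

def screen_order(sequence: str) -> bool:
    """Return True if the order should be held for human review."""
    kmers = {sequence[i:i + K] for i in range(len(sequence) - K + 1)}
    return not kmers.isdisjoint(HAZARD_SIGNATURES)

print(screen_order("AAAA" + "ATGCGTACGTTAGC" + "TTTT"))  # True -> flagged
```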
### Stop bad actors
- State-level investigations into dangerous activity patterns
### [[Symbiosis with AI may keep humans relevant post-AGI|Human-AI symbiosis]]
### Epistemic security
### Hardening infrastructure
### Failsafe autonomy
[^1][^2]
[^1]: “AI Governance Needs a Theory of Victory | Convergence Analysis.” n.d. Accessed February 24, 2026. [https://www.convergenceanalysis.org/publications/ai-governance-needs-a-theory-of-victory](https://www.convergenceanalysis.org/publications/ai-governance-needs-a-theory-of-victory). [[AIGovernanceNeeds|Annotations]]
[^2]: ryan_greenblatt. 2025. _Plans A, B, C, and D for Misalignment Risk_. LessWrong, October 8. [https://www.lesswrong.com/posts/E8n93nnEaFeXTbHn5/plans-a-b-c-and-d-for-misalignment-risk](https://www.lesswrong.com/posts/E8n93nnEaFeXTbHn5/plans-a-b-c-and-d-for-misalignment-risk). [[ryan_greenblattPlansMisalignmentRisk2025|Annotations]]