The development and deployment of AI create many risks: [[Existential AI risks]], [[Non-existential AI risks]]
AI safety spans three paths that combat these risks:
- [[Technical AI safety]]
- [[AI governance]]: getting the conditions and processes right
- [[Societal resilience]]
# [[Technical AI safety]]
### Alignment
- Outer alignment: the objective we specify captures the goals we intend (toy proxy-reward example below)
- Inner alignment: the trained system actually pursues that objective, without developing deceptive or dangerous instrumental goals
- [[AI character]]
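A toy illustration of an outer alignment failure (Goodhart's law): we optimize a misspecified proxy reward that only partly matches the intended utility, and the proxy-optimal action scores poorly on what we actually wanted. All functions and numbers here are invented for illustration.

```python
# Toy outer-misalignment demo: optimizing a misspecified proxy reward.
import numpy as np

rng = np.random.default_rng(0)
actions = rng.normal(size=(1000, 2))  # candidate actions as 2-D points

def true_utility(a):
    # What we actually care about: first coordinate near 1.0.
    return -np.abs(a[:, 0] - 1.0)

def proxy_reward(a):
    # Misspecified proxy: also rewards the second coordinate,
    # which we never intended to matter.
    return -np.abs(a[:, 0] - 1.0) + 2.0 * a[:, 1]

best = actions[np.argmax(proxy_reward(actions))]
print("proxy-optimal action:", best)
print("its true utility:", -abs(best[0] - 1.0))  # typically far below optimal
```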
### Control
- Corrigibility: allows itself to be corrected and shut down (control-loop sketch below)
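A minimal sketch of what control plus corrigibility could look like operationally: an untrusted model's proposed actions pass through a trusted monitor, and a shutdown request halts the loop unconditionally. `untrusted_model`, `monitor`, and the threshold are hypothetical stand-ins, not any lab's actual protocol.

```python
# Sketch of a control loop with a corrigible shutdown check.
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    risk_score: float  # 0..1, assigned by a trusted monitor

SHUTDOWN_REQUESTED = False  # set by a human operator
RISK_THRESHOLD = 0.5

def untrusted_model(task: str) -> Action:
    # Stand-in for the powerful model whose outputs we don't fully trust.
    return Action(description=f"do {task}", risk_score=0.2)

def monitor(action: Action) -> bool:
    # Stand-in for a trusted (weaker) model auditing the proposal.
    return action.risk_score < RISK_THRESHOLD

def step(task: str) -> Action | None:
    if SHUTDOWN_REQUESTED:       # corrigibility: halt unconditionally
        return None
    action = untrusted_model(task)
    if not monitor(action):      # control: block and escalate to humans
        return None
    return action

print(step("summarize logs"))
```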
### Interpretability
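One concrete interpretability technique is the linear probe: train a simple classifier on a model's hidden activations to test whether a concept is linearly represented there. The activations below are synthetic with a planted concept direction, so this only shows the mechanics.

```python
# Linear probe on (synthetic) activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 500, 64
concept = rng.integers(0, 2, size=n)   # binary concept label per example
direction = rng.normal(size=d)         # planted "concept direction"
acts = rng.normal(size=(n, d)) + np.outer(concept, direction)

probe = LogisticRegression(max_iter=1000).fit(acts, concept)
print("probe accuracy:", probe.score(acts, concept))  # ~1.0 on planted signal
```

High probe accuracy suggests (but does not prove) that the concept is linearly readable from that layer.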
### Provable safety
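Provable safety aims for machine-checked guarantees rather than empirical testing. A toy flavor of the idea in Lean 4, proving that a clipped controller output can never exceed its safety bound; a real safety case would verify far richer properties.

```lean
-- Toy machine-checked invariant (Lean 4): clipping respects the bound.
def clip (x bound : Nat) : Nat := min x bound

theorem clip_le_bound (x bound : Nat) : clip x bound ≤ bound :=
  Nat.min_le_right x bound
```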
# [[AI governance]]
### Coordination
- Defuse the race to be first to build ASI through increasing levels of coordination
- [International AGI project](https://www.forethought.org/research/the-international-agi-project-series)
### Evals can block deployments
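A sketch of an eval gate: deployment proceeds only if every dangerous-capability eval scores below a preset threshold. Eval names and limits here are invented placeholders.

```python
# Deployment gate keyed on dangerous-capability eval scores (0..1).
EVAL_THRESHOLDS = {
    "bioweapon_uplift": 0.10,
    "autonomous_replication": 0.05,
}

def may_deploy(eval_scores: dict[str, float]) -> bool:
    # Missing scores default to 1.0, i.e. fail closed.
    return all(eval_scores.get(name, 1.0) <= limit
               for name, limit in EVAL_THRESHOLDS.items())

print(may_deploy({"bioweapon_uplift": 0.03, "autonomous_replication": 0.01}))
```

Failing closed on missing evals is the safety-relevant design choice: an unmeasured capability blocks deployment by default.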
### [[Compute governance]]
- [[Preventing open weight releases of frontier models]]
- [[KYC for compute providers]]
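Many compute governance proposals key obligations to a training-compute threshold; the 2023 US executive order used 10^26 operations. A back-of-envelope check, using the standard ≈6·N·D FLOP estimate for transformer training (the run size below is invented):

```python
# Compute-threshold check for reporting/KYC-style obligations.
REPORTING_THRESHOLD_FLOP = 1e26  # mirrors the 2023 US executive order

def training_flop(params: float, tokens: float) -> float:
    return 6 * params * tokens   # standard ~6ND estimate for transformers

run = training_flop(params=2e12, tokens=1e13)  # hypothetical frontier run
status = "report required" if run >= REPORTING_THRESHOLD_FLOP else "below threshold"
print(f"{run:.2e} FLOP -> {status}")
```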
### Slowing down
- Terrorist attacks on data centers
- Activism
- Advocacy
# [[Societal resilience]]
### [[Defensive acceleration]]
- Biosecurity
    - Make [[Cloud labs]] harder to use
    - DNA synthesis screening (toy sketch after this list)
- Cybersecurity
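A toy sketch of DNA synthesis order screening: flag any order containing a subsequence from a hazard list. Real screening systems match against curated, controlled databases and use homology search rather than exact substrings; the signatures below are made up.

```python
# Toy DNA synthesis screening via exact k-mer matching.
HAZARD_SIGNATURES = {"ATGCGTACGTTAGC", "GGCCTTAAGGCCTA"}  # fabricated 14-mers
K = 14

def screen_order(sequence: str) -> bool:
    """Return True if the order should be held for human review."""
    kmers = {sequence[i:i + K] for i in range(len(sequence) - K + 1)}
    return not kmers.isdisjoint(HAZARD_SIGNATURES)

print(screen_order("AAAA" + "ATGCGTACGTTAGC" + "TTTT"))  # True -> flagged
```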
### Stop bad actors
- State-level investigations into dangerous activity patterns
### [[Symbiosis with AI may keep humans relevant post-AGI|Human-AI symbiosis]]
### Epistemic security
### Hardening infrastructure
### Failsafe autonomy
[^1][^2]
[^1]: “AI Governance Needs a Theory of Victory | Convergence Analysis.” n.d. Accessed February 24, 2026. [https://www.convergenceanalysis.org/publications/ai-governance-needs-a-theory-of-victory](https://www.convergenceanalysis.org/publications/ai-governance-needs-a-theory-of-victory). [[AIGovernanceNeeds|Annotations]]
[^2]: ryan_greenblatt. 2025. _Plans A, B, C, and D for Misalignment Risk_. LessWrong, October 8. [https://www.lesswrong.com/posts/E8n93nnEaFeXTbHn5/plans-a-b-c-and-d-for-misalignment-risk](https://www.lesswrong.com/posts/E8n93nnEaFeXTbHn5/plans-a-b-c-and-d-for-misalignment-risk). [[ryan_greenblattPlansMisalignmentRisk2025|Annotations]]