Alignment Theory - Concepedia

About

Alignment theory is a research field dedicated to developing theoretical frameworks and practical methods for ensuring artificial intelligence systems reliably pursue intended human goals, values, and ethical principles. It investigates the technical and philosophical challenges of aligning increasingly capable AI agents with human interests to prevent unintended or harmful outcomes and ensure beneficial deployment.

Top Publications

Rankings shown are based on citation count.

The Alignment Problem: Machine Learning and Human Values	2021
Embedding Values in Artificial Intelligence (AI) Systems	2020
Development and validation of the AI attitude scale (AIAS-4): a brief measure of general attitude toward artificial intelligence	2023
Aligning Knowledge Assets for Exploitation, Exploration, and Ambidexterity: A Study of Companies in High‐Tech Parks in China	2016
Aligning AI With Shared Human Values	2020