Purpose Limitation for AI
The risk of secondary misuse of trained AI models or anonymised training data constitutes one of the most significant regulatory gaps in AI governance to date. Purpose Limitation for AI is our ethical and legal proposal for improved regulation. We advocate updating the established – but, in the context of AI, ineffective – data protection principle of purpose limitation for the specific context of AI models and anonymous training data.
Purpose Limitation in Data Protection
Purpose Limitation is a core principle of data protection law, defined in the GDPR (art. 5 (1b), 6 (4)) and rooted in the European Charter of Fundamental Rights (art. 8 (2)). Purpose limitation requires data controllers to define the purpose of data collection no later than at the time of collecting personal data. They are prohibited from processing the data in ways that are incompatible with the initially stated purpose (cf. art. 6 (4) GDPR). These purposes must be specific, explicit, and legitimate, clearly outlining the aims and objectives of the data processing. Fundamentally, the purpose limitation principle aims to protect data subjects and to ensure that subsequent data processing remains controllable and compliant with data protection law.
The purpose limitation principle in the GDPR is of limited effect in the context of AI. This is because AI models can be trained from anonymised training data, and in many relevant cases, the model data itself might be anonymous data. As the GDPR does not apply to the processing of anonymous data, there is a regulatory gap.
Updating Purpose Limitation for AI
In our interdisciplinary work, which combines critical AI ethics and legal studies, we develop the concept of Purpose Limitation for AI as an ethical and legal framework to govern the purposes for which trained models (and training datasets) may be used and reused.
The concept comes in two variations, one for trained models and one for anonymised training data:
Purpose Limitation for Models: A machine learning model shall only be trained, used and transferred for purposes aligned with those for which the training data was collected.
Purpose Limitation for Training Data: An actor shall only train a machine learning model from a data set if the purposes for which the data set was originally collected are compatible with the purposes for which the model is being trained.
Why is Purpose Limitation for AI important?
We argue that the possession of trained models lies at the core of an increasing asymmetry of informational power between AI companies and society. The production of predictive and generative AI models is the most recent manifestation of this informational power asymmetry between data-processing organisations (predominantly large corporations) and society. Limiting this power asymmetry must be a key objective of regulation, and Purpose Limitation for Models could be a crucial step in that direction.
Without public oversight of the purposes for which existing AI models can be reused in new contexts, this power asymmetry poses significant risks to individuals and society, including discrimination, unfair treatment, and exploitation of vulnerabilities (e.g., risks of medical conditions being implicitly estimated in job applicant screening). Our proposed purpose limitation for AI models aims to establish accountability and effective oversight, and to mitigate the collective harms arising from this regulatory gap.