
Accountability Is A Design Property: Assurance and Liability in High-Stakes AI
Policy debates about AI often start where the damage ends: with liability after harm. Liability matters, but it is not enough on its own. When AI systems shape access to healthcare, liberty, employment, or essential public services, accountability is not only a governance preference but also a prerequisite for protecting rights and enabling effective remedy. Otherwise, people’s experience of AI systems ends up as denial without explanation, bias without recourse, and appeals processes that cannot surface what actually happened.
In high-stakes AI, accountability is only assignable when the system is designed for it: clear decision rights, auditable evidence, and control points that still work under stress. In other words, accountability is a design property, and liability regimes only function when that property exists.
We argue that four concepts help make accountability a design property in high-stakes AI, especially in domains such as healthcare and defence.
1) Accept intrinsic complexity, eliminate accidental complexity, and assign responsibility accordingly
High-stakes AI systems are complex because the domains are complex. No policy regime can simplify away domain uncertainty, attack-driven model behaviour, or time pressure. That “irreducible mess” is intrinsic complexity.
What policy can reduce is accidental complexity: the complexity we create ourselves through ambiguous handoffs, undocumented integrations, unclear oversight, and brittle interfaces between organisations. Accidental complexity is where accountability usually fails, because responsibility becomes implicit and evidence evaporates in the gaps. Concretely, that means requiring explicit responsibility maps across suppliers, deployers, integrators, and operators, and ensuring those maps match the technical reality of who can change what.
This aligns with lifecycle-wide, socio-technical approaches to AI risk such as the NIST AI Risk Management Framework, which treats governance as distributed across actors rather than concentrated at the end user or the frontline operator. It also reflects the approach taken in military aviation safety: a recent Military Aviation Authority (MAA) publication references key cross-industry guidance, all of which points to a whole-system approach to delivering assurable capability.
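To make the idea of a responsibility map concrete, here is a minimal sketch in code. The organisations, components, and field names are illustrative assumptions rather than terms from any standard; the point is only that a map of this kind can be checked mechanically against who actually holds change access.

```python
# Minimal sketch of a machine-readable responsibility map.
# Organisations, components, and field names are hypothetical assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Responsibility:
    component: str    # e.g. "triage model weights", "workflow integration"
    accountable: str  # organisation that answers for this component's behaviour
    can_change: str   # organisation that can actually modify it

RESPONSIBILITY_MAP = [
    Responsibility("triage model weights", accountable="Supplier Ltd", can_change="Supplier Ltd"),
    Responsibility("clinical workflow integration", accountable="Hospital Trust", can_change="Integrator Ltd"),
    Responsibility("override and escalation policy", accountable="Hospital Trust", can_change="Hospital Trust"),
]

def control_gaps(mapping):
    """Flag entries where the accountable organisation does not itself hold
    change control, i.e. where responsibility and technical reality have
    drifted apart unless an explicit agreement with the controlling party exists."""
    return [r for r in mapping if r.accountable != r.can_change]

for gap in control_gaps(RESPONSIBILITY_MAP):
    print(f"Check: {gap.accountable} is accountable for '{gap.component}' "
          f"but only {gap.can_change} can change it.")
```

In practice such a map would sit alongside contracts and access-control records, but even this toy version shows how a mismatch between assigned accountability and technical control can be surfaced automatically rather than discovered after an incident.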
2) Make trust justifiable through argument: dynamic assurance cases, not compliance theatre
Most “AI governance” fails because it substitutes documentation for justification. Trust can only be justified by argument, combining evidence into a coherent case that a system is acceptably safe and effective for a specific context.
Testing alone is not an argument. A risk register alone is not an argument. A pile of policies is not an argument. Policy should push organisations toward assurance cases that shape architecture and operations, and which update as models, workflows, and environments change.
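As a sketch of the difference between documentation and argument, an assurance case can be represented as a claim tree in which every leaf claim must cite current evidence. The structure below is a deliberately simplified, hypothetical example, loosely in the spirit of goal-structured assurance cases; the claims, evidence names, and 90-day freshness rule are assumptions, not a prescribed method.

```python
# Simplified sketch of an assurance case: a top-level claim supported by
# sub-claims, each of which must cite reasonably fresh evidence.
# Claims, evidence names, and the freshness threshold are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Evidence:
    name: str
    produced_on: date

@dataclass
class Claim:
    statement: str
    evidence: list = field(default_factory=list)
    subclaims: list = field(default_factory=list)

    def unsupported(self, max_age=timedelta(days=90)):
        """Return leaf claims with no evidence, or only stale evidence.
        A claim without current evidence is assertion, not argument."""
        if not self.subclaims:
            fresh = [e for e in self.evidence if date.today() - e.produced_on <= max_age]
            return [] if fresh else [self]
        return [c for sub in self.subclaims for c in sub.unsupported(max_age)]

case = Claim(
    "The triage model is acceptably safe and effective in this emergency department",
    subclaims=[
        Claim("Performance holds on the local patient population",
              evidence=[Evidence("local validation report", date(2025, 1, 10))]),
        Claim("Clinicians can detect and override erroneous outputs",
              evidence=[]),  # no evidence yet: this branch of the case is open
    ],
)

print([c.statement for c in case.unsupported()])
```

The useful property is not the code but the discipline it encodes: when a model is retrained or a workflow changes, evidence ages out and the case visibly reopens, which is what "dynamic" assurance means here.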
The challenge we face is implementing this effectively and consistently. In healthcare, for example, the NHS clinical safety standards DCB0129 and DCB0160 do require a clear division of accountability between manufacturers and deploying health organisations, but their use is inconsistent at best. Recent research found that only 25.6% of digital healthcare technologies deployed in NHS organisations were fully assured.
3) Stop using “human-in-the-loop” to launder accountability
In high-stakes settings, humans are part of the system, not an external failsafe. Oversight becomes meaningless when humans lack time, training, authority, or visibility into system behaviour. When that happens, the human becomes the default liability holder while lacking real control.
Policy should define meaningful oversight operationally. “A human was involved” is not enough. The question is whether the human could competently, reliably, and authoritatively change the outcome.
This aligns with the EU AI Act's requirement to assign oversight to suitably competent people with authority and support. However, it is challenging to implement in practice: the key difficulties in effective oversight have not changed significantly since the advent of programmable control-system automation in the 1970s.
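One hypothetical way to operationalise that test is to record, for each decision point, whether the reviewer had the time, visibility, training, and authority to change the outcome, and whether overrides actually happen. The fields and thresholds below are illustrative assumptions, not requirements drawn from the EU AI Act or any regulator.

```python
# Illustrative sketch: an operational check on whether human oversight was
# meaningful at a given decision point. Fields and thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class OversightRecord:
    seconds_available: float        # time the reviewer had before the decision took effect
    saw_model_inputs: bool          # could the reviewer see what the system saw?
    trained_on_failure_modes: bool  # has the reviewer been trained on known failure modes?
    can_override: bool              # does the reviewer have the authority and means to change the outcome?
    override_rate: float            # share of recent cases where reviewers actually overrode the system

def oversight_is_meaningful(r: OversightRecord,
                            min_seconds: float = 30.0,
                            min_override_rate: float = 0.01) -> bool:
    """A human 'in the loop' counts as oversight only if they could competently,
    reliably, and authoritatively change the outcome. A near-zero override rate
    is treated here as a warning sign of rubber-stamping."""
    return (r.seconds_available >= min_seconds
            and r.saw_model_inputs
            and r.trained_on_failure_modes
            and r.can_override
            and r.override_rate >= min_override_rate)

record = OversightRecord(seconds_available=4.0, saw_model_inputs=False,
                         trained_on_failure_modes=True, can_override=True,
                         override_rate=0.0)
print(oversight_is_meaningful(record))  # False: the human is a liability sink, not a control
```

A near-zero override rate is not proof of failure on its own, but it is the kind of signal that distinguishes engineered oversight from a human positioned mainly to absorb liability.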
4) Incentives, evidence, and remedy: making accountability enforceable
High-quality assurance happens when incentives reward it and failures incur costs, not because organisations are virtuous. There are powerful levers:
- Procurement can mandate technical compliance.
- Insurers can price governance maturity.
- Regulators can treat missing evidence as a presumption against the operator or supplier.
But the missing element is often credible redress. Litigation is frequently inaccessible for individuals harmed by AI-mediated decisions. In employment law, we created tribunal-like routes because of the power asymmetry between employers and employees. High-stakes AI creates similar asymmetries: disclosure is difficult, expertise is scarce, and the harmed party is rarely able to establish what happened.
Policy should consider specialised, accessible mechanisms for AI harms that include disclosure powers, independent technical expertise, timelines fast enough to matter, and remedies beyond damages, including correction, cessation, and mandated monitoring.
This becomes more urgent given that attempts to create a neat, harmonised ex post liability approach in Europe have not produced a settled solution. That reinforces the central point: we cannot rely on liability alone to close the accountability gap.
Summary
In high-stakes settings like healthcare and defence, accountability is not a slogan, and liability cannot be the only backstop. Accountability is a system property produced by outcome-based governance, engineered oversight, resilient architecture, and dynamic assurance across the lifecycle. Implemented well, these controls produce systems that are not only fair, just, and responsible, but also higher-performing and less behaviourally uncertain.
By Callum Cockburn