Government Accountability Office 21-519SP

GAO-21-519SP is an accountability framework for Artificial Intelligence:

To help managers ensure accountability and responsible use of artificial intelligence (AI) in government programs and processes, GAO developed an AI accountability framework. This framework is organized around four complementary principles, which address governance, data, performance, and monitoring. For each principle, the framework describes key practices for federal agencies and other entities that are considering, selecting, and implementing AI systems. Each practice includes a set of questions for entities, auditors, and third-party assessors to consider, as well as procedures for auditors and third-party assessors.

Principle 1: Governance

Principle 1 frames risk minimisation in AI systems as a management-led process spread across organisational and system levels. These processes typically emphasise engagement between teams with a high degree of mechanical autonomy, guided by appropriate documentation.

At the same time, many audit procedures in Principle 1 require specificity, measurability and repeatability in the mechanical application of those high-level designs.

Extract, Page 31:

"This documentation process begins in the machine learning system design and set up stage, including system framing and high-level objective design".

Extract, Page 32:

"Review goals and objectives to each AI system to assess whether they are specific, measurable and ... clearly define what is to be achieved, who is to achieve it, how it will be achieved, and the time frames"

The tension between these two 'levels' of framing can frustrate the audit process.

Governance 1.1 - Clear Goals

Define clear goals and objectives for the AI system to ensure intended outcomes are achieved

The design and implementation plan of an AI system should naturally give rise to a suite of acceptance tests that treat the system as a black box. Those tests should initially fail, and the project should not be considered 'complete' until those tests pass (are accepted).

Say our implementation is using a neural net:


import numpy as np

# Type aliases for readability; np.ndarray is the concrete array type.
VectorType = np.ndarray
TensorType = np.ndarray

class ANN:
    """A neural network treated as a black box by the acceptance tests."""

    def forward(self, X: TensorType) -> VectorType:
        """
        Operates on an input to generate the intended output.
        """
        ...

def test_accuracy():
    """
    The model should be accurate on the test set.
    """
    ...

def test_real_time():
    """
    The model should meet performance requirements for a 'real time' system.
    """
    ...

Acceptance tests encode our goals and objectives and also tell us about the purpose of the system. Remember, engineers will optimise towards these tests, so it is important to assess whether the tests remain reliable when over-optimised to the extreme.
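As a made-up illustration of how an acceptance test can break down under over-optimisation: on a heavily imbalanced test set, a degenerate model that always predicts the majority class can pass a naive accuracy threshold while doing nothing useful.

# Illustrative only: 99% of labels are 0, so "always predict 0" scores 99% accuracy
# and passes a naive "accuracy > 0.95" acceptance test.
import numpy as np

y_true = np.array([0] * 99 + [1])   # heavily imbalanced test labels
y_pred = np.zeros_like(y_true)      # degenerate "always predict 0" model

accuracy = (y_true == y_pred).mean()
assert accuracy > 0.95              # the naive acceptance test passes
print(f"accuracy = {accuracy:.2f}") # 0.99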

Decoded.AI can help frame acceptance tests in language familiar to GAO-21-519SP as well as unpack how metrics are computed so that their implications are better understood.

We start by adding some instrumentation to the code that helps us measure and understand the computation. For example, functions like test_accuracy are instrumented as acceptance tests based on the test_ prefix. When the code runs, our system collects that information and transforms it into micro-frontends like the one above.
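As a rough sketch of how that prefix-based discovery might work (the collect_acceptance_tests helper below is hypothetical, not Decoded.AI's actual API), the idea is simply to walk a module and pair each test_-prefixed function with its docstring:

# Hypothetical sketch of prefix-based discovery; not Decoded.AI's actual API.
import inspect
import types

def collect_acceptance_tests(module: types.ModuleType) -> dict[str, str]:
    """Map each test_-prefixed function name to its docstring (the stated objective)."""
    tests = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("test_"):
            tests[name] = inspect.getdoc(obj) or ""
    return tests

Run against the module above, this would yield entries like {"test_accuracy": "The model should be accurate on the test set."}.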

Here are five of the questions to consider:

What goals and objectives does the entity expect to achieve by designing, developing, and/or deploying the AI system?

The goal is to build a model that is accurate on a given task, with an inference time on CPU of less than 200 ms and using no more than 20 GB of RAM.

The stated goals are specific, measurable and clear. Whilst they specify what is to be achieved, they do not say how, who is to achieve it, or within what time frame.
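The measurable parts of that goal can be pinned down directly as constants in the acceptance tests. The sketch below assumes a single forward pass timed with time.perf_counter; the exact measurement protocol (warm-up runs, batch sizes, how peak memory is profiled) is an assumption, not something the framework prescribes.

import time
import numpy as np

# Thresholds taken from the stated goal; the measurement protocol is an assumption.
MAX_CPU_LATENCY_S = 0.200          # inference in under 200 ms on CPU
MAX_MEMORY_BYTES = 20 * 1024 ** 3  # no more than 20 GB of RAM

def test_real_time(model, X: np.ndarray) -> None:
    """The model (an ANN instance from the sketch above) should meet the 200 ms CPU budget."""
    start = time.perf_counter()
    model.forward(X)
    elapsed = time.perf_counter() - start
    # A peak-memory check against MAX_MEMORY_BYTES would be added analogously,
    # e.g. with a process-level memory profiler.
    assert elapsed < MAX_CPU_LATENCY_S, f"inference took {elapsed:.3f}s"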

To what extent do stated goals and objectives represent a balanced set of priorities and adequately reflect stated values?

The stated goals and objectives are unlikely to represent a balanced set of priorities. They are disproportionately weighted towards engineering considerations such as accuracy.

To what extent does the entity communicate its AI strategic goals and objectives to the community of stakeholders?

Stated goals and objectives are accessible via a web browser.

To what extent does the entity consistently measure progress towards stated goals and objectives?

Stated goals and objectives are consistently measured.

One of the benefits of using an RAI tool like Decoded.AI is that you get things like stakeholder communication out of the box.

It's good to know as early as possible in development when a project is struggling to meet Responsible AI practices, whilst systems, processes and ideas are still open to change.

To what extent does the entity have the necessary resources—funds, personnel, technologies, and time frames—to achieve the goals and objectives outlined for designing, developing and deploying the AI system?

One of the hardest parts of robustly developing an AI system is understanding how cost pressures will force changes to an otherwise well-designed plan. For AI developers, things like waiting for feedback or eagerly optimising for scaling problems can generate intense pressure on the project lifecycle that is difficult to predict early in development.

One way to manage that pressure is to understand the cost structure of a computation by modelling future changes. If we have a rough estimate of the time taken per attempt, how much an attempt costs (compute plus engineering hours), and some idea of how many attempts might be necessary, then we can roughly predict whether a project is within our capacity. We can also predict how likely future scaling challenges are to disrupt the project trajectory.
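As a back-of-the-envelope sketch of that kind of capacity check (every figure below is an illustrative assumption, not a number from the framework):

# Illustrative capacity check; all figures are assumptions.
def project_within_capacity(hours_per_attempt: float,
                            cost_per_attempt: float,
                            expected_attempts: int,
                            budget: float,
                            deadline_hours: float) -> bool:
    """Return True if the expected attempts fit inside both the budget and the time frame."""
    total_cost = cost_per_attempt * expected_attempts
    total_hours = hours_per_attempt * expected_attempts
    return total_cost <= budget and total_hours <= deadline_hours

# e.g. 12 hours per attempt at $900 each (compute + engineering), ~25 attempts expected,
# a $30,000 budget and a 400-hour window: 25 * 900 = $22,500 and 25 * 12 = 300 h, so True.
print(project_within_capacity(12, 900, 25, 30_000, 400))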

Last modified: 2022-11-07