CHAPTER 5 - Envisioning New Metrics, Governance and Accountability for AI

It may take some time to develop richer philosophical approaches to AI. In the meantime, there are many actions that can be taken to guide AI in socially constructive directions and govern its development. Interestingly, many major tech companies are publicly asking the U.S. government to regulate certain forms of AI, such as facial recognition, surveillance and driverless vehicles, or at least provide serious guidance to industry. These systems pose some serious dangers that need to be anticipated and prevented, especially if such things as weaponized drones and intrusive surveillance technologies become widely accessible. There are also significant national security and economic security issues at stake, as well as issues of civil rights, privacy and fair elections.

Precisely how the government or other parties should regulate specific AI technologies remains something of an open question, however. Participants considered a number of useful approaches for improving the governance and accountability of AI. A first priority is arguably the development of reliable metrics and empirical monitoring of salient developments in AI. But this would serve mostly as an informational predicate to new forms of legal oversight and regulation, perhaps involving novel strategies such as AI review boards, impact assessments and other independent mechanisms.

What Metrics Are Needed to Guide AI Development?
At the most basic level, the field of AI could benefit from some consensus metrics to assess what is actually happening with AI worldwide. Such metrics, in turn, could help various parties assess how current AI trends are achieving (or failing to achieve) key social, economic and educational goals. As a first order of business, then, it is important to consider what metrics need to be invented to determine that AI technologies are proceeding on the right track.

Michael Chui, Partner at the McKinsey Global Institute, gave a presentation about this challenge, suggesting different touchstones for evaluating AI. There is a truism in business, often attributed to Peter Drucker, that “What gets measured, gets managed.” But what Drucker did say is, “Working on the right things is what makes knowledge work effective. This is not capable of being measured by any of the yardsticks for manual work.” In other words, not everything that we care about can necessarily be measured quantitatively (see Types of Measurements).

The choice of metrics can be quite consequential because it elevates certain priorities while ignoring other potentially important signposts. So the question is not only what metrics should we have, but which ones should we eschew? While metrics can help focus energies and coordinate the work of organizations and societies, they can also become empty totems. Or as researcher Joe Edelman has written, “Once metrics are defined, they’re like parasites and undead spirits, and they take over human beings” by inducing slavish attention rather than critical inquiry.

One important collection of metrics about the state of AI today is the AI Index, published by Stanford University and translated into Chinese, Japanese and Korean. The Index is an effort “to track, collate, distil and visualize data relating to artificial intelligence,” and aspires to be a comprehensive resource of data and analysis for policymakers, researchers, executives, journalists and the general public. Another body that collects some metrics on AI is the Association for the Advancement of AI.

There was general agreement that there should be metrics to document the positive influences of AI. This, Neil Jacobstein of Singularity University said, “should really be seen as part of an overall effort for regulating AI. Such metrics are needed to determine if we’ve regulated enough or too much, and in the right ways. We are not developing AI in a competitive vacuum globally. China and other countries are racing ahead. It is possible to regulate us into irrelevance.” However, such a vision will require independently verified numbers or peer review, and not just industry-supplied numbers, said Meredith Whittaker. We have seen instances in which industry claims outpace the performance realities, she said. Unfortunately, continued Whittaker, because there is no “alternative AI production ecology,” it might be difficult to develop reliable numbers.

In any case, there are many actual and potential applications of AI that are simply not tracked right now, Jacobstein said. We ought to be collecting data about the role of AI in addressing pandemic diseases, climate change, illiteracy and international conflict, and about the progress being made toward the other United Nations Sustainable Development Goals.

Steve Chien of the Jet Propulsion Laboratory suggested that the widespread use of AI to increase progress in diverse scientific disciplines such as biology, chemistry and geology should be tracked, documented and reported. “There are a large number of traditional disciplines such as chemistry, biology and environmental sciences that are leveraging AI in their research, documented and passing the science peer review process in scientific journals [without] sufficient tracking emphasis for the AI contribution,” he said. “That is a significant impact that should be tracked.” Chien cited as exemplars several articles by AI researchers (Kiri L. Wagstaff, David R. Thompson, Radio Science/Astrophysics; Umaa Rebbapragada, Astronomy; David R. Thompson, Greenhouse emissions) that document instances in which machine learning played a significant role in major scientific findings and models. Additionally, there are other high-profile exemplars in which AI plays a major indirect role, such as the AI-based scheduling for the Orbiting Carbon Observatory 3 space mission.

A variety of additional “missing metrics” for AI were mentioned by participants. It would be helpful to have more extensive data about AI developments in critical places around the world, such as China and Scandinavia. It would be useful to have ongoing tracking of state legislation that affects AI, and of the ways that philanthropy is funding AI-related work. It would also be helpful to synthesize and update the ethical frameworks for AI that are being developed around the world. Marc Rotenberg mentioned the work of Australian computer scientist Roger Clarke, who has catalogued about fifty AI ethics frameworks and tried to distill their most critical elements.

To paint a richer picture of how AI is developing, it would be worthwhile tracking employment trends for graduating AI researchers, said Tim Hwang of the Ethics and Governance of AI Initiative. What employers are researchers choosing, and who is hiring what types of experts for which topics? he asked. “Where researchers choose to do their work has relevance for who controls AI developments and whether or not the public has access to that research,” said Hwang. He considers such numbers a rough proxy for assessing the social good and which AI topics are being more intensively developed.

Similarly, it would be helpful to have numbers that reveal the diversity of genders, people of color and other minorities within AI fields now dominated by white males. These numbers will be “probative of the kinds of technical problems that will be prioritized within AI,” said Hwang, “which is relevant to how machine learning develops.” This in turn could have a “huge influence on the social impact of machine learning.”

In terms of government policymaking, the lack of shared metrics for describing AI has serious implications for coordinating federal funding for AI and fostering multidisciplinary research, said Terah Lyons, who used to work at the White House Office of Science and Technology Policy. Lyons said that “there is not a shared taxonomy for how we think about measuring artificial intelligence.” Indeed, many inter-agency meetings foundered because there was no shared language or policy categories among participants. “It’s an extremely fundamental challenge, but it’s still one that hasn’t been addressed,” said Lyons.

Steve Chien witnessed similar challenges among federal agencies during a congressionally initiated AI review directed by the National Defense Authorization Act in August 2018. “A national assessment of the state of AI was directed, but answering fundamental questions within the government, such as quantifying the NASA AI investment, posed a tremendous challenge due to (a) multiple definitions of AI and (b) overlapping programs and organizations. These challenges are not unique to NASA; similar experiences were experienced at other Federal Agencies and even non-Governmental Entities.”

Law and Regulation to Oversee AI
A lively conference breakout group considered a variety of ways in which greater AI oversight and governance might be established. A first, obvious approach is industry self-regulation, which could take place industrywide or through individual companies—and within a company, via specific parts of the organization (legal, marketing, research, etc.). Another approach is a set of universal guidelines for AI uses, or synthesized guidelines from decentralized practices and policies that may already exist. One such framework, the “Universal Guidelines for AI,” has been endorsed by AI experts and international associations, including the American Association for the Advancement of Science. Independent bodies might also instigate new forms of auditing and reporting about AI behaviors within companies.

A cross-cutting concern is whether any regulation should be specific to a type of AI, or more universal in coverage. Participants were divided on this issue. Some felt that there should be laws equivalent to HIPAA (Health Insurance Portability and Accountability Act) or FERPA (Family Educational Rights and Privacy Act) to regulate AI and the various contexts in which it might be used. Others felt that government laws and regulations would be too slow and therefore not effective, or that new laws are either unnecessary (“nothing’s broken, so why fix it?”) or redundant (legal regimes already exist to regulate AI). Yet there was agreement that certain areas of AI, such as facial recognition and social scoring, may require domain-specific legislation.

That said, there was agreement that broader AI-related harms deserve to be addressed. These include racial or gender discrimination, consumer manipulation or fraud, breaches of trust, privacy invasions, political interference and social scoring, said Anita LaFrance Allen of the University of Pennsylvania.

The potential means of AI governance are quite familiar: laws, regulations, liability rules, tort law, contract law and intellectual property law. Business behaviors might be “nudged” through various incentives created through tax law, civil and criminal liabilities, and even reporting requirements, which can be a form of governance. This list suggests that it may be worthwhile to think about layers of governance, ranging from self-regulation to informational disclosures and guidelines to federal law and regulation. The general sense is that there should be less opacity, more due process and an accent on fairness.

There are signs that governance of privacy and AI issues may soon become more harmonized on an international scale. The Organization for Economic Cooperation and Development (OECD) is finalizing international guidelines for the design and use of AI. Similar OECD Guidelines for Privacy Protection have influenced national policies and industry practices, and have helped resolve challenges for transborder data flows. In April 2019, the Trump Administration embraced the OECD initiative to develop the AI framework and to support related efforts by the OECD on privacy. According to The New York Times, the White House was apparently concerned that the enactment of new state privacy laws and Europe’s surging leadership on privacy protection could splinter domestic and international markets, to the detriment of U.S. technology companies. The OECD AI Guidelines are also in line with statements previously made by the White House regarding the protection of privacy, civil liberties and democratic values. In a subsequent letter to The New York Times, Marc Rotenberg acknowledged the White House progress but also stated, “The United States must work with other democratic countries to establish red lines for certain AI applications and ensure fairness, accountability and transparency as AI systems are deployed.”

AI Review Panels, Impact Assessments and Certification
In a concluding presentation, Meredith Whittaker stressed that political choices lie at the heart of regulating AI. “When we ask who gets to determine which questions are relevant, what to measure and what to ignore, what gets funded and what research will be conducted, we begin to see that politics is going to define the scope of what AI means and its social impact,” she said. “So these are decisions that we should make with a great deal of intention and awareness.” In assessing the future of AI, Whittaker urged that “we broaden the frame as wide as possible” so that we can take account of all factors—the labor costs behind AI, including precarious workers; the environmental impact of the technologies; the huge infrastructures that they entail; and the structural factors that determine AI affordances.

She suggested that we should be wary of relying too much on numbers: “Reducing life to numbers that can be managed by a few is a dangerous proposition, and AI offers a beguiling set of techniques that makes that seem very easy. But we are already seeing the potential consequences of that type of decision-making and the social asymmetries that can result.” Asymmetrical power dynamics divide people into those who centrally control information and those who are the unorganized objects of information, she continued. AI intensifies this asymmetry because it mostly relies on “extractive processes that quantify and commodify our daily lives, personal interactions and our emotional signifiers.”

So what might be done beyond the industry self-regulation and government laws and regulations mentioned above? Whittaker offered a self-styled provocation for the group to consider: establish an AI review panel that would emulate the pioneering Cambridge Experimentation Review Board, which in the 1970s reviewed Harvard University’s recombinant DNA research. The Board convened a representative cross-section of people who might be affected by the lab research—a nurse, teacher, parent, a scientist from another discipline, among others—and charged them with studying the issues, hearing arguments from all sides, and synthesizing a community consensus. Whittaker thinks that an AI review panel could host an intelligent conversation and interrogative process, and help build a common frame of reference in identifying and preventing social costs.

The virtue of this approach, said Whittaker, is that it provides “a participatory model for understanding that puts the burden on the experts to reach out to people who are potentially most at risk.” A similar community panel, the Bronx Community Research and Review Board, was established by several hospitals in 1998 to make sure that their academic research practices are “fair, ethical and culturally appropriate” to the community. Whittaker said such panels can help expand the definition of “what’s important” in AI and avoid the rush to govern through numbers.

In a variant of this idea, Anita LaFrance Allen commended the idea of a national commission similar to the National Bioethics Commission established during the Obama Administration. That Commission assessed ethical problems raised by synthetic biology and served as a vehicle of “deliberative democracy” in formulating a consensus that might inform potential federal action. Even with no follow-through (the Trump Administration did not continue the commission), its dialogues focused the attention of affected parties and stimulated public discussion.

Some participants expressed skepticism at these ideas, however. A citizen review panel for AI would require a large investment of time and energy, and it could slow down or even stop certain AI initiatives. Alix Lacoste of Benevolent AI countered by suggesting that in some cases, government and public intervention could in fact accelerate progress through enabling legislation, such as the Orphan Drug Act, which sped up the drug approval process for medicines for rare diseases. In addition, Lacoste highlighted the potential positive role of government, philanthropy and review panels in helping to route AI research funds to scientific endeavors that may benefit society.

Reid Hoffman, the Co-founder of LinkedIn and Partner at Greylock Partners, emphasized that American tech companies are currently locked in a fierce race with China to develop AI, and various foreign intelligence agencies are trying to acquire American AI secrets. Hoffman said that citizen panels would get little public visibility and support, and if the government got involved, everything would move slowly, rendering any decisions ineffective.

A better approach than citizen panels, said Hoffman, would be to study a limited subset of AI, figuring out in advance what protections might be needed, and then to “re-factor” the oversight of AI later. “It’s a chimera to think you can actually get a real slowdown of AI given the nature of the competition and organizations operating here,” he said. Whittaker replied that “framing the issue in terms of an arms race is implicitly xenophobic. If potential Chinese sovereignty in AI is raised as the bar against which we measure ourselves, I think we’ve already lost. It feels like a Red Scare narrative all over again.”

Another form of oversight and governance to consider, said Marc Rotenberg of EPIC, is the “impact assessment.” Rigorous, formal reviews of the likely impacts of a business project have long been used to ensure public accountability for the environment and privacy, he said. These are models that might be emulated. Indeed, the European General Data Protection Regulation has provisions for a “data protection impact assessment.” Whittaker added that the AI Now Institute has in fact already produced an Algorithmic Impact Assessment framework, which it bills as “a practical framework for public agency accountability.”

There are also independent research and advocacy projects that might be worth creating or expanding. Neil Jacobstein suggested reviving the U.S. Office of Technology Assessment (OTA), which is still on the books, and until its defunding in the mid-1990s provided rigorous analyses of new technologies to the U.S. Congress. He said we need the OTA now more than ever. Additional third-party advocacy projects include the Algorithmic Justice League, which has documented racial biases in facial recognition software, and the EU-funded AlgoAware project, which systematically reviews social and democratic issues raised by algorithms.

It may be useful to have some type of organization that could act as an intermediary between AI projects and various constituencies, similar to the way that FINRA, the Financial Industry Regulatory Authority, mediates disputes among brokers, dealers and the investing public. One such example is a British think tank that the Omidyar Network has supported as an intermediary for transparency and fairness concerns in digital contexts.

Peter Norvig of Google suggested that perhaps a private, independent organization such as Underwriters Laboratories (UL) could help bolster public trust by certifying reliable AI services. In the early 1890s, when public distrust in the new technology of electricity was high, UL was founded to help reassure consumers about the safety of electrical products. Various conference participants raised questions about the efficacy of certification programs, however, at least if applied to AI. It was pointed out that certification for AI would have to be domain-specific, not general. Yet even this approach would not necessarily prevent unauthorized “off-label” AI uses. Others questioned whether certification would actually change consumer decisions, especially when so much AI is managed at the enterprise level and is therefore invisible to consumers. Certification might also need government regulation as a backup regime if it were to be credible. Apart from these reasons, it was pointed out that since so many AI systems are still in formative stages, it is too early to identify the proper foci for certification or metrics.
