Lessons learned from software engineering (AI, Law, and Otter Things #12)
Today's newsletter is a rant about how my exposure to software engineering and large-scale AI systems influenced my views on the legal challenges of AI and how regulators and tech lawyers should approach them.
A substantial part of my research deals with the interfaces between software engineering and AI regulation. Not only have I worked quite a bit on data protection by design throughout the last academic year, but my thesis itself deals with the relevance of engineering tools, such as life cycle models, for legal thought about AI-related matters. This focus would come as a huge surprise to some of my past selves, as I despised the software engineering courses I took during my undergraduate CS education. Why, then, do I now think software engineering is so important for legal assessments of AI?
My about-face on software engineering was partly motivated by practical experience. In the few years I spent working as a data scientist, I was involved in a broad range of software development processes. Some of them amounted to little more than making permanent an ad hoc analysis that Excel could no longer handle well, while others involved small-scale experimentation with machine learning models. In such projects, I often worked alone or in small teams, and what mattered most was arriving at something that produced a good result. By the end of a project, I would have something nice (or a kludge) that I knew well and could maintain, or reasonably expect my successor to handle.
What changed my mind was my (limited) experience with large-scale systems, such as commercial-grade recommender systems and the data architectures needed to feed them. To deal with enormous volumes of data, software systems not only demand more resources (more computing cycles, more memory, more storage, and so on) but also grow in complexity, reducing the power of individual technical actors to effect change in the system's overall behaviour. As a result, these systems are costly to build, especially when one must comply with legal requirements such as data protection law. The huge costs, in turn, mean that large-scale systems are not easily discarded: unlike the sort of ad hoc code I described above, a large-scale system requires regular maintenance and is likely to remain in operation for a long time. For these and other reasons, dealing with the kind of machine learning system one might see in a corporate context is a qualitatively different challenge from ad hoc data analysis, even when those analyses involve large volumes of data.
So, in the years before I decided to become a full-time researcher, I tried to find technical and conceptual tools for dealing with large-scale systems. By a curious twist of fate, this meant revisiting software engineering materials. Nowadays, there is a growing body of academic and practical literature on AI and software engineering, which has informed my understanding of the legal problems I work on. But my legal scholarship has also benefitted from general topics in software engineering, drawing from sources such as the classics on the difficulties of estimation, modern textbooks that provide better introductions than what I had back in the mid-2000s, and online sources such as Hillel Wayne's writings on whether software engineering is actually engineering (spoiler: yes, lest our criteria end up disqualifying lots of "real" engineering too) and on the history of computing. At the end of the day, my decision to finally learn some software engineering proved fruitful even after I left the ICT sector for good.
How did this experience with software engineering shape my legal approach to AI? From a methodological perspective, it convinced me that software engineering concepts and methods can be relevant to legal assessments of AI systems: they can be used to communicate with technical stakeholders, to better understand the life cycle of AI systems, and to direct legal interventions in software design and use. This view seems to be gaining some traction (see, e.g., the role that life cycle processes play in the EU AI Act), and I hope my research helps incorporate these tools into the tech lawyer's toolset.
My experience with large-scale systems also shapes how I understand AI as a legal and regulatory object. It convinced me that we should pay more attention to AI systems as such, because an algorithm-centric account of AI paints a misleading picture of the role of algorithm design in the construction of software in industry, as opposed to academia or small-scale projects, where the main contribution comes from the algorithm itself. This is particularly true for companies that are not in the R&D business: they can use already-established ML techniques and algorithms and focus on adapting them to their own context and data sources.
A focus on the algorithm also obscures how elements such as the underlying hardware matter for understanding the evolution of AI systems and their social impacts. This is not to say that STS scholars have ignored such elements; indeed, their meaning of "algorithm" is broader than the one used by computer scientists. Nevertheless, I think there is little to gain in keeping "algorithm" as the unit of analysis, especially when the notion of a computing system provides a readily available frame that meshes well with the literature on socio-technical systems.
Working in industry also impressed upon me the complex social dynamics of large-scale software development. Academic research on AI is relatively small in scale: even when working with large data sets, the software itself is meant more as a proof of concept, developed by a few specialists in the particular techniques being used. Things could not be more different in large systems, where each developer deals with only a fraction of the code. In this scenario, communication between developers becomes a complex issue, and no programmer can single-handedly govern the software as a whole. Instead, the power to change development directions ultimately lies with management, which deals with the technical complexities of an AI system through various intermediaries: product owners, system architects, and so on. If we look at AI systems through a programmer-centric lens, we may overlook some of the actors with the most power to direct the construction and use of big AI applications, and those actors are the ones most likely to be relevant to the law.
Looking at the social dynamics of AI system development raises several follow-up questions about the power relations that permeate this process. Since I have already stretched this issue beyond my original point, I will not examine it in depth here. I will, however, point you towards Abeba Birhane, a cognitive science PhD researcher who has written on machine learning research as a value-laden process, on the dependence of large-scale AI on precarious labour, and on algorithmic colonialism. Her analyses highlight the various power asymmetries that shape machine learning design, development, and use. As such, Birhane's work provides a valuable, technically grounded critique of AI in research and in the real world.
As you might have noticed from previous issues, I am terribly uninspired when it comes to actually closing a text. So, I will just leave you with the usual: any impressions/comments/suggestions/constructive criticism are appreciated, and I hope you enjoyed this issue. A mercoledì!