Bias Debt in Tech
Why good intentions and ethics frameworks aren't enough
By Rakhi Rajani and Natalie Banner
Introduction
Finding pathogenic variants in a human genome is exactly the type of needle-in-haystack problem that machine learning is good at (Topol, 2019). Genomic data sits at the boundary between research and clinical care, where the stakes of getting things wrong are measured in patient harm. This combination of data intensity and proximity to patient care has forced the field to develop infrastructure for ethical data use.
Most large language models are trained on vast quantities of scraped internet data of unknown provenance, uncertain quality, and questionable consent (Bender et al., 2021). The genomics sector offers a different model, one that other fields developing data-intensive technologies could learn from.
When DeepMind’s AlphaFold predicted the 3D structures of nearly all known proteins, it was celebrated as a landmark in AI-driven science (Jumper et al., 2021). The achievement also highlighted something easily overlooked: AlphaFold worked in part because it was trained on openly shared data from decades of structural biology research, including experimentally determined structures from the Protein Data Bank and sequences from UniProt. These datasets had traceable origins and known quality standards. Unfortunately, this is not the norm in AI development, and even where good data exists, it does not guarantee that technology will be developed sensibly.
The provenance problem
A critical challenge for AI development concerns the ethical provenance of training data. Many models are built on datasets whose demographic composition, consent basis, and quality controls are undocumented. The resulting tools may be cheap and convenient to develop, but they offer no assurance of quality or reliability (Gebru et al., 2021).
Clinical and genomic data, while not without limitations, typically operate under different constraints. Genomic sequences are generated in accredited laboratories, and datasets can increasingly be characterised for genetic ancestry representation, allowing researchers to understand and communicate the limitations of models trained on them (Martin et al., 2019). Consent frameworks, where they exist, define scope of use in line with participant expectations.
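As a rough illustration of what characterising a dataset for representation can look like in practice, the sketch below summarises the share of samples per group and flags groups that fall below a chosen threshold. The function name, the group labels, and the 5% threshold are all hypothetical choices made for this example; they are not drawn from any standard or from the sources cited here.

```python
from collections import Counter


def representation_summary(labels, min_share=0.05):
    """Summarise how many samples carry each group label.

    `labels` is one label per sample. `min_share` is an arbitrary,
    illustrative cut-off below which a group is flagged.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot summarise an empty dataset")

    summary = {}
    for group, n in counts.items():
        share = n / total
        summary[group] = {
            "count": n,
            "share": share,
            "under_represented": share < min_share,
        }
    return summary


# Placeholder labels; a real dataset would use its own documented categories.
example_labels = ["group_a"] * 90 + ["group_b"] * 8 + ["group_c"] * 2
example_summary = representation_summary(example_labels)
```

A summary like this does not fix under-representation. Its value is that the limitation is written down and can be communicated alongside any model trained on the data.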
A useful analogy is with supply chain transparency. Food production and fast fashion rely on complex global networks where material origins, labour practices, and quality are difficult for any customer to determine. The outputs are cheap, convenient, and mass-produced. “Farm to table” and sustainable fashion models provide assurance over provenance instead, though at smaller scale and higher cost.
Both models exist in modern society. But we should be clear about what is appropriate to the task. In genomics research and healthcare, where outputs directly affect patient diagnosis and treatment, high standards of ethical data provenance are not a luxury but a precondition for technology development that can withstand scrutiny from the populations it affects.
Why frameworks fail
The field of AI ethics has produced numerous frameworks and principles intended to guide responsible development (Jobin, Ienca & Vayena, 2019). These typically highlight transparency, fairness, equity, privacy, and explainability, alongside traditional bioethical principles of autonomy, beneficence, non-maleficence, and justice (Beauchamp & Childress, 2019). These frameworks are valuable for encouraging discourse but they fail to translate into practical contexts of ethical decision-making in ways that help teams identify, surface, weigh up, and act on the values implicit in their work (Mittelstadt, 2019).
Two failure modes are common. First, principles may be framed at such a high level that virtually any behaviour could be seen to accord with them. This creates false security and the impression that ethical questions have been addressed when they have merely been gestured at. This “ethics-washing” lets teams believe the hard work is done when it has not started.
Second, without clarity on what good looks like, teams fall into paralysis. Principles become blockers rather than enablers. This is especially true when principles conflict, such as commitments to transparency potentially compromising privacy, or explainability trading off against accuracy.
The nuances of specific use cases matter. Ethical technology development requires dynamic, iterative, context-sensitive approaches, not checklist compliance.
Bias debt
We propose the concept of “bias debt” to describe the accumulation of biases within AI systems during experimental and developmental stages. Early AI development is frequently characterised by incomplete codebases and biased training datasets, prioritised for rapid prototyping rather than long-term fairness (Mitchell et al., 2019). Many organisations fail to address these foundational biases before scaling, leading to their entrenchment in production systems where they are far harder to fix (Mehrabi et al., 2021).
The term borrows from “technical debt” in software engineering, but the analogy only goes so far. Technical debt is a shortcut chosen, a mistake made, an old technology kept on too long. Bias debt is different: inherited, built in, compounded. It exists in the historical data we train on, the categories that shaped its collection, the assumptions so embedded they are invisible to the teams building on top of them (Friedman & Nissenbaum, 1996). It may be centuries old. And unlike technical debt, the system carrying it does not experience it as bias. It experiences it as normal (Selbst et al., 2019).
This is not merely a data problem. Values are coded into algorithms through invisible choices: how data is collected, how variables are categorised, what counts as a successful outcome. The result is biased outputs that carry the illusion of objectivity. Obermeyer et al. (2019) demonstrated this starkly in their analysis of a widely-used healthcare algorithm that systematically underestimated the health needs of Black patients because it used healthcare costs as a proxy for health needs, a variable already shaped by unequal access to care.
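The mechanism behind a proxy-label problem of this kind can be shown with a toy example. The sketch below is not a reconstruction of the algorithm Obermeyer et al. studied, and every number in it is invented for illustration. It assumes two groups with identical health needs, where group B incurs lower observed cost for the same need because of reduced access to care (the hypothetical `access` factor). Ranking by cost then selects fewer group B patients than ranking by need.

```python
from typing import NamedTuple


class Patient(NamedTuple):
    patient_id: str
    group: str
    need: float    # true health need (not visible to the cost-based ranking)
    access: float  # hypothetical factor: share of need that turns into cost


def observed_cost(p):
    # The proxy: spending reflects need only as far as care was accessed.
    return p.need * p.access


def select_top(patients, score, k):
    # Pick the k patients with the highest score for extra support.
    return sorted(patients, key=score, reverse=True)[:k]


def count_group(selected, group):
    return sum(1 for p in selected if p.group == group)


# Invented data: both groups have the same distribution of need.
patients = [
    Patient("A1", "A", 8, 1.0), Patient("A2", "A", 6, 1.0),
    Patient("A3", "A", 4, 1.0), Patient("A4", "A", 2, 1.0),
    Patient("B1", "B", 8, 0.6), Patient("B2", "B", 6, 0.6),
    Patient("B3", "B", 4, 0.6), Patient("B4", "B", 2, 0.6),
]

by_need = select_top(patients, lambda p: p.need, k=4)
by_cost = select_top(patients, observed_cost, k=4)

b_selected_by_need = count_group(by_need, "B")  # 2 of 4
b_selected_by_cost = count_group(by_cost, "B")  # 1 of 4
```

Nothing in the cost-based ranking refers to group membership. The disparity enters entirely through the choice of what to predict, which is why it is easy to miss.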
Conversely, when decisions are made about deploying technical tools, important values may be stripped out. Quantitative models can produce “computer says no” decision-making that fails to account for factors that matter socially and ethically but resist measurement (Crawford, 2021).
Bias-free systems are effectively impossible, so the goal is transparency, accountability, and mitigation. To address bias debt, we suggest three practical interventions.
Implementing bias risk registers. Bias in technology rarely appears on company-level risk registers. Centralised logs that track identified biases and mitigation efforts make this risk visible and give it an owner.
Documenting trade-offs made during development. Recording which variables were excluded, what data was prioritised, and which edge cases were deprioritised keeps embedded choices visible and contestable.
Auditing training datasets regularly, after initial experimentation and before deployment.
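One way a bias risk register might be represented in code is sketched below. This is a minimal illustration under our own assumptions, not an established schema: the class names, fields, and status values are hypothetical, and a real register would more likely live in an organisation's existing risk tooling.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Dict, List


class Status(Enum):
    OPEN = "open"
    MITIGATING = "mitigating"
    ACCEPTED = "accepted"  # a documented trade-off, kept visible
    CLOSED = "closed"


@dataclass
class BiasRisk:
    risk_id: str
    description: str
    source: str                 # e.g. "training data", "outcome definition"
    affected_groups: List[str]
    owner: str                  # a named person or team, so the risk is owned
    status: Status = Status.OPEN
    mitigations: List[str] = field(default_factory=list)
    raised_on: date = field(default_factory=date.today)


class BiasRiskRegister:
    """A central log of identified biases and what is being done about them."""

    def __init__(self):
        self._risks: Dict[str, BiasRisk] = {}

    def add(self, risk: BiasRisk) -> None:
        if risk.risk_id in self._risks:
            raise ValueError(f"risk {risk.risk_id!r} is already registered")
        self._risks[risk.risk_id] = risk

    def record_mitigation(self, risk_id: str, note: str) -> None:
        risk = self._risks[risk_id]
        risk.mitigations.append(note)
        # The first recorded mitigation moves an open risk to "mitigating".
        if risk.status is Status.OPEN:
            risk.status = Status.MITIGATING

    def unresolved(self) -> List[BiasRisk]:
        # Risks that still need attention before deployment.
        return [
            r for r in self._risks.values()
            if r.status in (Status.OPEN, Status.MITIGATING)
        ]


register = BiasRiskRegister()
register.add(BiasRisk(
    risk_id="BR-001",
    description="Outcome label uses cost as a proxy for health need",
    source="outcome definition",
    affected_groups=["patients with reduced access to care"],
    owner="model development team",
))
register.record_mitigation("BR-001", "Evaluate alternative outcome labels")
```

The design choice that matters here is the `owner` field and the `unresolved` view: a bias that is logged but has no owner and no pre-deployment check is still debt.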
Human-machine collaboration
Ethical technology use depends on how humans and machines work together. One commonly held assumption is that humans are inherently more reliable for ethical judgments, given their capacity for moral reasoning and contextual understanding. But pairing humans with machines presents a more complex picture. The collaboration may yield superior outcomes by combining complementary strengths.
In human-machine interactions, such as those facilitated by generative AI, the process often begins with human-provided input. The machine generates responses based on this, highlighting the necessity for humans to frame precise and relevant questions. How you frame a question shapes what the machine produces, and the human role in guiding that framing matters (Floridi & Cowls, 2019). But this dynamic is shifting. When machines suggest prompts or generate follow-up questions, they shape the framing that supposedly guides them, and the locus of ethical responsibility becomes harder to locate.
These collaborations can also help mitigate biases. Machines, when designed with inclusive datasets and robust algorithms, can offer insights that challenge entrenched human biases or limitations in scope. The computational capacity to analyse large datasets enables the identification of patterns that a human alone might miss. Conversely, human capacity for empathy and contextual judgment helps ensure machine suggestions are interpreted and applied in ways that respect social norms and ethical boundaries. This combination can broaden perspectives and make decision-making more inclusive (Binns, 2018).
The challenge lies in designing partnerships that balance human oversight with machine capabilities, as neither alone is sufficient. Humans bring irreplaceable ethical intuitions and contextual awareness; machines contribute computational efficiency and an expanded solution space. Effective collaboration requires clarity about when to defer to each.
It also requires a stance towards algorithmic outputs that most users do not adopt. The tendency is either to trust uncritically or dismiss entirely. The productive space is in between: use the tool, keep interrogating it. The humans affected by algorithmic decisions need reason to trust them. That reason cannot come from the algorithm itself. It comes from the ongoing work of questioning what it produces and why. You come to trust something by contesting, interrogating, rejecting, and reshaping it, so trust (or belief) is often earned through sparring, not assumed at the start.
New roles for new problems at the intersections
What is often ignored in discussions of responsible AI is the question of roles. Technology teams comprise product managers, designers, engineers, architects, and quality assurance testers (and often specialists such as bioinformaticians or clinical scientists). But for systems involving automation and algorithms, there may be a gap between technical QA and ethical oversight.
Just as security engineering moved from afterthought to embedded discipline, ethical reasoning about algorithms needs its own practitioners sitting within technical teams. Ethics engineers, technical ethics leads, responsible AI leads: call them what you want. Without a named role, the work gets absorbed into functions too distant from the decisions that matter.
Academic programmes are increasingly interdisciplinary, combining AI and ethics training. But these graduates enter organisations with no clear home for what they do. They end up absorbed into policy, compliance, or research, none of which sit close enough to the engineering decisions that actually encode values into systems.
Creating shared vocabulary across disciplines is equally important. Terms carrying specific meanings in one field may be understood differently in another (Klein, 1990). Developing glossaries and frameworks tailored to team needs, and involving team members in their creation, can foster mutual understanding and collective ownership (Stokols et al., 2008). Without this, interdisciplinary collaboration remains more aspiration than reality.
The limits of process
Relying solely on procedural guardrails to ensure ethical development is insufficient (Mittelstadt, 2019). Procedures provide structure but can create false security, leading teams to assume compliance equals ethical integrity. Rigid frameworks do not adapt to emerging dilemmas or context-specific considerations.
A culture of questioning intent and ethical impact has to be built. Teams should interrogate the purpose of systems, their potential misuse, and their societal implications. Adaptive processes matter more than static rules, and ethical guidelines need room to evolve with technological change and shifting societal expectations (Floridi et al., 2018).
This represents a shift from rigid compliance to dynamic, context-aware practice. It is harder than following checklists. It is also more honest about what responsible development requires.
Beyond compliance
Ethical robustness cannot be achieved through compliance alone. Frameworks and principles have a role, but they operate at a distance from the actual decisions that shape technical systems: which variables to include, how to define a successful outcome, whose data to train on, what edge cases to include or ignore.
These are design decisions. They encode values and potentially bias debt. The choice is not between ethical and unethical AI, but between systems where those embedded values can be examined and contested, and systems where they cannot.
Compliance asks: did we follow the process? Design asks: what do we build, why, who does it serve, how should it work, what conversations should it evoke?
The maturation of any technical discipline involves learning what else it needs to account for. Software development has done this with usability, with security, with accessibility. Ethical reasoning about data and algorithms requires the same shift, not as oversight but as craft. Genomics is further along this path than most, but no one is there yet. Bias debt is already in the codebases, the datasets, and the ways of working, and you can’t pay it down until you see it.
References
Beauchamp, T. L. & Childress, J. F. (2019). Principles of Biomedical Ethics (8th ed.). Oxford University Press.
Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623.
Binns, R. (2018). Fairness in machine learning: Lessons from political philosophy. Proceedings of the 2018 Conference on Fairness, Accountability, and Transparency, 149–159.
Crawford, K. (2021). Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. Yale University Press.
Floridi, L. & Cowls, J. (2019). A unified framework of five principles for AI in society. Harvard Data Science Review, 1(1).
Floridi, L., Cowls, J., Beltrametti, M. et al. (2018). AI4People: An ethical framework for a good AI society: Opportunities, risks, principles, and recommendations. Minds and Machines, 28(4), 689–707.
Friedman, B. & Nissenbaum, H. (1996). Bias in computer systems. ACM Transactions on Information Systems, 14(3), 330–347.
Gebru, T., Morgenstern, J., Vecchione, B. et al. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86–92.
Jobin, A., Ienca, M. & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389–399.
Jumper, J., Evans, R., Pritzel, A. et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589.
Klein, J. T. (1990). Interdisciplinarity: History, Theory, and Practice. Wayne State University Press.
Martin, A. R., Kanai, M., Kamatani, Y. et al. (2019). Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics, 51(4), 584–591.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1–35.
Mitchell, M., Wu, S., Zaldivar, A. et al. (2019). Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–229.
Mittelstadt, B. (2019). Principles alone cannot guarantee ethical AI. Nature Machine Intelligence, 1(11), 501–507.
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453.
Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S. & Vertesi, J. (2019). Fairness and abstraction in sociotechnical systems. Proceedings of the Conference on Fairness, Accountability, and Transparency, 59–68.
Stokols, D., Hall, K. L., Taylor, B. K. & Moser, R. P. (2008). The science of team science: Overview of the field and introduction to the supplement. American Journal of Preventive Medicine, 35(2), S77–S89.
Topol, E. J. (2019). High-performance medicine: The convergence of human and artificial intelligence. Nature Medicine, 25(1), 44–56.

