Case Study 20: The $4 Billion AI Failure of IBM Watson for Oncology

7. December 2024

The scale of the ambition became visible in 2015, when IBM formally launched Watson Health as a dedicated business unit and began assembling capabilities through a series of acquisitions that turned the initiative from a product into a strategic platform. The company acquired Explorys and Phytel, paid $1 billion for Merge Healthcare, and announced the $2.6 billion acquisition of Truven Health Analytics. In 2016 it reported that Watson Health had access to 100 million electronic health records, 200 million claims records, and millions of medical images, supported by a workforce of roughly 7,000 employees. Taken together, these moves suggest that IBM was not testing a hypothesis but committing to a new growth engine. The disclosed prices for Merge and Truven alone add up to $3.6 billion, and once the Explorys and Phytel deals, integration, and internal development are included, the total investment likely exceeded $4 billion (IBM Annual Report 2016, Reuters – Truven Deal).

What followed was not a sudden collapse but a gradual divergence between narrative and reality: Watson for Oncology struggled to deliver consistent value outside controlled environments while the broader program kept expanding. The idea of using data to support clinical decisions was not fundamentally flawed. What failed was that the conditions needed to make that idea work at scale were never fully established before the organization moved into global rollout. The resulting gap between ambition and execution was less technical than structural and organizational, reflecting a governance failure that allowed the program to scale before its core assumptions had been validated (STAT Investigation – Watson Cancer).

The Build-Up: From Symbolic Victory to Strategic Bet

The early momentum of Watson in healthcare has to be understood against a system under pressure: medical knowledge was growing faster than clinicians could absorb it, and oncology decisions were becoming more complex as genomics and personalized medicine advanced. IBM positioned Watson as a system that could ingest vast amounts of structured and unstructured data, interpret the medical literature, and translate that information into actionable treatment recommendations, effectively augmenting physician decision-making. In 2015, Reuters reported that fourteen cancer centers in North America would begin using Watson to guide therapy decisions, and within two years IBM described deployments across more than fifty hospitals globally, suggesting that early adoption was translating into international scale (Reuters – Watson Cancer Centers).

Beneath this expansion, however, the foundations of the system remained narrow. Watson’s credibility depended heavily on a small number of elite institutional partnerships, most prominently Memorial Sloan Kettering, whose clinical expertise was used to train and validate the system. As a result, Watson did not represent a generalized model of oncology but a codified version of how specific institutions approached diagnosis and treatment, which raised the question of whether that knowledge could transfer reliably across different healthcare systems, patient populations, and regulatory environments. The approach provided a powerful initial proof point, but it also introduced a dependency on context that would later limit the system’s ability to scale (MIT Technology Review – Watson Health).

The financial commitment reinforced the strategic nature of the initiative. IBM invested billions in Watson Health through acquisitions, partnerships, and internal development, effectively constructing a platform before the underlying product had demonstrated consistent performance in real-world settings. Merge Healthcare alone cost $1 billion and Truven Health Analytics $2.6 billion, and further investment in data, analytics, and integration widened the program's scope, producing a structure that was designed for scale but not yet proven to operate at it. Assembling capability ahead of validation is common in large transformations, but it increases the risk that expansion is driven by momentum rather than evidence (IBM Merge Acquisition, Reuters – Truven Deal).

Timeline: From Breakthrough Narrative to Strategic Retreat

The trajectory of Watson for Oncology can be understood through a sequence of events that illustrate how the program moved from promise to retreat over the course of a decade. The initial collaboration with Memorial Sloan Kettering in 2012 established the foundation, while the launch of Watson Health in 2015 marked the transition into commercialization and global expansion, supported by acquisitions and early deployments that created the appearance of rapid progress. At this stage, the narrative remained coherent, combining technological innovation, institutional credibility, and growing market presence into a story that aligned with both internal strategy and external expectations (MSKCC IBM Collaboration Announcement, Reuters – Watson Cancer Centers).

The first major disruption occurred in 2016, when the University of Texas MD Anderson Cancer Center halted its Oncology Expert Advisor project, another high-profile collaboration with IBM that had consumed approximately $62 million without producing a clinically usable system. This event represented more than a project failure, as it highlighted the difficulty of translating Watson’s capabilities into real-world clinical environments where data quality, workflow integration, and patient variability created challenges that could not be resolved through controlled training alone. While the project was discontinued, the broader program continued to expand, suggesting that the signal was not fully integrated into strategic decision-making (IEEE Spectrum – Watson Health Analysis, UT System Audit).

Between 2017 and 2018, the issues became public. STAT's reporting highlighted the gap between IBM's claims and actual adoption levels and later cited internal IBM documents describing unsafe and incorrect treatment recommendations. These reports did not introduce new problems but made existing ones visible, shifting the evaluation of Watson from its potential to its performance in real clinical settings. In 2022, IBM sold key Watson Health assets to Francisco Partners, reportedly for more than $1 billion, completing a strategic retreat in which a program once positioned as a core growth engine was reduced to a set of assets that could be separated and operated independently (STAT Investigation – Watson Recommendations, IBM Watson Health Sale).

The Technical Reality: A System That Could Not Generalize

The core technical challenge faced by Watson for Oncology was not its ability to process data, but its ability to interpret that data in context, because medical decision-making is inherently dynamic, uncertain, and dependent on factors that extend beyond structured datasets. Oncology decisions are influenced by patient history, comorbidities, evolving research, and local treatment availability, much of which is captured in formats that are difficult to standardize or encode, such as physician notes or incomplete medical records. Watson struggled to operate effectively within this environment, as its design relied on structured representations of knowledge that could not fully capture the complexity of real-world clinical practice (IEEE Spectrum – Watson Health Analysis).

What ultimately broke was the assumption that medical reasoning could be reduced to a form Watson could reliably process and generalize across contexts. The system depended heavily on curated training data and manually encoded decision logic, which allowed it to perform in controlled scenarios where inputs were well defined and aligned with its training, but limited its ability to adapt to variability, ambiguity, and incomplete information. Taken together, this suggests that the system functioned less as a self-learning AI platform than as a structured knowledge base that required continuous human intervention to remain relevant (IEEE Spectrum – Why Watson Struggled).

Performance varied significantly across use cases: studies and reports indicated that Watson’s recommendations could align with clinical practice in some contexts and diverge in others, particularly when the system was applied outside the domains in which it had been trained. This inconsistency eroded trust among clinicians, who needed reliable, context-aware support rather than suggestions they had to validate independently. In decision-support systems, trust behaves less like a gradient than a threshold, and once confidence in the outputs is questioned, adoption becomes difficult to sustain regardless of the system’s theoretical capabilities (ASCO Study – Watson Concordance).

The Delivery Model: Where the Story Broke

What IBM presented as a scalable platform behaved in practice like a series of complex projects, because each deployment required significant customization to align with local clinical workflows, data structures, and treatment protocols. Hospitals needed to adapt their processes, structure their data, and validate Watson’s recommendations, effectively participating in the development and refinement of the system rather than simply adopting a finished product. This created a situation in which scalability depended not on the technology itself but on the amount of effort invested in each implementation, limiting the system’s ability to achieve the efficiencies associated with a true platform (STAT Investigation – Watson Cancer).

The consequences of this delivery model became visible over time, as deployment cycles remained long, improvements required expert intervention, and outcomes varied across institutions depending on the level of customization applied. Instead of benefiting from network effects, where each deployment improves the system for all users, Watson required continuous local adjustments, preventing the emergence of a consistent and scalable operating model. This suggests that the underlying architecture was not designed for the type of generalization required to support global rollout, despite being positioned as such (IEEE Spectrum – Watson Health Analysis).

This mismatch between narrative and delivery is a recurring pattern in large transformation programs, where a solution is described as a platform to justify investment and scale, but behaves as a project in execution, consuming resources without generating proportional returns. In such cases, the gap between expectation and reality does not close over time, but expands, as additional deployments increase complexity while failing to resolve the structural limitations that prevent the system from scaling.

The Governance Failure: Scaling Before Solving

By the time Watson for Oncology entered global rollout, the core issues were already visible within the program, as the stalled MD Anderson project and the continued reliance on curated data and expert input demonstrated. These signals indicated that the model had not yet reached the robustness required for large-scale deployment, yet they did not prompt a fundamental reassessment of the program. Instead, expansion continued, suggesting that the decision to scale was driven more by strategic momentum than by validated performance (IEEE Spectrum – Watson Health Analysis).

At the same time, accountability became increasingly diffuse, as the program spanned research, product development, consulting, and partnerships, with each component contributing to the overall system but none holding full responsibility for its outcomes. This fragmentation made it difficult to address systemic issues, as problems could be attributed to specific components rather than the design of the program as a whole. In such environments, governance becomes less effective, not because information is unavailable, but because responsibility is distributed in a way that prevents decisive action (STAT Investigation – Watson Recommendations).

The result was a program that continued to scale despite unresolved uncertainties, increasing both its reach and its exposure, while reducing the organization’s ability to change direction. This sequence, in which expansion precedes validation, is a defining feature of governance failure in large transformation programs, as it shifts the focus from proving that a system works to maintaining the perception that it will.

The Moment It Turned: From Ambition to Exposure

The turning point for Watson for Oncology was not a single internal decision but the moment external scrutiny made internal issues visible, shifting the evaluation of the system from its potential to its performance. STAT's investigations in 2017 and 2018, including internal IBM documents describing unsafe and incorrect treatment recommendations, drew attention to problems that had previously been contained within the program. This exposure changed the dynamics of the initiative: clinicians, the media, and the broader market now judged the system by its outputs rather than its ambition (STAT Investigation – Watson Recommendations).

Once this shift occurred, the limitations of Watson became more difficult to manage, as adoption slowed and confidence in the system declined, affecting both its economic viability and its strategic positioning. Programs can sustain themselves on narrative for extended periods, but only as long as that narrative is not directly challenged by observable outcomes, and once such challenges become public, the gap between expectation and reality becomes increasingly difficult to bridge.

By the time IBM moved to sell Watson Health assets in 2022, the outcome had already been determined, as the program no longer supported the narrative that had justified its creation. The sale to Francisco Partners, reportedly for more than $1 billion, reflects not only a financial adjustment but a strategic repositioning, in which Watson Health transitioned from a central growth initiative to a divested business unit (IBM Watson Health Sale, Bloomberg – Watson Sale).

Closing Thoughts

The failure of Watson for Oncology did not occur at the level of technology alone, but at the level of how that technology was positioned, scaled, and governed within a large organization. The decision to treat a context-dependent capability as a scalable product created a trajectory in which expansion outpaced validation, and in which early signals of misalignment were not sufficient to alter the direction of the program. Healthcare provided the opportunity, but it also amplified the consequences of these decisions, as the complexity of clinical environments exposed the limitations of the system more quickly than in other domains (IBM Annual Report 2016).

From a structural perspective, the program illustrates how narrative can become decoupled from evidence, particularly when early successes create momentum that is reinforced by investment, partnerships, and strategic importance. Once a program reaches this stage, it becomes increasingly difficult to question its underlying assumptions, as doing so would require not only technical adjustments but a reconsideration of the broader strategy that has been built around it.

The eventual outcome, in which Watson Health was sold and its original narrative abandoned, reflects not a sudden failure but the accumulation of unresolved issues over time, leading to a point where continuation was no longer sustainable. In this sense, Watson for Oncology is less an exception than a clear example of how large transformation programs can evolve when ambition, structure, and execution are not aligned.

What This Means for Boards

Watson for Oncology highlights a pattern that extends beyond healthcare and artificial intelligence, as it illustrates how large programs can move from controlled experimentation into full-scale deployment without resolving the uncertainties that determine their long-term viability. These uncertainties are often visible early, embedded in delivery models, technical limitations, and initial pilot results, yet they may not be treated as constraints on expansion if the program has already gained strategic importance.

As programs scale, governance can shift from evaluating assumptions to supporting narrative, particularly when the initiative becomes associated with broader organizational goals such as growth, innovation, or market positioning. In this phase, concerns may be reframed as execution issues rather than structural limitations, and the organization’s ability to act on negative signals can diminish, even when those signals directly challenge the viability of the program.

For boards, the implication is not to avoid ambitious initiatives, but to maintain the distinction between evidence and narrative, ensuring that decisions to scale are based on demonstrated performance rather than projected potential. Watson for Oncology shows how quickly this distinction can erode, and how difficult it becomes to restore once a program has crossed the threshold where stopping it is more costly, in organizational and reputational terms, than continuing it.


Most transformation failures do not start with strategy, technology, or vendors. They start with governance, incentives, and blind spots at board level.

If you are currently overseeing a critical transformation, I offer a focused board-level diagnostic to identify where your program is at risk before those risks become visible in financials and delivery.

If this is relevant, get in touch.


Sources

Primary Sources
Secondary Sources
