Lessons from Systems Engineering Failures: Determining Why Systems Fail, the State of Systems Engineering Education, and Building an Evidence-Based Network to Help Systems Engineers Identify and Fix Problems on Complex Projects
As systems become more complex, more can go wrong with them. For example, the United States Air Force selected McDonnell Douglas’ design for the F-15 Eagle fighter aircraft in 1967, and the aircraft’s first test flight was in 1972, 5 years later. In contrast, the U.S. military selected Lockheed Martin as one of two companies to develop the F-35 Lightning II in 1997; its first flight was in 2006, 9 years later, and the first production aircraft first flew in 2011, 14 years after the selection. The problems of this complex program have been well documented by the U.S. Government Accountability Office (GAO) and have contributed to the project’s long lead time and skyrocketing budget. GAO reports on other military projects reveal that those projects share the problems the F-35 project has experienced. In this dissertation I posit that similar problems plague all complex systems engineering projects and that a combination of these problems may lead to negative consequences, such as budget and schedule overruns, quality concerns, failure to achieve mission objectives, and accidents resulting in loss of human life.
Accidents, or unexpected events resulting in loss, have been well studied, and sophisticated theories now help explain how they occur. The leading theory is that most accidents result from an accumulation of “mundane” errors within an organization, and that these errors are similar across industries. However, these mundane errors, such as failing to follow procedures and poorly training personnel, occur in all companies, including those that design and manufacture military aircraft. My theory is that these mundane errors accumulate in all organizations and result in many different kinds of systems engineering failures, including failures traditionally referred to as “accidents” that result in loss of life, as well as other types of failures, which I refer to as “project failures”.
What can be learned from these systems engineering failures? In this dissertation, I begin by mining publicly available reports to determine whether two seemingly dissimilar kinds of failure, accidents and project failures, share common causes. I then explain the similarities and dissimilarities between these causes and provide examples from the failures I studied. To give systems engineers actionable advice on these common causes, I describe how I linked the causes to recommendations from accident reports in a cause-recommendation network. I then discuss the results of interviews I held with systems engineers to determine whether the problems I identified in past failures resemble the problems they have encountered on their own projects. I also discuss these systems engineers’ criticisms of systems engineering education, based on the tasks their newly hired systems engineers struggle with. I explain how I used what I learned about the problems in systems engineering that lead to failures to develop survey questions designed to gauge whether systems engineering education at Purdue prepares students to identify and fix these problems. Then, to help systems engineers learn from the data I collected and solve the problems they encounter on their projects, I describe how I built an interactive, web-based tool that presents expert advice on systems engineering failures. Finally, I present the feedback I received from experts and novices in systems engineering on whether this tool could be useful to engineers in this context.