University of Bielefeld -  Faculty of technology
Networks and distributed Systems
Research group of Prof. Peter B. Ladkin, Ph.D.


Abstracts of References and Incidents

compiled by Peter B. Ladkin

Latest edition 5 May 1998


Introduction

The world has changed significantly for air travellers in the 1990s. New generation aircraft, such as the Airbus A319/320/321/330/340 series and the Boeing B777, are now in service. These aircraft are `fly-by-wire': their primary flight control is achieved through computers. The basic maneuvering commands which a pilot gives to one of these aircraft through the flight controls are transformed into a series of inputs to a computer, which calculates how the physical flight controls shall be displaced to achieve a maneuver, in the context of the current flight environment. While large commercial air transport aircraft have had hydraulically-aided controls for a long time, and military air forces have been flying their aircraft under computer control for just as long, many people, including myself, believe that the use of computer primary flight control in commercial transports heralds a new era, in which scientists concerned with specification and verification of computer systems, as well as passengers concerned with personal safety, should renew their interest.

It would be pleasant to say there have been no accidents. Unfortunately, as with many new types of aircraft, the Airbus A320/A330/A340 series has had its share. There have been fatal accidents with A320 aircraft in Bangalore, India; in Habsheim, in Alsace in France; near Strasbourg, also in Alsace in France; and in Warsaw, Poland. An A330 crashed on a test flight in Toulouse, killing Airbus's chief test pilot as well as the other crew on board. An A340 suffered serious problems with its flight management computer system en route from Tokyo to Heathrow, and further significant problems on approach to Heathrow. In late 1995 and early 1996, the B757 (not a fly-by-wire aircraft) suffered its first two fatal accidents in a decade and a half of service.

Even transport aircraft which are not fly-by-wire nowadays have their share of safety-critical computer-based systems. New-generation flight management and guidance systems (the so-called `glass cockpit') and Full-Authority Digital Engine Controls (FADEC) are found on most newly-built aircraft.

I collect here a series of comments and papers referring to recent computer-related accidents from various experts and not-so-experts. The collection was originally restricted to fly-by-wire aircraft, but has since broadened to include other computer-related or maybe-computer-related incidents. This page will grow with time. I sincerely hope not by much.

General Information

The RISKS Forum
The incidents, accidents, and analyses reported here have been discussed in Peter Neumann's RISKS-Digest, the Forum on Risks to the Public in Computers and Related Systems, an on-line news journal which has established a reputation as the premier electronic source for news and discussion of these topics over the last ten years. This compendium relies heavily on RISKS comments. Thank you, Peter, for a very great service. RISKS material is freely available; authors, the RISKS moderator and the ACM have no connection with my use of this material.

sci.aeronautics.airliners
Information and links about all aspects of commercial air transport may be found in the sci.aeronautics.airliners page maintained by the moderator Karl Swartz.

The Bluecoat Project
The Bluecoat Project was initiated by Bill Bulfer, an airline pilot who has written user guides for the flight management systems on board certain Boeing aircraft. The purpose of the project is to generate and maintain ongoing discussion between the designers of these computer systems and the pilots who use them; it has since expanded to include regulatory authorities and noted researchers. The project maintains public information and reports on its WWW site, as well as private information for members. The public reports are a fine resource for those interested in current computer-related issues in aviation safety.

Aviation Week's Safety Resource Center
Aviation Week and Space Technology is a weekly journal which is a major source for reliable technical information on aviation matters, including safety. Avweek has a WWW site devoted to `credible air safety information', the Safety Resource Center, which contains a wealth of information, including material on accident investigations and also links to aviation safety-related WWW sites (including this one).

Organisations concerned with Air Safety

ICAO and IATA
The International Civil Aviation Organisation is the international body to which most of the countries in the world with significant commercial aviation are signatories. The charter of ICAO promotes uniform international standards, recommended practices and procedures covering the technical fields of aviation, such as rules of the air, licensing, charts and navigation, airworthiness, aeronautical telecommunications, air traffic services, accident investigation, security, etc. The International Air Transport Association is the trade association for commercial air transport. Its mission is to represent and serve the airline industry, and it serves four main interested groups: airlines, the general public, governments, and third parties such as travel and cargo agents or equipment and systems suppliers.

The US FAA, NTSB, ASRS at NASA Ames, and RTCA Inc
The Federal Aviation Administration is the US Government organisation responsible for the administration of all aspects of aviation in the United States. It is a part of the US Department of Transportation. The FAA develops, for example, certification standards for commercial transports on which other organisations such as the JAA (Joint Aviation Authorities) in Europe base their regulations. The entire Federal Aviation Regulations are on-line. It is essential to understand these regulations in order to understand the environment for flight in the US or with US commercial carriers.

The US National Transportation Safety Board (NTSB) is responsible for analysing mishaps and accidents.

The NASA Ames Research Center in Mountain View, California has run for many years a program called the Aviation Safety Reporting System (ASRS). Users of the National Airspace System are encouraged to report incidents and events which they feel may affect the safety of flight, or of which knowledge may contribute to flight safety. The reports are guaranteed to remain anonymous, and immunity from punitive administrative action is granted under most circumstances to those who can demonstrate that they have reported the relevant safety-related incident to the ASRS. The result is an unparalleled accumulation of data on safety-related incidents. These are summarised in the ASRS monthly newsletter, Callback, and a journal, Directline. On-line copies of recent issues of Callback (since issue 187, December 1994) and Directline (since issue 5, March 1994) are available from the ASRS publications page.

RTCA is a private, not-for-profit organization that addresses requirements and technical concepts for aviation. Products are recommended standards and guidance documents focusing on the application of electronics technology. RTCA was organized as the Radio Technical Commission for Aeronautics in 1935. Since then, it has provided a forum in which government and industry representatives gather to address aviation issues and to develop consensus-based recommendations. It was reorganized and incorporated in 1991. Members include approximately 145 United States government and business entities such as: the Federal Aviation Administration, Department of Commerce, U.S. Coast Guard, NASA; aviation-related associations; aviation service and equipment suppliers; and approximately 35 International Associates such as Transport Canada, the Chinese Aeronautical Radio Electronics Research Institute (CARERI), EUROCONTROL, the UK CAA, Smiths Industries, Sextant and Racal. The Web site includes a broader statement of what they do.

Recommended Standards Documents such as DO-178B Software Considerations in Airborne Systems and Equipment Certification, DO-197A (Active TCAS I for `commuter' aircraft), DO-185 (TCAS I and II, including algorithms in pseudocode), DO-184 (TCAS I Functional Guidelines), as well as the standards for navigation equipment, windshear detection and thunderstorm avoidance equipment, are all available for purchase via the RTCA Web site. A full list of applicable standards is also available.

The UK Air Accidents Investigation Branch
The equivalent organisation to the US NTSB in Britain is the Air Accidents Investigation Branch (AAIB) of the UK Department of Transport. All Formal Investigations of incidents and accidents since 1994, and the monthly Bulletins since January 1996, are available on-line.

The Transportation Safety Board of Canada
The Canadian Transportation Safety Board (CTSB) is the equivalent organisation to the US NTSB. Their aircraft accident reports are published on the WWW.

The Australian Transport Safety Bureau
ATSB is the Australian accident investigation authority. Their reports are very high-quality documents and recent ones are available on the WWW in PDF format. Accident investigation specialists explicitly use James Reason's model of human error for determining not only the active errors which contribute to an accident, but also the latent errors, the organisational, managerial and oversight errors, which Reason and others have identified as being major contributors to complex system accidents. Aviation, encompassing regulation, oversight authorities, air traffic control and communication, as well as pilots and aircraft, is a complex system in this sense. Reason, of the University of Manchester in England, is a renowned specialist in human error who has done considerable work with ICAO. Those wanting good examples of Reason's methods in practice would do well to surf around this site.

The EUCARE Project
EUCARE is the European Confidential Aviation Safety Reporting Network, based in Berlin, Germany, duplicating the remit of the NASA ASRS in the US. It is independent of any national government authority in Europe. It publishes Eucareview, a series of regular bulletins discussing reported incidents, which is available on the WWW.

The CHIRP Project
CHIRP is the UK Confidential Human Factors Incident Reporting Programme, similar to the US's ASRS. Like ASRS and EUCARE, it publishes its organ Feedback quarterly on its WWW site, starting with Number 45, October 1997, as well as its forms for Flightdeck, ATC and maintenance incident reports.

Eurocontrol
Eurocontrol, the European Organisation for the Safety of Air Navigation, is an umbrella organisation which establishes standards and coordinates work on air safety and air traffic control in Europe. It has 26 member states. Eurocontrol has two sites of main interest here: details of its many Air traffic control research programs, and their EATCHIP Phase III Human-Machine Interface work.

The Flight Safety Foundation
The Flight Safety Foundation is an association whose members are organisations with an interest in commercial flight safety. Its goal is to promote the safety of air travel. The monthly issues of its journal, the Flight Safety Digest, are available on the WWW in PDF (Acrobat) format and are definitive sources of information and research on air safety issues.

Social Issues

What Do The Statistics Say, If Anything?
Statistics tell of the frequency of problems that have arisen with aircraft on which some systems are computer-controlled. These are referred to as `New-Technology Aircraft' by the Boeing Company's annual summary of statistics. Boeing publishes the Statistical Summary of Commercial Jet Aircraft Accidents each year, and displays data back to 1959 to indicate trends. It's available from Boeing Commercial Airplane Group, P.O. Box 3707, Seattle, WA 98124-2207, USA. Excerpts appear below. In response to an article by Mitch Kabay in RISKS on the complexity of the modern pilot's cockpit interface, a report of a lecture to the Royal Society by Michael Bagshaw, I wrote a short commentary for RISKS-18.66 on what those statistics might show for new-technology aircraft. Mark Stalzer contributed a short reply pointing out what those statistics might not show.

Frequency or rate of accidents can tell us what the likelihood of an accident may be, if all contributing factors remain the same. Likelihood of an accident should not be confused with risk. Risk is an engineering term which attempts to combine the likelihood of an accident with the severity of the consequences. For example, you could stub your toe while entering the aircraft from the jetway. The likelihood of this could be much greater than the likelihood of you sustaining severe injury on board, but the risk might be much lower, because the consequences (a sore toe) are not severe. Risk is explained by Leveson (Safeware, Addison-Wesley 1995, p179) as the hazard level combined with (1) the likelihood of the hazard leading to an accident [..] and (2) hazard exposure or duration [..]. A hazard is itself explained as a state or set of conditions of a system that, together with other conditions in the environment of the system (or object), will lead inevitably to an accident (op. cit., p177). Hazard severity is measured by assessing the severity of the worst possible accident that could result from the hazard, given the environment in its most unfavorable state (op. cit., p178, modified). Hazard level is a combination of severity with likelihood of occurrence (op. cit., p179). Everything clear?

Suppose you are the punk in the Dirty Harry movie at the end of Mr Eastwood's pistol. `Do you feel lucky, today, punk? Well, do yah?'. Mr. Eastwood thus makes it clear he constitutes a hazard. If you feel lucky (the `other conditions in the environment' are that you do, and you will go for it), this will lead inevitably to an accident (for you, not for Mr. Eastwood). Furthermore, the hazard severity is high (being shot is quite severe), and the likelihood? Very close to that of how lucky you feel (Mr. E could in real life suffer a stroke at the very moment you move, but this is the movies so the likelihood is zero).

See the section The Measurement of Risk, below, for some more comments on risk and perception of risk (which are known to differ, according to research by social psychologists).
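
The difference between likelihood and risk in the stubbed-toe comparison above can be sketched numerically. This is only an illustration under loud assumptions: the probabilities and severity scores are invented for the example, not measured, and multiplying likelihood by severity is merely one common simplification, not Leveson's fuller definition (which also involves hazard exposure or duration). The names Hazard and risk are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    name: str
    likelihood: float  # assumed probability per flight (invented for illustration)
    severity: float    # assumed severity score from 0 (harmless) to 1 (fatal)


def risk(h: Hazard) -> float:
    """Combine likelihood and severity by simple multiplication (an assumption)."""
    return h.likelihood * h.severity


# Invented figures: a stubbed toe is far more likely but far less severe.
stubbed_toe = Hazard("stubbed toe while boarding", likelihood=1e-3, severity=1e-4)
severe_injury = Hazard("severe injury on board", likelihood=1e-6, severity=0.5)

for h in (stubbed_toe, severe_injury):
    print(f"{h.name}: likelihood={h.likelihood:g}, risk={risk(h):g}")
```

With these invented figures the stubbed toe is a thousand times more likely than the severe injury, yet its computed risk is lower, which is the point of the distinction.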

There follows a short synopsis of 1996 accident statistics, with a list of significant fatal airline accidents. Boeing has for many years produced an annual statistical summary of aircraft accidents. Some excerpts from the 1959-1995 summaries show:

The complete report, Statistical Summary of Commercial Jet Aircraft Accidents, is a standard work in aircraft safety, published every year. It may be obtained from Boeing Commercial Airplane Group, P.O.Box 3707, Seattle, WA 98124-2207, USA.

People seem to like to make comparisons between the risk to life of driving a car and the risk to life of flying on commercial carriers. But exactly what are these figures, where did they come from, and who did the comparisons? There were some articles in the journal Risk Analysis from 1989-91, using figures from the late 70's to late 80's, which I summarise in the essay To Drive or To Fly - Is That Really The Question? The answer is that it depends on who you are. But one thing is pretty certain. If you're a drunken teenager, better take the bus.

Keith Hill's Comments on Aspects of Mary Schiavo's Book "Flying Blind, Flying Safe"
Keith Hill dealt with the FAA for a number of years as a Boeing employee and as an FAA Designated Engineering Representative (DER). He was Chief Engineer for embedded software for the Boeing 777 airplane, and was ultimately responsible for embedded software for all Boeing airplanes. He has contributed some comments on aspects of ex-DOT executive Mary Schiavo's 1997 book "Flying Blind, Flying Safe" concerning safety oversight of aviation in the USA. Keith brings his experience to bear on some of Schiavo's positions.

Some Opinions on Aviation Computer Safety
Vincent Verweij, a journalist with the Dutch National Television Channel 3, asked me some general questions about the safety of computer automation in commercial aviation. They were perceptive questions. I hope that the questions and my answers can serve as a general starting point for understanding some of the issues with automation in commercial aircraft.

Perrow's `System Accidents' - An Aviation Example
Charles Perrow's book Normal Accidents: Living With High-Risk Technologies (New York: Basic Books, 1984) is a standard amongst those of us recently concerned with accidents in complex systems. Perrow's thesis is that some accidents cannot be put down to failures of individual parts of the system, but must rather be put down to unfortunate and complex interactions of many traditional parts (I say `traditional', because the interface between parts through which the interactions take place can itself properly be considered a `part' of the system). Two characteristics that Perrow singles out are interactive complexity and tight coupling. The first concept refers to how complex the interfaces between traditional parts are; the second to how easily events in one part of the system can propagate their effects through to remoter parts of the system. Perrow says "if interactive complexity and tight coupling - system characteristics - inevitably will produce an accident, I believe we are justified in calling it a normal accident, or a system accident." (p5). (He doesn't quite mean that - it's not those characteristics alone which produce accidents, since they're both state predicates; an accident is an event, and you normally cannot produce an event from a state alone on the level of quantum mechanics. Events cause other events, given a certain state.) He focuses on six system components: design, equipment, procedures, operators, supplies and materials, and environment; the DEPOSE components, and analyses various systems and accidents in terms of these.

Perrow is a pioneer in this area, and whether you agree with him in detail or not (and I have some reservations), it's pretty much required that one understand his work. A recent essay on an aviation accident by William Langewiesche, who writes responsibly and well, and whose prose is a joy to read, attempts to apply Perrow's ideas (and those of later sociologists Scott Sagan and Diane Vaughan), as well as describing in terms which one rarely reads exactly what it is like being around a major accident. I must confess to being speechless, sad, a little frightened, and feeling much too close to things after reading his account. Even though The Lessons of ValuJet 592 deals with an accident that is not computer-related, as far as anyone knows, the immediacy which Langewiesche brings to his descriptions, and the consideration to his thoughts, illuminates the horror and tragedy in our subject in a way I am unlikely to forget. Which is why I include it here.

On Keeping Your Mouth Shut: The Merits of Public Discussion
The goal of this compendium is to provide reliable information and discussion of computer-related issues in commercial airplane safety. Not everyone is content with such public discussion of the issues, in particular before the final report on the accident has been issued. A discussion arose in RISKS-18 as to the merits of such public discussion before "the final decision is in".

Overview of the Issues
I wrote a short Overview entitled Summary of Safety-Critical Computers in Transport Aircraft for RISKS-16.16 on 15 June 1994. This is a good place to start if you're not already expert in the area. It describes the difference between the A320/330/340-series aircraft, which have `fly-by-wire' primary flight controls, and the older Airbus A300 and A310 aircraft, which have autopilots and flight management systems which include computer control, but whose primary flight controls are `conventional'. Many people have been confused over the difference. A similar distinction holds between the Boeing B777 and the B757/767 aircraft.

Technical Problems and Research

Flight Deck Automation Issues
Oregon State University and Research Integrations, Inc., under contract with the US FAA Human Factors Office (AAR-100), have completed a study intended to provide a comprehensive list of human factors issues in flight deck automation and a well-organised, reasonably complete set of data and other objective evidence related to those issues. This material is available on the WWW site Flight Deck Automation Issues. Project leaders are Kenneth Funk of Oregon State University, and Beth Lyall of Research Integrations, Inc.

The FAA Human Factors Team Report on Interfaces between Flightcrew and Modern Flight Deck Systems
Because of the Cali and Nagoya accidents, as well as an American Airlines' MD-83 incident at Bradley Airport in Connecticut in November 1995 (as stated in the foreword), the FAA convened a Human Factors team to report on The Interfaces Between Flightcrews and Modern Flight Deck Systems, which report appeared on June 18, 1996. (This report is in Adobe Acrobat format. Those who don't have an Acrobat viewer may download one from http://www.adobe.com/Acrobat/.)

NASA Office of Aeronautics Human Factors Division
NASA Ames Research Center maintains a research division on human factors, whose mission statement includes the enhancement of aviation system safety "by integrating human performance characteristics and technological functions to anticipate (and minimize) the probability and adverse consequences of human or system errors." On-line publications on human-newtech interfaces include analysis of problems similar to those which have arisen in incidents reported elsewhere in this Compendium. Examples include "Oops, it didn't arm." - A Case Study of Two Automation Surprises by Everett Palmer; Altitude Deviations: Breakdowns of an Error-Tolerant System by Palmer, Edwin Hutchins, Richard Ritter and Inge van Cleemput; Mode Usage in Automated Cockpits: Some Initial Observations by Asaf Degani, Michael Shafto and Alex Kirlik; Use of the Operator Function Model to Represent Mode Transitions by Degani, Christine Mitchell and Alan Chappell; and On the Types of Modes in Human-Machine Interaction, by Asaf Degani.

Do Passenger Electronics Interfere With Aircraft Systems?
With the increasing use by passengers of electronic systems such as laptop computers, Gameboys, and (even though it's illegal almost everywhere) portable personal telephones, coupled with the increasing use of electronics for safety-critical and safety-related aircraft systems, there is concern about whether passenger electronics can interfere with aircraft systems. Such concerns are borne out by an increasing number of suspected-interference reports by line pilots -- FAA/NASA's ASRS, Europe's EUCARE and British Airways' BASIS reporting systems all contain anecdotes. RTCA Special Committee 177 was formed in 1992 at FAA request to investigate and try to substantiate these incidents, but so far the interference patterns seem to have resisted easy duplication in a laboratory. The essay Electromagnetic Interference with Aircraft Systems: why worry? gives some background, and collects comments and first-hand anecdotes from colleagues, some of whom are professional pilots, on some of the phenomena which need to be explained. I also suggest some ways in which changes in the regulatory environment might aid in reporting and investigating such incidents. Albert Helfrick's short article, Avionics and Portable Electronics: Trouble in the Air (Acrobat PDF, 26K), which discusses some of the technical background, appeared in Avionics News Magazine, September 1996.

In her article The Fall of TWA 800: The Possibility of Electromagnetic Interference, New York Review of Books, Special Supplement, April 9, 1998, pp59-76 (also available at http://jya.com/twa800-emi.htm), Elaine Scarry proposed that EMI might have been a causal factor in the crash of TWA800 in July 1996. I find this supposition highly implausible, and I wrote a critique of her argument, EMI and TWA800: Critique of a Proposal, Report RVS-J-98-03, on 10 April 1998.

Integrity of Navigation Data Used in FMS's
The final report on the Cali accident cited as one of four `contributing factors' to the accident the `FMS-generated navigational information that used a different naming convention from that published in navigational charts'. Recommendations 1 and 7 (of 17) to the FAA concern possible differences between authoritative navigation information and the use of this information in Flight Management Systems (FMS's). Recommendation 3 (of 3) to ICAO suggests to `Establish a single standard worldwide that provides an [sic] unified criteria for the providers of electronic navigational databases used in Flight Management Systems'. Jeppesen has responded to this report with specific Changes to NDB Navaid Identification in [Jeppesen] NavData [Database] (http://www.jeppesen.com/cali06.html).

National authorities supply navigational information to five industry data suppliers (Jeppesen, Racal Avionics, Aerad (British Airways), Swissair and GTE Government Services), which then supply this information to the almost twenty manufacturers of FMS's, many of whom have many different models. There is some concern about quality control in implementation of this data in FMS's. Although there is a standard, ARINC 424 from the industry/user group ARINC, which is `loosely followed' by the industry, this standard has no regulatory force and is not connected with any regulatory process. Shawn Coyle, of Transport Canada's Safety and Security division, has written a working paper, Aircraft On-Board Navigation Data Integrity - A Serious Problem, assessing the situation. It is not good. Coyle's argument is that FMS's are proliferating; that soon they will be used as primary navigation devices as GPS approaches come into use (they are advisory devices only at the moment - other avionics are the primary navigational devices); that they will therefore be used for precision instrument approaches, in which integrity of data is vital; and that there is no regulatory oversight into the integrity of the data used by these devices, nor into the process by which the data is implemented or updated in the FMS. Coyle gives eight examples in which nav data implemented in an FMS leads an aircraft to fly a profile different from the published procedure. Coyle says that Transport Canada is the first organisation to have systematically identified the problem.

U.S. Air Traffic Control Center Outages and the Advanced Automation System
1996-1997

Synopsis As reported occasionally in RISKS, the U.S. Air Traffic Control (ATC) system has suffered degradation of service and occasionally complete outages, due to power failures and computer problems. Some of the equipment running the Air Route Traffic Control Centers (ARTCCs) is very old, not maintained by the manufacturer any more, and only with difficulty maintained by the user - and many of the skilled maintenance engineers are reaching retirement age. This is the well-known problem of `legacy equipment'.

The FAA's system modernisation effort was started in 1981. The Advanced Automation System (AAS) contract with Loral (formerly IBM Federal Systems Division) was cancelled in mid-1994 because of `schedule slips and cost overruns', and a much-reduced AAS design is being implemented. The NTSB produced Special Investigation Report NTSB/SIR-96-01 on January 23, 1996 in which they assessed the safety implications of the outages and the planned modernisation effort. They found that, despite sometimes severe degradation of service (delays to traffic), for the one-year period from September 12, 1994 to September 12, 1995, there was only one reported `operational error', a loss-of-separation incident, at Oakland Center on August 9, 1995, and that the modernisation efforts were appropriate in their new, evolutionary, form.

The U.S. General Accounting Office (GAO) has also kept a close watch on the AAS. Reports RCED-97-51, Air Traffic Control: Status of FAA's Standard Terminal Automation System Replacement Project, AIMD-97-30, Air Traffic Control: Complete and Enforced Architecture Needed for FAA Systems Modernization, AIMD-97-47, Air Traffic Control: Immature Software Acquisition Processes Increase FAA System Acquisition Risks are available on the WWW. Overviews of what the GAO calls High-Risk Projects, which it considers "at high risk for waste, fraud, abuse and mismanagement" (!), and which include the FAA AAS, are also available: HR-97-1, High-Risk Series: An Overview, HR-97-9, High-Risk Series: Information Management and Technology, and HR-97-2, High-Risk Series: Quick Reference Guide.

A recent perspective on the U.S. Air Traffic Control system, suggesting that the most worrying aspects lie on the human side, in the working conditions of air traffic controllers as air traffic increases, was proposed by pilot and journalist William Langewiesche in the October 1997 Atlantic Monthly article Slam and Jam. Whether one agrees with Langewiesche's perspective or not, he writes responsibly and well, and is a joy to read.

Development Problems with the new U.K. National En-Route Center (NERC) air traffic control system
1996-1998

Synopsis Great Britain is building what was billed as the most advanced En-Route Air Traffic Control system in the world at the National En-Route Center (NERC) in Swanwick, Hampshire, to control traffic in the London Flight Information Region (FIR), which covers southern British airspace. The £350+ million system has run into problems, experiencing successive delivery delays, and some scaling problems. The contractor was also building the U.S. AAS system before it was cancelled, and I understand about 1M LOC (out of 2M LOC total) are being reused. I wrote a short note entitled Software problems with new-generation air-traffic control center about the problems as reported in a Flight International article in May 1997 for RISKS-19.18. A further comment by Andres Zellweger appeared in RISKS-19.23. Having been briefed by Bob Fletcher of NAV Canada concerning the new Canadian system, CAATS, and Bob Ratner (Ratner and Associates) and Bob Peake (Airservices Australia) on the new Australian system TAAATS, I wrote a memorandum to the Transport Subcommittee of the House of Commons, who were considering the question of NERC on 19 November, 1997, expressing my concern and giving my reasons. Subsequently, I was invited to give oral evidence before the Transport Subcommittee on 11 March, 1998 on the issues (a) how long an audit, the purpose of which would be to determine if the system could be made to work and when, would take; and (b) whether Sir Ronald Mason's assertion, that dual-sourcing is a `basic principle' of safety-critical system development, applied to the case of the new twin centers NERC and NSC (the New Scottish Center to be built in Prestwick) and therefore that the NSC contract, awarded to the same contractor as NERC, should be awarded instead to another contractor. My written evidence to the Transport Subcommittee expresses my views, and those of some colleagues, on these two issues.
The Environment, Transport and Regional Affairs Committee's Report recommended inter alia a technical audit of the NERC system, to determine whether it can be made to work; to assess the safety of the current operation; whether traffic growth has been underestimated; and whether dissimilar systems should be used at NERC and its future companion NSC in Scotland.

The GPS Study by the Johns Hopkins University Applied Physics Laboratory
January 1999

Synopsis The Johns Hopkins University Applied Physics Laboratory, www.jhu.edu, was commissioned by the Air Transport Association (in concert with the FAA and the Aircraft Owners and Pilots Association) to determine the ability of GPS, GPS/WAAS and GPS/LAAS to satisfy Required Navigation Performance (RNP) "as expressed by accuracy, integrity, continuity and availability requirements." In other words, the study is an assessment of the risks of using GPS, or GPS/WAAS, or GPS/LAAS, as a sole source of navigation information for flight. In brief, GPS alone won't hack it, although the other two might/will. Although the study dealt with the issue of intentional interference ("jamming") at some depth, it has been criticised for being somewhat cavalier about the consequences of the onset of a jamming incident while an aircraft is, say, on short final.

We include a PDF version of the full report, the Risk Assessment Study: Final Report

Non-Computer-Related Automation Problems
In view of recent incidents and accidents to the Airbus A320, A330 and A340 fly-by-wire aircraft, it is as well to remember that there has been hydro-mechanical (analogue! :-) control automation on aircraft for a long time, and this sort of automation can also have its problems. One view held by many in the industry is that computers alone alleviate more risks than they pose. A more sophisticated view would measure on something different from an ordinal scale the risks involved in the increasing uses of digital computers in avionics. (An ordinal scale is a scale on which every two states are in the relation `more' or `less' or `equally' to each other. See the classical work Foundations of Measurement Theory, Vol. 1 by D. H. Krantz, R. D. Luce, P. Suppes, and A. Tversky, Academic Press, 1971.)

The Boeing B737 and Airbus A320 are rival airplane series. The A320 is the subject of many reports in this compendium. The B737 has recently come under investigation for suspected and reported rudder-control anomalies. Some investigators suspect that such anomalies may have played a role in the unexplained crashes of United Airlines Flight 585 on 3 March 1991 near Colorado Springs, and USAir 427 on 8 September 1994 near Pittsburgh. The NTSB has prepared an extensive report on its investigations, released at a public meeting on 16 October 1996, which contains recommendations A-96-107 through A-96-120.

Fly-By-Wire Anomalies in Research Aircraft
John Rushby has collected some anecdotes about anomalies which appeared during flight research in USAF and NASA research aircraft, which appear in Anomalies in Digital Flight Control Systems (DFCS), a part of his forthcoming book with Cambridge University Press on the use of formal methods.

Pilot Authority, Automated Help, and CFIT Avoidance Manoeuvres in FBW Aircraft
March 1, 1999

Synopsis Captain Ron Rogers of the Air Line Pilots Association has produced two reports evaluating the Controlled Flight Into Terrain avoidance manoeuvre in highly-automated transports, and assessing the trade-off between pilot authority and automatic manoeuvring of the fly-by-wire (FBW) aircraft (those with digitally-automated primary control systems). These two issues have been at the center of debate concerning FBW in commercial air transports. Flight Test Results of the Controlled Flight Into Terrain (CFIT) Avoidance Maneuver in Fly-By-Wire (FBW) Transports and Pilot Authority and Aircraft Protections are both by Captain Rogers and published by the Air Line Pilots Association (ALPA).

The Benefits of Automation in Aircraft
We should not forget that the reason that automation is used in aircraft control and navigation is to make flying safer and easier for all concerned. While this survey concentrates on the risks, there are examples of clear safety benefits. Nancy Leveson in her Risks 17.21 contribution Good News For a Change notes some incidents in which the TCAS, the Traffic Alert and Collision Avoidance System, helped avoid potential accidents. TCAS is required by the FAA for commercial flights in the USA. Leveson was a consultant for the FAA on TCAS. Peter Ladkin in Digital Flight Control Systems help the U.S. Navy (Risks 17.89) notes an article in Flight International on how the U.S. Navy is speeding up acquisition of digital flight control systems for the F-14 in an effort to reduce loss-of-control accidents. A very visible example of the potential benefits of increased automation is the loss of a US Air Force T-43A (a military version of the B737-200) carrying US Secretary of Commerce Ron Brown near Dubrovnik in Croatia on April 3, 1996.

The Measurement of Risk
Even though air travel is reckoned by some measures to be very safe, many people become nervous when they must travel by air. There are probably many reasons for this, but one which has attracted the attention of aviation specialists and others is the psychological effect of perceived risk, the risk as experienced by the participant, rather than as assessed by the engineer. There seem to be some regularities to the circumstances in which perceived risk is higher than engineer's-risks. These regularities are summarised in
The Measurement of Risk: Community Measures vs. Scientific Measures by Dave Shaw;
Re: The Measurement of Risk by Peter Mellor;
Re: The Measurement of Risk by Martin Minow;
Re: The Measurement of Risk by Robert Walking-Owl;

Thinking about Causes
There is as yet no general theory, although there is developed procedure, of how we ascertain causes in accident investigations. The three papers
The X-31 and A320 Warsaw Crashes: Whodunnit? by Peter Ladkin,
A Model for a Causal Logic for Requirements Engineering, by Jonathan Moffett, Jon Hall, Andrew Coombes and John McDermid, and
Reasons and Causes by Peter Ladkin
discuss the notions of reasons and causes with respect to aviation accidents as steps towards a general theory. These papers use the A320 Warsaw accident as an example (the references are repeated below).

WB-Analysis
WB-Analysis (Why...Because... Analysis) is a series of techniques of formal semantics and logic developed by Peter Ladkin and associates in the RVS group at the University of Bielefeld for the formal analysis and explication of accidents. WB-analysis has been applied successfully to various accident reports to clarify the exact causal role of the known events and system states, and the causal sequences arising in the course of the accident. Analyses and technical papers are to be found on the WB-Analysis Home Page.

Applications of Formal Methods
In the computer science community, the term Formal Methods denotes the use of mathematical methods, in particular those of mathematical logic and universal algebra, in the specification and verification of computational systems. (Here, the term verification means a formal mathematical proof that an implementation of a system fulfils its formal specification. Such proofs can be very hard and are almost always complicated. Therefore, formal verification is not without its detractors!) Formal methods in current industrial practice usually means the use of precise languages with a precise semantics to specify system behavior. It is undeniable that the use of formal methods aids precision. What is normally questioned is the actual cost/benefit ratio of using a particular formal method in a particular project. There is also common acknowledgement that developing formal methods, including verification, is a hard research area. Research in Formal Methods has led to Turing awards (the `Nobel prize' of computer science) for Britons Tony Hoare and Robin Milner, the Dutchman Edsger Dijkstra, and the Americans Robert Floyd, Dana Scott and John Hopcroft; and the 1991 Current Prize in Automated Theorem Proving of the American Mathematical Society for US computer scientists Robert Boyer and J. Strother Moore.

The Formal Methods and Dependable Systems Group in the Computer Science Laboratory at SRI International has been pioneering formal methods for aerospace for a quarter century. Much of their work concerning SIFT (the first attempt to develop a provably-correct digital flight control system) in the 1970s, and the subsequent development of the logic specification and proof systems EHDM and PVS, and their application to problems in digital flight control and avionics, is accessible via the WWW.

Much of SRI's work on formal methods for aviation systems has been supported by the NASA Langley Formal Methods Team, who also have publications in this area.

Nancy Leveson and her group at the University of Washington are applying formal methods (using RSML, a language for describing state machines suitable for requirements specification) to the analysis of TCAS II, the Traffic Alert and Collision Avoidance System, Version II. These papers may be found under the page for the Safety Research Project. Nancy Leveson moved in 1999 to the Department of Aeronautics and Astronautics at MIT.

Other recent published academic research on applications of formal methods to aviation includes

The Incidents and Accidents

News on the China Airlines A300 Accident, Taipei, February 1998
16 February 1998

Synopsis There is as yet no evidence at all that this accident is computer-related. However, there has been considerable speculation by the Japanese and Taiwanese press (as well as by CNN!) as to whether there were computer-related problems. A background article, The Crash of Flight CI676, a China Airlines Airbus A300, Taipei, Taiwan, Monday 16 February, 1998: What We Know So Far, Report RVS-J-98-01 from the RVS group at the University of Bielefeld, gives extensive background to this accident, including a discussion of the causes of the Nagoya accident (below), also to a China Airlines A300, in 1994. The article also assesses what needs to be known before any conclusions at all can be drawn. The article is designed to be continuously revised as new information comes in.

The Korean Air Lines B747 CFIT Accident in Guam
6 August 1997

Synopsis Approaching Won Pat International Airport on a Localiser-only ILS approach to Runway 6 Left at night, Korean Air Lines Flight 801 impacted Nimitz Hill at 658ft just a few hundred yards from the VORTAC antenna, whose DME KE801 should have been using, and nearly 800ft below the minimum altitude at that point on the approach.

While initially the accident seemed to have little to do with automated systems, it turned out that the Minimum Safe Altitude Warning (MSAW) System used by the Agana tower controllers and installed at nearby Andersen Air Force Base some 10 nautical miles beyond the departure end of Rwy 6L had, unbeknownst to controllers, not been operational, due to software errors in a new software installation.

Furthermore, when the descent profile and CVR transcript became available, questions were raised about the crew's "resource management" that are also pertinent to dealing with more recent automation and procedures.

These two points were sufficient for us to include two documents on this accident in the compendium.

The first, The Crash of Flight KE801, a Boeing B747-300, Guam, Wednesday 6 August, 1997: What We Know So Far, puts together publicly available information from the weeks after the crash; analyses this information with a view to determining what facts were available; and compiles and comments on the often confusing, unreliable and occasionally frankly false information distributed by news organisations as well as other organisations involved in the crash. Part of the purpose of this commentary is to establish a `social context' for the aftermath of a crash, as the author attempted also to do for the case of Aeroperu Flight 603 in 1996.

We shall include the full set of documents provided by the US National Transportation Safety Board for the Public Hearings on the accident in March 1998, which is a local copy of the original Public Hearings documents on the NTSB Web site.

The FedEx MD11 Accident on Landing at Newark
30 July 1997

Synopsis A FedEx MD11 bounced on landing at Newark, lost an engine and wing, flipped over onto its back and burned. There were no injuries. The first landing induced a vertical acceleration of +1.67g and horizontal acceleration of 0.2g to the right. The aircraft bounced about 7 feet off the runway, stayed airborne about 3 seconds, pitched down and landed again, inducing vertical +1.7g and horizontal 0.4 right. According to Aviation Week, "investigators are assessing whether the pitchdown was a result of pilot input or a failure of the aircraft's flight controls." At that point, the right nacelle touched, the engine subsequently broke free and the right wing failed at the root and separated. Because the FDR's wiring was cut approximately five seconds before the end of the flight, investigators are also trying to recover data from the non-volatile memory of the vertical-acceleration tracking system, and also from the non-volatile memory of the full-authority digital engine controls (FADECs). Aviation Week's Safety Resource Center has an excellent page, SafeNews: FedEx MD-11 Crash, giving detailed information on the accident and the investigation.

The Birgen Air B757 Accident near Puerto Plata, Dominican Republic
6 February 1996

Synopsis This is only the second fatal accident to the B757 since introduction into service. The aircraft crashed into the sea minutes after takeoff. The CVR and FDR were recovered on February 28, 1996, and yielded good quality recordings.

A draft of the final report (in German!) from the Dirección General de Aeronáutica Civil of the Dominican Republic, obtained from the Deutsche Luftfahrtbundesamt, was digitised from a copy sent by Karsten Munsky of EUCARE in Berlin, to whom we are very grateful. This draft includes only the report body. I understand there are also 100+ pages of attachments.

On February 7, the FAA issued a Press Release (Office of Public Affairs Press Releases) clarifying the role played by the U.S. FAA (Federal Aviation Administration) and the NTSB (National Transportation Safety Board) in the investigation. On March 1, a short statement of Factual Information from a preliminary review of CVR and FDR data was made available by the NTSB on behalf of the Dominican Republic civil aviation authorities. On March 18, a longer Press Release, accompanied by the CVR transcript, explained further what the FDR and CVR data indicated. David Learmount in Flight International, 27 March - 2 April 1996, deduced from the CVR transcript four salient observations on the crew behavior and I provide a fifth from the B757 Operations Manual B757 Air Data System description and schematic diagram (JPEG, GIF). To paraphrase Learmount's points, although confusion about operation of computer-assisted systems (autopilot, warning annunciations) played a role, this confusion would not have arisen but for inappropriate pilot decisions and actions. However, a blocked pitot tube and inappropriate pilot behavior are not the only potential factors under study. The NTSB has identified a potential improvement in the B757/767 operating manual as a result of further analysis (short note). A note on the Puerto Plata and Cali accidents, highlighting the human-computer interface (HCI) issues, appeared in RISKS-18.10, was rebroadcast on Phil Agre's RRE mailing list (May 7th), and became the subject of the what's happening column of the British Computer Society HCI interest group magazine Interactions, July/August 1996, p13.

The Aeroperu B757 Accident
2 October 1996

Synopsis AeroPeru Flight 603 took off from Lima, and the crew almost immediately noticed problems with the air data. Without reliable air data readings from the usual sources, the aircraft eventually CFIT'd into the ocean. Evidence from the CVR transcript (in Spanish) and altimeters after the accident suggested the pilots were supposing they were much higher. The altimeters were recovered stuck at over 9000ft. Masking tape was found to be covering the left-side static ports, as shown in this JPEG picture, left there after a cleaning operation. There was initial speculation from various Peruvian authorities that there was a computer-related failure, which turned into acknowledgement that this was a non-computer-related failure mode when the tape was discovered. Another incident (thankfully not an accident) which involved a maintenance-induced common failure mode was that of an Eastern Airlines L-1011 out of Miami on May 5 1983, for which both the full NTSB Report and Synopsis are available below.

The information here is a resumé of known information, a high-level analysis of the failure modes of a B757 which would lead to an accident, details of the B757 pitot-static system, and a brief history of the news reports and statements made about this accident, for those whose interests stretch to the sociological. I would still recommend against attributing any cause prematurely (as had occurred and might still be occurring).

The CVR transcript (in Spanish) is also available.

A shorter note on this accident appeared in RISKS-18.51; the new findings were announced by Peter Neumann in a short note in RISKS-18.57; and my note detailing the latest findings appeared in RISKS-18.59.

The Ariane 5 Failure
4 June 1996

Synopsis On 4 June 1996 the maiden flight of the Ariane 5 launcher ended in a failure, about 40 seconds after initiation of the flight sequence. At an altitude of about 3700 m, the launcher veered off its flight path, broke up and exploded. The failure was caused by "complete loss of guidance and attitude information" 30 seconds after liftoff. To quote the synopsis of the official report: "This loss of information was due to specification and design errors in the software of the inertial reference system. The extensive reviews and tests carried out during the Ariane 5 development programme did not include adequate analysis and testing of the inertial reference system or of the complete flight control system, which could have detected the potential failure." Because of this conclusion, the accident has generated considerable public and private discussion amongst experts and lay persons. Code was reused from the Ariane 4 guidance system. The Ariane 4 has different flight characteristics in the first 30 seconds of flight and exception conditions were generated on both IGS channels of the Ariane 5. Even though the Ariane is not a transport category airplane, I include it as an instructive example. It suggests that we have as much or more reason to worry about the `new, improved, extended Mark 2 version' as about the original version of FBW software. Henry Petroski, in Design Paradigms: Case Histories of Error and Judgement in Engineering (Cambridge University Press, 1994) makes this very point about the history of bridge-building in the nineteenth and twentieth centuries. Petroski notes that failures often came not from the first, careful, conservative implementation of a design, but from its extension. The European Space Agency has provided a summary of the Ariane accident report as a Press Release, and also the full text of the Inquiry Board Report on the Web.

The problem was caused by an `Operand Error' in converting data in a subroutine from 64-bit floating point to 16-bit signed integer. One value was too large to be converted, creating the Operand Error. This was not explicitly handled in the program (although others were) and so the computer, the Inertial Reference System (SRI), halted, as specified in other requirements. There are two SRIs, one `active', one `hot back-up', and the active one halted just after the backup, from the same problem. Since no inertial guidance was now available, and the control system depends on it, we can say that the destructive consequence was the result of `Garbage in, garbage out' (GIGO). The conversion error occurred in a routine which had been reused from the Ariane 4, whose early trajectory was different from that of the Ariane 5. The variable containing the calculation of Horizontal Bias (BH), a quantity related to the horizontal velocity, thus went out of `planned' bounds (`planned' for the Ariane 4) and caused the Operand Error. Lots of software engineering issues arise from this case history.
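
The conversion failure described above can be illustrated with a minimal sketch. This is not the flight code; it is a Python illustration under stated assumptions. The names `OperandError`, `to_int16` and `to_int16_saturating`, and the two sample values, are hypothetical and chosen only to show a floating-point value that fits into a 16-bit signed integer and one that does not.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical error: value not representable as a 16-bit signed integer."""


def to_int16(value: float) -> int:
    """Convert a float to a 16-bit signed integer, or raise OperandError."""
    truncated = int(value)  # truncates toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OperandError(f"{value!r} does not fit in 16 bits")
    return truncated


def to_int16_saturating(value: float) -> int:
    """One possible explicit handling (an assumption): clamp to the range."""
    try:
        return to_int16(value)
    except OperandError:
        return INT16_MAX if value > 0 else INT16_MIN


# Hypothetical values: one inside the range a design might have planned for,
# one outside it.
within_planned_range = 20000.0
outside_planned_range = 40000.0

print(to_int16(within_planned_range))
print(to_int16_saturating(outside_planned_range))
```

Calling `to_int16(outside_planned_range)` without a handler lets the error propagate to the caller, which is the analogue of the unhandled Operand Error. The saturating variant is only one conceivable way of handling the condition explicitly; nothing here suggests it would have been the right choice for the launcher.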

Jean-Marc Jézéquel and Bertrand Meyer wrote a paper, Design by Contract: The Lessons of Ariane, IEEE Computer 30(2):129-130 January 1997, in which they argued that a different choice of programming language would have avoided the problem. Taken at face value, they are clearly right -- a language which forced explicit exception handling of all data type errors as well as other non-normal program states (whether expected or not) would have required an occurrence of an Operand Error in this conversion to be explicitly handled. To reproduce the problem, a programmer would have had to have written a handler which said `Do Nothing'. One can imagine that as part of the safety case for any new system, it would be required that such no-op handlers be tagged and inspected. An explicit inspection would have caught the problem before launch. As would, of course, other measures. Jézéquel and Meyer thus have to make the case that the programming language would have highlighted such mistakes in a more reliable manner than other measures. Ken Garlington argues in his Critique of "Put it in the contract: The lessons of Ariane" [sic] that they do not succeed in making this case.
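
The idea of tagging no-op handlers for inspection can be sketched mechanically. The following Python fragment is an illustration only, not a description of any tool used on the Ariane programme: it uses the standard `ast` module to list exception handlers whose body does nothing. The function name `find_noop_handlers` and the sample source text are hypothetical.

```python
import ast
from typing import List


def find_noop_handlers(source: str) -> List[int]:
    """Return the line numbers of exception handlers whose body is only `pass`."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler):
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                lines.append(node.lineno)
    return sorted(lines)


# Hypothetical code under review: the first handler silently does nothing,
# the second takes an explicit action.
SAMPLE = '''
def convert(value):
    try:
        return int(value)
    except OverflowError:
        pass

def convert_checked(value):
    try:
        return int(value)
    except OverflowError:
        return 0
'''

for lineno in find_noop_handlers(SAMPLE):
    print(f"no-op handler at line {lineno}: needs inspection")
```

A check of this kind only flags candidates; deciding whether a `Do Nothing' handler is acceptable remains a matter for human inspection.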

The paper The Ariane 5 Accident: A Programming Problem? by Peter Ladkin discusses the characterisation of the circumstances of the Ariane Flight 501 failure in the light of the extensive discussion amongst computer scientists of the failure. Gérard Le Lann has proposed in his article The Failure of Satellite Launcher Ariane 4.5 that the failure has little connection with software, but is a systems engineering failure, and his argument is compelling. Le Lann's analysis is also supported by inspection of the WB-Graph of the Ariane 501 Failure, prepared by Karsten Loer from the significant events and states mentioned in the ESA Accident Report.

This is not the first time that computers critical to flight control of an expensive, complex and carefully-engineered system have failed. See The 1981 Space Shuttle Incident.

The T-43A Accident near Dubrovnik
3 April 1996

Synopsis A US Air Force T-43A, a military version of the Boeing B737-200, crashed into terrain while on approach to Dubrovnik airport, Croatia, in conditions close to or below the published minimums for the approach. The US military publicly released the results of its investigation in June, 1996. This report is culled from published news in the professional and general press concerning the accident and the final report, with some information on USAF safety report procedures from Lt.-Col. Thomas Farrier of the USAF Office of the Chief of Safety. A short note on automation and risk citing this accident appeared in RISKS-18.08.

It transpires that the aircraft was equipped with only one ADF (Automatic Direction Finder), a navigation device described as `primitive' by certain Air Force staff (see report). It seems the aircraft was not as well equipped as normal civilian standards would require, and it flew off course and hit a mountain while in the last stages of approach (a CFIT, Controlled Flight into Terrain, accident). One speculated almost immediately that more sophisticated navigation equipment would have helped avoid the accident; and immediately on publication of the report, US Defense Secretary William Perry ordered equipment changes.

One may conclude from Secretary Perry's executive order that this is an example of an accident in which lack of sophisticated avionics played a role. I conclude it is an example of the risk of not using up-to-date avionics - a lesson we may forget when thinking solely about the risks of using computers.

Reports on the Martinair B767 EFIS-loss incident near Boston, MA
28 May 1996

Synopsis A Martinair Holland B767-300 on a scheduled flight from Amsterdam to Orlando, FL, suffered a partial power failure and lost all EFIS information, which includes all flight and navigation information. It continued on the electro-mechanical backup displays and diverted to Boston, where it landed with only partial flight control.

Early information on the incident is collected in a short report culled from reports in Flight International and Aviation Week. The investigation subsequently led to FAA Airworthiness Directive 96-19-10. The AD "is prompted by reports of interruptions of electrical power during flight due to improper installation of the main battery shunt and ground stud connection of the main battery. The actions specified in this AD are intended to prevent such electrical power interruptions, which could result in loss of battery power to the source of standby power to the airplane." Effective date is October 2, 1996. I am grateful to Hiroshi Sogame of All Nippon Airways Safety Promotion Committee for advising me of this AD. A short note on the incident appeared in RISKS-18.19.

The American Airlines B757 Accident in Cali
20 December 1995

Synopsis The Boeing B757 aircraft is not `fly-by-wire', but relies on an FMGS and other computer systems for its normal operation. The accident report has not yet appeared. I dedicate this section to computer scientist Paris Kanellakis, who perished with his family in the aircraft. This CFIT (Controlled Flight Into Terrain) accident is the first accident for a B757 in a decade and a half of service.

The Aircraft Accident Report (the final report) from the Colombian Aeronautica Civil was released by the NTSB on 27 September, 1996. It is included here in two parts, the text with Appendix A and the Appendices B-F. The NTSB Recommendations to the FAA were published on October 16, 1996. I thank Barry Strauch, Chief of the Human Performance Division of the NTSB, for sending me copies of the final report and the recommendations; and Marco Gröning for engineering the pictures from Appendices B-F.

The paper Analysing the Cali Accident with a WB-Graph contains a WB-Graph causal analysis of the events and states in the Cali Report, prepared by Thorsten Gerdsmeier, Peter Ladkin and Karsten Loer. WB-analyses determine the causal relations between the events of an accident according to a rigorous formal semantics, and may provide insight into the accident. These analyses are presented in the form of a graph whose nodes are critical events and states. The Cali WB-analysis exposes some fundamental causal factors that were mentioned in the report, and also addressed in the NTSB's recommendations to the FAA, but not included in the report's list of probable causes and contributory factors.

Early News The NTSB issued a Press Release containing factual data, whose text contains the press release signed by the Head of the Colombian Oficina de Control y Seguridad Aerea.
The two relevant arrival and approach navigation plates are the Cali VOR/DME/NDB Rwy 19 Instrument Approach Procedure (http://www.jeppesen.com/cali-1.html) and the Cali Rozo One Arrival Procedure (http://www.jeppesen.com/cali.html).
The specialist weekly Flight International included reports and comment in its January editions. Computer-relevance appears in the crew's handling of the FMGS in concert with other procedures. However, they descended below the cleared altitude, and there appear to be other procedural inadequacies in their flying (see also Wally Roberts's TERPS Page for a further comment on this). The FAA is conducting a review of training at AA.

The short paper Comments on Confusing Conversation at Cali by Dafydd Gibbon and Peter Ladkin points out some linguistic features of the ATC - AA965 radio conversation immediately prior to the accident which might have contributed to the crew's confusion.

For those whose patience or WWW-bandwidth is limited, there is a synopsis of contemporary news concerning the FMC memory readout, giving the probable reason for the left turn away from course (namely that the pilots selected the ROZO beacon based on its identifier, but there is a specified difference between the ROZO beacon identifier and its identifier in the FMC database), the probable causes as contained in the final report, and suggested probable causes contained in American Airlines' submission to the docket in August.

As a result of the investigation of this accident, the NTSB issued a collection of safety recommendations on October 1, 1996, with the concurrence of the Aeronautica Civil of Colombia. These recommendations address various issues such as pilot and aircraft performance after the GPWS warning, specifically the feasibility of (retro-)fitting automatic speedbrake retractors (the pilots failed to retract speedbrakes, and also pulled up too far, momentarily going "through" the stick-shaker warning - some investigators believe that the aircraft could have missed the terrain, had an optimal escape manoeuvre been executed: Flight International, 9-15 October, p9), modifications to FMS data presentation, evaluation of an Extended-GPWS system, a requirement to positively cross-check positional information on the FMS, certain enhancements to navigation charts, an ICAO review of navaid naming conventions, a program to enhance English fluency of controllers, and various other measures. (These last two measures address concerns also raised in the Confusing Conversation note.) A note on the Puerto Plata and Cali accidents, highlighting the human-computer interface (HCI) issues, appeared in RISKS-18.10, was rebroadcast on Phil Agre's RRE mailing list (May 7th), and became the subject of the what's happening column of ACM Interactions, July/August 1996, p13.

The A320 Maintenance Incident at Gatwick
21 February 1995

Synopsis An A320 operated by Excalibur Airways took off from Gatwick, and the pilot found he could not turn left, and needed full left stick to keep the wings level. The airplane immediately returned for landing and landed safely. The airplane had come out of maintenance and some right-hand spoilers had been left in maintenance mode, during which they are free-moving. Reduced upper-wing pressure in flight caused them partially to deploy. Peter Mellor analyses the official report of the incident and draws some conclusions.

An edited abbreviation of AAIB Aircraft Accident Report 2/95 published in Aerospace, April 1995, the monthly of the Royal Aeronautical Society, London.

Computer-Related Factors in Incident to A320 G-KMAM, Gatwick on 26 August 1993,
by Peter Mellor.

The A330 Flight-Test Accident in Toulouse
30 June 1994

Synopsis An Airbus A330 aircraft on flight test in Toulouse crashed while performing an autopilot test during a maximum-performance go-around (in which the aircraft aborts an approach to landing and climbs away quickly into a holding pattern). The aircraft rolled sharply, and was not recovered by Airbus's chief test pilot in time to avoid hitting the ground. A very sad day.

Questions arose not only as to what the pilots had been doing, but also how they were aided or hindered by the design of the systems, including the cockpit interface and the behavior of the aircraft. The Rapport Préliminaire of the Commission d'Enquête is in French. The RISKS reports are:
A330 crash: Press Release by Peter Mellor;
Re: A330 crash by Curtis Jackson and Peter Ladkin;
A Correction ... by Peter Ladkin;
A330 crash investigation .... by Erik Hollnagel;
Some comments .... by Peter Ladkin.

The Tokyo-London A340 FMGS Problem
30 June 1994

Synopsis This incident attained Page 1 of the British daily newspaper The Independent when the report was published. The BBC also reported on it in the news on 15 March 1995. An Airbus A340 aircraft on a scheduled flight from Tokyo to London experienced intermittent failure of navigational and flight-status information on one or another of the EFIS displays en route to London. When being sequenced for approach to Heathrow, the airplane displayed an odd reaction to an autopilot command (it went the `long way round' to capture a heading) and abrupt and undesired manoeuvering while attempting to capture the ILS (the localiser and glideslope). The crew came in on a radar surveillance approach, the pilots having lost faith in their on-board equipment to assure them a safe approach and landing. While the incident brought to light some problems with the ILS broadcast (the aircraft encountered a `false lobe'), the British CAA considered the problems with the A340 flight management computers severe enough that they asked the JAA (Joint Aviation Authority, which has major responsibility for coordinating the certification of civil transports in Europe) if they were aware of such problems during certification.

The original AAIB incident report, AAIB Bulletin No: 3/95.

A340 shenanigans by Les Hatton;
Re: A340 incident by Peter Ladkin and John Rushby.

A slight change... by Ric Forrester via Dave Horsfall refers to the same incident.

The A300 Crash in Nagoya
26 April 1994

Synopsis A China Airlines A300 (a non-`fly-by-wire' Airbus) crashed on landing at Nagoya in Japan. The pilot flying had inadvertently triggered the `go-around' mode. The captain (the non-flying pilot) noticed this and repeatedly instructed him to disconnect the autopilot, as the A300 Operations Manual explicitly requires in such circumstances, but he did not do so until 40 seconds after it was noticed. The pilot flying tried to force the nose of the airplane down, and the autopilot, in go-around mode, reacted to the lack of climb by trimming pitch even further up. When the pilot eventually stopped pushing and the AP was disconnected, the captain took over. However, without the forward pressure on the yoke, the nose rose sharply because of the extreme nose-up trim, and the plane stalled in an extreme nose-high attitude and hit the ground tail-first. There were early rumors of unusually high levels of blood alcohol in the pilots' bodies (more than is expected as a natural by-product of death), and of a complete power failure before the crash, but neither of these figured in the final report. The question why the pilot flying did not disconnect the autopilot, as he was required to and was instructed to multiple times, probably cannot be answered. As a result of this accident and other recent incidents and accidents, the US FAA started to `work with' China Airlines on its pilot training programs.

The final report (HTML) is large. The HTML version has been prepared for the WWW by Hiroshi Sogame of the Safety Promotion Committee of All Nippon Airways and Peter Ladkin. The Appendices, Photographs and Figures from the report have been prepared for the WWW by Marco Gröning. Included with the Appendices and Figures are HTML versions of the CVR transcript and (the English version of) the letter from the French Bureau Enquêtes Accidents, prepared for the WWW by Hiroshi Sogame of All Nippon Airways and Karsten Loer.
[There are over 100 figures and photographs. The FDR charts have not yet been included, but will appear shortly. PBL]

The short paper WB-Graph of the A300 Accident at Nagoya contains the textual form of a WB-Graph causal analysis of the events and states in the Nagoya Report, prepared by Peter Ladkin and Karsten Loer. WB-analyses determine the causal relations between the events of an accident according to a rigorous formal semantics, and may provide insight into the accident.

For those without the desire to wade through the entire report, a synopsis and commentary on the final report is based on an article in Aviation Week and Space Technology, July 29th, 1996 issue. The final report contained no surprises, based on what was known shortly after the accident.

A note on the accident report appeared in RISKS-18.33. Early discussion of this accident in RISKS in 1994 led to much discussion about Airbus aircraft and accident statistics in general:
China Airlines A300 Crash by Mark Stalzer;
Re: China Air ... by David Wittenberg;
More on the A300 crash ... by Peter Ladkin;
Re: China Airlines ... by John Yesberg;
Re: China Airlines ... by Mark Terribile;
How to feel safer in an Airbus by Peter Ladkin;
Airbus A3(0?)0 deductions by Phil Overy;
Further Discussion by Mary Shafer, Robert Dorsett, Phil Overy and Wesley Kaplow;
Further Discussion by Robert Dorsett, Peter Ladkin, Wesley Kaplow, Peter Mellor and Bob Niland;
Summary of Safety-Critical Computers in Transport Aircraft by Peter Ladkin;
A320 Hull Losses by Peter Mellor.

The A320 Accident in Warsaw
14 September 1993

Synopsis A Lufthansa A320 landed at Warsaw airport in a thunderstorm. The landing appeared to be normal, smooth, even though somewhat fast. The pilots were unable to activate any of the braking mechanisms (spoilers, reverse thrust, wheelbrakes) for 9 seconds after `touchdown', at which point the spoilers and reverse thrust deployed. The wheelbrakes finally became effective 13 seconds after touchdown. The aircraft was by this time way too far along the runway to stop before the runway end. It ran off the end, and over an earth bank near the end of the runway, before stopping. Both pilots were very experienced A320 operators. The captain was returning to duty after illness and the first officer was a senior Airbus captain and training officer, who was monitoring the captain's flying skills on his return to service. The first officer died in the accident, as did a passenger who was overcome by smoke and didn't evacuate the aircraft, which burned.

The text of the Accident Report from the Polish authorities is reproduced here, along with
selected Appendices, namely
Section 4.2, CVR transcripts,
Section 5, Documentation of the Braking System, and
Section 6, Performance and Procedures Documentation.

The paper Analysing the 1993 Warsaw Accident with a WB-Graph contains a WB-Graph causal analysis of the events and states in the Warsaw Report, prepared by Michael Höhl and Peter Ladkin. WB-analyses determine the causal relations between the events of an accident according to a rigorous formal semantics, and may provide insight into the accident. These analyses are presented in the form of a graph whose nodes are critical events and states. The Warsaw WB-analysis exposes some fundamental causal factors that were mentioned in the report, but not included in the report's list of probable causes and contributory factors.

Clive Leyman, formerly responsible for A320 landing-gear engineering at British Aerospace, and now a Visiting Professor at City University, London, has prepared an Engineering Analysis of the Landing Sequence which analyses the effects of all the factors on the stopping distance of DLH2904. Referenced in the analysis are graphs he plotted of Airspeed on Approach, Altitude and Windspeed in the Final Phases, Flare and Derotation Details, Calculated vs. Actual Distances, Stopping Distances, Ground Deceleration, and Runway Friction.

Questions from computer scientists and system experts focused on why the braking systems didn't deploy as expected by the pilots. The RISKS comments are:
Lufthansa in Warsaw by Peter Ladkin;
More News... by Peter Ladkin;
Re: Lufthansa Airbus ... by Udo Voges;
Lufthansa Warsaw Crash--A Clarification by Peter Ladkin.

More and more technical literature is discussing this accident for one reason or another.
The X-31 and A320 Warsaw Crashes: Whodunnit? by Peter Ladkin discusses causes and how to try to ensure more complete coverage of causal relations. The X-31 accident and the Warsaw A320 accident are analysed as examples.
A Model for a Causal Logic for Requirements Engineering, by Jonathan Moffett, Jon Hall, Andrew Coombes and John McDermid suggests a logical theory of causality for engineering and applies that to analyse braking in the Warsaw accident.
Reasons and Causes by Peter Ladkin discusses those notions in general, and comments extensively on the proposal of Moffett et al.

The Air Inter A320 Accident near Strasbourg
20 January 1992

Synopsis Approaching Strasbourg Airport on a VOR/DME approach to Runway 05, Air Inter Flight 148 DA initiated a 3,300 fpm descent at 11 DME from the VOR (STR) at 5,000 ft altitude. At 9 DME, the aircraft collided with Mont Sainte-Odile at an altitude of 2,620 ft. At that point it would normally have been at 4,300 ft altitude in an 800 fpm descent, to cross STR at 1,320 ft on the way to touchdown.
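The figures in the synopsis allow a rough consistency check. The sketch below is simple arithmetic and is not taken from the report: it assumes a constant descent rate from 11 DME and treats DME slant range as ground distance, and the function names are mine.

```python
# Rough consistency check on the figures quoted in the synopsis.
# Assumptions (not from the report): constant descent rate from 11 DME,
# and DME slant range treated as ground distance.

def descent_time_s(start_alt_ft, end_alt_ft, rate_fpm):
    """Seconds needed to descend between two altitudes at a constant rate."""
    return (start_alt_ft - end_alt_ft) / rate_fpm * 60.0

def implied_groundspeed_kt(distance_nm, time_s):
    """Ground speed in knots needed to cover distance_nm in time_s seconds."""
    return distance_nm / (time_s / 3600.0)

def altitude_after_ft(start_alt_ft, rate_fpm, time_s):
    """Altitude reached after descending at a constant rate for time_s seconds."""
    return start_alt_ft - rate_fpm * time_s / 60.0

t = descent_time_s(5000, 2620, 3300)      # time from 5,000 ft to impact altitude
gs = implied_groundspeed_kt(11 - 9, t)    # 2 nm covered in that time
alt_800 = altitude_after_ft(5000, 800, t) # where an 800 fpm descent would have led

print(f"time: {t:.1f} s, ground speed: {gs:.0f} kt, altitude at 800 fpm: {alt_800:.0f} ft")
```

Under these assumptions the descent from 5,000 ft to 2,620 ft takes about 43 seconds, which over 2 nm implies a ground speed of roughly 166 kt. In the same time an 800 fpm descent would have lost under 600 ft, leaving the aircraft near 4,400 ft, the same order as the 4,300 ft quoted above; a difference of that size is unsurprising given the simplifications.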

The investigation commission used the SHEL model, which provides a conceptual framework for understanding the interfaces between different `subsystems' in operation. SHEL focuses on the four basic subsystems: software, hardware, "environment" and "liveware" (people). No definitive story was determined as to how the extraordinary rate-of-descent was actually initiated. The commission analysed all of the possible alternative scenarios thoroughly and based their conclusions and recommendations on these alternatives.

This accident generated much discussion and controversy within the aviation community, focusing often on the design of the autopilot interface, specifically the mode change between HDG V/S (Heading and Vertical Speed mode) and TRK FPA (Track and Flight Path Angle) modes, which were set by a `toggle'-type switch.
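To illustrate why such a mode switch attracted attention, the sketch below interprets one dialled value in each of the two modes. The encoding (a dial value of 33 meaning 3,300 fpm in HDG V/S and 3.3 degrees in TRK FPA) and the 150 kt ground speed are assumptions made purely for illustration; neither is stated in this synopsis, and the commission did not determine how the descent rate was actually initiated.

```python
import math

FT_PER_NM = 6076.12  # feet per nautical mile

def commanded_descent_fpm(dial, mode, groundspeed_kt):
    """Descent rate (feet per minute) implied by a dial value in a given mode.

    Hypothetical encoding, for illustration only:
      'VS'  -> dial counts hundreds of feet per minute
      'FPA' -> dial counts tenths of a degree of flight path angle
    """
    if mode == "VS":
        return dial * 100.0
    if mode == "FPA":
        angle_rad = math.radians(dial / 10.0)
        ft_per_min = groundspeed_kt * FT_PER_NM / 60.0  # horizontal speed
        return ft_per_min * math.tan(angle_rad)
    raise ValueError(f"unknown mode: {mode!r}")

# Same dial value, two modes, assumed 150 kt ground speed.
vs_rate = commanded_descent_fpm(33, "VS", 150.0)
fpa_rate = commanded_descent_fpm(33, "FPA", 150.0)

print(f"V/S mode: {vs_rate:.0f} fpm, FPA mode: {fpa_rate:.0f} fpm")
```

With these assumed values the same dial setting gives about 880 fpm in one mode and 3,300 fpm in the other, a ratio of nearly four. The point of the sketch is only that a small interface state can change the meaning of an input by a large factor.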

We include the Report of the Commission of Inquiry (in French) in full.

The Sydney A320/DC10 Incident
21 August 1991

Synopsis Sydney Airport was conducting Simultaneous Runway Operations on intersecting Rwys 34 and 25. A Thai Airways International DC-10 was landing on Rwy 34, and at the same time an Ansett Australia A320 on Rwy 25. Landing instructions to the DC-10 included that it stop short of the intersection. The A320 captain, who was not Pilot-Flying, judged that the DC-10 might not be able to comply, and initiated a successful go-around. The enquiry discovered "anomalies [....] with regard to the attitude control inputs on the A320, and in the braking system of the DC-10." In particular, the A320 DFDR recorded neutral and nose-down control inputs from the co-pilot, after he thought he had relinquished control to the captain (on the latter's go-around request); and there was an autobrake malfunction on the DC-10 such that wheel braking, which should have commenced 4 seconds after deployment of ground spoilers, didn't commence until 23 seconds after spoiler deployment. I include some excerpts from the report B/916/3032 of the Australian Bureau of Air Safety, forwarded to me by Robert Wilson of The Australian newspaper. My thanks to Robert for bringing this incident to my attention.

The Lauda Air B767 Accident
26 May 1991

Synopsis A Lauda Air Boeing B767-3Z9ER suffered an in-flight upset and breakup over Thailand while climbing out at 7000m after takeoff from Bangkok. Analysis of the accident was hindered by damage to the Flight Data Recorder (FDR), which rendered it unreadable. Airline owner Niki Lauda said on 2 June 1991 that a thrust reverser had deployed in flight. Boeing initially denied that this was possible - the thrust reverser mechanism had an electro-hydraulic interlock which prevented this. Simulator trials showed that, if a thrust reverser were actually to deploy during flight, the B767 would be incapable of controlled flight unless "full wheel and full rudder were applied within 4-6s after the thrust reverser deployed" (Reverser blamed in Lauda crash report, Flight International, 1-7 September 1993, p5). Wind-tunnel data determined that the aerodynamic effect of the reverser plume in flight as the engine ran down to idle was a 25 per cent loss in lift across the wing. The report further determined that "[...] recovery from the event was uncontrollable [sic] for an unexpecting flight crew".

Further testing showed that disintegration of an oil seal could physically block a valve essential for the functioning of the interlock, leading to a scenario in which the reverser could, in fact, reverse thrust in flight. It was not determined if such an event happened to the accident aircraft. Subsequent to the discovery of this potential interlock failure mode, the FAA issued in August 1991 an AD prohibiting use of thrust-reverse on late-model B767s. Similar mechanisms were also to be found on other aircraft, and after a solution to the problem was developed, Boeing retrofitted B737, B757 and B767 aircraft, 2,100 of them in all, with a third, mechanical, thrust-reverser interlock (which also required a hydraulic system mod on the B767).

There was a report by Bill Richards in the Seattle Post-Intelligencer of 14 December 1991 of the view of Darrell Smith, an ex-Boeing engineer, who had reported to Boeing that faults in the `proximity switch electronics unit' (PSEU) could have resulted in actual thrust-reverser deployment. Boeing passed on the report to the software writer, Eldec Corp (Boeing contracts out much of its software), but neither company had, at the time of reporting, studied Smith's argument in detail. I do not know the resolution of this issue. Thus, one may consider this accident to remain `computer-related' until one knows the resolution of Smith's reports.

A synopsis as well as the final accident report from the Thai authorities have been prepared for the WWW by Hiroshi Sogame of the All Nippon Airways Safety Promotion Committee, to whom we are very grateful.

The official report on the crash determined

" [...] the probable cause of this accident to be uncommanded in-flight deployment of the left-engine thrust reverser, which resulted in loss of flightpath control. The specific cause of the thrust-reverser deployment has not been positively identified."
(op.cit., Flight International, 1-7 September 1993, p5).

The report of The Times, 3 June 1991, was relayed to RISKS-11.78 and RISKS-11.82 by the articles
Lauda Air Crash by Paul Leyland, and
Re: Lauda Air Crash by Steve Philipson.
Hermann Kopetz reported to RISKS-11.82 what appeared in the Austrian press in the article
Lauda Air Boeing 767 Aircraft Crash.
Boeing's initial denials were reported in the Washington Post of 3 June, relayed to RISKS-11.82 in
Lauda Air plane crash by Joe Morris.
The Wall Street Journal of 3 June 1991 reported that in order to obtain certification of the B767, Boeing had had to demonstrate the effects of in-flight reversal by flight test: also conveyed to RISKS-11.82 in
Re: Lauda Air crash by Jeremy Grodberg.
Peter Neumann reported on some of the details of the FADEC design in RISKS-11.84:
Lauda 767 crash by Peter G. Neumann.
The European, a weekly newspaper, carried an article by Mark Zeller entitled Boeing skipped essential test on Lauda crash jet, which clarified the situation over certification of the reverser mechanism. According to the FAA administrator at the time, James Busey, the interlock was demonstrated by attempted in-flight deployment, but only at low airspeed and idle thrust. Boeing had argued to the certification authority that `...sophisticated flight control computers made an accidental inflight deployment of the thrust reversers impossible' (I think Zeller meant FADECs - the B767 has no flight control computers in the strict sense). The report also stated that examination of the wreckage and the CVR showed that one reverser `...failed to lock in place...' and that the pilots had been discussing what to do about the warning light when the upset took place. The European's article was relayed to RISKS-11.95 and discussed by Peter Mellor:
Lauda air crash by Peter Mellor.
An article in the Seattle Times of 23 August 1991, Flawed part in 767 may be flying on other jets by Brian Acohido, reported in detail the possible oil-seal disintegration problem and that it didn't seem to be restricted to late-model B767 aircraft. This commentary was relayed to RISKS-12.16 by Nancy Leveson:
More on the Lauda air crash by Nancy Leveson.
Nancy also relayed Bill Richards' reporting of Darrell Smith's concerns about the PSEU to RISKS-12.69 in
More on Lauda crash and computers by Nancy Leveson.

Other comment on this accident may be found in RISKS-11.78, RISKS-11.79, RISKS-11.82, RISKS-11.84, RISKS-11.95.

Subsequently, there was some discussion whether measures taken by other manufacturers in the wake of the Lauda Air crash to prevent in-flight deployment of reversers had contributed to their lack of deployment when required in the A320 Warsaw accident:
Lufthansa in Warsaw by Peter B. Ladkin, and
Re: Lufthansa Airbus Warsaw Crash 14 Sep 93 by Udo Voges.
Noted human-factors expert Erik Hollnagel cited some CVR material from the crash whilst discussing the efficacy and design of alarms in
Re: alarms and alarm-silencing risks in medical equipment by Erik Hollnagel.

British Midland B737-400 at Kegworth
8 January 1989

Synopsis A Boeing 737-400 operated by British Midland Airways Ltd crashed short of East Midlands Aerodrome on an emergency approach with engine problems. The airplane crashed across the M1 motorway, coming to rest partly on the motorway embankment. G-OBME left Heathrow Airport at 1952hrs with 118 passengers destined for Belfast. As the aircraft climbed through 28,300 feet, a portion of a blade in the left (No. 1) engine detached. This resulted in engine vibration and smoke, and fluctuations in the engine instruments. The crew misdiagnosed a fire in one of the two engines, misidentified the source, throttled engine 2 back and shut it down. As engine 2 closed down, engine 1 stabilised, falsely confirming to the crew that they had acted correctly. The aircraft made an emergency diversion to East Midlands Airport, near Kegworth, Leicestershire, adjacent to the M1. The aircraft intercepted the Rwy 27 localizer 6nm from the threshold, for an ILS approach. During final approach, the engine vibrations resumed, leading to a loss of power, and the crew were unable to maintain glideslope.

The final report, AAIB Accident Report 4/90, Report on the accident to Boeing 737-400 G-OBME near Kegworth, Leicestershire on 8 January 1989 is available on the WWW. We have prepared an extract of the most relevant information concerning the automation. It is notable for investigating the digital presentation of engine status information in the `glass cockpit' 400-series B737; and also for being the only accident I am aware of in which a mistaken crew action correlated with a `false positive', a simultaneous event which mistakenly seemed to confirm that the action they had taken was right.

A 1985 B747 Incident
19 February 1985

Synopsis A China Airlines B747, flying on autopilot high over the Pacific, suffered an engine failure, followed by loss of control, and entered an inverted spin at about 40,000 ft. After experiencing +6 to -4G on the way down, it was recovered at 9,000 ft and flown carefully to SFO in a mildly damaged state. Pilot judgement was faulty in three main respects (and excellent in one - recovering quickly from the spin when it was possible!). The major misjudgement was operating the aircraft at an altitude at which an engine loss would not enable the airplane to continue flying straight-and-level above stall speed - immediate nose-down was essential for recovery. The autopilot was not designed to operate in those conditions, and gave different control inputs which caused the aircraft to enter the inverted spin. The pilots took some time to determine what was happening.

The final accident report, NTSB Aircraft Accident Report NTSB/AAR-86/03, was prepared for the WWW by Hiroshi Sogame of the All Nippon Airways Safety Promotion Committee, to whom we are especially grateful. A short note recounting the accident appeared in China Air incident... the real story by Peter Trei in RISKS in October 1986, summarising an article in Flying magazine's October 1986 issue.

The Eastern Airlines L1011 Common Mode Engine Failure
5 May 1983

Synopsis Eastern Airlines Flight 855, a Lockheed L1011 aircraft, suffered a common mode failure of all three engines after departure from Miami. The aircraft returned and landed safely at Miami - apparently only just. The incident is not computer-related, but illustrates maintenance-induced common mode failures such as happened to Aeroperu Flight 603. The full NTSB Report (175K + GIFs + JPEGs) has been digitised and prepared by Hiroshi Sogame of the Safety Promotion Committee of All Nippon Airways for inclusion here. We are very grateful to Mr. Sogame for his work.

A Space Shuttle Control Incident
1981

Synopsis After a delay in a space shuttle mission in 1981, the crew put in some time in the simulator in Houston. They tested a "Transatlantic abort" sequence, which dumps fuel and leads to a landing in Spain. The flight control computers "locked up and went catatonic". It turns out that an exception condition was generated by a `computed GOTO' (in the avionics software written in HAL/S), which led to an operating system livelock (the FCOS was written in assembler). The incident was recounted by Tony Macina and Jack Clemons to Alfred Spector and David Gifford for the Case Study: The Space Shuttle Primary Computer System, in Communications of the ACM 27(9), September 1984, pp874-900. The relevant excerpt is reprinted here.
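For readers unfamiliar with the construct, a computed GOTO transfers control to one of several branches chosen by an integer selector. The sketch below imitates this in Python with a list of functions. It is not HAL/S and not the Shuttle software; the function names are mine, and the out-of-range selector is only one assumed way such a construct can raise an exception condition, since the synopsis does not say what the actual cause was.

```python
def computed_goto(selector, branches):
    """Call branches[selector - 1], as a 1-based computed GOTO would jump."""
    if not 1 <= selector <= len(branches):
        # In a language without this check the jump target is undefined.
        raise IndexError(f"selector {selector} outside 1..{len(branches)}")
    return branches[selector - 1]()

def guarded_goto(selector, branches, fallback):
    """Same dispatch, but an invalid selector is routed to a defined fallback."""
    try:
        return computed_goto(selector, branches)
    except IndexError:
        return fallback()

# Hypothetical branch names, for illustration only.
branches = [lambda: "dump fuel", lambda: "select landing site"]

ok = guarded_goto(2, branches, fallback=lambda: "safe default")
bad = guarded_goto(3, branches, fallback=lambda: "safe default")
print(ok, "|", bad)
```

The design question the incident raises is what the rest of the system does when such an exception occurs. Here the caller supplies a defined fallback, whereas in the incident described the exception led to an operating system livelock.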

Should we consider the Shuttle to be a transport category airplane? (Civilians have travelled on it.) Whatever, the incident is instructive, as well as interesting history.

The American Airlines DC10 Takeoff Accident in Chicago
25 May 1979

Synopsis At about 3pm CDT, AA Flight 191 took off from O'Hare's runway 32R in clear weather and fifteen miles visibility. During the takeoff rotation, the left engine and pylon assembly and a part of the leading edge of the left wing fell off. Climb continued until about 325ft altitude, when the aircraft rolled left inverted, the nose fell through and the aircraft impacted into open ground 4,600ft northwest of the departure end of the runway. The separation of the engine and pylon severed hydraulic lines, causing the high-lift devices (slats) on the leading edge to retract uncommanded. The slat position indicating system in the cockpit was inoperative, and the crew had no means of visually inspecting the state of the left wing during takeoff procedures. They apparently flew the takeoff according to recommended procedures, which specified a climb speed below the stall speed of the left wing with slats retracted. The left wing stalled and the roll began. Although the aircraft had no digital automation to speak of, the accident highlights both common-mode problems and human-interface issues, both important contributors to computer-related aircraft accidents. The initiating event was the separation of the engine and pylon and retraction of leading-edge slats due to the number 3 hydraulic lines being severed. However, the aircraft remained flyable - had the crew known. But the devices informing the crew of the aircraft's condition also failed due to the same event. We include the NTSB Aircraft Accident Report of December 21, 1979, the final report.

Miscellany

Bernard Ziegler Interview with Der Spiegel
Airbus Industrie's Technical Director Bernard Ziegler gave an interview on the Airbus accidents and their import to the German newsweekly Der Spiegel. A summary of and comment on this interview may be found in Summary... by Peter Ladkin.

A320 Third-Party Maintenance
Finally, one should not miss the excellent spoof A320 software goes on "3rd Party" maintenance by Peter Mellor, who thought it a reasonable story for April 1. Pete had later to explain the significance of the date to some horrified RISKS readers who took it seriously.....

Here's wishing everyone safe flying.
Peter Ladkin



Copyright © 1999 Peter B. Ladkin, 1999-02-08
Last modification on 1999-06-15
by Michael Blume