F-16 Flight Control Systems
The DFCS of the AFTI-F16 employed an ``asynchronous'' design. In such designs, the redundant channels run fairly independently of each other: each computer samples sensors independently, evaluates the control laws independently, and sends its actuator commands to an averaging or selection component that drives the actuator concerned. Because the unsynchronized individual computers may sample sensors at slightly different times, they can obtain readings that differ quite appreciably from one another. The gains in the control laws can amplify these input differences to provide even larger differences in the results submitted to the output selection algorithm. During ground qualification of the AFTI-F16, it was found that these differences sometimes resulted in a channel being declared failed when no real failure had occurred. Accordingly, a rather wide spread of values must be accepted by the threshold algorithms that determine whether sensor inputs and actuator outputs are to be considered ``good.'' For example, the output thresholds of the AFTI-F16 were set at 15% plus the rate of change of the variable concerned; in addition, the gains in the control laws were reduced. This increases the latency for detection of faulty sensors and channels, and also allows a failing sensor to drag the value of any averaging functions quite a long way before it is excluded by the input selection threshold; at that point, the average will change with a thump that could have adverse effects on the handling of the aircraft.
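The kind of mid-value selection and threshold check described above can be sketched as follows. This is an illustrative reconstruction, not the actual AFTI-F16 algorithm: the function name, the exact form of the rate-widened threshold, and all numbers are assumptions made for the example.

```python
def select_and_check(outputs, prev_selected, dt, base_tolerance=0.15):
    """Mid-value select among redundant channel outputs, then flag
    any channel whose output lies outside a tolerance band around
    the selected value (hypothetical sketch of the scheme described).
    """
    # Median (mid-value) of the channel outputs drives the actuator.
    selected = sorted(outputs)[len(outputs) // 2]
    # Widen the acceptance band by the variable's rate of change,
    # echoing the "15% plus the rate of change" rule in the text.
    rate = abs(selected - prev_selected) / dt
    threshold = base_tolerance * abs(selected) + rate * dt
    # Channels outside the band would be declared "failed".
    failed = [i for i, v in enumerate(outputs)
              if abs(v - selected) > threshold]
    return selected, failed
```

With three unsynchronized channels whose samples differ only by skew, no channel is flagged; a genuinely divergent channel falls outside the band. The trade-off in the text is visible here: widening `base_tolerance` avoids false alarms from sampling skew, but lets a failing sensor drift further before exclusion.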
An even more serious shortcoming of asynchronous systems arises when the control laws contain decision points. Here, sensor noise and sampling skew may cause independent channels to take different paths at the decision points and to produce widely divergent outputs. This occurred on Flight 44 of the AFTI-F16 flight tests. Each channel declared the others failed; the analog back-up was not selected because the simultaneous failure of two channels had not been anticipated, and the aircraft was flown home on a single digital channel. Notice that all protective redundancy had been lost, and the aircraft was flown home in a mode for which it had not been designed, yet no hardware failure had occurred.
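The mechanism behind such divergence can be sketched in a few lines. Everything here is invented for illustration (the branch condition, the gains, and the numbers are not from the AFTI-F16): it shows only how a tiny sampling skew near a decision point can select different control-law branches with very different outputs.

```python
def control_law(alpha_sensed):
    """A control law containing a decision point (a 'software switch').

    Hypothetical: branch boundary and gains are invented for the sketch.
    """
    if alpha_sensed > 15.0:           # high angle-of-attack branch
        return 2.5 * alpha_sensed     # aggressive gain
    return 0.8 * alpha_sensed         # normal branch, gentle gain

# Two unsynchronized channels sample the same true signal at slightly
# skewed times, landing just either side of the decision boundary.
channel_a = control_law(15.02)   # samples just above the switch point
channel_b = control_law(14.97)   # samples just below it

# The input difference is 0.05 degrees; the output difference is huge,
# so each channel's selection logic may declare the other failed.
divergence = abs(channel_a - channel_b)
```

No input threshold of the kind discussed earlier helps here: the inputs agree to within sampling skew, yet the outputs diverge, which is why the repair described below was to vote at the switch itself.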
Another illustration is provided by a 3-second ``departure'' on Flight 36 of the AFTI-F16 flight tests, during which sideslip exceeded 20 degrees, normal acceleration exceeded first -4g, then +7g, angle of attack went to -10 degrees, then +20 degrees, the aircraft rolled 360 degrees, the vertical tail exceeded design load, all control surfaces were operating at rate limits, and failure indications were received from the hydraulics and canard actuators. The problem was traced to a fault in the control laws, but subsequent analysis showed that the side air-data probe was blanked by the canard at the high angle of attack and sideslip achieved during the excursion; the wide input threshold passed the incorrect value through, and different channels took different paths through the control laws. Analysis showed this would have caused complete failure of the DFCS and reversion to analog backup for several areas of the flight envelope.
Several other difficulties and failure indications on the AFTI-F16 were traced to the same source: asynchronous operation allowing different channels to take different paths at certain selection points. The repair was to introduce voting at some of these ``software switches.'' (The problems of channels diverging at decision points, and also the thumps caused as channels and sensors are excluded and later readmitted by averaging and selection algorithms, are sometimes minimized by modifying the control laws to ramp in and out more smoothly in these cases. However, modifying control laws can bring other problems in its train and raises further validation issues.) In one particular case, repeated channel failure indications in flight were traced to a roll-axis software switch. It was decided to vote the switch (which, of course, required ad hoc synchronization), and extensive simulation and testing were performed on the changes necessary to achieve this. On the next flight, the problem was still there. Analysis showed that although the switch value was voted, it was the unvoted value that was used. (This bug is an illuminating example. At first, it looks like a programming slip, the sort of late-lifecycle fault that was earlier claimed to be very reliably eliminated by conventional V&V. Further thought, however, shows that it is really a manifestation of a serious design oversight in the early lifecycle (the overlooked requirement to synchronize channels at decision points in the control laws) that was kludged late in the lifecycle.)
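The class of bug described, voting a value and then accidentally using the unvoted one, is easy to render in miniature. This is a deliberately minimal, invented sketch (the function names, the majority-vote scheme, and the gains are all assumptions), not the actual roll-axis code:

```python
def majority(values):
    """Majority vote across channels (simple plurality)."""
    return max(set(values), key=values.count)

def roll_axis_output(local_switch, all_switches, command):
    """One channel's control-law step around a voted software switch.

    Hypothetical sketch: the voted value is computed correctly, but the
    branch below mistakenly tests the local (unvoted) switch instead.
    """
    voted_switch = majority(all_switches)  # the voted value...
    if local_switch:                       # BUG: should test voted_switch
        return command * 1.5               # branch A of the control law
    return command * 1.0                   # branch B

# Channel 2's local switch disagrees with the majority, so it still
# takes a different path even though the vote says otherwise.
out = roll_axis_output(local_switch=True,
                       all_switches=[False, False, True],
                       command=10.0)
```

The vote is computed but has no effect on the output, which is why simulation of the voting change alone did not expose the problem: the divergence only appears when a channel's local value disagrees with the majority.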
The AFTI-F16 flight tests revealed numerous other problems of a similar nature. Summarizing, Mackall, the engineer who conducted the flight-test program, writes:
``The criticality and number of anomalies discovered in flight and ground tests owing to design oversights are more significant than those anomalies caused by actual hardware failures or software errors.
``...qualification of such a complex system as this, to some given level of reliability, is difficult ... [because] the number of test conditions becomes so large that conventional testing methods would require a decade for completion. The fault-tolerant design can also affect overall system reliability by being made too complex and by adding characteristics which are random in nature, creating an untestable design.
``As the operational requirements of avionics systems increase, complexity increases... If the complexity is required, a method to make system designs more understandable, more visible, is needed.
``... The asynchronous design of the [AFTI-F16] DFCS introduced a random, unpredictable characteristic into the system. The system became untestable in that testing for each of the possible time relationships between the computers was impossible. This random time relationship was a major contributor to the flight test anomalies. Adversely affecting testability and having only postulated benefits, asynchronous operation of the DFCS demonstrated the need to avoid random, unpredictable, and uncompensated design characteristics.''

Clearly, much of Mackall's criticism is directed at the consequences of the asynchronous design of the AFTI-F16 DFCS. Beyond that, however, I think the really crucial point is that captured in the phrase ``random, unpredictable characteristics.'' Surely, a system worthy of certification in the ultra-dependable region should have the opposite properties; it should, in fact, be predictable: that is, it should be possible to achieve a comprehensive understanding of all its possible behaviors. What other basis for an ``engineering judgment'' that a system is fit for its purpose can there be, but a complete understanding of how the thing works and behaves? Furthermore, for the purpose of certification, that understanding must be communicated to others: if you understand why a thing works as it should, you can write it down, and others can see if they agree with you. Of course, writing down how something as complicated as a fault-tolerant flight-control system works is a formidable task, and one that will only be feasible if the system is constructed on rational principles, with aggressive use of abstraction, layering, information-hiding, and any other technique that can advance the intellectual manageability of the task.
This calls strongly for an architecture that promotes separation of concerns (whose lack seems to be the main weakness of asynchronous designs), and for a method of description that exposes the rationale for design decisions and that allows, in principle, the behavior of the system to be calculated (i.e., predicted or, in the limit, proved). It is, in my view, in satisfying this need for design descriptions which, in principle at least, would allow properties of the designs to be proved, that formal methods can make their strongest contribution to quality assurance for ultra-dependable systems: they address (as nothing else does) Mackall's plea for ``a method to make system designs more understandable, more visible.''
The AFTI-F16 flight tests are unusually well documented; I know of no other flight-control system for which comparable data are publicly available. However, press accounts and occasional technical articles reinforce the AFTI-F16 data by suggesting that timing, redundancy management, and coordination of replicated computing channels are tricky problems that are routinely debugged during flight test.