Knight Capital illustrates danger of speed to market, says software research scientist Bill Curtis
Bill Curtis, vice president and chief scientist at CAST Research Labs, and a Fellow of the IEEE, has commented on the Knight Capital debacle, noting that the financial industry faces a critical battle to ensure quality control of software.
Speed kills. This admonition is just as true in software as in any other endeavour. The challenge is that software is even more complex, and therefore kills in even more insidious ways.
The speed to market demanded in competition among financial institutions, such as Knight Capital, is exceptional. Underpinning this speed is the use of computers, where nanoseconds can be the difference between winners and watchers. However, this pace of competition demands a constant upgrading of software capabilities to best, or at least equal, the competition. Unfortunately, the discipline of software development is too often stretched to its limits to satisfy these demands with dependable quality. The pressure to cut corners to meet the demands of speed is one cause of the increasing stream of high-profile, mega-costly failures in financial systems.
Software quality comes in two forms. First is the functionality: what the software is supposed to do. Second is the structural integrity: the engineering soundness of the system. Think of the difference as “this room is designed for the functions it needs to support” versus “the walls of this room are strong enough to keep the roof from caving in.” Software developers are reasonably good at delivering the correct functionality, and the majority of functional flaws are caught with standard testing. Although testers have been steadily improving the speed with which they can adequately test functionality, shortcuts through testing still lead to inevitable operational problems. Nevertheless, delivering the correct functionality has frequently not been the primary problem causing high-profile outages. The causes are most frequently in the system’s structure rather than its functionality.
Structural flaws are very difficult to detect through normal tests, since testing usually focuses on the functional requirements of the system. Worse, the complexity of modern systems, which are composed of sub-systems written in different programming languages on different machines, guarantees that no single person, or even group of people, can know enough to understand the entire system. Complicating this further, different parts of the system may be developed by different suppliers, many of them offshore companies. Testing alone is no longer sufficient to ensure a dependable system. Full system dependability requires an analysis of the structural integrity of the entire integrated system and a dynamic analysis of its performance under realistic operational conditions. These techniques must be applied to every piece of software to be inserted into the operational environment, even if it is touted as only a minor upgrade.
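What dynamic analysis adds over functional testing can be sketched in a few lines, under assumed conditions: instead of checking single answers, the component is driven with a production-shaped burst of traffic while an operational property — here, worst-case latency against a hypothetical requirement — is measured, something a pass/fail functional test never does.

```python
# A minimal sketch of dynamic analysis under an assumed realistic workload.
import time

def process_order(order_id: int) -> str:
    """Stand-in for the component under evaluation."""
    time.sleep(0.0001 * (order_id % 3))  # latency varies with the input
    return f"ack-{order_id}"

latencies = []
for order_id in range(300):              # replay a burst of realistic traffic
    start = time.perf_counter()
    process_order(order_id)
    latencies.append(time.perf_counter() - start)

worst = max(latencies)
print(f"worst-case latency: {worst * 1000:.2f} ms")
assert worst < 0.05, "operational requirement breached"  # hypothetical threshold
```

Real dynamic analysis of an integrated trading system is of course far richer — memory, throughput, failover behaviour — but the shape is the same: exercise the whole assembly under load and observe properties that no unit-level functional test measures.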
Too often we hear that a new upgrade to the system has caused a failure. This is frequently a problem of two subsystems that worked correctly by themselves, but encountered a fatal flaw in their interaction. The only way to protect against these insidious failures is to ensure suppliers are using proven high-quality software development techniques and that the integrated system is thoroughly evaluated before it is deployed into operations. The rush to operations shortens the time available for quality analysis, so that only the obvious flaws are detected. The more complex structural flaws, which require more complex forms of evaluation, remain lurking in the system. Even modern development techniques such as agile methods, designed to shorten the time to deliver working software, focus their testing on functionality, leaving less time for structural analysis of the entire system before delivery.
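A classic, hypothetical form of such an interaction flaw is a units mismatch at an integration seam: each subsystem is correct against its own specification and passes its own tests, but the two specifications disagree about what a number means.

```python
# Hypothetical sketch: two individually-correct subsystems, one fatal seam.

def pricing_quote_cents(symbol: str) -> int:
    """Pricing subsystem: returns the quote in CENTS (its documented unit)."""
    return 15_250  # i.e. $152.50, as an integer number of cents

def order_value(quantity: int, price_dollars: float) -> float:
    """Order subsystem: expects the price in DOLLARS (its documented unit)."""
    return quantity * price_dollars

# Each function passes its own unit tests. The flaw appears only where
# a caller wires them together, assuming the units agree:
value = order_value(100, pricing_quote_cents("XYZ"))
print(value)  # 1525000 -- a hundredfold overstatement of the true 15250.0
```

No test of either function in isolation would catch this; only an evaluation of the integrated whole, or an analysis that checks the contract at the seam, exposes it. The symbols and figures here are invented, not taken from the Knight Capital incident itself.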
No organisation can remove all the defects in its software before going operational. The problem is that most do not do enough analysis to make strategic decisions about which defects present the most serious risks to business operations. If financial institutions are going to stem the flood of costly failures, they must temper speed with software engineering discipline and adopt at least the minimum quality practices to reduce business risk. The software engineering world has developed methods to steadily shorten delivery times while increasing quality, but no advance is sufficient to reduce the risks of imprudent shortcuts.
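The strategic triage described above can be sketched very simply, with entirely invented data: score each known defect by business severity and likelihood of occurrence, then spend the limited pre-release time on the highest-risk items first.

```python
# A minimal sketch of risk-based defect triage, using hypothetical entries.

defects = [
    {"id": "D1", "desc": "UI label truncated",      "severity": 1, "likelihood": 0.9},
    {"id": "D2", "desc": "order router retry loop", "severity": 5, "likelihood": 0.4},
    {"id": "D3", "desc": "unbounded cache growth",  "severity": 4, "likelihood": 0.7},
]

def risk(defect: dict) -> float:
    """Business risk as severity weighted by likelihood of occurrence."""
    return defect["severity"] * defect["likelihood"]

triaged = sorted(defects, key=risk, reverse=True)
for d in triaged:
    print(f'{d["id"]}: risk {risk(d):.2f} -- {d["desc"]}')
```

Any real scoring model would be richer, but the principle is the one the article urges: with finite time before release, knowing *which* residual defects threaten the business most is itself a quality practice.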