CSE 15L, Winter 2010

Scientific Debugging

The Scientific Debugging Method


The scientific method applied to debugging software is broadly similar to the scientific method used to study nature, but there are important differences. We can outline the method in a number of steps:

  0. Characterize intended behavior. Focus on what matters to someone using the program—what observable behavior the program is supposed to exhibit—not how it works internally. Ask: What input does it take? What output should it give? And so on.
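
For instance (a hypothetical sketch; the averaging program below is invented for illustration, not part of the lab), intended behavior can be pinned down as concrete input/output pairs before worrying about how the code works:

    // Hypothetical Java sketch of step 0: the intended behavior of compute()
    // (return the mean of the scores) written down as concrete expectations.
    public class AverageSpec {
        // Intended behavior, as input -> expected output:
        //   compute({90, 80, 70}) -> 80.0
        //   compute({100})        -> 100.0
        //   compute({1, 2})       -> 1.5
        static double compute(int[] scores) {
            int sum = 0;
            for (int score : scores) {
                sum += score;
            }
            return (double) sum / scores.length;
        }

        public static void main(String[] args) {
            // Run under those conditions and compare with the expectations above.
            System.out.println(compute(new int[] {90, 80, 70}));  // expect 80.0
            System.out.println(compute(new int[] {100}));         // expect 100.0
            System.out.println(compute(new int[] {1, 2}));        // expect 1.5
        }
    }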

At first, assume the program will behave as intended. Predict what it should do under certain conditions, such as particular inputs, and run the program under those conditions to see if you get what you expect. This process of testing is how we look for bugs. For each bug we find, one at a time, we follow the steps below, which might be called "debugging proper":

  1. Identify a failure. A failure is a deviation from intended behavior, not to be confused with a defect in the code. To identify a failure, then, we have to know first of all what is intended; this is why step zero comes first. A failure generally falls into one of three categories: compile-time, run-time, or logic.
  2. Form a hypothesis. A hypothesis is an attempt to understand and explain what is happening. It can be wrong. A hypothesis to explain a failure normally suggests a defect that causes a failure under certain conditions, and explains how the defect produces the failure under those conditions.
  3. Make a prediction. A hypothesis explains what you've already observed; a prediction states what you expect to observe under new conditions. A scientific prediction logically follows from a hypothesis, so that if the prediction is false, the hypothesis is incorrect.
  4. Perform an experiment. A good experiment involves control conditions identical to those that produced prior observations, and usually exactly one experimental condition is changed. In scientific debugging you might give the program different input, add debugging output, use an interactive debugger, or (with due caution) alter the code in some way. (A worked sketch of one full cycle follows this list.)
  5. Observe results. What is the behavior now? Remember, results are observations, not interpretations. "The program printed 42" is an observation. "The program worked" is an interpretation.
  6. Reach a conclusion. A conclusion is an interpretation. Did the results match your prediction? If not, your hypothesis must be false (unless you made some other mistake). Do the results suggest a new hypothesis? Note: if you changed the code and the results aren't what you expected, you should probably undo the change!
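
As a worked sketch of one full cycle (hypothetical; the program and the defect are invented for illustration): suppose an average-of-scores method returns 50.0 for the scores {90, 80, 70} when 80.0 was intended. One hypothesis is that the loop skips the first score; that hypothesis predicts that the sum will be 150 rather than 240 just before the division. The experiment changes exactly one thing, adding a line of debugging output:

    // Hypothetical Java sketch of one debugging cycle.
    // Failure: compute({90, 80, 70}) returns 50.0; 80.0 was intended.
    // Hypothesis: the loop starts at index 1, so the first score is skipped.
    // Prediction: for this input, sum will be 150 (not 240) just before dividing.
    // Experiment: add exactly one debugging print; change nothing else.
    public class AverageExperiment {
        static double compute(int[] scores) {
            int sum = 0;
            for (int i = 1; i < scores.length; i++) {   // suspected defect
                sum += scores[i];
            }
            System.err.println("debug: sum = " + sum);  // the experimental change
            return (double) sum / scores.length;
        }

        public static void main(String[] args) {
            System.out.println("got " + compute(new int[] {90, 80, 70}));
        }
    }

Observation: the run prints "debug: sum = 150" and then "got 50.0". Conclusion: the results match the prediction, so the hypothesis survives; the next step is to try the fix (start the loop at index 0) and test the hypothesis that the bug is fixed.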

Use the scientific method to test hypotheses like "the bug is fixed," "this change fixed the bug," and "the program behaves as intended," too! These lead to predictions, experiments, and results of their own. Don't jump to the conclusion that a bug is fixed, or that a program works as intended, without good (documented) evidence.
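
Continuing the hypothetical sketch above, the hypothesis "the bug is fixed" makes predictions of its own: the repaired method should now match the intended behavior on the input that originally failed, and on a few others. Running those checks is the experiment, and the printed results are the documented evidence:

    // Hypothetical Java sketch: test the hypothesis "the bug is fixed".
    // The loop now starts at index 0; the expected values below follow from
    // the intended behavior (step 0).
    public class AverageFixedTest {
        static double compute(int[] scores) {
            int sum = 0;
            for (int i = 0; i < scores.length; i++) {   // fix applied
                sum += scores[i];
            }
            return (double) sum / scores.length;
        }

        static void check(int[] scores, double expected) {
            double actual = compute(scores);
            System.out.println((actual == expected ? "PASS" : "FAIL")
                    + ": expected " + expected + ", got " + actual);
        }

        public static void main(String[] args) {
            check(new int[] {90, 80, 70}, 80.0);   // the input that originally failed
            check(new int[] {100}, 100.0);
            check(new int[] {1, 2}, 1.5);
        }
    }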

Note also that the process isn't as linear as it may appear above. For tricky bugs especially, early hypotheses are often wrong, so one failure can certainly lead to more than one hypothesis, and each hypothesis may give rise to multiple cycles of prediction, experiment, results, and conclusion.

When you see the defect first

Sometimes you may spot a defect in the code before you've observed a failure. In this case you've got a hypothesis, but you still need to observe the failure before you remove the defect. If you're not fixing a failure, you're not debugging! Predict the failure you think the defect will cause, do an experiment, and record the results. If you see the failure you expected, go ahead and try the fix, and do an experiment to confirm your prediction that the failure won't recur.
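
For example (again a hypothetical sketch): while reading a sum() method you notice the loop bound uses <= where < seems intended. That is a hypothesis about a defect; it predicts that any non-empty input will raise an ArrayIndexOutOfBoundsException. Run the program to observe that failure before touching the code:

    // Hypothetical Java sketch: a defect spotted by reading the code.
    public class OffByOneDemo {
        static int sum(int[] values) {
            int total = 0;
            for (int i = 0; i <= values.length; i++) {  // suspected defect: should be <
                total += values[i];
            }
            return total;
        }

        public static void main(String[] args) {
            // Experiment: does the predicted ArrayIndexOutOfBoundsException occur?
            System.out.println(sum(new int[] {1, 2, 3}));
        }
    }

If the exception appears as predicted, apply the fix (change <= to <) and run the same experiment again to confirm the prediction that the failure no longer occurs.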

