Loop Exits and Structured Programming:
                          Reopening the Debate

                            Eric S. Roberts
                     Department of Computer Science
                          Stanford University


NOTE

This paper has been submitted to the Twenty-sixth SIGCSE Technical
Symposium on Computer Science Education.  Copyright is retained by the
authors prior to publication release.

ABSTRACT

Internal exits from loops represent a critically important control
structure that should be taught in the introductory CS1 curriculum.
Without access to those facilities, students are often incapable of
solving simple programming problems that occur frequently in
applications.  This paper reviews the existing evidence in support of
such facilities and argues that it is important to reconsider our
traditional pedagogical approach as we adopt new languages of
instruction.

1.  INTRODUCTION

Twenty-five years ago, a fierce debate raged within the computer science
community over the issue of structured programming.  At its essence,
structured programming represents a disciplined approach to software
development based on abstraction, stepwise refinement, and a formal
approach to program correctness [Dahl72].  Unfortunately, as the debate
progressed, attention was focused less on the general issues raised by
structured programming than on a relatively small detail: whether or not
it is ever appropriate to use the goto statement in a well-structured
program.  Launched by a letter from Edsger Dijkstra in 1968
[Dijkstra68], the goto controversy raged back and forth over the next
several years, often resembling a religious war in emotional intensity.
In the late 1970s, the debate died down, and a period of relative peace
ensued.

The underlying controversy, however, has not gone away, and the question
of which control structures are acceptable remains a source of
passionate concern.  The intensity of that concern has been demonstrated
by some of the reactions I have received to the new CS1/CS2 curriculum
at Stanford and to my textbook, The Art and Science of C: A Library-
Based Approach [Roberts95], which reflects the curricular revisions we
have undertaken at Stanford.  When we converted the curriculum from
Pascal to C [Roberts93], I decided -- on the basis of many years of
classroom experience -- that we should move beyond the limited control
structures available in Pascal and allow students to exit from the
interior of a loop in the following two cases:

  1.  When the structure of a loop requires some preparation (such
      as reading an input value) before making the test for
      termination, I encourage students to use the break statement
      to force explicit exit from the interior of a loop.  This
      strategy, which is discussed in Section 3, allows students to
      solve what Dijkstra calls the loop-and-a-half problem in a way
      that both appeals to the student's intuition and avoids
      duplication of code.

  2.  In the context of a function, I encourage students to use the
      return statement as soon as the value of the function becomes
      known, even if that return statement would force an early exit
      from a loop within the function.  Allowing return to be used
      in this way makes it much easier for students to solve a
      variety of problems, as illustrated in Section 4.

Although the response to my text has been quite favorable, the decision
to allow internal loop exits in these restricted cases has elicited some
negative response.  One reviewer, for example, wrote that using break
"is akin to using the proverbial goto."  Another charged that my
approach violates the basic tenets of structured programming.  A third
declared that my use of break to exit from a while loop is simply
"unacceptable," ruling out any further consideration.

The negative reactions expressed in such reviews underscore the
continued existence of controversy over what control structures are
acceptable for academic use.  This paper summarizes the evidence
supporting the use of internal loop exits and embedded return
statements, and examines the dangers associated with restricting
students to a more tightly constrained control model.

For many years, it was easy to discount the evidence in favor of
internal loop exits.  As long as Pascal was overwhelmingly accepted as
the best language for teaching introductory programming, the issue had
little practical relevance.  In Pascal, internal loop exits are simply
not available, and those who adhere to the standard version of the
language are therefore forced to accept the control structures that
Pascal provides.

In the last few years, however, many schools have abandoned Pascal for
more modern languages.  For the most part, the new languages -- even
those that follow directly in the Pascal tradition such as Modula-2 and
Ada -- include both a structured mechanism for exiting from the interior
of a loop and an active return statement.  The availability of these
facilities gives new relevance to the underlying pedagogical question of
how to teach control structures at the introductory level.  It is time
to reopen the debate.

2.  SUFFICIENCY VERSUS PRACTICAL UTILITY

Before exploring the accumulated evidence that supports the use of
internal loop exits, it is important to acknowledge that the control
structures provided by Pascal have a sound theoretical basis.  In their
1966 paper, Boehm and Jacopini proved that it is possible to code any
flowchart program using four control structures: atomic actions,
sequential composition, if-then-else, and while-do.  These structures
became known as D structures, after Edsger Dijkstra, and were soon
extended to form a larger set of control primitives called D' structures
[Ledgard75] that also includes the if-then, repeat-until, and case
statements as they exist in Standard Pascal.  Thus, the statements
provided by Pascal indeed constitute a sufficient set of control
primitives, in the theoretical sense.

The key point, however, is that theoretical sufficiency is not in itself
the principal criterion for program language design.  After all, the
arithmetic if and goto statements provided by Fortran II also constitute
a sufficient set by this definition, but no one would seriously advocate
using those statements as the basis for control structures in this day
and age.  The discipline encouraged by the use of D structures leads
empirically to more reliable and more easily maintained programs than
Fortran's primitives support.  Thus, although the evolution of D
structures represented an important practical advance for programming
language design, theoretical sufficiency was not the primary reason for
its success.  If it were, language designers would have seen no need to
augment the set of D structures with the extremely useful D' forms.

A similar illustration can be drawn from the standard set of Boolean
operators.  Most languages define and, or, and not as the primitive
operations on Boolean data.  If one's primary concern is linguistic
economy, this set could easily be considered wasteful, because the
single operator nand (or, equivalently, the operator nor) would be
sufficient in theoretical terms.  This theoretical result is critical
for hardware designers, who are able to exploit the sufficiency of a
single operator to achieve greater regularity in the design of
integrated circuits.  For programmers, on the other hand, the fact that
nand is by itself sufficient has little practical relevance.  The
average applications programmer has no intuition regarding the behavior
of nand and therefore would not know how to use it effectively.
Programming problems tend to be phrased in terms of the traditional
logical connectives, and it is for this reason -- the admittedly
subjective criterion of "naturalness" in the sense of being consistent
with intuition -- that and, or, and not form the basis of Boolean
calculation.

The question of what set of control structures is appropriate is
therefore larger than the question of whether a particular set of
primitives is sufficient.  It is also important to consider whether the
control structures are suitable for the problems that tend to arise in
practical programming, or, perhaps more importantly, whether the
structures suit the ways in which programmers conceive of those
problems.  If they do not, and if empirical evidence indicates that
transforming a conceptual image of a problem into an approved control
framework tends to introduce error or unnecessary complexity, it is
important to examine that framework to determine whether extensions are
required.

3.  THE LOOP-AND-A-HALF PROBLEM

The fact that some loop structures seem to have a natural exit in the
middle has long been recognized.  In his 1974 comments on the goto
controversy [Knuth74], Don Knuth outlined his own perspective, as
follows:

      The iteration statements most often proposed as alternatives
      to goto statements have been "while B do S" and "repeat S
      until B".  However, in practice the iterations I encounter
      very often have the form

                A:  S;
                    if B then goto Z fi;
                    T;
                    goto A
                Z:

      where S and T both represent reasonably long sequences of
      code.  If S is empty, we have a while loop, and if T is
      empty we have a repeat loop, but in the general case it is a
      nuisance to avoid the goto statements.  A typical example of
      such an iteration occurs when S is the code to acquire or
      generate a new piece of data, B is the test for end of data,
      and T is the processing of that data.

Loops of this sort, in which some processing must precede the test, come
up quite often in programming and constitute what Dijkstra has called
the loop-and-a-half problem.  The canonical example of the loop-and-a-
half problem is precisely the one that Knuth describes: reading a set of
input values until some sentinel value appears.  If the basic strategy
is expressed in pseudocode, the problem has the following solution:

          loop
            read in a value;
            if value = Sentinel then exit;
            process the value
          endloop;

In this example, the loop/exit/endloop statement is an extension to the
Pascal repertoire and represents a loop that is terminated only by the
execution of the exit statement in the interior of the loop body.  The
loop/exit/endloop extension makes it possible to express the algorithm
so that the input value is read in from the user, tested against the
sentinel, and then processed as necessary.

The loop/exit/endloop approach, however, offends the sensibilities of
many computer science instructors, who have learned through their
experience with Pascal to "unwind" the loop so that the test appears at
the top of the while loop, as follows:

          read in a value;
          while value != Sentinel do begin
            process the value;
            read in a value
          end;

Unfortunately, this approach has two serious drawbacks.  First, it
requires two copies of the statements required to read the input value.
Duplication of code presents a serious maintenance problem, because
subsequent edits to one set of statements may not always be made to the
other.  The second drawback is that the order of operations in the loop
is not the one that most people would expect.  In any English
explanation of the solution strategy, the first step is to read in a
number, and the second is to process it by adding it to the total.  The
traditional Pascal approach reverses the order of the statements within
the loop and turns a read/process strategy into a process/read strategy.
As the rest of this section indicates, there is strong evidence to
suggest that the read/process strategy is more intuitive, in the sense
that it corresponds more closely to the cognitive structures programmers
use to solve such problems.

The problem of code duplication is in some sense objective.  The
statements that read in an input value do in fact appear twice in the
program, and it is easy to imagine that an edit in one copy might not be
made symmetrically in the other.  The second problem, however, is
inherently more subjective because it hinges on the question of which
loop order is more "intuitive."  What is intuitive for some may not be
intuitive for others, and it may be that a preference for one strategy
over another represents nothing more than an individual bias.  For that
reason, it is important to rely on empirical measurement of the success
that students have in using the different strategies.

The best known study of this form was conducted by Elliot Soloway,
Jeffrey Bonar, and Kate Ehrlich at Yale University in the early 1980s.
In their experiments, student programmers with varying levels of
experience were divided into experimental and control groups.  The
students in each group were then asked to write a program to average a
list of integers terminated by a sentinel value.  Those in the control
group were instructed to solve the problem using Standard Pascal.  Those
in the experimental group were asked to use a variant of Pascal (Pascal
L) in which the standard looping statements while and repeat were
replaced by a new loop construct having the following form:

          loop
            statements
            if condition leave;
            statements
          again

The loop statement allows for a structured exit from the interior of the
loop and therefore permits the use of a read/process strategy.  The
results of the experiment appear in the abstract of the paper published
in Communications of the ACM  [Soloway83]:

      Subjects overwhelmingly preferred a read/process strategy
      over a process/read strategy.  When writing a simple looping
      program, those using the loop . . . leave . . . again
      construct were more often correct than were those using the
      standard Pascal loop constructs.

The correctness argument is particularly compelling.  At all levels of
experience, students using Pascal L were more likely to write correct
programs than were their counterparts in the control group.  The outcome
is all the more significant because the students in the experimental
group had little chance to become familiar with the new loop construct,
as indicated by the following passage from the paper:

      Given that students were exposed to the loop . . . leave
      . . . again construct of Pascal for only a few minutes, and
      given that they had much more familiarity and experience
      with Pascal's standard loop constructs, we were quite
      impressed with the high performance of the Pascal L users.
      Thus, these data support the claim that people will write
      correct programs more often if they use the language that
      facilitates their preferred strategy.

I have looked for other studies that might contradict the findings of
the Soloway study.  Although I have found authors who assert that the
use of internal loop exits is wrong, I have encountered none that
support their claims with objective evidence.

It is also possible to use standard Pascal texts as evidence that the
read/process strategy in fact corresponds to student intuition.  An
examination of at the approach used to solve the read-until-sentinel
problem in the five best-selling Pascal textbooks [Cooper92, Dale92,
Koffman91, Leestma87, Savitch91] reveals that:

   o  In each of the textbooks, the problem is solved by unwinding
      the loop so that the read phase of the operation occurs twice:
      once prior to the loop and once at the end of the loop body.

   o  In every text except Leestma and Nyhoff, the authors derive
      their solution by first presenting the read/process strategy
      and subsequently modifying it to fit the process/read form
      required by Pascal's control structures.

To me, the approach these authors take suggests that they themselves
accept that students find the read/process strategy more intuitive.  To
encourage students to write programs that fit Pascal's restrictions,
these texts teach them -- often at great length -- to abandon their
initial strategy in favor of one that reverses the order of the
operations within the loop.  My own classroom experience certainly
supports this conclusion.  By introducing loop exits and using the
read/process strategy, I find that I can save almost an entire day's
lecture.

The concern, of course, about introducing an internal loop exit form is
that students will end up abusing it, since it is certainly possible to
do so.  Our experience at Stanford, however, suggests that this problem
does not in fact arise.  Students tend to write code based on the models
they are given, using internal loop exits only in the disciplined way
outlined in the class examples.  In the last three years, over 2500
students have completed CS1 at Stanford, and the teaching assistants
have reported no problems with overuse of the break construct.  This
finding is consistent with the results of a study by Sheppard, which
demonstrated that internal loop exits do not in fact compromise program
readability [Sheppard79].

4.  SEQUENTIAL SEARCH

In addition to encouraging the use of an internal loop exit to solve the
loop-and-a-half problem, I also teach students that it is appropriate to
use a return statement at any point in a function where the result value
becomes known, even if the return statement is nested in such a way that
it acts as a nonstandard exit from an enclosing control structure.  The
paradigmatic example that underscores the value of this technique is the
sequential search problem.

In essence, the sequential search problem consists of writing an
implementation for the following function header line:

          function Search(var list: IntList;
                          n, key: integer) : integer;

The parameter list is an array of integers that conforms to the Pascal
type definition

          type
            IntList = array [1..MaxList] of integer;

where MaxList is some constant upper bound on the list size.  The actual
number of elements actively stored in the array is typically smaller and
is given by the parameter value n.  The problem is to implement Search
in such a way that it returns the least index i for which list[i] is
equal to key; if the value key does not appear in list, the function
Search should return 0.  Functions that have this basic structure come
up frequently in programming, even at the introductory level.

If the language includes an active return statement, such as the one
provided by C, Modula-2, or Ada, the Search function has the following
concise and natural implementation:

          function Search(var list: IntList;
                          n, key: integer) : integer;
          var
            i: integer;
          begin
            for i := 1 to n do
              if list[i] = key then return i;
            return 0
          end;

If students are restricted to the facilities available in Standard
Pascal, the Search function, as simple and as fundamental as it is,
turns out to be extremely hard to code -- so difficult in fact that many
of them are incapable of developing a correct solution.  If you haven't
encountered this problem, it is worth spending a few minutes to write
the code on your own.

The difficulty of coding the sequential search algorithm was described
in 1980 in an insightful paper by Henry Shapiro that never received the
attention it deserved.  The statistics he reports are far more
conclusive than those in the Soloway study of internal loop exits.  Of
the students in his study who attempted to use the for loop strategy,
all of them managed to solve the problem correctly, even though they
were forced to use a goto statement to achieve the effect of the return
statement in the preceding example.  In fact, Shapiro notes that

      I have yet to find a single person who attempted a program
      using [this style] who produced an incorrect solution.

Students who attempted to solve the problem without using an explicit
return from the for loop fared much less well: only seven of the 42
students attempting this strategy managed to generate correct solutions.
That figure represents a success rate of less than 20% [Shapiro80].

That students would have difficulty in developing a general solution to
this problem is not really surprising.  When they are limited to
Pascal's control structures, the answer is quite complicated.  To see
just how complicated the solutions can get, it is again useful to
consider how the leading Pascal texts handle the problem.

The following code, adapted from the solution given on page 300 of
Leestma and Nyhoff [Leestma87], illustrates that a correct solution is
indeed possible:

          function Search(var list: IntList;
                          n, key: integer) : integer;
          var
            i: integer;
            found: boolean;
          begin
            i := 1;
            found := FALSE;
            while (i <= n) and not found do
              if list[i] = key then
                found := TRUE
              else
                i := i + 1;
            if found then
              Search := i
            else
              Search := 0
          end;

This solution introduces a Boolean variable and uses 11 lines of code to
do the work accomplished by three in the solution based on the return
statement.  It does, however, return the correct result.  The Dale and
Weems text [Dale92] uses a slight variation of this strategy that also
returns the correct result.

The other leading Pascal texts adopt a different approach.  The code
suggested independently by Cooper, Koffman, and Savitch [Cooper92,
Koffman91, Savitch91] has the following general form:

          function Search(var list: IntList;
                          n, key: integer) : integer;
          var
            i: integer;
          begin
            i := 1;
            while (i < n) and (list[i] <> key) do
              i := i + 1;
            if list[i] = key then
              Search := i
            else
              Search := 0
          end;

These authors have taken great pains to avoid some of Pascal's pitfalls.
The bounds test for the while loop specifies continuation only if i < n,
even though the test i <= n might seem more natural in this context.
The problem with the test i <= n is that Pascal's handling of the and
operator can generate a subscript violation if n is equal to MaxList.
On the last cycle, Pascal evaluates both halves of the condition and
therefore evaluates list[i] even if it has already determined that
i > n.  Fixing this problem also requires that the test in the if
statement check the element value and not the value of the index.
Unfortunately, this change in the final test introduces a new problem:
the function can fail if n is 0.  As written, the implementation tests
the first element in the array against the key even if the array has no
active elements.

In fairness to the authors, it is important to point out that the
original examples are not in fact buggy in the strictest sense.  Koffman
specifically rules out the possibility that n is 0 in the preconditions
of the function, although this relationship is not tested within the
implementation.  In both Cooper and Savitch, the upper bound is a
constant rather than a variable.  Since the constant is not declared to
be zero, the problem does not actually arise.  Even so, the important
point is that the coding from these texts cannot be considered as an
appropriate model for a general-purpose search function.  A student
trying to solve the general case might well do exactly what I did: go
through the functions and replace the constant bound with the
corresponding variable.  At that point, the implementation fails.

The conclusion that the sequential search problem is difficult for
students who use the Pascal strategy is strongly supported by our
experience at Stanford.  When we taught the introductory course in
Pascal, we worked hard to present the sequential search algorithm in a
careful and rigorous way that would ensure correct operation in
special-case situations.  Despite that care, we discovered that even our
best students had trouble regenerating the solution after taking the
course.  The situation is entirely different now that we use a return
statement to implement the sequential search algorithm.  Students who
manage to understand arrays at all understand the sequential search
algorithm immediately; it no longer represents a source of difficulty.

5.  CONCLUSIONS

When students are limited to the control structures available in Pascal,
they tend to have significant difficulty solving certain programming
problems that come up frequently in practical applications.  Allowing
students to use control forms that provide a structured mechanism for
loop exit increases their comprehension along with their ability to
write correctly functioning code.  Although these facilities are absent
from Pascal, they are widely available in modern programming languages
and should be introduced when those languages are used in an
instructional setting.

To a significant extent, the discipline that Pascal encouraged has had a
positive effect on the computer science curriculum.  As we move to new
programming languages, however, it is crucial to analyze our experience
to determine what aspects of traditional practice are predicated more on
the particular nature of Pascal than on some more general principle.  If
our programming methodology can be defended on its own merits, we should
continue to support it.  On the other hand, if our students are unable
to write correct code using the existing paradigms, we must reevaluate
them.  As Dijkstra observed, "the tools we use have a profound (and
devious!) influence on our thinking habits, and, therefore, on our
thinking abilities" [Dijkstra82].  As we adopt new tools, we must be
ready to think in new ways.

REFERENCES

[Boehm66] C. Boehm and G. Jacopini, "Flow diagrams, Turing machines, and
languages with only two formation rules," Communications of the ACM, May
1966.

[Cooper92] Doug Cooper, Oh! Pascal! Turbo Pascal 6.0 (third edition),
New York: Norton, 1992.

[Dahl72] Ole Johan Dahl, Edsger W. Dijkstra, and C.A.R.  Hoare,
Structured Programming, London: Academic Press, 1972.

[Dale92] Nell Dale and Chip Weems, Turbo Pascal (third edition),
Lexington, MA: D.C. Heath, 1992.

[Dijkstra68] Edsger W. Dijkstra, "Goto statement considered harmful,"
letter to the editor, Communications of the ACM, March 1968.

[Dijkstra82] Edsger W. Dijkstra, "How do we tell truths that might
hurt," SIGPLAN Notices, March 1982.

[Knuth74] Donald Knuth, "Structured programming with goto statements,"
Computing Surveys, December 1974.

[Koffman91] Elliot Koffman with Bruce Maxim, Turbo Pascal (third
edition), Reading, MA: Addison-Wesley, 1991.

[Ledgard75] Henry Ledgard and Michael Marcotty, "A genealogy of control
structures," Communications of the ACM, November 1975.

[Leestma87] Sanford Leestma and Larry Nyhoff, Pascal: Programming and
Problem Solving (second edition), New York: Macmillan, 1987.

[Roberts93] Eric S. Roberts, "Using C in CS1: Evaluating the Stanford
Experience," SIGCSE Bulletin, March 1993.

[Roberts95] Eric S. Roberts, The Art and Science of C: A Library-Based
Approach, Reading, MA: Addison-Wesley, 1995.

[Savitch91] Walter Savitch, Pascal: An Introduction to the Art and
Science of Programming (third edition), Redwood City, CA:
Benjamin/Cummings, 1991.

[Shapiro80] Henry Shapiro, "The results of an informal study to evaluate
the effectiveness of teaching structured programming," SIGCSE Bulletin,
December 1980.

[Sheppard79] S. Sheppard, B. Curtis, P. Millman, and J.  Clement,
"Modern coding practices and programmer performance," Computer, December
1979.

[Soloway83] Elliot Soloway, Jeffrey Bonar, and Kate Ehrlich, "Cognitive
strategies and looping constructs: an empirical study," Communications
of the ACM, Vol.  26, No. 11, November 1983.

[Yuen94] C. K. Yuen, "Programming the premature loop exit: from
functional to navigational," SIGPLAN Notices, March 1994.