What kind of bugs do "goto" statements lead to? Are there any historically significant examples?

03/11/2022 softwareengineering

I understand that, save for breaking out of nested loops, the goto statement is avoided and reviled as a bug-prone style of programming, never to be used.

XKCD comic (alt text: “Neal Stephenson thinks it’s cute to name his labels ‘dengo’”).
See the original comic at: http://xkcd.com/292/

Because I learned this early, I don’t really have any insight into, or experience of, the types of bugs goto actually leads to. So what are we talking about here:

Instability?
Unmaintainable or unreadable code?
Security vulnerabilities?
Something else entirely?

What kind of bugs do “goto” statements actually lead to? Are there any historically significant examples?

Here’s a way to look at it I have not seen in the other answers yet.

It is about scope. One of the main pillars of good programming practice is keeping your scope tight. You need tight scope because you lack the mental ability to oversee and understand more than just a few logical steps. So you create small blocks (each of which becomes “one thing”) and then build with those to create bigger blocks which become one thing, and so on. This keeps things manageable and comprehensible.

A goto effectively inflates the scope of your logic to the entire program. This is sure to defeat your brain for any but the smallest programs spanning only a few lines.

So you will no longer be able to tell whether you made a mistake, because there is just too much for your poor little head to take in and check. This is the real problem; bugs are just a likely result.

It isn’t that the goto is bad by itself.
(After all, every jump instruction in a computer is a goto.)
The problem is that there is a human style of programming that pre-dates structured programming, what could be called “flow-chart” programming.

In flow-chart programming (which people of my generation learned, and which was used for the Apollo moon program) you made a diagram with blocks for statement executions and diamonds for decisions, and you could connect them with lines that went all over the place. (So-called “spaghetti code”.)

The problem with spaghetti code is that you, the programmer, could “know” it was right, but how could you prove it, to yourself or anybody else?
In fact, it might actually have a potential misbehavior, and your knowledge that it is always correct could be wrong.

Along came structured programming with begin-end blocks, for, while, if-else, and so on.
These had the advantage that you could still do anything in them, but if you were at all careful, you could be sure your code was correct.

Of course, people can still write spaghetti code even without goto. The common method is to write a while(...) switch( iState ){..., where different cases set the iState to different values. In fact, in C or C++ one could write a macro to do that, and name it GOTO, so saying you’re not using goto is a distinction without a difference.
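
To make the while/switch pattern concrete, here is a minimal sketch in C. The state names and the collatz_steps function are invented for illustration; the only point is that every assignment to state behaves like a goto to the matching case, so the control flow is just as free-form as with labels.

```c
/* Hypothetical states; each one plays the role of a goto label. */
enum state { ST_START, ST_CHECK, ST_EVEN, ST_ODD, ST_DONE };

/* Counts the Collatz steps needed to reach 1 from n (requires n >= 1).
 * There is no goto here, yet control moves between cases in any order. */
unsigned collatz_steps(unsigned long n)
{
    unsigned steps = 0;
    enum state state = ST_START;

    while (state != ST_DONE) {
        switch (state) {
        case ST_START:
            steps = 0;
            state = ST_CHECK;               /* "goto check" */
            break;
        case ST_CHECK:
            if (n == 1)
                state = ST_DONE;            /* "goto done" */
            else
                state = (n % 2 == 0) ? ST_EVEN : ST_ODD;
            break;
        case ST_EVEN:
            n /= 2;
            steps++;
            state = ST_CHECK;               /* "goto check" */
            break;
        case ST_ODD:
            n = 3 * n + 1;
            steps++;
            state = ST_CHECK;               /* "goto check" */
            break;
        case ST_DONE:
            break;
        }
    }
    return steps;
}
```

A plain while (n != 1) loop with an if/else inside computes the same result and is far easier to check, which is the distinction the paragraph above is drawing.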

As an example of how code-proving can preclude unrestricted goto, a long time ago I stumbled on a control structure useful for dynamically-changing user interfaces.
I called it differential execution.
It is fully Turing-universal, but its correctness proof depends on pure structured programming – no goto, return, continue, break, or exceptions.

Why is goto dangerous?

goto doesn’t cause instability by itself. Despite about 100,000 gotos, the Linux kernel is still a model of stability.
goto by itself should not cause security vulnerabilities. In some languages however, mixing it with try/catch exception management blocks could lead to vulnerabilities as explained in this CERT recommendation. Mainstream C++ compilers flag and prevent such errors, but unfortunately, older or more exotic compilers don’t.
goto causes unreadable and unmaintainable code. This is also called spaghetti code, because, like in a spaghetti plate, it’s very difficult to follow the flow of control when there are too many gotos.

Even if you manage to avoid spaghetti code and use only a few gotos, they still facilitate bugs and resource leaks:

Code using structured programming, with clear nested blocks and loops or switches, is easy to follow; its flow of control is very predictable. It’s therefore easier to ensure that invariants are respected.
With a goto statement, you break that straightforward flow, and break the expectations. For example, you might not notice that you still have to free resources.
Many gotos in different places can send you to a single target. So it’s not obvious what state you are in when you reach that place. The risk of making wrong or unfounded assumptions is hence quite big.

Additional information and quotes:

E. Dijkstra wrote an early essay about the topic as far back as 1968: “Go To Statement Considered Harmful”
Brian W. Kernighan & Dennis M. Ritchie wrote in The C Programming Language:

C provides the infinitely-abusable goto statement and labels to
branch to. Formally the goto is never necessary, and in practice it
is almost always easy to write code without it. (…)
Nonetheless we will suggest a few situations
where goto’s may find a place. The most common use is to abandon
processing in some deeply nested structures, such as breaking out of
two loops at once. (…)
Although we are not dogmatic about the matter, it does seem that goto statements should be used sparingly, if at all.

James Gosling & Henry McGilton wrote in their 1995 Java language environment white paper:

No More Goto Statements
Java has no goto statement. Studies illustrated that goto is (mis)used more often than not simply “because
it’s there”. Eliminating goto led to a simplification of the language
(…) Studies on approximately 100,000 lines of C code determined that
roughly 90 percent of the goto statements were used purely to obtain
the effect of breaking out of nested loops. As mentioned above,
multi-level break and continue remove most of the need for goto
statements.

Bjarne Stroustrup defines goto in his glossary in these inviting terms:

goto – the infamous goto. Primarily useful in machine generated C++ code.

When could goto be used?

Like K&R, I’m not dogmatic about gotos. I admit that there are situations where goto could ease one’s life.

Typically, in C, goto allows multilevel loop exit, or error handling that needs to reach an appropriate exit point that frees or unlocks all the resources allocated so far (i.e. multiple allocations in sequence mean multiple labels). This article quantifies the different uses of the goto in the Linux kernel.
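
The “multiple allocations mean multiple labels” idiom can be sketched as follows. This is an illustration only, not code from the kernel: acquire, release, process, the fail_at parameter and the live_buffers counter are all invented so that the cleanup order is easy to observe.

```c
#include <stdlib.h>

int live_buffers = 0;   /* illustration only: buffers acquired but not yet released */

/* Hypothetical allocator that can be told to fail. */
static void *acquire(size_t n, int should_fail)
{
    if (should_fail)
        return NULL;
    void *p = malloc(n);
    if (p != NULL)
        live_buffers++;
    return p;
}

static void release(void *p)
{
    free(p);
    live_buffers--;
}

/* fail_at: 0 = everything succeeds, 1..3 = that acquisition fails.
 * Returns 0 on success and -1 on failure. */
int process(int fail_at)
{
    int err = -1;
    void *a, *b, *c;

    a = acquire(16, fail_at == 1);
    if (a == NULL)
        goto out;
    b = acquire(32, fail_at == 2);
    if (b == NULL)
        goto free_a;
    c = acquire(64, fail_at == 3);
    if (c == NULL)
        goto free_b;

    /* ... real work with a, b and c would go here ... */
    err = 0;

    /* Labels are in reverse order of acquisition, so each jump
     * releases exactly what had been acquired at that point. */
    release(c);
free_b:
    release(b);
free_a:
    release(a);
out:
    return err;
}
```

Every path, successful or not, falls through the same tail, so nothing acquired earlier is left behind. The price is that the label order has to be kept in step with the acquisition order by hand.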

Personally, I prefer to avoid it, and in 10 years of C I used at most 10 gotos. I prefer to use nested ifs, which I think are more readable. When this would lead to nesting that is too deep, I’d opt either to decompose my function into smaller parts or to use a boolean indicator in cascade. Today’s optimizing compilers are clever enough to generate almost the same code as they would for the version with goto.

The use of goto heavily depends on the language:

In C++, proper use of RAII causes the compiler to automatically destroy objects that go out of scope, so the resources and locks are cleaned up anyway, and there is no real need for goto any more.
In Java there’s no need for goto (see the quote from Java’s authors above and this excellent Stack Overflow answer): the garbage collector that cleans up the mess, break, continue, and try/catch exception handling cover all the cases where goto could be helpful, but in a safer and better manner. Java’s popularity proves that the goto statement can be avoided in a modern language.

Zoom on the famous SSL goto fail vulnerability

Important Disclaimer: in view of the fierce discussion in the comments, I want to clarify that I don’t claim that the goto statement is the only cause of this bug. I don’t claim that without goto there would be no bug. I just want to show that a goto can be involved in a serious bug.

I don’t know how many serious bugs are related to goto in the history of programming: details are often not communicated. However there was a famous Apple SSL bug that weakened the security of iOS. The statement that led to this bug was a wrong goto statement.

Some argue that the root cause of the bug was not the goto statement in itself, but a wrong copy/paste, a misleading indentation, missing curly braces around the conditional block, or perhaps the working habits of the developer. I can neither confirm nor rule out any of them: all these arguments are plausible hypotheses and interpretations. Nobody really knows. (Meanwhile, the hypothesis of a merge that went wrong, as someone suggested in the comments, seems to be a very good candidate in view of some other indentation inconsistencies in the same function.)

The only objective fact is that a duplicated goto caused the function to exit prematurely. Looking at the code, the only other single statement that could have caused the same effect would have been a return.

The error is in function SSLEncodeSignedServerKeyExchange() in this file:

    if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
        goto fail;
    if ((err =...) !=0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;  // <==== OUCH: INDENTATION MISLEADS: THIS IS UNCONDITIONAL!!
    if (...)
        goto fail;

    ... // Do some cryptographic operations here

fail:
    ... // Free resources to process error

Indeed curly braces around the conditional block could have prevented the bug:
it would have led either to a syntax error at compilation (and hence a correction) or to a redundant harmless goto. By the way, GCC 6 would be able to spot these errors thanks to its optional warning to detect inconsistent indentation.
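
A reduced, compilable sketch of both variants shows the difference. It is not Apple's code: update_hash and verify_signature are hypothetical stand-ins, and the misleading indentation in the first function is deliberate (GCC 6 and later can report it with -Wmisleading-indentation).

```c
/* Hypothetical stand-ins for the real hash and signature steps. */
static int update_hash(void)
{
    return 0;                               /* always succeeds */
}

static int verify_signature(int signature_valid)
{
    return signature_valid ? 0 : -1;
}

/* Unbraced version with the duplicated goto. */
int check_unbraced(int signature_valid)
{
    int err;

    if ((err = update_hash()) != 0)
        goto fail;
        goto fail;  /* always executed, and err is still 0 here */
    if ((err = verify_signature(signature_valid)) != 0)
        goto fail;  /* never reached */

fail:
    return err;
}

/* Same duplication, but inside braces: redundant and harmless. */
int check_braced(int signature_valid)
{
    int err;

    if ((err = update_hash()) != 0) {
        goto fail;
        goto fail;  /* duplicated, but still conditional */
    }
    if ((err = verify_signature(signature_valid)) != 0) {
        goto fail;
    }

fail:
    return err;
}
```

With these stand-ins, check_unbraced(0) returns 0, so an invalid signature is reported as success, whereas check_braced(0) returns -1.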

But in the first place, all these gotos could have been avoided with more structured code. So goto is at least indirectly a cause of this bug. There are at least two different ways that could have avoided it:

Approach 1: if clause or nested ifs

Instead of testing lots of conditions for error sequentially, and each time sending to a fail label in case of problem, one could have opted for executing the cryptographic operations in an if-statement that would do it only if there was no wrong pre-condition:

    if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) == 0 &&
        (err = ...) == 0 &&
        (err = SSLHashSHA1.update(&hashCtx, &signedParams)) == 0 &&
        ...
        (err = ...) == 0)
    {
         ... // Do some cryptographic operations here
    }
    ... // Free resources

Approach 2: use an error accumulator

This approach is based on the fact that almost all the statements here call some function to set an err error code, and execute the rest of the code only if err was 0 (i.e., function executed without error). A nice safe and readable alternative is:

bool ok = true;
ok = ok && (err = ReadyHash(&SSLHashSHA1, &hashCtx)) == 0;
ok = ok && (err = NextFunction(...)) == 0;
...
ok = ok && (err = ...) == 0;
... // Free resources

Here, there is not a single goto: no risk of jumping too quickly to the failure exit point. And visually it would be easy to spot a misaligned line or a forgotten ok &&.

This construct is more compact. It is based on the fact that in C, the second part of a logical and (&&) is evaluated only if the first part is true. In fact, the assembly generated by an optimizing compiler is almost equivalent to the original code with gotos: the optimizer detects the chain of conditions very well and generates code which, at the first non-zero return value, jumps to the end (online proof).

You could even envisage a consistency check at the end of the function that could during the testing phase identify mismatches between the ok flag and the error code.

assert( (ok==false && err!=0) || (ok==true && err==0) );

Mistakes such as a ==0 inadvertently replaced with a !=0, or logical connector errors, would easily be spotted during the debugging phase.
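
For readers who want to try the accumulator, here is a self-contained sketch. The step function, its error codes and the calls_made counter are invented for illustration; they stand in for ReadyHash and the other real calls.

```c
#include <assert.h>
#include <stdbool.h>

int calls_made = 0;     /* illustration only: how many steps actually ran */

/* Hypothetical step: reports `code` as its result (0 = success). */
static int step(int code)
{
    calls_made++;
    return code;
}

/* Runs three steps in sequence; c1..c3 are the codes each step reports.
 * Returns 0 on success or the first non-zero error code. */
int run_steps(int c1, int c2, int c3)
{
    int err = 0;
    bool ok = true;

    /* && short-circuits: once ok is false, later steps are not called. */
    ok = ok && (err = step(c1)) == 0;
    ok = ok && (err = step(c2)) == 0;
    ok = ok && (err = step(c3)) == 0;

    /* Consistency check between the flag and the error code. */
    assert((!ok && err != 0) || (ok && err == 0));

    /* ... free resources here, on success and on failure alike ... */
    return err;
}
```

Calling run_steps(0, 7, 0) returns 7 after only two steps have run: the third step is skipped, just as the goto version would have skipped it.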

As said: I don’t claim that alternative constructs would have avoided every bug. I just want to say that they could have made this bug less likely to occur.

The famous Dijkstra article was written at a time when some programming languages were actually capable of creating subroutines having multiple entry and exit points. In other words, you could literally jump into the middle of a function, and jump out at any place within the function, without actually calling that function or returning from it in the conventional way. That’s still true of assembly language. Nobody ever argues that such an approach is superior to the structured method of writing software that we now use.

In most modern programming languages, functions are very specifically defined with one entry and one exit point. The entry point is where you specify the parameters to the function and call it, and the exit point is where you return the resulting value and continue execution at the instruction following the original function call.

Within that function, you ought to be able to do whatever you wish, within reason. If putting a goto or two in the function makes it clearer or improves your speed, why not? The whole point of a function is to sequester a bit of clearly-defined functionality, so that you don’t have to think about how it works internally anymore. Once it’s written, you just use it.

And yes, you can have multiple return statements inside a function; there’s still always one place in a proper function from which you return (the back side of the function, basically). That’s not at all the same thing as jumping out of a function with a goto before it has a chance to properly return.

So it’s not really about using goto’s. It’s about avoiding their abuse. Everyone agrees that you can make a terribly convoluted program using gotos, but you can do the same by abusing functions as well (it’s just a lot easier to abuse gotos).

For what it’s worth, ever since I graduated from line-number-style BASIC programs to structured programming using Pascal and curly-brace languages, I’ve never had to use a goto. The only time I’ve been tempted to use one is to do an early exit from a nested loop (in languages that don’t support multi-level early exit from loops), but I can usually find another way that is cleaner.
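
One such cleaner way, assuming the nested loops can be moved into a function of their own, is to let return do the multi-level exit. The find_value function and its parameters below are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Searches a rows x cols grid stored row by row. On success, stores the
 * position in *row and *col and returns true; otherwise returns false. */
bool find_value(const int *grid, size_t rows, size_t cols, int target,
                size_t *row, size_t *col)
{
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                *row = r;
                *col = c;
                return true;    /* leaves both loops at once */
            }
        }
    }
    return false;
}
```

The return leaves both loops without a flag variable or a label, and the function's name documents what the loops were for.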

What kind of bugs do “goto” statements lead to? Are there any historically significant examples?

I used to use goto statements when writing BASIC programs as a child as a simple way to get for and while loops (Commodore 64 BASIC didn’t have while-loops, and I was too immature to learn the proper syntax and usage of for-loops).
My code was frequently trivial, but any loop bugs could be immediately attributed to my usage of goto.

I now use primarily Python,
a high level programming language that has determined it has no need for goto.

When Edsger Dijkstra declared “Goto considered harmful” in 1968 he did not give a handful of examples where related bugs could be blamed on the goto, rather, he declared that goto was unnecessary for higher level languages and that it should be avoided in favor of what we consider normal control flow today: loops and conditionals. His words:

The unbridled use of the go to has an immediate consequence that it becomes terribly hard to find a meaningful set of coordinates in which to describe the process progress.
[…]
The go to statement as it stands is just too primitive; it is too much an invitation to make a mess of one’s program.

He probably had mountains of examples of bugs from every time he debugged code with goto in it. But his paper was a generalized position statement backed up by proof that goto was unnecessary for higher level languages. The generalized bug is that you may have no ability to statically analyze the code under question.

Code is much harder to statically analyze with goto statements, especially if you jump back in your control flow (which I used to do) or to some unrelated section of code. Code can be and was written this way. It was done to highly optimize for very scarce computing resources and thus the architectures of the machines.

An Apocryphal Example

There was a blackjack program that a maintainer found to be quite elegantly optimized but also impossible for him to “fix” due to the nature of the code. It was programmed in machine code, which is heavily reliant on gotos, so I think this story is quite relevant. This is the best canonical example I can think of.

A Counter-Example

However, the C source of CPython (the most common and reference Python implementation) uses goto statements to great effect. They are used to bypass irrelevant control flow inside functions to get to the end of functions, making the functions more efficient without losing readability. This respects one of the ideals of structured programming – to have a single exit point for functions.

For the record, I find this counter-example to be quite easy to statically analyze.

When the wigwam was current architectural art, their builders could undoubtedly give you sound practical advice regarding construction of wigwams, how to let smoke escape, and so on. Fortunately, today’s builders can probably forget most of that advice.

When the stagecoach was current transportational art, their drivers could undoubtedly give you sound practical advice regarding horses for stagecoaches, how to defend against highwaymen, and so on. Fortunately, today’s motorists can forget most of that advice, too.

When punched cards were current programming art, their practitioners could likewise give you sound practical advice regarding organization of cards, how to number statements, and so on. I am not sure that that advice is very relevant today.

Are you old enough even to know the phrase “to number statements”? You don’t need to know it, but if you don’t, then you aren’t familiar with the historical context in which the warning against goto was principally relevant.

The warning against goto is just not very relevant today. With basic training in while/for loops and function calls, you won’t even think to issue a goto very often. When you do think of it, you probably have a reason, so go ahead.

But can the goto statement not be abused?

Answer: sure, it can be abused, but its abuse is a pretty tiny problem in software engineering compared to far, far more common mistakes such as using a variable where a constant will serve, or such as cut-and-paste programming (otherwise known as neglect to refactor). I doubt that you’re in much danger. Unless you are using longjmp or otherwise transferring control to faraway code, if you think to use a goto or you would just like to try it for fun, go ahead. You’ll be fine.

You may notice the lack of recent horror stories in which goto plays the villain. Most or all of these stories seem to be 30 or 40 years old. You stand on solid ground if you regard those stories as mostly obsolete.

To add one thing to the other excellent answers: with goto statements it can be difficult to tell exactly how you got to any given place in the program. You can know that an exception occurred at some specific line, but if there are gotos in the code there is no way to tell which statements executed to produce the state that caused the exception without searching the whole program. There is no call stack and no visual flow. It is possible that a statement 1000 lines away put you in a bad state and then executed a goto to the line which raised the exception.

goto is harder for humans to reason about than other forms of flow control.

Programming correct code is hard. Writing correct programs is hard, determining if programs are correct is hard, proving programs correct is hard.

Getting code to vaguely do what you want is easy compared to everything else about programming.

goto solves some problems with getting code to do what you want. It does not help make checking for correctness easier, while its alternatives often do.

There are styles of programming where goto is the appropriate solution. There are even styles of programming where its evil twin, comefrom, is the appropriate solution. In both of these cases, extreme care has to be taken to ensure you are using it in an understood pattern, and lots of manual checking is needed to confirm that you aren’t doing something that makes correctness difficult to ensure.

As an example, there is a feature of some languages called coroutines. Coroutines are threadless threads; execution state without a thread to run it on. You can ask them to execute, and they can run part of themselves and then suspend themselves, handing back flow control.

“Shallow” coroutines in languages without coroutine support (like C++ pre-C++20 and C) are possible using a mixture of gotos and manual state management. “Deep” coroutines can be done using the setjmp and longjmp functions of C.
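
As a rough idea of what the shallow variant looks like in C, here is a sketch of a tiny generator. The counter_co struct and counter_next function are invented for illustration; the manual state is the struct, and the gotos re-enter the function where it last left off.

```c
/* Manual coroutine state: where to resume, plus the saved loop counter. */
typedef struct {
    int state;  /* 0 = not started, 1 = suspended inside the loop, 2 = finished */
    int i;
} counter_co;

/* Yields 0, 1, 2 on successive calls. Returns 1 when a value was stored
 * in *out and 0 once the coroutine has finished. */
int counter_next(counter_co *co, int *out)
{
    switch (co->state) {
    case 0:  goto start;
    case 1:  goto resume;
    default: return 0;
    }

start:
    for (co->i = 0; co->i < 3; co->i++) {
        *out = co->i;
        co->state = 1;
        return 1;       /* "yield": hand control back to the caller */
resume:
        ;               /* execution continues here on the next call */
    }
    co->state = 2;
    return 0;
}
```

A caller zero-initializes a counter_co and calls counter_next until it returns 0. Jumping into the middle of the loop is legal C, but every local that must survive a suspension has to live in the struct, which is exactly the kind of bookkeeping that is easy to get wrong by hand.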

There are cases where coroutines are so useful that writing them manually and carefully is worth it.

In the case of C++, they are being found useful enough that the language is being extended to support them. The gotos and manual state management are being hidden behind a zero-cost abstraction layer, permitting programmers to write coroutines without the difficulty of having to prove that their mess of goto, void**s, manual construction/destruction of state, etc. is correct.

The goto gets hidden behind a higher level abstraction, like while or for or if or switch. Those higher level abstractions are easier to prove correct and check.

If the language is missing some of them (as some modern languages are missing coroutines), your alternatives become either limping along with a pattern that doesn’t fit the problem, or using goto.

Getting a computer to vaguely do what you want it to do in common cases is easy. Writing reliably robust code is hard. Goto helps the first far more than it helps the second; hence “goto considered harmful”, as it is a sign of superficially “working” code with deep and hard-to-track-down bugs. With sufficient effort you can still make code with “goto” reliably robust, so holding the rule as absolute is wrong; but as a rule of thumb, it is a good one.

Take a look at this code, from http://www-personal.umich.edu/~axe/research/Software/CC/CC2/TourExec1.1.f.html that was actually part of a large Prisoner’s Dilemma simulation. If you’ve seen old FORTRAN or BASIC code, you’ll realize it’s not that unusual.

C  Not nice rules in second round of tour (cut and pasted 7/15/93)
   FUNCTION K75R(J,M,K,L,R,JA)
C  BY P D HARRINGTON
C  TYPED BY JM 3/20/79
   DIMENSION HIST(4,2),ROW(4),COL(2),ID(2)
   K75R=JA       ! Added 7/32/93 to report own old value
   IF (M .EQ. 2) GOTO 25
   IF (M .GT. 1) GOTO 10
   DO 5 IA = 1,4
     DO 5 IB = 1,2
5  HIST(IA,IB) = 0

   IBURN = 0
   ID(1) = 0
   ID(2) = 0
   IDEF = 0
   ITWIN = 0
   ISTRNG = 0
   ICOOP = 0
   ITRY = 0
   IRDCHK = 0
   IRAND = 0
   IPARTY = 1
   IND = 0
   MY = 0
   INDEF = 5
   IOPP = 0
   PROB = .2
   K75R = 0
   RETURN

10 IF (IRAND .EQ. 1) GOTO 70
   IOPP = IOPP + J
   HIST(IND,J+1) = HIST(IND,J+1) + 1
   IF (M .EQ. 15 .OR. MOD(M,15) .NE. 0 .OR. IRAND .EQ. 2) GOTO 25
   IF (HIST(1,1) / (M - 2) .GE. .8) GOTO 25
   IF (IOPP * 4 .LT. M - 2 .OR. IOPP * 4 .GT. 3 * M - 6) GOTO 25
   DO 12 IA = 1,4
12 ROW(IA) = HIST(IA,1) + HIST(IA,2)

   DO 14 IB = 1,2
     SUM = .0
     DO 13 IA = 1,4
13   SUM = SUM + HIST(IA,IB)
14 COL(IB) = SUM

   SUM = .0
   DO 16 IA = 1,4
     DO 16 IB = 1,2
       EX = ROW(IA) * COL(IB) / (M - 2)
       IF (EX .LE. 1.) GOTO 16
       SUM = SUM + ((HIST(IA,IB) - EX) ** 2) / EX
16 CONTINUE

   IF (SUM .GT. 3) GOTO 25
   IRAND = 1
   K75R = 1
   RETURN

25 IF (ITRY .EQ. 1 .AND. J .EQ. 1) IBURN = 1
   IF (M .LE. 37 .AND. J .EQ. 0) ITWIN = ITWIN + 1
   IF (M .EQ. 38 .AND. J .EQ. 1) ITWIN = ITWIN + 1
   IF (M .GE. 39 .AND. ITWIN .EQ. 37 .AND. J .EQ. 1) ITWIN = 0
   IF (ITWIN .EQ. 37) GOTO 80
   IDEF = IDEF * J + J
   IF (IDEF .GE. 20) GOTO 90
   IPARTY = 3 - IPARTY
   ID(IPARTY) = ID(IPARTY) * J + J
   IF (ID(IPARTY) .GE. INDEF) GOTO 78
   IF (ICOOP .GE. 1) GOTO 80
   IF (M .LT. 37 .OR. IBURN .EQ. 1) GOTO 34
   IF (M .EQ. 37) GOTO 32
   IF (R .GT. PROB) GOTO 34
32 ITRY = 2
   ICOOP = 2
   PROB = PROB + .05
   GOTO 92

34 IF (J .EQ. 0) GOTO 80
   GOTO 90

70 IRDCHK = IRDCHK + J * 4 - 3
   IF (IRDCHK .GE. 11) GOTO 75
   K75R = 1
   RETURN

75 IRAND = 2
   ICOOP = 2
   K75R = 0
   RETURN

78 ID(IPARTY) = 0
   ISTRNG = ISTRNG + 1
   IF (ISTRNG .EQ. 8) INDEF = 3
80 K75R = 0
   ITRY = ITRY - 1
   ICOOP = ICOOP - 1
   GOTO 95

90 ID(IPARTY) = ID(IPARTY) + 1
92 K75R = 1
95 IND = 2 * MY + J + 1
   MY = K75R
   RETURN
   END

There are a lot of issues here that go well beyond the GOTO statement; I honestly think the GOTO statement was a bit of a scapegoat. But the control flow is absolutely not clear, and code is mixed together in ways that make it very unclear what’s going on. Even without adding comments or using better variable names, changing this to a block structure without GOTOs would make it much easier to read and follow.
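
As a small illustration of that last point, here is how the fragment at labels 70 and 75 might look in block form, sketched in C rather than FORTRAN. The struct and function names are invented, and the sketch covers only those eight lines, not the rest of K75R.

```c
/* The three variables the fragment touches. The original keeps them as
 * locals and relies on them holding their values between calls. */
struct k75r_state {
    int irdchk;
    int irand;
    int icoop;
};

/* Block-structured version of labels 70 and 75. j is the J argument of
 * K75R; the return value is what the original assigns to K75R. */
int random_check(struct k75r_state *s, int j)
{
    s->irdchk += j * 4 - 3;             /* label 70 */
    if (s->irdchk >= 11) {
        s->irand = 2;                   /* label 75 */
        s->icoop = 2;
        return 0;
    }
    return 1;
}
```

Even with the cryptic names kept, it is now visible at a glance that there are two outcomes and which assignments belong to which.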

One of the principles of maintainable programming is encapsulation. The point is that you interface with a module/routine/subroutine/component/object by using a defined interface, and only that interface, and the results will be predictable (assuming that the unit was effectively tested).

Within a single unit of code, the same principle applies. If you consistently apply structured programming or object-oriented programming principles, you will not:

create unexpected paths through the code
arrive in sections of code with undefined or impermissible variable values
exit a path of the code with undefined or impermissible variable values
fail to complete a transaction unit
leave portions of code or data in memory that are effectively dead, but ineligible for garbage cleanup and reallocation, because you have not signalled their release using an explicit construct that invokes releasing them as the code control path exits the unit

Some of the more common symptoms of these processing errors include memory leaks, memory hoarding, pointer overflows, crashes, incomplete data records, record add / chg / delete exceptions, memory page faults, and so on.

The user-observable manifestations of these problems include user interface lock-up, gradually decreasing performance, incomplete data records, inability to initiate or complete transactions, corrupted data, network outages, power outages, embedded systems failures (from the loss of control of missiles to the loss of tracking and control ability in air traffic control systems), timer failures, and so on and so on. The catalog is very extensive.

goto is dangerous because it is usually used where it is not needed. Using anything that is not needed is dangerous, but goto especially. If you google, you will find lots of errors caused by goto. That is not per se a reason not to use it (bugs always happen when you use language features, because that is intrinsic to programming), but some of them are clearly closely related to goto usage.

Reasons to use/not use goto:

If you need a loop, use while or for.
If you need a conditional jump, use if/then/else.
If you need a procedure, call a function/method.
If you need to quit a function, just return.

I can count on my fingers places where I’ve seen goto used and used properly.

CPython
libKTX
probably few more

In libKTX there is a function that has the following code

if(something)
    goto cleanup;

if(bla)
    goto cleanup;

cleanup:
    free(array); // C has no delete[]; assumes array was allocated with malloc

Now in this place goto is useful because the language is C:

we are inside a function
we cannot easily write a cleanup function (because we would enter another scope, and making the caller’s state accessible there is an extra burden)

This use case is valid because in C we do not have classes, so the simplest way to clean up is using a goto.

If we had the same code in C++, there would be no need for goto anymore:

class MyClass{
    public:

    void Function(){
        if(something)
            return Cleanup(); // invalid syntax in C#, but valid in C++
        if(bla)
            return Cleanup(); // invalid syntax in C#, but valid in C++
        Cleanup(); // the goto version also cleans up on the normal path
    }

    // accesses the same members, no need to pass state (the compiler does it for us).
    void Cleanup(){

    }
};

What kind of bugs can it lead to? Anything: infinite loops, wrong order of execution, a corrupted stack...

A documented case is a vulnerability in SSL that allowed man-in-the-middle attacks, caused by a wrong use of goto: here’s the article

That was a typo, but it went unnoticed for a while. If the code had been structured in another way and tested properly, such an error might not have been possible.
