What is so bad with goto when it’s used for these obvious and relevant cases?

I have always known that goto is something bad, locked away in a basement somewhere never to be seen again, but today I ran into a code example where using goto makes perfect sense.

I have an IP address, and I need to check whether it is in a list of IPs and then proceed with the code; otherwise I throw an exception.

<?php

$ip = '192.168.1.5';
$ips = [
    '192.168.1.3',
    '192.168.1.4',
    '192.168.1.5',
];

foreach ($ips as $i) {
    if ($ip === $i) {
        goto allowed;
    }
}

throw new Exception('Not allowed');

allowed:

...

If I don’t use goto then I have to use some variable like

$allowed = false;

foreach ($ips as $i) {
    if ($ip === $i) {
        $allowed = true;
        break;
    }
}

if (!$allowed) {
    throw new Exception('Not allowed');
}

My question is what’s so bad with goto when it’s used for such obvious and imo relevant cases?

1

GOTO itself is not an immediate problem, it’s the implicit state machines that people tend to implement with it. In your case, you want code that checks whether the IP address is in the list of allowed addresses, hence

if (!contains($ips, $ip)) throw new Exception('Not allowed');

so your code wants to check a condition. The algorithm to implement this check should be of no concern here, in the mental space of your main program the check is atomic. That’s how it should be.

But if you put the code that does the check into your main program, you lose that. You introduce mutable state, either explicitly:

$list_contains_ip = null;         # STATE: we don't know yet

foreach ($ips as $i) {
  if ($ip === $i) {
      $list_contains_ip = true;   # STATE: positive
      break;
  }
                                  # STATE: we still don't know yet, huh?
                                  # Well, then...
  $list_contains_ip = false;      # STATE: negative
}

if (!$list_contains_ip) {
  throw new Exception('Not allowed');
}

where $list_contains_ip is your only state variable, or implicitly:

                             # STATE: unknown
foreach ($ips as $i) {       # What are we checking here anyway?
  if ($ip === $i) {
    goto allowed;            # STATE: positive
  }
                             # STATE: unknown
}
                             # guess this means STATE: negative
throw new Exception('Not allowed');

allowed:                     # Guess we jumped over the trap door

As you see, there’s an undeclared state variable in the GOTO construct. That’s not a problem per se, but these state variables are like pebbles: carrying one is not hard, carrying a bag full of them will make you sweat. Your code will not stay the same: next month you’ll be asked to differentiate between private and public addresses. The month after that, your code will need to support IP ranges. Next year, someone will ask you to support IPv6 addresses. In no time, your code will look like this:

if (preg_match('/:/', $ip)) goto IP_V6;
if (preg_match('#/#', $ip)) goto IP_RANGE;
if (preg_match('/^10\./', $ip)) goto IP_IS_PRIVATE;

foreach ($ips as $i) { ... }

IP_IS_PRIVATE:
   foreach ($ip_priv as $i) { ... }

IP_V6:
   foreach ($ipv6 as $i) { ... }

IP_RANGE:
   # i don't even want to know how you'd implement that

ALLOWED:
   # Wait, is this code even correct?
   # There seems to be a bug in here.

And whoever has to debug that code will curse you and your children.

Dijkstra puts it like this:

The unbridled use of the go to statement has as an immediate consequence that it becomes terribly hard to find a meaningful set of coordinates in which to describe the process progress.

And that’s why GOTO is considered harmful.

8

There are some legitimate use cases for GOTO. For example for error handling and cleanup in C or for implementing some forms of state machines. But this is not one of these cases. The second example is more readable IMHO, but even more readable would be to extract the loop to a separate function and then return when you find a match. Even better would be (in pseudocode, I don’t know exact syntax):

if (!in_array($ip, $ips)) throw new Exception('Not allowed');
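The "extract the loop into a separate function and return on a match" idea can be sketched as follows. This is only an illustration in C, the other language used on this page; the name contains and its parameters are made up for the example, and the addresses are compared as plain strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if needle equals one of the n strings in list.
   The early return replaces both the goto and the $allowed flag variable. */
bool contains(const char *const list[], size_t n, const char *needle)
{
  for (size_t i = 0; i < n; i++) {
    if (strcmp(list[i], needle) == 0)
      return true;   /* match found: leave the loop and the function at once */
  }
  return false;      /* the loop finished without a match */
}
```

The caller then reads as a single check, for example if (!contains(ips, 3, ip)) followed by the error path, with no state left over from the search.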

So what is so bad about GOTOs? Structured programming uses functions and control structures to organize the code so that the syntactic structure reflects the logical structure. If something is only conditionally executed, it will appear in a conditional statement block. If something is executed in a loop, it will appear in a loop block. GOTO lets you circumvent the syntactic structure by jumping around arbitrarily, which makes the code much harder to follow.

Of course if you have no other choice you use GOTO, but if the same effect can be achieved with functions and control structures, it is preferable.

21

As others have said, the problem isn’t with the goto itself; the problem is with how people use goto, and how it can make code harder to understand and maintain.

Assume the following snippet of code:

       i = 4;
label: printf( "%d\n", i );

What value gets printed for i? When does it get printed? Until you account for every instance of goto label in your function, you can’t know. The simple presence of that label destroys your ability to debug code by simple inspection. For small functions with one or two branches, not much of a problem. For not-small functions…

Way back in the early ’90s we were given a pile of C code that drove a 3d graphical display and told to make it run faster. It was only about 5000 lines of code, but all of it was in main, and the author used about 15 or so gotos branching in both directions. This was bad code to begin with, but the presence of those gotos made it so much worse. It took my co-worker about 2 weeks to puzzle out the flow of control. Even better, those gotos resulted in code so tightly coupled with itself that we could not make any changes without breaking something.

We tried compiling with level 1 optimization, and the compiler ate up all available RAM, then all available swap, and then panicked the system (which probably had nothing to do with the gotos themselves, but I like throwing that anecdote out there).

In the end, we gave the customer two options – let us rewrite the whole thing from scratch, or buy faster hardware.

They bought faster hardware.

Bode’s rules for using goto:

  1. Branch forward only;
  2. Do not bypass control structures (i.e., do not branch into the body of an if or for or while statement);
  3. Do not use goto in place of a control structure.

There are cases where a goto is the right answer, but they are rare (breaking out of a deeply nested loop is about the only place I’d use it).
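As a rough sketch of the nested-loop case, here is a forward-only goto that leaves two loops at once and stays within the three rules above. The function find_in_grid and its parameters are invented for this illustration, and the grid is assumed to be stored row by row in a flat array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: looks for target in a rows x cols grid stored
   row-major in a flat array. On success the position is written to
   *out_row and *out_col. */
bool find_in_grid(const int *grid, size_t rows, size_t cols, int target,
                  size_t *out_row, size_t *out_col)
{
  size_t r, c;

  for (r = 0; r < rows; r++) {
    for (c = 0; c < cols; c++) {
      if (grid[r * cols + c] == target)
        goto found;          /* forward jump out of both loops */
    }
  }
  return false;              /* every cell was checked without a match */

found:
  *out_row = r;
  *out_col = c;
  return true;
}
```

In a function this small an early return inside the inner loop would work just as well; the goto only earns its place when the code after the loops has to stay in the same function.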

EDIT

Expanding on that last statement, here’s one of the few valid use cases for goto. Assume we have the following function:

T ***myalloc( size_t N, size_t M, size_t P )
{
  size_t i, j, k;

  T ***arr = malloc( sizeof *arr * N );
  for ( i = 0; i < N; i ++ )
  {
    arr[i] = malloc( sizeof *arr[i] * M );
    for ( j = 0; j < M; j++ )
    {
      arr[i][j] = malloc( sizeof *arr[i][j] * P );
      for ( k = 0; k < P; k++ )
        arr[i][j][k] = initial_value();
    }
  }
  return arr;
}

Now, we have a problem – what if one of the malloc calls fails midway through? Unlikely an event as that may be, we don’t want to return a partially allocated array, nor do we want to just bail out of the function with an error; we want to clean up after ourselves and deallocate any partially allocated memory. In a language that throws an exception on a bad alloc, that’s fairly straightforward – you just write an exception handler to free up what’s already been allocated.

In C, you don’t have structured exception handling; you have to check the return value of each malloc call and take the appropriate action.

T ***myalloc( size_t N, size_t M, size_t P )
{
  size_t i, j, k;

  T ***arr = malloc( sizeof *arr * N );
  if ( arr )
  {
    for ( i = 0; i < N; i ++ )
    {
      if ( !(arr[i] = malloc( sizeof *arr[i] * M )) )
        goto cleanup_1;

      for ( j = 0; j < M; j++ )
      {
        if ( !(arr[i][j] = malloc( sizeof *arr[i][j] * P )) )
          goto cleanup_2;

        for ( k = 0; k < P; k++ )
          arr[i][j][k] = initial_value();
      }
    }
  }
  goto done;

  cleanup_2:
    // We failed while allocating arr[i][j]; clean up the previously allocated arr[i][j]
    while ( j-- )
      free( arr[i][j] );
    free( arr[i] );
    // fall through

  cleanup_1:
    // We failed while allocating arr[i]; free up all previously allocated arr[i][j]
    while ( i-- )
    {
      for ( j = 0; j < M; j++ )
        free( arr[i][j] );
      free( arr[i] );
    }

    free( arr );
    arr = NULL;

  done:
    return arr;
}

Can we do this without using goto? Of course we can – it just requires a little extra bookkeeping (and, in practice, that’s the path I’d take). But, if you’re looking for places where using a goto isn’t immediately a sign of bad practice or design, this is one of the few.
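For comparison, a goto-free version with the extra bookkeeping might look roughly like this. It is a sketch under assumptions, not the only way to do it: T and initial_value are stand-ins defined here just so the code compiles, the names myalloc_nogoto and myfree are made up, N, M and P are assumed to be nonzero, and the cleanup relies on calloc's zeroed memory reading back as null pointers (true on mainstream platforms, though not strictly promised by the C standard).

```c
#include <stdlib.h>

typedef int T;                               /* assumption: stand-in element type */
static T initial_value(void) { return 0; }   /* assumption: stand-in initializer */

/* Frees a fully or partially built N x M x P array. Slots that were never
   allocated are still NULL from calloc, and free(NULL) does nothing. */
void myfree(T ***arr, size_t N, size_t M)
{
  if (!arr)
    return;
  for (size_t i = 0; i < N; i++) {
    if (arr[i]) {
      for (size_t j = 0; j < M; j++)
        free(arr[i][j]);
      free(arr[i]);
    }
  }
  free(arr);
}

T ***myalloc_nogoto(size_t N, size_t M, size_t P)
{
  T ***arr = calloc(N, sizeof *arr);
  int ok = (arr != NULL);      /* the bookkeeping: one flag checked by every loop */

  for (size_t i = 0; ok && i < N; i++) {
    arr[i] = calloc(M, sizeof *arr[i]);
    ok = (arr[i] != NULL);
    for (size_t j = 0; ok && j < M; j++) {
      arr[i][j] = malloc(sizeof *arr[i][j] * P);
      ok = (arr[i][j] != NULL);
      for (size_t k = 0; ok && k < P; k++)
        arr[i][j][k] = initial_value();
    }
  }

  if (!ok) {                   /* some allocation failed: release what exists */
    myfree(arr, N, M);
    arr = NULL;
  }
  return arr;
}
```

The price is an explicit state variable that every loop condition has to test, which is the trade-off the goto version avoids.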

6

return, break, continue and throw/catch are all essentially gotos: they all transfer control to another piece of code, and all of them could be implemented with gotos. In fact I did so once in a school project; a Pascal instructor was saying how much better Pascal was than BASIC because of its structure, so I had to be contrary…

The most important thing about Software Engineering (I’m going to use this term over Coding to refer to a situation where you are being paid by someone to create a codebase together with other engineers that requires ongoing improvement and maintenance) is making code Readable–getting it to do something is almost secondary. Your code will be written only once but, in most cases, people will spend days and weeks revisiting/relearning, improving and fixing it–and every time they (or you) will have to start from scratch and try to remember/figure out your code.

Most of the features that have been added to languages over the years are to make software more maintainable, not easier to write (although some languages go in that direction–they often cause long-term problems…).

Compared to similar flow control statements, GOTOs can be nearly as easy to follow at their best (A single goto used in a case like you suggest), and a nightmare when abused–and are very easily abused…

So after dealing with spaghetti nightmares for a few years we just said “No”, as a community we are not going to accept this–too many people mess it up if given a little leeway–that’s really the only problem with them. You could use them… but even if it’s the perfect case, the next guy will assume you are a terrible programmer because you don’t understand the history of the community.

Many other structures have been developed just to make your code more comprehensible: Functions, Objects, Scoping, Encapsulation, Comments(!)… as well as the more important patterns/processes like “DRY” (preventing duplication) and “YAGNI” (reducing over-generalization/complication of code). All of these are really only important for the NEXT guy who reads your code (who will probably be you, after you’ve forgotten most of what you did in the first place!).

6

GOTO is a tool. It can be used for good or for evil.

In the bad old days, with FORTRAN and BASIC, it was almost the only tool.

When looking at code from those days, when you see a GOTO you have to figure out why it is there. It can be part of a standard idiom that you can understand quickly… or it can be part of some nightmarish control structure that should never have been. You don’t know until you have looked, and it is easy to be mistaken.

People wanted something better, and more advanced control structures were invented. These covered most of the use cases, and people who had been burned by bad GOTOs wanted to ban them completely.

Ironically, GOTO isn’t so bad when it is rare. When you see one, you know there is something special going on, and it is easy to find the corresponding label since it is the only label nearby.

Fast forward to today. You are a lecturer teaching programming. You could say “In most cases you should use the advanced new constructs, but in some cases a simple GOTO can be more readable.” Students are not going to understand that. They are going to abuse GOTO to make unreadable code.

Instead you say “GOTO bad. GOTO evil. GOTO fail exam.” Students will understand that!

1

With the exception of goto, all flow constructs in PHP (and most languages) are scoped hierarchically.

Imagine some code examined through squinted eyes:

a;
foo {
    b;
}
c;

Regardless of what control construct foo is (if, while, etc.), there are only certain allowed orders for a, b, and c.

You could have abc, or ac, or even abbbc. But you could never have bc or abac.

…unless you have goto.

$a = 1;
first:
echo 'a';
if ($a === 1) {
    echo 'b';
    $a = 2;
    goto first;
}
echo 'c'; 

goto (in particular backwards goto) can be troublesome enough that it’s best to just leave it alone, and use hierarchical, blocked flow constructs.

gotos have a place, but mostly as micro-optimizations in low-level languages. IMO, there’s no good place for it in PHP.


FYI, the example code can be written even better than either of your suggestions.

if(!in_array($ip, $ips, true)) {
    throw new Exception('Not allowed');
}

1

In low-level languages GOTO is inevitable. But in high-level languages it should be avoided (where the language supports it at all) because it makes programs more difficult to read.

Everything boils down to making the code more difficult to read. High-level languages are supposed to make code easier to read than low-level languages like, say, assembler or C.

GOTO doesn’t cause global warming, nor does it cause poverty in the third world. It just makes code more difficult to read.

Most modern languages have control structures that make GOTO unnecessary. Some, like Java, don’t even have it.

In fact, the term spaghetti code comes from convoluted, difficult-to-follow code caused by unstructured branching.

7

Nothing wrong with goto statements themselves. The wrongs are with some of the people that inappropriately use the statement.

In addition to what JacquesB said (error handling in C), you are using goto to exit a non-nested loop, something that you can do with break. In this case you had better use break.

But if you had a nested loop scenario, then using goto would be more elegant/simpler.

Bonus point: if your list of IPs is small, your method is fine. But if the list grows, know that your approach has an asymptotic worst-case run-time complexity of O(n). As your list grows, you may wish to use a different method that achieves O(log n) (such as a tree structure) or O(1) (a hash table with no collisions).
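One way to get the logarithmic lookup mentioned above is a binary search over a sorted array instead of a tree. The sketch below assumes, purely for illustration, that the IPv4 addresses are stored as 32-bit integers sorted in ascending order; ip_allowed and cmp_u32 are made-up names, and bsearch is the standard C library routine, which in practice is implemented as a binary search.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback for bsearch: orders two uint32_t values ascending. */
static int cmp_u32(const void *a, const void *b)
{
  uint32_t x = *(const uint32_t *)a;
  uint32_t y = *(const uint32_t *)b;
  return (x > y) - (x < y);
}

/* Hypothetical helper: sorted_ips must already be sorted in ascending order,
   otherwise the result of the search is meaningless. */
bool ip_allowed(const uint32_t *sorted_ips, size_t n, uint32_t ip)
{
  return bsearch(&ip, sorted_ips, n, sizeof *sorted_ips, cmp_u32) != NULL;
}
```

For example, 192.168.1.5 would be stored as 0xC0A80105. For a three-entry list like the one in the question, the linear scan is just as good.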

7

With goto I can write faster code!

True. Don’t care.

Goto exists in assembly! They just call it jmp.

True. Don’t care.

Goto solves problems more simply.

True. Don’t care.

In the hands of a disciplined developer code that uses goto can be easier to read.

True. However, I’ve been that disciplined coder. I’ve seen what happens to code over time. Goto starts out fine. Then the urge to reuse code sets in. Fairly soon I find myself at a breakpoint having no damn clue what’s going on even after looking at program state. Goto makes it hard to reason about code. We’ve worked really hard creating while, do while, for, for each, switch, subroutines, functions, and more, all because doing this stuff with if and goto is hard on the brain.

So no. We don’t want to look at goto. Sure it’s alive and well in the binary but we don’t need to see that in source. In fact, if is starting to look a little shaky.

15

Assembly languages typically have only conditional/unconditional jumps (the equivalent of GOTO). Older implementations of FORTRAN and BASIC had no control block statements beyond a counted iteration (the DO loop), leaving all other control flow to IFs and GOTOs. The DO loop in these languages was terminated by a numerically labeled statement. As a result, code written for these languages could be, and often was, hard to follow and prone to mistakes.

To underscore the point, there is the facetiously invented “COME FROM” statement.

There is practically no need to use GOTO in languages like C, C++, C#, PASCAL, Java, etc.; alternative constructions can be used which will almost certainly be just as efficient and far more maintainable. It’s true that one GOTO in a source file will not be a problem. The problem is that it doesn’t take many to make a unit of code difficult to follow and error-prone to maintain. That’s why the accepted wisdom is to avoid GOTO whenever possible.

This Wikipedia article on the goto statement might be helpful.
