Immutable String and Integer in Java: What is the point if assignment in effect changes the value?

If immutability is “good” and yet you can in effect change the value in an Integer or String variable (never mind that you get a new reference — the value has changed) what good is it that Integer and String are immutable?

If Integer were mutable, what sort of bugs would be harder to find (etc.) than in the case that Integer is immutable? Or with String?


“never mind that you get a new reference”

No! Do mind that fact – it is the key to understanding the point of immutable objects.

“the value has changed”

No, it hasn’t. You have a different object with a different value in this place in the code.

But any other part of the code which had a reference to the original object still has that reference to that object with the original value.

Immutability is good because it prevents you from making a change to an object and having that change affect a completely different part of the code, one that wasn’t written with the possibility in mind that the objects it operates on could be changed elsewhere (very little code is really written to cope with that).

This is particularly useful with multithreaded code (where a change done by a different thread could happen in between the operations of a single line of code), but even single-threaded code is much easier to understand when methods you call can’t change the objects you pass into them.
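A minimal Java sketch of that point, contrasting String with StringBuilder (used here only as a familiar example of a mutable object; the class and method names are made up for illustration). Each method returns what a second variable, standing in for “another part of the code”, sees after the first variable is updated:

```java
public class AliasingDemo {
    // Immutable case: updating 'original' only re-points that one variable.
    static String immutableAliasView() {
        String original = "hello";
        String alias = original;        // both variables refer to the same String object
        original = original + " world"; // builds a NEW String; 'alias' still refers to the old one
        return alias;
    }

    // Mutable case: the update happens inside the shared object itself.
    static String mutableAliasView() {
        StringBuilder original = new StringBuilder("hello");
        StringBuilder alias = original; // both variables refer to the same StringBuilder
        original.append(" world");      // mutates the shared object in place
        return alias.toString();
    }

    public static void main(String[] args) {
        System.out.println(immutableAliasView()); // hello
        System.out.println(mutableAliasView());   // hello world
    }
}
```

The only difference between the two methods is whether the update creates a new object or modifies the shared one; that is exactly the difference the answer asks you to “mind”.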


I think an example can clarify things here a lot (I use C#, but the actual differences from Java are of minor importance for this question). Let us think of a function which does some string formatting:

string MyStringFormat(string s)
{ 
    s=s.Trim();
    s=s.Replace(",", ".");
    return s;
}

Now, some calling code uses this function:

string test = "  123,45";
Console.WriteLine(MyStringFormat(test));
Console.WriteLine(test);

This will give the output

123.45
  123,45

So MyStringFormat changes neither the content of the variable test nor the string test refers to, although it replaces the local reference s with a reference to a new string that has a different value. MyStringFormat cannot change anything here, because string objects are immutable. Even if the code is refactored, maintained and evolved, as long as no one changes the signature of that function, it will never be able to change test: not intentionally, not accidentally, and not by calling some other function from a 3rd-party lib outside the control of the dev who maintains MyStringFormat.
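Since the question is about Java, here is a rough Java port of the C# example (the method name myStringFormat is just the C# name adapted to Java conventions). String.trim() and String.replace() both return new String objects, so the caller’s test is untouched:

```java
public class FormatDemo {
    // Java port of MyStringFormat: each call returns a NEW String and
    // only rebinds the local parameter s; the caller's String is never touched.
    static String myStringFormat(String s) {
        s = s.trim();            // new String without leading/trailing whitespace
        s = s.replace(",", "."); // another new String with ',' replaced by '.'
        return s;
    }

    public static void main(String[] args) {
        String test = "  123,45";
        System.out.println(myStringFormat(test)); // 123.45
        System.out.println(test);                 //   123,45 (unchanged)
    }
}
```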

That would be pretty different if a string object provided mutating operations, like a mutating Trim or Replace. If a string type provided such methods, callers could never rely on the called function not to change the value, and without additional safeguards, that would open the door to all kinds of nasty bugs. Imagine, for example, that there were mutating operations for strings; let’s call them TrimInplace and ReplaceInplace. Now some “clever” dev thinks “hey, in-place operations are probably more memory efficient, so let’s optimize the code a bit by using the mutating variants”:

string MyStringFormat(string s)
{ 
    s.TrimInplace();
    s.ReplaceInplace(",", ".");
    return s;
}

And now assume this code is called from a UI layer where some string is taken from a text box and passed through 10 layers of intermediate calls, through a lot of UI logic, controller logic and business logic, until it ends up in the function above. I guess you can imagine what will happen to the program, and how much debugging effort such a change may cause when the UI code suddenly stops working as intended.

To be sure not to get any unwanted side effects, most calls to MyStringFormat would have to make a defensive copy of the original string, for example like this:

  Console.WriteLine(MyStringFormat((string)test.Clone()));

This costs extra memory and extra CPU cycles, and it is error-prone, since the caller can easily forget that this kind of copy is required.

With immutability, however, the caller can rely 100% on the called functions not to mess around with the values of the passed arguments, without any such copy. Without immutability, in a real-world application with several layers of abstraction, such defensive copies would have to be made on most of the intermediate layers to get the same level of confidence.
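Java has no mutable String, but StringBuilder can stand in for the hypothetical mutable string in a sketch. The in-place method below is a hypothetical analogue of TrimInplace/ReplaceInplace, not a real API; it shows that the caller must remember to pass a copy (new StringBuilder(test)) to protect its own value:

```java
public class DefensiveCopyDemo {
    // Hypothetical in-place formatter; StringBuilder plays the "mutable string".
    static StringBuilder myStringFormatInPlace(StringBuilder s) {
        // trim leading and trailing whitespace in place
        while (s.length() > 0 && Character.isWhitespace(s.charAt(0))) {
            s.deleteCharAt(0);
        }
        while (s.length() > 0 && Character.isWhitespace(s.charAt(s.length() - 1))) {
            s.deleteCharAt(s.length() - 1);
        }
        // replace every ',' with '.' in place
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ',') {
                s.setCharAt(i, '.');
            }
        }
        return s;
    }

    public static void main(String[] args) {
        StringBuilder test = new StringBuilder("  123,45");
        // Passing 'test' directly would overwrite the caller's value,
        // so the caller has to pay for (and remember) a defensive copy:
        System.out.println(myStringFormatInPlace(new StringBuilder(test))); // 123.45
        System.out.println(test);                                           //   123,45
    }
}
```

With String, the copy is unnecessary because no method on it can modify the caller’s object in the first place.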


Mentioned or hinted at by the other answers, but not yet written out clearly enough for my liking, is that making String, along with Integer and friends, immutable in Java makes them act like primitive values.

For int and other primitive types, there is no ambiguity or complication in Java for the following code:

int x = 1;
f(x);
g(x);

This code will always pass 1 to g(), no matter what f() does with its variable x.

By making Integer immutable, the same thing is guaranteed even if we define x as Integer x and f() takes a parameter of type Integer. Thus, Integer and int variables and values work in approximately the same way.
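A small sketch of that guarantee with Integer instead of int (f and g are the hypothetical methods from the snippet above, given illustrative bodies here; lastSeenByG is a field added only so the result can be observed):

```java
public class BoxedLikePrimitive {
    static int lastSeenByG;

    // f tries to "change" its argument, but reassignment only rebinds f's own parameter
    static void f(Integer x) {
        x = x + 41; // unboxes, adds, and boxes a NEW Integer; the caller's variable is untouched
    }

    // g records the value it receives
    static void g(Integer x) {
        lastSeenByG = x;
    }

    public static void main(String[] args) {
        Integer x = 1;
        f(x);
        g(x);
        System.out.println(lastSeenByG); // 1, exactly as with int
    }
}
```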

This was a design decision; of course there are uses for something like a mutable Integer, and in theory they could have made Integer that kind of object. But the decision to make Integer and int act the same is a good one, because in Java, Integers are supposed to substitute for ints wherever they also need to be Objects. Integer exists for those compatibility situations, which do come up in real code.

If for some (wacky?) reason I want to write code like the following:

Integer x = 1;
int y = 2;
f(new Object[] { x, "and", y });

Once again, neither x nor y can be changed by f(), so they act the same. (If Integers were mutable, x could be changed, but y could not.)
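To see what a mutable Integer would change, the sketch below swaps java.util.concurrent.atomic.AtomicInteger in for y. AtomicInteger is a real JDK class, but it is used here only as a stand-in for a “mutable Integer”; it is designed for concurrent counters, not as a general replacement for Integer. The body of f is hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MutableBoxDemo {
    // f receives the array: it can overwrite array slots and mutate mutable elements,
    // but it has no way to reach the caller's variables themselves.
    static void f(Object[] items) {
        items[0] = 99; // replaces the slot only; the caller's Integer x still refers to 1
        if (items[2] instanceof AtomicInteger) {
            ((AtomicInteger) items[2]).set(99); // mutates the shared object itself
        }
    }

    public static void main(String[] args) {
        Integer x = 1;
        AtomicInteger counter = new AtomicInteger(2); // mutable int holder from the JDK
        f(new Object[] { x, "and", counter });
        System.out.println(x);       // 1
        System.out.println(counter); // 99
    }
}
```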

Finally, since Integer and friends exist for compatibility reasons and should act like int values, it’s nice not to have to constantly think about whether you meant to write x = 5 or x.setValue(5); you don’t have that choice to think about for ints, so you don’t have it for Integers either.

Having String be immutable likewise provides you with an immutable String as an option, for all the situations where it’s nice for it to be passed by immutable value.


The issue is shared mutable state. There are two ways to avoid that:

If you share memory then don’t let it mutate. Immutable objects follow this.

If you mutate memory then don’t share it. Primitives, like int, and references follow this. They only copy the value in the memory. They don’t share the memory location. They keep that private.

Why?

If two separate pieces of code both have access to the same memory one can change the value without the other knowing about it. This makes it hard for humans to reason about code when reading it. Even when threads aren’t involved.

One way to avoid that is to keep your own defensive copy of the value in your own private memory. That’s how primitives like ints are usually dealt with in languages that pass them by value. The value can be shared because changes only affect your own copy.

Another way is immutable objects that exist in one place in memory, take on one state, and can’t be changed. String is the most popular example. These can be passed by reference and provide access to the same memory. Since the state can’t be changed you don’t have to worry about other code affecting the memory you’re depending on.

The problem can still crop up though. Pointers have this problem if you share them even when pointing at strings (though pointers aren’t a Java thing). Shared mutable collections (even of strings) still have this problem. This very issue is why iterators are invalidated when a collection is mutated.
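A minimal Java sketch of that last point, assuming Java 9+ for List.of: the strings in the list are immutable, yet the list itself is shared and mutable, and ArrayList’s fail-fast iterator (documented as best-effort) throws ConcurrentModificationException once the list is structurally modified during iteration. The class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class SharedListDemo {
    // Returns true if mutating the list through another reference invalidated the iterator.
    static boolean mutateWhileIterating() {
        List<String> names = new ArrayList<>(List.of("alice", "bob"));
        List<String> sharedView = names; // another part of the code holds the same list
        try {
            for (String name : names) {
                sharedView.add(name.toUpperCase()); // structural change via the other reference
            }
        } catch (ConcurrentModificationException e) {
            return true; // the iterator noticed the list changed underneath it
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(mutateWhileIterating()); // true
    }
}
```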

Anything that has state, is shared, and is mutable shouldn’t be expected to stay the same as it was the last time you touched it if other code that knows about it has been run. A situation best avoided when you have better things to think about.

Now sure, you can use assignment to change which immutable object a string variable points to. But that only changes the reference that the variable holds. That reference sits in your own private memory. It works the same way an int does. It’s state. It’s mutable. But it’s not shared. So it’s fine.


int is a value type.  It is mutable.  But as a value type it is shared between caller and callee by copy (rather than by reference).  Thus, a callee can change the value (of their copy) without affecting the caller’s value of same.

Integer (the boxed int) and String are reference types, which means that the target object is shared between caller and callee. There are two concepts with reference types: the variable that holds a reference, and the object. Because a string object is immutable, a caller will not see any changes made via the callee’s variable: neither to the reference variable (which the callee can modify, but which is only a copy of the reference) nor to the string object (which cannot be modified).

Other objects are also reference types, but not necessarily immutable.  When that’s the case, and a caller shares the object with a callee, the callee can change the object itself and that change will be visible to the caller (should they look or care).  Should the callee change the reference variable only (e.g. to null or another object), such change is not visible to the caller.
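The two cases can be sketched in Java with a List as the mutable object (mutate and reassign are illustrative names, not from the answer):

```java
import java.util.ArrayList;
import java.util.List;

public class CallerCalleeDemo {
    // Changes the object the caller shared: the caller sees it.
    static void mutate(List<String> items) {
        items.add("added by callee");
    }

    // Changes only the callee's copy of the reference: the caller never sees it.
    static void reassign(List<String> items) {
        items = new ArrayList<>();
        items.add("never seen by caller");
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        mutate(list);
        reassign(list);
        System.out.println(list); // [added by callee]
    }
}
```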

Beyond callers and callees, object references can be held in other objects via instance variables, or in class static variables, with similar consequences of the visibility of mutation and sharing.


First, there is a difference between a variable and a referenced object.

A variable is a slot in a scope for a piece of information. If that slot can never be changed you would have a constant, useful but limiting if everything were constant.

What makes that slot interesting is that the specific information in it can be changed, which allows a function/program to change its behaviour.

Now the problem with slots is that they have to have a predetermined size when the scope they are in is created. That might be determined at compile time, or when the scope is created. But once created the size cannot be made bigger.

This is an issue for interesting types like strings, because we don’t want to allocate 2GB of space for a single-character string. We would like to make an 80-character string longer, say 85 characters, but if we weren’t prescient enough to allocate extra space, we cannot store it.

The heap to the rescue. The heap allows us to create slots of any size we desire at runtime. We can allocate exactly the space needed for an 8-character string or an 80-character string. The new problem is: where in the heap does this string live?

References to the rescue. The interesting thing about the heap is that every address is the same length. That means we can create a slot of known length in a known location (a variable) and assign an address to it (a reference).


Data comes in many flavours:

  • Data that is part of the compiled code instructions
  • Data in a register
  • Data in a global segment
  • Data on the stack
  • Data on the heap

Each of these is subtly different and yet the same; they are, after all, all data.

The problem though is that some of these data sources have more unknowns and unpredictable behaviour than other sources.

  • Data in instructions is fully known. The program literally was compiled with it in mind.
  • Data in a register is mostly known: the specific value might not be, but where it is, is exactly known. With the exception of some special registers, registers cannot be changed by external influence, only by the code being run.
  • Data in a global static segment is like a register in that where it is, and what type it is, is known. The problem, though, is that its value might suddenly change due to external influence. The same holds for thread-local segments in the presence of strands.
  • Data on the stack is like a register in that only your own code changes it. The problem, though, is that its exact location has to be calculated on the fly.
  • The heap is just the worst. You have to keep track of what and where, and to top it off, that can change suddenly due to external influence.

The strength of immutability is that it takes some of the unknowns found in the heap and reduces them to constants. Because the data is known not to change, compiled code can cache it with certainty, some operations can be faster because the answer doesn’t change between runs, and it is also more memory efficient: one copy, many readers.

Without immutability we could not guarantee that an answer is still current.

  • How long is the string? When last checked it was 5, but now, well, it could be anything. Need to check again.
  • Need to copy the string? Great, start by copying each character. Done now, right? Well, what if someone changed the first character before you finished copying the last character? Did you in fact copy the string?
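The first bullet can be sketched in Java, with StringBuilder standing in for a mutable string and a second variable standing in for “someone else” (the class and method names are illustrative):

```java
public class StaleAnswerDemo {
    // Cache a String's length, let "other code" run, then check whether the cache is still right.
    static boolean stringLengthStillValid() {
        String s = "hello";
        int cached = s.length();   // 5
        String other = s;
        other = other + ", world"; // other code can only build a NEW String
        return cached == s.length();
    }

    // Same experiment with a shared mutable object.
    static boolean builderLengthStillValid() {
        StringBuilder sb = new StringBuilder("hello");
        int cached = sb.length();  // 5 ... for now
        StringBuilder other = sb;  // other code shares the same object
        other.append(", world");   // ...and changes it behind our back (length is now 12)
        return cached == sb.length();
    }

    public static void main(String[] args) {
        System.out.println(stringLengthStillValid());  // true
        System.out.println(builderLengthStillValid()); // false
    }
}
```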


Look at a language like Objective-C, which has both immutable and mutable strings; that makes it a lot clearer.

When you assign a string to a variable, you actually assign a reference to a string object. If the string is immutable, printing the variable for example will only ever print different characters if the variable is changed to be a reference to a different string. If the string is mutable, then someone could modify the characters that the object contains, and printing the variable could print different characters without the variable changing.

If integers were not immutable… would you ever be able to reason about any math expression, like what 2 * 2 or i = i + 1 means?

The following code in an imaginary language tries to show how hard it would be to work with code if that were the case:

Int two = 2;
  
SomeMethodThatCanMutateIntegers();
print("Two by Two is :")
print( two * two); // 9 
print( 2 * 2); // that actually also prints 9...

SomeMethodThatCanMutateIntegers()
{
  2 = 3; // assignment - this "language" allows mutating integers. 
  // now 2 is actually the same value as 3 (which is "3" to start with...
  // unless someone changed 3 to be some other number like 42).
}

Note that this is way more likely to happen in real life if values are interned (the same value is represented by a single object, a technique often used for string constants). For example, Python interns “small” integer values, and if someone gets access to the internal structure holding those values, they can inadvertently (or maliciously) change 2 to 3.
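Java does a version of this sharing for Integer: per the JLS boxing rules, boxing an int in the range -128 to 127 must always produce the same object. A small sketch (names illustrative) showing that two boxed 127s are one shared object, which is only safe because Integer cannot be mutated:

```java
public class IntegerCacheDemo {
    // Box 127 twice and compare references; the JLS guarantees both boxes are the same object.
    static boolean sameCachedObject() {
        Integer a = 127; // javac compiles this autoboxing to Integer.valueOf(127)
        Integer b = 127;
        return a == b;   // reference comparison, not value comparison
    }

    public static void main(String[] args) {
        System.out.println(sameCachedObject()); // true
        // Every piece of code that boxes 127 may receive this same object.
        // That sharing is harmless only because Integer offers no way to change its value.
    }
}
```

Outside that range the guarantee does not apply, so boxed values should still be compared with equals().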

Presumably FORTRAN IV and FORTRAN 77 would flat out allow such fun – https://everything2.com/title/Changing+the+value+of+5+in+FORTRAN

Literal constants are not put into code inline. Instead, they are allocated an area of memory which is then assigned the value. That is, in FORTRAN, ‘5’ behaves like a variable. An expression containing a literal 5 references the memory area allocated for it.
All parameters to subroutines and functions are passed by reference.
