It seems that thread-safety is almost always mentioned as the main benefit of using immutable types, and especially immutable collections.
I have a situation where I would like to make sure that a method will not modify a dictionary of strings (which are immutable in C#). I’d like to constrain things as much as possible.
However, I am not sure whether adding a dependency on a new package (Microsoft Immutable Collections) is worth it. Performance is not a big issue either.
So I guess my question is: are immutable collections strongly advised even when there are no hard performance requirements and no thread-safety issues? Consider that value semantics (as in my example) might or might not be a requirement as well.
Immutability reduces the amount of information you need to track mentally when reading code later. With mutable variables, and especially mutable class members, it’s very hard to know what state they will be in at the specific line you’re reading without running through the code in a debugger. Immutable data is easy to reason about: it will always be the same. If you want to change it, you need to make a new value.
I would honestly prefer making things immutable by default, and then changing them to mutable where it’s proven that they need to be, whether because you need the performance or because an algorithm you have doesn’t make sense for immutability.
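As a rough illustration of that default, here is a minimal C# sketch (Order is a hypothetical type; every field is assigned once, and a “change” produces a new value):

    // A minimal sketch of "immutable by default": all state is set once.
    public sealed class Order
    {
        public string Id { get; }
        public decimal Total { get; }

        public Order(string id, decimal total)
        {
            Id = id;
            Total = total;
        }

        // "Changing" the order means making a new value;
        // the original instance never changes.
        public Order WithTotal(decimal newTotal) => new Order(Id, newTotal);
    }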
Your code should express your intention. If you don’t want an object to be modified once created, make it impossible to modify.
Immutability has several benefits:
- The intention of the original author is expressed better.
  How would you be able to know that, in the following code, modifying the name would cause the application to throw an exception somewhere later?
  public class Product { public string Name { get; set; } ... }
- It’s easier to make sure that the object never appears in an invalid state.
  You have to enforce this in the constructor, and only there. On the other hand, if you have a bunch of setters and methods which modify the object, such checks may become particularly difficult, especially when, for example, two fields have to change at the same time for the object to remain valid.
  For example, an object is valid if the address is not null or the GPS coordinates are not null, but it’s invalid if both the address and the GPS coordinates are specified. Can you imagine the hell of validating this if both the address and the GPS coordinates have a setter, or both are mutable? (A sketch follows this list.)
- Concurrency.
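To make that invariant concrete, here is a minimal sketch, with Location as a hypothetical immutable type: because the object is immutable, the rule is checked once in the constructor and can never be broken afterwards.

    using System;

    public sealed class Location
    {
        public string Address { get; }
        public (double Lat, double Lon)? Gps { get; }

        public Location(string address, (double Lat, double Lon)? gps)
        {
            // Valid only if exactly one of the two is specified.
            if ((address == null) == (gps == null))
                throw new ArgumentException(
                    "Specify either an address or GPS coordinates, not both or neither.");
            Address = address;
            Gps = gps;
        }
    }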
By the way, in your case, you don’t need any third-party packages. The .NET Framework already includes a ReadOnlyDictionary<TKey, TValue> class.
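A minimal sketch of wrapping a dictionary that way. Note that this yields a read-only view, not a snapshot: the owner of the inner dictionary can still mutate it underneath the wrapper.

    using System.Collections.Generic;
    using System.Collections.ObjectModel;

    var inner = new Dictionary<string, string> { ["key"] = "value" };
    IReadOnlyDictionary<string, string> view =
        new ReadOnlyDictionary<string, string>(inner);
    // view["key"] is readable; the wrapper exposes no Add/Remove/indexer-set.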
There are many single-threaded reasons to use immutability. For example:
Object A contains Object B.
External code queries your Object B and you return it.
Now you have three possible situations:
- B is immutable, no problem.
- B is mutable, you make a defensive copy and return that. Performance hit but no risk.
- B is mutable, you return it.
In the third case, user code may not realize what you have done and may make changes to the object, thereby changing the internal data of your object without your having any control over or visibility of it happening.
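A sketch of the second and third situations, with a List<int> standing in for the mutable object B contained in A:

    using System.Collections.Generic;

    public sealed class A
    {
        private readonly List<int> _b = new() { 1, 2, 3 };

        // Situation 2: defensive copy. Callers can't corrupt our state,
        // at the cost of one copy per call.
        public List<int> GetBDefensive() => new(_b);

        // Situation 3: leak the internals. Callers may now mutate _b
        // without our control or visibility.
        public List<int> GetBLeaky() => _b;
    }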
Immutability can also greatly simplify the implementation of garbage collectors. From GHC’s wiki:
[…]
Data immutability forces us to produce a lot of temporary data but it also helps to collect this garbage rapidly. The trick is that immutable data NEVER points to younger values. Indeed, younger values don’t yet exist at the time when an old value is created, so it cannot be pointed to from scratch. And since values are never modified, neither can it be pointed to later. This is the key property of immutable data.

This greatly simplifies garbage collection (GC). At any time we can scan the last values created and free those that are not pointed to from the same set (of course, real roots of the live values hierarchy are live in the stack).
[…]
So it has a counter-intuitive behavior: the larger the percentage of your values that are garbage, the faster it works.
[…]
Expanding on what KChaloux summarised very well…
Ideally, you have two types of fields, and hence two types of code using them. Either fields are immutable, and code doesn’t have to take mutability into account; or fields are mutable, and we need to write code that either takes a snapshot (int x = p.x) or gracefully handles such changes.
In my experience, most code falls in between the two, being optimistic code: it freely references mutable data, assuming that the first read of p.x will give the same result as the second. And most of the time this is true, except when it turns out it no longer is. Oops.
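A tiny sketch of that optimistic hazard (Point, Split, and Nudge are all made-up names):

    // Two reads of p.X are assumed equal, which silently breaks
    // if something mutates p in between.
    class Point { public int X; }

    static class Optimistic
    {
        static void Nudge(Point p) => p.X += 1;   // some far-away mutation

        static int Split(Point p)
        {
            int half = p.X / 2;
            Nudge(p);                  // may (or may not) change p.X
            // Taking a snapshot first (int x = p.X;) removes the hazard.
            return p.X - half;         // no longer "the other half" if X changed
        }
    }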
So, really, turn that question around: What are my reasons for making this mutable?
- Reducing memory alloc/free?
- Mutable by nature? (e.g. counter)
- Saves modifiers, horizontal noise? (const/final)
- Makes some code shorter/easier? (init default, possibly overwrite after)
Do you write defensive code? Immutability will save you some copying around. Do you write optimistic code? Immutability will spare you the insanity of that weird, impossible bug.
Another benefit of immutability is that it is the first step in rounding up these immutable objects into a pool. Then you can manage them so that you don’t create multiple objects that conceptually and semantically represent the same thing. A good example would be Java’s String.
It is a well-known phenomenon in linguistics that a few words appear very often, and may appear in many contexts as well. So instead of creating multiple String objects, you can use one immutable instance. But then you need to keep a pool manager to take care of these immutable objects. This will save you a lot of memory.
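In C#, the analogous built-in pool is string interning; equal string literals (and explicitly interned strings) share a single immutable object:

    using System;

    string a = "the";
    string b = string.Intern(new string(new[] { 't', 'h', 'e' }));
    Console.WriteLine(ReferenceEquals(a, b)); // True: one pooled instance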
This is an interesting article to read as well:
http://en.wikipedia.org/wiki/Zipf%27s_law
There are some nice examples here, but I wanted to jump in with some personal ones where immutability helped a ton. In my case, I started off designing an immutable concurrent data structure, mainly in the hope of being able to confidently run code in parallel with overlapping reads and writes without having to worry about race conditions. John Carmack gave a talk that kind of inspired me to do it, where he talked about such an idea. It’s a pretty basic structure: a chunky array-like container split into blocks, which copies share until one has to be made unique, and it’s pretty trivial to implement.
Of course with a few more bells and whistles to it like being able to remove elements in constant time and leave reclaimable holes behind and have blocks get derefed if they become empty and potentially freed for a given immutable instance. But basically to modify the structure, you modify a “transient” version and atomically commit the changes you made to it to get a new immutable copy that doesn’t touch the old one, with the new version only creating new copies of the blocks that have to be made unique while shallow copying and reference counting the others.
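A rough sketch of that copy-on-write block idea (my guess at a minimal version, not the author’s actual implementation, and relying on GC sharing rather than the reference counting he describes; block size 256 matches the shift-and-mask indexing mentioned below):

    // A persistent array of fixed-size blocks where Set copies only the
    // block it touches and shares the rest between versions.
    public sealed class PersistentArray<T>
    {
        private const int Shift = 8;                 // log2 of the block size
        private const int Mask = (1 << Shift) - 1;   // 255
        private readonly T[][] _blocks;              // treated as immutable once built

        private PersistentArray(T[][] blocks) => _blocks = blocks;

        public static PersistentArray<T> Create(int length)
        {
            var blocks = new T[(length + Mask) >> Shift][];
            for (int i = 0; i < blocks.Length; i++)
                blocks[i] = new T[1 << Shift];
            return new PersistentArray<T>(blocks);
        }

        public T Get(int index) => _blocks[index >> Shift][index & Mask];

        // Returns a new version; the old one is untouched and still valid.
        public PersistentArray<T> Set(int index, T value)
        {
            var blocks = (T[][])_blocks.Clone();             // copy the block table
            var block = (T[])blocks[index >> Shift].Clone(); // copy only the hot block
            block[index & Mask] = value;
            blocks[index >> Shift] = block;
            return new PersistentArray<T>(blocks);
        }
    }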
However, I didn’t find it that useful for multithreading purposes. After all, there’s still the conceptual problem where, say, a physics system applies physics concurrently while a player is trying to move elements around in a world. Which immutable copy of the transformed data do you go with, the one the player transformed or the one the physics system transformed? So I haven’t really found a nice and simple solution to this basic conceptual problem, except to have mutable data structures that lock in a smarter way and discourage overlapping reads and writes to the same sections of the buffer, to avoid stalling threads. That’s something John Carmack seems to have figured out how to solve in his games; at least he talks about it as though he can almost see a solution without opening up a can of worms. I haven’t gotten as far as him in that regard; all I can see are endless design questions if I try to just parallelize everything around immutables. I wish I could spend a day picking his brain, since most of my efforts started off with ideas he threw out.
Nevertheless, I found enormous value in this immutable data structure in other areas. I even use it now to store images, which is really weird and makes random access require a few more instructions (a right shift and a bitwise AND, along with a layer of pointer indirection), but I’ll cover the benefits below.
Undo System
One of the most immediate places I found to benefit from this was the undo system. Undo system code used to be one of the most error-prone things in my area (visual FX industry), and not just in the products I worked on but in competing products (their undo systems were also flaky) because there were so many different types of data to worry about undoing and redoing properly (property system, mesh data changes, shader changes that weren’t property-based like swapping one with the other, scene hierarchy changes like changing the parent of a child, image/texture changes, etc. etc. etc.).
So the amount of undo code required was enormous, often rivaling the amount of code implementing the system for which the undo system had to record state changes. By leaning on this data structure, I was able to get the undo system down to just this:
on user operation:
copy entire application state to undo entry
perform operation
on undo/redo:
swap application state with undo entry
Normally the code above would be enormously inefficient when your scene data spans gigabytes, since it copies the state in its entirety. But this data structure only shallow copies things that weren’t changed, and that actually made it cheap enough to store an immutable copy of the entire application state. So now I can implement undo systems as easily as the code above and just focus on using this immutable data structure to make copying unchanged portions of the application state cheaper and cheaper. Since I started using this data structure, all my personal projects have undo systems using just this simple pattern.
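For what it’s worth, the surrounding bookkeeping can stay that small. A sketch, assuming TState is such an immutable snapshot type whose copies share unchanged substructure (which is what makes pushing one snapshot per operation affordable):

    using System.Collections.Generic;

    public sealed class UndoSystem<TState>
    {
        private readonly Stack<TState> _undo = new();
        private readonly Stack<TState> _redo = new();

        public TState Current { get; private set; }

        public UndoSystem(TState initial) => Current = initial;

        // On user operation: record the pre-operation state, then apply.
        public void Apply(TState next)
        {
            _undo.Push(Current);
            _redo.Clear();
            Current = next;
        }

        public void Undo()
        {
            if (_undo.Count == 0) return;
            _redo.Push(Current);     // swap application state with undo entry
            Current = _undo.Pop();
        }

        public void Redo()
        {
            if (_redo.Count == 0) return;
            _undo.Push(Current);
            Current = _redo.Pop();
        }
    }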
Now there is still some overhead here. Last time I measured it was around 10 kilobytes just to shallow copy the entire application state without making any changes to it (this is independent of scene complexity since the scene is arranged in a hierarchy, so if nothing below the root changes, only the root is shallow copied without having to descend down into the children). That’s far from 0 bytes as would be needed for an undo system only storing deltas. But at 10 kilobytes of undo overhead per operation, that’s still only a megabyte per 100 user operations. Plus I could still potentially squash that down further in the future if needed.
Exception-Safety
Exception-safety in a complex application is no trivial matter. However, when your application state is immutable and you only use transient objects to commit atomic change transactions, it’s inherently exception-safe: if any part of the code throws, the transient is tossed away before it ever yields a new immutable copy, and the old state is untouched. That trivializes one of the hardest things I’ve always found to get right in a complex C++ codebase.
Too many people often just use RAII-conforming resources in C++ and think that’s enough to be exception-safe. Often it isn’t, since a function can generally cause side effects to states beyond the ones local to its scope. You generally need to start dealing with scope guards and sophisticated rollback logic in those cases. This data structure made it so I often don’t need to bother with that since the functions aren’t causing side effects. They’re returning transformed immutable copies of application state instead of transforming the application’s state.
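A sketch of that transient/commit flow using the builders from the BCL’s immutable collections (standing in for the author’s custom structure): edits go to a builder, and the shared state is replaced only if nothing threw along the way.

    using System;
    using System.Collections.Immutable;

    static class Transactions
    {
        public static ImmutableList<int> ApplyEdit(
            ImmutableList<int> state,
            Action<ImmutableList<int>.Builder> edit)
        {
            var builder = state.ToBuilder(); // cheap mutable "transient" view
            edit(builder);                   // if this throws, 'state' is untouched
            return builder.ToImmutable();    // commit: a new immutable snapshot
        }
    }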
Non-Destructive Editing
Non-destructive editing is basically layering/stacking/connecting operations together without touching the original user’s data (operations read their input and produce new output without modifying the input). It’s typically trivial to implement in a simple image application like Photoshop, and might not benefit that much from this data structure, since many operations may just want to transform every pixel of the entire image.
However, with non-destructive mesh editing, for example, a lot of operations often want to transform only a part of the mesh. One operation might just want to move some vertices here. Another might just want to subdivide some polygons there. Here the immutable data structure helps a ton in avoiding the need to have to make an entire copy of the entire mesh just to return a new version of the mesh with a small part of it changed.
Minimizing Side Effects
With these structures in hand, it also makes it easy to write functions that minimize side effects without incurring a huge performance penalty for it. I’ve found myself writing more and more functions which just return whole immutable data structures by value these days without incurring side effects, even when that seems a bit wasteful.
For example, typically the temptation to transform a bunch of positions might be to accept a matrix and a list of objects and transform those in a mutable fashion. These days I find myself just returning a new list of objects.
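For instance, a sketch of that style using System.Numerics types (Transforms.Apply is a hypothetical helper): rather than mutating positions in place, it returns a transformed copy and leaves the input alone.

    using System.Collections.Immutable;
    using System.Linq;
    using System.Numerics;

    static class Transforms
    {
        public static ImmutableArray<Vector3> Apply(
            ImmutableArray<Vector3> positions, Matrix4x4 m) =>
            positions.Select(p => Vector3.Transform(p, m)).ToImmutableArray();
    }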
When you have more functions like this in your system that cause no side effects, it definitely makes it easier to reason about the correctness of it as well as test its correctness.
The Benefits of Cheap Copies
So anyway, these are the areas where I found the most use for immutable data structures (or persistent data structures). I also got a bit overzealous initially and made an immutable tree, an immutable linked list, and an immutable hash table, but over time I rarely found as much use for those. I mainly got the most use out of the chunky, immutable array-like container described above.
I also still have a lot of code working with mutables (I find that a practical necessity, at least for low-level code), but the main application state is an immutable hierarchy, drilling down from an immutable scene to the immutable components inside of it. Some of the cheaper components are still copied in full, but the most expensive ones, like meshes and images, use the immutable structure to allow those partial cheap copies of only the parts that need to be transformed.
In Java, C#, and other similar languages, class-type fields may be used either to identify objects, or to encapsulate values or state in those objects, but the languages make no distinction between such usages. Suppose a class object George has a field char[] chars;. That field may encapsulate a character sequence in one of several ways:
1. An array which will never be modified, nor exposed to any code that might modify it, but to which outside references may exist.
2. An array to which no outside references exist, and which George may modify freely.
3. An array which is owned by George, but to which there may exist outside views which would be expected to represent George’s current state.
Additionally, the field may, instead of encapsulating a character sequence, encapsulate a live view of a character sequence owned by some other object.
If chars presently encapsulates the character sequence [w i n d], and George wants chars to encapsulate the character sequence [w a n d], there are a number of things George could do:

A. Construct a new array containing the characters [w a n d] and change chars to identify that array rather than the old one.
B. Identify somehow a pre-existing character array which will always hold the characters [w a n d] and change chars to identify that array rather than the old one.
C. Change the second character of the array identified by chars to an a.
In case (1), (A) and (B) are safe ways to achieve the desired result. In case (2), (A) and (C) are safe, but (B) would not be [it wouldn’t cause immediate problems, but since George would assume it has ownership of the array, it would assume it could change the array at will]. In case (3), choices (A) and (B) would break any outside views, so only choice (C) is correct. Thus, knowing how to modify the character sequence encapsulated by the field requires knowing which semantic type of field it is.
If instead of using a field of type char[], which encapsulates a potentially mutable character sequence, the code had used type String, which encapsulates an immutable character sequence, all of the above issues go away. All fields of type String encapsulate a sequence of characters using a sharable object that will never change. Consequently, if a field of type String encapsulates “wind”, the only way to make it encapsulate “wand” is to make it identify a different object, one which holds “wand”. In cases where code holds the only reference to an object, mutating the object may be more efficient than creating a new one, but any time a class is mutable it is necessary to distinguish among the different ways it can encapsulate value. Personally, I think Apps Hungarian should have been used for this (I would consider the four uses of char[] to be semantically distinct types, even though the type system considers them identical; exactly the sort of situation where Apps Hungarian shines), but since it wasn’t, the easiest way to avoid such ambiguities is to design immutable types that only encapsulate values one way.
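For illustration, a hedged sketch of the George example rewritten around an immutable string, where only option (A) remains possible:

    public sealed class George
    {
        public string Chars { get; private set; } = "wind";

        public void FixTypo()
        {
            // Strings can't be modified in place; Replace returns a new
            // object, so no outside holder of the old "wind" is affected.
            Chars = Chars.Replace('i', 'a');   // now "wand"
        }
    }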
There are a lot of good answers already; this is just some extra information specific to .NET. I was digging through old .NET blog posts and found a nice summary of the benefits from the point of view of the Microsoft Immutable Collections developers:
- Snapshot semantics, allowing you to share your collections in a way that the receiver can count on never changing.
- Implicit thread-safety in multi-threaded applications (no locks required to access collections).
- Any time you have a class member that accepts or returns a collection type and you want to include read-only semantics in the contract.
- Functional programming friendly.
- Allow modification of a collection during enumeration, while ensuring the original collection does not change.
- They implement the same IReadOnly* interfaces that your code already deals with, so migration is easy.

If someone hands you a ReadOnlyCollection<T>, an IReadOnlyList<T>, or an IEnumerable<T>, the only guarantee is that you can’t change the data; there is no guarantee that the person who handed you the collection won’t change it. Yet you often need some confidence that it won’t change. These types don’t offer events to notify you when their contents change, and if they do change, might that occur on a different thread, possibly while you’re enumerating the contents? Such behavior would lead to data corruption and/or random exceptions in your application.
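Tying this back to the original question, a minimal sketch of passing a dictionary of strings as a true snapshot, using the ImmutableDictionary from that package rather than a read-only view someone else might still mutate:

    using System;
    using System.Collections.Generic;
    using System.Collections.Immutable;

    var settings = new Dictionary<string, string> { ["a"] = "1" }
        .ToImmutableDictionary();

    // "Adding" returns a new dictionary; the original snapshot is untouched.
    var extended = settings.Add("mode", "verbose");
    Console.WriteLine(settings.Count);   // 1
    Console.WriteLine(extended.Count);   // 2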