0

SemVer says to bump the major version on a backwards-incompatible change. Would changing the behavior of a custom type’s hash function fall under this category?

I asked a couple of friends, and they mostly agreed that it shouldn’t. For data structures like a hashmap or hashset, the behavior would be identical (other than potentially slight performance differences depending on the distribution of the hash function). The majority of people would be able to swap the old hash function for the new one and never tell the difference.
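To make the "never tell the difference" claim concrete, here is a minimal sketch. `PointV1`, `PointV2` and `workload` are hypothetical names invented for this illustration; the two classes differ only in `__hash__`, and the workload only uses set/dict operations.

```python
class PointV1:
    """Hypothetical custom type with the 'old' hash function."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, PointV1) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


class PointV2(PointV1):
    """Same type and same equality, but a different ('new') hash function."""

    def __hash__(self):
        return (self.x * 1_000_003) ^ self.y


def workload(cls):
    # Typical usage: deduplication, membership tests, dictionary lookup.
    points = {cls(1, 2), cls(1, 2), cls(3, 4)}
    labels = {cls(1, 2): "a"}
    return len(points), cls(3, 4) in points, cls(9, 9) in points, labels[cls(1, 2)]


same_behavior = workload(PointV1) == workload(PointV2)
```

As long as equal objects still hash equally, the observable results of the set and dict operations match, even though the raw hash values do not.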

The arguments I’ve heard that it should be a breaking change generally depend on abusing the hash function, for example hard-coding the expected hash of an object in a unit test. The one valid argument I’ve heard on this side involves serialization and storing the hash as a string. A user might construct a hashset/hashmap and serialize it to disk, then expect to be able to deserialize it. If the hash function has changed in between, the deserialized hashmap would be in an invalid state.
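The serialization scenario can be sketched roughly as follows. Everything here is an assumption made for illustration: `hash_v1` and `hash_v2` are toy stand-ins for the old and new hash functions, and the "hashmap" is a bare bucket list written out as JSON, which is not how any particular library persists its tables.

```python
import json


def hash_v1(key: str) -> int:
    # Hypothetical "old" hash: sum of the UTF-8 bytes.
    return sum(key.encode())


def hash_v2(key: str) -> int:
    # Hypothetical "new" hash: simple polynomial rolling hash.
    h = 0
    for b in key.encode():
        h = (h * 31 + b) % 2**32
    return h


def build_buckets(keys, hash_fn, n=8):
    # Place each key in the bucket chosen by its hash.
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[hash_fn(k) % n].append(k)
    return buckets


def contains(buckets, key, hash_fn):
    # Only the bucket the hash points to is searched.
    return key in buckets[hash_fn(key) % len(buckets)]


# Written to "disk" by the old library version...
saved = json.dumps(build_buckets(["apple", "banana", "cherry"], hash_v1))
# ...and read back after upgrading to the new one.
loaded = json.loads(saved)

found_with_old_hash = contains(loaded, "banana", hash_v1)
found_with_new_hash = contains(loaded, "banana", hash_v2)
```

The key is still physically in the loaded structure, but a lookup with the new hash searches a different bucket and misses it. A format that stores only the keys and rebuilds the table on load would not have this problem.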

I couldn’t find much discussion online about this, and I’m curious to hear others’ opinions.

4

12

IMHO this depends on the guarantees and constraints described in the documentation (a.k.a. the “contract”) of the hash function, which should exist when it is part of your custom type’s public API. From semver.org, point #1 of the SemVer spec:

  1. Software using Semantic Versioning MUST declare a public API. This API could be declared in the code itself or exist strictly in documentation. However it is done, it SHOULD be precise and comprehensive.

(emphasis mine).

When the documentation says the hash function may be subject to change, then changing it does not count as “breaking”. If the documentation says the hash function will not be changed within a major version, then it will be a breaking change.

Of course, when the API’s documentation does not say anything about this, and you don’t know exactly who is using your API and for what purpose, you had better play it safe and increase the major version number to signal to users that this might be a breaking change for them (even if it is not). However, you can also use this occasion to fill in the missing parts of the API docs and tell users that from now on they must not rely on this hash function staying stable.

7

The answer to this question is the same as the answer to every other question about semantic versioning. Well, it’s not really an answer, but a counter-question: what are you promising?

Semantic versioning is all about what you are promising in your public API. If you break a promise that you are making in your public API, then it is a breaking change. Simple as that.

So, what is it that you are promising in your public API? And does the change in hash function have any effect on any of those promises?

Unfortunately, it is not always quite this simple. Maybe some of your users are interpreting your documentation as making an implicit guarantee that you are not aware of? Maybe your documentation is not specifying some specific property and your users are assuming that the behavior of the current hash function is the specification of that property? (A quite well-known example is that the Python specification does not guarantee anywhere that finalizers are called immediately when an object goes out of scope. Guido van Rossum has even explicitly said that Python implementors are not required to implement deterministic finalization and that user code which relies on deterministic finalization is broken. And yet, there is code out there which relies on deterministic finalization and thus will never work on PyPy, IronPython, Jython, or TrufflePython, or on CPython if it ever moves away from reference counting.)

So, you also have to be aware of the promises that you are not making, but your users perceive you to be making. Because, in the end, it doesn’t matter that you were right if your users walk away because you broke their code. That is, for example, why .NET’s random function will forever remain broken and will never be fixed: someone, somewhere, sometime, wrote some code that relies on its current broken behavior.

1

4

It might be.

If the details are part of the contract, it unambiguously is.

If users depend on it to stay the same, it probably is.
Welcome to the hell of back-compat for non-readers.
Or you have bad docs and it’s all your own fault the users had to guess.

To use a different example: changing a sorting algorithm that is not documented as stable, and that comes with no additional guarantees, so that it may return equal elements in a different order is generally considered fine, however inconvenient for those who program by trial and error.
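A small sketch of what "a different sorted order" can mean in practice. `selection_sort_by_key` is a hypothetical helper written for this illustration; selection sort is used only because it is a simple unstable algorithm, while Python's built-in `sorted` is documented as stable.

```python
def selection_sort_by_key(items, key):
    """Unstable sort: swapping can reorder elements that compare equal."""
    out = list(items)
    for i in range(len(out)):
        # Index of the smallest remaining element (first one on ties).
        m = min(range(i, len(out)), key=lambda j: key(out[j]))
        out[i], out[m] = out[m], out[i]
    return out


def by_number(record):
    return record[0]


records = [(2, "x"), (2, "y"), (1, "z")]

stable_result = sorted(records, key=by_number)
unstable_result = selection_sort_by_key(records, by_number)

keys_agree = [by_number(r) for r in stable_result] == [by_number(r) for r in unstable_result]
ties_agree = stable_result == unstable_result
```

Both results are correctly sorted by key, yet the two records with key 2 come out in a different relative order. Code that only relies on "sorted by key" keeps working; code that relied on the tie order was depending on something never promised.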

2

It seems to me that this would count as a breaking change.

The result of the hash is exposed to code using your library. If a program uses this value, then in some use cases the program would break when it upgrades to the new version of the library.

Maybe a more obvious case would be, say, EmphasiseText(): v1 returns italic, v2 returns bold. No single run of any application will break. But the results generated by the program will be unexpectedly different and may be considered “incorrect” when compared with the old results.

0

tl;dr It’s a breaking change iff it can break a workflow of a user who relied only on defined behavior.


In principle, just about any change can break a user’s workflow.

Examples:

  1. A user might write a script that looks for a specific color, and any slight change might break their script.

  2. A user might compare numbers generated by an app to look for differences. Then a slight change in how a floating-point estimate is computed could cause a false positive.
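The floating-point case is easy to reproduce. In this sketch, `total_v1` and `total_v2` are hypothetical stand-ins for two releases of an app that add the same three numbers in a different order.

```python
import math


def total_v1(a, b, c):
    # "Old" release: adds left to right.
    return (a + b) + c


def total_v2(a, b, c):
    # "New" release: mathematically identical, different evaluation order.
    return a + (b + c)


old = total_v1(0.1, 0.2, 0.3)
new = total_v2(0.1, 0.2, 0.3)

exact_match = old == new                           # what a naive diff checks
close_enough = math.isclose(old, new, rel_tol=1e-9)  # tolerance-based comparison
```

An exact comparison flags a difference between the two releases, while a tolerance-based comparison does not. Whether the exact bit pattern is part of the app's defined behavior is precisely the question.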

Likewise, if a user’s workflow depends on a specific hash function, then changing it would be a “breaking change” for them.

However, if that user was relying on undefined behavior, then we might say that their workflow was already broken.

So, generally, a change is “breaking” if it changes defined behavior.

0

macOS and iOS guarantee that hash functions built in a certain way will return different hash values when you restart the application. This means you can’t persist hash values beyond the lifetime of the running application. This is obviously not considered a “breaking change”, so no change of a hash function is considered a breaking change.
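Python has a comparable behavior, shown here only as an analogy and not as a description of Apple's mechanism: hashes of `str` values are salted per process by default, and the `PYTHONHASHSEED` environment variable pins the salt. The helper name `str_hash_in_fresh_process` is made up for this sketch.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_process(seed: str) -> int:
    """Return hash('abc') as computed by a new interpreter started with the given seed."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('abc'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


first_run = str_hash_in_fresh_process("1")
second_run = str_hash_in_fresh_process("2")
same_seed_again = str_hash_in_fresh_process("1")
```

Two processes with different seeds disagree on the hash of the same string, so a persisted hash value is only meaningful to the process that produced it, unless the seed is fixed.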
