The C++ “standard” method for serializing and deserializing a data type is to use streams with the insertion (<<) and extraction (>>) operators. This has some flaws, but it does neatly allow you to use the same semantics to write anything to/read anything from a stream simply by overloading the operators.

However, I see one major problem (as with many deserialization frameworks). How can you preserve the invariants of complex types?

Let’s take the following simple type as an example:

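#include <stdexcept>

class MyType
{
public:
    MyType(int u):
        underlying_(u)
    {
        // Enforce the invariant: the value must not exceed 10.
        if (u > 10) throw std::invalid_argument{"value must not exceed 10"};
    }

    auto value() const { return underlying_; }

private:
    int underlying_;
};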

Essentially, this class defines a type that, if an instance exists, is guaranteed to hold a “value” no greater than 10. In other words, it is not possible to construct the type with an invalid value.

The insertion operator is simple to implement:

std::ostream& operator<<(std::ostream& os, const MyType& v)
{
    os << v.value();
    return os;
}

However, I do not understand how to write the extraction operator:

std::istream& operator>>(std::istream& is, MyType& v)
{
    ????
}

There are two problems here:

  1. The extraction operator assumes an instance of MyType already exists into which the data is read
  2. Simply reading the value directly into underlying_ could break the invariants

The “solution” to (1) is usually to introduce a default constructor, but this breaks the whole idea of the class invariants. I want an instance of the class to not exist unless it has a valid value.

To solve (2), I’ve seen the operators being added as friend to the class, but this introduces the potential for invariant breaking (essentially, the stream is directly modifying the internals of the type).

What are some possible solutions to these problems? Is there a “standard” solution?

As an aside, an even better example would be an enum class. An enum class is a strong type with a set of valid values and invalid values. Since the enum is implemented as a primitive data type, it can store any value that the underlying type allows, but only a subset of these values is valid. When used properly, the compiler prevents you from setting an invalid value. However, deserializing directly into an enum (perhaps by using a cast) would bypass this checking (i.e., break the invariants).


The “solution” to (1) is usually to introduce a default constructor, but this breaks the whole idea of the class invariants. I want an instance of the class to not exist unless it has a valid value.

Who says a default constructor cannot create an instance of the class with a valid value? In most real-world cases I have seen, it wasn’t hard to find a more or less sensible “default” value, regardless of any invariants. And when that’s not possible, “>>” is probably the wrong tool.

Simply reading the value directly into underlying_ could break the invariants

What about

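std::istream& operator>>(std::istream& is, MyType& v)
{
    int u;
    if (is >> u) {      // only construct when an integer was actually read
        MyType tmp(u);  // throws std::invalid_argument if the invariant is violated
        v = tmp;
    }
    return is;
}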

This will make use of the invariant test inside the constructor, throw an exception when it is violated, and when the value is valid, it will be assigned to v. As you see, no friend declaration necessary, only an implicit or explicit assignment operator.

Designing deserialisation

It is vital not to confuse means (extraction operator) with objective (deserialization):

  • There are plenty of ways to use the extraction operator in a meaningful way to deserialise, including ensuring invariants. But the overload should correspond to the usual C++ idioms.

  • If you don’t want objects to be constructed with a default value before deserialisation, you’d better go for a factory-like design, where the factory function takes a stream as a parameter and returns objects by value (don’t consider a constructor from istream). Use a factory returning a smart pointer for polymorphic objects.

  • Prefer reusing one of the proven libraries to save you time (boost, cereal and others, as mentioned by Basile in the comments).

The C++ extraction operator

The extraction operator assumes that the input is a valid representation of the object to be extracted (correct format) AND that the extracted value meets all the invariants.

You can design your operator overload in two ways, given that the target object must already exist and be a valid MyType:

  • directly altering v, calling setters and other operations to change the state of the object to the desired state. In this case you do not need to be friend, and could restrict yourself to use strictly the public interface.
  • constructing a temporary object and copying (or moving?) it to v. Ideally, you would call a specific constructor and complement it with additional public operations (this is in the spirit of LSP’s history rule). If you choose the friend approach, you have more flexibility, but also more risks.

In both cases, if something goes wrong, the operator should ideally set the failbit (which will cause a throw or not, depending on the settings of the stream). If you throw directly, it’ll work, but some users might be surprised, depending on how they use istreams in their code.

Here is an example implementation:

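std::istream& operator>>(std::istream& is, MyType& v) {
    int in;
    if (is >> in) {                 // skip construction if no integer could be read
        try {
            v = MyType(in);         // the constructor checks the invariant
        } catch (...) {
            is.setstate(std::ios::failbit);   // report invalid values via the stream state
        }
    }
    return is;
}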
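To show what "a throw or not, depending on the settings of the stream" means for the caller, here is a small sketch. The helpers `try_parse` and `parse_or_throw` are hypothetical names made up for this illustration; `MyType` and the operator are restated so the snippet stands alone.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

class MyType
{
public:
    MyType(int u) : underlying_(u)
    {
        if (u > 10) throw std::invalid_argument{"value must not exceed 10"};
    }
    int value() const { return underlying_; }

private:
    int underlying_;
};

std::istream& operator>>(std::istream& is, MyType& v)
{
    int in;
    if (is >> in) {
        try {
            v = MyType(in);
        } catch (const std::invalid_argument&) {
            is.setstate(std::ios_base::failbit);
        }
    }
    return is;
}

// Hypothetical helper: failure is reported through the stream state only.
bool try_parse(const std::string& text, MyType& out)
{
    std::istringstream is(text);
    return static_cast<bool>(is >> out);
}

// Hypothetical helper: same operator, but the stream is configured to throw.
MyType parse_or_throw(const std::string& text)
{
    std::istringstream is(text);
    is.exceptions(std::ios_base::failbit);  // failbit now raises std::ios_base::failure
    MyType result(0);                       // a valid placeholder is still required
    is >> result;
    return result;
}
```

The operator itself is identical in both helpers; only the exception mask chosen by the caller decides whether an invalid value surfaces as a failed stream or as a std::ios_base::failure exception.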
