Why use a special “Name” class (instead of just a string) for representing object names in C++?


Suppose we have an Instance class in a C++ program, which has a GUID/UUID, name, parents, children, and other properties which can be saved to or loaded from an XML file.

The intuitive approach for representing the name of the Instance is to give it a property of type std::string, char*/char[], or whatever other generic string type is being used. Some programs, however, use a separate Name class as a wrapper around a string. I’ve looked at source code which does this, but from everything I’ve seen, the Name class has no purpose besides providing a mutex, various asserts and other runtime error checking, and convoluted-looking declaration methods (usually a combination of functions like declare() { <declaration code> }, doDeclare() { <some convoluted stuff>; declare() }, and callDoDeclare() { <even more convoluted nonsense>; doDeclare() }).

Is there actually a reason to use a special Name class, or can I just use a regular string property? Should I even be worrying about this so early on in the development of my project?

EDIT: I wasn’t precise enough about the purpose of a Name type in most programs. A Name isn’t as much of a label for an Instance as it is general metadata: for example, a Descriptor class could have a name member, and there could be a Property with a Descriptor where the name contains the string “Name”, and this property named “Name” could be used to represent the name of an Instance. Not to mention that Instances aren’t just encoded and decoded from XML data, but can also be accessed through a scripting API (like in ROBLOX; yes, this question is about ROBLOX, but I didn’t want to bring it up at first).

3

from everything I’ve seen, the Name class has no purpose besides providing a mutex, various asserts and other runtime error checking

This is the reason why: it has behavior associated with it, which is all the more important when you see <some convoluted stuff> and <even more convoluted nonsense>. This convoluted logic has a place to live without becoming scattered and intermingled with other concerns in your application. It also gives you the opportunity to pass a Name around instead of a more broadly scoped object in cases where you need the logic for the identifier or name, but not the logic bound to the object it identifies.

Passing a Name around instead of a std::string gets you compile-time checks that are not possible when passing around a plain string. This avoids Stringly-Typed code and Primitive Obsession. When passing a string, is it the name of the kind of object you think it is? A function that receives a name as a plain string has no way of knowing this.

Some other questions on this site which are related to this topic:

  • Should we define types for everything?
  • Is coupling with strings “looser” than with class methods?
  • What’s the name of the technique of using very specific types to help catch errors?

2

from everything I’ve seen, the Name class has no purpose besides [a list of various purposes]

Sorry if this comes across as facetious but you rolled from claiming there’s no purpose into listing the actual purposes. It’s possible you’re dismissing these as valid or productive contributions to the type definition, which they most certainly are; but it’s only going to make sense when you understand the goal that they’re trying to achieve.

Have you heard of primitive obsession? It’s the direct answer to your question. For your current example, the specific scenario “replace data value with object” is the one that applies most, though I recommend reading through the entire parent page to build an understanding of this guideline.

One of the main benefits here is that you get compile-time type safety. You can’t just pass any string into this, it has to be a Name, which means that it forces you to explicitly confirm that this string value is indeed a name.
The second main benefit is that having a custom type here allows you to write custom logic to operate on this type. For example, you might want to generate a filepath-safe variant of this name, or you might want to write some equality check that’s broader than just string == string. Having a type enables you to do so in a clear and reusable location.
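Both benefits can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the member names (str, toFilePathSafe), the describe function, the replacement rule for unsafe characters, and the ASCII case-insensitive comparison are all hypothetical choices, not taken from any real code base.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical Name wrapper; the member names are illustrative only.
class Name {
public:
    // explicit: a std::string never silently becomes a Name.
    explicit Name(std::string value) : value_(std::move(value)) {}

    const std::string& str() const { return value_; }

    // Custom logic lives with the type: a variant usable in a file path.
    // Assumption: anything other than letters, digits, '-' and '_'
    // (as classified by the current C locale) is replaced by '_'.
    std::string toFilePathSafe() const {
        std::string out = value_;
        for (char& c : out) {
            const auto u = static_cast<unsigned char>(c);
            if (!std::isalnum(u) && c != '-' && c != '_') c = '_';
        }
        return out;
    }

    // Equality broader than string == string: ASCII case-insensitive.
    friend bool operator==(const Name& a, const Name& b) {
        return std::equal(a.value_.begin(), a.value_.end(),
                          b.value_.begin(), b.value_.end(),
                          [](unsigned char x, unsigned char y) {
                              return std::tolower(x) == std::tolower(y);
                          });
    }
    friend bool operator!=(const Name& a, const Name& b) { return !(a == b); }

private:
    std::string value_;
};

// Hypothetical function that only accepts a Name.
inline std::string describe(const Name& name) {
    return "Instance '" + name.str() + "'";
}

// The compiler enforces that a plain string is not implicitly a Name.
static_assert(!std::is_convertible<std::string, Name>::value,
              "a plain string must not convert to Name implicitly");
```

With this in place, describe(std::string("Door")) is rejected at compile time, while describe(Name("Door")) is accepted; the caller has to state explicitly that the string is a name.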

Should I even be worrying about this so early on in the development of my project?

This is a difficult question to answer when you’re still learning about these guidelines. On the one hand, it would be productive to already account for hard lessons that others have had to learn, without needing to make the mistake and have to recover from it yourself. On the other hand, YAGNI always looms over you, and it’s really easy to get stuck in analysis paralysis if you try to include every guideline from the get go.

So it’s up to you. Would you rather do some research and make a conscious decision to include this guideline? Would you rather blindly follow it on the supposition that it conveys a benefit down the line? Or would you rather not implement something you don’t understand and accept that you might have to learn this the hard way?

There’s no wrong answer, just pick the answer that works best for how you learn things.

Note also that this is why mentoring is so prevalent for people who are learning the ropes: a mentor is able to judge on the fly which guidelines make the most sense for the current scenario, balancing both the number of guidelines to follow and the severity of failing to follow them. If you have access to a person with more experience in this field who is willing to sanity-check your work before you commit to it, that would definitely be helpful to rely on.

the Name class has no purpose besides providing (…)

These are good reasons to introduce a separate class, but even if Name didn’t have any additional behaviour, it would still be beneficial to have a separate class. Two words: strong typing.

I’ll give you an example. Some time ago I worked on a project that dealt with multiple classes, say A, B, C. Each of those classes had an id field, and all those fields had the same int type. So how is that bad? The project revolved around making various sophisticated aggregations and calculations. We would often deal with nested maps like A.id -> [B.id -> [C.id -> (something)]]. So we would group by A.id, then by B.id, then by C.id. This is a simplistic example; in a real scenario such nesting could be of depth 10 or even 20. And we would often have errors because the wrong id ended up at the wrong level. At the time we could only detect this at runtime, and due to the complexity of the process, we often weren’t able to write proper tests. A separate analytics team was able to detect these problems, but then the entire testing process took a very long time.

This could’ve been very easily avoided with strong typing. All we needed to do was define AId, BId, CId classes and use them as the id types on A, B, C. And voilà, problem solved. I actually proposed this change, but the decision was that refactoring this huge codebase would be too costly.
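A minimal sketch of what that could have looked like in C++. The struct layout, the Aggregation alias, the add function, and the choice of double for "(something)" are assumptions made for illustration; the original project is not shown in this answer.

```cpp
#include <map>
#include <type_traits>

// Hypothetical id wrappers for the classes A, B and C from the story.
// Each wraps an int, but the three types are not interchangeable.
struct AId { int value; };
struct BId { int value; };
struct CId { int value; };

// Ordering so the ids can be used as std::map keys.
inline bool operator<(AId l, AId r) { return l.value < r.value; }
inline bool operator<(BId l, BId r) { return l.value < r.value; }
inline bool operator<(CId l, CId r) { return l.value < r.value; }

// A.id -> [B.id -> [C.id -> (something)]], with "something" assumed to be a double.
using Aggregation = std::map<AId, std::map<BId, std::map<CId, double>>>;

// Accumulates an amount at the position identified by the three ids.
inline void add(Aggregation& agg, AId a, BId b, CId c, double amount) {
    agg[a][b][c] += amount;
}

// The ids cannot be swapped by accident: neither a BId nor a bare int is an AId.
static_assert(!std::is_convertible<BId, AId>::value, "BId is not an AId");
static_assert(!std::is_convertible<int, AId>::value, "int is not an AId");

// add(agg, BId{2}, AId{1}, CId{3}, 1.0);  // does not compile: ids at the wrong level
```

The wrong-level mistake described above becomes a compile error instead of something an analytics team has to find after the fact.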

The conclusion of this story is: we did not need the int type. We only needed an id type per class. That is the real need. Similarly, your Name class describes its purpose; how it is implemented under the hood is irrelevant. Purpose. It matters more than concrete representation.

There are other benefits of such a design. Say I decide one day that I want to change the underlying type of my id field to long, or uuid, or string, or whatever. If designed correctly, this might mean changing a single line of code. Or maybe a couple of lines of code, e.g. for conversion between this type and the database type. Without this abstraction such refactoring would be costly.

2

@Flater correctly identified the Primitive Obsession issue, but it may be warranted to explain a bit more why primitive obsession is an issue.

What’s in a type?

Types are used for a variety of purposes, so sometimes it’s easy to get lost.

At a minimum, a type is:

  • A set of values.
  • On which a set of sensible operations is provided.

For example, for a String:

  • Set of values: any sequence of any characters, from 0 to infinity.
  • Set of operations: many, many, different operations.

Is a String a good name, thus? Arguably no:

  • What does it mean for a name to be empty?
  • Is it problematic for a name, in this application, to contain punctuation? Non-printable characters? To be thousands of characters long?

That is, explicitly or implicitly, a good name probably has a set of values that is a subset of all possible string values.

A dedicated type (Name) can be used to establish and maintain invariants:

  • A Name is never empty.
  • A Name only contains characters in the [0-9A-Za-z_] set.
  • A Name is between 5 and 30 characters long.

Note that maintaining the invariants implies that not all String operations are available. In fact, in all likelihood, a Name is immutable once built, so invariants only have to be verified at construction.
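As a rough sketch, the three example invariants above could be enforced at construction like this. The factory name create, the use of std::optional (C++17) to report failure, and the accessor str are assumptions for illustration; throwing an exception from a constructor would be an equally valid design.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical Name type enforcing the three example invariants.
class Name {
public:
    // Returns std::nullopt when the text violates an invariant.
    static std::optional<Name> create(std::string text) {
        if (!isValid(text)) return std::nullopt;
        return Name(std::move(text));
    }

    static bool isValid(const std::string& text) {
        // Between 5 and 30 characters long; this also rules out the empty name.
        if (text.size() < 5 || text.size() > 30) return false;
        for (char c : text) {
            if (!isAllowed(c)) return false;
        }
        return true;
    }

    // Read-only access: there are no mutators, so a Name that was
    // valid at construction stays valid.
    const std::string& str() const { return text_; }

private:
    // Private: the only way to obtain a Name is through create().
    explicit Name(std::string text) : text_(std::move(text)) {}

    // Character set [0-9A-Za-z_]; assumes an ASCII-compatible encoding.
    static bool isAllowed(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
               (c >= 'a' && c <= 'z') || c == '_';
    }

    std::string text_;
};
```

Code that receives a Name no longer has to re-check length or character set; the checks live in exactly one place.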

Going further, I want to emphasize the sensible adjective: just because an operation can exist, does not mean it should exist:

  • What does it mean to concatenate two Names? It’s quite likely nonsensical.
  • What does it mean to look up a pattern in a Name? It looks like a hack that’ll come back and bite us later; it should be a proper property instead.

When using a String, not only do you not have invariants, you also have an unrestricted set of operations many of which make no sense whatsoever — or worse, encourage bad practices — for a particular use of String.

Strong Typing

The practice of strong typing goes even further, by adding specific semantics to a type.

It is quite likely that in a given application, the Id used for a cat and a dog is similar: same invariants, same valid set of operations.

Yet, using the same type for both may lead to a cat-lover ending up with a dog in their lap instead, and they won’t be happy.

Applying Strong Typing, two types should be created: CatId and DogId, possibly sharing some code, inheriting from the same base class, etc… but allowing us to differentiate between Cat & Dog when it matters, so that we cannot accidentally mix them up when we do not intend to.
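One way to share the code while keeping the types distinct is a class template parameterized by a tag type (a different technique from the common base class mentioned above). The names Id, raw and adoptCat, and the 64-bit representation, are hypothetical choices for this sketch.

```cpp
#include <cstdint>
#include <type_traits>

// Shared implementation: one template, distinguished only by a tag type.
template <typename Tag>
class Id {
public:
    explicit constexpr Id(std::uint64_t raw) : raw_(raw) {}
    constexpr std::uint64_t raw() const { return raw_; }

    friend constexpr bool operator==(Id a, Id b) { return a.raw_ == b.raw_; }
    friend constexpr bool operator!=(Id a, Id b) { return a.raw_ != b.raw_; }

private:
    std::uint64_t raw_;
};

// The tags only need to be declared; they are never instantiated.
struct Cat;
struct Dog;
using CatId = Id<Cat>;
using DogId = Id<Dog>;

// Same invariants and operations, yet different, non-convertible types.
static_assert(!std::is_same<CatId, DogId>::value, "distinct types");
static_assert(!std::is_convertible<DogId, CatId>::value, "a DogId is not a CatId");

// Hypothetical function: only a CatId is accepted.
constexpr std::uint64_t adoptCat(CatId id) { return id.raw(); }

// adoptCat(DogId{7});  // does not compile: the cat-lover is safe
```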

It’s typically more useful in statically typed languages, obviously, since there type mismatches are reported systematically at compile time.

3

If you have a “Name” class instead of string, you can use it to split into family name and given name, salutation, ordering (that’s why I didn’t say “first name” because for some people the family name comes first), how to call this person (not always the first of the given names, sometimes something totally different). You can add these features bit by bit.

Without any changes in existing code.

4

Given the description, it appears the code base isn’t merely meant to enable loading/saving as XML; it appears to me that it’s designed to faithfully recreate an in-memory representation of the contents of an XML document, down to the tiniest detail.

Hence the complexity: XML is known to be enormously complicated.

Moreover, XML is sometimes used to represent huge documents – an XML document may contain hundreds of millions of tags. I think it should be apparent, in year 2024, that XML is not a good choice. But the designers of an XML library had to account for that, or else they’d need to make their users pay attention to the various system design limits.

Studying such a code base can be rewarding for seasoned programmers, but it could also be detrimental for junior programmers, because such code bases tend not to contain any explanation of the “why’s”. Don’t fall into the trap of worrying too much – it can impede the normal thinking process of a sane person. Only do it when you are well equipped with the knowledge and the concrete use cases (system requirements).

If we know the context of the code base, it is possible to reverse-engineer the design decisions like peeling an onion. If not, we can always ask the programmers who originally created or used the code base.

Working from first principles: all tags have names, and names can be organized into namespaces. Namespaces can be specified as a prefix; aliases can be created. Also, namespaces must be strong-named by specifying a URI.

If we do not consider any namespaces, we should be able to treat name as a value-like class. A value-like class is immutable; being immutable means that from the user’s perspective it is indistinguishable if it’s copied or reference-shared. This simplifies the design.

When namespaces are considered, it is now necessary for users to navigate from a Name to its Namespace, and to iterate through all Names within a given Namespace. Thus, Name and Namespace become relational.

The possibility of very large documents containing hundreds of millions of instances of “names” forces us to contend with the issue of memory consumption. Ideally, if the same name occurs in the document often enough, we would like to have just one C++ instance of this name, so that it can be shared. Replacing actual copies (e.g. std::string) with a pointer (64-bit) is good, but in applications that handle such large amounts of data, reducing to a 64-bit pointer is still not good (small) enough.
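One common way to get below pointer size is string interning: store each distinct name once in a table and hand out a small integer handle instead of a pointer. The sketch below is an assumption about how such a table might look, not a description of what any particular XML library does; NameTable, Handle, intern and lookup are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical interning table: each distinct name gets one 32-bit handle,
// half the size of a 64-bit pointer.
class NameTable {
public:
    using Handle = std::uint32_t;

    // Returns the existing handle if the text was seen before,
    // otherwise stores the text and returns a new handle.
    Handle intern(const std::string& text) {
        auto it = index_.find(text);
        if (it != index_.end()) return it->second;
        const Handle handle = static_cast<Handle>(names_.size());
        names_.push_back(text);
        index_.emplace(text, handle);
        return handle;
    }

    // Throws std::out_of_range for a handle this table never issued.
    const std::string& lookup(Handle handle) const { return names_.at(handle); }

    std::size_t size() const { return names_.size(); }

private:
    // For simplicity the text is kept in both containers; a production
    // table would avoid that duplication.
    std::vector<std::string> names_;                 // handle -> text
    std::unordered_map<std::string, Handle> index_;  // text -> handle
};

static_assert(sizeof(NameTable::Handle) == 4, "handles are 32 bits wide");
```

Comparing two names then reduces to comparing two integers, and a document with millions of identical tag names stores the text only in the table.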

To add to this complexity, some XML frameworks allow in-memory mutability and editing. This means we cannot assume Name to be immutable.

And then the framework may allow multithreaded usage. Combined with mutability, it’s now necessary to implement a threaded mutex.
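A minimal sketch of what mutability plus multithreading forces on the type, assuming a hypothetical SharedName class with get and set; this is the kind of mutex the question noticed inside the Name class.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical mutable name that may be read and written from several threads.
class SharedName {
public:
    explicit SharedName(std::string text) : text_(std::move(text)) {}

    // Returns a copy, so no reference to the guarded string escapes the lock.
    std::string get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return text_;
    }

    void set(std::string text) {
        std::lock_guard<std::mutex> lock(mutex_);
        text_ = std::move(text);
    }

private:
    mutable std::mutex mutex_;  // mutable so that get() can stay const
    std::string text_;
};
```

Note the cost: every read now takes a lock and copies the string, and the class is no longer copyable because std::mutex is not. An immutable Name would need none of this.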

Finally, we add event-driven programming features (callback listeners). I’m not sure why it’s needed, but hey, we’re approaching Michelin one-star, if we just add one more feature across the entire framework.

Such a framework will necessarily contain a lot of boilerplate. In C++, it is common to replace this boilerplate with templates and/or C-style macros. This allows senior programmers to reason about the code at a higher abstraction level; however, it makes the code base less accessible to juniors.

If I were to design this, I’d start by asking which of these aren’t necessary. Imagine you could earn a million dollars for each feature that can be omitted.

2
