We have several medium to large software projects targeting different platforms (arm32/64 and x86/amd64 on Linux and Windows) with a lot of duplicated code in them, since every project has its own split(string, delimiter), find_files(regex/pattern), and stuff like that.
Because of that we’re fixing identical bugs over and over again in different projects and wasting time inventing yet another variant in the next project.
So I’m working on small, platform “independent” libraries just for utility functions like these (string stuff, sockets, filesystem, time, …).
Now to the actual problem.
The string library currently provides lots of functions taking std::string as parameters and returning std::string (or containers containing them).
I would like to replace them with std::string_view where appropriate. But I’m not sure where that would be appropriate and how.
Functions like
std::vector<std::string> split(const std::string& str, char delimiter);
could be
std::vector<std::string_view> split(std::string_view str, char delimiter);
The std::string_view parameter makes sense here, but the return type would force callers to handle it differently, making it more complicated when the caller wants to store the results (i.e. needs std::string).
Does it make sense to provide multiple variants like
std::vector<std::string> split(std::string_view str, char delimiter);
std::vector<std::string_view> split_v(std::string_view str, char delimiter);
?
What should be the default? (I.e. would it make more sense to have split return string_view and provide a split_s that returns string?) Or is there a completely different way? What would be a good approach in general, regarding both performance and ease of use (and thus acceptance), and making it hard to misuse?
In C++ and its standard libraries, you will find many “string-like” types, such as std::string, std::string_view, std::wstring, char *, and so on. This old blog post from 2008 already mentioned 30 types (in a Windows legacy context, but without std::string_view, which is C++17). I am sure the number has multiplied since then.
How does this help us solve your problem? Well, given that huge number of available string types, I think that when designing general-purpose functions it is best to focus on one type only, as long as
- this makes sense
- it does not come with a huge performance impact.
So I would recommend starting with std::vector<std::string_view> as the return type.
This gives callers the freedom to convert explicitly to another string-like type afterwards, or not, or to add extra helper functions of their own. For such extra functions, a name suffix like _sv or _strview should be totally acceptable.
Of course, when it turns out that in your system, you need a specific variant of “split-stringview-with-conversion-to-string” a thousand times, but “split-stringview-without-conversion” nowhere in the whole code base, you may make a different decision.
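To make the recommendation concrete, here is a minimal sketch (C++17) of a split that returns views, plus a separate conversion helper for callers who need owning strings. The helper name to_strings is hypothetical and not taken from the question, and the handling of empty fields is just one possible choice.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Returns views into `str`. The caller must keep the underlying
// character buffer alive for as long as the views are used.
std::vector<std::string_view> split(std::string_view str, char delimiter) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = str.find(delimiter, start);
        if (pos == std::string_view::npos) {
            parts.push_back(str.substr(start)); // last (possibly empty) field
            break;
        }
        parts.push_back(str.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}

// Hypothetical helper: copies every view into an owning std::string.
std::vector<std::string> to_strings(const std::vector<std::string_view>& views) {
    std::vector<std::string> result;
    result.reserve(views.size());
    for (std::string_view v : views) {
        result.emplace_back(v); // explicit copy, one allocation per field at most
    }
    return result;
}
```

A caller that only inspects the fields uses split(line, ',') directly; a caller that has to store them writes to_strings(split(line, ',')), which keeps the copy visible at the call site.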
std::string_view and std::string are two completely different kinds of types.
- std::string_view is a reference type, or if you will, a view.
  Like all of them, it doesn’t own its data. Instead, it relies on you keeping the data around for it to refer to until you are done with it.
  The big advantage is how cheap it is to create and copy; the disadvantage is having to keep track of the lifetime. (Not guaranteeing nul-termination doesn’t matter here.)
- std::string is a non-trivial value type.
  Like all of them, it owns its data, and is thus expensive to create. As it doesn’t employ copy-on-write (COW), it is equally expensive to copy, though move semantics enable cheap moving.
  The big advantage is being self-contained; the disadvantage is needing an expensive allocation. Additionally, many libraries you might want to interface with use their own incompatible though inter-convertible string type, which means creating a copy.
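The lifetime point is the one that tends to bite in practice, so here is a small sketch of it. The functions first_field and owning_copy_of_first_field are hypothetical names made up for this illustration.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns the part of `s` before the first `delimiter`
// as a view into the caller's buffer (no allocation). If the delimiter is
// not found, the whole input is returned.
std::string_view first_field(std::string_view s, char delimiter) {
    return s.substr(0, s.find(delimiter));
}

// Safe usage: `line` outlives the view, and a copy is made only at the
// point where ownership is actually needed.
std::string owning_copy_of_first_field(const std::string& line) {
    std::string_view field = first_field(line, ';'); // cheap, refers to `line`
    return std::string(field);                        // explicit, owning copy
}

// Unsafe pattern, left as a comment on purpose:
//   std::string_view v = first_field(std::string("a;b"), ';');
//   // The temporary std::string is destroyed at the end of that statement,
//   // so `v` now refers to freed storage.
```

The view-returning function stays cheap for callers that only look at the data, while the conversion to std::string happens exactly once, where the caller decides it needs to own the result.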
In summary, returning a reference type avoids the work involved in creating a value type, and allows the caller to convert the result into whatever format is convenient, as needed, without duplicating work.
The single std::vector around all of them doesn’t significantly impact the balance.
Only if you always need the same type should you change the interface, before it becomes bothersome.