What is a ‘Null Terminated String’?


I have just started learning C++ and came across the term “Null Terminated String”. I read about it but couldn’t understand what it actually means.

I also want to know the difference between a C string (also referred to as a “null-terminated string”) and a C++ string.

0

Short answer: a null terminated string is a char array with a null value (0x00) after the last valid character in the string.


Long Answer:

It’s important to remember that C and C++ do not initialize local variables for you unless you ask for it. A local array declared without an initializer holds indeterminate values.

A basic string in C or C++ (without STL) is simply an array of characters.

char myString[25];

At this point in time, we have no idea what’s within that string. It could be empty; it could have garbage characters (most likely); or it could have meaningful information. It all depends upon what was in that memory segment before the array was declared.

Note that we have 24 characters of storage here, and the null will take the 25th character.

It’s common practice to pre-fill and clear a string with nulls to get rid of any garbage.

memset(myString, 0x00, 25);

Note that in this case I’m using the hexadecimal literal 0x00 for the null character, which can also be written as '\0' or plain 0. Don’t confuse it with the NULL macro defined by the standard headers: NULL is a null pointer constant, not a character.

Many of the basic string functions like strcmp, strcat, etc. rely on the terminating null to find the end of the string. If the string isn’t terminated, the function can run off the end of the array and not behave as you would expect.
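To make the clearing step concrete, here is a minimal sketch. The helper name length_after_clearing is made up for illustration; it zero-fills the same 25-byte buffer, copies at most 24 characters into it, and asks strlen how long the result is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the answer above): clear a 25-byte buffer,
// copy a string into it, and return the length that strlen reports.
std::size_t length_after_clearing(const char* text) {
    assert(text != nullptr);
    char myString[25];
    std::memset(myString, 0, sizeof myString);          // every byte is now '\0'
    std::strncpy(myString, text, sizeof myString - 1);  // copy at most 24 chars, keep byte 25 as '\0'
    return std::strlen(myString);                       // counts up to the first '\0'
}
```

With this sketch, length_after_clearing("hello") gives 5, and a 30-character input gives 24, because the last byte of the buffer is never overwritten and still acts as the terminator.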

The C++ STL string is an actual object and takes care of some of those initialization / termination concerns for you.

2

In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character ('\0', called NUL in ASCII).

http://en.wikipedia.org/wiki/Null-terminated_string

1

There are some excellent answers in this thread, but I’d like to add one meant for a person who learned computer programming starting from a strongly-typed language like Java or C#, and never programmed in a weakly-typed language like C or C++.

(Note that I’m talking about strong vs. weak typing, not dynamic vs. static typing. The exact definition of weak typing is a fascinating discussion on its own, but outside the scope of this answer 🙂)

To understand null-terminated strings we need to start from how data is stored in weakly-typed systems. In these systems the entire memory is just one big sequence of bytes and the program has access to any of these bytes at any time. It is up to the program to interpret the bytes correctly. For instance, when the program needs to read a 32-bit integer at address A1, it reads 4 bytes starting at address A1 and interprets them as a single 32-bit integer. It knows that a 32-bit integer is 4 bytes in size, so it doesn’t need any marker for where the integer ends.
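As a rough sketch of the fixed-size case, the hypothetical helper read_u32 below (the name is an assumption, not something from this answer) reads exactly 4 bytes from an address and treats them as one 32-bit integer. No end marker is needed because the size is known up front.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: interpret the 4 bytes starting at `address`
// as a single 32-bit unsigned integer.
std::uint32_t read_u32(const unsigned char* address) {
    assert(address != nullptr);
    std::uint32_t value;
    std::memcpy(&value, address, sizeof value);  // always copies exactly 4 bytes
    return value;
}
```

Which integer the four bytes represent depends on the byte order of the machine, but the number of bytes read never does.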

This is not true of text strings, which in most languages can be of arbitrary length and are typically stored as one byte per character (or two bytes per character in some Unicode encodings). Thus knowing the starting address of a string does not tell the program where the string ends. Keep in mind: in weakly-typed languages nothing stops the program from reading memory beyond the end of the string and interpreting the bytes stored after it as further characters.

So in order to read a text string at address A2 the program needs a way of knowing how long the string is, so that it knows how many bytes it should read. Some languages will deal with it by storing the size of the text string in the first byte (or 2 or even 4 bytes). A string “foo” might be 4 bytes long and look like this:

3 102 111 111

where 3 is the length of the string and 102 and 111 are the ASCII codes for the characters ‘f’ and ‘o’. This is pretty simple, but it limits the maximum length of any string, in this case to 255 characters (since 255 is the maximum integer value that can be stored in the single byte we used to keep the length of the string).
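The length-prefixed layout can be sketched as follows. The function name encode_length_prefixed and its signature are assumptions made for this illustration; it writes one length byte followed by the characters, and refuses anything longer than 255 characters.

```cpp
#include <string>
#include <vector>

// Hypothetical encoder: one length byte, then the characters themselves.
// Returns false when the length does not fit into a single byte.
bool encode_length_prefixed(const std::string& text, std::vector<unsigned char>& out) {
    if (text.size() > 255) {
        return false;  // 255 is the largest value one byte can hold
    }
    out.clear();
    out.push_back(static_cast<unsigned char>(text.size()));  // the length prefix
    for (char c : text) {
        out.push_back(static_cast<unsigned char>(c));         // the character codes
    }
    return true;
}
```

Encoding "foo" this way produces the four bytes 3 102 111 111, matching the layout shown above.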

Another way of dealing with this problem is marking the end of the string, and this is exactly what a null-terminated string does. It uses the null character (NUL), which has the ASCII value 0 (zero). So the same string “foo” might look like this:

102 111 111 0

Note that in this case there is no limit on the length of a string that can be represented in this format, and the overhead of the representation is always exactly one byte (the final zero). Obviously, text strings containing the null character cannot be represented as null-terminated strings at all.
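Reading such a string means walking forward until the zero byte shows up. The sketch below uses a made-up helper, count_until_nul, that does roughly what the standard strlen does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper that mimics strlen: count bytes until the first 0.
std::size_t count_until_nul(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') {  // the 0 byte is the only end-of-string signal
        ++n;
    }
    return n;
}
```

For the bytes 102 111 111 0 this returns 3. For "ab\0cd" it returns 2, which shows why a string containing the null character cannot be stored in this format: everything after the first zero is invisible to the reader.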

1

Null-terminated strings are not like strings in most other languages. They are the standard way to represent strings in C, as nothing more than an array of characters in sequential order. In a language like C++, a string is an actual object that keeps track of its own length and manages its own storage.

The problem with this array of characters is, how do you know when to stop reading, where the end of the string is? Since the null character isn’t used for anything else, it’s used to terminate the string, i.e. mark the endpoint.

2

To answer the second question first, a C++-string is an instance of the class std::string that is part of the C++ standard library.
A c-string (or c-style string, or NUL-terminated string) is a sequence of characters that ends at the first '\0' (ASCII NUL) character.

One important difference is that a std::string can contain embedded NUL characters within its contents, but a C-style string by definition can not (as it ends at the first NUL character).
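A small sketch of that difference, using two illustrative helpers whose names are made up for this example: one builds a std::string with a NUL in the middle, the other measures the same bytes the way the C library would.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a 5-character std::string whose third character is NUL.
// The explicit length argument keeps the bytes after the '\0'.
std::string make_with_embedded_nul() {
    return std::string("ab\0cd", 5);
}

// Length of the same data as a C-style string: stops at the first NUL.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Here make_with_embedded_nul().size() is 5, while c_length of the same string is 2, since strlen stops at the embedded NUL.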

So, the term ‘NUL terminated string’ (often “misspelled” as null-terminated) comes from the fact that such a string ends in (is terminated by) a NUL ('\0') character.

A null-terminated string is a sequence of characters with a trailing 0-valued character. So a string like "Hi" is represented as the sequence {72, 105, 0} (ASCII). The 0 is a sentinel value that indicates the end of the string. The C string library functions (strcmp, strcpy, etc.) rely on the presence of that 0 byte to operate correctly.
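To see the sentinel for yourself, a helper along these lines can dump the bytes of a C string including the terminator. The name bytes_with_terminator is hypothetical, and the numeric values assume an ASCII-compatible character set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the byte values of a C string,
// including the trailing 0 that marks its end.
std::vector<int> bytes_with_terminator(const char* s) {
    assert(s != nullptr);
    std::vector<int> out;
    for (std::size_t i = 0;; ++i) {
        out.push_back(static_cast<unsigned char>(s[i]));
        if (s[i] == '\0') {
            break;  // stop right after recording the sentinel
        }
    }
    return out;
}
```

Called with "Hi", this yields the three values 72, 105, 0. That is also why sizeof("Hi") is 3 rather than 2.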

This is different from Pascal or old-school BASIC strings that stored the string length in the leading byte ({2, 72, 105}).

In C, strings are stored in arrays of char.

The C++ string class uses null-terminated strings under the hood (at least in the implementations I’m familiar with), but its interface is such that you don’t normally have to worry about that level of detail.
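One place where that detail does surface is c_str(), which hands back a null-terminated view of the contents for use with C functions. Since C++11 the standard guarantees the terminator is there. The check below is only a sketch, and the function name is invented for this example.

```cpp
#include <string>

// Returns true when the buffer exposed by c_str() has a 0 byte
// directly after the last character of the string.
bool c_str_is_terminated(const std::string& s) {
    const char* p = s.c_str();
    return p[s.size()] == '\0';  // index size() is the terminator position
}
```

This holds for any std::string, including the empty one, where c_str() points at a lone '\0'.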
