std::string (C++) and char* (or c-string "string" for C)
You're not working with strings. You're working with pointers. var1
is a char pointer (const char*
). It is not a string. If it is null-terminated, then certain C functions will treat it as a string, but it is fundamentally just a pointer.
So when you compare it to a char array, the array decays to a pointer as well, and the compiler then tries to find an operator == (const char*, const char*)
.
Such an operator does exist. It takes two pointers and returns true
if they point to the same address. So the compiler invokes that, and your code breaks.
If you want to do string comparisons, you have to tell the compiler that you want to deal with strings, not pointers.
The C way of doing this is to use the strcmp
function:
This will return zero if the two strings are equal. (It will return a value greater than zero if the left-hand side is lexicographically greater than the right-hand side, and a value less than zero otherwise.)
So to compare for equality you need to do one of these:
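As a minimal sketch of both spellings (the helper names here are hypothetical, and both pointers are assumed to be non-null and null-terminated):

```cpp
#include <cstring>

// Hypothetical helper: true when two null-terminated C-strings hold the
// same characters, regardless of where they live in memory.
bool c_strings_equal(const char* lhs, const char* rhs)
{
    // strcmp returns 0 when the strings match character for character.
    return std::strcmp(lhs, rhs) == 0;
}

// The same check in the terser form: a zero result converts to false,
// so negating it yields true for equal strings.
bool c_strings_equal_terse(const char* lhs, const char* rhs)
{
    return !std::strcmp(lhs, rhs);
}
```

Both forms do the same thing; the explicit comparison against zero is usually easier to read.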
However, C++ has a very useful string
class. If we use that your code becomes a fair bit simpler. Of course we could create strings from both arguments, but we only need to do it with one of them:
Now the compiler encounters a comparison between a string and a char pointer. It can handle that, because std::string provides comparison operators that accept a char pointer on either side and compare the characters rather than the addresses. Those behave exactly as you'd expect.
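A minimal sketch of that approach follows. The function name and the literal "dev" are made up for illustration, and var1 is assumed to be a non-null, null-terminated pointer:

```cpp
#include <string>

// Hypothetical helper: var1 mirrors the pointer discussed above, and "dev"
// is an arbitrary example value, not something from the original code.
bool is_dev(const char* var1)
{
    // Wrapping one side in std::string is enough: the comparison now
    // looks at the characters rather than the pointer addresses.
    return std::string(var1) == "dev";
}
```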
--------
Today we'll continue our C-to-C++ migration theme by focusing on std::string
, a container-like class used to manage strings. std::string
provides much more straightforward string management interfaces, allows you to utilize SBRM design patterns, and helps eliminate string management overhead.
Let's start off by reviewing built-in string support in C/C++.
Built-in String Support (C-style Strings)
From here on, I'll refer to this built-in string support as "C-style strings" or "C-strings".
Neither C nor C++ has a default built-in string type. C-strings are simply implemented as a char array which is terminated by a null character ('\0', which has the value 0). This last part of the definition is important: all C-strings are char arrays, but not all char arrays are C-strings.
C-strings of this form are called "string literals":
String literals are indicated by using the double quote ("
) and are stored as a constant (const
) C-string. The null character is automatically appended at the end for your convenience.
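To make the distinction concrete, here is a small sketch (all names are illustrative) showing a literal, an array initialized from a literal, and a char array that is not a C-string:

```cpp
#include <cstddef>
#include <cstring>

// A string literal: the compiler appends the terminating '\0' for us.
const char* greeting = "hello";

// An array initialized from a literal holds the 5 characters plus '\0'.
const char greeting_array[] = "hello";

// A char array that is NOT a C-string: it has no terminating '\0', so it
// must never be passed to functions such as strlen.
const char not_a_string[] = {'h', 'e', 'l', 'l', 'o'};

// strlen counts characters up to, but not including, the '\0'.
std::size_t greeting_length()
{
    return std::strlen(greeting);
}
```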
The standard library contains functions for processing C-strings, such as strlen
, strcpy
, and strcat
. These functions are defined in the C header string.h
and in the C++ header cstring
. These standard C-string functions require the strings to be terminated with a null character to function correctly.
DISADVANTAGES OF C-STRINGS
C arrays do not track their own size. You must keep track of the size on your own or rely on the linear-time strlen
function to determine the length of each string at runtime. Since C has no concept of boundary protection, the null character is of paramount importance: the C library functions require it, or else they operate past the bounds of the array.
Working with C-strings is not intuitive. Functions are required to compare strings, and the output of the strcmp
function is not intuitive either. For functions like strcpy
and strcat
, the programmer is required to remember the correct argument order for each call. Inverting arguments can have a non-obvious yet negative effect.
Many C-strings are used as fixed-size arrays. This is true for literals as well as arrays that are declared in the form char str[32]
. For dynamically sized strings, programmers must worry about manually allocating, resizing, and copying strings.
The concept of C-string size/length is not intuitive and commonly results in off-by-one bugs. The null character that marks the end of a C-string requires a byte of storage in the char
array. This means that a string of length 24 needs to be stored in a 25-byte char
array. However, the strlen
function returns the length of the string without the null character. This simple fact has tripped up many programmers (including myself) when copying around memory. Eventually, you end up with a non-null-terminated string, causing the string library functions to operate out-of-bounds.
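The classic place this bites is when duplicating a string. A minimal sketch, using a hypothetical helper name, looks like this:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a heap copy of src, or nullptr if the
// allocation fails. The caller must release the result with std::free.
char* duplicate_c_string(const char* src)
{
    // strlen reports the length WITHOUT the terminating '\0'.
    const std::size_t length = std::strlen(src);

    // The buffer therefore needs length + 1 bytes.
    char* copy = static_cast<char*>(std::malloc(length + 1));
    if (copy != nullptr) {
        // Copy length + 1 bytes so the terminator comes along too.
        std::memcpy(copy, src, length + 1);
    }
    return copy;
}
```

Dropping either "+ 1" is the off-by-one bug described above: the first overflows the buffer, the second leaves the copy without a terminator.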
What If We Could Fix Those Disadvantages?
What if we could fix those disadvantages? What would our ideal string use-case look like? Here are some ideas:
Flexible storage capacity
Constant-time string length retrieval (rather than a linear-time functional check)
No need to worry about manual memory management or resizing strings
Boundary issues are handled for me, with or without a null character
Intuitive assignment using the = operator rather than strcpy
Intuitive comparison using the == operator rather than strcmp
Intuitive interfaces for other operations such as concatenation (the + operator is nice!) and tokenization
std::string
Luckily, the C++ std::string
class scratches most of these itches for us. Fundamentally, you can consider std::string
as a container for handling char
arrays, similar to std::vector<char>
with some specialized function additions.
The std::string class manages the underlying storage for you, storing your strings in a contiguous manner. You can get access to this underlying buffer using the c_str() member function, which will return a pointer to a null-terminated char array. This allows std::string to interoperate with C-string APIs.
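As a quick sketch of that interoperability, here is a hypothetical helper that hands a std::string to a C function (strlen stands in for any C-string API):

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: passes the managed buffer to a C-string function.
std::size_t length_via_c_api(const std::string& text)
{
    // c_str() yields a const char* that is guaranteed to be null-terminated.
    return std::strlen(text.c_str());
}
```

Keep in mind that the pointer returned by c_str() is only valid while the string is alive and unmodified, so avoid storing it for later use.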
Let's take a look at using std::string
.
DECLARATION AND ASSIGNMENT
Declaring a std::string
object is simple:
You can also initialize it with a C-string:
Or initialize it by copying another std::string
object:
Or even by making a substring out of another std::string
:
There's also a "fill" constructor for std::string
which allows you to populate the buffer with a repeated series of characters:
Assigning values to a std::string
is also simple, as you just need to use the =
operator:
Isn't this so much easier than using strcpy
?
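Here is a short sketch that runs through each of the forms described above. The struct, function name, and string contents are all made up for illustration:

```cpp
#include <string>

// Hypothetical container so the example can hand back every string it builds.
struct StringExamples {
    std::string empty;
    std::string from_c_string;
    std::string copy;
    std::string sub;
    std::string filled;
    std::string assigned;
};

StringExamples make_examples()
{
    std::string empty;                         // default: an empty string
    std::string from_c_string("hello world");  // initialized from a C-string
    std::string copy(from_c_string);           // copy of another std::string
    std::string sub(from_c_string, 6, 5);      // substring: position 6, length 5
    std::string filled(3, '!');                // fill: three '!' characters

    std::string assigned;
    assigned = "assigned later";               // assignment with operator=

    return {empty, from_c_string, copy, sub, filled, assigned};
}
```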
COMPARING STRINGS
Comparing strings for equality using std::string
is also much more intuitive, as the ==
operator has been overloaded for comparison:
The use of the ==
operator works as long as one of the values is a std::string
. This means we can compare the std::string
to a string literal:
You can also compare strings lexicographically using the other comparison operators (< and >):
If you're not familiar with lexicographical ordering, it is the ordering by ASCII values of the characters. In ASCII, all upper case letters come before the lower case letters, so "apple" > "Apple".
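The comparisons described in this section can be sketched as follows. The helper names are hypothetical, and the ordering example assumes an ASCII-compatible character set:

```cpp
#include <string>

// Equality between two std::string objects.
bool same_text(const std::string& a, const std::string& b)
{
    return a == b;
}

// Equality against a string literal: only one side needs to be a std::string.
bool is_apple(const std::string& text)
{
    return text == "apple";
}

// Lexicographical ordering: true when a sorts before b.
bool sorts_before(const std::string& a, const std::string& b)
{
    return a < b;
}
```

With these helpers, sorts_before("Apple", "apple") is true, since 'A' has a smaller value than 'a'.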
If you prefer a functional comparison interface, std::string
also provides a compare
function. This function is similar to strcmp
:
0 indicates equality
Positive values indicate that the second string comes first lexicographically
Negative values mean your string object comes first lexicographically
You can also compare substrings of two different string objects by passing a starting position and a length for each one.
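A minimal sketch of both uses of compare follows. The helper names and parameter names are my own; only the sign of the result is meaningful, so the first helper normalizes it:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: reduces the result of compare() to -1, 0, or 1.
int compare_sign(const std::string& lhs, const std::string& rhs)
{
    const int result = lhs.compare(rhs);
    return (result > 0) - (result < 0);
}

// Hypothetical helper: compares a substring of lhs with a substring of rhs.
// Each substring is described by a starting position and a length.
bool substrings_equal(const std::string& lhs, std::size_t lhs_pos, std::size_t lhs_len,
                      const std::string& rhs, std::size_t rhs_pos, std::size_t rhs_len)
{
    return lhs.compare(lhs_pos, lhs_len, rhs, rhs_pos, rhs_len) == 0;
}
```

Note that the substring form throws std::out_of_range if a starting position is past the end of its string.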
CONCATENATING STRINGS
I'm sure at this point you won't be surprised: concatenating two strings is a trivial operation that involves using the +
operator:
If you prefer a functional interface, std::string
also provides an append
function. Each of these functions appends something onto the end of your std::string
object:
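Both styles can be sketched like this (function names and contents are illustrative only):

```cpp
#include <string>

// Concatenation with operator+: builds and returns a new string.
std::string full_name(const std::string& first, const std::string& last)
{
    return first + " " + last;
}

// Concatenation with append(): modifies the string in place.
std::string build_with_append()
{
    std::string text("abc");
    text.append("def");               // append a C-string
    text.append(std::string("ghi"));  // append another std::string
    text.append(2, '!');              // append two copies of a character
    text += "?";                      // operator+= also appends in place
    return text;
}
```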
ACCESSING CHARACTERS
Similar to C-strings, std::string
supports the indexing operator []
to access specific characters. Just as with C-strings and arrays, indexing starts at 0. As with other containers, the indexing operator does not support bounds checking. If you wish to have bounds checking applied, you can use the at() member function, which throws std::out_of_range for an invalid index.
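Here is a small sketch contrasting the two access styles. The helper names are hypothetical, and the second helper assumes exceptions are enabled in your build:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Unchecked access: the caller must guarantee that the string is non-empty.
char first_char(const std::string& text)
{
    return text[0];
}

// Checked access: returns fallback instead of letting the exception escape.
char char_at_or(const std::string& text, std::size_t index, char fallback)
{
    try {
        return text.at(index);  // throws std::out_of_range for a bad index
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```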
OTHER STD::STRING INTERFACES
std::string provides many other useful interfaces. I'll just provide a brief overview of functionality - full interface documentation can be found at cppreference.
For handling storage:
size() and length() both return the length of the std::string; size() is provided to maintain a common interface with container classes
capacity() returns the current number of characters that can be held in the currently allocated storage
empty() returns true if a string is currently empty
clear() resets the container to an empty string
reserve() requests that the underlying storage buffer grow to at least the requested capacity, without changing the string's length
resize() changes the length of the string itself, either truncating it or appending characters (optionally a specific fill character)
shrink_to_fit() requests that the buffer shrink to the current string size, freeing up unused storage capacity
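The storage functions above can be sketched as a short walkthrough. The function names and values are made up, and the exact capacity you observe depends on your standard library:

```cpp
#include <string>

// Walks through the storage-related calls and returns the final string.
std::string storage_walkthrough()
{
    std::string text("abc");  // size() and length() both report 3
    text.reserve(64);         // capacity() is now at least 64, size() is still 3
    text.resize(5, '-');      // grows the string itself, padding with '-'
    text.resize(2);           // shrinks the string to its first two characters
    text.shrink_to_fit();     // non-binding request to release unused capacity
    return text;
}

// clear() leaves the string empty.
bool is_empty_after_clear(std::string text)
{
    text.clear();
    return text.empty();
}
```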
For modifying strings:
insert() inserts characters or strings at a specific position
replace() replaces characters in a substring
push_back() appends a character to the end of the string
pop_back() removes the last character of the string
erase() removes specific characters
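A brief sketch of these modifiers in sequence, using arbitrary example text:

```cpp
#include <string>

// Applies each modifier in turn and returns the final string.
std::string modify_walkthrough()
{
    std::string text("hello world");
    text.insert(5, ",");          // insert at position 5: "hello, world"
    text.replace(7, 5, "there");  // replace 5 chars from position 7: "hello, there"
    text.push_back('!');          // append one character: "hello, there!"
    text.pop_back();              // remove the last character: "hello, there"
    text.erase(5, 1);             // remove 1 char at position 5: "hello there"
    return text;
}
```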
For working with substrings:
substr() returns a copy of the substring at the specified position
find() identifies the first position within a string where the specified character or substring can be found
rfind() finds the last occurrence of a character or substring
find_first_of() finds the first character that matches any character in a given set
find_last_of() finds the last character that matches any character in a given set
find_first_not_of() finds the first character that is not in a given set
find_last_not_of() finds the last character that is not in a given set
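To show how a few of these fit together, here is a sketch of two hypothetical helpers: one splits on a separator with find() and substr(), and the other trims spaces with find_first_not_of() and find_last_not_of(). Each search function returns std::string::npos when nothing is found.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text before the first separator,
// or the whole string if the separator does not appear.
std::string before_separator(const std::string& text, char separator)
{
    const std::size_t position = text.find(separator);
    if (position == std::string::npos) {
        return text;
    }
    return text.substr(0, position);
}

// Hypothetical helper: removes leading and trailing spaces.
std::string trim_spaces(const std::string& text)
{
    const std::size_t first = text.find_first_not_of(' ');
    if (first == std::string::npos) {
        return "";  // the string is empty or contains only spaces
    }
    const std::size_t last = text.find_last_not_of(' ');
    return text.substr(first, last - first + 1);
}
```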
Remember, full documentation can be found on cppreference.com
A Note on Avoiding Copy Overhead
Unless you want to make a copy of your std::string
, you will want to avoid passing around strings by value:
Instead, you should pass the argument by reference if you want to modify the string:
Or by const
reference if the string will not be modified:
I rarely find myself passing around std::string
containers by value, since I want to avoid the unnecessary copies.
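The three parameter styles can be sketched side by side (function names are illustrative):

```cpp
#include <cstddef>
#include <string>

// By value: the caller's string is copied into text on every call.
std::size_t length_by_value(std::string text)
{
    return text.size();
}

// By reference: no copy, and changes are visible to the caller.
void add_suffix(std::string& text)
{
    text += "_suffix";
}

// By const reference: no copy, and the function cannot modify the string.
std::size_t length_by_const_ref(const std::string& text)
{
    return text.size();
}
```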
When Should I Use std::string?
Great, now we have some idea of what we can do with a std::string. When and why should I use std::string over C-strings?
Let's consider some of the advantages to using std::string
:
Ability to utilize SBRM design patterns
The interfaces are much more intuitive to use, leading to fewer chances of messing up argument order
Better searching, replacement, and string manipulation functions (cf. the cstring library)
The size/length functions are constant time (cf. the linear-time strlen function)
Reduced boilerplate by abstracting memory management and buffer resizing
Reduced risk of segmentation faults by utilizing iterators and the at() function
Compatible with STL algorithms
In general, std::string provides a modern interface for string management and will help you write much more straightforward code than C-strings. Prefer std::string to C-strings, especially for mutable strings.
std::string Limitations
There's storage overhead involved with using a std::string object. C-strings are the simplest possible storage method for a string, making them attractive in situations where memory must be conserved. However, similar to other C++ containers, I find that this minor overhead is worth the convenience.
When utilizing a std::string
, memory must be dynamically allocated and initialized during runtime. You cannot pre-allocate a std::string
buffer during compile-time and you cannot supply a pre-existing buffer for std::string
to assume ownership over. Unlike std::string
, C-strings can utilize compile-time allocation and determination of size. Additionally, memory allocation is handled by the std::string
class itself. If you need fine-grained control of memory management, look to manual management with C-strings.
One major gripe I have with std::strings
is that they don't play nicely with string literals. String literals are placed in static storage and cannot be taken over by a std::string
. Initializing a std::string
using a string literal will always involve a copy. C-strings still seem to be the best storage option for string literals, especially if you want to avoid unnecessary copies (such as in an embedded environment).
C++17: std::string_view
If you are using C++17, you can avoid memory allocation and still enjoy the C++ string
interfaces by using std::string_view
. The entire purpose of std::string_view
is to avoid copying data which is already owned and of which only a fixed view is required. A std::string_view
can refer to either a C++ string
or a C-string. All that std::string_view
needs to store is a pointer to the character sequence and a length.
std::string_view provides most of the read-only API that std::string does, so it is a good match for C-style string literals.
The only catch with std::string_view
is that it is non-owning, so the programmer is responsible for making sure the std::string_view
does not outlive the string which it points to. Embedded applications are mostly interested in forcing static memory allocations, so there is little worry about lifetime problems when using string literals with std::string_view
.
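A minimal sketch of this idea, assuming a C++17 compiler (the constant and helper names are hypothetical):

```cpp
#include <string>
#include <string_view>

// A view over a string literal: no allocation and no copy. The literal has
// static storage duration, so this view can never dangle.
constexpr std::string_view device_name = "sensor-01";

// Hypothetical helper: accepts literals, C-strings, and std::string objects
// alike without copying any of them.
bool has_prefix(std::string_view text, std::string_view prefix)
{
    // substr on a view is cheap: it only adjusts a pointer and a length.
    return text.substr(0, prefix.size()) == prefix;
}
```

Taking std::string_view by value is intentional here, since a view is just a pointer and a length.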
Putting it All Together
I've written a basic std::string
example which can be found in the embedded-resources
GitHub repository.