# std::string (C++) and char\* (or c-string "string" for C)

You're not working with strings. You're working with pointers. `var1` is a char pointer (`const char*`). It is not a string. If it is null-terminated, then certain C functions will *treat* it as a string, but it is fundamentally just a pointer.

So when you compare it to a char array, the array decays to a pointer as well, and the compiler then tries to find an `operator == (const char*, const char*)`.

Such an operator does exist. It takes two pointers and returns `true` if they point to the same address. So the compiler invokes that, and your code breaks.

If you want to do string comparisons, you have to tell the compiler that you want to deal with *strings*, not *pointers*.

The C way of doing this is to use the `strcmp` function:

```
strcmp(var1, "dev");
```

This will return zero if the two strings are *equal*. (It will return a value greater than zero if the left-hand side is lexicographically greater than the right-hand side, and a value less than zero otherwise.)

So to compare for equality you need to do one of these:

```
if (!strcmp(var1, "dev")){...}
if (strcmp(var1, "dev") == 0) {...}
```
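
To make the difference between comparing pointers and comparing strings concrete, here is a minimal sketch. The helper names `same_address` and `same_content` are made up for illustration and are not part of any library:

```cpp
#include <cstring>

// Compares the pointers themselves: true only if both point at the same address.
bool same_address(const char* a, const char* b) {
    return a == b;
}

// Compares the characters the pointers refer to.
// Assumes both arguments are non-null and null-terminated.
bool same_content(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

Given `char a[] = "dev";` and `char b[] = "dev";`, `same_address(a, b)` is `false` because the two arrays live at different addresses, while `same_content(a, b)` is `true`.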

However, C++ has a very useful `string` class. If we use that your code becomes a fair bit simpler. Of course we could create strings from both arguments, but we only need to do it with one of them:

```
const char* value = getenv("myEnvVar");
std::string var1 = value ? value : ""; // getenv returns a null pointer if the variable is unset

if(var1 == "dev")
{
   // do stuff
}
```

Now the compiler encounters a comparison between a string and a char pointer. It can handle that, because `std::string` provides `operator==` overloads that accept a `const char*` on either side and compare the characters rather than the addresses. Those behave exactly as you'd expect.

\--------

Today we'll continue our C-to-C++ migration theme by focusing on `std::string`, a container-like class used to manage strings. `std::string` provides much more straightforward string management interfaces, allows you to utilize SBRM design patterns, and helps eliminate string management overhead.

Let's start off by reviewing built-in string support in C/C++.

## Built-in String Support (C-style Strings) <a href="#built-in-string-support-c-style-strings" id="built-in-string-support-c-style-strings"></a>

From here on, we'll refer to this built-in support as "C-style strings" (or C-strings).

Neither C nor C++ has a default built-in string type. C-strings are simply implemented as a `char` array which is terminated by a null character (`'\0'`, i.e. the value `0`). This last part of the definition is important: all C-strings are `char` arrays, but not all `char` arrays are C-strings.

C-strings of this form are called "[string literals](http://en.cppreference.com/w/cpp/language/string_literal)":

```
const char * str = "This is a string literal. See the double quotes?";
```

[String literals](http://en.cppreference.com/w/cpp/language/string_literal) are indicated by using the double quote (`"`) and are stored as a constant (`const`) C-string. The null character is automatically appended at the end for your convenience.

The standard library contains functions for processing C-strings, such as `strlen`, `strcpy`, and `strcat`. These functions are defined in the C header `string.h` and in the C++ header `cstring`. These standard C-string functions require the strings to be terminated with a null character to function correctly.

### DISADVANTAGES OF C-STRINGS <a href="#disadvantages-of-c-strings" id="disadvantages-of-c-strings"></a>

C arrays do not track their own size. You must keep up with size on your own or rely on the linear-time `strlen` function to determine the size of each string during runtime. Since C has no concept of boundary protection, the use of the null character is of paramount importance: the C library functions require it, or else they operate past the bounds of the array.

Working with C-strings is not intuitive. Functions are required to compare strings, and the return value of the `strcmp` function is not intuitive either, since zero means "equal". For functions like `strcpy` and `strcat`, the programmer is required to remember the correct argument order for each call. Inverting arguments can have a non-obvious yet negative effect.

Many C-strings are used as fixed-size arrays. This is true for literals as well as arrays that are declared in the form `char str[32]`. For dynamically sized strings, programmers must worry about manually allocating, resizing, and copying strings.

The concept of C-string size/length is not intuitive and commonly results in off-by-one bugs. The null character that marks the end of a C-string requires a byte of storage in the `char` array. This means that a string of length 24 needs to be stored in a 25-byte `char` array. However, the `strlen` function returns the length of the string *without the null character*. This simple fact has tripped up many programmers (including myself) when copying around memory. Eventually, you end up with a non-null-terminated string, causing the string library functions to operate out-of-bounds.
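
The "length plus one" rule is easier to see in code. The following is a minimal sketch; `copy_cstr` is a hypothetical helper written for this example, not a standard library function:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dst only if the text AND its
// terminating null character fit into dst_size bytes.
bool copy_cstr(char* dst, std::size_t dst_size, const char* src) {
    const std::size_t len = std::strlen(src);  // length WITHOUT the null character
    if (len + 1 > dst_size) {
        return false;                          // not enough room for text + '\0'
    }
    std::memcpy(dst, src, len + 1);            // + 1 copies the null character too
    return true;
}
```

With a 4-byte buffer, copying `"abc"` succeeds (3 characters plus the null character), while copying `"abcd"` is rejected even though `strlen("abcd")` is 4.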

## What If We Could Fix Those Disadvantages? <a href="#what-if-we-could-fix-those-disadvantages" id="what-if-we-could-fix-those-disadvantages"></a>

What if we could fix those disadvantages? What would our ideal string use-case look like? Here are some ideas:

* Flexible storage capacity
* Constant-time string length retrieval (rather than a linear-time functional check)
* No need to worry about manual memory management or resizing strings
* Boundary issues are handled for me, with or without a null character.
* Intuitive assignment using the `=` operator rather than `strcpy`
* Intuitive comparison using the `==` operator rather than `strcmp`
* Intuitive interfaces for other operations such as concatenation (`+` operator is nice!), tokenization

## `std::string` <a href="#std-string" id="std-string"></a>

Luckily, the C++ [`std::string`](http://en.cppreference.com/w/cpp/string/basic_string) class scratches most of these itches for us. Fundamentally, you can consider `std::string` as a container for handling `char` arrays, similar to `std::vector<char>` with some specialized function additions.

The `std::string` class manages the underlying storage for you, storing your strings in a contiguous manner. You can get access to this underlying buffer using the `c_str()` member function, which will return a pointer to a null-terminated `char` array. This allows `std::string` to interoperate with C-string APIs.
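
As a small sketch of that interoperability, the hypothetical function below (the name `length_via_c_api` is mine) hands a `std::string` to a C API that expects a null-terminated `const char*`:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Passes the std::string's buffer to a C function (std::strlen here) via c_str().
std::size_t length_via_c_api(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Keep in mind that the pointer returned by `c_str()` is only valid while the string is alive and unmodified, so avoid storing it for later use.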

Let's take a look at using `std::string`.

### DECLARATION AND ASSIGNMENT <a href="#declaration-and-assignment" id="declaration-and-assignment"></a>

Declaring a `std::string` object is simple:

```
std::string my_str;
```

You can also initialize it with a C-string:

```
std::string name("Phillip");
```

Or initialize it by copying another `std::string` object:

```
std::string name2(name);
```

Or even by making a substring out of another `std::string`:

```
std::string lip(name, 4);
```

There's also a "fill" constructor for `std::string` which allows you to populate the buffer with a repeated series of characters:

```
// fill the string with a char. note the single quotes
std::string filled(16, 'A');
```

Assigning values to a `std::string` is also simple, as you just need to use the `=` operator:

```
// c-string assignment
my_str = "Phillip";

// Copy assignment
my_str = filled;

// Move assignment
my_str = std::move(name2);
```

Isn't this so much easier than using `strcpy`?

### COMPARING STRINGS <a href="#comparing-strings" id="comparing-strings"></a>

Comparing strings for equality using `std::string` is also much more intuitive, as the `==` operator has been overloaded for comparison:

```
if(my_str == name2)
{
    std::cout << "my_str and name2 match!" << std::endl;
}
```

The use of the `==` operator works as long as one of the values is a `std::string`. This means we can compare the `std::string` to a string literal:

```
if(my_str == "Phillip")
{
    std::cout << "my_str and \"Phillip\" match!" << std::endl;
}
```

You can also compare strings lexicographically using the other comparison operators (`<`, `>`):

```
if(string1 < string2)
{
    std::cout << "string1 comes first lexicographically" << std::endl;
}
```

If you're not familiar with lexicographical ordering, it is the ordering by ASCII values of the characters. In ASCII, all upper case letters come before the lower case letters, so "apple" > "Apple".
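
Here is a minimal sketch of what that ordering means in practice. The wrapper `sorts_before` is a made-up name that simply forwards to `operator<`:

```cpp
#include <string>

// True if a sorts before b under std::string's operator<, which compares
// character values one position at a time.
bool sorts_before(const std::string& a, const std::string& b) {
    return a < b;
}
```

Assuming an ASCII-compatible character set, `sorts_before("Apple", "apple")` and `sorts_before("Zebra", "apple")` are both `true`, because `'A'` and `'Z'` have smaller values than `'a'`. A string that is a prefix of another sorts first, so `sorts_before("app", "apple")` is also `true`.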

If you prefer a functional comparison interface, `std::string` also provides a `compare` function. This function is similar to `strcmp`:

* `0` indicates equality
* Positive values indicate that the second string comes first lexicographically
* Negative values mean your string object comes first lexicographically.

```
if(!str1.compare(str2))
{
    std::cout << "These strings are equal" << std::endl;
}
```

You can also compare against a substring of another string object. Here the whole of `str1` is compared with the substring of `str2` that starts at position X and has length Y:

```
if(!str1.compare(0, str1.size(), str2, x, y))
{
    std::cout << "String 1 is equal to the substring of String 2" << std::endl;
}
```
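
Substring comparison also works in the other direction, where a slice of your own string is compared against a whole second string. A minimal sketch, with `matches_at` being a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `needle` appears in `text` starting exactly at `pos`.
bool matches_at(const std::string& text, std::size_t pos, const std::string& needle) {
    if (pos > text.size()) {
        return false;  // compare() would throw std::out_of_range for this position
    }
    // Compare the substring of text that starts at pos and spans needle.size()
    // characters (clamped to the end of text) against needle.
    return text.compare(pos, needle.size(), needle) == 0;
}
```

For example, `matches_at("hello world", 6, "world")` is `true`, while `matches_at("hello world", 0, "world")` is `false`.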

### CONCATENATING STRINGS <a href="#concatenating-strings" id="concatenating-strings"></a>

I'm sure at this point you won't be surprised: concatenating two strings is a trivial operation that involves using the `+` operator:

```
//Concatenation is also simple!
my_str = lip + name2;
my_str += "lip"; //C-string cat works too
```

If you prefer a functional interface, `std::string` also provides an `append` function. Each of these functions appends something onto the end of your `std::string` object:

```
std::string my_str("test");
std::string str2("boo");
const char * c_str = "This is a c_str";

// We can append a string
my_str.append(str2);
my_str.append(c_str);

// We can append the first X characters of a C-string...
my_str.append(c_str, x);
// ...or everything in a std::string from index X onward (C++14 and later)
my_str.append(str2, x);

// We can also append a substring, starting at index X and of length Y
my_str.append(str2, x, y);
my_str.append(c_str, x, y);
```
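
Since the `append` overloads are easy to mix up, here is a small sketch with concrete values. The function name `append_demo` and the sample strings are made up for illustration:

```cpp
#include <string>

// Builds a string step by step; each comment shows the value after the call.
std::string append_demo() {
    const std::string word("boo");
    const char* text = "This is a C-string";

    std::string s("test");
    s.append(word);        // "testboo"        (the whole std::string)
    s.append(text, 4);     // "testbooThis"    (first 4 characters of the C-string)
    s.append(word, 1);     // "testbooThisoo"  (std::string from index 1 to the end, C++14+)
    s.append(word, 0, 1);  // "testbooThisoob" (1 character starting at index 0)
    return s;
}
```

Note how the two-argument form means "first N characters" for a C-string but "from index N onward" for a `std::string`.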

### ACCESSING CHARACTERS <a href="#accessing-characters" id="accessing-characters"></a>

Similar to C-strings, `std::string` supports the indexing operator `[]` to access specific characters. Just as with C-strings and arrays, indexing starts at 0. As with other containers, the indexing operator does not support bounds checking. If you wish to have bounds checking applied, you can use the `at()` member function.
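
As a minimal sketch of the difference, `at()` throws `std::out_of_range` for an invalid index instead of reading past the end of the string. The helper name `char_at_or` is hypothetical:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns s[index], or `fallback` if index is out of range.
char char_at_or(const std::string& s, std::size_t index, char fallback) {
    try {
        return s.at(index);            // bounds-checked access
    } catch (const std::out_of_range&) {
        return fallback;               // at() throws instead of reading out of bounds
    }
}
```

So `char_at_or("abc", 1, '?')` returns `'b'`, and `char_at_or("abc", 3, '?')` returns `'?'`.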

### OTHER `STD::STRING` INTERFACES <a href="#other-std-string-interfaces" id="other-std-string-interfaces"></a>

`std::string` provides [many other useful interfaces](http://en.cppreference.com/w/cpp/string/basic_string). I'll just provide a brief overview of functionality - full interface documentation can be found at [cppreference](http://en.cppreference.com/w/cpp/string/basic_string).

For handling storage:

* `size()` and `length()` both return the length of the `std::string`
  * `size` is provided to maintain a common interface with container classes
* `capacity()` provides the current number of characters that can be held in the currently allocated storage
* `empty()` returns `true` if a string is currently empty
* `clear()` resets the container to an empty string
* `reserve()` requests that the underlying storage buffer grow to at least the requested capacity; the length of the string does not change
* `resize()` changes the length of the string itself, truncating it or appending characters (optionally filled with a specific value)
* `shrink_to_fit()` shrinks the buffer to the current string size, freeing up unused storage capacity
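
The difference between size and capacity is the part that tends to confuse people, so here is a small sketch. The `StorageReport` struct and `storage_demo` function are names invented for this example:

```cpp
#include <cstddef>
#include <string>

// Hypothetical record of what we observe at each step.
struct StorageReport {
    std::size_t size_after_reserve;
    std::size_t capacity_after_reserve;
    std::string after_resize;
};

StorageReport storage_demo() {
    std::string s("abc");
    StorageReport report;

    s.reserve(64);                            // capacity grows; contents and size() are unchanged
    report.size_after_reserve = s.size();     // still 3
    report.capacity_after_reserve = s.capacity();  // at least 64

    s.resize(5, '-');                         // size() changes: "abc" becomes "abc--"
    report.after_resize = s;
    return report;
}
```

In short, `reserve()` only affects how much room is available, while `resize()` changes what the string actually contains.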

For modifying strings:

* `insert()` inserts characters or strings at a specific position
* `replace()` replaces characters in a substring
* `push_back()` appends a character to the end of the string
* `pop_back()` removes the last character of the string
* `erase()` removes specific characters
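
A quick sketch of these modifiers applied in sequence; `edit_demo` and the sample text are made up for illustration:

```cpp
#include <string>

// Applies each modifier in turn; each comment shows the value after the call.
std::string edit_demo() {
    std::string s("Hello world");
    s.insert(5, ",");           // "Hello, world"   (insert at index 5)
    s.replace(7, 5, "there");   // "Hello, there"   (replace 5 characters starting at index 7)
    s.push_back('!');           // "Hello, there!"
    s.pop_back();               // "Hello, there"
    s.erase(5, 1);              // "Hello there"    (erase 1 character at index 5)
    return s;
}
```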

For working with substrings:

* `substr()` returns a copy of the substring at the specified position
* `find()` identifies the first position within a string where the specified character or substring can be found
* `rfind()` finds the last occurrence of a substring
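* `find_first_of()` finds the first character that matches any character in a given set
* `find_last_of()` finds the last character that matches any character in a given set
* `find_first_not_of()` finds the first character that matches none of the characters in a given set
* `find_last_not_of()` finds the last character that matches none of the characters in a given set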
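
The `find_*_of` family treats its argument as a set of characters rather than as a substring. The sketch below shows two typical uses; `first_separator` and `trim` are hypothetical helpers, not part of `std::string`:

```cpp
#include <cstddef>
#include <string>

// Index of the first ',' or ';' in s, or std::string::npos if there is none.
std::size_t first_separator(const std::string& s) {
    return s.find_first_of(",;");
}

// Returns a copy of s without leading and trailing spaces, tabs, or newlines.
std::string trim(const std::string& s) {
    const char* whitespace = " \t\n";
    const std::size_t first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";  // the string is empty or contains only whitespace
    }
    const std::size_t last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}
```

All of the search functions return `std::string::npos` when nothing is found, so check for that value before using the result as an index.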

Remember, full documentation can be found on [cppreference.com](http://en.cppreference.com/w/cpp/string/basic_string).

## A Note on Avoiding Copy Overhead <a href="#a-note-on-avoiding-copy-overhead" id="a-note-on-avoiding-copy-overhead"></a>

Unless you want to make a copy of your `std::string`, you will want to avoid passing around strings by value:

```
void foo(std::string str);
```

Instead, you should pass the argument by reference if you want to modify the string:

```
void foo(std::string &str);
```

Or by `const` reference if the string will not be modified:

```
void foo(const std::string &str);
```

I rarely find myself passing around `std::string` containers by value, since I want to avoid the unnecessary copies.
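
As a minimal sketch of the two reference styles, with function names invented for this example:

```cpp
#include <cstddef>
#include <string>

// Read-only access: const reference, so no copy is made.
std::size_t count_char(const std::string& str, char c) {
    std::size_t n = 0;
    for (char ch : str) {
        if (ch == c) {
            ++n;
        }
    }
    return n;
}

// In-place modification: non-const reference, so the caller's string changes.
void add_exclamation(std::string& str) {
    str += '!';
}
```

After `std::string s = "hello"; add_exclamation(s);`, the caller's `s` holds `"hello!"`, and neither call copied the string.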

## When Should I Use `std::string`? <a href="#when-should-i-use-std-string" id="when-should-i-use-std-string"></a>

Great, now we have some idea of what we can do with a `std::string`. When and why should I use `std::string` over C-strings?

Let's consider some of the advantages to using `std::string`:

* Ability to utilize [SBRM design patterns](https://embeddedartistry.com/blog/2017/7/17/migrating-from-c-to-c-take-advantage-of-raiisbrm)
* The interfaces are much more intuitive to use, leading to fewer chances of messing up argument order
* Better searching, replacement, and string manipulation functions (cf. the `cstring` library)
* The `size`/`length` functions are constant time (cf. the linear-time `strlen` function)
* Reduced boilerplate by abstracting memory management and buffer resizing
* Reduced risk of segmentation faults by utilizing iterators and the `at()` function
* Compatible with STL algorithms

`std::string` provides a modern interface for string management and will help you write much more straightforward code than C-strings. In general, prefer `std::string` to C-strings, and especially prefer `std::string` for mutable strings.

## `std::string` Limitations <a href="#std-string-limitations" id="std-string-limitations"></a>

There's storage overhead involved with using a `std::string` object. C-strings are the simplest possible storage method for a string, making them attractive in situations where memory must be conserved. However, similar to other C++ containers, I find that this minor overhead is worth the convenience.

When utilizing a `std::string`, memory must be dynamically allocated and initialized during runtime. You cannot pre-allocate a `std::string` buffer during compile-time and you cannot supply a pre-existing buffer for `std::string` to assume ownership over. Unlike `std::string`, C-strings can utilize compile-time allocation and determination of size. Additionally, memory allocation is handled by the `std::string` class itself. If you need fine-grained control of memory management, look to manual management with C-strings.

One major gripe I have with `std::string` is that it doesn't play nicely with string literals. String literals are placed in static storage and cannot be taken over by a `std::string`. Initializing a `std::string` using a string literal will always involve a copy. C-strings still seem to be the best storage option for string literals, especially if you want to avoid unnecessary copies (such as in an embedded environment).

## C++17: `std::string_view` <a href="#c-17-std-string_view" id="c-17-std-string_view"></a>

If you are using C++17, you can avoid memory allocation and still enjoy the C++ `string` interfaces by using [`std::string_view`](https://en.cppreference.com/w/cpp/string/basic_string_view). The entire purpose of `std::string_view` is to avoid copying data which is already owned and of which only a fixed view is required. A `std::string_view` can refer to either a C++ `string` or a C-string. All that `std::string_view` needs to store is a pointer to the character sequence and a length.

`std::string_view` provides most of the non-modifying (read-only) interface that `std::string` does, so it is a good match for C-style string literals.

```
std::string_view my_view("Works with a string literal");
```

The only catch with `std::string_view` is that it is non-owning, so the programmer is responsible for making sure the `std::string_view` does not outlive the string which it points to. Embedded applications are mostly interested in forcing static memory allocations, so there is little worry about lifetime problems when using string literals with `std::string_view`.
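
To illustrate, here is a minimal sketch of a function that accepts a `std::string_view`, so it works with string literals and `std::string` objects alike without copying. It requires C++17, and `has_prefix` is a hypothetical helper name:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (requires C++17): true if `text` begins with `prefix`.
// Both parameters are views, so no characters are copied and nothing is allocated.
bool has_prefix(std::string_view text, std::string_view prefix) {
    // substr() on a view just returns another view; the count is clamped
    // to the end of `text` if `prefix` is longer.
    return text.substr(0, prefix.size()) == prefix;
}
```

Both `has_prefix("embedded", "emb")` and, for a `std::string owned = "embedded";`, `has_prefix(owned, "emb")` work. The views are only used for the duration of the call, which sidesteps the lifetime concern described above.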

## Putting it All Together <a href="#putting-it-all-together" id="putting-it-all-together"></a>

I've written a [basic `std::string` example](https://github.com/embeddedartistry/embedded-resources/blob/master/examples/cpp/string.cpp) which can be found in the [`embedded-resources` GitHub repository](http://github.com/embeddedartistry/embedded-resources).


