Strings in R with stringr

Author

Michael Hallquist, PSYC 859

Published

March 12, 2026

The stringr package provides a consistent set of functions for working with strings. All functions start with str_ and are vectorized, so they work naturally with columns in a data.frame.
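
As a minimal sketch of what "vectorized" means here, the calls below are applied to a toy character vector (the values are made up for illustration). Each call returns one result per input element, and missing values stay missing.

```r
library(stringr)

# Toy input, made up for illustration
x <- c("apple", "kiwi", NA)

lens  <- str_length(x)     # one length per element; NA stays NA
upper <- str_to_upper(x)   # same length as the input

lens
upper
```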

We will use a small example dataset to demonstrate the core verbs.

Code
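# Packages assumed by the examples on this page
library(dplyr)    # %>%, mutate(), filter()
library(stringr)  # str_*() functions
library(tidyr)    # separate()
# kable_table() is not part of these packages; it is assumed to be a
# table-printing helper defined in the course setup code.

people <- tibble::tibble(
  id = 1:5,
  name = c("Ada Lovelace", "Grace Hopper", "Margaret Hamilton",
           "Katherine Johnson", "Mary Jackson"),
  email = c("ada@navy.mil", "grace@navy.mil", "margaret@mit.edu",
            "katherine@nasa.gov", NA),
  dept = c("CompSci", "CompSci", "Engineering", "Research", "Research")
)

people %>% kable_table()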
id  name               email               dept
1   Ada Lovelace       ada@navy.mil        CompSci
2   Grace Hopper       grace@navy.mil      CompSci
3   Margaret Hamilton  margaret@mit.edu    Engineering
4   Katherine Johnson  katherine@nasa.gov  Research
5   Mary Jackson       NA                  Research

Creating strings

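R strings are wrapped in quotes; you can use either single or double quotes. Special characters (a newline, a quote inside a quoted string, or a backslash itself) are written with a backslash escape. Note that the printed output shows the escaped form rather than the raw characters.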

Code
"Line 1\nLine 2"
[1] "Line 1\nLine 2"
Code
"He said: \"strings are useful\""
[1] "He said: \"strings are useful\""
Code
"A backslash looks like this: \\"
[1] "A backslash looks like this: \\"
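
The printed output above shows the escaped form of each string. As a small sketch using base R's writeLines(), the following prints the actual characters instead, and str_length() confirms that each escape sequence counts as a single character.

```r
library(stringr)

x <- "He said: \"strings are useful\"\nA backslash looks like this: \\"

print(x)        # escaped representation, as in the output above
writeLines(x)   # actual content: plain quotes, a line break, one backslash

n_newline   <- str_length("\n")   # 1
n_backslash <- str_length("\\")   # 1
```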

Raw strings are useful when you want to avoid escaping backslashes:

Code
r"(C:\Users\hallquist\Documents\file.txt)"
[1] "C:\\Users\\hallquist\\Documents\\file.txt"
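
Raw strings (available in R 4.0.0 and later) are also convenient for regular expressions, where backslashes are common. A brief sketch comparing the raw and escaped spellings of the same string:

```r
library(stringr)

path_raw <- r"(C:\Users\hallquist\Documents\file.txt)"
path_esc <- "C:\\Users\\hallquist\\Documents\\file.txt"
same_path <- identical(path_raw, path_esc)   # TRUE: two spellings of one string

# The regex \d+ written as a raw string and as an escaped string
digits_raw <- str_extract("room 312", r"(\d+)")   # "312"
digits_esc <- str_extract("room 312", "\\d+")     # "312"
```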

Combining strings

str_c() joins strings element-wise, much like paste0():

Code
people %>%
  mutate(
    label = str_c(name, " (", dept, ")", sep = "")
  ) %>%
  kable_table()
id  name               email               dept         label
1   Ada Lovelace       ada@navy.mil        CompSci      Ada Lovelace (CompSci)
2   Grace Hopper       grace@navy.mil      CompSci      Grace Hopper (CompSci)
3   Margaret Hamilton  margaret@mit.edu    Engineering  Margaret Hamilton (Engineering)
4   Katherine Johnson  katherine@nasa.gov  Research     Katherine Johnson (Research)
5   Mary Jackson       NA                  Research     Mary Jackson (Research)
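
One detail worth knowing is that str_c() propagates missing values: if any piece is NA, the result is NA. The sketch below uses a small stand-alone vector and str_replace_na() to substitute a placeholder first; the placeholder text "(none)" is just an illustrative choice.

```r
library(stringr)

email <- c("ada@navy.mil", NA)

with_na <- str_c("Email: ", email)
# "Email: ada@navy.mil" NA

# Replace NA with a visible placeholder before combining
no_na <- str_c("Email: ", str_replace_na(email, "(none)"))
# "Email: ada@navy.mil" "Email: (none)"
```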

str_glue() is convenient for inline formatting:

Code
people %>%
  mutate(label = str_glue("{name} [{dept}]")) %>%
  kable_table()
id  name               email               dept         label
1   Ada Lovelace       ada@navy.mil        CompSci      Ada Lovelace [CompSci]
2   Grace Hopper       grace@navy.mil      CompSci      Grace Hopper [CompSci]
3   Margaret Hamilton  margaret@mit.edu    Engineering  Margaret Hamilton [Engineering]
4   Katherine Johnson  katherine@nasa.gov  Research     Katherine Johnson [Research]
5   Mary Jackson       NA                  Research     Mary Jackson [Research]
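
The braces in str_glue() can hold any R expression, not just a column name. A minimal sketch with a stand-alone vector:

```r
library(stringr)

name <- c("Ada Lovelace", "Grace Hopper")

# Each {...} is evaluated as R code and the result is pasted into the string
msg <- str_glue("{name} has {str_length(name)} characters")
as.character(msg)
# "Ada Lovelace has 12 characters" "Grace Hopper has 12 characters"
```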

If you want to collapse a vector into one string, use str_flatten():

Code
str_flatten(people$dept, collapse = ", ")
[1] "CompSci, CompSci, Engineering, Research, Research"

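String length and substrings

str_length() counts characters, and str_sub() extracts a substring by position. Here, str_locate() finds the first space so that str_sub() can keep everything before it: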

Code
people %>%
  mutate(
    n_chars = str_length(name),
    first_name = str_sub(name, 1, str_locate(name, " ")[, 1] - 1)
  ) %>%
  kable_table()
id  name               email               dept         n_chars  first_name
1   Ada Lovelace       ada@navy.mil        CompSci      12       Ada
2   Grace Hopper       grace@navy.mil      CompSci      12       Grace
3   Margaret Hamilton  margaret@mit.edu    Engineering  17       Margaret
4   Katherine Johnson  katherine@nasa.gov  Research     17       Katherine
5   Mary Jackson       NA                  Research     12       Mary
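
str_sub() also accepts negative positions, which count from the end of the string, and word() is a stringr helper for pulling out whole words. A short sketch on two of the names:

```r
library(stringr)

name <- c("Ada Lovelace", "Grace Hopper")

last_six   <- str_sub(name, -6, -1)   # "velace" "Hopper"
first_word <- word(name, 1)           # "Ada" "Grace"
last_word  <- word(name, -1)          # "Lovelace" "Hopper"
```

For two-word names, word(name, 1) gives the same result as the str_locate() approach with less bookkeeping.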

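Other common helpers handle extra whitespace: str_trim() removes leading and trailing whitespace, and str_squish() additionally collapses internal runs of whitespace into single spaces: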

Code
str_trim("   too much space   ")
[1] "too much space"
Code
str_squish("too    much    space")
[1] "too much space"
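
str_trim() also takes a side argument, and str_squish() cleans up the ends as well as the interior. A quick sketch on a made-up string with both kinds of extra whitespace:

```r
library(stringr)

messy <- "  too    much   space  "

trimmed  <- str_trim(messy)                  # ends removed, interior runs kept
left     <- str_trim(messy, side = "left")   # only leading whitespace removed
squished <- str_squish(messy)                # "too much space"
```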

Detecting patterns

str_detect() returns TRUE/FALSE for each element:

Code
people %>%
  mutate(is_nasa = str_detect(email, "nasa")) %>%
  kable_table()
id  name               email               dept         is_nasa
1   Ada Lovelace       ada@navy.mil        CompSci      FALSE
2   Grace Hopper       grace@navy.mil      CompSci      FALSE
3   Margaret Hamilton  margaret@mit.edu    Engineering  FALSE
4   Katherine Johnson  katherine@nasa.gov  Research     TRUE
5   Mary Jackson       NA                  Research     NA
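
By default the pattern is treated as a regular expression, so a . matches any character. The sketch below, on a few made-up addresses, contrasts that default with fixed() for literal matching and regex(ignore_case = TRUE) for case-insensitive matching.

```r
library(stringr)

# Made-up addresses for illustration
email <- c("ada@navy.mil", "katherine@NASA.gov", "x@navyXmil.org", NA)

as_regex   <- str_detect(email, "navy.mil")          # TRUE FALSE TRUE NA ('.' matches the X)
as_literal <- str_detect(email, fixed("navy.mil"))   # TRUE FALSE FALSE NA
any_case   <- str_detect(email, regex("nasa", ignore_case = TRUE))  # FALSE TRUE FALSE NA
```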

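To find rows with a missing email, test with is.na(); pattern matching will not find them, because str_detect() returns NA for missing values: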

Code
people %>%
  filter(is.na(email))
id  name          email  dept
5   Mary Jackson  NA     Research

You can count matches with str_count():

Code
people %>%
  mutate(n_vowels = str_count(name, "[aeiouAEIOU]")) %>%
  kable_table()
id  name               email               dept         n_vowels
1   Ada Lovelace       ada@navy.mil        CompSci      6
2   Grace Hopper       grace@navy.mil      CompSci      4
3   Margaret Hamilton  margaret@mit.edu    Engineering  6
4   Katherine Johnson  katherine@nasa.gov  Research     6
5   Mary Jackson       NA                  Research     3

Extracting and splitting

str_extract() pulls the first match from each string:

Code
people %>%
  mutate(domain = str_extract(email, "[^@]+$")) %>%
  kable_table()
id  name               email               dept         domain
1   Ada Lovelace       ada@navy.mil        CompSci      navy.mil
2   Grace Hopper       grace@navy.mil      CompSci      navy.mil
3   Margaret Hamilton  margaret@mit.edu    Engineering  mit.edu
4   Katherine Johnson  katherine@nasa.gov  Research     nasa.gov
5   Mary Jackson       NA                  Research     NA
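
When a string may contain several matches, str_extract_all() returns all of them as a list with one element per input string. A minimal sketch on made-up input:

```r
library(stringr)

x <- c("S01_age=21", "no digits here")

first_only <- str_extract(x, "\\d+")       # "01" NA
all_runs   <- str_extract_all(x, "\\d+")   # list: c("01", "21") and character(0)
```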

Use str_split() to break strings into pieces:

Code
str_split("Ada Lovelace", " ")
[[1]]
[1] "Ada"      "Lovelace"
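
str_split() returns a list because each string can yield a different number of pieces. As a sketch on made-up input, simplify = TRUE returns a character matrix instead, and n caps the number of pieces.

```r
library(stringr)

x <- c("a-b", "c-d-e")

pieces <- str_split(x, "-")                           # list of length 2
mat    <- str_split(x, "-", simplify = TRUE)          # 2 x 3 matrix, padded with ""
capped <- str_split(x, "-", n = 2, simplify = TRUE)   # 2 x 2; second row is "c", "d-e"
```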

If you want multiple columns, tidyr::separate() is handy:

Code
people %>%
  separate(name, into = c("first", "last"), sep = " ") %>%
  kable_table()
id  first      last      email               dept
1   Ada        Lovelace  ada@navy.mil        CompSci
2   Grace      Hopper    grace@navy.mil      CompSci
3   Margaret   Hamilton  margaret@mit.edu    Engineering
4   Katherine  Johnson   katherine@nasa.gov  Research
5   Mary       Jackson   NA                  Research
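
Splitting on a single space assumes exactly two words per name. The sketch below adds a hypothetical three-word name to show one way to handle that case: extra = "merge" keeps everything after the first split in the last column. Recent tidyr releases mark separate() as superseded in favor of separate_wider_delim(), so treat this as illustrative rather than the only approach.

```r
library(tidyr)

# The second name is a hypothetical three-word variant, for illustration only
df <- data.frame(name = c("Ada Lovelace", "Mary W. Jackson"))

out <- separate(df, name, into = c("first", "last"), sep = " ", extra = "merge")
out
# first: "Ada", "Mary"; last: "Lovelace", "W. Jackson"
```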

Replacing

str_replace() replaces the first match in each string:

Code
people %>%
  mutate(
    email_safe = str_replace(email, "@", " at "),
    email_domain = str_replace(email, ".*@", "")
  ) %>%
  kable_table()
id  name               email               dept         email_safe             email_domain
1   Ada Lovelace       ada@navy.mil        CompSci      ada at navy.mil        navy.mil
2   Grace Hopper       grace@navy.mil      CompSci      grace at navy.mil      navy.mil
3   Margaret Hamilton  margaret@mit.edu    Engineering  margaret at mit.edu    mit.edu
4   Katherine Johnson  katherine@nasa.gov  Research     katherine at nasa.gov  nasa.gov
5   Mary Jackson       NA                  Research     NA                     NA
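
The replacement string can also refer back to capture groups in the pattern, written "\\1", "\\2", and so on in R code. A brief sketch that reorders two-word names:

```r
library(stringr)

name <- c("Ada Lovelace", "Grace Hopper")

# \\1 and \\2 refer to the first and second parenthesized groups
last_first <- str_replace(name, "^(\\w+) (\\w+)$", "\\2, \\1")
# "Lovelace, Ada" "Hopper, Grace"
```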

To replace all matches, use str_replace_all():

Code
str_replace_all("A-1, B-2, C-3", "-", ":")
[1] "A:1, B:2, C:3"
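
str_replace_all() also accepts a named character vector, where the names are patterns and the values are their replacements. A minimal sketch:

```r
library(stringr)

x <- "A-1, B-2, C-3"

# Names are patterns, values are replacements
recoded <- str_replace_all(x, c("-" = ":", ", " = "; "))
# "A:1; B:2; C:3"
```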

A compact case study: parsing coded strings

Imagine IDs that pack information into a single string:

Code
ids <- tibble::tibble(
  code = c("S01_age=21", "S02_age=19", "S03_age=22", "S04_age=20")
)

ids %>% kable_table()
code
S01_age=21
S02_age=19
S03_age=22
S04_age=20

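We can extract the subject ID and age using str_match(). It returns a character matrix whose first column is the full match and whose remaining columns are the capture groups, so [, 2] selects the first group: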

Code
ids %>%
  mutate(
    subject = str_match(code, "^(S\\d{2})")[, 2],
    age = as.numeric(str_match(code, "age=(\\d{2})$")[, 2])
  ) %>%
  kable_table()
code        subject  age
S01_age=21  S01      21
S02_age=19  S02      19
S03_age=22  S03      22
S04_age=20  S04      20
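
An alternative sketch matches the whole code once with two capture groups, which also makes malformed codes easy to spot because every column is NA for them. The third code below is a made-up malformed example, not part of the dataset.

```r
library(stringr)

code <- c("S01_age=21", "S02_age=19", "bad_code")

# Column 1 is the full match; columns 2 and 3 are the two groups
m <- str_match(code, "^(S\\d{2})_age=(\\d+)$")

subject <- m[, 2]               # "S01" "S02" NA
age     <- as.numeric(m[, 3])   # 21 19 NA
```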

Summary

The stringr toolkit is built around a small set of verbs:

  1. Create/Combine: str_c(), str_glue(), str_flatten()
  2. Inspect: str_length(), str_detect(), str_count()
  3. Extract/Split: str_extract(), str_match(), str_split()
  4. Modify: str_sub(), str_replace(), str_replace_all(), str_trim()

For more detail and worked examples, see R4DS Ch. 14: https://r4ds.hadley.nz/strings.html

Appendix: regex primer (optional)

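Regular expressions (regex) let you describe patterns in text. By default, stringr interprets the pattern argument of most str_ functions as a regular expression (using the ICU engine, via the stringi package); wrap a pattern in fixed() to match it literally.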

Core building blocks:

  • . any character
  • ^ start of string, $ end of string
  • * zero or more, + one or more, ? zero or one
  • [] character class (e.g., [A-Z], [0-9])
  • () grouping and capture
  • | OR
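
Each building block can be tried out on small made-up inputs. The following sketch uses roughly one call per item in the list above:

```r
library(stringr)

any_char <- str_extract("cat", "c.t")                       # "cat"
starts_a <- str_detect(c("apple", "banana"), "^a")          # TRUE FALSE
ends_a   <- str_detect(c("apple", "banana"), "a$")          # FALSE TRUE
optional <- str_extract(c("color", "colour"), "colou?r")    # "color" "colour"
one_plus <- str_extract("baaad", "a+")                      # "aaa"
in_class <- str_extract("room B7", "[A-Z][0-9]")            # "B7"
groups   <- str_match("x=42", "(\\w)=(\\d+)")[1, ]          # "x=42" "x" "42"
either   <- str_detect(c("cat", "dog", "cow"), "cat|dog")   # TRUE TRUE FALSE
```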

Common examples:

Code
str_detect("room 312", "\\d+")         # any digits
[1] TRUE
Code
str_detect("A12", "^[A-Z]\\d{2}$")     # one letter, two digits
[1] TRUE
Code
str_extract("x=42", "\\d+")            # extract digits
[1] "42"
Code
str_replace("abc-123", "^[a-z]+-", "")   # drop leading letters and dash
[1] "123"
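
Backslashes need care because they are interpreted twice: once by R when it reads the string literal, and once by the regex engine. As a sketch, the regex \d+ is written "\\d+" in R code, while "\\\\d+" becomes the regex \\d+, which looks for a literal backslash followed by the letter d.

```r
library(stringr)

digits  <- str_detect("room 312", "\\d+")     # regex \d+  : TRUE, the text has digits
literal <- str_detect("room 312", "\\\\d+")   # regex \\d+ : FALSE, no backslash in the text
has_bs  <- str_detect("a\\d", "\\\\d+")       # TRUE: the text a\d contains a backslash, then d

writeLines("\\\\d+")   # prints the pattern as the regex engine receives it
```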

Useful helpers:

  • str_detect(x, pattern) returns TRUE/FALSE
  • str_extract(x, pattern) returns the first match
  • str_replace(x, pattern, replacement) replaces the first match
  • str_replace_all(x, pattern, replacement) replaces all matches

For a deeper treatment and practice, see R4DS Ch. 15: https://r4ds.hadley.nz/regexps.html