Title: | Extra String Manipulation Functions |
---|---|
Description: | There are some things that I wish were easier with the 'stringr' or 'stringi' packages. The foremost of these is the extraction of numbers from strings. 'stringr' and 'stringi' make you figure out the regular expression for yourself; 'strex' takes care of this for you. There are many other handy functionalities in 'strex'. Contributions to this package are encouraged; it is intended as a miscellany of string manipulation functions that cannot be found in 'stringi' or 'stringr'. |
Authors: | Rory Nolan [aut, cre] |
Maintainer: | Rory Nolan |
License: | GPL-3 |
Version: | 2.0.1 |
Built: | 2024-11-01 03:25:25 UTC |
Source: | https://github.com/rorynolan/strex |
Extract text before or after the nth occurrence of a pattern.
Extract the part of a string which is before or after the nth occurrence of a specified pattern, vectorized over the string.
str_after_nth(string, pattern, n)
str_after_first(string, pattern)
str_after_last(string, pattern)
str_before_nth(string, pattern, n)
str_before_first(string, pattern)
str_before_last(string, pattern)
string |
A character vector. |
pattern |
The pattern to look for. The default interpretation is a regular expression, as described in stringi::about_search_regex. To match without a regular expression (i.e. as a human would), use
coll(). For details see |
n |
A vector of integerish values. Must be either length 1 or
have length equal to the length of |
str_after_first(...) is just str_after_nth(..., n = 1).
str_after_last(...) is just str_after_nth(..., n = -1).
str_before_first(...) is just str_before_nth(..., n = 1).
str_before_last(...) is just str_before_nth(..., n = -1).
A character vector.
Other bisectors:
str_before_last_dot()
string <- "abxxcdxxdexxfgxxh"
str_after_nth(string, "xx", 3)
str_before_nth(string, "e", 1:2)
str_before_nth(string, "xx", -3)
str_before_nth(string, ".", -3)
str_before_nth(rep(string, 2), "..x", -3)
str_before_first(string, "d")
str_before_last(string, "x")
string <- c("abc", "xyz.zyx")
str_after_first(string, ".") # using regex
str_after_first(string, coll(".")) # using human matching
str_after_last(c("xy", "xz"), "x")
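For intuition, the bisecting idea can be sketched in base R: locate every match of pattern with gregexpr(), pick the nth one (counting from the end when n is negative), and cut the string at that match. The helpers nth_match_bounds(), after_nth_sketch() and before_nth_sketch() are hypothetical names for illustration, not part of strex. The sketch assumes a single n, uses base R's regex engine rather than stringi, does not support coll(), and returns NA when the requested occurrence does not exist.

```r
# Hypothetical base-R sketch of the bisecting idea; not strex's implementation.
# Assumes a scalar `n` and a regular-expression `pattern` (no coll() support).
nth_match_bounds <- function(s, pattern, n) {
  m <- gregexpr(pattern, s)[[1]]
  k <- if (m[1] == -1L) 0L else length(m)  # number of matches
  i <- if (n < 0) k + n + 1 else n         # negative n counts from the end
  if (i < 1 || i > k) return(NULL)         # no such occurrence
  c(start = m[i], end = m[i] + attr(m, "match.length")[i] - 1L)
}

after_nth_sketch <- function(string, pattern, n) {
  vapply(string, function(s) {
    b <- nth_match_bounds(s, pattern, n)
    if (is.null(b)) NA_character_ else substr(s, b[["end"]] + 1L, nchar(s))
  }, character(1), USE.NAMES = FALSE)
}

before_nth_sketch <- function(string, pattern, n) {
  vapply(string, function(s) {
    b <- nth_match_bounds(s, pattern, n)
    if (is.null(b)) NA_character_ else substr(s, 1L, b[["start"]] - 1L)
  }, character(1), USE.NAMES = FALSE)
}

after_nth_sketch("abxxcdxxdexxfgxxh", "xx", 3)
before_nth_sketch("abxxcdxxdexxfgxxh", "xx", -3)
```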
The currency of a number is defined as the character coming before the number in the string. If nothing comes before (i.e. if the number is the first thing in the string), the currency is the empty string. Similarly, the currency can be a space, a comma or any other character.
str_extract_currencies(string)
str_nth_currency(string, n)
str_first_currency(string)
str_last_currency(string)
string |
A character vector. |
n |
A vector of integerish values. Must be either length 1 or
have length equal to the length of |
These functions are vectorized over string and n.
str_extract_currencies() extracts all currency amounts.
str_nth_currency() just gets the nth currency amount from each string.
str_first_currency(string) and str_last_currency(string) are just wrappers for str_nth_currency(string, n = 1) and str_nth_currency(string, n = -1).
"-$2.00" and "$-2.00" are interpreted as negative two dollars.
If you request e.g. the 5th currency amount but there are only 3 currency amounts, you get an amount and currency symbol of NA.
A data frame with 4 columns: string_num, string, curr_sym and amount. Every extracted currency amount gets its own row in the data frame detailing the string number and string that it was extracted from, the currency symbol and the amount.
string <- c("ab3 13", "$1", "35.00 $1.14", "abc5 $3.8", "stuff")
str_extract_currencies(string)
str_nth_currency(string, n = 2)
str_nth_currency(string, n = -2)
str_nth_currency(string, c(1, -2, 1, 2, -1))
str_first_currency(string)
str_last_currency(string)
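The "preceding character" rule can be illustrated with a minimal base-R sketch that handles only the first unsigned amount in each string. first_currency_sketch() is a hypothetical helper, not part of strex. It ignores negative amounts such as "-$2.00" and the nth/last variants, and only mimics the documented column layout.

```r
# Hypothetical base-R sketch of the "currency = preceding character" rule.
# Only the first unsigned amount is found; negatives are not handled.
first_currency_sketch <- function(string) {
  pos <- regexpr("[0-9]+(\\.[0-9]+)?", string)  # first number in each string
  len <- attr(pos, "match.length")
  found <- pos > 0
  curr_sym <- rep(NA_character_, length(string))
  amount <- rep(NA_real_, length(string))
  # The currency symbol is the single character just before the number;
  # substr(x, 0, 0) gives "" when the number starts the string.
  curr_sym[found] <- substr(string[found], pos[found] - 1, pos[found] - 1)
  amount[found] <- as.numeric(
    substr(string[found], pos[found], pos[found] + len[found] - 1)
  )
  data.frame(
    string_num = seq_along(string), string = string,
    curr_sym = curr_sym, amount = amount, stringsAsFactors = FALSE
  )
}

first_currency_sketch(c("ab3 13", "$1", "35.00 $1.14", "abc5 $3.8", "stuff"))
```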
If strings are numbered, their numbers may not comply with alphabetical order, e.g. "abc2" comes after "abc10" in alphabetical order. We might (for whatever reason) wish to change them such that they come in the order that we would like. This function alters the strings such that they comply with alphabetical order, so here "abc2" would be renamed to "abc02". It works on file names with more than one number in them, e.g. "abc01def3" (a string with 2 numbers). All the strings in the character vector string must have the same number of numbers, and the non-number bits must be the same.
str_alphord_nums(string)
string |
A character vector. |
A character vector.
string <- paste0("abc", 1:12)
print(string)
str_alphord_nums(string)
str_alphord_nums(c("abc9def55", "abc10def7"))
str_alphord_nums(c("01abc9def55", "5abc10def777", "99abc4def4"))
str_alphord_nums(1:10)
## Not run:
str_alphord_nums(c("abc9def55", "abc10xyz7")) # error
## End(Not run)
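A minimal base-R sketch of the padding idea: pull out every run of digits, find the widest number in each position, and zero-pad to that width with sprintf(). alphord_sketch() is a hypothetical helper, not strex's implementation. It assumes the numbers fit in R integers and, unlike the documented function, does not check that the non-number parts agree.

```r
# Hypothetical base-R sketch of zero-padding numbers so that alphabetical
# order matches numeric order. Assumes numbers fit in R integers and does
# not check that the non-number parts of the strings are the same.
alphord_sketch <- function(string) {
  string <- as.character(string)
  matches <- gregexpr("[0-9]+", string)
  nums <- regmatches(string, matches)
  if (length(unique(lengths(nums))) != 1) {
    stop("All strings must contain the same number of numbers.")
  }
  # One row per string, one column per number position
  num_mat <- matrix(unlist(nums), nrow = length(string), byrow = TRUE)
  # Pad each number position to the widest number seen in that position
  widths <- apply(num_mat, 2, function(col) max(nchar(col)))
  regmatches(string, matches) <- lapply(
    nums, function(x) sprintf("%0*d", widths, as.integer(x))
  )
  string
}

alphord_sketch(paste0("abc", c(2, 10)))
alphord_sketch(c("01abc9def55", "5abc10def777", "99abc4def4"))
```

Padding every number position to a common width is what makes plain character sorting agree with numeric sorting.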
This is usually used to get the part of a file name that doesn't include the file extension. It is vectorized over string. If there is no period in string, the input is returned.
str_before_last_dot(string)
string |
A character vector. |
A character vector.
Other bisectors:
before-and-after
str_before_last_dot(c("spreadsheet1.csv", "doc2.doc", ".R"))
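The same effect can be sketched with a single base-R regular expression that deletes the final period and everything after it. before_last_dot_sketch() is a hypothetical name, not strex's code. Base R's tools::file_path_sans_ext() is a related function, though it only strips alphanumeric extensions.

```r
# Hypothetical base-R sketch: drop the last "." and everything after it.
# sub() leaves strings without a period unchanged, mirroring the documented
# behaviour of returning the input when there is no period.
before_last_dot_sketch <- function(string) {
  sub("\\.[^.]*$", "", string)
}

before_last_dot_sketch(c("spreadsheet1.csv", "doc2.doc", "README"))
before_last_dot_sketch("archive.tar.gz")
```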
Check whether each input string, after padding is removed, could be considered numeric, i.e. whether it could be coerced to numeric. This function is vectorized over its one argument.
str_can_be_numeric(string)
string |
A character vector. |
A logical vector.
str_can_be_numeric("3")
str_can_be_numeric("5 ")
str_can_be_numeric(c("1a", "abc"))
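Under the assumption that "could be coerced to numeric" means that as.numeric() does not produce NA, the check can be sketched in base R as below. can_be_numeric_sketch() is a hypothetical helper, not part of strex.

```r
# Hypothetical base-R sketch: trim padding, try the coercion, and report
# whether it succeeded. suppressWarnings() hides the "NAs introduced by
# coercion" warning that as.numeric() gives for non-numeric input.
can_be_numeric_sketch <- function(string) {
  !is.na(suppressWarnings(as.numeric(trimws(string))))
}

can_be_numeric_sketch(c("3", "5 ", "1a", "abc"))
```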
Vectorized over string.
str_detect_all(string, pattern, negate = FALSE)
str_detect_any(string, pattern, negate = FALSE)
string |
A character vector. |
pattern |
A character vector. The patterns to look for. Default is
|
negate |
A flag. If |
A logical vector.
str_detect_all("quick brown fox", c("x", "y", "z"))
str_detect_all(c(".", "-"), ".")
str_detect_all(c(".", "-"), coll("."))
str_detect_all(c(".", "-"), coll("."), negate = TRUE)
str_detect_all(c(".", "-"), c(".", ":"))
str_detect_all(c(".", "-"), coll(c(".", ":")))
str_detect_all("xyzabc", c("a", "c", "z"))
str_detect_all(c("xyzabc", "abcxyz"), c(".b", "^x"))
str_detect_any("quick brown fox", c("x", "y", "z"))
str_detect_any(c(".", "-"), ".")
str_detect_any(c(".", "-"), coll("."))
str_detect_any(c(".", "-"), coll("."), negate = TRUE)
str_detect_any(c(".", "-"), c(".", ":"))
str_detect_any(c(".", "-"), coll(c(".", ":")))
str_detect_any(c("xyzabc", "abcxyz"), c(".b", "^x"))
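The all/any logic amounts to testing each pattern separately and then combining the results row-wise. The sketch below is a hypothetical base-R illustration (detect_sketch() is not part of strex) using grepl(), so it relies on base R's regex engine rather than stringi, and it does not support coll() or negate.

```r
# Hypothetical base-R sketch of "all"/"any" pattern detection with grepl().
detect_sketch <- function(string, pattern, combine = all) {
  # One row per string, one column per pattern
  hits <- vapply(pattern, function(p) grepl(p, string),
                 logical(length(string)), USE.NAMES = FALSE)
  hits <- matrix(hits, nrow = length(string))
  # Collapse each row with all() or any()
  apply(hits, 1, combine)
}

detect_sketch(c("xyzabc", "abcxyz"), c(".b", "^x"))                 # all
detect_sketch(c("xyzabc", "abcxyz"), c(".b", "^x"), combine = any)  # any
```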
If the element does not exist, this function returns the empty string. This is consistent with stringr::str_sub(). This function is vectorised over both arguments.
str_elem(string, index)
string |
A character vector. |
index |
An integer. Negative indexing is allowed as in
|
A one-character string.
Other single element extractors:
str_elems(), str_paste_elems()
str_elem(c("abcd", "xyz"), 3)
str_elem("abcd", -2)
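A minimal base-R sketch of single-character extraction with negative indexing follows. elem_sketch() is a hypothetical helper, not part of strex. Out-of-range indices give "" because that is what substr() returns, which matches the documented behaviour.

```r
# Hypothetical base-R sketch of single-character extraction.
elem_sketch <- function(string, index) {
  len <- max(length(string), length(index))
  string <- rep_len(string, len)                 # recycle both arguments
  index <- rep_len(index, len)
  n <- nchar(string)
  i <- ifelse(index < 0, n + index + 1, index)   # -1 means the last character
  substr(string, i, i)                           # "" when out of range
}

elem_sketch(c("abcd", "xyz"), 3)
elem_sketch("abcd", -2)
```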
Efficiently extract several elements from a string. See str_elem() for extracting single elements. This function is vectorized over the first argument.
str_elems(string, indices, byrow = TRUE)
string |
A character vector. |
indices |
A vector of integerish values. Negative indexing is allowed as
in |
byrow |
Should the elements be organised in the matrix with one row per
string ( |
A character matrix.
Other single element extractors:
str_elem(), str_paste_elems()
string <- c("abc", "def", "ghi", "vwxyz")
str_elems(string, 1:2)
str_elems(string, 1:2, byrow = FALSE)
str_elems(string, c(1, 2, 3, 4, -1))
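The matrix layout can be illustrated with a base-R sketch that extracts one column per index and transposes when byrow = FALSE. elems_sketch() is hypothetical and makes no attempt at the efficiency the documented function aims for.

```r
# Hypothetical base-R sketch of multi-element extraction into a matrix.
elems_sketch <- function(string, indices, byrow = TRUE) {
  n <- nchar(string)
  cols <- lapply(indices, function(k) {
    i <- if (k < 0) n + k + 1 else rep(k, length(string))  # negative = from end
    substr(string, i, i)
  })
  out <- matrix(unlist(cols), nrow = length(string))  # one row per string
  if (byrow) out else t(out)                          # one column per string
}

s <- c("abc", "def", "ghi", "vwxyz")
elems_sketch(s, 1:2)
elems_sketch(s, 1:2, byrow = FALSE)
```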
Extract the non-numeric bits of a string where numbers are optionally defined with decimals, scientific notation and thousand separators.
str_extract_non_numerics(string, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", commas = FALSE)
string |
A string. |
decimals |
Do you want to include the possibility of decimal numbers
( |
leading_decimals |
Do you want to allow a leading decimal point to be the start of a number? |
negs |
Do you want to allow negative numbers? Note that double negatives are not handled here (see the examples). |
sci |
Make the search aware of scientific notation e.g. 2e3 is the same as 2000. |
big_mark |
A character. Allow this character to be used as a thousands
separator. This character will be removed from between digits before they
are converted to numeric. You may specify many at once by pasting them
together e.g. |
commas |
Deprecated. Use |
str_first_non_numeric(...) is just str_nth_non_numeric(..., n = 1).
str_last_non_numeric(...) is just str_nth_non_numeric(..., n = -1).
Other non-numeric extractors:
str_nth_non_numeric()
strings <- c(
  "abc123def456", "abc-0.12def.345", "abc.12e4def34.5e9",
  "abc1,100def1,230.5", "abc1,100e3,215def4e1,000"
)
str_extract_non_numerics(strings)
str_extract_non_numerics(strings, decimals = TRUE, leading_decimals = FALSE)
str_extract_non_numerics(strings, decimals = TRUE)
str_extract_non_numerics(strings, big_mark = ",")
str_extract_non_numerics(strings, decimals = TRUE, leading_decimals = TRUE, sci = TRUE)
str_extract_non_numerics(strings,
  decimals = TRUE, leading_decimals = TRUE, sci = TRUE,
  big_mark = ",", negs = TRUE
)
str_extract_non_numerics(c("22", "1.2.3"), decimals = TRUE)
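For the simplest settings (whole numbers only, with no decimals, scientific notation or thousands separators), extracting non-numerics amounts to splitting each string on runs of digits. The sketch below, with the hypothetical helper non_numerics_sketch(), shows that case in base R using regmatches(..., invert = TRUE); it is not strex's implementation.

```r
# Hypothetical base-R sketch for the whole-numbers-only case: split on runs
# of digits and keep the non-empty pieces.
non_numerics_sketch <- function(string) {
  pieces <- regmatches(string, gregexpr("[0-9]+", string), invert = TRUE)
  lapply(pieces, function(p) p[nzchar(p)])
}

non_numerics_sketch(c("abc123def456", "abc-0.12def.345"))
```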
Extract the numbers from a string, where decimals, scientific notation and thousand separators are optionally allowed.
str_extract_numbers(string, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
string |
A string. |
decimals |
Do you want to include the possibility of decimal numbers
( |
leading_decimals |
Do you want to allow a leading decimal point to be the start of a number? |
negs |
Do you want to allow negative numbers? Note that double negatives are not handled here (see the examples). |
sci |
Make the search aware of scientific notation e.g. 2e3 is the same as 2000. |
big_mark |
A character. Allow this character to be used as a thousands
separator. This character will be removed from between digits before they
are converted to numeric. You may specify many at once by pasting them
together e.g. |
leave_as_string |
Do you want to return the number as a string ( |
commas |
Deprecated. Use |
If any part of a string contains an ambiguous number (e.g. 1.2.3 would be ambiguous if decimals = TRUE, but not otherwise), the value returned for that string will be NA and a warning will be issued.
With scientific notation, it is assumed that the exponent is not a decimal number, e.g. 2e2.4 is unacceptable. Thousand separators, however, are acceptable in the exponent.
Numbers outside the double precision floating point range (i.e. with absolute value greater than 1.797693e+308) are read as Inf (or -Inf if they begin with a minus sign). This is what base::as.numeric() does.
For str_extract_numbers and str_extract_non_numerics, a list of numeric or character vectors, one list element for each element of string. For str_nth_number and str_nth_non_numeric, a numeric or character vector the same length as the vector string.
Other numeric extractors:
str_nth_number(), str_nth_number_after_mth(), str_nth_number_before_mth()
strings <- c(
  "abc123def456", "abc-0.12def.345", "abc.12e4def34.5e9",
  "abc1,100def1,230.5", "abc1,100e3,215def4e1,000"
)
str_extract_numbers(strings)
str_extract_numbers(strings, decimals = TRUE)
str_extract_numbers(strings, decimals = TRUE, leading_decimals = TRUE)
str_extract_numbers(strings, big_mark = ",")
str_extract_numbers(strings, decimals = TRUE, leading_decimals = TRUE, sci = TRUE)
str_extract_numbers(strings,
  decimals = TRUE, leading_decimals = TRUE, sci = TRUE,
  big_mark = ",", negs = TRUE
)
str_extract_numbers(strings,
  decimals = TRUE, leading_decimals = FALSE, sci = FALSE,
  big_mark = ",", leave_as_string = TRUE
)
str_extract_numbers(c("22", "1.2.3"), decimals = TRUE)
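A rough base-R sketch of how the decimals and negs options widen the number pattern is given below. extract_numbers_sketch() is hypothetical and deliberately simplified: it has no leading_decimals, sci or big_mark support, and it does not detect ambiguous numbers such as 1.2.3, which it silently splits into 1.2 and 3 instead of returning NA with a warning.

```r
# Hypothetical, simplified base-R sketch of number extraction.
extract_numbers_sketch <- function(string, decimals = FALSE, negs = FALSE) {
  pattern <- "[0-9]+"                                          # whole numbers
  if (decimals) pattern <- paste0(pattern, "(\\.[0-9]+)?")     # optional .123
  if (negs) pattern <- paste0("-?", pattern)                   # optional minus
  lapply(regmatches(string, gregexpr(pattern, string)), as.numeric)
}

extract_numbers_sketch(c("abc123def456", "abc-0.12def.345"))
extract_numbers_sketch(c("abc123def456", "abc-0.12def.345"),
                       decimals = TRUE, negs = TRUE)
```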
Say you want to ensure a name is fit to be the name of a csv file. Then, if the input doesn't end with ".csv", this function will tack ".csv" onto the end of it. This is vectorized over the first argument.
str_give_ext(string, ext, replace = FALSE)
string |
The intended file name. |
ext |
The intended file extension (with or without the "."). |
replace |
If the file has an extension already, replace it (or append the new extension name)? |
A string: the file name in your intended form.
str_give_ext(c("abc", "abc.csv"), "csv")
str_give_ext("abc.csv", "pdf")
str_give_ext("abc.csv", "pdf", replace = TRUE)
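A minimal base-R sketch of the documented behaviour: strip a leading "." from ext, leave names that already end in the extension alone, and otherwise append it, optionally after removing the existing extension. give_ext_sketch() is a hypothetical helper, not strex's code.

```r
# Hypothetical base-R sketch of giving a file name an extension.
give_ext_sketch <- function(string, ext, replace = FALSE) {
  ext <- sub("^\\.", "", ext)                        # accept "csv" or ".csv"
  done <- endsWith(string, paste0(".", ext))         # already has this ext?
  base <- if (replace) sub("\\.[^.]*$", "", string) else string
  ifelse(done, string, paste0(base, ".", ext))
}

give_ext_sketch(c("abc", "abc.csv"), "csv")
give_ext_sketch("abc.csv", "pdf")
give_ext_sketch("abc.csv", "pdf", replace = TRUE)
```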
Give the positions of (, ), [, ], {, } within a string.
str_locate_braces(string)
string |
A character vector |
A data frame with 4 columns: string_num, string, position and brace. Every extracted brace gets its own row in the data frame detailing the string number and string that it was extracted from, the position in its string and the brace.
Other locators:
str_locate_nth()
str_locate_braces(c("a{](kkj)})", "ab(]c{}"))
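The data-frame layout described above can be reproduced with a small base-R sketch that finds brace characters with gregexpr() and builds one row per hit. locate_braces_sketch() is a hypothetical helper, not strex's implementation; it uses a PCRE character class (perl = TRUE) for the six brace characters.

```r
# Hypothetical base-R sketch of brace location.
locate_braces_sketch <- function(string) {
  # perl = TRUE so the class can list [ ] ( ) { } using escapes
  m <- gregexpr("[\\[\\](){}]", string, perl = TRUE)
  rows <- lapply(seq_along(string), function(i) {
    p <- as.integer(m[[i]])
    if (p[1] == -1L) return(NULL)  # no braces in this string
    data.frame(
      string_num = i, string = string[i], position = p,
      brace = substring(string[i], p, p), stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)  # rbind() skips the NULL entries
}

locate_braces_sketch(c("a{](kkj)})", "ab(]c{}"))
```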
Get the indices of the nth instance of a pattern.
The nth instance of a pattern will cover a series of character indices. These functions tell you which indices those are. These functions are vectorised over all arguments.
str_locate_nth(string, pattern, n)
str_locate_first(string, pattern)
str_locate_last(string, pattern)
string |
A character vector. |
pattern |
The pattern to look for. The default interpretation is a regular expression, as described in stringi::about_search_regex. To match without a regular expression (i.e. as a human would), use
coll(). For details see |
n |
A vector of integerish values. Must be either length 1 or
have length equal to the length of |
str_locate_first(...) is just str_locate_nth(..., n = 1).
str_locate_last(...) is just str_locate_nth(..., n = -1).
A two-column matrix. The ith row of this matrix gives the start and end indices of the nth instance of pattern in the ith element of string.
Other locators:
str_locate_braces()
str_locate_nth(c("abcdabcxyz", "abcabc"), "abc", 2)
str_locate_nth(
  c("This old thing.", "That beautiful thing there."),
  "\\w+", c(2, -2)
)
str_locate_nth("abc", "b", c(0, 1, 1, 2))
str_locate_first("abcxyzabc", "abc")
str_locate_last("abcxyzabc", "abc")
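The following base-R sketch shows how the two-column matrix can be assembled from gregexpr() output: take the start of the nth match and add its match length to get the end. locate_nth_sketch() is hypothetical, not strex's code; it assumes a single n and returns NA rows when the occurrence does not exist.

```r
# Hypothetical base-R sketch of locating the nth match of a pattern.
locate_nth_sketch <- function(string, pattern, n) {
  bounds <- vapply(string, function(s) {
    m <- gregexpr(pattern, s)[[1]]
    k <- if (m[1] == -1L) 0L else length(m)  # number of matches
    i <- if (n < 0) k + n + 1 else n         # negative n counts from the end
    if (i < 1 || i > k) return(c(NA_integer_, NA_integer_))
    c(m[i], m[i] + attr(m, "match.length")[i] - 1L)
  }, integer(2), USE.NAMES = FALSE)
  out <- t(bounds)  # one row per string
  colnames(out) <- c("start", "end")
  out
}

locate_nth_sketch(c("abcdabcxyz", "abcabc"), "abc", 2)
locate_nth_sketch("abcxyzabc", "abc", -1)
```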
Match arg against a series of candidate choices. arg matches an element of choices if arg is a prefix of that element.
str_match_arg(arg, choices = NULL, index = FALSE, several_ok = FALSE, ignore_case = FALSE)
match_arg(arg, choices = NULL, index = FALSE, several_ok = FALSE, ignore_case = FALSE)
arg |
A character vector (of length one unless |
choices |
A character vector of candidate values. |
index |
Return the index of the match rather than the match itself? |
several_ok |
Allow |
ignore_case |
Ignore case while matching. If this is |
Errors are thrown when no match is made and when the match is ambiguous. However, sometimes ambiguities are inevitable. Consider the case where choices = c("ab", "abc"): no abbreviation can select "ab", because any prefix of "ab" is also a prefix of "abc". In this case you need to provide a full match, i.e. using arg = "ab" will get you "ab" without an error, whereas arg = "a" will throw an ambiguity error.
When choices is NULL, the choices are obtained from a default setting for the formal argument arg of the function from which str_match_arg was called. This is consistent with base::match.arg(). See the examples for details.
When arg and choices are identical and several_ok = FALSE, the first element of choices is returned. This is consistent with base::match.arg().
This function is inspired by RSAGA::match.arg.ext(). Its behaviour is almost identical; the difference is that RSAGA::match.arg.ext(..., ignore.case = TRUE) always returns in all lower case, whereas strex::match_arg(..., ignore_case = TRUE) ignores case while matching but returns the element of choices in its original case. RSAGA is a heavy package to depend upon, so strex::match_arg() is handy for package developers.
This function is designed to be used inside of other functions. It's fine to use it for other purposes, but the error messages might be a bit weird.
choices <- c("Apples", "Pears", "Bananas", "Oranges")
match_arg("A", choices)
match_arg("B", choices, index = TRUE)
match_arg(c("a", "b"), choices, several_ok = TRUE, ignore_case = TRUE)
match_arg(c("b", "a"), choices,
  ignore_case = TRUE, index = TRUE,
  several_ok = TRUE
)
myword <- function(w = c("abacus", "baseball", "candy")) {
  w <- match_arg(w)
  w
}
myword("b")
myword()
myword <- function(w = c("abacus", "baseball", "candy")) {
  w <- match_arg(w, several_ok = TRUE)
  w
}
myword("c")
myword()
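The prefix-matching and ambiguity rules can be sketched in a few lines of base R. prefix_match_sketch() is a hypothetical helper for a single arg, not strex's code; it omits index, several_ok, ignore_case and the lookup of defaults from the calling function's formals. Base R's pmatch() implements a similar partial-matching rule.

```r
# Hypothetical base-R sketch of prefix matching with ambiguity errors.
prefix_match_sketch <- function(arg, choices) {
  if (arg %in% choices) return(arg)            # a full match always wins
  hits <- choices[startsWith(choices, arg)]    # choices that arg is a prefix of
  if (length(hits) == 0) stop("'arg' matches none of 'choices'.")
  if (length(hits) > 1) stop("'arg' is ambiguous: it is a prefix of several choices.")
  hits
}

prefix_match_sketch("A", c("Apples", "Pears", "Bananas", "Oranges"))
prefix_match_sketch("ab", c("ab", "abc"))
```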
Extract the nth non-numeric substring from a string.
Extract the nth non-numeric bit of a string where numbers are optionally defined with decimals, scientific notation and thousand separators.
str_first_non_numeric(...) is just str_nth_non_numeric(..., n = 1).
str_last_non_numeric(...) is just str_nth_non_numeric(..., n = -1).
str_nth_non_numeric(string, n, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", commas = FALSE)
str_first_non_numeric(string, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", commas = FALSE)
str_last_non_numeric(string, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "")
string |
A string. |
n |
A vector of integerish values. Must be either length 1 or
have length equal to the length of |
decimals |
Do you want to include the possibility of decimal numbers
( |
leading_decimals |
Do you want to allow a leading decimal point to be the start of a number? |
negs |
Do you want to allow negative numbers? Note that double negatives are not handled here (see the examples). |
sci |
Make the search aware of scientific notation e.g. 2e3 is the same as 2000. |
big_mark |
A character. Allow this character to be used as a thousands
separator. This character will be removed from between digits before they
are converted to numeric. You may specify many at once by pasting them
together e.g. |
commas |
Deprecated. Use |
Other non-numeric extractors:
str_extract_non_numerics()
strings <- c(
  "abc123def456", "abc-0.12def.345", "abc.12e4def34.5e9",
  "abc1,100def1,230.5", "abc1,100e3,215def4e1,000"
)
str_nth_non_numeric(strings, n = 2)
str_nth_non_numeric(strings, n = -2, decimals = TRUE)
str_first_non_numeric(strings, decimals = TRUE, leading_decimals = FALSE)
str_last_non_numeric(strings, big_mark = ",")
str_nth_non_numeric(strings, n = 1, decimals = TRUE, leading_decimals = TRUE, sci = TRUE)
str_first_non_numeric(strings,
  decimals = TRUE, leading_decimals = TRUE, sci = TRUE,
  big_mark = ",", negs = TRUE
)
str_first_non_numeric(c("22", "1.2.3"), decimals = TRUE)
Extract the nth number from a string.
Extract the nth number from a string, where decimals, scientific notation and thousand separators are optionally allowed.
str_nth_number(string, n, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_first_number(string, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_last_number(string, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
string |
A string. |
n |
A vector of integerish values. Must be either length 1 or
have length equal to the length of |
decimals |
Do you want to include the possibility of decimal numbers
( |
leading_decimals |
Do you want to allow a leading decimal point to be the start of a number? |
negs |
Do you want to allow negative numbers? Note that double negatives are not handled here (see the examples). |
sci |
Make the search aware of scientific notation e.g. 2e3 is the same as 2000. |
big_mark |
A character. Allow this character to be used as a thousands
separator. This character will be removed from between digits before they
are converted to numeric. You may specify many at once by pasting them
together e.g. |
leave_as_string |
Do you want to return the number as a string ( |
commas |
Deprecated. Use |
str_first_number(...) is just str_nth_number(..., n = 1).
str_last_number(...) is just str_nth_number(..., n = -1).
For a detailed explanation of the number extraction, see str_extract_numbers().
A numeric vector (or a character vector if leave_as_string = TRUE).
Other numeric extractors:
str_extract_numbers(), str_nth_number_after_mth(), str_nth_number_before_mth()
strings <- c(
  "abc123def456", "abc-0.12def.345", "abc.12e4def34.5e9",
  "abc1,100def1,230.5", "abc1,100e3,215def4e1,000"
)
str_nth_number(strings, n = 2)
str_nth_number(strings, n = -2, decimals = TRUE)
str_first_number(strings, decimals = TRUE, leading_decimals = TRUE)
str_last_number(strings, big_mark = ",")
str_nth_number(strings, n = 1, decimals = TRUE, leading_decimals = TRUE, sci = TRUE)
str_first_number(strings,
  decimals = TRUE, leading_decimals = TRUE, sci = TRUE,
  big_mark = ",", negs = TRUE
)
str_last_number(strings,
  decimals = TRUE, leading_decimals = FALSE, sci = FALSE,
  big_mark = ",", negs = TRUE, leave_as_string = TRUE
)
str_first_number(c("22", "1.2.3"), decimals = TRUE)
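Negative n counting from the end can be sketched in base R as follows. nth_number_sketch() is hypothetical, not strex's code, and only recognises runs of digits, which is why "1,100" is read here as the two numbers 1 and 100: the situation that big_mark = "," is meant to address.

```r
# Hypothetical base-R sketch of picking the nth whole number in each string.
nth_number_sketch <- function(string, n) {
  nums <- regmatches(string, gregexpr("[0-9]+", string))
  vapply(nums, function(x) {
    k <- length(x)
    i <- if (n < 0) k + n + 1 else n   # n = -1 is the last number
    if (i < 1 || i > k) NA_real_ else as.numeric(x[i])
  }, numeric(1))
}

nth_number_sketch(c("abc123def456", "abc1,100def1,230.5"), 2)
nth_number_sketch(c("abc123def456", "abc1,100def1,230.5"), -1)
```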
Extract the nth number after the mth occurrence of a pattern.
Given a string, a pattern and natural numbers n and m, find the nth number after the mth occurrence of the pattern.
str_nth_number_after_mth(string, pattern, n, m, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_nth_number_after_first(string, pattern, n, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_nth_number_after_last(string, pattern, n, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_first_number_after_mth(string, pattern, m, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_last_number_after_mth(string, pattern, m, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_first_number_after_first(string, pattern, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_first_number_after_last(string, pattern, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_last_number_after_first(string, pattern, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_last_number_after_last(string, pattern, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
string |
A character vector. |
pattern |
The pattern to look for. The default interpretation is a regular expression, as described in stringi::about_search_regex. To match without a regular expression (i.e. as a human would), use
coll(). For details see |
n , m
|
Vectors of integerish values. Must be either length 1 or have
length equal to the length of |
decimals |
Do you want to include the possibility of decimal numbers
( |
leading_decimals |
Do you want to allow a leading decimal point to be the start of a number? |
negs |
Do you want to allow negative numbers? Note that double negatives are not handled here (see the examples). |
sci |
Make the search aware of scientific notation e.g. 2e3 is the same as 2000. |
big_mark |
A character. Allow this character to be used as a thousands
separator. This character will be removed from between digits before they
are converted to numeric. You may specify many at once by pasting them
together e.g. |
leave_as_string |
Do you want to return the number as a string ( |
commas |
Deprecated. Use |
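The big_mark behaviour described above (separators dropped only when they sit between digits, several separators pasted into one string) can be sketched in base R. This is a minimal illustration, not strex's implementation; the helper name strip_big_mark() is hypothetical, and it assumes big_mark contains none of the characters that are special inside a regex bracket expression (], ^, - and \).

```r
# Hypothetical helper, not part of strex: mimic the documented big_mark rule.
# Each character of big_mark is treated as a possible thousands separator.
# Assumes big_mark contains none of: ] ^ - \
strip_big_mark <- function(x, big_mark = ",") {
  if (big_mark == "") return(x)  # no separator requested: nothing to strip
  sep_class <- paste0("[", big_mark, "]")
  # Drop a separator only when it has a digit on both sides
  pattern <- paste0("(?<=[0-9])", sep_class, "(?=[0-9])")
  gsub(pattern, "", x, perl = TRUE)
}

strip_big_mark("Population: 1,234,567 (est.)")
# "Population: 1234567 (est.)"
as.numeric(strip_big_mark("1_000,000", big_mark = ",_"))
```

A comma followed by a space (as in "1, 2") is left alone, which is why the lookarounds insist on a digit on each side.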
A numeric vector, or a character vector if leave_as_string = TRUE.
Other numeric extractors: str_extract_numbers(), str_nth_number(), str_nth_number_before_mth()
string <- c(
  "abc1abc2abc3abc4abc5abc6abc7abc8abc9",
  "abc1def2ghi3abc4def5ghi6abc7def8ghi9"
)
str_nth_number_after_mth(string, "abc", 1, 3)
str_nth_number_after_mth(string, "abc", 2, 3)
str_nth_number_after_first(string, "abc", 2)
str_nth_number_after_last(string, "abc", -1)
str_first_number_after_mth(string, "abc", 2)
str_last_number_after_mth(string, "abc", 1)
str_first_number_after_first(string, "abc")
str_first_number_after_last(string, "abc")
str_last_number_after_first(string, "abc")
str_last_number_after_last(string, "abc")
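To make the "after" semantics concrete, here is a minimal base R sketch of finding the nth number after the mth match of a pattern. nth_number_after_mth_sketch() is a hypothetical name, not part of strex, and it is not how strex is implemented. It assumes positive n and m, handles only unsigned integers (none of the decimals, negs, sci or big_mark options), and returns NA when the match or the number does not exist.

```r
# Hypothetical sketch, not strex's code: positive n and m, unsigned integers only.
nth_number_after_mth_sketch <- function(string, pattern, n, m) {
  vapply(string, function(s) {
    starts <- gregexpr(pattern, s)[[1]]
    # gregexpr() returns -1 when there is no match at all
    if (starts[1] == -1 || length(starts) < m) return(NA_real_)
    # First character position after the end of the mth match
    after <- starts[m] + attr(starts, "match.length")[m]
    rest <- substring(s, after)
    nums <- regmatches(rest, gregexpr("[0-9]+", rest))[[1]]
    if (length(nums) < n) return(NA_real_)
    as.numeric(nums[n])
  }, numeric(1), USE.NAMES = FALSE)
}

string <- c(
  "abc1abc2abc3abc4abc5abc6abc7abc8abc9",
  "abc1def2ghi3abc4def5ghi6abc7def8ghi9"
)
nth_number_after_mth_sketch(string, "abc", 1, 3)
# 3 7
```

The idea is simply "cut the string after the mth match, then count numbers from there", which is why the second string gives 7: its third "abc" is the one just before "7def8ghi9".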
nth number before the mth occurrence of a pattern.

Given a string, a pattern and integers n and m, find the nth number that comes before the mth occurrence of the pattern. Negative values of n and m count back from the end, as in the examples.
str_nth_number_before_mth(string, pattern, n, m, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_nth_number_before_first(string, pattern, n, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_nth_number_before_last(string, pattern, n, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_first_number_before_mth(string, pattern, m, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_last_number_before_mth(string, pattern, m, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_first_number_before_first(string, pattern, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_first_number_before_last(string, pattern, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_last_number_before_first(string, pattern, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
str_last_number_before_last(string, pattern, decimals = FALSE, leading_decimals = decimals, negs = FALSE, sci = FALSE, big_mark = "", leave_as_string = FALSE, commas = FALSE)
Argument | Description |
---|---|
string | A character vector. |
pattern | The pattern to look for. The default interpretation is a regular expression, as described in stringi::about_search_regex. To match without a regular expression (i.e. as a human would), use coll(). For details see |
n, m | Vectors of integerish values. Each must be either length 1 or have length equal to the length of string. Negative values count back from the end, so -1 means the last. |
decimals | Do you want to include the possibility of decimal numbers (TRUE) or not (FALSE, the default)? |
leading_decimals | Do you want to allow a leading decimal point to be the start of a number? Defaults to the value of decimals. |
negs | Do you want to allow negative numbers? Note that double negatives are not handled here (see the examples). |
sci | Make the search aware of scientific notation, e.g. 2e3 is the same as 2000. |
big_mark | A character. Allow this character to be used as a thousands separator. This character will be removed from between digits before they are converted to numeric. You may specify several at once by pasting them together into a single string. |
leave_as_string | Do you want to return the number as a string (TRUE) or as numeric (FALSE, the default)? |
commas | Deprecated. Use big_mark instead. |
A numeric vector, or a character vector if leave_as_string = TRUE.
Other numeric extractors: str_extract_numbers(), str_nth_number(), str_nth_number_after_mth()
string <- c(
  "abc1abc2abc3abc4def5abc6abc7abc8abc9",
  "abc1def2ghi3abc4def5ghi6abc7def8ghi9"
)
str_nth_number_before_mth(string, "def", 1, 1)
str_nth_number_before_mth(string, "abc", 2, 3)
str_nth_number_before_first(string, "def", 2)
str_nth_number_before_last(string, "def", -1)
str_first_number_before_mth(string, "abc", 2)
str_last_number_before_mth(string, "def", 1)
str_first_number_before_first(string, "def")
str_first_number_before_last(string, "def")
str_last_number_before_first(string, "def")
str_last_number_before_last(string, "def")
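The "before" variant can be sketched the same way: keep only the text preceding the mth match, then index into the numbers found there, counting from the start of the string. The base R sketch below uses a hypothetical name, nth_number_before_mth_sketch(), and is not strex's implementation. It supports negative n (-1 is the last number before the match) but assumes a positive m and unsigned integers only.

```r
# Hypothetical sketch, not strex's code: positive m, unsigned integers only.
nth_number_before_mth_sketch <- function(string, pattern, n, m) {
  vapply(string, function(s) {
    starts <- gregexpr(pattern, s)[[1]]
    if (starts[1] == -1 || length(starts) < m) return(NA_real_)
    # Everything strictly before the start of the mth match
    before <- substr(s, 1, starts[m] - 1)
    nums <- regmatches(before, gregexpr("[0-9]+", before))[[1]]
    # Negative n counts back from the end: -1 is the last number
    idx <- if (n > 0) n else length(nums) + n + 1
    if (idx < 1 || idx > length(nums)) return(NA_real_)
    as.numeric(nums[idx])
  }, numeric(1), USE.NAMES = FALSE)
}

string <- c(
  "abc1abc2abc3abc4def5abc6abc7abc8abc9",
  "abc1def2ghi3abc4def5ghi6abc7def8ghi9"
)
nth_number_before_mth_sketch(string, "def", -1, 1)
# 4 1
```

Counting n from the start of the truncated string (rather than backwards from the match) is a design choice of this sketch; negative n covers the "closest number before the match" case.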
This is a quick way around calling str_elems() followed by apply(..., paste).
str_paste_elems(string, indices, sep = "")
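Argument | Description |
---|---|
string | A character vector. |
indices | A vector of integerish values. Negative indexing is allowed and counts from the end, so -1 is the last element. |
sep | A string. The separator placed between the extracted elements when they are pasted together. |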
Elements that don't exist (e.g. element 5 of "abc") are ignored.
A character vector.
Other single element extractors: str_elem(), str_elems()
string <- c("abc", "def", "ghi", "vwxyz")
str_paste_elems(string, 1:2)
str_paste_elems(string, c(1, 2, 3, 4, -1))
str_paste_elems("abc", c(1, 5, 55, 43, 3))
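The pasting behaviour, including negative indices and silently ignored out-of-range indices, can be approximated in base R by splitting each string into characters. paste_elems_sketch() is a hypothetical helper for illustration, not strex's code; it assumes indices are whole numbers.

```r
# Hypothetical sketch, not strex's code.
paste_elems_sketch <- function(string, indices, sep = "") {
  vapply(string, function(s) {
    chars <- strsplit(s, "")[[1]]
    n <- length(chars)
    # Negative indices count back from the end: -1 is the last character
    idx <- ifelse(indices < 0, n + indices + 1, indices)
    # Indices pointing outside the string are dropped rather than erroring
    idx <- idx[idx >= 1 & idx <= n]
    paste(chars[idx], collapse = sep)
  }, character(1), USE.NAMES = FALSE)
}

paste_elems_sketch(c("abc", "vwxyz"), c(1, 2, -1))
# "abc" "vwz"
paste_elems_sketch("abc", c(1, 5, 55, 43, 3))
# "ac"
```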
If any parts of a string are quoted (between quotation marks), remove those parts of the string, including the quotes. Both double and single quotation marks are recognized, as the example shows.
str_remove_quoted(string)
Argument | Description |
---|---|
string | A character vector. |
A character vector.
Other removers: str_singleize(), str_trim_anything()
string <- "\"abc\"67a\'dk\'f"
cat(string)
str_remove_quoted(string)
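A rough base R equivalent uses a single regular expression that matches either a double-quoted run or a single-quoted run, quotes included. remove_quoted_sketch() is hypothetical and not strex's implementation; it assumes quotes are neither nested nor escaped, and it leaves an unmatched quote (as in "it's") untouched, which may differ from what strex does.

```r
# Hypothetical sketch, not strex's code.
remove_quoted_sketch <- function(string) {
  # "..." or '...', including the quote characters themselves
  gsub("\"[^\"]*\"|'[^']*'", "", string)
}

remove_quoted_sketch("\"abc\"67a'dk'f")
# "67af"
```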
If a string contains a given pattern duplicated back-to-back a number of times, remove that duplication so that the pattern appears only once in that position. Every run of duplication in the string is collapsed, not just the first one. This is vectorized over string and pattern.
str_singleize(string, pattern)
Argument | Description |
---|---|
string | A character vector. |
pattern | The pattern to look for. The default interpretation is a regular expression, as described in stringi::about_search_regex. To match without a regular expression (i.e. as a human would), use coll(). For details see |
A character vector.
Other removers: str_remove_quoted(), str_trim_anything()
str_singleize("abc//def", "/")
str_singleize("abababcabab", "ab")
str_singleize(c("abab", "cdcd"), "cd")
str_singleize(c("abab", "cdcd"), c("ab", "cd"))
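Collapsing back-to-back repeats amounts to replacing one or more consecutive copies of the pattern with a single copy. The base R sketch below (hypothetical singleize_sketch(), not strex's code) wraps the pattern in a capturing group, repeats it with +, and writes back the captured group; mapply() recycles pattern against string. For a regex that can match different text on each repetition, \\1 keeps the last repetition, which is a choice of this sketch.

```r
# Hypothetical sketch, not strex's code. Uses PCRE rather than ICU regex.
singleize_sketch <- function(string, pattern) {
  mapply(function(s, p) {
    # (p)+ matches a run of one or more copies; \\1 is a single copy
    gsub(paste0("(", p, ")+"), "\\1", s, perl = TRUE)
  }, string, pattern, USE.NAMES = FALSE)
}

singleize_sketch("abababcabab", "ab")
# "abcab"
singleize_sketch(c("abab", "cdcd"), c("ab", "cd"))
# "ab" "cd"
```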
Break a string wherever it goes from a numeric character to a non-numeric one, or vice versa. The whole string is kept; it is just split up. Vectorized over string.
str_split_by_numbers(string, decimals = FALSE, leading_decimals = FALSE, negs = FALSE, sci = FALSE, big_mark = "", commas = FALSE)
Argument | Description |
---|---|
string | A character vector. |
decimals | Do you want to include the possibility of decimal numbers (TRUE) or not (FALSE, the default)? |
leading_decimals | Do you want to allow a leading decimal point to be the start of a number? |
negs | Do you want to allow negative numbers? Note that double negatives are not handled here (see the examples). |
sci | Make the search aware of scientific notation, e.g. 2e3 is the same as 2000. |
big_mark | A character. Allow this character to be used as a thousands separator. This character will be removed from between digits before they are converted to numeric. You may specify several at once by pasting them together into a single string. |
commas | Deprecated. Use big_mark instead. |
A list of character vectors.
Other splitters: str_split_camel_case()
str_split_by_numbers(c("abc123def456.789gh", "a1b2c344"))
str_split_by_numbers("abc123def456.789gh", decimals = TRUE)
str_split_by_numbers(c("22", "1.2.3"), decimals = TRUE)
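Splitting at digit/non-digit boundaries is equivalent to extracting alternating maximal runs of digits and non-digits, so no characters are lost and pasting the pieces back together restores the input. The base R sketch below (hypothetical split_by_numbers_sketch(), not strex's implementation) supports only the decimals option; negs, sci and big_mark are not handled.

```r
# Hypothetical sketch, not strex's code: only the decimals option is modelled.
split_by_numbers_sketch <- function(string, decimals = FALSE) {
  num <- if (decimals) "[0-9]+(?:\\.[0-9]+)?" else "[0-9]+"
  # Try a number first; otherwise take a maximal run of non-digits
  regmatches(string, gregexpr(paste0(num, "|[^0-9]+"), string, perl = TRUE))
}

split_by_numbers_sketch(c("abc123def456.789gh", "a1b2c344"))
# [[1]] "abc" "123" "def" "456" "." "789" "gh"
# [[2]] "a" "1" "b" "2" "c" "344"
split_by_numbers_sketch("abc123def456.789gh", decimals = TRUE)
# [[1]] "abc" "123" "def" "456.789" "gh"
```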
Split strings at CamelCase boundaries. Vectorized over string.
str_split_camel_case(string, lower = FALSE)
Argument | Description |
---|---|
string | A character vector. |
lower | Do you want the output to be all lower case (TRUE) or left as is (FALSE, the default)? |
A list of character vectors, one list element for each element of string.
Adapted from Ramnath Vaidyanathan's answer at http://stackoverflow.com/questions/8406974/splitting-camelcase-in-r.
Other splitters: str_split_by_numbers()
str_split_camel_case(c("RoryNolan", "NaomiFlagg", "DepartmentOfSillyHats"))
str_split_camel_case(c("RoryNolan", "NaomiFlagg", "DepartmentOfSillyHats"), lower = TRUE)
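One common way to split CamelCase in base R is to mark each case boundary and then split on the marker. The sketch below (hypothetical split_camel_case_sketch(), not strex's implementation) marks each boundary between a lower-case letter or digit and a following capital with a tab, so it assumes the input contains no tabs; runs of capitals such as "HTML" are kept together.

```r
# Hypothetical sketch, not strex's code. Assumes the input contains no tabs.
split_camel_case_sketch <- function(string, lower = FALSE) {
  # Insert a tab between a lower-case letter/digit and the next capital
  marked <- gsub("([a-z0-9])([A-Z])", "\\1\t\\2", string)
  pieces <- strsplit(marked, "\t", fixed = TRUE)
  if (lower) pieces <- lapply(pieces, tolower)
  pieces
}

split_camel_case_sketch(c("RoryNolan", "DepartmentOfSillyHats"), lower = TRUE)
# [[1]] "rory" "nolan"
# [[2]] "department" "of" "silly" "hats"
```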
Go from a string to a vector whose ith element is the ith character in the string.
str_to_vec(string)
Argument | Description |
---|---|
string | A character vector. |
A character vector.
str_to_vec("abcdef")
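For a single string, the same result can be obtained in base R by splitting on the empty string. to_vec_sketch() is a hypothetical name used only for illustration, and it assumes a single input string (only the first element is returned otherwise).

```r
# Hypothetical sketch, not strex's code: assumes one input string.
to_vec_sketch <- function(string) {
  # Splitting on "" yields one element per character
  strsplit(string, "")[[1]]
}

to_vec_sketch("abcdef")
# "a" "b" "c" "d" "e" "f"
```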
The stringi and stringr packages let you trim whitespace, but what if you want to trim something else from either (or both) side(s) of a string? This function lets you select which pattern to trim and from which side(s).
str_trim_anything(string, pattern, side = "both")
Argument | Description |
---|---|
string | A character vector. |
pattern | The pattern to look for. The default interpretation is a regular expression, as described in stringi::about_search_regex. To match without a regular expression (i.e. as a human would), use coll(). For details see |
side | Which side do you want to trim from? One of "both" (the default), "left" or "right". |
A string.
Other removers: str_remove_quoted(), str_singleize()
str_trim_anything("..abcd.", ".", "left")  # as a regex, "." matches any character
str_trim_anything("..abcd.", coll("."), "left")  # coll(".") matches a literal dot
str_trim_anything("-ghi--", "-", "both")
str_trim_anything("-ghi--", "-")
str_trim_anything("-ghi--", "-", "right")
str_trim_anything("-ghi--", "--")
str_trim_anything("-ghi--", "i-+")
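Trimming a pattern from one end amounts to deleting one or more repetitions of it anchored at ^ (left) or $ (right). The base R sketch below (hypothetical trim_anything_sketch(), using PCRE rather than the ICU engine behind stringi, so it is not strex's implementation) wraps the pattern in a non-capturing group so that + repeats the whole pattern. As with the "." versus coll(".") examples above, a bare "." is a regex matching any character, so a literal dot needs "\\.".

```r
# Hypothetical sketch, not strex's code.
trim_anything_sketch <- function(string, pattern, side = c("both", "left", "right")) {
  side <- match.arg(side)
  if (side %in% c("both", "left")) {
    # Remove one or more leading repetitions of the pattern
    string <- sub(paste0("^(?:", pattern, ")+"), "", string, perl = TRUE)
  }
  if (side %in% c("both", "right")) {
    # Remove one or more trailing repetitions of the pattern
    string <- sub(paste0("(?:", pattern, ")+$"), "", string, perl = TRUE)
  }
  string
}

trim_anything_sketch("-ghi--", "-")
# "ghi"
trim_anything_sketch("..abcd.", "\\.", "left")
# "abcd."
```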
strex: extra string manipulation functions

There are some things that I wish were easier with the stringr or stringi packages. The foremost of these is the extraction of numbers from strings. stringr makes you figure out the regex for yourself; strex takes care of this for you. There are many more useful functionalities in strex. In particular, there's a match_arg() function which is more flexible than the base match.arg(). Contributions to this package are encouraged: it is intended as a miscellany of string manipulation functions which cannot be found in stringi or stringr.
Maintainer: Rory Nolan [email protected] (ORCID)
Rory Nolan and Sergi Padilla-Parra (2017). filesstrings: An R package for file and string manipulation. The Journal of Open Source Software, 2(14). doi:10.21105/joss.00260.
Useful links:
Report bugs at https://github.com/rorynolan/strex/issues