R: Pattern Matching and Replacement

grep {base}

R Documentation

Pattern Matching and Replacement

Description

grep searches for matches to pattern (its first argument) within the vector x of character strings (second argument). regexpr does too, but returns more detail in a different format.

sub and gsub perform replacement of matches determined by regular expression matching.

Usage

grep(pattern, x, ignore.case=FALSE, extended=TRUE, value=FALSE)
sub(pattern, replacement, x,
        ignore.case=FALSE, extended=TRUE)
gsub(pattern, replacement, x,
        ignore.case=FALSE, extended=TRUE)
regexpr(pattern, text, extended=TRUE)

Arguments

`pattern`	character string containing a regular expression to be matched in the vector of character strings `x` (or `text` for `regexpr`).
`x, text`	a vector of character strings where matches are sought.
`ignore.case`	if `FALSE`, the pattern matching is case sensitive and if `TRUE`, case is ignored during matching.
`extended`	if `TRUE`, extended regular expression matching is used, and if `FALSE` basic regular expressions are used.
`value`	if `FALSE`, a vector containing the (`integer`) indices of the matches determined by `grep` is returned, and if `TRUE`, a vector containing the matching elements themselves is returned.
`replacement`	a replacement for the matched pattern in `sub` and `gsub`.
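
As a small illustration of the `value` and `ignore.case` arguments described above, the following sketch runs `grep` on a made-up vector. The vector `fruits` and the names `idx` and `idx_ic` are assumptions introduced only for this example.

```r
# Hypothetical sample data, not taken from this help page
fruits <- c("Apple", "banana", "Cherry", "apricot")

# Default: case-sensitive matching, integer indices are returned
idx <- grep("^a", fruits)
print(idx)                                # 4

# value = TRUE returns the matching elements instead of their indices
print(grep("^a", fruits, value = TRUE))   # "apricot"

# ignore.case = TRUE also matches the capitalised "Apple"
idx_ic <- grep("^a", fruits, ignore.case = TRUE)
print(idx_ic)                             # 1 4
```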

Details

The two *sub functions differ only in that sub replaces only the first occurrence of a pattern in each element of x, whereas gsub replaces all occurrences.
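
The difference can be seen on a short vector. This is a minimal sketch; the input `s` and the result names `first_only` and `all_hits` are hypothetical.

```r
# Hypothetical input, chosen so that one element contains the pattern twice
s <- c("banana", "cabana", "plum")

# sub() replaces only the first "an" found in each element
first_only <- sub("an", "AN", s)
print(first_only)   # "bANana" "cabANa" "plum"

# gsub() replaces every "an" found in each element
all_hits <- gsub("an", "AN", s)
print(all_hits)     # "bANANa" "cabANa" "plum"
```

Elements without a match ("plum") are returned unchanged by both functions.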

The regular expressions used are those specified by POSIX 1003.2, either extended or basic, depending on the value of the extended argument.

Value

For grep a vector giving either the indices of the elements of x that yielded a match or, if value is TRUE, the matched elements.

For sub and gsub a character vector of the same length as the original.

For regexpr an integer vector of the same length as text giving the starting position of the first match, or -1 if there is none, with attribute "match.length" giving the length of the matched text (or -1 for no match).
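
One possible way to use the regexpr result is to combine the start positions with the "match.length" attribute to pull out the matched text. The sketch below assumes a made-up vector `words`; the names `start`, `len`, `hit` and `found` are likewise illustrative.

```r
# Hypothetical input: two elements contain "en", one does not
words <- c("intended", "General", "freedom")

m <- regexpr("en", words)

# Starting position of the first match in each element (-1 for no match)
start <- as.integer(m)
print(start)                       # 4 2 -1

# Length of the matched text (-1 for no match)
len <- attr(m, "match.length")
print(len)                         # 2 2 -1

# Extract the matched substrings for the elements that did match
hit <- start > 0
found <- substring(words[hit], start[hit], start[hit] + len[hit] - 1)
print(found)                       # "en" "en"
```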

Examples

grep("[a-z]", letters)

txt <- c("arm","foot","lefroo", "bafoobar")
if(any(i <- grep("foo",txt)))
   cat("`foo' appears at least once in\n\t",txt,"\n")
i # 2 and 4
txt[i]

## Double all 'a' or 'b's;  "\" must be escaped, i.e. `doubled'
gsub("([ab])", "\\1_\\1_", "abc and ABC")

txt <- c("The", "licenses", "for", "most", "software", "are",
  "designed", "to", "take", "away", "your", "freedom",
  "to", "share", "and", "change", "it.",
   "", "By", "contrast,", "the", "GNU", "General", "Public", "License",
   "is", "intended", "to", "guarantee", "your", "freedom", "to",
   "share", "and", "change", "free", "software", "--",
   "to", "make", "sure", "the", "software", "is",
   "free", "for", "all", "its", "users")
( i <- grep("[gu]", txt) ) # indices
all( txt[i] == grep("[gu]", txt, value = TRUE) )
(ot <- sub("[b-e]",".", txt))
txt[ot != gsub("[b-e]",".", txt)] # gsub does "global" substitution

txt[gsub("g","#", txt) !=
    gsub("g","#", txt, ignore.case = TRUE)] # the "G" words

regexpr("en", txt)
