grep {base} | R Documentation |
grep searches for matches to pattern (its first argument) within the vector x of character strings (second argument). regexpr does too, but returns more detail in a different format.
sub and gsub perform replacement of matches determined by regular expression matching.
grep(pattern, x, ignore.case=FALSE, extended=TRUE, value=FALSE)
sub(pattern, replacement, x, ignore.case=FALSE, extended=TRUE)
gsub(pattern, replacement, x, ignore.case=FALSE, extended=TRUE)
regexpr(pattern, text, extended=TRUE)
pattern | character string containing a regular expression to be matched in the vector of character strings x (or text). |
x, text | a vector of character strings where matches are sought. |
ignore.case | if FALSE, the pattern matching is case sensitive, and if TRUE, case is ignored during matching. |
extended | if TRUE, extended regular expression matching is used, and if FALSE, basic regular expressions are used. |
value | if FALSE, a vector containing the (integer) indices of the matches determined by grep is returned, and if TRUE, a vector containing the matching elements themselves is returned. |
replacement | a replacement for the matched pattern in sub and gsub. |
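As a minimal sketch of how ignore.case and value change what grep returns, the following uses a small made-up vector (fruits is a hypothetical name, not part of this page):

```r
# Hypothetical sample data, for illustration only
fruits <- c("Apple", "banana", "Cherry", "apricot")

# Default (value = FALSE): integer indices of the matching elements
idx_cs <- grep("^a", fruits)                       # case sensitive
idx_ci <- grep("^a", fruits, ignore.case = TRUE)   # case ignored

# value = TRUE: the matching elements themselves
hits <- grep("^a", fruits, ignore.case = TRUE, value = TRUE)

print(idx_cs)
print(idx_ci)
print(hits)
```

With value = TRUE the result should be the same as indexing the input with the integer result, i.e. fruits[idx_ci].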
The two *sub functions differ only in that sub replaces only the first occurrence of a pattern whereas gsub replaces all occurrences.
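The difference can be seen on a single string. This is only an illustrative sketch; the input s is a made-up value:

```r
# Hypothetical input string, for illustration only
s <- "banana bandana"

first_only <- sub("an", "AN", s)    # replaces the first match only
all_hits   <- gsub("an", "AN", s)   # replaces every match

print(first_only)   # "bANana bandana"
print(all_hits)     # "bANANa bANdANa"
```

Both functions work element by element, so for a longer vector sub replaces the first match within each element, not just within the first element.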
The regular expressions used are those specified by POSIX 1003.2, either extended or basic, depending on the value of the extended argument.
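A short sketch of two operators that belong to the extended syntax, the optional-item operator `?` and alternation `|`. It assumes the default extended matching and deliberately does not pass the extended argument, so it does not depend on how a given R version treats that argument. The vectors words and shades are hypothetical:

```r
# Hypothetical sample data, for illustration only
words  <- c("color", "colour", "colr", "cooler")
shades <- c("gray", "grey", "griy")

# "u?" makes the "u" optional
opt <- grep("colou?r", words, value = TRUE)

# "(a|e)" matches either "a" or "e"
alt <- grep("gr(a|e)y", shades, value = TRUE)

print(opt)   # "color" "colour"
print(alt)   # "gray" "grey"
```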
For grep a vector giving either the indices of the elements of x that yielded a match or, if value is TRUE, the matched elements.
For sub and gsub a character vector of the same length as the original.
For regexpr an integer vector of the same length as text giving the starting position of the first match, or -1 if there is none, with attribute "match.length" giving the length of the matched text (or -1 for no match).
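One way to use the regexpr result is to combine the start positions with the "match.length" attribute and pull out the matched text with substring. The following is a sketch under that assumption, with a made-up input vector x:

```r
# Hypothetical input, for illustration only
x <- c("green", "enter", "blue")

m     <- regexpr("e+n", x)
start <- as.integer(m)                       # 3 1 -1
len   <- as.integer(attr(m, "match.length")) # 3 2 -1

# Keep only the elements that matched (start is -1 otherwise)
hit     <- start > 0
matched <- substring(x[hit], start[hit], start[hit] + len[hit] - 1)

print(start)
print(len)
print(matched)   # "een" "en"
```

Filtering on start > 0 before calling substring avoids building substrings from the -1 placeholders of non-matching elements.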
See also charmatch, pmatch, match. apropos uses regexps and has nice examples.
grep("[a-z]", letters)

txt <- c("arm","foot","lefroo", "bafoobar")
if(any(i <- grep("foo",txt)))
   cat("`foo' appears at least once in\n\t",txt,"\n")
i # 2 and 4
txt[i]

## Double all 'a' or 'b's; "\" must be escaped, i.e. `doubled'
gsub("([ab])", "\\1_\\1_", "abc and ABC")

txt <- c("The", "licenses", "for", "most", "software", "are",
  "designed", "to", "take", "away", "your", "freedom",
  "to", "share", "and", "change", "it.",
  "", "By", "contrast,", "the", "GNU", "General", "Public", "License",
  "is", "intended", "to", "guarantee", "your", "freedom", "to",
  "share", "and", "change", "free", "software", "--",
  "to", "make", "sure", "the", "software", "is",
  "free", "for", "all", "its", "users")
( i <- grep("[gu]", txt) ) # indices
all( txt[i] == grep("[gu]", txt, value = TRUE) )
(ot <- sub("[b-e]",".", txt))
txt[ot != gsub("[b-e]",".", txt)] #- gsub does "global" substitution

txt[gsub("g","#", txt) !=
    gsub("g","#", txt, ignore.case = TRUE)] # the "G" words

regexpr("en", txt)