formula {base} | R Documentation
The generic function formula and its specific methods provide a way of extracting formulae which have been included in other objects. as.formula is almost identical, additionally preserving attributes when object already inherits from "formula".
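As a minimal sketch of extracting a formula from another object, the following fits a linear model and recovers its formula. The built-in cars data set and its variables dist and speed are assumptions chosen only for illustration; any fitted model object with a formula method should behave similarly.

```r
# Fit a simple linear model on the built-in 'cars' data (illustrative choice).
fit <- lm(dist ~ speed, data = cars)

# formula() extracts the model formula stored in the fitted object.
fo <- formula(fit)
fo_class <- class(fo)

# as.formula() also coerces a character string to a formula.
fo2 <- as.formula("dist ~ speed")

# Compare the deparsed text, since the two formulae may carry
# different environments.
same_text <- identical(deparse(fo), deparse(fo2))
print(fo)
```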
y ~ model
formula(object)
formula.default(anything)
formula.formula(formula.obj)
formula.terms(terms.obj)
formula.data.frame(df)
as.formula(object)
I(name)
The models fit by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.
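One way to see how a formula is broken into terms is to inspect the result of terms(). The sketch below uses the hypothetical variable names y, a and b; no data are needed because only the symbolic structure is examined.

```r
# A formula with two main effects and their interaction.
fo <- y ~ a + b + a:b

# terms() analyses the symbolic structure of the formula.
tt <- terms(fo)

# One label per term, in the order used by the model.
labels <- attr(tt, "term.labels")

# The order of each term: 1 for main effects, 2 for a two-way interaction.
ord <- attr(tt, "order")

print(labels)
print(ord)
```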
In addition to + and :, a number of other operators are useful in model formulae. The * operator denotes factor crossing: a*b is interpreted as a+b+a:b. The ^ operator indicates crossing to the specified degree. For example, (a+b+c)^2 is identical to (a+b+c)*(a+b+c), which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions.
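The expansions of * and ^ can be checked by comparing term labels. This is a sketch only: term_labels is a hypothetical helper defined here, and y, a, b and c are placeholder names.

```r
# Hypothetical helper: return the expanded term labels of a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

# Crossing: a*b should expand to the same terms as a + b + a:b.
crossed  <- term_labels(y ~ a * b)
spelled  <- term_labels(y ~ a + b + a:b)

# Crossing to degree 2: main effects plus all two-way interactions.
squared  <- term_labels(y ~ (a + b + c)^2)
product  <- term_labels(y ~ (a + b + c) * (a + b + c))

print(crossed)
print(squared)
```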
The %in% operator indicates that the terms on its left are nested within those on the right. For example, a+b%in%a expands to the formula a+a:b. The - operator removes the specified terms, so that (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also be used to remove the intercept term: y~x - 1 is a line through the origin. A model with no intercept can also be specified as y~x + 0 or y~0 + x.
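Term removal and intercept removal can likewise be inspected through terms(). In this sketch, term_labels and has_intercept are hypothetical helpers, and the variable names are placeholders.

```r
# Hypothetical helpers built on terms().
term_labels   <- function(f) attr(terms(f), "term.labels")
has_intercept <- function(f) attr(terms(f), "intercept") == 1

# Removing one interaction from a degree-2 crossing.
reduced <- term_labels(y ~ (a + b + c)^2 - a:b)

# The intercept is present by default and dropped by "- 1" or "+ 0".
with_int  <- has_intercept(y ~ x)
minus_one <- has_intercept(y ~ x - 1)
plus_zero <- has_intercept(y ~ x + 0)

print(reduced)
```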
While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal. When such arithmetic expressions involve operators which are also used symbolically in model formulae, there can be confusion between arithmetic and symbolic operator use.
To avoid this confusion, the function I() can be used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as the sum of b and c.
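The effect of I() can be illustrated by comparing design matrices. The small data frame below is made up purely for this sketch; its column names and values are assumptions, not part of the documented interface.

```r
# Made-up data for illustration only.
df <- data.frame(
  y = c(1, 2, 3, 4),
  a = c(0, 1, 0, 1),
  b = c(1, 2, 3, 4),
  c = c(2, 1, 0, 3)
)

# With I(), b + c is evaluated arithmetically and enters as one column.
mm_sum <- model.matrix(y ~ a + I(b + c), data = df)

# Without I(), + is symbolic, so b and c enter as separate terms.
mm_sep <- model.matrix(y ~ a + b + c, data = df)

print(colnames(mm_sum))
print(colnames(mm_sep))
```

The single I(b + c) column holds the row-wise sum of b and c, whereas the second matrix has one column for each of b and c.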
All the functions above produce an object of class formula which contains a symbolic model formula.
class(fo <- y ~ x1*x2)  # "formula"
fo
typeof(fo)  # R internal : "language"
terms(fo)

## Create a formula for a model with a large number of variables:
xnam <- paste("x", 1:25, sep="")
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))