factor {base}                                        R Documentation

Factors

Description

The function factor is used to encode a vector as a factor (the names category and enumerated type are also used for factors). If ordered is TRUE, the factor levels are assumed to be ordered. For compatibility with S there is also a function ordered.

is.factor, is.ordered, as.factor and as.ordered are the membership and coercion functions for these classes.

Usage

factor(x, levels = sort(unique(x), na.last = TRUE), labels,
       exclude = NA, ordered = FALSE)
ordered(x, ...)

is.factor(x)
is.ordered(x)

as.factor(x)
as.ordered(x)

Arguments

x         a vector of data, usually taking a small number of distinct values.
levels    an optional vector of the values that x might have taken. The default is the set of values taken by x, sorted into increasing order.
labels    either an optional vector of labels for the levels (in the same order as levels after removing those in exclude), or a character string of length 1.
exclude   a vector of values to be excluded when forming the set of levels. This should be of the same type as x, and will be coerced if necessary.
ordered   logical flag to determine if the levels should be regarded as ordered (in the order given).
...       (in ordered(.)): any of the above, apart from ordered itself.

Details

The type of the vector x is not restricted.

Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently.

The encoding of the vector happens as follows. First all the values in exclude are removed from levels. If x[i] equals levels[j], then the i-th element of the result is j. If no match is found for x[i] in levels, then the i-th element of the result is set to NA.
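
The encoding rule above can be sketched with match(), which gives the position of each element of x in levels, or NA when there is no match. This is only an illustration of the rule as described here, not the actual implementation of factor; the vectors x and lev are made up for the example.

```r
# Illustrative data (not from the original documentation)
x   <- c("b", "a", "c", "b", "z")
lev <- c("a", "b", "c")

# Position of each x[i] in lev; "z" has no match and becomes NA
manual_codes <- match(x, lev)

# Codes stored by factor() itself, with class and levels stripped
f <- factor(x, levels = lev)
factor_codes <- as.integer(unclass(f))

manual_codes
factor_codes
```

Both vectors should agree element by element, including the NA for the value that is not among the levels.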

Normally the levels used as an attribute of the result are the reduced set of levels after removing those in exclude, but this can be altered by supplying labels. labels should be either a set of new labels for the levels, or a single character string, in which case the levels are that character string with a sequence number appended.
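
A small sketch of the two forms of labels, using a made-up numeric vector x. The label strings ("low", "mid", "high", "grp") are arbitrary choices for illustration.

```r
# Illustrative data (not from the original documentation)
x <- c(3, 1, 2, 1)

# One label per level, given in the order of the sorted levels 1, 2, 3
f1 <- factor(x, labels = c("low", "mid", "high"))
levels(f1)
as.character(f1)

# A single string: the levels are that string with a sequence number appended
f2 <- factor(x, labels = "grp")
levels(f2)
```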

factor(x) applied to a factor is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned. If exclude is used it should also be a factor with the same level set as x or a set of codes for the levels to be excluded.

The codes of a factor may contain NA. For a numeric x, set exclude = NULL to make NA an extra level ("NA"); by default it is the last level.
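
A short sketch of the difference, using a made-up numeric vector. It only counts the levels, since the way the extra level is printed is not assumed here.

```r
# Illustrative data (not from the original documentation)
x <- c(1, NA, 2, 1)

f_default <- factor(x)                  # NA is excluded from the levels
f_na      <- factor(x, exclude = NULL)  # NA is kept as an extra level

nlevels(f_default)
nlevels(f_na)

# With the default, the missing value shows up as NA in the result
is.na(f_default)
```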

Value

factor returns an object of class "factor" which has a set of numeric codes, one for each element of x, with a "levels" attribute of mode character. If ordered is TRUE (or ordered is used) the result has class c("ordered", "factor").

is.factor returns TRUE or FALSE depending on whether its argument is of type factor or not. Correspondingly, is.ordered returns TRUE when its argument is ordered and FALSE otherwise.

as.factor coerces its argument to a factor. It is an abbreviated form of factor.

as.ordered(x) returns x if this is ordered, and ordered(x) otherwise.
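
The membership and coercion functions can be tried on a small example. The data and the level order c("lo", "hi") are made up for illustration.

```r
# Illustrative data (not from the original documentation)
f <- factor(c("lo", "hi", "lo"))
o <- ordered(c("lo", "hi", "lo"), levels = c("lo", "hi"))

is.factor(f)
is.ordered(f)   # a plain factor is not ordered
is.factor(o)    # an ordered factor is also a factor
is.ordered(o)
class(o)

# as.ordered returns an already ordered factor as it is
same <- identical(as.ordered(o), o)
same

# as.ordered on a plain factor yields an ordered factor
class(as.ordered(f))
```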

Warning

The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful to compare only factors that have the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and it may happen by implicit coercion.
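
To see why, consider a factor built from numbers. The sketch below uses made-up data and shows that as.numeric gives the internal codes, while going through as.character recovers the original values; the latter is a commonly used idiom rather than something prescribed on this page.

```r
# Illustrative data (not from the original documentation)
f <- factor(c(10, 20, 10, 50))

codes_only <- as.numeric(f)                # internal codes, not the values
values     <- as.numeric(as.character(f))  # original values, via the level labels

codes_only
values
```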

The levels of a factor are by default sorted, but the sort order may well depend on the locale at the time of creation, and should not be assumed to be ASCII.

See Also

gl for construction of "balanced" factors and C for factors with specified contrasts. levels and nlevels for accessing the levels, and codes to get integer codes.

Examples

# Encode the letters of "statistics", using the whole alphabet as levels
ff <- factor(substring("statistics", 1:10, 1:10), levels = letters)
ff
codes(ff)
factor(ff)  # drops the levels that do not occur
factor(factor(letters[7:10])[2:3])  # exercise indexing and reduction
factor(letters[1:20], labels = "letter")

class(ordered(4:1))  # "ordered", inheriting from "factor"
