Title: | Multivariate Adaptive Regression Splines |
---|---|
Description: | Build regression models using the techniques in Friedman's papers "Fast MARS" and "Multivariate Adaptive Regression Splines" <doi:10.1214/aos/1176347963>. (The term "MARS" is trademarked and thus not used in the name of the package.) |
Authors: | Stephen Milborrow [aut, cre], Trevor Hastie [aut], Rob Tibshirani [aut], Alan Miller [ctb], Thomas Lumley [ctb] |
Maintainer: | Stephen Milborrow <[email protected]> |
License: | GPL-3 |
Version: | 5.3.4 |
Built: | 2024-11-05 03:41:15 UTC |
Source: | https://github.com/cran/earth |
Contrasts function for factors in the earth
response.
For internal use by earth.
contr.earth.response(x, base, contrasts)
contr.earth.response(x, base, contrasts)
x |
a factor |
base |
unused |
contrasts |
unused |
Returns a diagonal matrix.
An example for a 3 level factor with levels A
, B
, and C
:
A B C A 1 0 0 B 0 1 0 C 0 0 1
Earth uses this function internally.
You shouldn't need it.
It is made publicly available only because it seems that is necessary
for model.matrix
.
Build a regression model using the techniques in Friedman's papers "Multivariate Adaptive Regression Splines" and "Fast MARS".
See the package vignette “Notes on the earth package”.
## S3 method for class 'formula' earth(formula = stop("no 'formula' argument"), data = NULL, weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL, nfold=0, ncross=1, stratify=TRUE, varmod.method = "none", varmod.exponent = 1, varmod.conv = 1, varmod.clamp = .1, varmod.minspan = -3, Scale.y = NULL, ...) ## Default S3 method: earth(x = stop("no 'x' argument"), y = stop("no 'y' argument"), weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL, nfold=0, ncross=1, stratify=TRUE, varmod.method = "none", varmod.exponent = 1, varmod.conv = 1, varmod.clamp = .1, varmod.minspan = -3, Scale.y = NULL, ...) ## S3 method for class 'fit' earth(x = stop("no 'x' argument"), y = stop("no 'y' argument"), weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, offset = NULL, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, penalty = if(degree > 1) 3 else 2, nk = min(200, max(20, 2 * ncol(x))) + 1, thresh = 0.001, minspan = 0, endspan = 0, newvar.penalty = 0, fast.k = 20, fast.beta = 1, linpreds = FALSE, allowed = NULL, nprune = NULL, Object = NULL, Scale.y = NULL, Adjust.endspan = 2, Auto.linpreds = TRUE, Force.weights = FALSE, Use.beta.cache = TRUE, Force.xtx.prune = FALSE, Get.leverages = NROW(x) < 1e5, Exhaustive.tol = 1e-10, ...)
## S3 method for class 'formula' earth(formula = stop("no 'formula' argument"), data = NULL, weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL, nfold=0, ncross=1, stratify=TRUE, varmod.method = "none", varmod.exponent = 1, varmod.conv = 1, varmod.clamp = .1, varmod.minspan = -3, Scale.y = NULL, ...) ## Default S3 method: earth(x = stop("no 'x' argument"), y = stop("no 'y' argument"), weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL, nfold=0, ncross=1, stratify=TRUE, varmod.method = "none", varmod.exponent = 1, varmod.conv = 1, varmod.clamp = .1, varmod.minspan = -3, Scale.y = NULL, ...) ## S3 method for class 'fit' earth(x = stop("no 'x' argument"), y = stop("no 'y' argument"), weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, offset = NULL, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, penalty = if(degree > 1) 3 else 2, nk = min(200, max(20, 2 * ncol(x))) + 1, thresh = 0.001, minspan = 0, endspan = 0, newvar.penalty = 0, fast.k = 20, fast.beta = 1, linpreds = FALSE, allowed = NULL, nprune = NULL, Object = NULL, Scale.y = NULL, Adjust.endspan = 2, Auto.linpreds = TRUE, Force.weights = FALSE, Use.beta.cache = TRUE, Force.xtx.prune = FALSE, Get.leverages = NROW(x) < 1e5, Exhaustive.tol = 1e-10, ...)
To start off, look at the arguments
formula
,
data
,
x
,
y
,
nk
,
degree
, and
trace
.
If the response is binary or a factor, consider using the glm
argument.
For cross validation, use the nfold
argument.
For prediction intervals, use the varmod.method
argument.
Most users will find that the above arguments are all they need,
plus in some cases keepxy
and nprune
.
Unless you are a knowledgeable user, it's best not subvert the
standard algorithm by toying with tuning parameters such as thresh
,
penalty
, and endspan
.
formula |
Model formula. |
data |
Data frame for |
x |
Matrix or dataframe containing the independent variables. |
y |
Vector containing the response variable, or, in the case of multiple responses, a matrix or dataframe whose columns are the values for each response. |
subset |
Index vector specifying which cases to use, i.e., which rows in |
weights |
Case weights.
Default is NULL, meaning no case weights.
If specified, |
wp |
Response weights.
Default is NULL, meaning no response weights.
If specified, |
na.action |
NA action. Default is |
offset |
Offset term passed from the formula in |
keepxy |
Default is |
trace |
Trace |
glm |
NULL (default) or a list of arguments to pass on to |
degree |
Maximum degree of interaction (Friedman's |
penalty |
Generalized Cross Validation (GCV) penalty per knot.
Default is |
nk |
Maximum number of model terms before pruning, i.e., the
maximum number of terms created by the forward pass.
Includes the intercept. |
thresh |
Forward stepping threshold.
Default is |
minspan |
Minimum number of observations between knots.
(This increases resistance to runs of correlated noise in the input data.) |
endspan |
Minimum number of observations before the first and after the final knot. |
newvar.penalty |
Penalty for adding a new variable in the forward pass
(Friedman's |
fast.k |
Maximum number of parent terms considered at each step of the forward pass.
(This speeds up the forward pass. See the Fast MARS paper section 3.0.) |
fast.beta |
Fast MARS ageing coefficient, as described in the
Fast MARS paper section 3.1.
Default is |
linpreds |
Index vector specifying which predictors should enter linearly, as in |
allowed |
Function specifying which predictors can interact and how.
Default is NULL, meaning all standard MARS terms are allowed. |
pmethod |
Pruning method.
One of: |
nprune |
Maximum number of terms (including intercept) in the pruned model.
Default is NULL, meaning all terms created by the forward pass
(but typically not all terms will remain after pruning).
Use this to enforce an upper bound on the model size (that is less than |
nfold |
Number of cross-validation folds.
Default is |
ncross |
Only applies if |
stratify |
Only applies if |
varmod.method |
Construct a variance model.
For details, see |
varmod.exponent |
Power transform applied to the rhs before regressing the
absolute residuals with the specified |
varmod.conv |
Convergence criterion for the Iteratively Reweighted Least Squares used
when creating the variance model. |
varmod.clamp |
The estimated standard deviation of the main model errors
is forced to be at least a small positive value,
which we call |
varmod.minspan |
Only applies when |
Object |
Earth object to be updated, for use by |
Scale.y |
|
Adjust.endspan |
In interaction terms, |
Auto.linpreds |
Default is |
Force.weights |
Default is |
Use.beta.cache |
Default is |
Force.xtx.prune |
Default is |
Get.leverages |
Default is |
Exhaustive.tol |
Default |
... |
Dots are passed on to |
An S3 model of class "earth"
.
See earth.object
for a complete description.
Stephen Milborrow, derived from mda::mars
by Trevor Hastie and Robert Tibshirani.
The approach used for GLMs was motivated by work done by Jane Elith and John Leathwick (a representative paper is given below).
The evimp
function uses ideas from Max Kuhn's caret
package
https://CRAN.R-project.org/package=caret.
Parts of Thomas Lumley's leaps
package have been
incorporated into earth
, so earth
can directly access
Alan Miller's Fortran functions without going through hidden functions
in the leaps
package.
The Wikipedia article is recommended for an elementary introduction.
The primary references are the Friedman papers, but
readers may find the MARS section in Hastie, Tibshirani,
and Friedman a more accessible introduction.
Faraway takes a hands-on approach,
using the ozone
data to compare mda::mars
with other techniques.
(If you use Faraway's examples with earth
instead of mars
, use $bx
instead of $x
, and check out the book's errata.)
Friedman and Silverman is recommended background reading for the MARS paper.
Earth's pruning pass uses code from the leaps
package
which is based on techniques in Miller.
Faraway (2005) Extending the Linear Model with R https://www.maths.bath.ac.uk/~jjf23
Friedman (1991) Multivariate Adaptive Regression Splines (with discussion)
Annals of Statistics 19/1, 1–141
http://projecteuclid.org/euclid.aos/1176347963
doi:10.1214/aos/1176347963
Friedman (1993) Fast MARS
Stanford University Department of Statistics, Technical Report 110
https://statistics.stanford.edu/research/fast-mars
Friedman and Silverman (1989) Flexible Parsimonious Smoothing and Additive Modeling Technometrics, Vol. 31, No. 1.
Hastie, Tibshirani, and Friedman (2009) The Elements of Statistical Learning (2nd ed.) https://hastie.su.domains/pub.htm
Leathwick, J.R., Rowe, D., Richardson, J., Elith, J., & Hastie, T. (2005) Using multivariate adaptive regression splines to predict the distributions of New Zealand's freshwater diadromous fish Freshwater Biology, 50, 2034-2052 https://hastie.su.domains/pub.htm
Miller, Alan (1990, 2nd ed. 2002) Subset Selection in Regression https://wp.csiro.au/alanmiller/index.html
Wikipedia article on MARS https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
Start with summary.earth
, plot.earth
,
evimp
, and plotmo
.
Please see the main package vignette “Notes on the earth package”. The vignette can also be downloaded from http://www.milbo.org/doc/earth-notes.pdf.
The vignette
“Variance models in earth”
is also included with the package.
It describes how to generate prediction intervals for earth
models.
earth.mod <- earth(Volume ~ ., data = trees) plotmo(earth.mod) summary(earth.mod, digits = 2, style = "pmax")
earth.mod <- earth(Volume ~ ., data = trees) plotmo(earth.mod) summary(earth.mod, digits = 2, style = "pmax")
The object returned by the earth
function.
This is an S3
model of class
"earth"
.
It is a list with the components listed below.
Term refers to a term created during the
forward pass (each line of the output from format.earth
is a term).
Term number 1 is always the intercept.
rss |
Residual sum-of-squares (RSS) of the model (summed over all responses,
if |
rsq |
|
gcv |
Generalized Cross Validation (GCV) of the model (summed over all responses).
The GCV is calculated using the |
grsq |
|
bx |
Matrix of basis functions applied to (Intercept) h(Girth-12.9) h(12.9-Girth) h(Girth-12.9)*h(... [1,] 1 0.0 4.6 0 [2,] 1 0.0 4.3 0 [3,] 1 0.0 4.1 0 ... |
dirs |
Matrix with one row per MARS term, and with with ij-th element equal to
This matrix includes all terms generated by the forward pass,
including those not in Girth Height (Intercept) 0 0 # intercept h(12.9-Girth) -1 0 # 2nd term uses Girth h(Girth-12.9) 1 0 # 3rd term uses Girth h(Girth-12.9)*h(Height-76) 1 1 # 4th term uses Girth and Height ... |
cuts |
Matrix with ij-th element equal to the cut point (hinge value)
for predictor j in term i.
This matrix includes all terms generated by the forward pass,
including those not in Girth Height (Intercept) 0 0 # intercept, no cuts h(12.9-Girth) 12.9 0 # 2nd term has cut at 12.9 h(Girth-12.9) 12.9 0 # 3rd term has cut at 12.9 h(Girth-12.9)*h(Height-76) 12.9 76 # 4th term has two cuts ... |
prune.terms |
A matrix specifying which terms appear in which pruning pass subsets.
The row index of Example [,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,] 1 0 0 0 0 0 0 # intercept-only model [2,] 1 2 0 0 0 0 0 # best 2 term model uses terms 1,2 [3,] 1 2 4 0 0 0 0 # best 3 term model uses terms 1,2,4 [4,] 1 2 6 9 0 0 0 # and so on ... |
selected.terms |
Vector of term numbers in the selected model.
Can be used as a row index vector into |
fitted.values |
Fitted values.
A matrix with dimensions |
residuals |
Residuals.
A matrix with dimensions |
coefficients |
Regression coefficients.
A matrix with dimensions |
rss.per.response |
A vector of the RSS for each response.
Length is the number of responses, i.e., |
rsq.per.response |
A vector of the R-Squared for each response
(where R-Squared is calculated using the |
gcv.per.response |
A vector of the GCV for each response.
Length is the number of responses.
The |
grsq.per.response |
A vector of the GRSq for each response
(calculated using the |
rss.per.subset |
A vector of the RSS
for each model subset generated by the pruning pass.
Length is |
gcv.per.subset |
A vector of the GCV for each model in |
leverages |
Diagonal of the hat matrix (from the linear regression of the response on |
penalty , nk , thresh
|
Copies of the corresponding arguments to |
pmethod , nprune
|
Copies of the corresponding arguments to |
weights , wp
|
Copies of the corresponding arguments to |
termcond |
Reason the forward pass terminated (an integer). |
call |
The call used to invoke |
terms |
Model frame terms.
This component exists only if the model was built using |
modvars |
A matrix specifying which input variables
are used in each column of the model matrix.
(This field is new in earth 5.2.0.) For example, the formula: survived ~ age + pclass + sqrt(age) - sex results in:
age pclass sqrt(age) survived 0 0 0 # the response will be dropped age 1 0 0 pclass 0 1 0 sqrt(age) 0 0 1 # sqrt(age) will be merged with age sex 0 0 0 # sex is unused and will be dropped
age pclass2nd pclass3rd sqrt(age) age 1 0 0 1 # age and sqrt(age) use "age" pclass 0 1 1 0 # pclass2nd and pclass3rd use "pclass" Note that for models built with |
namesx |
Variable names in the input data. Deprecated (subsumed by |
xlevels |
This component exists only if the model was built using |
levels |
This component exists only if the model was built using |
x , y , data , subset
|
Copies of the corresponding arguments to |
glm.list |
List of GLM models. Each element is the value returned by |
glm.coefficients |
GLM regression coefficients.
Analogous to the |
glm.stats |
GLM summary statistics such as |
glm.bpairs |
Is |
cv.list |
List of |
cv.nterms |
Vector of length |
cv.nvars |
Vector of length |
cv.groups |
Specifies which cases went into which folds.
Matrix with two columns and number of rows equal to the the number of cases |
cv.rsq.tab |
Matrix with y mean fold1 0.909 0.909 fold2 0.869 0.869 fold3 0.952 0.952 fold4 0.157 0.157 fold5 0.961 0.961 mean 0.769 0.769 Example for a multiple response model: y1 y2 y3 mean fold1 0.915 0.951 0.944 0.937 fold2 0.962 0.970 0.970 0.968 fold3 0.914 0.940 0.942 0.932 fold4 0.907 0.929 0.925 0.920 fold5 0.947 0.987 0.979 0.971 mean 0.929 0.955 0.952 0.946 |
cv.class.rate.tab |
Like |
cv.maxerr.tab |
Like |
cv.auc.tab |
Like |
cv.cor.tab |
Like |
cv.deviance.tab |
Like |
cv.calib.int.tab |
Like |
cv.calib.slope.tab |
Like |
cv.oof.rsq.tab |
Generated only if |
cv.infold.rsq.tab |
Generated only if |
cv.oof.fit.tab |
Generated only if the |
varmod |
An object of class |
Titanic data with incomplete cases, passenger names, and other details removed.
A data frame with 1046 observations on 6 variables.
pclass |
passenger class, unordered factor: 1st 2nd 3rd |
survived |
integer: 0 or 1 |
sex |
unordered factor: male female |
age |
age in years, min 0.167 max 80.0 |
sibsp |
number of siblings or spouses aboard, integer: 0...8 |
parch |
number of parents or children aboard, integer: 0...6 |
This dataset is included in the earth package because it is a convenient vehicle for illustrating earth's GLM and factor handling.
The dataset was compiled by Frank Harrell and Robert Dawson:
https://hbiostat.org/data/repo/titanic.html
For this version of the Titanic data,
passenger details and incomplete cases were deleted
and the name changed to etitanic
to minimize confusion with other
versions ("e" because it is part of the earth package).
Note that survived
is an integer (it should arguably be a logical).
In this data the crew are conspicuous by their absence.
Contents of etitanic
:
pclass survived sex age sibsp parch 1 1st 1 female 29.000 0 0 2 1st 1 male 0.917 1 2 3 1st 0 female 2.000 1 2 4 1st 0 male 30.000 1 2 5 1st 0 female 25.000 1 2 ... 1309 3rd 0 male 29.000 0 0
How etitanic
was built:
load("titanic3") # from Harrell's web site # discard name, ticket, fare, cabin, embarked, body, home.dest etitanic <- titanic3[,c(1,2,4,5,6,7)] etitanic <- etitanic[!is.na(etitanic$age),] save(etitanic, file="etitanic.rda")
Further details and analyses of the Titanic data may be found in:
F. Harrell (2001) Regression Modeling Strategies with Applications to Linear Models, Logistic Regression, and Survival Analysis https://hbiostat.org/rmsc
Estimate variable importances in an earth
object
evimp(object, trim=TRUE, sqrt.=TRUE)
evimp(object, trim=TRUE, sqrt.=TRUE)
object |
An |
trim |
If |
sqrt. |
Default is |
This function returns a matrix showing the relative importances of the
variables in the model. There is a row for each variable. The row
name is the variable name, but with -unused
appended if the
variable does not appear in the final model.
The columns of the matrix are (not all of these are printed by print.evimp
):
col
: Column index of the variable in the x
argument to earth
.
used
: 1 if the variable is used in the final model, else 0.
Equivalently, 0 if the row name has an -unused
suffix.
nsubsets
: Variable importance using the "number of subsets" criterion.
Is the number of subsets that include the variable (see "Three Criteria" in the chapter
on evimp
in the earth
vignette
“Notes on the earth package”).
gcv
: Variable importance using the GCV criterion (see "Three Criteria").
gcv.match
: 1, except is
0 where the rank using the gcv
criterion differs from
that using the nsubsets
criterion.
In other words, there is a 0 for values that increase as you go
down the gcv
column.
rss
: Variable importance using the RSS criterion (see "Three Criteria").
rss.match
: Like gcv.match
but for the rss
.
The rows are sorted on the nsubsets
criterion.
This means that values in the nsubsets
column decrease as you go down the column
(more accurately, they are non-increasing).
The values in the gcv
and rss
columns
are also non-increasing, except where the
gcv
or rss
rank differs from the nsubsets
ranking.
There is a chapter on evimp
in the earth
package vignette
“Notes on the earth package”.
Acknowledgment
Thanks to Max Kuhn for the original evimp
code and for helpful discussions.
data(ozone1) earth.mod <- earth(O3 ~ ., data=ozone1, degree=2) ev <- evimp(earth.mod, trim=FALSE) plot(ev) print(ev)
data(ozone1) earth.mod <- earth(O3 ~ ., data=ozone1, degree=2) ev <- evimp(earth.mod, trim=FALSE) plot(ev) print(ev)
Expand binomial-pair data from “short” to “long” form.
The short form specifies the response with two columns giving the numbers of successes and failures. Example short form:
survived died dose sex 3 0 10 male 2 1 10 female 1 2 20 male 1 2 20 female
The long form specifies the response as single column of
TRUE
s and FALSE
s.
For example, the
long form of the above data (spaces and comments added):
survived dose sex TRUE 10 male # row 1 of short data: 0 died, 3 survived TRUE 10 male TRUE 10 male FALSE 10 female # row 2 of short data: 1 died, 2 survived TRUE 10 female TRUE 10 female FALSE 20 male # row 3 of short data: 2 died, 1 survived FALSE 20 male TRUE 20 male FALSE 20 female # row 4 of short data: 2 died, 1 survived FALSE 20 female TRUE 20 female
In this example the total number of survived and died for each row in the short data is the same, but in general that need not be true.
## S3 method for class 'formula' expand.bpairs(formula = stop("no 'formula' argument"), data = NULL, sort = FALSE, ...) ## Default S3 method: expand.bpairs(data = stop("no 'data' argument"), y = NULL, sort = FALSE, ...)
## S3 method for class 'formula' expand.bpairs(formula = stop("no 'formula' argument"), data = NULL, sort = FALSE, ...) ## Default S3 method: expand.bpairs(data = stop("no 'data' argument"), y = NULL, sort = FALSE, ...)
formula |
Model formula such as |
data |
Matrix or dataframe containing the data. |
y |
Model response. One of:
|
sort |
Default |
... |
Unused, but provided for generic/method consistency. |
A dataframe of the data in the long form, with expanded binomial pairs.
The first column of the data will be the response column
(a column of TRUE
s and FALSE
s).
Additionally, the returned value has two attached attributes:
bpairs.index
A vector of row indices into the returned data.
Can be used to reconstruct the short data from the long data
(although this package does not yet provide a function to do so).
ynames
Column names of the original response (a two-element character vector).
survived <- c(3,2,1,1) # short data for demo (too short to build a real model) died <- c(0,1,2,2) dose <- c(10,10,20,20) sex <- factor(c("male", "female", "male", "female")) short.data <- data.frame(survived, died, dose, sex) expand.bpairs(survived + died ~ ., short.data) # returns long form of the data # expand.bpairs(data=short.data, y=cbind(survived, died)) # equivalent # expand.bpairs(short.data, c(1,2)) # equivalent # expand.bpairs(short.data, c("survived", "died")) # equivalent # For example models, see the earth vignette # section "Short versus long binomial data".
survived <- c(3,2,1,1) # short data for demo (too short to build a real model) died <- c(0,1,2,2) dose <- c(10,10,20,20) sex <- factor(c("male", "female", "male", "female")) short.data <- data.frame(survived, died, dose, sex) expand.bpairs(survived + died ~ ., short.data) # returns long form of the data # expand.bpairs(data=short.data, y=cbind(survived, died)) # equivalent # expand.bpairs(short.data, c(1,2)) # equivalent # expand.bpairs(short.data, c("survived", "died")) # equivalent # For example models, see the earth vignette # section "Short versus long binomial data".
Return a string representing an earth
expression
(summary.earth
calls this function internally to
display the terms of the earth
model).
## S3 method for class 'earth' format(x = stop("no 'x' argument"), style = "h", decomp = "anova", digits = getOption("digits"), use.names = TRUE, colon.char = ":", ...)
## S3 method for class 'earth' format(x = stop("no 'x' argument"), style = "h", decomp = "anova", digits = getOption("digits"), use.names = TRUE, colon.char = ":", ...)
x |
An |
style |
Formatting style. One of |
decomp |
One of |
digits |
Number of significant digits.
The default is |
use.names |
One of |
colon.char |
Change colons in the returned string to colon.char.
Default is ":" (no change).
Specifying |
... |
Unused, but provided for generic/method consistency. |
A character representation of the earth
model.
If there are multiple responses, format.earth
will return multiple strings.
If there are embedded GLM model(s), the strings for the GLM model(s)
come after the strings for the standard earth
model(s).
The FAQ section in the package vignette gives precise details of the
"anova"
ordering.
Using format.earth
, perhaps after hand editing the returned string,
you can create an alternative to predict.earth
.
For example:
as.func <- function(object, digits = 8, use.names = FALSE, ...) eval(parse(text=paste( "function(x){\n", "if(is.vector(x))\n", " x <- matrix(x, nrow = 1, ncol = length(x))\n", "with(as.data.frame(x),\n", format(object, digits = digits, use.names = use.names, style = "pmax", ...), ")\n", "}\n", sep = ""))) earth.mod <- earth(Volume ~ ., data = trees) my.func <- as.func(earth.mod, use.names = FALSE) my.func(c(10,80)) # returns 16.84 predict(earth.mod, c(10,80)) # returns 16.84
Note that with pmax
the R expression generated by
format.earth
can handle multiple cases.
Thus the expression is consistent with the way predict
functions usually work in R — we can give predict
multiple
cases (i.e., multiple rows in the input matrix) and it will return a
vector of predicted values.
The earth package also provides a function format.lm
.
It has arguments as followsformat.lm(x, digits=getOption("digits"), use.names=TRUE, colon.char=":")
(Strictly speaking, format.lm
doesn't belong in the earth package.) Example:
lm.mod <- lm(Volume ~ Height*Girth, data = trees) cat(format(lm.mod, colon.char="*")) # yields: # 69.4 # - 1.30 * Height # - 5.86 * Girth # + 0.135 * Height*Girth
earth.mod <- earth(Volume ~ ., data = trees) cat(format(earth.mod)) # yields: # 29.0 # - 3.42 * h(14.2-Girth) # + 6.23 * h(Girth-14.2) # + 0.581 * h(Height-75) cat(format(earth.mod, style="pmax")) # yields: # 29.0 # - 3.42 * pmax(0, 14.2 - Girth) # + 6.23 * pmax(0, Girth - 14.2) # + 0.581 * pmax(0, Height - 75) cat(format(earth.mod, style="C")) # yields (note zero based indexing): # 29.0 # - 3.42 * max(0, 14.2 - x[0]) # + 6.23 * max(0, x[0] - 14.2) # + 0.581 * max(0, x[1] - 75) cat(format(earth.mod, style="bf")) # yields: # 29.0 # - 3.42 * bf1 # + 6.23 * bf2 # + 0.581 * bf3 # # bf1 h(14.2-Girth) # bf2 h(Girth-14.2) # bf3 h(Height-75)
earth.mod <- earth(Volume ~ ., data = trees) cat(format(earth.mod)) # yields: # 29.0 # - 3.42 * h(14.2-Girth) # + 6.23 * h(Girth-14.2) # + 0.581 * h(Height-75) cat(format(earth.mod, style="pmax")) # yields: # 29.0 # - 3.42 * pmax(0, 14.2 - Girth) # + 6.23 * pmax(0, Girth - 14.2) # + 0.581 * pmax(0, Height - 75) cat(format(earth.mod, style="C")) # yields (note zero based indexing): # 29.0 # - 3.42 * max(0, 14.2 - x[0]) # + 6.23 * max(0, x[0] - 14.2) # + 0.581 * max(0, x[1] - 75) cat(format(earth.mod, style="bf")) # yields: # 29.0 # - 3.42 * bf1 # + 6.23 * bf2 # + 0.581 * bf3 # # bf1 h(14.2-Girth) # bf2 h(Girth-14.2) # bf3 h(Height-75)
Convert a mars
object from the mda
package to an earth
object
mars.to.earth(object, trace=TRUE)
mars.to.earth(object, trace=TRUE)
object |
A |
trace |
If |
The value is the same format as that returned by earth
but
with skeletal versions of rss.per.subset
,
gcv.per.subset
, and prune.terms
.
You can fully initialize these components by calling update.earth
after mars.to.earth
, but if you do this selected.terms
may change.
However with pmethod="backward"
a change is unlikely —
selected.terms
would change only if GCVs are so close that numerical errors
have an effect.
Differences between mars and earth objects
Perhaps the most notable difference between
mars
and earth
objects is that mars
returns the
MARS basis matrix in a field called "x
"
whereas earth
returns "bx
" with only the selected terms.
Also, earth
returns "dirs
" rather than "factors
",
and in earth
this matrix can have entries of value 2 for linear predictors.
For details of other differences between mars
and earth
objects,
see the comments in the source code of mars.to.earth
.
Weights
The w
argument is silently ignored by mars
.
mars
normalizes wp
to (euclidean) length 1;
earth
normalizes
wp
to length equal to the number of responses, i.e., the number
of columns in y
. This change was made so an all ones wp
(or in fact any all constant wp
) is equivalent to using no wp
.
If the original call to mars
used the wp
argument,
mars.to.earth
will run update.earth
to force consistency.
This could modify the model, so a warning is issued.
if(require(mda)) { mars.mod <- mars(trees[,-3], trees[,3]) earth.mod <- mars.to.earth(mars.mod) # the standard earth functions can now be used # note the reconstructed call in the summary summary(earth.mod, digits = 2) }
if(require(mda)) { mars.mod <- mars(trees[,-3], trees[,3]) earth.mod <- mars.to.earth(mars.mod) # the standard earth functions can now be used # note the reconstructed call in the summary summary(earth.mod, digits = 2) }
Get the basis matrix of an earth
model.
## S3 method for class 'earth' model.matrix(object = stop("no 'object' argument"), x = NULL, subset = NULL, which.terms = NULL, trace = 0, ..., Env = parent.frame(), Callers.name = "model.matrix.earth")
## S3 method for class 'earth' model.matrix(object = stop("no 'object' argument"), x = NULL, subset = NULL, which.terms = NULL, trace = 0, ..., Env = parent.frame(), Callers.name = "model.matrix.earth")
object |
An |
x |
Default is NULL, meaning use the original data
used to build the Else |
subset |
Which rows to use in |
which.terms |
Which terms to use.
Default is NULL, meaning all terms in the earth model
(i.e. the terms in |
trace |
Default 0. Set to non-zero to see which data |
... |
Unused, but provided for generic/method consistency. |
Env |
For internal use. |
Callers.name |
For internal use (used by earth in trace messages). |
A basis matrix bx
of the same form returned by earth
.
The format of bx
is described in earth.object
.
If x
, subset
, and which.terms
are all NULL (the
default), this function returns the model's bx
. In this case,
it is perhaps easier to simply use object$bx
.
The matrix bx
can be used
as the input matrix to lm
or glm
,
as shown below in the example.
In fact, that is what earth does internally after the pruning pass —
it calls lm.fit
,
and additionally glm
if earth's glm
argument is used.
# Example 1 data(trees) earth.mod <- earth(Volume ~ ., data = trees) # standard earth model summary(earth.mod, decomp = "none") # "none" to print terms in same order as lm.mod below bx <- model.matrix(earth.mod) # earth model's basis mat (equivalent to bx <- earth.mod$bx) lm.mod <- lm(trees$Volume ~ bx[,-1]) # -1 to drop intercept summary(lm.mod) # yields same coeffs as above summary # displayed t values are not meaningful # Example 2 earth.mod <- earth(Volume~., data=trees) # standard earth model summary(earth.mod, decomp = "none") # "none" to print terms in same order as lm.mod below bx <- model.matrix(earth.mod) # earth model's basis mat (equivalent to bx <- earth.mod$bx) bx <- bx[, -1] # drop intercept column bx <- as.data.frame(bx) # lm requires a data frame bx$Volume <- trees$Volume # add Volume to data lm.mod <- lm(Volume~., data=bx) # standard linear regression on earth's basis mat summary(lm.mod) # yields same coeffs as above summary # displayed t values are not meaningful
# Example 1 data(trees) earth.mod <- earth(Volume ~ ., data = trees) # standard earth model summary(earth.mod, decomp = "none") # "none" to print terms in same order as lm.mod below bx <- model.matrix(earth.mod) # earth model's basis mat (equivalent to bx <- earth.mod$bx) lm.mod <- lm(trees$Volume ~ bx[,-1]) # -1 to drop intercept summary(lm.mod) # yields same coeffs as above summary # displayed t values are not meaningful # Example 2 earth.mod <- earth(Volume~., data=trees) # standard earth model summary(earth.mod, decomp = "none") # "none" to print terms in same order as lm.mod below bx <- model.matrix(earth.mod) # earth model's basis mat (equivalent to bx <- earth.mod$bx) bx <- bx[, -1] # drop intercept column bx <- as.data.frame(bx) # lm requires a data frame bx$Volume <- trees$Volume # add Volume to data lm.mod <- lm(Volume~., data=bx) # standard linear regression on earth's basis mat summary(lm.mod) # yields same coeffs as above summary # displayed t values are not meaningful
Ozone readings in Los Angeles, with incomplete cases removed.
A data frame with 330 observations on 10 variables.
O3 |
daily maximum of the hourly average ozone concentrations in Upland, CA |
vh |
500 millibar pressure height, measured at the Vandenberg air force base |
wind |
wind speed in mph at LAX airport |
humidity |
humidity in percent at LAX |
temp |
Sandburg Air Force Base temperature in degrees Fahrenheit |
ibh |
temperature inversion base height in feet |
dpg |
pressure gradient from LAX to Daggert in mm Hg |
ibt |
inversion base temperature at LAX in degrees Fahrenheit |
vis |
visibility at LAX in miles |
doy |
day of the year |
This dataset was copied from library(faraway)
and the name
changed to ozone1
to prevent a name clash.
The data were originally made available by Leo Breiman who was a consultant
on a project where the data were generated.
Example analyses using these data may be found in Faraway and in Hastie and Tibshirani.
> ozone1 O3 vh wind humidity temp ibh dpg ibt vis doy 1 3 5710 4 28 40 2693 -25 87 250 33 2 5 5700 3 37 45 590 -24 128 100 34 3 5 5760 3 51 54 1450 25 139 60 35 ... 330 1 5550 4 85 39 5000 8 44 100 390
Faraway (2005) Extending the Linear Model with R https://www.maths.bath.ac.uk/~jjf23
Hastie and Tibshirani (1990) Generalized Additive Models https://hastie.su.domains/pub.htm
airquality
a different set of ozone data
Plot an earth
object.
By default the plot shows model selection, cumulative distribution
of the residuals, residuals versus fitted values, and the residual QQ plot.
This function calls plotres
internally.
The first arguments are identical to plotres
.
## S3 method for class 'earth' plot(x = stop("no 'x' argument"), # the following are identical to plotres arguments which = 1:4, info = FALSE, versus = 1, standardize = FALSE, delever = FALSE, level = 0, id.n = 3, labels.id = NULL, smooth.col = 2, grid.col = 0, jitter = 0, do.par = NULL, caption = NULL, trace = 0, npoints = 3000, center = TRUE, type = NULL, nresponse = NA, # the following are earth specific col.cv = "lightblue", col.grsq = 1, col.rsq = 2, col.infold.rsq = 0, col.mean.infold.rsq = 0, col.mean.oof.rsq = "palevioletred", col.npreds = if(is.null(object$cv.oof.rsq.tab)) 1 else 0, col.oof.labs = 0, col.oof.rsq = "mistyrose2", col.oof.vline = col.mean.oof.rsq, col.pch.cv.rsq = 0, col.pch.max.oof.rsq = 0, col.vline = col.grsq, col.vseg = 0, lty.grsq = 1, lty.npreds = 2, lty.rsq = 5, lty.vline = "12", legend.pos = NULL, ...) earth_plotmodsel( # for internal use by plotres x, col.rsq = 2, col.grsq = 1, col.infold.rsq = 0, col.mean.infold.rsq = 0, col.mean.oof.rsq = "palevioletred", col.npreds = NULL, col.oof.labs = 0, col.oof.rsq = "mistyrose2", col.oof.vline = col.mean.oof.rsq, col.pch.cv.rsq = 0, col.pch.max.oof.rsq = 0, col.vline = col.grsq, col.vseg = 0, lty.grsq = 1, lty.npreds = 2, lty.rsq = 5, lty.vline = "12", legend.pos=NULL, add = FALSE, jitter = 0, max.nterms = length(object$rss.per.subset), max.npreds=max(1,get.nused.preds.per.subset(object$dirs,object$prune.terms)), ...)
## S3 method for class 'earth' plot(x = stop("no 'x' argument"), # the following are identical to plotres arguments which = 1:4, info = FALSE, versus = 1, standardize = FALSE, delever = FALSE, level = 0, id.n = 3, labels.id = NULL, smooth.col = 2, grid.col = 0, jitter = 0, do.par = NULL, caption = NULL, trace = 0, npoints = 3000, center = TRUE, type = NULL, nresponse = NA, # the following are earth specific col.cv = "lightblue", col.grsq = 1, col.rsq = 2, col.infold.rsq = 0, col.mean.infold.rsq = 0, col.mean.oof.rsq = "palevioletred", col.npreds = if(is.null(object$cv.oof.rsq.tab)) 1 else 0, col.oof.labs = 0, col.oof.rsq = "mistyrose2", col.oof.vline = col.mean.oof.rsq, col.pch.cv.rsq = 0, col.pch.max.oof.rsq = 0, col.vline = col.grsq, col.vseg = 0, lty.grsq = 1, lty.npreds = 2, lty.rsq = 5, lty.vline = "12", legend.pos = NULL, ...) earth_plotmodsel( # for internal use by plotres x, col.rsq = 2, col.grsq = 1, col.infold.rsq = 0, col.mean.infold.rsq = 0, col.mean.oof.rsq = "palevioletred", col.npreds = NULL, col.oof.labs = 0, col.oof.rsq = "mistyrose2", col.oof.vline = col.mean.oof.rsq, col.pch.cv.rsq = 0, col.pch.max.oof.rsq = 0, col.vline = col.grsq, col.vseg = 0, lty.grsq = 1, lty.npreds = 2, lty.rsq = 5, lty.vline = "12", legend.pos=NULL, add = FALSE, jitter = 0, max.nterms = length(object$rss.per.subset), max.npreds=max(1,get.nused.preds.per.subset(object$dirs,object$prune.terms)), ...)
x |
An |
which , info , versus
|
These arguments are identical to |
standardize , delever , level
|
. |
id.n , labels.id , smooth.col
|
. |
grid.col , jitter
|
. |
do.par , caption , trace
|
. |
npoints , center
|
. |
type , nresponse
|
. |
col.cv |
Default |
col.grsq |
Default |
col.rsq |
Default |
col.infold.rsq |
Color of in-fold RSq lines for each fold in the Model Selection plot.
Applies only if |
col.mean.infold.rsq |
Color of mean in-fold RSq for each number of terms in the Model Selection plot.
Default is |
col.mean.oof.rsq |
Default |
col.npreds |
Color
of the "number of predictors" plot in the Model Selection plot.
The default displays the number of predictors unless the |
col.oof.labs |
Color of fold number labels on the |
col.oof.rsq |
Color of out-of-fold RSq lines for each fold in the Model Selection plot.
Applies only if |
col.oof.vline |
Color of vertical line at the maximum |
col.pch.cv.rsq |
Color of point plotted on the |
col.pch.max.oof.rsq |
Color of point plotted on the |
col.vline |
Color of the vertical line at selected model in the Model Selection plot.
Default is |
col.vseg |
Default is |
lty.grsq |
Line type of GRSq line in the Model Selection plot.
Default is |
lty.npreds |
Line type of the "number of predictors" plot in the Model Selection plot.
Default is |
lty.rsq |
Line type of RSq line in the Model Selection plot.
Default is |
lty.vline |
Line type of vertical line at selected model in the Model Selection plot.
Default is |
legend.pos |
Position of the legend in the Model Selection plot.
Default is |
add , max.nterms , max.npreds
|
|
... |
Please see The |
For details on interpreting the graphs,
please see the earth
package vignettes
“Notes on the earth package”
and
“Variance models in earth”.
Note that cross-validation data will not be displayed unless
both nfold
and keepxy
were used in the original call to
earth
.
To remove the Number of used predictors
from the Model Selection
graph (to reduce clutter), use col.npreds=0
.
earth_plotmodsel
is provided for use by plotres
.
Acknowledgment
This function incorporates the function spread.labs
from the orphaned
package TeachingDemos
written by Greg Snow.
earth
,
plot.earth.models
,
plotd
,
plotmo
data(ozone1) earth.mod <- earth(O3 ~ ., data = ozone1, degree = 2) plot(earth.mod)
data(ozone1) earth.mod <- earth(O3 ~ ., data = ozone1, degree = 2) plot(earth.mod)
Compare earth
models by plotting them.
## S3 method for class 'earth.models' plot(x = stop("no 'x' argument"), which = c(1:2), caption = "", jitter = 0, col.grsq = discrete.plot.cols(length(objects)), lty.grsq = 1, col.rsq = 0, lty.rsq = 5, col.vline = col.grsq, lty.vline = "12", col.npreds = 0, lty.npreds = 2, legend.text = NULL, do.par = NULL, trace = 0, ...)
## S3 method for class 'earth.models' plot(x = stop("no 'x' argument"), which = c(1:2), caption = "", jitter = 0, col.grsq = discrete.plot.cols(length(objects)), lty.grsq = 1, col.rsq = 0, lty.rsq = 5, col.vline = col.grsq, lty.vline = "12", col.npreds = 0, lty.npreds = 2, legend.text = NULL, do.par = NULL, trace = 0, ...)
x |
A list of one or more |
which |
Which plots to plot: 1 model, 2 cumulative distribution of residuals.
Default is |
caption |
Overall caption. Values: |
jitter |
Jitter applied to GRSq and RSq values to minimize over-plotting.
Default is |
col.grsq |
Vector of colors for the GRSq plot.
The default is |
lty.grsq |
Line type for the GRSq plot.
Default is |
col.rsq |
Vector of colors for the RSq plot.
Default is |
lty.rsq |
Line type for the RSq plot.
Default is |
col.vline |
A vertical line is drawn for each object
to show which model size was chosen for that object.
The color of the line is |
lty.vline |
Line type of vertical lines (a vertical line is drawn to show the selected model for each object).
Can be a vector.
Default is |
col.npreds |
Vector of colors for the "number of predictors" plot within the model selection plot.
Default is |
lty.npreds |
Line type of the "number of predictors" plot (in the Model Selection plot).
Default is |
legend.text , do.par , trace
|
Please see |
... |
Please see |
This function ignores GLM and cross-validation components of the earth model, if any.
earth
,
plot.earth
,
plot.earth.models
,
plotd
,
plotmo
data(ozone1) a1 <- earth(O3 ~ ., data = ozone1, degree = 2) a2 <- earth(O3 ~ .-wind, data = ozone1, degree = 2) a3 <- earth(O3 ~ .-humidity, data = ozone1, degree = 2) plot.earth.models(list(a1,a2,a3), ylim=c(.65,.85))
data(ozone1) a1 <- earth(O3 ~ ., data = ozone1, degree = 2) a2 <- earth(O3 ~ .-wind, data = ozone1, degree = 2) a3 <- earth(O3 ~ .-humidity, data = ozone1, degree = 2) plot.earth.models(list(a1,a2,a3), ylim=c(.65,.85))
Plot an evimp
object.
## S3 method for class 'evimp' plot(x = stop("no 'x' argument"), cex.var = 1, type.nsubsets = "l", col.nsubsets = "black", lty.nsubsets = 1, type.gcv = "l", col.gcv = 2, lty.gcv = 1, type.rss = "l", col.rss = "gray60", lty.rss = 1, cex.legend = 1, x.legend = nrow(x), y.legend = x[1,"nsubsets"], rh.col = 1, do.par = TRUE, ...)
## S3 method for class 'evimp' plot(x = stop("no 'x' argument"), cex.var = 1, type.nsubsets = "l", col.nsubsets = "black", lty.nsubsets = 1, type.gcv = "l", col.gcv = 2, lty.gcv = 1, type.rss = "l", col.rss = "gray60", lty.rss = 1, cex.legend = 1, x.legend = nrow(x), y.legend = x[1,"nsubsets"], rh.col = 1, do.par = TRUE, ...)
x |
An |
cex.var |
cex for variable names. Default is 1. Make smaller (say 0.8) if you have lots of variables. |
type.nsubsets |
Plot type for nsubsets graph. Default is "l". Use "n" for none, "b" looks good too. |
col.nsubsets |
Color of nsubsets line. Default is "black". |
lty.nsubsets |
Line type of nsubsets line. Default is 1. |
type.gcv , col.gcv , lty.gcv
|
As above but for the gcv plot |
type.rss , col.rss , lty.rss
|
As above but for the rss plot |
cex.legend |
cex for legend strings. Default is 1. Make smaller (say 0.8) if you want a smaller legend. |
x.legend |
x position of legend. Use 0 for no legend. |
y.legend |
y position of legend. |
rh.col |
Color of right hand axis label. Use |
do.par |
Call |
... |
Extra arguments passed to plotting functions. |
earth
,
evimp
,
plot.earth.models
,
plotmo
data(ozone1) earth.mod <- earth(O3 ~ ., data=ozone1, degree=2) ev <- evimp(earth.mod) plot(ev) print(ev)
data(ozone1) earth.mod <- earth(O3 ~ ., data=ozone1, degree=2) ev <- evimp(earth.mod) plot(ev) print(ev)
Plot a variance model (a varmod
object).
Typically you call this function for a variance model
embedded in an earth
model.
## S3 method for class 'varmod' plot(x = stop("no 'x' argument"), which = 1:4, do.par = NULL, info=FALSE, cex = NULL, caption = NULL, line.col = 2, min.sd.col = line.col, trace = 0, ...)
## S3 method for class 'varmod' plot(x = stop("no 'x' argument"), which = 1:4, do.par = NULL, info=FALSE, cex = NULL, caption = NULL, line.col = 2, min.sd.col = line.col, trace = 0, ...)
x |
A |
which |
Which plots to plot. Default is 1:4 meaning all.
The term parent below refers to the |
do.par |
Please see |
info |
Plot some additional information, including lowess fits in the first two plots. |
cex |
Character expansion. |
caption |
Default is NULL, meaning automatically generate an overall caption. |
line.col |
Color of lines in the plots.
Default is |
min.sd.col |
Color of the |
trace , ...
|
Similar to |
The horizontal red dotted line in the first two plots
shows the value of min.sd
.
See earth
's varmod.clamp
argument.
data(ozone1) set.seed(1) # optional, for cross validation reproducibility # note: should really use ncross=30 below but for a quick demo we don't earth.mod <- earth(O3~temp, data=ozone1, nfold=10, ncross=3, varmod.method="lm") plot(earth.mod$varmod) # plot the embedded variance model (this calls plot.varmod)
data(ozone1) set.seed(1) # optional, for cross validation reproducibility # note: should really use ncross=30 below but for a quick demo we don't earth.mod <- earth(O3~temp, data=ozone1, nfold=10, ncross=3, varmod.method="lm") plot(earth.mod$varmod) # plot the embedded variance model (this calls plot.varmod)
Plot the distribution of the predicted values for each class.
Can be used for earth
models, but also for models built by
lm
,
glm
,
lda
,
etc.
plotd(object, hist = FALSE, type = NULL, nresponse = NULL, dichot = FALSE, trace = FALSE, xlim = NULL, ylim = NULL, jitter = FALSE, main=NULL, xlab = "Predicted Value", ylab = if(hist) "Count" else "Density", lty = 1, col = c("gray70", 1, "lightblue", "brown", "pink", 2, 3, 4), fill = if(hist) col[1] else 0, breaks = "Sturges", labels = FALSE, kernel = "gaussian", adjust = 1, zero.line = FALSE, legend = TRUE, legend.names = NULL, legend.pos = NULL, cex.legend = .8, legend.bg = "white", legend.extra = FALSE, vline.col = 0, vline.thresh = .5, vline.lty = 1, vline.lwd = 1, err.thresh = vline.thresh, err.col = 0, err.border = 0, err.lwd = 1, xaxt = "s", yaxt = "s", xaxis.cex = 1, sd.thresh = 0.01, ...)
plotd(object, hist = FALSE, type = NULL, nresponse = NULL, dichot = FALSE, trace = FALSE, xlim = NULL, ylim = NULL, jitter = FALSE, main=NULL, xlab = "Predicted Value", ylab = if(hist) "Count" else "Density", lty = 1, col = c("gray70", 1, "lightblue", "brown", "pink", 2, 3, 4), fill = if(hist) col[1] else 0, breaks = "Sturges", labels = FALSE, kernel = "gaussian", adjust = 1, zero.line = FALSE, legend = TRUE, legend.names = NULL, legend.pos = NULL, cex.legend = .8, legend.bg = "white", legend.extra = FALSE, vline.col = 0, vline.thresh = .5, vline.lty = 1, vline.lwd = 1, err.thresh = vline.thresh, err.col = 0, err.border = 0, err.lwd = 1, xaxt = "s", yaxt = "s", xaxis.cex = 1, sd.thresh = 0.01, ...)
To start off, look at the arguments object
, hist
, type
.
For predict methods with multiple column responses, see the nresponse
argument.
For factor responses with more than two levels, see the dichot
argument.
object |
Model object. Typically a model which predicts a class or a class discriminant. |
hist |
|
type |
Type parameter passed to |
nresponse |
Which column to use when |
dichot |
Dichotimise the predicted response.
This argument is ignored except for models where the observed response
is a factor with more than two levels
and the predicted response is a numeric vector.
The default |
trace |
Default |
xlim |
Limits of the x axis.
The default |
ylim |
Limits of the y axis.
The default |
jitter |
Jitter the histograms or densities horizontally to minimize overplotting.
Default |
main |
Main title. Values: |
xlab |
x axis label.
Default is |
ylab |
y axis label.
Default is |
lty |
Per class line types for the plotted lines. Default is 1 (which gets recycled for all lines). |
col |
Per class line colors. The first few colors of the default are intended to be easily distinguishable on both color displays and monochrome printers. |
fill |
Fill color for the plot for the first class.
For |
breaks |
Passed to |
labels |
|
kernel |
Passed to |
adjust |
Passed to |
zero.line |
Passed to |
legend |
|
legend.names |
Class names in legend.
The default |
legend.pos |
Position of the legend.
The default |
cex.legend |
|
legend.bg |
|
legend.extra |
Show (in the legend) the number of occurrences of each class.
Default is |
vline.thresh |
Horizontal position of optional vertical line.
Default is |
vline.col |
Color of vertical line. Default is 0, meaning no vertical line. |
vline.lty |
Line type of vertical line.
Default is |
vline.lwd |
Line width of vertical line.
Default is |
err.thresh |
x axis value specifying the error shading threshold.
See |
err.col |
Specify up to three colors to shade the "error areas" of the density plot.
The default is data(etitanic) earth.mod <- earth(survived ~ ., data=etitanic) plotd(earth.mod, vline.col=1, err.col=c(2,3,4)) The three areas are (i) the error area to the left of the threshold,
(ii) the error area to the right of the threshold, and,
(iii) the reducible error area.
If less than three values are specified, |
err.border |
Borders around the error shading.
Default is |
err.lwd |
Line widths of borders of the error shading.
Default is |
xaxt |
Default is |
yaxt |
Default is |
xaxis.cex |
Only used if |
sd.thresh |
Minimum acceptable standard deviation for a density.
Default is |
... |
Extra arguments passed to the predict method for the object. |
This function calls predict
with the data originally used to build
the model, and with the type
specified above.
It then separates the predicted values into classes,
where the class for each predicted value
is determined by the class of the observed response.
Finally, it calls density
(or hist
if hist=TRUE
) for each class-specific set of values,
and plots the results.
This function estimates distributions with the
density
and hist
functions,
and also calls plot.density
and plot.histogram
.
For an overview see Venables and Ripley MASS section 5.6.
Partitioning the response into classes
Considerable effort is made to partition the predicted response
into classes in a sensible way.
This is not always possible for multiple column responses and the nresponse
argument
should be used where necessary.
The partitioning details depend on the types and numbers of columns in the observed
and predicted responses.
These in turn depend on the model object and the type
argument.
Use the trace
argument to see how plotd
partitions the
response for your model.
Degenerate densities
A message such asWarning: standard deviation of "male" density is 0, density is degenerate?
means that the density for that class will not be plotted
(the legend will say "not plotted"
).
Set sd.thresh=0
to get rid of this check,
but be aware that histograms (and sometimes x axis labels)
for degenerate densities will be misleading.
Using plotd for various models
This function is included in the earth
package
but can also be used with other models.
Example with glm
:
library(earth); data(etitanic) glm.model <- glm(sex ~ ., data=etitanic, family=binomial) plotd(glm.model)
Example with lm
:
library(earth); data(etitanic) lm.model <- lm(as.numeric(sex) ~ ., data=etitanic) plotd(lm.model)
Using plotd with lda or qda
The plotd
function has special handling for lda
(and qda
) objects.
For such objects, the type
argument can take one of the
following values:
"response"
(default) linear discriminant"ld"
same as "response"
"class"
predicted classes"posterior"
posterior probabilities
Example:
library(MASS); library(earth); data(etitanic) lda.model <- lda(sex ~ ., data=etitanic) plotd(lda.model) # linear discriminant by default plotd(lda.model, type="class", hist=TRUE, labels=TRUE)
This handling of type
is handled internally by plotd
and type
is not passed to predict.lda
(type
is used merely to select fields in the list
returned by predict.lda
).
The type names can be abbreviated down to a single character.
For objects created with lda.matrix
(as opposed to lda.formula
),
plotd
blindly assumes that the grouping
argument was the second argument.
plotd
does not yet support objects created with lda.data.frame
.
For lda
responses with more than two factor levels,
use the nresponse
argument to
select a column in the predicted response.
Thus with the default type=NULL
,
(which gets automatically converted by plotd
to type="response"
),
use nresponse=1
to select just the first linear discriminant.
The default nresponse=NULL
selects all columns,
which is typically not what you want for lda
models.
Example:
library(MASS); library(earth); set.seed(1) # optional, for reproducibility example(lda) # creates a model called "z" plot(z, dimen=1) # invokes plot.lda from the MASS package plotd(z, nresponse=1, hist=1) # equivalent using plotd # nresponse=1 selects first linear discr.
The dichot=TRUE
argument is also useful for lda
responses with more than two factor levels.
TODO
Handle degenerate densities in a more useful way.
Add freq
argument for hist
.
density
, plot.density
hist
, plot.histogram
earth
, plot.earth
if (require(earth)) { old.par <- par(no.readonly=TRUE); par(mfrow=c(2,2), mar=c(4, 3, 1.7, 0.5), mgp=c(1.6, 0.6, 0), cex = 0.8) data(etitanic) mod <- earth(survived ~ ., data=etitanic, degree=2, glm=list(family=binomial)) plotd(mod) plotd(mod, hist=TRUE, legend.pos=c(.25,220)) plotd(mod, hist=TRUE, type="class", labels=TRUE, xlab="", xaxis.cex=.8) par(old.par) }
if (require(earth)) { old.par <- par(no.readonly=TRUE); par(mfrow=c(2,2), mar=c(4, 3, 1.7, 0.5), mgp=c(1.6, 0.6, 0), cex = 0.8) data(etitanic) mod <- earth(survived ~ ., data=etitanic, degree=2, glm=list(family=binomial)) plotd(mod) plotd(mod, hist=TRUE, legend.pos=c(.25,220)) plotd(mod, hist=TRUE, type="class", labels=TRUE, xlab="", xaxis.cex=.8) par(old.par) }
Predict with an earth
model.
## S3 method for class 'earth' predict(object = stop("no 'object' argument"), newdata = NULL, type = c("link", "response", "earth", "class", "terms"), interval = "none", level = .95, thresh = .5, trace = FALSE, ...)
## S3 method for class 'earth' predict(object = stop("no 'object' argument"), newdata = NULL, type = c("link", "response", "earth", "class", "terms"), interval = "none", level = .95, thresh = .5, trace = FALSE, ...)
object |
An |
newdata |
Make predictions using |
type |
Type of prediction.
One of |
interval |
Return prediction or confidence levels.
Default is |
level |
Confidence level for the |
thresh |
Threshold, a value between 0 and 1 when predicting a probability.
Only applies when |
trace |
Default |
... |
Unused, but provided for generic/method consistency. |
The predicted values (a matrix for multiple response models).
If type="terms"
, a matrix with each column showing the contribution of a predictor.
If interval="pint"
or "cint"
, a matrix with three columns:fit
: the predicted valueslwr
: the lower confidence or prediction limitupr
: the upper confidence or prediction limit
If interval="se"
, the standard errors.
Predicting with standard earth models
Use the default type="link"
, or possibly type="class"
.
Actually, the "link"
, "response"
, and "earth"
choices all return the same value unless the glm
argument
was used in the original call to earth
.
Predicting with earth-GLM models
This section applies to earth models with a GLM component, i.e.,
when the glm
argument was used
in the original call to earth
.
The "link"
and "response"
options:
see predict.glm
for a description of these.
In brief: for logistic models
use type="response"
to get probabilities,
and type="link"
to get log-odds.
Use option "earth"
to get the linear fit (this gives the prediction you would get
if your original call to earth had no glm
argument).
Predicting with "class"
Use option "class"
to get the predicted class.
With option "class"
, this function first makes predictions with
type="response"
and then assigns the predicted values to classes as follows:
(i) When the response is a logical, predict TRUE
if
the predicted probability is greater than thresh
(default 0.5
).
(ii) When the response is a numeric, predict TRUE
if
the predicted value is greater than thresh
.
Actually, this is identical to the above case,
although thresh
here may legitimately be a value
outside the 0...1 range.
(iii) When the response is a two level factor,
predict the second level if its probability is more than thresh
.
In other words, with the default thresh=0.5
predict the most probable level.
(iv) When the response is a three or more level factor,
predict the most probable level (and thresh
is ignored).
Predicting with "terms"
The "terms"
option returns a "link"
response suitable for termplot
.
Only the additive terms and the first response (for multi-response models) are returned.
Also, "terms"
always returns the earth terms, and ignores the GLM component
of the model, if any.
data(trees) earth.mod <- earth(Volume ~ ., data = trees) predict(earth.mod) # same as earth.mod$fitted.values predict(earth.mod, data.frame(Girth=10, Height=80)) # yields 17.6 predict(earth.mod, c(10,80)) # equivalent
data(trees) earth.mod <- earth(Volume ~ ., data = trees) predict(earth.mod) # same as earth.mod$fitted.values predict(earth.mod, data.frame(Girth=10, Height=80)) # yields 17.6 predict(earth.mod, c(10,80)) # equivalent
You probably won't need to call this function directly.
It is called by predict.earth
when that function's interval
argument is used.
## S3 method for class 'varmod' predict( object = stop("no 'object' argument"), newdata = NULL, type = c("pint", "cint", "se", "abs.residual"), level = .95, trace = FALSE, ...)
## S3 method for class 'varmod' predict( object = stop("no 'object' argument"), newdata = NULL, type = c("pint", "cint", "se", "abs.residual"), level = .95, trace = FALSE, ...)
object |
A |
newdata |
Make predictions using |
type |
Type of prediction. This is the |
level |
Confidence level for the |
trace |
Currently unused. |
... |
Unused, but provided for generic/method consistency. |
predict.varmod
is called by predict.earth
when its interval
argument is used.
data(ozone1) set.seed(1) # optional, for cross validation reproducibility # note: should really use ncross=30 below but for a quick demo we don't earth.mod <- earth(O3~temp, data=ozone1, nfold=10, ncross=3, varmod.method="lm") # call predict.earth, which calls predict.varmod predict(earth.mod, newdata=ozone1[200:203,], interval="pint", level=.95)
data(ozone1) set.seed(1) # optional, for cross validation reproducibility # note: should really use ncross=30 below but for a quick demo we don't earth.mod <- earth(O3~temp, data=ozone1, nfold=10, ncross=3, varmod.method="lm") # call predict.earth, which calls predict.varmod predict(earth.mod, newdata=ozone1[200:203,], interval="pint", level=.95)
Residuals of an earth
model.
## S3 method for class 'earth' residuals(object = stop("no 'object' argument"), type = NULL, warn = TRUE, ...)
## S3 method for class 'earth' residuals(object = stop("no 'object' argument"), type = NULL, warn = TRUE, ...)
object |
An |
type |
One of: |
warn |
This function gives warnings when the results are not what you may expect.
Use |
... |
Unused, but provided for generic/method consistency. |
The residual values (will be a matrix for multiple response models).
earth
residuals
resid
identical to residuals
data(etitanic) earth.mod <- earth(pclass ~ ., data=etitanic, glm=list(family=binomial)) head(resid(earth.mod, warn=FALSE)) # earth residuals, a column for each response head(resid(earth.mod, type="response")) # GLM response resids, a column for each response
data(etitanic) earth.mod <- earth(pclass ~ ., data=etitanic, glm=list(family=binomial)) head(resid(earth.mod, warn=FALSE)) # earth residuals, a column for each response head(resid(earth.mod, type="response")) # GLM response resids, a column for each response
Summary method for earth
objects.
## S3 method for class 'earth' summary(object = stop("no 'object' argument"), details = FALSE, style = c("h", "pmax", "max", "C", "bf"), decomp = "anova", digits = getOption("digits"), fixed.point=TRUE, newdata = NULL, ...) ## S3 method for class 'summary.earth' print(x = stop("no 'x' argument"), details = x$details, decomp = x$decomp, digits = x$digits, fixed.point = x$fixed.point, newdata = x$newdata, ...)
## S3 method for class 'earth' summary(object = stop("no 'object' argument"), details = FALSE, style = c("h", "pmax", "max", "C", "bf"), decomp = "anova", digits = getOption("digits"), fixed.point=TRUE, newdata = NULL, ...) ## S3 method for class 'summary.earth' print(x = stop("no 'x' argument"), details = x$details, decomp = x$decomp, digits = x$digits, fixed.point = x$fixed.point, newdata = x$newdata, ...)
object |
An |
x |
A |
details |
Default is |
style |
Formatting style. One of |
decomp |
Specify how terms are ordered.
Default is |
digits |
The number of significant digits. |
fixed.point |
Method of printing numbers in matrices.
Default is (Intercept) 15.029 h(temp-58) 0.313 h(234-ibt) -0.046 ... whereas (Intercept) 1.5e+01 h(temp-58) 3.1e-01 h(234-ibt) -4.6e-02 ... Matrices with two or fewer rows are never printed with a fixed point. |
newdata |
Default |
... |
Extra arguments are passed to |
The value is the same as that returned by earth
but with the following extra components.
strings |
String(s) created by |
newrsq |
Only if |
newdata |
Only if |
digits |
|
details |
|
decomp |
|
fixed.point |
The corresponding arguments, passed on to |
The printed Estimated importance
uses evimp
with the nsubsets
criterion.
The most important predictor is printed first, and so on.
earth.mod <- earth(Volume~ ., data = trees) summary(earth.mod, digits = 2)
earth.mod <- earth(Volume~ ., data = trees) summary(earth.mod, digits = 2)
Update an earth
model.
## S3 method for class 'earth' update(object = stop("no 'object' argument"), formula. = NULL, ponly = FALSE, ..., evaluate = TRUE)
## S3 method for class 'earth' update(object = stop("no 'object' argument"), formula. = NULL, ponly = FALSE, ..., evaluate = TRUE)
object |
The earth object |
formula. |
The |
ponly |
Force pruning only, no forward pass.
Default is |
... |
Arguments passed on to |
evaluate |
If |
If only the following arguments are used, a forward pass
is unnecessary, and update.earth
will perform only the pruning pass.
This is usually much faster for large models.
object glm trace nprune pmethod Eval.model.subsets Print.pruning.pass Force.xtx.prune Use.beta.cache Endspan.penalty Get.leverages
This automatic determination to do a forward pass can be overridden
with the ponly
argument.
If ponly=TRUE
the forward pass will be skipped and only the pruning pass will be executed.
This is useful for doing a pruning pass with new data.
(Use earth's data
argument to specify the new data.)
Typically in this scenario you would also specify penalty=-1
.
This is because with sufficient new data, independent of the original
training data, the RSS not the GCV should be used for evaluating model
subsets
(The GCV approximates what the RSS would be on new data — but here
we actually have new data, so why bother approximating.
This "use new data for pruning" approach is useful in situations where
you don't trust the GCV approximation for your data.)
By making penalty=-1
, earth will calculate the RSS, not the GCV.
See also the description of penalty
on the
earth
help page.
Another (somewhat esoteric) use of ponly=TRUE
is to do subset
selection with a different penalty
from that used to build the
original model.
With trace=1
, update.earth
will tell you if earth's
forward pass was skipped.
If you used keepxy=TRUE
in your original call to earth
, then
update.earth
will use the saved values of x
, y
, etc.,
unless you specify otherwise by arguments to update.earth
.
It can be helpful to set trace=1
to see which x
and y
is used by update.earth
.
The value is the same as that returned by earth
.
If object
is the only parameter then no changes are made
— the returned value will be the same as the original object
.
data(ozone1) (earth.mod <- earth(O3 ~ ., data = ozone1, degree = 2)) update(earth.mod, formula = O3 ~ . - temp) # requires forward pass and pruning update(earth.mod, nprune = 8) # requires only pruning update(earth.mod, penalty=1, ponly=TRUE) # pruning pass only with a new penalty
data(ozone1) (earth.mod <- earth(O3 ~ ., data = ozone1, degree = 2)) update(earth.mod, formula = O3 ~ . - temp) # requires forward pass and pruning update(earth.mod, nprune = 8) # requires only pruning update(earth.mod, penalty=1, ponly=TRUE) # pruning pass only with a new penalty
A variance model estimates the variance of predicted values.
It can be used to estimate prediction intervals.
See the interval
argument of predict.earth
.
A variance model is built by earth
if earth
's
varmod.method
argument is specified.
Results are stored in the $varmod
field of the earth
model.
See the vignette “Variance models in earth” for details.
You probably won't need to directly call
print.varmod
or summary.varmod
.
They get called internally by summary.earth
.
## S3 method for class 'varmod' summary( object = stop("no 'object' argument"), level = .95, style = "standard", digits = 2, newdata = NULL, ...)
## S3 method for class 'varmod' summary( object = stop("no 'object' argument"), level = .95, style = "standard", digits = 2, newdata = NULL, ...)
object |
A |
level |
Same as |
style |
Determines how the coefficients of the |
digits |
Number of digits to print. Default is |
newdata |
Default |
... |
Dots are passed on. |
A "varmod"
object has the following fields:
call
The call used internally in the parent model to build the varmod
object.
parent
The parent earth
model.
method
Copy of the varmod.method
argument to the parent model.
package
NULL, unless method="gam"
, in which case either "gam"
or "mgcv"
.
exponent
Copy of the varmod.exponent
argument to the parent model.
lambda
Currently always 1, meaning use absolute residuals.
rmethod
Currently always "hc2", meaning correct the residuals with 1/(1-h_ii)
.
converged
Did the residual submodel IRLS converge?
iters
Number of residual model IRLS iterations (1 to 50).
residmod
The residual submodel.
So for example, if varmod.method="lm"
, this will be an lm
object.
min.sd
The predicted residual standard deviation is clamped
so it will always be at least this value.
This prevents prediction of negative or absurdly small variances.
See earth
's varmod.clamp
argument.
Clamping takes place in predict.varmod
, which is called
by predict.earth
when estimating prediction intervals.
model.var
An n x 1 matrix.
The model.var
for an observation is the estimated model
variance for that observation over all datasets, and is estimated with
repeated cross validation.
It is the variance of the mean out-of-fold prediction for that
observation over ncross
repetitions.
abs.resids
An n x 1 matrix.
The absolute residuals used to build the residual model.
parent.x
An n x p matrix. Parent earth model x
.
parent.y
An n x 1 matrix. Parent earth model y
.
iter.rsq
Weighted R-Squared of residual submodel residmod
,
after IRLS iteration.
iter.stderr
Standard errors of the coefficients of the residual submodel residmod
,
after IRLS iteration.
data(ozone1) set.seed(1) # optional, for cross validation reproducibility # note: should really use ncross=30 below but for a quick demo we don't earth.mod <- earth(O3~temp, data=ozone1, nfold=10, ncross=3, varmod.method="lm") print(summary(earth.mod)) # note additional info on the variance model old.mfrow <- par(mfrow=c(2,2), mar=c(3, 3, 3, 1), mgp=c(1.5, 0.5, 0)) plotmo(earth.mod, do.par=FALSE, response.col=1, level=.90, main="earth model: O3~temp") plot(earth.mod, which=3, level=.90) # residual plot: note 90% pred and darker conf intervals par(par=old.mfrow)
data(ozone1) set.seed(1) # optional, for cross validation reproducibility # note: should really use ncross=30 below but for a quick demo we don't earth.mod <- earth(O3~temp, data=ozone1, nfold=10, ncross=3, varmod.method="lm") print(summary(earth.mod)) # note additional info on the variance model old.mfrow <- par(mfrow=c(2,2), mar=c(3, 3, 3, 1), mgp=c(1.5, 0.5, 0)) plotmo(earth.mod, do.par=FALSE, response.col=1, level=.90, main="earth model: O3~temp") plot(earth.mod, which=3, level=.90) # residual plot: note 90% pred and darker conf intervals par(par=old.mfrow)