Title: Boosting for Multivariate Longitudinal Responses
Description: Jointly models multivariate longitudinal responses as a function of multiple covariates and time using a gradient boosting approach.
Authors: Amol Pande, Hemant Ishwaran
Maintainer: Amol Pande <[email protected]>
License: GPL (>= 2)
Version: 1.0.3
Built: 2025-02-18 04:25:12 UTC
Source: https://github.com/cran/BoostMLR
The primary feature of the package is to jointly model multiple longitudinal responses (referred to as a multivariate longitudinal response) as a function of multiple covariates and time from a longitudinal study, using a gradient boosting approach (Pande et al., 2020). Covariates can be time-varying or time-invariant. Special cases include modeling a univariate longitudinal response from a longitudinal study, and a univariate or multivariate response from a cross-sectional study. In all cases, responses are assumed to be continuous. The estimated coefficient can be a function of time (referred to as a time-varying coefficient in the case of a longitudinal study), a function of a pre-specified covariate (in the case of a longitudinal or a cross-sectional study), or fixed.
This package allows joint modeling of a multivariate longitudinal response based on a marginal model. Estimation is performed using gradient boosting, a generic form of boosting (Friedman J. H., 2001). The boosting approach used in this package is closely related to component-wise L2 boosting with L1 penalization. The package can handle high-dimensional covariates and responses, including settings where some of the covariates and responses are pure noise.
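To make the boosting idea concrete, here is a minimal sketch of plain component-wise L2 boosting for a single continuous response, written in base R. It is not the BoostMLR implementation: the helper name `componentwise_l2_boost` is hypothetical, and the sketch omits the multivariate response, B-spline expansions, correlation estimation, and L1 shrinkage described in this manual. At each of `M` iterations it fits the current residuals (the negative gradient of squared-error loss) on each covariate separately, keeps the best-fitting covariate, and adds a fraction `nu` of that fit, which is roughly analogous to the role of the `M` and `nu` arguments of `BoostMLR`.

```r
# Hypothetical helper, not part of BoostMLR: component-wise L2 boosting
# for one continuous response y and a covariate matrix X.
componentwise_l2_boost <- function(X, y, M = 200, nu = 0.05) {
  X <- as.matrix(X)
  p <- ncol(X)
  beta <- numeric(p)                   # coefficients start at zero
  intercept <- mean(y)                 # offset: start from the mean response
  fit <- rep(intercept, length(y))
  selected <- integer(M)               # covariate chosen at each iteration
  for (m in seq_len(M)) {
    u <- y - fit                       # negative gradient of 0.5 * (y - f)^2
    # Least-squares slope of u on each covariate separately
    coefs <- colSums(X * u) / colSums(X^2)
    rss <- colSums((u - sweep(X, 2, coefs, `*`))^2)
    j <- which.min(rss)                # best single-covariate fit
    beta[j] <- beta[j] + nu * coefs[j] # shrunken update of that coefficient only
    fit <- fit + nu * coefs[j] * X[, j]
    selected[m] <- j
  }
  list(intercept = intercept, beta = beta, selected = selected)
}

# Usage: only x1 is informative; x2 and x3 are pure noise.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)
y <- 2 * X[, 1] + rnorm(n, sd = 0.5)
fit <- componentwise_l2_boost(X, y, M = 500, nu = 0.1)
round(fit$beta, 2)
```

Because only one coefficient moves per iteration, stopping early leaves coefficients of rarely selected covariates at or near zero, which is one reason this family of methods copes with noise covariates.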
The package is designed to identify covariates that affect responses differently at different time intervals. This helps dissect the overall effect of a covariate into effects over different time intervals. For example, some covariates affect the response at the beginning of follow-up, whereas others affect it at a later stage.
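A time-varying coefficient can be represented by expanding time in a B-spline basis, so that the effect of a covariate at time t is beta(t) = sum over k of B_k(t) * alpha_k. The sketch below assumes this standard construction rather than reproducing BoostMLR's internals: it uses `splines::bs` with hypothetical interior knots and hypothetical coefficients `alpha`, chosen so that the covariate matters mainly in the middle of follow-up. Because each cubic B-spline basis function is nonzero only over a limited window of time, a coefficient on one basis function describes the effect within that window.

```r
library(splines)  # ships with base R

# Hypothetical illustration of a time-varying coefficient beta(t).
tm <- seq(0, 10, length.out = 101)
# Assumed knot placement: 5 interior knots at evenly spaced quantiles of time
knots <- quantile(tm, probs = seq(0.1, 0.9, length.out = 5))
B <- bs(tm, knots = knots, degree = 3)    # length(knots) + degree = 8 columns
alpha <- c(0, 0, 0.5, 1, 1, 0.5, 0, 0)    # hypothetical spline coefficients
beta_t <- as.vector(B %*% alpha)          # beta(t), one value per time point

# A covariate x then contributes x * beta(t) to the response at time t:
# no effect at the start of follow-up, the largest effect mid follow-up.
effect_x1 <- function(x, t_index) x * beta_t[t_index]
```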
This package contains many useful functions and users should read the help file in its entirety for details. However, we briefly mention several key functions that may make it easier to navigate and understand the layout of the package.
BoostMLR
This is the main entry point to the package. The model is fit by gradient boosting on the user-specified training data.
updateBoostMLR
Updates the model with additional boosting iterations.
predictBoostMLR
Obtains model performance on test set data. This function also estimates variable importance (VIMP).
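This manual does not spell out how VIMP is computed. As a hedged illustration of the general idea only, the sketch below implements generic permutation importance: permute one covariate in the test data and report the percentage increase in test mean squared error. The helper `permutation_vimp` and its `seed` argument are hypothetical, and an ordinary `lm` fit stands in for a boosting model.

```r
# Generic permutation importance (an assumption; not necessarily how
# predictBoostMLR computes VIMP). Returns the % increase in test MSE
# after permuting each covariate in turn.
permutation_vimp <- function(predict_fn, X, y, seed = 100L) {
  set.seed(seed)                      # note: resets the global RNG state
  base_err <- mean((y - predict_fn(X))^2)
  sapply(seq_len(ncol(X)), function(j) {
    Xp <- X
    Xp[, j] <- sample(Xp[, j])        # break the link between x_j and y
    100 * (mean((y - predict_fn(Xp))^2) - base_err) / base_err
  })
}

# Usage with a stand-in model: only x1 drives the response.
set.seed(2)
n <- 300
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 3 * X$x1 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = cbind(X, y = y))
vimp <- permutation_vimp(function(newX) predict(fit, newdata = newX), X, y)
```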
Amol Pande and Hemant Ishwaran
Maintainer: Amol Pande <[email protected]>
Pande A., Ishwaran H., Blackstone E.H. (2020). Boosting for multivariate longitudinal response.
Friedman J.H. (2001). Greedy function approximation: a gradient boosting machine, Ann. Statist., 29(5): 1189-1232.
BoostMLR, updateBoostMLR, predictBoostMLR, simLong
The function jointly models multiple longitudinal responses (referred to as a multivariate longitudinal response) as a function of multiple covariates and time from a longitudinal study, using a gradient boosting approach (Pande et al., 2020). Covariates can be time-varying or time-invariant. Special cases include modeling a univariate longitudinal response from a longitudinal study, and a univariate or multivariate response from a cross-sectional study. In all cases, responses are assumed to be continuous. The estimated coefficient can be a function of time (referred to as a time-varying coefficient in the case of a longitudinal study), a function of a pre-specified covariate (in the case of a longitudinal or a cross-sectional study), or fixed.
BoostMLR(x, tm, id, y, Time_Varying = TRUE, BS_Time = TRUE, nknots_t = 10, d_t = 3, All_RawX = TRUE, RawX_Names, nknots_x = 7, d_x = 3, M = 200, nu = 0.05, Mod_Grad = TRUE, Shrink = FALSE, VarFlag = TRUE, lower_perc = 0.25, upper_perc = 0.75, NLambda = 100, Verbose = TRUE, Trace = FALSE, lambda = 0, setting_seed = FALSE, seed_value = 100L, ...)
x |
Data frame (or matrix) containing x-values (covariates).
The number of rows should match the number of rows of the response |
tm |
Vector of time values, one entry for each row of the response |
id |
Vector of subject identifiers with the same length as the number of rows of |
y |
Data frame (or matrix) containing the y-values (response) in case of multivariate response or a vector of y-values in case of univariate response. |
Time_Varying |
Time-varying coefficient model or a fixed coefficient model? |
BS_Time |
If |
nknots_t |
If |
d_t |
If |
All_RawX |
Use original scale of |
RawX_Names |
If |
nknots_x |
Specify number of knots for B-spline of |
d_x |
Specify degree of polynomial for B-spline of |
M |
Number of boosting iterations. |
nu |
Boosting regularization parameter. A value from the interval (0,1]. |
Mod_Grad |
Use a modified gradient? The modified gradient is a special type of gradient that is independent of the correlation coefficient. Pande A. (2017) observed that prediction performance improves under the modified gradient. |
Shrink |
Allow estimated coefficients to shrink to zero using L1 penalization? |
VarFlag |
Estimate the variance (scale parameter) and correlation parameter
for each |
lower_perc |
Lower percentile used to determine the lower cut-off of the distribution of parameter estimates. Applicable when |
upper_perc |
Upper percentile used to determine the upper cut-off of the distribution of parameter estimates. Applicable when |
NLambda |
Number of replications for generating the distribution of parameter estimates. Applicable when |
Verbose |
Print the current stage of boosting iteration? |
Trace |
Print the current stage of execution? Useful for identifying location in case error occurs. |
lambda |
Additional penalty; not implemented at this time. |
setting_seed |
Set |
seed_value |
Seed value. |
... |
Further arguments passed to or from other methods. |
This is a non-parametric approach for joint modeling of a multivariate longitudinal response based on a marginal model. Estimation is performed using gradient boosting, a generic form of boosting (Friedman J. H., 2001). Our boosting approach is closely related to component-wise L2 boosting with L1 penalization. The approach can handle high-dimensional covariates and responses, including settings where some of the covariates and responses are pure noise.
The approach is designed to identify covariates that affect responses differently at different time intervals. This helps dissect the overall effect of a covariate into effects over different time intervals. For example, some covariates affect the response at the beginning of follow-up, whereas others affect it at a later stage.
Shrinkage allows boosting to terminate early, which helps prevent overfitting. It also yields a parsimonious model by shrinking the coefficients of non-informative covariate-response pairs to zero.
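The manual states that shrinkage relies on L1 penalization but does not give the exact update rule used by BoostMLR. The generic mechanism by which an L1 penalty sets small coefficients exactly to zero is the soft-thresholding operator, the closed-form solution of a one-dimensional lasso problem. The sketch below shows it with a hypothetical helper `soft_threshold`; how BoostMLR chooses lambda is not reproduced.

```r
# Soft-thresholding: the solution of
#   argmin_b 0.5 * (z - b)^2 + lambda * |b|
# Values with |z| <= lambda become exactly zero; larger ones move toward zero.
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

# Hypothetical unpenalized estimates for four covariate-response pairs
shrunk <- soft_threshold(c(-2, -0.3, 0.1, 1.5), lambda = 0.5)
# returns c(-1.5, 0, 0, 1): the two weak pairs are dropped from the model
```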
x |
Matrix containing x-values. |
id |
Vector of subject identifier. |
tm |
Vector of time values. |
y |
Matrix containing y-values. |
UseRaw |
Logical vector indicating which covariates are used as-is, without B-spline mapping. |
x_Names |
Variable names of |
y_Names |
Variable names of |
M |
Number of boosting iterations. If boosting terminates before
a pre-specified |
nu |
Regularization parameter. |
Tm_Beta |
An estimate of the parameter beta. This consists of a list of length equal to the number of responses (denoted by L). If |
mu |
Estimate of the conditional expectation of |
Error_Rate |
Training error rate for each response across the boosting iterations. |
Variable_Select |
Indexes of important covariates that get picked-up across time and across boosting iterations. Result is shown as a matrix with M rows and H (number of overlapping time intervals) columns, where each element represents index of covariate. |
Response_Select |
Indexes of important responses that get picked-up across time and across boosting iterations. Result is shown as a matrix with M rows and H columns, where each element represents index of response variable. |
VarFlag |
Whether the variance (scale parameter) and correlation are estimated? |
Time_Varying |
Whether estimates are time-varying or fixed? |
Phi |
Matrix, having dimension M by L, representing an estimate of variance (scale parameter) for each response across the boosting iterations. |
Rho |
Matrix, having dimension M by L, representing an estimate of correlation for each response across the boosting iterations. |
Lambda_List |
Estimate of lambda (the L1 penalty parameter) for each boosting iteration. Useful for internal calculation. |
Grow_Object |
Useful for internal calculation. |
Amol Pande and Hemant Ishwaran
Pande A., Ishwaran H., Blackstone E.H. (2020). Boosting for multivariate longitudinal response.
Pande A., Li L., Rajeswaran J., Ehrlinger J., Kogalur U.B., Blackstone E.H., Ishwaran H. (2017). Boosted multivariate trees for longitudinal data, Machine Learning, 106(2): 277–305.
Pande A. (2017). Boosting for longitudinal data. Ph.D. Dissertation, Miller School of Medicine, University of Miami.
Friedman J.H. (2001). Greedy function approximation: a gradient boosting machine, Ann. Statist., 29(5): 1189-1232.
updateBoostMLR, predictBoostMLR, simLong
##-----------------------------------------------------------------
## Multivariate Longitudinal Response
##-----------------------------------------------------------------
# Simulate data involving 3 responses and 4 covariates
dta <- simLong(n = 100, N = 5, rho = 0.80, model = 1, q_x = 0, q_y = 0,
               type = "corCompSym")$dtaL
# Boosting call: raw values of covariates, B-spline for time,
# no shrinkage, no estimate of rho and phi
boost.grow <- BoostMLR(x = dta$features, tm = dta$time, id = dta$id,
                       y = dta$y, M = 100, VarFlag = FALSE)
# Plot training error
plotBoostMLR(boost.grow$Error_Rate, xlab = "m", ylab = "Training Error")

##-----------------------------------------------------------------
## Laboratory data
##-----------------------------------------------------------------
data(Laboratory_Data, package = "BoostMLR")
Var_Names <- colnames(Laboratory_Data)
x_Names <- setdiff(Var_Names, c("id", "time", "tbili_po", "creat_po"))
dta_id <- Laboratory_Data[, "id"]
dta_time <- Laboratory_Data[, "time"]
dta_x <- Laboratory_Data[, x_Names]
dta_y <- Laboratory_Data[, c("tbili_po", "creat_po")]
boost.grow <- BoostMLR(x = dta_x, tm = dta_time, id = dta_id, y = dta_y,
                       Time_Varying = TRUE, BS_Time = TRUE, All_RawX = TRUE,
                       M = 10, VarFlag = TRUE)

##-----------------------------------------------------------------
## Univariate Longitudinal Response
##-----------------------------------------------------------------
# Simulate data involving 1 response and 4 covariates
dta <- simLong(n = 100, N = 5, rho = 0.80, model = 2, q_x = 0, q_y = 0,
               type = "corCompSym")$dtaL
# Boosting call: B-spline for time and covariates, shrinkage,
# estimate of rho and phi
boost.grow <- BoostMLR(x = dta$features, tm = dta$time, id = dta$id,
                       y = dta$y, M = 100, BS_Time = TRUE, All_RawX = FALSE,
                       Shrink = TRUE, VarFlag = TRUE)
# Plot training error
plotBoostMLR(boost.grow$Error_Rate, xlab = "m", ylab = "Training Error")
# Plot phi
plotBoostMLR(boost.grow$Phi, xlab = "m", ylab = "phi")
# Plot rho
plotBoostMLR(boost.grow$Rho, xlab = "m", ylab = "rho")

##-----------------------------------------------------------------
## Multivariate Longitudinal Response
##-----------------------------------------------------------------
# Simulate data involving 3 responses and 4 covariates
dta <- simLong(n = 100, N = 5, rho = 0.80, model = 1, q_x = 0, q_y = 0,
               type = "corCompSym")$dtaL
# Boosting call: raw values of covariates, fixed parameter estimates
# instead of time-varying, no shrinkage, no estimate of rho and phi
boost.grow <- BoostMLR(x = dta$features, tm = dta$time, id = dta$id,
                       y = dta$y, M = 100, Time_Varying = FALSE, VarFlag = FALSE)
# Print parameter estimates
boost.grow$Tm_Beta

##-----------------------------------------------------------------
## Multivariate Response from Cross-sectional Data: Estimated
## coefficient as a function of covariate
##-----------------------------------------------------------------
if (library("mlbench", logical.return = TRUE)) {
  data("BostonHousing")
  x <- BostonHousing[, c(1:7, 9:12)]
  tm <- BostonHousing[, 8]
  id <- 1:nrow(BostonHousing)
  y <- BostonHousing[, 13:14]
  # Boosting call: raw values of covariates, B-spline for covariate "dis",
  # no shrinkage
  boost.grow <- BoostMLR(x = x, tm = tm, id = id, y = y, M = 100, VarFlag = FALSE)
  # Plot training error
  plotBoostMLR(boost.grow$Error_Rate, xlab = "m", ylab = "Training Error",
               legend_fraction_x = 0.2)
}

##-----------------------------------------------------------------
## Univariate Response from Cross-sectional Data: Fixed estimated
## coefficient
##-----------------------------------------------------------------
if (library("mlbench", logical.return = TRUE)) {
  data("BostonHousing")
  x <- BostonHousing[, 1:13]
  y <- BostonHousing[, 14]
  # Boosting call: raw values of covariates
  boost.grow <- BoostMLR(x = x, y = y, M = 100)
  # Plot training error
  plotBoostMLR(boost.grow$Error_Rate, xlab = "m", ylab = "Training Error",
               legend_fraction_x = 0.2)
}
Show the NEWS file of the BoostMLR package.
BoostMLR.news(...)
... |
Further arguments passed to or from other methods. |
None.
Amol Pande and Hemant Ishwaran
The laboratory data are based on 459 patients who were listed for heart transplant and were placed on mechanical circulatory support through device implantation between December 1991 and July 2009 at Cleveland Clinic. These patients had periodic measurements of their bilirubin and creatinine levels. The data include 18285 measurements of bilirubin and creatinine, with an average of 39 measurements per patient.
Laboratory data has 4 parts:
A total of 41 x-variables.
Time points (time).
Patient identifier (id).
Longitudinal responses (tbili_po and creat_po).
Rajeswaran J., Blackstone E.H. and Bernard J. Evolution of association between renal and liver function while awaiting for the heart transplant: An application using bivariate multiphase nonlinear mixed effect model. Statistical Methods in Medical Research 27(7): 2216–2230, 2018.
data(Laboratory_Data, package = "BoostMLR")
Partial dependence plot of x and time against adjusted predicted y.
## S3 method for class 'BoostMLR'
partial(Object, xvar.name, n.x = 10, n.tm = 10, x.unq = NULL, tm.unq = NULL, Mopt, plot.it = TRUE, path_saveplot = NULL, Verbose = TRUE, ...)
Object |
A boosting object of class |
xvar.name |
Name of the x-variable to be used for partial plot. |
n.x |
Maximum number of unique points used for |
n.tm |
Maximum number of unique points used for |
x.unq |
Unique values used for the partial plot for variable |
tm.unq |
Unique time points used for the partial plots of x against y. Default is NULL, in which case unique values are obtained uniformly based on the range of |
Mopt |
The optimal number of boosting iterations. If missing, the value from the |
plot.it |
Should partial plot be displayed? |
path_saveplot |
Location where the plot should be saved. By default, the plot is saved in a temporary folder. |
Verbose |
Display the path where the plot is saved? |
... |
Further arguments passed to or from other methods. |
Partial dependence plot (Friedman, 2001) of x values specified by xvar.name against the adjusted predicted y-values over a set of time points specified by tm.unq.
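Following Friedman (2001), the partial dependence of the prediction on a covariate at a given time is obtained by fixing that covariate (and time) at a grid value for every observation and averaging the predictions over the remaining covariates. The base-R sketch below assumes a generic prediction function: an `lm` fit with an x1-by-time interaction stands in for a `BoostMLR` object, and `partial_dependence`, `x_grid`, and `tm_grid` are hypothetical names that loosely mirror `x.unq` and `tm.unq`. The adjustment implied by "adjusted predicted y-values" is not reproduced here.

```r
# Hypothetical helper: partial dependence of predictions on `xvar`,
# one row per grid value of xvar and one column per time point.
partial_dependence <- function(predict_fn, data, xvar, x_grid, tm_var, tm_grid) {
  out <- matrix(NA_real_, nrow = length(x_grid), ncol = length(tm_grid),
                dimnames = list(NULL, paste0("t=", tm_grid)))
  for (k in seq_along(tm_grid)) {
    for (i in seq_along(x_grid)) {
      d <- data
      d[[xvar]] <- x_grid[i]        # fix the covariate for every row
      d[[tm_var]] <- tm_grid[k]     # fix time for every row
      out[i, k] <- mean(predict_fn(d))  # average over the other covariates
    }
  }
  out
}

# Usage: the effect of x1 grows with time in this simulated example.
set.seed(3)
n <- 200
dat <- data.frame(x1 = runif(n), x2 = rnorm(n), time = runif(n, 0, 5))
dat$y <- 2 * dat$x1 * dat$time + dat$x2 + rnorm(n, sd = 0.1)
fit <- lm(y ~ x1 * time + x2, data = dat)
pd <- partial_dependence(function(d) predict(fit, newdata = d), dat,
                         xvar = "x1", x_grid = seq(0, 1, length.out = 5),
                         tm_var = "time", tm_grid = c(1, 4))
```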
x.unq |
Unique values used for the partial plot for variable |
tm.unq |
Unique time points used for the partial plots of x against y. |
pList |
List with the number of elements equal to the number of responses. Each element of the list is a matrix with number of rows equal to the length of |
sList |
List with the number of elements equal to the number of responses. Each element is a matrix with the same dimension as described in |
Amol Pande and Hemant Ishwaran
Friedman J.H. (2001). Greedy function approximation: a gradient boosting machine, Ann. Statist., 29(5): 1189-1232.
##------------------------------------------------------------
## Generate partial plot for covariate x1
##------------------------------------------------------------
dta <- simLong(n = 100, N = 5, rho = 0.80, model = 1, q_x = 0, q_y = 0,
               type = "corCompSym")$dtaL
# Boosting call: raw values of covariates, B-spline for time,
# no shrinkage, no estimate of rho and phi
boost.grow <- BoostMLR(x = dta$features, tm = dta$time, id = dta$id,
                       y = dta$y, M = 100, VarFlag = FALSE)
Partial_Plot_x1 <- partial.BoostMLR(Object = boost.grow, xvar.name = "x1",
                                    plot.it = FALSE)
Plot training and test error, and estimates of variance/correlation parameters, across the boosting iterations.
plotBoostMLR(Result, xlab = "", ylab = "", legend_fraction_x = 0.10, legend_fraction_y = 0, ...)
Result |
A matrix containing either the training or test error, or the estimates of variance/correlation parameters, across the boosting iterations. |
xlab |
Label for the x-axis. |
ylab |
Label for the y-axis. |
legend_fraction_x |
Value used to expand the x-axis. |
legend_fraction_y |
Value used to expand the y-axis. |
... |
Further arguments passed to or from other methods. |
Amol Pande and Hemant Ishwaran
##-----------------------------------------------------------------
## Multivariate Longitudinal Response
##-----------------------------------------------------------------
# Simulate data involving 3 responses and 4 covariates
dta <- simLong(n = 100, N = 5, rho = 0.80, model = 1, q_x = 0, q_y = 0,
               type = "corCompSym")$dtaL
# Boosting call: raw values of covariates, B-spline for time,
# no shrinkage, no estimate of rho and phi
boost.grow <- BoostMLR(x = dta$features, tm = dta$time, id = dta$id,
                       y = dta$y, M = 100, VarFlag = FALSE)
# Plot training error
plotBoostMLR(boost.grow$Error_Rate, xlab = "m", ylab = "Training Error")
Barplot displaying variable importance for the main effect.
plotVIMP(vimp_Object, xvar.names = NULL, cex.xlab = NULL, ymaxlim = 0, yminlim = 0, main = "Variable Importance (%)", col = grey(0.8), cex.lab = 1.5, ylbl = NULL, legend_placement = NULL, plot.it = TRUE, path_saveplot = NULL, Verbose = TRUE)
vimp_Object |
List with number of elements equal to the number of response variables. |
xvar.names |
Names of the covariates. If NULL, names will be pulled from |
cex.xlab |
Magnification of the names of the covariates for the barplot. |
ymaxlim |
By default, the range of the VIMP values is used as the y-axis limit of the barplot. To extend the limit, specify the amount by which the limit should extend above the x-axis. |
yminlim |
Similar to |
main |
Main title for the plot. |
col |
Color of the plot. |
cex.lab |
Magnification of the x and y labels. |
ylbl |
Label for the y-axis. |
legend_placement |
Should covariate names be placed on top of each bar? If so, use the default setting; otherwise, set a value in the negative direction of the y-axis, which places the covariate names beneath the bars. |
plot.it |
Should the VIMP plot be displayed? |
path_saveplot |
Location where the plot should be saved. By default, the plot is saved in a temporary folder. |
Verbose |
Display the path where the plot is saved? |
Barplot displaying VIMP for each response. The barplot will be saved as a PDF file in the working directory.
Amol Pande and Hemant Ishwaran
##-----------------------------------------------------------------
## VIMP plot for multivariate longitudinal response
##-----------------------------------------------------------------
# Simulate data involving 3 responses and 4 covariates
dta <- simLong(n = 100, ntest = 100, N = 5, rho = 0.80, model = 1,
               q_x = 0, q_y = 0, type = "corCompSym")
dtaL <- dta$dtaL
trn <- dta$trn
# Boosting call: raw values of covariates, B-spline for time,
# no shrinkage, no estimate of rho and phi
boost.grow <- BoostMLR(x = dtaL$features[trn, ], tm = dtaL$time[trn],
                       id = dtaL$id[trn], y = dtaL$y[trn, ],
                       M = 100, VarFlag = FALSE)
boost.pred <- predictBoostMLR(Object = boost.grow, x = dtaL$features[-trn, ],
                              tm = dtaL$time[-trn], id = dtaL$id[-trn],
                              y = dtaL$y[-trn, ], importance = TRUE)
# Plot VIMP
plotVIMP(vimp_Object = boost.pred$vimp, ymaxlim = 20, plot.it = FALSE)
The function returns predicted values for the response. If the response is also provided, it returns test set performance, the optimal number of boosting iterations, and variable importance (VIMP).
predictBoostMLR(Object, x, tm, id, y, M, importance = FALSE, eps = 1e-5, setting_seed = FALSE, seed_value = 100L, ...)
Object |
A boosting object obtained using the function |
x |
Data frame (or matrix) containing the test set x-values (covariates).
Covariates can be time-varying or time-invariant.
If |
tm |
Vector of test set time values.
If |
id |
Vector of test set subject identifier.
If |
y |
Data frame (or matrix) containing the test set y-values
(response) in case of multivariate response or a
vector of y-values in case of univariate response.
If |
M |
Number of boosting iterations. Value should be less than or equal
to the value specified in the |
importance |
Whether to calculate standardized variable importance (VIMP) for each covariate? |
eps |
Tolerance value used for determining the optimal |
setting_seed |
Set |
seed_value |
Seed value. |
... |
Further arguments passed to or from other methods. |
The predicted response and performance values are obtained for the test data using the Object grown by function BoostMLR on the training data.
Data |
A list with elements |
x_Names |
Variable names of |
y_Names |
Variable names of |
mu |
Estimate of conditional expectation of |
mu_Mopt |
Estimate of conditional expectation of |
Error_Rate |
Test set error rate for each response across the boosting iterations. |
Mopt |
The optimal number of boosting iterations. |
nu |
Regularization parameter. |
rmse |
Test set standardized root mean square error (sRMSE) at the |
vimp |
Standardized VIMP for each covariate. This consists of a list of length equal to the number of responses. Each element of the list is a matrix with number of rows equal to the number of covariates and number of columns equal to the number of overlapping time intervals + 1, where the first column contains covariate main effects and all other columns contain covariate-time interaction effects. |
Pred_Object |
Useful for internal calculation. |
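The sketch below shows, under stated assumptions, how a test set error path and an optimal iteration can be derived. It takes an n x M matrix of test set predictions (column m holds the predictions after m iterations), computes the RMSE standardized by the standard deviation of the observed response (an assumed definition of sRMSE), and picks the iteration with the smallest error. The helper name `srmse_path` is hypothetical, the prediction path is synthetic, and the `eps`-based tolerance used by `predictBoostMLR` is not reproduced.

```r
# Hypothetical helper: standardized RMSE at every boosting iteration.
# pred is an n x M matrix; y is the observed test set response (length n).
srmse_path <- function(y, pred) {
  rmse <- sqrt(colMeans((pred - y)^2))  # y recycles down each column
  rmse / sd(y)                          # assumed standardization by sd(y)
}

# Synthetic prediction path: error falls until m = 60, then rises (overfitting).
set.seed(4)
y <- rnorm(50)
z <- rnorm(50)
M <- 100
err_scale <- 0.2 + abs(seq_len(M) - 60) / 100
pred <- y + outer(z, err_scale)         # 50 x 100 matrix of predictions
err <- srmse_path(y, pred)
Mopt <- which.min(err)                  # iteration with the smallest test error
```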
Amol Pande and Hemant Ishwaran
Pande A., Ishwaran H., Blackstone E.H. (2020). Boosting for multivariate longitudinal response.
Pande A., Li L., Rajeswaran J., Ehrlinger J., Kogalur U.B., Blackstone E.H., Ishwaran H. (2017). Boosted multivariate trees for longitudinal data, Machine Learning, 106(2): 277–305.
Pande A. (2017). Boosting for longitudinal data. Ph.D. Dissertation, Miller School of Medicine, University of Miami.
BoostMLR, updateBoostMLR, simLong
##-----------------------------------------------------------------
## Multivariate Longitudinal Response
##-----------------------------------------------------------------
# Simulate data involving 3 responses and 4 covariates
dta <- simLong(n = 100, ntest = 100, N = 5, rho = 0.80, model = 1,
               q_x = 0, q_y = 0, type = "corCompSym")
dtaL <- dta$dtaL
trn <- dta$trn
# Boosting call: raw values of covariates, B-spline for time,
# no shrinkage, no estimate of rho and phi
boost.grow <- BoostMLR(x = dtaL$features[trn, ], tm = dtaL$time[trn],
                       id = dtaL$id[trn], y = dtaL$y[trn, ],
                       M = 100, VarFlag = FALSE)
boost.pred <- predictBoostMLR(Object = boost.grow, x = dtaL$features[-trn, ],
                              tm = dtaL$time[-trn], id = dtaL$id[-trn],
                              y = dtaL$y[-trn, ], importance = TRUE)
# Plot test set error
plotBoostMLR(boost.pred$Error_Rate, xlab = "m", ylab = "Test Set Error",
             legend_fraction_x = 0.2)
Simulates longitudinal data from multivariate and univariate longitudinal response model.
simLong(n = 100, ntest = 0, N = 5, rho = 0.8, model = c(1, 2), phi = 1, q_x = 0, q_y = 0, type = c("corCompSym", "corAR1", "corSymm", "iid"))
Argument | Description |
---|---|
n | Requested training sample size. |
ntest | Requested test sample size. |
N | Parameter controlling the number of time points per subject. |
rho | Correlation parameter. |
model | Requested simulation model (1 or 2). |
phi | Variance of the measurement error. |
q_x | Number of noise covariates. |
q_y | Number of noise responses. |
type | Type of correlation matrix: one of "corCompSym", "corAR1", "corSymm", or "iid". |
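The type argument uses names that match the nlme corStruct classes. As a rough illustration of what the three simplest choices mean for one subject's N repeated measurements, the R sketch below builds the corresponding N x N correlation matrices with base R only. It is an assumption that simLong uses matrices of this form; how it combines rho with phi, and the unstructured "corSymm" case, are not reproduced, and make_cor() is a hypothetical helper, not part of BoostMLR.

```r
# Hypothetical helper (not part of BoostMLR): N x N working correlation
# matrix for one subject's repeated measurements. "corSymm" is omitted.
make_cor <- function(N, rho, type = c("corCompSym", "corAR1", "iid")) {
  type <- match.arg(type)
  switch(type,
    # Compound symmetry: every pair of time points has correlation rho.
    corCompSym = {
      R <- matrix(rho, nrow = N, ncol = N)
      diag(R) <- 1
      R
    },
    # AR(1): correlation decays as rho^|j - k| with the lag between visits.
    corAR1 = rho^abs(outer(seq_len(N), seq_len(N), "-")),
    # Independence: identity matrix.
    iid = diag(N)
  )
}

R_cs  <- make_cor(N = 5, rho = 0.8, type = "corCompSym")
R_ar1 <- make_cor(N = 5, rho = 0.8, type = "corAR1")

# One vector of correlated standard normal errors with correlation R_cs:
# if z ~ N(0, I) and R = U'U (Cholesky), then z %*% U has correlation R.
set.seed(4)
e <- drop(rnorm(5) %*% chol(R_cs))
```

Compound symmetry treats all visits as equally correlated, whereas AR(1) lets correlation fade with the time gap; with rho = 0.8, visits two steps apart have correlation 0.64 under AR(1) but 0.8 under compound symmetry.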
Simulates longitudinal data from multivariate and univariate longitudinal response models. We consider the following two models:
model = 1: A simpler linear model consisting of three longitudinal responses, y1, y2, and y3, and four covariates, x1, x2, x3, and x4. Response y1 is associated with x1 and x4, response y2 with x2 and x4, and response y3 with x3 and x4.
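To make the association pattern of model = 1 concrete, the following R sketch generates data in which each response depends only on its own covariate and the shared covariate x4, then checks the pattern with lm(). The coefficient values, the standard normal covariates, and the independent cross-sectional errors are illustrative assumptions; this is not the generating code of simLong, which additionally introduces within-subject correlation over time.

```r
# Illustrative only: same association pattern as model = 1
# (y1 ~ x1 + x4, y2 ~ x2 + x4, y3 ~ x3 + x4). Coefficients, covariate
# distributions, and independent errors are assumptions, not simLong's model.
set.seed(1)
n <- 2000
x <- matrix(rnorm(n * 4), ncol = 4, dimnames = list(NULL, paste0("x", 1:4)))
beta_own <- 1.0   # hypothetical effect of each response's "own" covariate
beta_x4  <- 0.5   # hypothetical effect of the shared covariate x4
y <- sapply(1:3, function(k) beta_own * x[, k] + beta_x4 * x[, "x4"] + rnorm(n))
colnames(y) <- paste0("y", 1:3)
dat <- data.frame(x, y)

# Regressing y1 on all four covariates: x1 and x4 carry signal,
# x2 and x3 should have coefficients near zero.
fit_y1 <- lm(y1 ~ x1 + x2 + x3 + x4, data = dat)
round(coef(fit_y1), 2)
```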
model = 2: A relatively complex model consisting of a single longitudinal response and four covariates. The model includes a non-linear relationship between the response and the covariates, as well as a covariate-time interaction.
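A covariate-time interaction means that the effect of a covariate changes over follow-up. One common way to represent such a time-varying coefficient, and the one suggested by the "B-spline for time" comments in the examples, is to expand time in a B-spline basis and interact that basis with the covariate. The R sketch below does this for a single covariate with splines::bs() and ordinary least squares. The coefficient function sin(2 * pi * t) and all settings are arbitrary assumptions; neither the model = 2 specification of simLong nor BoostMLR's boosting fit is reproduced.

```r
library(splines)  # ships with base R

# Illustrative only: y = beta(t) * x + error, where the effect of x changes
# over follow-up. beta(t) = sin(2 * pi * t) is an arbitrary choice.
set.seed(2)
n  <- 2000
tt <- runif(n)                      # measurement times in [0, 1]
x  <- rnorm(n)
beta_true <- function(t) sin(2 * pi * t)
y  <- beta_true(tt) * x + rnorm(n, sd = 0.5)

# Cubic B-spline basis in time, multiplied column-wise by x, so the
# coefficient of x becomes a smooth function of time (fit by OLS here).
B   <- bs(tt, df = 8, intercept = TRUE)
Bm  <- matrix(B, nrow = n)          # plain numeric copy of the basis
X   <- x * Bm                       # covariate-time interaction columns
fit <- lm(y ~ 0 + X)

# Estimated beta(t) on a grid inside the observed time range.
t_grid   <- seq(0.05, 0.95, by = 0.05)
B_grid   <- predict(B, t_grid)      # basis evaluated at the new times
beta_hat <- drop(matrix(B_grid, nrow = length(t_grid)) %*% coef(fit))
round(cbind(t = t_grid, true = beta_true(t_grid), est = beta_hat), 2)
```

The estimated curve changes sign over follow-up, which is exactly the kind of effect a single constant coefficient for x would average away.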
An invisible list with the following components:
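dtaL | List containing the simulated data; the examples access its components features, time, id, and y. |
dta | Simulated data given as a data frame. |
trn | Index of the rows of dtaL that form the training set (the remaining rows form the test set). |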
Amol Pande and Hemant Ishwaran
Pande A., Li L., Rajeswaran J., Ehrlinger J., Kogalur U.B., Blackstone E.H., Ishwaran H. (2017). Boosted multivariate trees for longitudinal data, Machine Learning, 106(2): 277–305.
Updates a boosting object with additional boosting iterations.
updateBoostMLR(Object, M_Add, Verbose = TRUE, ...)
Argument | Description |
---|---|
Object | Boosting object. This object was previously obtained using |
M_Add | Number of additional boosting iterations. |
Verbose | Whether to print the current stage of the boosting iteration. |
... | Further arguments passed to or from other methods. |
In boosting, Mopt, the number of boosting iterations required to achieve the optimal result, is unknown. Typically, Mopt is estimated by specifying a large value of M and then searching for an optimal value less than M using the test data. The function updateBoostMLR allows the user to start with a small value of M and keep incrementing the number of boosting iterations, each time running through the test data, until an optimal boosting iteration is found. This can significantly reduce unnecessary computation, particularly when Mopt << M.
The procedure can be repeated multiple times on the same boosting object (see the example below).
Results from updateBoostMLR can be treated in the same way as results from BoostMLR.
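The incremental strategy for locating Mopt can be sketched independently of the package. In the R code below, test_error() is a hypothetical stand-in for the test-set error path after m iterations (a made-up U-shaped curve), and find_M_opt() is a hypothetical helper, not a BoostMLR function. In real use, each increment would correspond to a call such as updateBoostMLR(Object, M_Add) followed by re-scoring the test data.

```r
# Hypothetical test-set error after m iterations: an arbitrary U-shaped
# curve with its minimum at m = 137 (stands in for re-scoring test data).
test_error <- function(m) (m - 137)^2 / 1e4 + 1

# Hypothetical helper: grow M in steps of M_add until the best iteration
# lies strictly inside the current error path.
find_M_opt <- function(err_fun, M_start = 100, M_add = 50, M_max = 1000) {
  M <- M_start
  repeat {
    err   <- err_fun(seq_len(M))    # error path for iterations 1..M
    m_hat <- which.min(err)
    # Stop once the error has started rising before reaching M
    # (or when the iteration budget M_max is exhausted).
    if (m_hat < M || M >= M_max) return(list(M_opt = m_hat, M = M))
    M <- M + M_add                  # otherwise add M_add more iterations
  }
}

res <- find_M_opt(test_error, M_start = 100, M_add = 50)
res$M_opt
res$M
```

With M_start = 100 the minimum still sits at the last iteration, so 50 more iterations are added; at M = 150 the minimum (m = 137) lies inside the path and the search stops. This stopping rule is deliberately simple; a noisy error curve would usually call for a tolerance or patience window before stopping.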
Amol Pande and Hemant Ishwaran
BoostMLR, predictBoostMLR, simLong
library(BoostMLR)
##-----------------------------------------------------------------
## Univariate Longitudinal Response
##-----------------------------------------------------------------
# Simulate data with 1 response and 4 covariates
dta <- simLong(n = 100, N = 5, rho = .80, model = 2, q_x = 0, q_y = 0,
               type = "corCompSym")$dtaL
# Boosting call: raw values of covariates, B-spline for time,
# no shrinkage, no estimate of rho and phi
boost.grow <- BoostMLR(x = dta$features, tm = dta$time, id = dta$id,
                       y = dta$y, M = 100, VarFlag = FALSE)
# Update the boosting object with 100 additional iterations
boost.grow <- updateBoostMLR(Object = boost.grow, M_Add = 100, Verbose = TRUE)
# Update the boosting object with 50 additional iterations
boost.grow <- updateBoostMLR(Object = boost.grow, M_Add = 50, Verbose = TRUE)
Calculate standardized variable importance (VIMP) for each covariate or a joint VIMP of multiple covariates.
## S3 method for class 'BoostMLR'
vimp(Object, xvar.names = NULL, joint = FALSE, setting_seed = FALSE, seed_value = 100L)
Argument | Description |
---|---|
Object | A boosting object of class |
xvar.names | Names of the x-variables for which VIMP is requested. If NULL, VIMP is calculated for all covariates. |
joint | Whether to estimate VIMP for each covariate from |
setting_seed | Set |
seed_value | Seed value. |
Standardized variable importance (VIMP) is calculated for each covariate, or a joint VIMP is calculated for all covariates specified in xvar.names.
If joint = FALSE, a standardized VIMP is obtained for each covariate; otherwise, a joint VIMP is obtained for all the covariates.
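Variable importance of this kind is commonly computed by perturbing a covariate and measuring how much the prediction error grows. The R sketch below shows a generic permutation version on an lm() fit, standardized as the relative increase in test-set mean squared error, with joint = TRUE applying one row permutation to all listed covariates at once. This is an assumption made for illustration: the exact perturbation and standardization used by vimp.BoostMLR may differ, and perm_vimp() is a hypothetical helper. Its fixed seed plays a role loosely similar to the setting_seed and seed_value arguments.

```r
# Hypothetical helper: generic permutation VIMP, standardized as the
# relative increase in test-set MSE (not necessarily BoostMLR's formula).
perm_vimp <- function(fit, test, response, xvars, joint = FALSE, seed = 100L) {
  set.seed(seed)                                  # reproducible permutations
  mse  <- function(d) mean((d[[response]] - predict(fit, newdata = d))^2)
  base <- mse(test)                               # error with intact covariates
  score <- function(vars) {
    d   <- test
    idx <- sample(nrow(d))                        # one row permutation ...
    d[vars] <- test[idx, vars, drop = FALSE]      # ... applied to all of vars
    (mse(d) - base) / base                        # relative increase in error
  }
  if (joint) score(xvars) else sapply(setNames(xvars, xvars), score)
}

set.seed(3)
n   <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)    # x3 is pure noise
trn <- 1:500
fit <- lm(y ~ x1 + x2 + x3, data = dat[trn, ])
ind   <- perm_vimp(fit, dat[-trn, ], "y", c("x1", "x2", "x3"))
joint <- perm_vimp(fit, dat[-trn, ], "y", c("x1", "x2"), joint = TRUE)
round(ind, 2)
```

Here x1 should dominate, x2 should show a small positive importance, and the pure-noise covariate x3 should sit near zero.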
The result is a list whose length equals the number of responses. Each element is a matrix with one row per covariate (a single row in the case of joint VIMP) and one more column than the number of overlapping time intervals: the first column contains the covariate main effects, and the remaining columns contain the covariate-time interaction effects.
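The layout of the returned list can be mocked up with placeholder values. The R sketch below assumes three responses, four covariates, and four overlapping time intervals; the NA entries and the row and column labels are invented purely to show the shape and how main effects and interaction effects would be indexed, not what vimp.BoostMLR actually returns.

```r
# Hypothetical placeholder with the shape described above; values are NA
# and the row/column labels are invented for readability.
n_resp <- 3                          # number of responses (e.g. y1, y2, y3)
xvars  <- c("x1", "x2", "x3", "x4")  # covariates (one row each)
n_int  <- 4                          # assumed number of overlapping time intervals

vimp_shape <- lapply(seq_len(n_resp), function(k)
  matrix(NA_real_, nrow = length(xvars), ncol = n_int + 1,
         dimnames = list(xvars, c("main", paste0("int", seq_len(n_int))))))

# For the first response: main effects are column 1, covariate-time
# interaction effects are columns 2 to n_int + 1.
main_y1 <- vimp_shape[[1]][, 1]
int_y1  <- vimp_shape[[1]][, -1, drop = FALSE]
dim(vimp_shape[[1]])
```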
Amol Pande and Hemant Ishwaran
Pande A., Ishwaran H., Blackstone E.H. (2020). Boosting for multivariate longitudinal response.
Friedman J.H. (2001). Greedy function approximation: a gradient boosting machine, Annals of Statistics, 29(5): 1189–1232.
library(BoostMLR)
##-----------------------------------------------------------------
## Calculate individual and joint VIMP
##-----------------------------------------------------------------
# Simulate data with 3 responses and 4 covariates
dta <- simLong(n = 100, ntest = 100, N = 5, rho = .80, model = 1,
               q_x = 0, q_y = 0, type = "corCompSym")
dtaL <- dta$dtaL
trn <- dta$trn
# Boosting call: raw values of covariates, B-spline for time,
# no shrinkage, no estimate of rho and phi
boost.grow <- BoostMLR(x = dtaL$features[trn, ], tm = dtaL$time[trn],
                       id = dtaL$id[trn], y = dtaL$y[trn, ],
                       M = 100, VarFlag = FALSE)
boost.pred <- predictBoostMLR(Object = boost.grow, x = dtaL$features[-trn, ],
                              tm = dtaL$time[-trn], id = dtaL$id[-trn],
                              y = dtaL$y[-trn, ], importance = FALSE)
# Individual VIMP
Ind_vimp <- vimp.BoostMLR(boost.pred)
# Joint VIMP
Joint_vimp <- vimp.BoostMLR(boost.pred, joint = TRUE)