---
title: "Imputation Method vimpute"
author: "Eileen Vattheuer"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Imputation Method vimpute}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.align = "center"
)
```

## Introduction

This vignette demonstrates how to use the `vimpute()` function for flexible missing data imputation using machine learning models from the mlr3 ecosystem.

### Function Arguments

- `data`: A `data.table` or `data.frame` containing missing values to be imputed.
- `considered_variables`: A character vector of variable names to be imputed or used as predictors; all other columns are excluded from the imputation process.
- `method`: A named list specifying the imputation method for each variable. If not set explicitly, `vimpute()` uses `"ranger"` for all considered variables.
- `pmm`: `TRUE`/`FALSE` indicating whether predictive mean matching (PMM) is used.
- `pmm_k`: Number of nearest neighbors used for PMM.
- `pmm_k_method`: Strategy for selecting PMM donor values from the `k` nearest candidates.
- `learner_params`: Optional learner-specific parameter lists for each variable (e.g. `ranger` settings such as `median = TRUE`).
- `formula`: Custom model formulas, needed if not all variables are used as predictors or if transformations or interactions are required (interactions apply to predictors only; for the response, only transformations are possible). Only applicable for the methods `"robust"`, `"regularized"`, `"gam"` and `"robgam"`. Provide as a list with one formula for each variable that requires a specific model.
- `makeNA`: Optional named list of values that should be treated as imputable missing values for selected variables.
- `donorcond`: Optional named list of donor conditions used to restrict the observed training values for selected variables.
- `sequential`: Specifies whether the imputation should be performed sequentially.
- `nseq`: The number of sequential iterations, if `sequential` is TRUE.
- `eps`: The convergence threshold for sequential imputation.
- `imp_var`: Specifies whether to add indicator variables for imputed values.
- `pred_history`: If enabled, saves the prediction history.
- `tune`: Whether to perform hyperparameter tuning.
- `verbose`: If enabled, prints additional debugging output.
- `boot`: Whether to refit models on bootstrap samples to add model uncertainty.
- `robustboot`: Bootstrap sampling strategy used when `boot = TRUE`.
- `uncert`: Prediction uncertainty method (`"none"`, `"normalerror"`, `"resid"`, `"pmm"`, `"midastouch"`).
- `m`: Number of multiple imputations. If `m > 1`, a `vimmi` object is returned.

## Data

To demonstrate the function, the `sleep` dataset from the `VIM` package is used.

```{r, echo = FALSE, results='hide', message=FALSE, warning=FALSE}
library(VIM)
library(data.table)
```

```{r setup_2, message = FALSE}
data <- as.data.table(VIM::sleep)
a <- aggr(sleep, plot = FALSE)
plot(a, numbers = TRUE, prop = FALSE)
```

The left plot shows the number of missing values for each column in the `sleep` dataset, and the right plot shows how often each combination of missing values occurs. For example, there are 9 rows which contain a missing value in both `NonD` and `Dream`.

```{r, message = FALSE}
dataDS <- sleep[, c("Dream", "Sleep")]
marginplot(dataDS, main = "Missing Values")
```

The __red__ boxplot on the left shows the distribution of all values of `Sleep` where `Dream` contains a missing value. The __blue__ boxplot on the left shows the distribution of the values of `Sleep` where `Dream` is observed.

## Basic Usage

### Default Imputation

In the basic usage, the `vimpute()` function uses the default settings: all variables are imputed with the `"ranger"` method, sequential imputation is enabled (`nseq = 10`, `eps = 0.005`), PMM is off, formulas are off, no tuning is performed, and imputation indicators are added.

```{r, include=TRUE, results='hide', message=FALSE, warning=FALSE}
result <- vimpute(
  data = data,
  pred_history = TRUE)
```

```{r}
print(head(result$data, 3))
```

Results and information about missing/imputed values can be shown in the plot margins:

```{r}
dataDS <- as.data.frame(result$data[, c("Dream", "Sleep", "Dream_imp", "Sleep_imp")])
marginplot(dataDS, delimiter = "_imp", main = "Imputation with Default Model")
```

The output of this call contains the imputed dataset and the prediction history. In this plot, three different colors are used in the top-right. These colors represent the structure of the missing values.

* __brown__ points represent values where `Dream` was missing initially
* __beige__ points represent values where `Sleep` was missing initially
* __black__ points represent values where both `Dream` and `Sleep` were missing initially

## Advanced Options

#### Parameter `method` *(default: "ranger" for all variables)*

Specifies the method used for imputation of each variable. If `method` is not provided, `vimpute()` uses `"ranger"` for all variables.

In this example, different imputation methods are specified for each variable. The `NonD` variable uses a robust method, `Dream` and `Span` use ranger, `Sleep` uses xgboost, and `Gest` uses a regularized method.

- **`"robust"`**: Robust Regression Models
  - `lmrob` for numeric variables: Implements MM-estimation for resistance to outliers
  - `glmrob` for factors: Robust GLM; binary via robust logit, multiclass via one-vs-rest
- **`"regularized"`**: Regularized Regression (`glmnet`)
  - Uses elastic net regularization
  - Automatically handles multicollinearity
- **`"ranger"`**: Random Forest
  - Fast implementation of random forests
  - Handles non-linear relationships well
- **`"xgboost"`**: Gradient Boosted Trees
  - State-of-the-art tree boosting
  - Handles mixed data types well
- **`"gam"`**: Generalized Additive Models (`mgcv`)
  - Supports smooth terms for numeric predictors
  - Can be used for numeric and factor targets
- **`"robgam"`**: Robust GAM
  - Adds robust weighting / refitting to GAM-style models
  - Designed for settings with potential outliers

You can provide `method` globally as a single method name (applies to all variables), or as a named list per variable.

```{r, include=TRUE, results='hide', message=FALSE, warning=FALSE}
result_mixed <- vimpute(
  data = data,
  method = list(NonD = "robust", Dream = "ranger", Sleep = "xgboost", Span = "ranger", Gest = "regularized"),
  pred_history = TRUE
)
```

```{r}
dataDS <- as.data.frame(result_mixed$data[, c("Dream", "Sleep", "Dream_imp", "Sleep_imp")])
marginplot(dataDS, delimiter = "_imp", main = "Imputation with different Models for each Variable")
```

```{r, include=TRUE, results='hide', message=FALSE, warning=FALSE, echo=FALSE}
result_xgboost <- vimpute(
  data = data,
  method = setNames(as.list(rep("xgboost", ncol(data))), names(data)),
  pred_history = TRUE,
  verbose = FALSE
)
dataDS_xgboost <- as.data.frame(result_xgboost$data[, c("Dream", "Sleep", "Dream_imp", "Sleep_imp")])

result_regularized <- vimpute(
  data = data,
  method = setNames(as.list(rep("regularized", ncol(data))), names(data)),
  pred_history = TRUE
)
dataDS_regularized <- as.data.frame(result_regularized$data[, c("Dream", "Sleep", "Dream_imp", "Sleep_imp")])
```

The side-by-side margin plots compare two imputation methods: xgboost (left) and regularized (right):

```{r, echo=F, warning=F}
par(mfrow = c(1, 2))
marginplot(dataDS_xgboost, delimiter = "_imp", main = "Imputation with xgboost")
marginplot(dataDS_regularized, delimiter = "_imp", main = "Imputation with Regularized")
par(mfrow = c(1, 1))
```

xgboost produces data-driven, less regular imputations that can capture complex patterns but may be less stable, while the regularized method produces smoother, more conservative estimates that are less prone to overfitting. The key difference lies in flexibility (xgboost) versus stability (regularization).

#### Parameter `pmm` *(default: FALSE)*

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  method = list(NonD = "robust", Dream = "ranger", Sleep = "xgboost", Span = "ranger", Gest = "regularized"),
  pmm = list(NonD = FALSE, Dream = TRUE, Sleep = FALSE, Span = FALSE, Gest = TRUE)
)
```

PMM is applied only to numeric target variables. If `pmm = TRUE`, `vimpute()` first computes the model prediction for a missing entry, then compares it to the observed values of that target and selects donor candidates by smallest absolute distance to the prediction. If `pmm_k = 1` (or `NULL`, which defaults to 1 when PMM is active), the closest observed value is used directly. If `pmm_k > 1`, the final imputed value is derived from the `k` nearest donors using `pmm_k_method` (e.g. `"mean"`, `"median"`, `"random"` or a custom function). If `pmm = FALSE`, raw model predictions are used.

In sequential imputation, the convergence criterion is computed from the raw model prediction when PMM is active. This avoids unstable stopping behavior caused by stochastic or donor-based PMM values while still returning the PMM-imputed values in the final data.

You can provide `pmm` globally as a single logical value (applies to all numeric variables), or as a named list per variable.

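
The donor selection described above can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions, not the code used inside `vimpute()`; the helper `pmm_donor()` and its arguments are hypothetical names introduced for this example.

```r
# Hypothetical helper illustrating PMM donor selection (not part of VIM).
# pred:     model prediction for one missing entry
# observed: observed values of the target variable
# k:        number of nearest donor candidates
# agg:      function that reduces the k donor values to a single value
pmm_donor <- function(pred, observed, k = 1, agg = mean) {
  observed <- observed[!is.na(observed)]
  # rank observed values by absolute distance to the prediction
  nearest <- order(abs(observed - pred))[seq_len(min(k, length(observed)))]
  donors <- observed[nearest]
  # k = 1: use the closest observed value directly
  if (k == 1) donors[1] else agg(donors)
}

observed <- c(1.0, 2.0, 2.5, 4.0, 6.0)
pmm_donor(pred = 2.4, observed, k = 1)                # closest observed value
pmm_donor(pred = 2.4, observed, k = 3, agg = median)  # median of the 3 nearest donors
```

With `k = 1` the imputed value is always a value that actually occurs in the data, whereas aggregating several donors can produce values that were never observed.
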
#### Parameter `pmm_k` *(default: NULL)*

`pmm_k` defines how many nearest donor candidates are considered when PMM is enabled.

You can provide `pmm_k` globally as a single integer value (applies to all variables where PMM is active), or as a named list per variable.

#### Parameter `pmm_k_method` *(default: "mean")*

`pmm_k_method` controls how the final donor value is derived when `pmm_k > 1`. Possible values are:

- `"mean"` (default): uses the mean of the `k` nearest observed donor candidates.
- `"median"`: uses the median of the `k` nearest donor candidates (more robust to outliers).
- `"random"`: randomly draws one value from the `k` nearest donor candidates.
- a custom function: a function that receives the `k` nearest donor values and returns one numeric value.

If a variable-specific list contains `NULL`, `vimpute()` falls back to `"mean"` for that variable.

You can provide `pmm_k_method` globally as a single value/function (applies to all numeric variables where `pmm = TRUE` and `pmm_k > 1`), or as a named list per variable.

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  pmm = list(Dream = TRUE, Sleep = TRUE, NonD = FALSE),
  pmm_k = list(Dream = 5, Sleep = 3),
  pmm_k_method = list(
    Dream = "median",
    Sleep = "random"
  )
)
```

Custom aggregation functions are also possible. The function receives the nearest donor values and must return exactly one non-missing numeric value.

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  pmm = list(Dream = TRUE),
  pmm_k = list(Dream = 5),
  pmm_k_method = list(Dream = function(x) mean(x, trim = 0.2))
)
```

#### Parameter `learner_params` *(default: NULL)*

Use `learner_params` to pass method-specific settings to the underlying learners. This is useful if different variables are imputed with different methods and each method should receive its own parameter configuration.

You can provide `learner_params` globally (when one method is used for all variables), as a method-level list, or as a named list per variable.

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  method = list(Dream = "ranger", Sleep = "xgboost"),
  learner_params = list(
    Dream = list(num.trees = 700, min.node.size = 4),
    Sleep = list(nrounds = 250, max_depth = 5, eta = 0.05)
  )
)
```

#### Parameter `formula` *(default: FALSE)*

Specifies custom model formulas for imputation of each variable, offering precise control over the imputation models.

**Key Features:**

1. **Variable-Specific Models**
   - Each formula specifies which predictors should be used for imputing a particular variable
   - Enables different predictor sets for different target variables
   - Example:
   ```r
   formula = list(
     income ~ education + age,
     blood_pressure ~ weight + age
   )
   ```

2. **Transformations Support**
   - Handles common transformations on both sides of the formula:
     - Response transformations: `log(y)`, `sqrt(y)`, `exp(y)`, `I(1/y)`
     - Predictor transformations: `log(x1)`, `poly(x2, 2)`, etc.
   - Example with transformations:
   ```r
   formula = list(
     log(income) ~ poly(age, 2) + education,
     sqrt(blood_pressure) ~ weight + I(1/age)
   )
   ```

3. **Interaction Terms**
   - Supports interaction terms using `:` or `*` syntax (on the right side)
   - Example:
   ```r
   formula = list(
     price ~ sqft * neighborhood + year_built
   )
   ```

**Example Demonstration:**

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  method = setNames(as.list(rep("regularized", ncol(data))), names(data)),
  formula = list(
    NonD ~ Dream + Sleep,            # Linear combination
    Span ~ Dream:Sleep + Gest,       # With interaction term
    log(Gest) ~ Sleep + exp(Span)    # With transformations
  )
)
```

**Interpreting the Example:**

1. For `NonD`:
   - Uses a linear combination of the `Dream` and `Sleep` variables
   - Model: `NonD = β₀ + β₁*Dream + β₂*Sleep + ε`

2. For `Span`:
   - Includes the interaction between `Dream` and `Sleep`
   - Plus the main effect of `Gest`
   - Model: `Span = β₀ + β₁*Dream*Sleep + β₂*Gest + ε`

3. For `Gest`:
   - Uses a log-transformed response
   - Predictors include `Sleep` and the exponential of `Span`
   - Model: `log(Gest) = β₀ + β₁*Sleep + β₂*exp(Span) + ε`

4. For `Sleep` and `Dream`, all other variables are used as predictors

**Notes:**

- Only works with methods `"robust"`, `"regularized"`, `"gam"` and `"robgam"`
- All `model.matrix`-compatible functions work for predictors (for more information see `?model.matrix`)
- Response transformations (left side) are automatically back-transformed

```{r, eval = FALSE}
result_gam <- vimpute(
  data = data,
  method = list(Gest = "gam"),
  formula = list(Gest = log(Gest) ~ Sleep + Dream + Span),
  sequential = FALSE
)
```

#### Parameter `makeNA` *(default: NULL)*

`makeNA` defines values that should be treated as missing for selected variables. This is useful when special codes such as `-999`, `"unknown"` or `"not measured"` should be imputed, while regular `NA` values in the same variable should remain untouched.

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  method = list(Dream = "ranger"),
  makeNA = list(Dream = -999)
)
```

If a variable is listed in `makeNA`, only the matching values are imputed for that variable. Variables not listed in `makeNA` continue to use regular `NA` values as the imputation target.

#### Parameter `donorcond` *(default: NULL)*

`donorcond` restricts which observed values are allowed to act as donors / training observations for a target variable. Conditions are supplied as character strings and evaluated on the target values via the temporary variable `x`.

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  method = list(Dream = "ranger"),
  donorcond = list(Dream = "> quantile(x, 0.1, na.rm = TRUE)")
)
```

This can be useful when implausible observed values should not be used for model fitting, while the target variable itself remains part of the imputation workflow.

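
To make the effect of a donor condition more concrete, the following base R sketch shows one way such a condition string could be turned into a logical donor filter. It assumes that the string is simply completed with the temporary variable `x` on its left-hand side; the actual evaluation inside `vimpute()` may differ, and the values in `x` are made up for illustration.

```r
# Illustration only: how a condition string could select donors.
# Assumption: the condition is prefixed with `x` and evaluated on the target.
x <- c(0.1, 1.2, NA, 2.4, 3.1, 0.0, 5.6, 1.9, 2.2, 4.0)  # made-up target values
cond <- "> quantile(x, 0.1, na.rm = TRUE)"

# Build the expression `x > quantile(x, 0.1, na.rm = TRUE)` and evaluate it
is_donor <- eval(parse(text = paste("x", cond)), list(x = x))
is_donor[is.na(is_donor)] <- FALSE  # missing targets can never act as donors

x[is_donor]  # observed values that remain available for model fitting
```

In this sketch, only the observed values above the 10% quantile are kept as donors; the rows that fail the condition are excluded from model fitting but are not removed from the data.
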
#### Parameters `boot`, `robustboot` and `uncert` *(defaults: `boot = FALSE`, `robustboot = "stratified"`, `uncert = "none"`)*

These arguments control additional imputation uncertainty.

- `boot = TRUE` refits the imputation model on a bootstrap sample.
- `robustboot` controls how bootstrap rows are sampled (`"standard"`, `"stratified"` or `"residual"`).
- `uncert` adds uncertainty to numeric predictions:
  - `"normalerror"` adds normal noise using the model scale estimate.
  - `"resid"` adds sampled residuals.
  - `"pmm"` uses score-based predictive mean matching.
  - `"midastouch"` uses PMM with covariate-distance weighting.

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  method = list(Dream = "ranger"),
  boot = TRUE,
  robustboot = "standard",
  uncert = "normalerror"
)
```

If explicit `pmm = TRUE` is used for a variable, it takes precedence over `uncert`.

#### Parameter `m` *(default: 1)*

`m` controls multiple imputation. If `m > 1`, `vimpute()` returns a `vimmi` object instead of a single completed dataset.

```{r, eval = FALSE}
mi <- vimpute(
  data = data,
  method = list(Dream = "ranger"),
  m = 5,
  boot = TRUE,
  uncert = "resid",
  imp_var = FALSE
)

completed_1 <- complete(mi, 1)
completed_all <- complete(mi, "all")
completed_long <- complete(mi, "long")
```

The `vimmi` object stores the original data and the imputed values efficiently. It can be inspected with `print()` and `summary()`, and completed datasets can be extracted with `complete()`.

If the `mice` package is installed, `as.mids.vimmi()` can be used to convert a `vimmi` object to a `mice::mids` object for downstream pooling workflows.

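
For readers unfamiliar with pooling, the idea behind such a downstream workflow can be sketched with Rubin's rules in base R. The helper `pool_rubin()` is a hypothetical name, and the estimates and standard errors are made-up numbers standing in for results obtained from the `m` completed datasets.

```r
# Rubin's rules for pooling one scalar estimate across m completed datasets.
# est: vector of m point estimates; se: vector of their standard errors.
# `pool_rubin()` is a hypothetical helper, not part of VIM or mice.
pool_rubin <- function(est, se) {
  m <- length(est)
  q_bar <- mean(est)                 # pooled point estimate
  u_bar <- mean(se^2)                # within-imputation variance
  b <- var(est)                      # between-imputation variance
  total <- u_bar + (1 + 1 / m) * b   # total variance
  c(estimate = q_bar, se = sqrt(total))
}

# Made-up results from m = 5 analyses, for illustration only
est <- c(1.90, 2.10, 2.00, 1.95, 2.05)
se  <- c(0.30, 0.28, 0.31, 0.29, 0.30)
pool_rubin(est, se)
```

In practice, each estimate would come from fitting the same analysis model to one completed dataset. The pooled standard error is larger than the average within-imputation standard error because it also reflects the variation between imputations.
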
#### Parameter `tune` *(default: FALSE)*

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  tune = TRUE
)
```

Whether to perform hyperparameter tuning (only possible if `sequential = TRUE`):

- When TRUE:
  - Conducts randomized parameter search (after half of the iterations)
  - Uses best performing configuration
- When FALSE:
  - Uses default model parameters
- Recommended: TRUE for optimal performance

You can provide `tune` either as a single global TRUE/FALSE value (applies to all variables), or as a named list per variable.

When tuning is enabled, `vimpute()` returns a list containing the imputed data and a `tuning_log` component. If `pred_history = TRUE` is also enabled, the list contains both `pred_history` and `tuning_log`.

#### Parameters `nseq` and `eps` *(default: 10 and default: 0.005)*

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  nseq = 20,
  eps = 0.01
)
```

`nseq` describes the number of sequential imputation iterations. Higher values:

- Allow more refinement
- Increase computation time

`eps` describes the convergence threshold for sequential imputation:

- Stops early if changes between iterations < eps
- Smaller values: Require more precise convergence but may need more iterations

#### Parameter `imp_var` *(default: TRUE)*

```{r, eval = FALSE}
result <- vimpute(
  data = data,
  imp_var = TRUE
)
```

Creating indicator variables for imputed values adds "_imp" columns (TRUE/FALSE) to mark which data points were imputed. This is particularly useful for tracking imputation effects and conducting diagnostic analyses.

#### Parameter `pred_history` *(default: FALSE)*

```{r}
print(tail(result$pred_history, 9))
```

When enabled (TRUE), this option saves prediction trajectories in `$pred_history`, allowing users to track how imputed values evolve across iterations. This feature is particularly useful for diagnosing convergence issues.

With `pred_history = TRUE`, `vimpute()` returns a list (not only the imputed data table). Therefore, results must be accessed via list elements, e.g. `result$data` for the imputed dataset and `result$pred_history` for the history.

```{r, eval = FALSE}
result <- vimpute(data = data, pred_history = TRUE)

# Access imputed data
head(result$data)

# Access prediction history
tail(result$pred_history, 9)
```

## Performance

To validate the performance of `vimpute()`, the `iris` dataset is used. First, some values are randomly set to `NA`.

```{r}
library(reactable)

data(iris)
df <- as.data.table(iris)
colnames(df) <- c("S.Length","S.Width","P.Length","P.Width","Species")
# randomly produce some missing values in the data
set.seed(1)
nbr_missing <- 50
y <- data.frame(row = sample(nrow(iris), size = nbr_missing, replace = TRUE),
                col = sample(ncol(iris) - 1, size = nbr_missing, replace = TRUE))
y <- y[!duplicated(y), ]
df[as.matrix(y)] <- NA

aggr(df)
```

```{r}
sapply(df, function(x) sum(is.na(x)))
```

The data contains missing values across the four numeric variables, with some observations missing multiple values. The subsequent step involves variable imputation, and the following tables present the rounded first five imputation results for each variable.

For the default model:

```{r, results='hide', message=FALSE, warning=FALSE,include=FALSE}
library(reactable)

data(iris)
df <- as.data.table(iris)
colnames(df) <- c("S.Length","S.Width","P.Length","P.Width","Species")

# Create complete copy before introducing NAs
complete_data <- copy(df)

# Randomly produce missing values
set.seed(1)
nbr_missing <- 50
y <- data.frame(row = sample(nrow(df), size = nbr_missing, replace = TRUE),
                col = sample(ncol(df), size = nbr_missing, replace = TRUE))
y <- y[!duplicated(y),]
df[as.matrix(y)] <- NA

# Perform imputation
result <- vimpute(data = df, pred_history = TRUE)

# Names of the imputation indicator columns in result$data
imputed_columns <- grep("_imp$", names(result$data), value = TRUE)

# Create a function to compare true and imputed values
compare_values <- function(true_data, pred_data, imputed_data, col_name) {
  comparison <- data.frame(
    True_Value = true_data[[col_name]],
    Imputed_Value = ifelse(imputed_data, pred_data[[col_name]], NA)
  )
  # Keep only the rows that were actually imputed
  comparison <- comparison[!is.na(comparison$Imputed_Value), ]
  return(comparison)
}

# Initialize an empty list to store the comparison tables
comparison_list <- list()

# Loop through each imputed column and create a comparison table
for (imputed_col in imputed_columns) {
  col_name <- sub("_imp$", "", imputed_col)
  comparison_list[[col_name]] <- compare_values(complete_data, result$data, result$data[[imputed_col]], col_name)
}

# Prepare the results in a combined wide format, ensuring equal row numbers
results <- cbind(
  "TRUE1" = head(comparison_list[["S.Length"]][, "True_Value"], 5),
  "IMPUTED1" = head(comparison_list[["S.Length"]][, "Imputed_Value"], 5),
  "TRUE2" = head(comparison_list[["S.Width"]][, "True_Value"], 5),
  "IMPUTED2" = head(comparison_list[["S.Width"]][, "Imputed_Value"], 5),
  "TRUE3" = head(comparison_list[["P.Length"]][, "True_Value"], 5),
  "IMPUTED3" = head(comparison_list[["P.Length"]][, "Imputed_Value"], 5),
  "TRUE4" = head(comparison_list[["P.Width"]][, "True_Value"], 5),
  "IMPUTED4" = head(comparison_list[["P.Width"]][, "Imputed_Value"], 5)
)

# Print the combined wide format table
print(results)
```

```{r echo=F,warning=F}
# Load the reactable library
library(reactable)

# Create the reactable
reactable(results,
  columns = list(
    TRUE1 = colDef(name = "True"),
    IMPUTED1 = colDef(name = "Imputed"),
    TRUE2 = colDef(name = "True"),
    IMPUTED2 = colDef(name = "Imputed"),
    TRUE3 = colDef(name = "True"),
    IMPUTED3 = colDef(name = "Imputed"),
    TRUE4 = colDef(name = "True"),
    IMPUTED4 = colDef(name = "Imputed")
  ),
  columnGroups = list(
    colGroup(name = "S.Length", columns = c("TRUE1", "IMPUTED1")),
    colGroup(name = "S.Width", columns = c("TRUE2", "IMPUTED2")),
    colGroup(name = "P.Length", columns = c("TRUE3", "IMPUTED3")),
    colGroup(name = "P.Width", columns = c("TRUE4", "IMPUTED4"))
  ),
  striped = TRUE,
  highlight = TRUE,
  bordered = TRUE
)
```

For the xgboost model:

```{r, results='hide', message=FALSE, warning=FALSE,include=FALSE}
library(reactable)
library(VIM)

data(iris)

# Create complete copy before introducing NAs
complete_data <- iris
colnames(complete_data) <- c("S.Length","S.Width","P.Length","P.Width","Species")
df <- copy(complete_data)

# Randomly produce missing values
set.seed(1)
nbr_missing <- 50
y <- data.frame(row = sample(nrow(df), size = nbr_missing, replace = TRUE),
                col = sample(ncol(df), size = nbr_missing, replace = TRUE))
y <- y[!duplicated(y),]
df[as.matrix(y)] <- NA

# Perform imputation with proper method specification
result <- vimpute(
  data = df,
  method = setNames(lapply(names(df), function(x) "xgboost"), names(df)),
  pred_history = TRUE
)

# Names of the imputation indicator columns in result$data
imputed_columns <- grep("_imp$", names(result$data), value = TRUE)

# Create a function to compare true and imputed values
compare_values <- function(true_data, pred_data, imputed_data, col_name) {
  comparison <- data.frame(
    True_Value = true_data[[col_name]],
    Imputed_Value = ifelse(imputed_data, pred_data[[col_name]], NA)
  )
  # Keep only the rows that were actually imputed
  comparison <- comparison[!is.na(comparison$Imputed_Value), ]
  return(comparison)
}

# Initialize an empty list to store the comparison tables
comparison_list <- list()

# Loop through each imputed column and create a comparison table
for (imputed_col in imputed_columns) {
  col_name <- sub("_imp$", "", imputed_col)
  comparison_list[[col_name]] <- compare_values(complete_data, result$data, result$data[[imputed_col]], col_name)
}

# Prepare the results in a combined wide format, ensuring equal row numbers
results <- cbind(
  "TRUE1" = head(comparison_list[["S.Length"]][, "True_Value"], 5),
  "IMPUTED1" = head(comparison_list[["S.Length"]][, "Imputed_Value"], 5),
  "TRUE2" = head(comparison_list[["S.Width"]][, "True_Value"], 5),
  "IMPUTED2" = head(comparison_list[["S.Width"]][, "Imputed_Value"], 5),
  "TRUE3" = head(comparison_list[["P.Length"]][, "True_Value"], 5),
  "IMPUTED3" = head(comparison_list[["P.Length"]][, "Imputed_Value"], 5),
  "TRUE4" = head(comparison_list[["P.Width"]][, "True_Value"], 5),
  "IMPUTED4" = head(comparison_list[["P.Width"]][, "Imputed_Value"], 5)
)

# Print the combined wide format table
print(results)
```

```{r echo=F,warning=F}
# Load the reactable library
library(reactable)

# Create the reactable
reactable(results,
  columns = list(
    TRUE1 = colDef(name = "True"),
    IMPUTED1 = colDef(name = "Imputed"),
    TRUE2 = colDef(name = "True"),
    IMPUTED2 = colDef(name = "Imputed"),
    TRUE3 = colDef(name = "True"),
    IMPUTED3 = colDef(name = "Imputed"),
    TRUE4 = colDef(name = "True"),
    IMPUTED4 = colDef(name = "Imputed")
  ),
  columnGroups = list(
    colGroup(name = "S.Length", columns = c("TRUE1", "IMPUTED1")),
    colGroup(name = "S.Width", columns = c("TRUE2", "IMPUTED2")),
    colGroup(name = "P.Length", columns = c("TRUE3", "IMPUTED3")),
    colGroup(name = "P.Width", columns = c("TRUE4", "IMPUTED4"))
  ),
  striped = TRUE,
  highlight = TRUE,
  bordered = TRUE
)
```
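
The tables show only the first five imputed values per variable. To condense such a comparison into one number per variable, an error measure such as the root mean squared error (RMSE) or the mean absolute error (MAE) could be computed from the true and imputed values. The following base R sketch uses a hypothetical helper `imputation_error()` and made-up numbers in place of a real comparison table.

```r
# Hypothetical helper: summarise the imputation error for one variable.
imputation_error <- function(true, imputed) {
  stopifnot(length(true) == length(imputed))
  err <- imputed - true
  c(rmse = sqrt(mean(err^2)), mae = mean(abs(err)))
}

# Made-up values standing in for one comparison table
# (columns named as in the comparison tables built in this section)
comparison <- data.frame(
  True_Value    = c(5.1, 4.9, 6.3, 5.8, 7.1),
  Imputed_Value = c(5.0, 5.2, 6.0, 5.9, 6.8)
)

imputation_error(comparison$True_Value, comparison$Imputed_Value)
```

Applied to each numeric variable, such a summary would allow the default and the xgboost results to be compared on the same scale.
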