Congratulations to Our New PhD Recipients: Jinwoo Cho, Zi Wang, and Fan Yang

Jinwoo Cho

Dynamic Prediction for Survival Outcomes and Network Assisted Localized Functional Principal Component Analysis

In the first part of the thesis, we develop a “Jointly Estimated Landmarking (JEL)” approach for dynamic survival prediction using longitudinal covariates. JEL specifically models the effects of recent biomarker values (individual intercept) and change in biomarker values (individual slope) on conditional survival risk. The survival model is kept flexible through a transformation function G, which includes the Cox proportional hazards model and the proportional odds model as special cases. Time-varying models are also developed to assess the time-varying effects of longitudinal predictors. In the second part of the thesis, we first develop a network-assisted Localized Functional Principal Component Analysis (LFPCA) approach. Network-assisted LFPCA is then applied to an MEG dataset and an fMRI dataset to illustrate its capacity for dimension reduction and interpretable feature extraction. Finally, the principal components extracted from a longitudinal fMRI dataset are used for dynamic prediction of survival outcomes.

 

Zi Wang

Mediation Analysis of Semi-Competing Risks Data and Interim Analysis of SMART Survival Data

This dissertation comprises two distinct projects related to treatment effects evaluation for time-to-event data.

The first project focuses on causal mediation analysis. A treatment may have an effect on a nonterminal event (e.g., disease progression), which in turn may influence a terminal event (e.g., death), or the treatment may affect the terminal event directly. We are thus interested in evaluating the mediational effect of the treatment through the nonterminal event and the direct treatment effect on the terminal event. However, the conventional definitions of natural direct effect and natural indirect effect are not appropriate here because of the semi-competing risks data structure, where time to a nonterminal event may be censored by a terminal event, but not vice versa. A principal stratification approach is adopted to define the natural direct and indirect effects in the “always diseased” stratum. We propose nonparametric estimators of the direct and indirect effects under suitable assumptions. The theoretical properties of the proposed estimators are established, and their good finite-sample performance is illustrated through numerical studies. This work provides a flexible approach to estimating natural causal mediation effects and offers valuable insights into mediation mechanisms in semi-competing risks settings.

Sequential multiple assignment randomized trials (SMARTs) mimic the actual treatment processes experienced by physicians and patients in clinical settings and inform the comparative effectiveness of dynamic treatment regimes. In such trials, patients go through multiple stages of treatment, and the treatment assignment is adapted over time based on individual patient characteristics such as disease status and treatment history. In the second project, we develop and evaluate statistically valid interim monitoring approaches that allow sequential multiple assignment randomized trials with survival outcomes to be stopped early for efficacy. A weighted log-rank chi-square statistic is proposed to account for overlapping treatment paths and to quantify how the log-rank statistics at two different analysis points are correlated. Efficacy boundaries at multiple interim analyses can then be established using the Pocock, O'Brien-Fleming, and Lan-DeMets boundaries. We run extensive simulations to evaluate the operating characteristics (type I error and power) of our interim monitoring procedure based on the proposed statistic and on an existing statistic. The methods are demonstrated via an analysis of a neuroblastoma dataset.
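
For readers unfamiliar with the boundaries named in this abstract, the following is a minimal sketch of the standard Lan-DeMets alpha-spending functions of O'Brien-Fleming type and Pocock type. It only shows how much type I error is spent at each interim look; it is not the procedure from the dissertation. Turning the spent error into critical values requires the joint distribution of the test statistics across looks, which is what the proposed weighted log-rank statistic supplies and which is not reproduced here. The function names and the choice of four equally spaced looks are assumptions made for illustration.

```python
from math import e, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for quantiles and tail areas


def obf_type_spending(t, alpha=0.05):
    """Cumulative two-sided type I error spent by information fraction t
    under the O'Brien-Fleming-type Lan-DeMets spending function."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must lie in (0, 1]")
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - STD_NORMAL.cdf(z / sqrt(t)))


def pocock_type_spending(t, alpha=0.05):
    """Cumulative type I error spent by information fraction t
    under the Pocock-type Lan-DeMets spending function."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must lie in (0, 1]")
    return alpha * log(1.0 + (e - 1.0) * t)


def alpha_increments(fractions, spending, alpha=0.05):
    """Type I error newly spent at each look (hypothetical helper)."""
    cumulative = [spending(t, alpha) for t in fractions]
    previous = [0.0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]


if __name__ == "__main__":
    looks = [0.25, 0.50, 0.75, 1.00]  # assumed equally spaced looks
    for name, fn in [("OBF-type", obf_type_spending),
                     ("Pocock-type", pocock_type_spending)]:
        spent = alpha_increments(looks, fn)
        print(name, [round(a, 5) for a in spent])
```

Both functions spend the full alpha by the final look. The O'Brien-Fleming type spends very little early, so early stopping requires strong evidence, while the Pocock type spreads the error more evenly across looks.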

 

Fan Yang

Two Statistical Methods for High-Dimensional Data: Model-Free Inference in Protein Mutation Studies and Robust Distance Correlation for Variable Screening

Data with far more features than observations are frequently seen in modern statistical applications, ranging from genomic research and biomedical imaging to signal processing. In high-dimensional settings, statistical inference and variable selection are essential for extracting meaningful scientific insights. This thesis presents two methodological contributions aimed at addressing these challenges in distinct contexts. The first work focuses on protein contact prediction. We propose a novel framework that recasts the task as a statistical hypothesis testing problem within the context of partial correlation graphs for categorical variables. In the second work, a robust version of the distance correlation measure is presented, designed for variable screening in ultrahigh-dimensional data. This method addresses both model misspecification and tail robustness, and enjoys the so-called sure screening property. To further enhance its performance, we develop a new false discovery rate (FDR) control procedure based on the Reflection via Data Splitting (REDS) approach.
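
As background for the second work, the sketch below shows variable screening with the classical sample distance correlation for one-dimensional features: each feature is scored by its distance correlation with the response, and the top-ranked features are kept. This is the standard, non-robust measure, shown only to illustrate the screening idea. It is not the robust version developed in the thesis, and it includes no FDR control or REDS step. All function names are assumptions made for illustration.

```python
import math


def centered_distances(v):
    """Double-centered matrix of pairwise absolute differences for a 1-D sample."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]  # equals the column means, since d is symmetric
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def mean_product(a, b):
    """Average of the elementwise products of two n-by-n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Classical sample distance correlation between two 1-D samples."""
    a, b = centered_distances(x), centered_distances(y)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:  # one sample is constant
        return 0.0
    return math.sqrt(max(mean_product(a, b), 0.0) / denom)


def screen(features, response, keep):
    """Indices of the `keep` features with the largest distance correlation.

    `features` is a list of columns, each a list of the same length as `response`.
    """
    scores = [distance_correlation(col, response) for col in features]
    order = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)
    return order[:keep]
```

Because distance correlation is zero in the population only under independence, this kind of screening can pick up nonlinear dependence that a Pearson-correlation ranking would miss, without assuming a regression model.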