D-learning to estimate optimal individual treatment rules.

*(English)* Zbl 1454.62381

Summary: Recent exploration of the optimal individual treatment rule (ITR) for patients has attracted considerable attention owing to the potentially heterogeneous responses of patients to different treatments. An optimal ITR is a decision function that maps a patient's characteristics to the treatment that maximizes the expected clinical outcome. The current literature mainly focuses on two types of methods: model-based and classification-based. Model-based methods rely on estimating the conditional mean of the outcome instead of directly targeting the decision boundaries of the optimal ITR. As a result, they may yield suboptimal decisions. In contrast, although classification-based methods directly target the optimal ITR by converting the problem into weighted classification, they rely on using correct weights for all subjects, which may cause model misspecification. To overcome the potential drawbacks of these methods, we propose a simple and flexible one-step method to directly learn (D-learning) the optimal ITR without model and weight specifications. Multi-category D-learning is also proposed for the case with multiple treatments. A new effect measure is proposed to quantify the relative strength of a treatment for a patient. We show estimation consistency and establish tight finite-sample error bounds for the proposed D-learning. Numerical studies, including simulated and real data examples, are used to demonstrate the competitive performance of D-learning.
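The summary does not state the estimator itself, so the following is only a rough illustration of what learning a rule "directly" can look like, under assumptions that are not taken from the record: two treatments coded A in {-1, 1}, a known assignment probability π(A | X), a linear decision function, and simulated data. It uses the identity E[YA / π(A | X) | X] = E[Y | X, A = 1] - E[Y | X, A = -1], which implies that a least-squares fit of 2YA on X with weights 1/π targets the treatment contrast without modelling the main effect of X. The function names `fit_direct_rule` and `predict_rule` are hypothetical, and the sketch is not the authors' implementation.

```python
import numpy as np


def fit_direct_rule(X, A, Y, propensity):
    """Weighted least squares of 2*Y*A on [1, X] with weights 1/propensity.

    A must be coded as -1 / +1; propensity[i] is the probability of the
    treatment actually received by subject i.
    """
    design = np.column_stack([np.ones(len(X)), X])
    # Weighted least squares via row scaling by sqrt(weight).
    w = np.sqrt(1.0 / propensity)
    beta, *_ = np.linalg.lstsq(design * w[:, None], 2.0 * Y * A * w, rcond=None)
    return beta


def predict_rule(beta, X):
    """Recommend treatment +1 where the fitted contrast is positive, else -1."""
    design = np.column_stack([np.ones(len(X)), X])
    return np.where(design @ beta > 0, 1, -1)


# Simulated randomized trial (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
A = rng.choice([-1, 1], size=n)          # treatment assigned with probability 1/2
noise = 0.5 * rng.normal(size=n)
# Main effect 1 + x1; treatment interaction A * (x1 - x2).
Y = 1.0 + X[:, 0] + A * (X[:, 0] - X[:, 1]) + noise
propensity = np.full(n, 0.5)

beta = fit_direct_rule(X, A, Y, propensity)
rule = predict_rule(beta, X)

# In this simulation the best treatment is +1 exactly when x1 > x2.
optimal = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)
agreement = np.mean(rule == optimal)
print("fitted coefficients:", beta)
print("agreement with the optimal rule:", agreement)
```

In this simulation the treatment contrast is 2(x1 - x2), so the fitted coefficients should land near (0, 2, -2) up to sampling noise, even though the main effect 1 + x1 is never modelled.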

##### MSC:

| Code | Description |
|---|---|
| 62P10 | Applications of statistics to biology and medical sciences; meta analysis |
| 62C05 | General considerations in statistical decision theory |
| 62G05 | Nonparametric estimation |
| 62J07 | Ridge regression; shrinkage estimators (Lasso) |

Z. Qi and Y. Liu, Electron. J. Stat. 12, No. 2, 3601–3638 (2018; Zbl 1454.62381)
