Department of Statistics and Data Science Seminar Series, No. 423

 

Time: Wednesday, April 10, 2024, 15:00-16:00

Venue: Room 403, Starr Building (史带楼)

Host: Dr. Dai Guorong (戴国榕), Department of Statistics and Data Science, School of Management, Fudan University

Speaker: Zhang Yuqian (张宇谦), Assistant Professor, Renmin University of China

Title: Enhancing efficiency and robustness in high-dimensional linear regression with additional unlabeled data

Abstract: In semi-supervised learning, the prevailing understanding suggests that observing additional unlabeled samples improves estimation accuracy for linear parameters only in the case of model misspecification. This paper challenges this notion, demonstrating its inaccuracy in high dimensions. Initially focusing on a dense scenario, we introduce robust semi-supervised estimators for the regression coefficient without relying on sparse structures in the population slope. Even when the true underlying model is linear, we show that leveraging information from large-scale unlabeled data improves both estimation accuracy and inference robustness. Moreover, we propose semi-supervised methods with further enhanced efficiency in scenarios with a sparse linear slope. Diverging from the standard semi-supervised literature, we also allow for covariate shift. The performance of the proposed methods is illustrated through extensive numerical studies, including simulations and a real-data application to the AIDS Clinical Trials Group Protocol 175 (ACTG175).

Speaker bio:

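Zhang Yuqian is an assistant professor and doctoral supervisor at the Institute of Statistics and Big Data, Renmin University of China. Zhang received a bachelor's degree from Wuhan University in 2016 and a Ph.D. from the University of California, San Diego, in 2022. Research interests include causal inference, semi-supervised inference, high-dimensional statistics, machine learning theory, and missing data. Zhang's papers have been published or accepted in journals including Biometrika, Annals of Statistics, and Information and Inference. Zhang is the principal investigator of one National Natural Science Foundation of China Young Scientists Fund project and a participant in one General Program project, and has received the Best Student Paper Award from the Section on Nonparametric Statistics of the American Statistical Association.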

 

Department of Statistics and Data Science

2024-3-26