Title: Feature screening for clustering analysis
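Speaker: Ruibin Xi (席瑞斌), tenured Associate Professor and Research Fellow, School of Mathematical Sciences and Center for Statistical Science, Peking University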
Time: Thursday, December 8, 2022, 9:30-11:00 a.m.
Tencent Meeting ID: 690-950-4009
Abstract: We consider feature screening for ultrahigh-dimensional clustering analyses. Based on the observation that the marginal distribution of any given feature is a mixture of its conditional distributions in the different clusters, we propose to screen clustering features by independently evaluating the homogeneity of each feature’s mixture distribution. Important clustering-relevant features have heterogeneous components in their mixture distributions, whereas unimportant features have homogeneous components. The well-known EM-test statistic is used to evaluate homogeneity. Under general parametric settings, we establish tail probability bounds of the EM-test statistic for homogeneous and heterogeneous features, and further show that the proposed screening procedure can achieve the sure screening property and even selection consistency. The limiting distribution of the EM-test statistic is also obtained for general parametric distributions. The proposed method is computationally efficient, accurately screens for important clustering-relevant features, and helps to significantly improve clustering, as demonstrated in our extensive simulations and real-data analyses.
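The screening idea in the abstract (score each feature separately by how far its marginal distribution departs from a single homogeneous component, then keep the top-scoring features) can be illustrated with a minimal sketch. This is not the speaker's procedure: in place of the EM-test statistic it uses a plain likelihood-ratio statistic comparing a two-component, equal-variance Gaussian mixture with a single Gaussian, and the function names, the Gaussian model, the median-split initialization, and the toy data are all assumptions made for illustration.

```python
import math
import random


def _log_normal(v, mu, var):
    """Log density of N(mu, var) evaluated at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)


def _log_mix(v, p, mu1, mu2, var):
    """Return (log weighted density of component 1, log mixture density)."""
    la = math.log(p) + _log_normal(v, mu1, var)
    lb = math.log(1 - p) + _log_normal(v, mu2, var)
    m = max(la, lb)  # log-sum-exp for numerical stability
    return la, m + math.log(math.exp(la - m) + math.exp(lb - m))


def homogeneity_stat(x, n_iter=100):
    """Stand-in score (NOT the EM-test): 2 * (mixture loglik - single-Gaussian loglik).

    Assumes len(x) >= 2. Larger values suggest a heterogeneous feature.
    """
    n = len(x)
    mu0 = sum(x) / n
    var0 = max(sum((v - mu0) ** 2 for v in x) / n, 1e-12)
    ll_null = sum(_log_normal(v, mu0, var0) for v in x)

    # Initialize the two component means from a split at the median.
    xs = sorted(x)
    lower, upper = xs[: n // 2], xs[n // 2:]
    p, var = 0.5, var0
    mu1, mu2 = sum(lower) / len(lower), sum(upper) / len(upper)

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        r = []
        for v in x:
            la, lmix = _log_mix(v, p, mu1, mu2, var)
            r.append(math.exp(la - lmix))
        # M-step: update mixing proportion, means, and the shared variance.
        s = min(max(sum(r), 1e-9), n - 1e-9)
        p = s / n
        mu1 = sum(ri * v for ri, v in zip(r, x)) / s
        mu2 = sum((1 - ri) * v for ri, v in zip(r, x)) / (n - s)
        var = max(
            sum(ri * (v - mu1) ** 2 + (1 - ri) * (v - mu2) ** 2
                for ri, v in zip(r, x)) / n,
            1e-12,
        )

    ll_mix = sum(_log_mix(v, p, mu1, mu2, var)[1] for v in x)
    return max(0.0, 2 * (ll_mix - ll_null))


def screen_features(X, top_k):
    """Score every column of X (a list of rows) independently; keep the top_k."""
    d = len(X[0])
    stats = [homogeneity_stat([row[j] for row in X]) for j in range(d)]
    ranked = sorted(range(d), key=lambda j: stats[j], reverse=True)
    return ranked[:top_k], stats


if __name__ == "__main__":
    random.seed(0)
    X = []
    for i in range(200):
        center = -3.0 if i < 100 else 3.0  # feature 0 separates two clusters
        X.append([random.gauss(center, 1.0)]
                 + [random.gauss(0.0, 1.0) for _ in range(4)])
    kept, stats = screen_features(X, top_k=1)
    print(kept, [round(s, 2) for s in stats])
```

Because each feature is scored on its own, the cost grows linearly with the number of features, which is what makes marginal screening of this kind practical in ultrahigh dimensions. The tail bounds and limiting distribution mentioned in the abstract concern the EM-test statistic and do not apply to this simplified stand-in.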
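About the speaker: Ruibin Xi is a tenured Associate Professor and Research Fellow at the School of Mathematical Sciences and the Center for Statistical Science, Peking University. Xi was selected for the youth program of a major national talent plan in 2014, and serves as a council member of the Chinese Association for Applied Statistics, vice president of the Young Statisticians Association, an executive council member of the Computational Statistics Branch of the Chinese Association for Applied Statistics, and an editorial board member of Statistical Theory and Related Fields. Xi graduated from Washington University in St. Louis in 2009, conducted research in biomedical informatics at Harvard Medical School from 2009 to 2012, and joined Peking University in September 2012. Xi's main research interests are biostatistics, bioinformatics, statistical analysis of biological big data, graphical models, and high-dimensional statistics. In recent years, Xi has published more than 40 papers in leading journals including Nature, Nature Genetics, PNAS, Science Translational Medicine, Nature Communications, Nucleic Acids Research, Journal of Hepatology, Bioinformatics, Biometrika, and IEEE Transactions on Knowledge and Data Engineering.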