In supervised learning, the main goal of any method is to predict future data accurately. Bayesian additive regression trees (BART), a Bayesian counterpart of gradient boosting, have great potential in this respect, yet to our knowledge BART has received far less attention than random forests and gradient boosting. To broaden its range of application, this paper first gives a fairly detailed overview of the BART model. Because BART may over-fit in high-dimensional settings, a method called RS-BART is then proposed to improve its predictive performance. RS-BART first ranks all predictor variables by relative importance, then uses the importance measure to train several low- or medium-dimensional BART models, and finally averages their predictions or takes a vote to obtain the final prediction. Experiments on simulated and real data show that RS-BART performs better than, or comparably with, state-of-the-art methods such as random forests, boosting and BART. RS-BART can therefore serve as an effective tool for practical prediction tasks, especially high-dimensional but sparse ones.
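The rank-then-ensemble idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and several pieces are assumptions made only for illustration: absolute Pearson correlation stands in for the variable-importance measure, a k-nearest-neighbour regressor stands in for BART as the base learner, and subspaces are drawn uniformly from the top-ranked variables. The names `rank_by_importance`, `fit_subspace_ensemble`, `predict_ensemble`, `n_keep` and `subspace_size` are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation: a stand-in importance score (assumption)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def rank_by_importance(X, y):
    """Return predictor indices sorted from most to least important."""
    p = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)


def fit_knn(X, y, k=3):
    """Stand-in base learner (k-nearest neighbours); this is NOT BART."""
    def predict(row):
        nearest = sorted(
            (sum((a - b) ** 2 for a, b in zip(r, row)), t) for r, t in zip(X, y)
        )[:k]
        return mean(t for _, t in nearest)
    return predict


def fit_subspace_ensemble(X, y, n_models=5, n_keep=3, subspace_size=2,
                          fit_base=fit_knn, seed=0):
    """Train n_models base learners, each on a random subset of the top variables."""
    rng = random.Random(seed)
    top = rank_by_importance(X, y)[:n_keep]      # keep the highest-ranked variables
    members = []
    for _ in range(n_models):
        cols = rng.sample(top, subspace_size)    # low-dimensional subspace
        X_sub = [[row[j] for j in cols] for row in X]
        members.append((cols, fit_base(X_sub, y)))
    return members


def predict_ensemble(members, row):
    """Average the member predictions (regression case)."""
    return mean(model([row[j] for j in cols]) for cols, model in members)


# Synthetic sparse problem: 10 predictors, only the first two matter.
rng = random.Random(1)


def make_data(n):
    X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(n)]
    y = [3 * row[0] + 3 * row[1] for row in X]
    return X, y


X_train, y_train = make_data(200)
X_test, y_test = make_data(100)
members = fit_subspace_ensemble(X_train, y_train)
preds = [predict_ensemble(members, row) for row in X_test]
```

Passing a different `fit_base` callable (for example, a wrapper around a BART implementation) would leave the ranking and averaging steps unchanged. For classification, the averaging in `predict_ensemble` would be replaced by a majority vote.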
Chinese Journal of Engineering Mathematics
The National Natural Science Foundation of China (11671317, 11601412);
the Key Science and Technology Program of Shaanxi Province (2016GY-067);
the Key Laboratory Program of Science and Technology Co-ordination and Innovation Project of Shaanxi Province (2014SZS20-K04).
Bayesian additive regression tree
Wang Guanwei (王冠伟), born in 1979, male, Ph.D., lecturer. Research fields: pattern recognition, forecasting, fault diagnostics.