Ensemble Learning

Feng Zhou (CMU)
Baoliang Lu (SJTU)

This paper considers the problem of learning concepts from large-scale data sets. Our approach is independent of the underlying classification algorithm. First, the original problem is decomposed into a series of smaller two-class sub-problems that are easier to solve. Second, we present two principles, the shrink and expansion principles, for restoring the global solution from the intermediate results learned on the sub-problems. In the theoretical analysis, this integration procedure is described as statistical inference of a posterior probability, and it reduces to the min-max principles in the special case of 0-1 outputs. We also propose a revised approach that reduces the computational complexity of the training and testing stages to linear. Finally, we report experiments on both synthetic and text-classification data. The results indicate that our methods are effective for large-scale problems.
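To make the decompose-then-combine idea concrete, here is a minimal Python sketch of the 0-1 special case. The abstract does not spell out the combination rule, so the arrangement below (MIN over the sub-problems that share a positive chunk, then MAX over the positive chunks) is an assumption based on the usual reading of the min-max principles, not the paper's exact formulation. The names `chunk`, `train_centroid`, and `train_min_max` are hypothetical, and the nearest-centroid base learner is only a stand-in, since the approach is meant to work with any classifier.

```python
def chunk(items, n_parts):
    """Split a list into at most n_parts consecutive chunks of similar size."""
    size = -(-len(items) // n_parts)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def train_centroid(pos, neg):
    """Hypothetical base learner: nearest-centroid rule with a 0-1 output."""
    def mean(points):
        return [sum(coord) / len(points) for coord in zip(*points)]

    mean_pos, mean_neg = mean(pos), mean(neg)

    def predict(x):
        dist_pos = sum((a - b) ** 2 for a, b in zip(x, mean_pos))
        dist_neg = sum((a - b) ** 2 for a, b in zip(x, mean_neg))
        return 1 if dist_pos < dist_neg else 0

    return predict


def train_min_max(pos, neg, n_pos, n_neg, train=train_centroid):
    """Train one small model per (positive chunk, negative chunk) pair."""
    pos_parts, neg_parts = chunk(pos, n_pos), chunk(neg, n_neg)
    models = [[train(p, n) for n in neg_parts] for p in pos_parts]

    def predict(x):
        # Assumed combination rule: MIN across the negative chunks paired
        # with one positive chunk, then MAX across the positive chunks.
        return max(min(model(x) for model in row) for row in models)

    return predict
```

With this layout, a point is labeled positive only if some positive chunk wins against every negative chunk. Each sub-problem sees only a fraction of the data, so the sub-models can be trained independently of one another.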

[1]

Learning Concepts from Large-Scale Data Sets by Pairwise Coupling with Probabilistic Outputs
International Joint Conference on Neural Networks (IJCNN), 2007
F. Zhou and B. Lu

[Paper 1MB] [Slides 1MB]

[2]

Research on Ensemble Learning
Master's Thesis, Shanghai Jiao Tong University, 2007
F. Zhou and B. Lu

[Paper 2MB (in Chinese)] [Slides 3MB]

A Probabilistic Framework for Ensemble Learning
