Bayesian model weighting is a well-known and well-analysed technique for making decisions based on data D. A number of models M1...Mn are considered, and for each model Mi the probability of the model given the data, P(Mi|D), is calculated. Each model is then used to make a prediction, giving f1...fn, and each prediction is weighted with the probability P(Mi|D). The final prediction is F = C Σ_i P(Mi|D) fi, where C = 1 / Σ_i P(Mi|D) is a normalizing constant.
(See the Bayesian Model Averaging page for more details.)
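The weighted prediction described above can be sketched in a few lines of Python. This is only an illustration: the function name bayesian_weighted_prediction and the numbers are made up for the example, and the predictions are assumed to be plain numbers (for instance, each model's estimated probability of a class).

```python
def bayesian_weighted_prediction(posteriors, predictions):
    """Return F = (sum_i P(Mi|D) * fi) / (sum_i P(Mi|D)).

    posteriors  -- P(Mi|D) for each model; they need not sum to 1
    predictions -- the prediction fi of each model, as a number
    """
    if len(posteriors) != len(predictions):
        raise ValueError("need exactly one posterior per prediction")
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posteriors must have a positive sum")
    # Dividing by the total is the normalizing constant C = 1 / sum_i P(Mi|D).
    weighted_sum = sum(p * f for p, f in zip(posteriors, predictions))
    return weighted_sum / total


# Hypothetical values for three models, chosen only for illustration.
posteriors = [0.6, 0.3, 0.1]
predictions = [1.0, 0.0, 1.0]
F = bayesian_weighted_prediction(posteriors, predictions)
print(F)  # approximately 0.7
```

Because the function divides by the sum of the weights, it also accepts unnormalized posteriors, such as likelihood times prior for each model.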
It was recently reported in the article "Bayesian Averaging of Classifiers and the Overfitting Problem" by Pedro Domingos that Bayesian model weighting may be outperformed in cross-validation experiments by democratic voting. In democratic voting, the final prediction does not take the probability P(Mi|D) into account but instead gives each model a single vote: one model, one vote! So democratic voting gives a prediction based on F = (1/n) Σ_i fi.
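As a minimal sketch of the difference between the two schemes, the following Python compares democratic voting with posterior weighting on the same predictions. The function names and all numbers are hypothetical and chosen only to show the arithmetic; they are not taken from the article.

```python
def democratic_vote(predictions):
    """Return F = (1/n) * sum_i fi: every model gets one equal vote."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(predictions) / len(predictions)


def posterior_weighted(posteriors, predictions):
    """Return the prediction weighted by P(Mi|D), normalized by their sum."""
    total = sum(posteriors)
    return sum(p * f for p, f in zip(posteriors, predictions)) / total


# Hypothetical example: one model holds almost all the posterior mass.
predictions = [0.0, 1.0, 1.0]
posteriors = [0.98, 0.01, 0.01]

F_vote = democratic_vote(predictions)                   # 2/3: the majority wins
F_bayes = posterior_weighted(posteriors, predictions)   # about 0.02: model 1 dominates
print(F_vote, F_bayes)
```

With equal posteriors the two functions return the same value, so the schemes differ only to the extent that the posterior is concentrated on a few models.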
Can this empirical result be verified in other areas of model selection?
Can this empirical result be explained by looking at empirical results in other areas of model selection?