The problem: A Bayes B net consists of two parts, a network structure BS and a set of probability tables BP. Learning a Bayes net from data therefore consists of two subtasks: learning BS and learning BP. There are super exponential (in the number of variables) many network structures to be considered, so efficient search of the space of network structures is required.
Observation: Often, a metric for the quality Q(BS) of a network structure is used that can be decomposed into the effect of each of the nodes (plus parents), i.e., Q(BS) = åu in U q(u, p(u)) (where p(u) is the set of parents of u). This has the benefit that for operations like arc reversal, arc insertion and arc removal, which affect only one term in the sum, the quality of the new network structure can be efficiently calculated: at most two terms in the sum have to be recalculated. However, it is very difficult to proof optimality of search strategies based on these operations.
Friedman and Koller use an approach where the space is structured by considering orderings on the variables and only allowing network structures that conform to the ordering (i.e., an arrow u->v is only allowed when u>v according to the ordering). According to Friedman and Koller, the space of orderings is much smaller and more regular than the space of structures, and has a smoother posterior `landscape'. The most obvious operations on orderings are swapping of consequtively ordered variabels. Another, used in Friedman and Koller is selecting a node ui at random, and swap the lot before to that after, that is, u1...uiui+1...un => ui+1...unu1...ui.
How to exploit this: Other operations on ordering can be defined. To mention just a few:
Interesting operations: Can other interestion operations be defined and which criteria make an operation interesting? One obvious criterion is how much computational effort is required to determine the quality of the newly created network structure. Another criterion is how easy it is to traverse the search space defined by the operation. Finally, the theoretical properties with respect to existance of convergence of search algorithms and speed of convergence make an operation of interest.
Theoretical properties: What can be said about optimality and speed of convergence of search algorithms based on the new operations.
Experimental behavior: How do the operations behave on data?
Any comment? Pleas mail to Remco.
1: Nir Friedman, Daphne Koller. Being Bayesian about Network Structure Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference, Craig Boutilier and Moises Goldszmidt (editors). Stanford University, Stanford, California, USA, Jun 30 2000 -- Jul 3 2000. © 2000 Morgan Kaufmann Publishers, San Francisco, CA.
Initial update: 9 Jan 2001 webmaster