Efficient inference in Bayes networks

The problem: Decisions can be made using Bayes networks for propagating probabilities through the network. In general, this kind of inference is NP complete. In practice, large cliques do occur in Bayesian networks, which may make the inference process a computational intensive task, which may take a long time to complete.

Observation: Intuitively, decision making tasks are often can be based on simple heuristics: decision making for driving a car on a motorway does not take a large effort. Only occasionally, complex situations occur, for which more effort is required in making the decision: entering a busy multilane street from a parking lot, crossing several lanes to reach the close-by turn off requires considerably more skill (often from any driver involved :-( ).

In short, simple decisions require just simple models, while only complex decisions require complex models.

How to exploit this: instead of using a single Bayesian network to perform the propagation, use a simplified network with respect to the full model. Only when the decision that would not have a large level of confidence perform propagation with the full model.

Open problems

Which simplifications work? Simplification can be done by selecting

Are there any other ways to perform simplification?

How to construct a simplified model: A simplified model can be constructed from the full model. For example, all arcs that are weaker (e.g., according to Kullback Leibner distance. Justification?) than a certain treshhold can be removed. Another way is to target a polytree topography, start with an empty structure and add arcs with the strongest dependency and give direction according to the direction in the full model (similar to Rebane and Pearl's polytree recovery algorithm).

In a machine learning context a simplified model can be selected that has a good fit with the data but has good computational properties. For example, when doing MCMC learning, pick a network with polytree structure that has the highest probability in the chain.

Are there other ways of selecting simplified models?

How to determine confidence of a decision? Since the full model is only used when the confidence in a decision is low, there has to be a way to determine this confidence with a computational cheap method. If probabilities of interest are close to 0.5, this may indicate low confidence.

Another way of determining the confidence of a decision is to calculate the minimum and maximum boundaries of probabilities. This would require a bit more computation, but it allows to determine whether a decision would be the same for upper and lower bound of the probability. If they are different, this is a sign to use the full model.

Structures that do not allow simplification. Consider the example, where we have a network with n binary variables v1,...,vn of which n-1 are mutually indepent (P(vi=1) = 0.5, 0<i<n) and the nth variable is the parity of the other n-1. So, the structure has arrows vi->vn. All arcs can be removed for cases where not more than n-1 of the variables are instantiated, but it leaves no way to detect that a decision has a low confidence. Therefore, it does not seem a good idea to remove arcs in this case. Another way of looking at this situation is to observe that any of the instantiations of any variable leads to all other probabilities to be unchanged to 0.5, which can be interpreted as a decision of low confidence.

Are there other situations where removal of arcs leave a network where low confidence in decisions cannot be measured?

Emperical performance: how does the simplified/full model perform on real life cases. Experiments are required to get some intuition here.

Parallel processing: Say we have two networks, B1 and B2 where B1 is the full network and B2 the optimised network. When having two processors available, the first can start performing calculations with B1 and the second with B2 taking the first valid result. If B2, which will finish first, comes with a result on which a decision can be based, stop inference in B1. If B2 does not report a result on which a decision can be made, wait for B1 to finish and make a decision based on its result.

Does this scheme result in significantly better results than performing sequential processing, given that calculation in the optimized network B2 will be very fast? Instead of 2 networks, we could take n networks, B1...Bn where B1 is the full network, Bn the network most optimized for speed at the cost of accuracy, and the networks Bn, 1<i<n in between increasingly optimized for speed with increasing i. What metric makes sense to determine the number of networks to consider? What is the optimal number of networks n to use?

Any comment? Pleas mail to Remco.


Last updated: 5 Dec 2000 (added Parallel processing). Initial update: 9 Nov 2000 webmaster