Information Theory Workshop - January 2011

Short Course on Model Selection and Multimodel Inference

Dates: Wednesday and Thursday, January 12th & 13th, 2011

Time: 8:00 am – 5:00 pm

Location: The College of William and Mary
Williamsburg, VA 23187

Organizers: Romuald Lipcius and Gina Ralph

Goals/Objectives: This short course is an overview of information-theoretic methods for model selection and multimodel inference and of their underlying philosophy. Several examples will demonstrate how these methods are applied. A hands-on session will take place on Thursday.

Overview:
A substantial paradigm shift is occurring in science and application. The past century relied on null hypothesis testing, asymptotic distributions of the test statistic, P-values, and an arbitrary ruling of “significant” or “not significant.” Under this analysis paradigm a test statistic (T) is computed from the data. The P-value is the focus of the analysis and is defined as Prob{T or more extreme, given the null hypothesis}. With this definition in mind, we can abbreviate it slightly as Prob{X|H0}, where it is understood that X is the observed data or more extreme (unobserved) data. This is a so-called “tail probability.”
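As a rough illustration of the tail probability described above, the following Python sketch computes a one-sided P-value for a simple binomial experiment. The scenario (8 successes in 10 trials, with a null hypothesis that the success probability is 0.5) and the function name are assumptions made for this example and are not taken from the course materials.

```python
from math import comb


def binomial_tail_p_value(k, n, p0=0.5):
    """One-sided P-value: Prob{X >= k | H0: success probability = p0}.

    Sums the binomial probabilities of the observed count k and of every
    more extreme (larger, unobserved) count up to n.
    """
    return sum(
        comb(n, x) * p0**x * (1 - p0) ** (n - x)
        for x in range(k, n + 1)
    )


# Hypothetical data: 8 successes observed in 10 trials.
p_value = binomial_tail_p_value(8, 10)
print(f"Prob{{X | H0}} = {p_value:.4f}")
```

Note that the result is a statement about the data (and more extreme data) given the null hypothesis. It says nothing directly about the probability that the null hypothesis itself is true.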

The null hypothesis (H0) takes center stage but is often trivial or even silly. The alternative hypothesis (HA) is not the subject of the test; support for the alternative occurs only if the P-value for the null hypothesis is low (often < 0.05). Support for the alternative hypothesis thus comes only by default.

The proper interpretation of the P-value is quite strained; this might explain why so many people erroneously pretend it means something quite different (i.e., the probability that the null hypothesis is true). This is not what is meant by a P-value.

These traditional methods are being replaced by “information-theoretic” methods (and to a lesser extent, at least at this time, by a variety of Bayesian methods). They are termed “information-theoretic” because they are based on Kullback-Leibler information theory. These approaches focus on an a priori set of plausible science hypotheses, H1, H2, …, HR. Evidence for or against members of this set of “multiple working hypotheses” consists of a set of probabilities: specifically, Prob{Hj, given the data} for each of H1, H2, …, HR, abbreviated Prob{Hj|X}. These probabilities are direct evidence, where evidence = information = -entropy.

Simple evidence ratios provide a measure of the relative strength of the evidence for any two hypotheses. Note the radical difference between the probability statements above: one stems from a P-value, the other from the probability of hypothesis j. Statistical inference should be about models and parameters, conditional on the data; P-values, however, are probability statements about the data, conditional on the null hypothesis.
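To make model probabilities and evidence ratios concrete, here is a minimal Python sketch. It assumes the AIC-based approach of the Burnham and Anderson reference, in which each model's probability is approximated by its Akaike weight. The AIC values are invented for illustration, and the function names are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights (approximate Prob{Hj | data}).

    Each weight is exp(-delta_j / 2), where delta_j is the AIC difference
    from the best (lowest-AIC) model, normalized so the weights sum to 1.
    """
    best = min(aic_values)
    rel_likelihoods = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel_likelihoods)
    return [r / total for r in rel_likelihoods]


def evidence_ratio(weights, i, j):
    """Strength of evidence for model i relative to model j."""
    return weights[i] / weights[j]


# Hypothetical AIC values for three candidate models H1, H2, H3.
aic = [100.0, 102.0, 110.0]
weights = akaike_weights(aic)
for j, w in enumerate(weights, start=1):
    print(f"Prob{{H{j} | data}} is approximately {w:.3f}")
print(f"Evidence ratio, H1 vs. H2: {evidence_ratio(weights, 0, 1):.2f}")
```

In this sketch the evidence ratio depends only on the AIC difference between the two models being compared, so it does not change when other models are added to or dropped from the set.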

These new approaches allow statistical inference to be based on all (or some) of the models in the a priori set (multimodel inference); this is useful in prediction as well as in obtaining robust estimates of parameters of particular interest. Alternative science hypotheses take center stage in these approaches and will require much more attention than in the past century (where one started with an alternative and the null was merely the nothing/naïve position; thus little science thinking was called for).

The set of science hypotheses “evolves” through time as implausible hypotheses are eventually dropped from consideration, new hypotheses are added, and existing hypotheses are further refined. Rapid progress in the theoretical or applied sciences can be realized as this set evolves, based on careful inferences from new data. This is an exciting time to be in science and biostatistics. There are important philosophies involved here; these approaches go well beyond methods for “data analysis.”

Schedule of Events: TBA

This overview course is based on the reference book,
Burnham, K. P., and D. R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. 2nd Ed., Springer-Verlag, New York, NY. 488 pp.
and on the recent textbook (supplied as part of the registration fee),
Anderson, D. R. 2008. Model based inference in the life sciences: a primer on evidence. Springer, New York, NY. 184 pp.

Cost: TBA

To register: Please complete the online Information Theory Workshop Registration Form.

The deadline for registration is December 22, 2010.