Ask Dr. SETI: Bayesian Statistics

The SETI League, Inc., a membership-supported, non-profit {501(c)(3)}, educational and scientific organization Searching for Extra-Terrestrial Intelligence

Ask Dr. SETI ®

Chapter 4: Psychology

Bayesian Statistics

Dear Dr. SETI:
What are Bayesian statistics, and how can they be applied to SETI?

An amateur radio astronomer

The Doctor Responds:
Bayes' Theorem is an elegant tool, used extensively by psychologists, for analyzing conditional probabilities (by which we mean the probabilities of occurrences that depend in some way upon one another). The relationship need not be causal, just correlated.

Let's say we have two events, A and B, which can occur individually or in combination. Assume we know, or can compute, the probability that A will occur, given that B has. We indicate this conditional probability as P(A|B), pronounced "the probability of A, given B." We can use Bayes' Theorem to turn the problem around and compute the converse, P(B|A), the probability of B, given A, provided we also know how likely A and B each are on their own.
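For readers who like to see it written down, here is the standard statement of the theorem. The symbols P(A) and P(B) do not appear in the discussion above; they stand for the overall (unconditional) probabilities of A and of B.

```latex
P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}, \qquad P(A) > 0
```

The numerator, P(A|B) P(B), is the probability that A and B both occur; dividing by P(A) restricts our attention to the cases in which A has occurred.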

Trivial example: some people have blonde hair. Some have blue eyes. If we know the probability that a blue-eyed person will also be blonde (along with the overall fractions of blondes and of blue-eyed people), we can invoke Bayes' Theorem to compute the probability that a blonde will also be blue-eyed. So, set theory is involved. The population of all people who are either blonde or blue-eyed is the union of two sets (the set of blondes, and the set of blue-eyed folks). The population of all people who are both blonde and blue-eyed is the (somewhat smaller) intersection of the two constituent sets.
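Here is the blonde-hair, blue-eyes example worked through as a minimal Python sketch. Every number in it is hypothetical, invented only to keep the arithmetic easy; none is a real demographic figure, and the variable names are mine.

```python
# All numbers below are hypothetical, chosen only to make the arithmetic easy.
p_blonde_given_blue = 0.40   # P(blonde | blue-eyed): the probability we start with
p_blue = 0.25                # P(blue-eyed): overall fraction with blue eyes
p_blonde = 0.20              # P(blonde): overall fraction with blonde hair

# Bayes' Theorem: P(blue | blonde) = P(blonde | blue) * P(blue) / P(blonde)
p_blue_given_blonde = p_blonde_given_blue * p_blue / p_blonde

# Intersection: people who are both blonde and blue-eyed.
p_both = p_blonde_given_blue * p_blue

# Union: people who are blonde, blue-eyed, or both (inclusion-exclusion).
p_either = p_blonde + p_blue - p_both

print(f"P(blue-eyed | blonde) = {p_blue_given_blonde:.2f}")
print(f"P(both)   = {p_both:.2f}")
print(f"P(either) = {p_either:.2f}")
```

With these made-up inputs, half of all blondes turn out to be blue-eyed, even though only 40% of blue-eyed people are blonde. The two conditional probabilities differ because the two sets are of different sizes.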

Of course, we probably (oops-- wrong word to use here!) can't assemble all the blue-eyed people in the world, nor lay all the world's blondes end-to-end (though some might like to try). So, instead of dealing in populations, we often deal in samples: small, measurable sub-groups which we hope are representative of the populations in question. The degree to which a sample truly reflects the characteristics of the underlying population is a science unto itself, called sampling theory, and involves such factors as the population size, the sample size, replacement, and how well we shook up the box when drawing the sample.
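To get a feel for how sample size matters, here is a small simulation sketch. It assumes an effectively infinite population (so the question of replacement does not arise) and uses hypothetical proportions; the function names and constants are illustrative only.

```python
import random

# Hypothetical "true" population proportions, invented for illustration.
P_BLUE = 0.25                   # P(blue-eyed)
P_BLONDE_GIVEN_BLUE = 0.40      # P(blonde | blue-eyed)
P_BLONDE_GIVEN_NOT_BLUE = 0.13  # P(blonde | not blue-eyed)

def draw_person(rng):
    """Draw one (blonde, blue_eyed) pair from the hypothetical population."""
    blue_eyed = rng.random() < P_BLUE
    p_blonde = P_BLONDE_GIVEN_BLUE if blue_eyed else P_BLONDE_GIVEN_NOT_BLUE
    blonde = rng.random() < p_blonde
    return blonde, blue_eyed

def estimate_conditional(sample):
    """Estimate P(blonde | blue-eyed) from a list of (blonde, blue_eyed) pairs."""
    blondes_among_blue = [blonde for blonde, blue_eyed in sample if blue_eyed]
    if not blondes_among_blue:
        raise ValueError("sample contains no blue-eyed people")
    return sum(blondes_among_blue) / len(blondes_among_blue)

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (100, 1000, 20000):
    sample = [draw_person(rng) for _ in range(n)]
    print(f"n = {n:5d}: estimated P(blonde | blue-eyed) = "
          f"{estimate_conditional(sample):.3f}")
```

The estimates should wander around the true value of 0.40, and generally settle closer to it as the sample grows. A small sample can miss by quite a bit.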

What has all this to do with radio astronomy? Consider two possibly interrelated sets, one which represents natural astrophysical phenomena, and the other which represents a particular class of received signals. One research question might be: "with what probability (and to what degree of certainty) is a specific received signal representative of a particular astrophysical phenomenon?" Bayesian statistics can help us to answer that question.

Assuming our sample (what we've observed) is truly representative of our population (what's actually out there), we can set up the problem in terms of conditional probabilities. Say we can compute the probability (on a scale of 0 to 1) that a given astronomical event is associated with a particular microwave signature. If so, we can invoke Bayes' Theorem to compute the probability that a given received signal is associated with that kind of astronomical event. Blonde hair, blue eyes.

(Note that I say "associated with" here, and not "caused by." Establishing causal relationships is another matter altogether, requiring far more research.)

This approach works well with things like pulsars and pulse trains. We observe that a certain class of rapidly rotating neutron star produces radio pulses. If we can compute the probability that a given class of pulsar will produce a particular pulse train, then if we receive that kind of pulse train, Bayes' Theorem lets us compute the probability that it came from that kind of pulsar. (Note that we are allowing here for the possibility that some other kind of event, like RFI, could also produce those very kinds of pulses.)
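The pulsar-versus-RFI reasoning can be sketched in a few lines of Python. This is only an illustration: the two-source model, the function name, and all four input probabilities are assumptions of mine, not measured values.

```python
def posteriors(priors, likelihoods):
    """Return P(source | observed pulse train) for each candidate source.

    priors:      P(source) before looking at the signal
    likelihoods: P(observed pulse train | source)
    """
    joint = {src: priors[src] * likelihoods[src] for src in priors}
    evidence = sum(joint.values())  # P(observed pulse train), all sources combined
    if evidence == 0:
        raise ValueError("no candidate source can produce this observation")
    return {src: j / evidence for src, j in joint.items()}

# Hypothetical inputs: pulsars of this class are rare along a given line of
# sight, but very likely to produce this pulse train; RFI is common, but
# only occasionally mimics it.
priors = {"pulsar": 0.01, "rfi": 0.99}
likelihoods = {"pulsar": 0.90, "rfi": 0.05}

result = posteriors(priors, likelihoods)
for source, p in result.items():
    print(f"P({source} | pulse train) = {p:.3f}")
```

With these invented numbers, the chance that the pulse train came from the pulsar is only about 15%, because the RFI explanation, though individually unlikely to mimic a pulsar, starts out so much more common.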

What Bayes' Theorem does not work particularly well for is fields of study in which at least one conditional probability is not known. A good example of this kind of problem is SETI. If we receive a candidate SETI signal, we want to know the probability that it was produced by intelligent aliens. So, we invoke Bayes' Theorem, which derives this estimate from the probability that intelligent aliens generate these kinds of signals. But, since we have yet to detect a single clear, unambiguous alien RF artifact, we don't yet know that second conditional probability, so how are we going to use it to compute the first?
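To see how much trouble the missing input causes, here is a sketch that simply guesses at it. Everything numeric here is an assumption made up for the demonstration: the prior, the chance that something other than aliens produces the signal, and the list of guesses.

```python
def p_eti_given_signal(p_signal_given_eti, prior_eti, p_signal_given_other):
    """Bayes' Theorem for two alternatives: aliens, or anything else."""
    numerator = p_signal_given_eti * prior_eti
    evidence = numerator + p_signal_given_other * (1.0 - prior_eti)
    return numerator / evidence

# Hypothetical placeholders; nobody knows the real values.
PRIOR_ETI = 1e-6             # assumed P(aliens are transmitting toward us)
P_SIGNAL_GIVEN_OTHER = 1e-4  # assumed P(such a signal | no aliens), e.g. RFI

# The unknown quantity: P(such a signal | aliens). Try a range of guesses.
guesses = [1e-6, 1e-3, 1.0]
results = {g: p_eti_given_signal(g, PRIOR_ETI, P_SIGNAL_GIVEN_OTHER)
           for g in guesses}

for guess, posterior in results.items():
    print(f"if P(signal | aliens) = {guess:g}, "
          f"then P(aliens | signal) = {posterior:.3g}")
```

The computed answer swings across several orders of magnitude depending on which guess we feed in. The arithmetic is sound; the output is only as good as the input we don't have.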