The Global Spiral is an e-publication of Metanexus Institute. Through articles, essays, book reviews, and news, the Global Spiral explores humanity's most profound questions and challenges.

Specified Complexity and Information, Part 1/2

The recent contributions from Matt Young and John Bracht make it clear that confusion over the terms "specified complexity" and "information" continues unabated.  Unfortunately, proponents of Intelligent Design use these terms equivocally, with the result that they and their critics spend much of their time talking past each other.  In this post I hope to clear up some of this confusion.  I will concentrate on the work of William Dembski, who is the leading exponent of "specified complexity" and "information" as markers of intelligent design.  Metaviews subscribers with long memories may recall that I briefly addressed this topic in an earlier post to this forum[1], in which I called on Dembski to clarify his usage.  In his latest book, No Free Lunch[2], he has cleared up a number of points, but much ambiguity remains.

Dembski cites Leslie Orgel[3] and Paul Davies[4] as the sources of the term "specified complexity," so it will be useful to start by looking at how they define it.  Orgel describes the concept as follows:

"It is possible to make a more fundamental distinction between living and nonliving things by examining their molecular structure and molecular behavior. In brief, living organisms are distinguished by their specified complexity. Crystals are usually taken as the prototypes of simple well-specified structures, because they consist of a very large number of identical molecules packed together in a uniform way. Lumps of granite or random mixtures of polymers are examples of structures which are complex but not specified. The crystals fail to qualify as living because they lack complexity; the mixtures of polymers fail to qualify because they lack specificity.

"These vague ideas can be made more precise by introducing the idea of information. Roughly speaking, the information content of a structure is the minimum number of instructions needed to specify the structure. One can see intuitively that many instructions are needed to specify a complex structure. On the other hand, a simple repeating structure can be specified in rather few instructions. Complex but random structures, by definition, need hardly be specified at all."

Davies' approach is similar but he is more explicit, so I will concentrate on his.  He begins as follows:

"To bring out this point clearly, consider the way in which the four bases A, C, G and T are arranged in DNA. As explained, these sequences are like letters in an alphabet, and the letters may spell out, in code, the instructions for making proteins. A different sequence of letters would almost certainly be biologically useless. Only a very tiny fraction of all possible sequences spells out a biologically meaningful message, in the same way that only certain very special sequences of letters and words constitute a meaningful book. Another way of expressing this is to say that genes and proteins require exceedingly high degrees of specificity in their structure. As I stated in my list of properties in Chapter 1, living organisms are mysterious not for their complexity per se, but for their tightly specified complexity."

For Davies, an object exhibits "specified complexity" if it has both "specificity" and "complexity."  "Specificity" can broadly be thought of as indicating the presence of some special property.  The DNA sequence of organisms is "specified," because it is a member of that special set of sequences which code for a living organism.

To explain "complexity," Davies, like Orgel, introduces the idea of information as the minimum number of instructions needed to produce a structure.  This is the subject of a branch of mathematics known as "algorithmic information theory."  Information theorists usually refer to it as "Kolmogorov-Chaitin complexity" or just "Kolmogorov complexity," after the mathematicians who developed it.  But it is also more loosely referred to as "information," "complexity," "randomness," and "incompressibility."  All these terms are appropriate descriptions of the concept, but their multiplicity can be confusing!

Unlike Orgel, Davies specifically refers to "algorithmic information" and "algorithmic complexity," and he mentions Gregory Chaitin by name, so there is no doubt that the "complexity" he has in mind is that of Kolmogorov and Chaitin.  He gives the following example of a structure with very low "information content":

"10101010101010101010101010101010101010101010101010"

This sequence of binary digits has a simple pattern, and so could be compressed to the instruction "print 10 twenty-five times."  It therefore has low Kolmogorov complexity.  On the other hand, a randomly generated sequence of digits, such as the following, is extremely unlikely to exhibit any significant pattern, and can hardly be compressed at all:

"11100111100000100001110100100000101001011001000110"

It would therefore be considered to have high Kolmogorov complexity.  It may seem strange that a random sequence is considered to have high "information," but this type of information must not be confused with significance.  If this sequence represents a series of coin tosses, say, then it gives us information about the outcomes of all the tosses, and, unlike the repetitive sequence above, this information cannot be reduced to a compressed equivalent.  Whether or not this information has any significance for us depends on the context.
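Kolmogorov complexity itself is uncomputable, but the length of a sequence after compression by a general-purpose compressor gives a rough upper bound on it. The sketch below uses Python's standard zlib module as that stand-in; the choice of compressor is an assumption of this illustration, not part of Davies' or Chaitin's presentation. With sequences this short, zlib's fixed header and checksum overhead are a large share of the output, so only the relative sizes are meaningful.

```python
import zlib

# First sequence quoted above, written as the short program that generates it.
REPETITIVE = "10" * 25
# Second sequence quoted above (randomly generated digits).
RANDOM_LOOKING = "11100111100000100001110100100000101001011001000110"


def compressed_size(seq: str) -> int:
    """Bytes zlib needs (level 9) to store `seq`.

    A crude, compressor-dependent upper-bound proxy for Kolmogorov
    complexity; the true quantity is uncomputable.
    """
    return len(zlib.compress(seq.encode("ascii"), 9))


for name, seq in [("repetitive", REPETITIVE), ("random-looking", RANDOM_LOOKING)]:
    print(f"{name}: {len(seq)} digits -> {compressed_size(seq)} bytes")
```

Even at this length the repetitive sequence comes out smaller; with longer sequences the gap widens, because the repetitive pattern's description barely grows while a random sequence still needs roughly one bit per digit.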

It should be noted that the criteria of "specificity" and "complexity" are to some degree in contradiction.  Stating that a sequence matches a particular specification (e.g. that it codes for a living organism) reduces the possibilities and therefore makes it easier to compress.  For example, if there are N possible sequences but only M of these match the specification in question, then it may be possible to compress the sequence to just a description of the specification (which narrows down the sequence to just M possibilities) plus a number between 1 and M (which identifies a particular sequence out of those M possibilities).  Identifying one of N sequences takes about log_2(N) bits, whereas identifying one of M takes only about log_2(M) bits, so when M is much smaller than N the compressed form can be much shorter.  Davies appears to be aware of this contradiction:
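To make the counting argument concrete, here is a minimal sketch under an illustrative assumption: the "specification" is simply "a binary string of length n containing exactly k ones" (a stand-in chosen for simplicity, not a biological specification), with n = 50 and k = 5. Then M = C(n, k), and each matching string can be identified by its index in lexicographic order. The helpers `rank`, `unrank` and `index_bits` are hypothetical names written for this illustration.

```python
from math import comb


def rank(bits: str) -> int:
    """Lexicographic index ('0' < '1') of `bits` among all strings with the
    same length and the same number of '1's."""
    n, k = len(bits), bits.count("1")
    index = 0
    for i, b in enumerate(bits):
        rest = n - i - 1  # positions after i
        if b == "1":
            # Strings that share this prefix but have '0' at position i
            # (with all k remaining ones placed later) come first.
            index += comb(rest, k)
            k -= 1
    return index


def unrank(index: int, n: int, k: int) -> str:
    """Inverse of rank: rebuild the string from n, k and its index."""
    out = []
    for i in range(n):
        rest = n - i - 1
        with_zero_here = comb(rest, k)  # how many members put '0' at position i
        if index < with_zero_here:
            out.append("0")
        else:
            out.append("1")
            index -= with_zero_here
            k -= 1
    return "".join(out)


def index_bits(n: int, k: int) -> int:
    """Bits needed for an index in 0..M-1, where M = C(n, k)."""
    return (comb(n, k) - 1).bit_length()


n, k = 50, 5  # illustrative values: 50-digit strings with exactly five '1's
example = "0" * 10 + "1" + "0" * 9 + "11" + "0" * 20 + "11" + "0" * 6
idx = rank(example)
assert unrank(idx, n, k) == example
print(f"M = {comb(n, k)}; index {idx} fits in {index_bits(n, k)} bits instead of {n}")
```

The decoder still needs n and k, which play the role of the "description of the specification"; for a specification that leaves M close to N there would be little or no saving.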

"A functional genome is both random and highly specific--properties that seem almost contradictory. It must be random to contain substantial amounts of information, and it must be specific for that information to be biologically relevant."

To sum up, Davies considers a phenomenon to exhibit "specified complexity" if it is relatively incompressible and matches a "specification" of some sort.  As an example, he gives the following portion of the genome of the virus MS2 (converted to binary digits), which appears to be incompressible and presumably matches the specification of being biologically functional:

"...010001110111010010011100110101101011101110101000010..."

How does this compare with Dembski's use of the term?  First let us consider Dembski's definition of "complexity" on its own, without the qualifier "specified."  Dembski simply uses "complexity" as a synonym for improbability.  Statements such as the following are commonplace in his work:

"Complexity as I am describing it here is a form of probability.... Complexity and probability therefore vary inversely: the greater the complexity, the smaller the probability. Thus to determine whether something is sufficiently complex to underwrite a design inference is to determine whether it has sufficiently small probability." [NFL, p.9]

It becomes clear that Dembski is not only defining "complexity" as improbability in a loose sense, but that he defines it precisely as -log_2(p), where log_2 represents a base 2 logarithm and p represents a probability. As far as I can see, he does not state this explicitly, but it follows from certain statements that he makes:

- He writes that "Complexity as I am using it here is in the information-theoretic or Shannon sense" and then goes on to write:  "The (Shannon) information I(A) associated with an event is by definition -log_2{P(A)}, where P(A) is the probability of that event and the logarithm is taken to the base 2." [NFL, p. 230]

- He defines a "universal probability bound" of 1/(10^150) and a corresponding "universal complexity bound" of 500 bits. Inserting p=1/(10^150) into the formula -log_2(p) gives a result of approximately 498, i.e. roughly 500.
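The arithmetic behind the second point can be checked in a few lines. The sketch assumes, as argued here, that Dembski's "complexity" is -log_2(p); the function name `improbability_bits` is a hypothetical label, not Dembski's term.

```python
import math

UNIVERSAL_PROBABILITY_BOUND = 1e-150  # Dembski's 1/(10^150)


def improbability_bits(p: float) -> float:
    """-log2(p): on the reading argued above, Dembski's 'complexity' of an
    event with probability p. The name is a hypothetical label."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


bits = improbability_bits(UNIVERSAL_PROBABILITY_BOUND)
print(f"-log2(1e-150) = {bits:.2f} bits")
# Going the other way, a 500-bit bound corresponds to p = 2**-500 (about 3e-151).
print(f"2**-500 = {2.0 ** -500:.3g}")
```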

Contrary to Dembski's assertion, this is not complexity "in the information-theoretic or Shannon sense".  Nor is it complexity in the everyday sense of the term, meaning complicated. It is simply a rescaled measure of improbability.  The function -log_2(p) is a monotonically decreasing function of p in the range 0 < p < 1, meaning that higher "complexity" always corresponds to lower probability, and vice versa.
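The point that -log_2(p) is only a rescaling can be seen by sorting a few events both ways and by converting back. The event probabilities below are hypothetical values chosen for illustration; the sketch shows that ordering by "complexity" (largest first) gives exactly the same ordering as by probability (smallest first), and that the original probability is recovered from the bit value.

```python
import math


def to_bits(p: float) -> float:
    """Rescale a probability as -log2(p)."""
    return -math.log2(p)


def to_probability(bits: float) -> float:
    """Undo the rescaling: p = 2**(-bits)."""
    return 2.0 ** -bits


# Hypothetical event probabilities, chosen only for illustration.
events = {"e1": 0.5, "e2": 1e-6, "e3": 0.01, "e4": 2.0 ** -40}

most_complex_first = sorted(events, key=lambda e: to_bits(events[e]), reverse=True)
least_probable_first = sorted(events, key=lambda e: events[e])
print(most_complex_first, least_probable_first)

# The rescaling loses nothing and adds nothing: p is recovered from the bits.
for p in events.values():
    assert math.isclose(to_probability(to_bits(p)), p)
```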

Dembski's relabelling of improbability as "complexity" serves no useful purpose and merely creates confusion.  In a recent article[5], where he wants to refer to both improbability and complexity in the same paragraph, he is forced to invent two new terms, "probabilistic complexity" (i.e. improbability) and "patterned complexity" (i.e. complexity in the normal sense).  Such problems would be avoided if he simply referred to improbability as "improbability."

Having considered Dembski's use of the word "complexity" on its own, let us now turn to "specified complexity."  Dembski's clearest and most formal definition of "specified complexity" is provided in his General Chance Elimination Argument, the final two steps of which read as follows:

"#7: S calculates the probability of the rejection region R conditional on each of the chance hypotheses in {H_i} and determines that P(R|H_i) < alpha for all i in the index set I.

"#8: S is warranted in inferring that E did not occur according to any of the chance hypotheses in {H_i} and therefore that E exhibits specified complexity." [NFL, p. 73]

A full discussion of Dembski's notation is beyond the scope of this post, but the following brief comments will suffice here:

- S represents the subject making the inference;
- E is the observed event or object whose origin is in question;
- R is a rejection region or "specification" based on E;
- H_i is a "chance hypothesis";
- alpha is a small probability bound;
- P(R|H_i) means the probability of R if H_i is true.

Dembski uses the term "chance hypothesis" to denote any hypothesis regarding the origin of E which does not involve intelligent design, and which is sufficiently detailed to enable us to calculate the probability of E's occurrence.  Despite the name, the hypothesis may even be purely deterministic, conferring a probability of 1 on E. The set {H_i} is the set of all "relevant" chance hypotheses, i.e. all those which we think may have been responsible for E.

In short, Dembski tells us to consider all the chance hypotheses which we think may account for E, and calculate the probability of E (plus any other potential outcomes matching the same specification as E) on the basis of each hypothesis.  If the probability is small enough (less than alpha) for a particular hypothesis, we should reject that hypothesis as a reasonable explanation for E.  If we can reject all available chance hypotheses, we should "infer" specified complexity.
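Read as a decision procedure, this summary reduces to a single universal check over the relevant chance hypotheses. The sketch below assumes the probabilities P(R|H_i) have already been computed; the function name, the dictionary input and the example numbers are hypothetical illustrative choices, not Dembski's notation.

```python
from typing import Mapping


def all_chance_hypotheses_rejected(
    region_probs: Mapping[str, float], alpha: float
) -> bool:
    """Sketch of steps #7-#8 quoted above.

    `region_probs` maps a label for each relevant chance hypothesis H_i to
    P(R|H_i). Returns True when P(R|H_i) < alpha for every H_i, which is the
    condition under which Dembski says E 'exhibits specified complexity'.
    Note that all() is vacuously True when no hypotheses are supplied.
    """
    return all(p < alpha for p in region_probs.values())


# Hypothetical hypotheses and numbers, for illustration only.
alpha = 1e-10
print(all_chance_hypotheses_rejected({"H1": 1e-12}, alpha))
# A deterministic hypothesis that confers probability 1 is never rejected.
print(all_chance_hypotheses_rejected({"H1": 1e-12, "H2 (deterministic)": 1.0}, alpha))
```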

Dembski's word "infer" is misleading here, since this is a definition of specified complexity.  We do not "infer" specified complexity; we observe a state of affairs and we choose to call it "specified complexity."  Dembski then goes on to tell us that when we observe specified complexity we should infer that intelligent design was involved in the origin of the object (more on this below).

We can now see that Dembski's "specified complexity" is merely a label we apply to an object for which we do not currently have a viable detailed explanation, i.e. one which is detailed enough to allow us to calculate a probability and where the probability turns out to be reasonably large.  The "specified" part of the label arises because Dembski is summing the probability of all outcomes matching a "specification."  It might be argued that Dembski's meaning of "specification" is similar to that of Orgel and Davies.  However, his meaning of "complexity" is quite different from theirs.  As we have seen, it is merely improbability in disguise, and must be calculated with respect to all the chance hypotheses we can think of. Orgel and Davies, on the other hand, use "complexity" to mean Kolmogorov complexity, i.e. incompressibility, which does not depend on probability at all and so is independent of any chance hypothesis.  Clearly, Dembski's "specified complexity" is quite different from that of Orgel and Davies.
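The contrast can be illustrated numerically. In the sketch below, the two chance hypotheses (a fair coin, and a coin that lands 'D' 99% of the time) are hypothetical examples, not hypotheses Dembski considers, and zlib's compressed length again stands in, crudely, for Kolmogorov complexity. The improbability of one and the same sequence changes drastically with the hypothesis, while the compressed length depends only on the sequence itself.

```python
import math
import zlib

SEQ = "D" * 41  # a 41-symbol sequence of all 'D's


def sequence_improbability_bits(seq: str, p_d: float) -> float:
    """-log2 P(seq | H) for a hypothetical chance hypothesis H under which each
    symbol is independently 'D' with probability p_d, else 'R'."""
    return sum(-math.log2(p_d if s == "D" else 1.0 - p_d) for s in seq)


def compressed_size(seq: str) -> int:
    """zlib-compressed length in bytes: a crude proxy for Kolmogorov
    complexity that takes no chance hypothesis as input."""
    return len(zlib.compress(seq.encode("ascii"), 9))


for label, p_d in [("fair coin", 0.5), ("coin biased 99% towards 'D'", 0.99)]:
    bits = sequence_improbability_bits(SEQ, p_d)
    print(f"{label}: improbability {bits:.2f} bits; compressed size {compressed_size(SEQ)} bytes")
```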

Consider an example. Dembski's only complete (well, almost complete) example of determining specified complexity is the Caputo case.  In that case, the following sequence (which we take as E) is observed:

"DDDDDDDDDDDDDDDDDDDDDDRDDDDDDDDDDDDDDDDDD"

(this sequence contains 40 'D's and 1 'R').  Given the background of the Caputo case (which need not concern us here) Dembski decides there is only one relevant chance hypothesis H, namely the hypothesis that each symbol in the sequence had equal probability (1/2) of being either 'D' or 'R', as in a sequence of 41 coin tosses.  He also decides that the appropriate specification (or rejection region) R is "40 or more 'D's out of 41", and calculates a corresponding probability of 1 in 50 billion, i.e. P(R|H) equals 1 in 50 billion.  He then calculates a value of 1 in 25 billion for alpha, so P(R|H) < alpha, and therefore rejects the chance hypothesis H.  Since this was the only relevant chance hypothesis, he concludes that E exhibits specified complexity.  Furthermore, by a very similar argument, the sequence

"DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD"

(all 41 'D's) would also have exhibited specified complexity.  But these are highly compressible sequences, quite the opposite of the incompressible sequences which exhibit specified complexity according to the usage of Orgel and Davies.
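The figures in the Caputo example can be reproduced directly under the fair-coin hypothesis H described above, taking alpha as the reported 1 in 25 billion. For the all-'D' sequence the sketch assumes the narrower region "41 'D's out of 41" (that sequence also lies in the broader region, which gives the same result as the first sequence). The sketch also shows the point about compressibility: the region "40 or more 'D's out of 41" has only 42 members, so any sequence in it, including both sequences just quoted, can be identified by a 6-bit index rather than the roughly 41 bits a typical coin-toss sequence of that length needs. The helper names are hypothetical.

```python
from math import comb

N = 41                # symbols in the Caputo sequence
ALPHA = 1 / 25e9      # alpha as reported above: 1 in 25 billion


def prob_at_least_d(d_min: int, n: int = N) -> float:
    """P(at least d_min 'D's out of n) under the fair-coin hypothesis H."""
    return sum(comb(n, d) for d in range(d_min, n + 1)) / 2 ** n


p_region = prob_at_least_d(40)  # R = "40 or more 'D's out of 41"
p_all_d = prob_at_least_d(41)   # narrower region "41 'D's out of 41"
print(f"P(R|H) = 1 in {1 / p_region:,.0f}; below alpha: {p_region < ALPHA}")
print(f"P(all D|H) = 1 in {1 / p_all_d:,.0f}; below alpha: {p_all_d < ALPHA}")


def encode(seq: str) -> int:
    """Index of seq within R: 0 for all 'D's, i + 1 for a single 'R' at position i."""
    if len(seq) != N or set(seq) - {"D", "R"}:
        raise ValueError("expected 41 symbols drawn from 'D' and 'R'")
    r_positions = [i for i, s in enumerate(seq) if s == "R"]
    if len(r_positions) > 1:
        raise ValueError("sequence lies outside the rejection region")
    return r_positions[0] + 1 if r_positions else 0


def decode(index: int) -> str:
    """Inverse of encode."""
    if index == 0:
        return "D" * N
    return "D" * (index - 1) + "R" + "D" * (N - index)


region_size = comb(N, 40) + comb(N, 41)   # 42 members
index_bits = (region_size - 1).bit_length()
print(f"{region_size} sequences in R -> {index_bits}-bit index instead of {N} bits")
```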

So Dembski's "specified complexity" is quite different from that of previous writers.  Regrettably, however, he persistently conflates the two meanings. In No Free Lunch, he repeats the following passage from Davies no less than four times:  "Living organisms are mysterious not for their complexity per se, but for their tightly specified complexity."  He also quotes from Orgel.  Yet he never mentions that his meaning of the term is quite different from theirs, leaving readers to assume that they are the same.

------------

REFERENCES

1. Richard Wein, "What's Wrong With The Design Inference" (retitled as "Wrongly Inferred Design"), Metaviews online forum, October 2000, http://www.metanexus.net/Magazine/tabid/68/id/2654/Default.aspx.

2. William Dembski, No Free Lunch: Why Specified Complexity Cannot be Purchased without Intelligence, Rowman & Littlefield, 2002.

3. Leslie Orgel, The Origins of Life, Chapman & Hall, 1973.

4. Paul Davies, The Fifth Miracle, The Penguin Press, 1998.

5. William Dembski, "Obsessively Criticized but Scarcely Refuted: A Response to Richard Wein", May 2002, http://www.designinference.com/documents/05.02.resp_to_wein.htm




Published 2002.09.11


©1997-2008 Metanexus Institute
www.metanexus.net