August 14, 1999
Dec. 11, 2010


In general, a sequence is an ordered list of elements. The sequences we work with represents the history or predicted occurrence of simple objects like the drops of rain falling on sheet of a paper. We only care about a few attributes of the drops such as its size or weight, and not its shape. We are primarily concerned about when each drop fell, and the temporal duration between when each drop fell, ie., the rate at which the drops fell.

Programming Sequences

Mathematically, we can write this concept of a sequence as a function. This function describes the relationship between the sequence itself and the rate of change of the sequence. I've always between impressed and mystified at why the formula below describing this elegant mathematical relationship works out so efficiently and beautifully.

Our abstract description of sequences is written as a list of elements in a set. Each element is a function of time dependent states. The state values for the nerve cells and the transition matrices representing rules for synapses are the fundamental objects in our computer code simulating neural processes. I think wrote the equation above to always remind myself that the internal workings of the software machinery must always carry state information along each step in time.

Now imagine a sequence like S = (aaadadaa). It's relatively short and simple containing eight elements. It's rate of change from a to d to a again is slow. I could give this rate of change a numerical value. From my experience in writing search algorithms, I would say that the prefix of the sequence is more important than the suffix, but only because I usually start my search process from the left side. Generally, I can remember this sequence by thinking there are 3 a's followed by "dad". Triple-A Dad. My focus on memorizing this sequence would be to cluster the a's and associate the sequence d-a-d with a familiar object. Association is another form of clustering.

In terms of an ordered structure, this binary sequence could look like the one below if you followed the left branch first traversal method.

          |           S = (aaadadaa)
         / \
        a   d
       /   / \       You can see the pattern cluster "aaa" 
      a   a   d      in the first left branch, and then "dad"
     /       /       in the right branch d off the root or first
    a       a        node.              / \
           /                           a   d

In pattern recognition algorithms using sequences you can often fit patterns which seem unique to you subjectively into branching structures. The methods of clustering data which appear non-random or outstanding to you are often times are the ones which optimizes entropy production either to the minimum or maximum.

Over the years I've found that learning patterns from sequences using the trail-and-error method works the best. Using simple algorithms for string manipulation recursively also works best for me. That is, use the same algorithms for processing symbols like the letters (a b c d ...) also works the same with words containing complex associations like (see jane run). It's the selection of simple algorithms that survive in the long term that's the most important thing.

I've also learned that it's easy to be fooled by simple hackery (*1). A decade ago I had too much blind faith in how much order one could extract from a notion like the reverse of entropy or negative entropy. Using recursive programming does not intrinsically lead to more complexity as I had intuitively though. You can see from recursive programming the beauty in the creation of fractal structures, in the leaves of the trees, the contours of the mountains, and the reflection of light on water, but there's no naturally occurring decrease of entropy in this.

Binary Sequences

Correlation in a temporal sequence was studied by Satosi Watanabe [1,2]. He called the measure of the strength of correlation in a temporal or ordered sequence the correlation index, W. W is the correlation in an ordered Markov sequence of a specific length. It may describe sub-sequences which may contain redundant information.

Satosi Watanabe studied the decomposition of a very simple binary sequence of level 4, that is, 2 to the 4th power.

Each of the four cells variables have an entropy of 1, since black and white appear with equal probabilities.

The entropy of the intersecting set of variables is log (1/16) or 4 log 2 or 4.

The total correlation is

= 1 + 1 + 1 + 1 - 4 = 0

The redundancy in the complete set of binary outcomes is zero.

From the above example, we can see more easily the general formula for the total correlation for a sequence of length r is


[1] Satosi Watanabe, Information Theoretical Analysis of Multivariate Correlation, 1960, IBM Journal of Research and Development 4-1:66
[PDF image]

[2] Satosi Watanabe, Information, 1972 article in Scientific Thought,
Mouton/Unesco 1972, Paris
[PDF image]

*1. Long ago, for awhile, I worked alot on programming cellular automata like L-systems. You could see visually the details of a plant simulation for example on your screen, and you thought there must be something to the information in the code being replicated. There seemed to be a mystical power in the code.

next: Circuits