August 14, 1999
Dec. 11, 2010
In general, a sequence is an ordered list of elements. The
sequences we work with represents the history or predicted occurrence
of simple objects like the drops of rain falling on sheet of a paper.
We only care about a few attributes of the drops such as its size
or weight, and not its shape. We are primarily concerned about
when each drop fell, and the temporal duration between
when each drop fell, ie., the rate at which the drops fell.
Mathematically, we can write this concept of a sequence as
a function. This function describes the relationship between
the sequence itself and the rate of change of the sequence. I've
always between impressed and mystified at why the formula below
describing this elegant mathematical relationship works out so
efficiently and beautifully.
Our abstract description of sequences is written as a list of elements
in a set. Each element is a function of time dependent states. The
state values for the nerve cells and the transition matrices representing
rules for synapses are the fundamental objects in our computer code
simulating neural processes. I think wrote the equation above to always
remind myself that the internal workings of the software machinery
must always carry state information along each step in time.
Now imagine a sequence like S = (aaadadaa). It's relatively short and
simple containing eight elements. It's rate of change from a to d to a
again is slow. I could give this rate of change a numerical value. From
my experience in writing search algorithms, I would say that the prefix of
the sequence is more important than the suffix, but only because I usually
start my search process from the left side. Generally, I can remember this
sequence by thinking there are 3 a's followed by "dad". Triple-A Dad.
My focus on memorizing this sequence would be to cluster the a's and
associate the sequence d-a-d with a familiar object. Association is another
form of clustering.
In terms of an ordered structure, this binary sequence could look like
the one below if you followed the left branch first traversal method.
| S = (aaadadaa)
/ / \ You can see the pattern cluster "aaa"
a a d in the first left branch, and then "dad"
/ / in the right branch d off the root or first
a a node. / \
/ a d
In pattern recognition algorithms using sequences you can often
fit patterns which seem unique to you subjectively into branching
structures. The methods of clustering data which appear non-random
or outstanding to you are often times are the ones which optimizes
entropy production either to the minimum or maximum.
Over the years I've found that learning patterns from sequences
using the trail-and-error method works the best. Using simple
algorithms for string manipulation recursively also works best for me.
That is, use the same algorithms for processing symbols like the letters
(a b c d ...) also works the same with words containing complex
associations like (see jane run). It's the selection of simple
algorithms that survive in the long term that's the most important
I've also learned that it's easy to be fooled by simple hackery (*1).
A decade ago I had too much blind faith in how much order one could
extract from a notion like the reverse of entropy or negative entropy.
Using recursive programming does not intrinsically lead to more
complexity as I had intuitively though. You can see from recursive
programming the beauty in the creation of fractal structures, in the
leaves of the trees, the contours of the mountains, and the reflection
of light on water, but there's no naturally occurring decrease of entropy
Correlation in a temporal sequence was studied by Satosi Watanabe [1,2].
He called the measure of the strength of correlation in a temporal or
ordered sequence the correlation index, W. W is the correlation in
an ordered Markov sequence of a specific length. It may describe
sub-sequences which may contain redundant information.
Satosi Watanabe studied the decomposition of a very simple binary
sequence of level 4, that is, 2 to the 4th power.
Each of the four cells variables
have an entropy of 1, since black and white appear with equal probabilities.
The entropy of the intersecting set of variables
is log (1/16) or 4 log 2 or 4.
The total correlation is
= 1 + 1 + 1 + 1 - 4 = 0
The redundancy in the complete set of binary outcomes is zero.
From the above example, we can see more easily the general formula for
the total correlation for a sequence of length r is
 Satosi Watanabe, Information Theoretical Analysis of Multivariate Correlation, 1960, IBM Journal of Research and Development 4-1:66
 Satosi Watanabe, Information, 1972 article in Scientific Thought,
Mouton/Unesco 1972, Paris
*1. Long ago, for awhile, I worked alot on programming cellular automata
like L-systems. You could see visually the details of a plant simulation
for example on your screen, and you thought there must be something to
the information in the code being replicated. There seemed to be a
mystical power in the code.