22.6.1988, submitted for the IEEE special issue on neural computing

NEURAL NETWORKS AS MEASUREMENT TYPE COMPUTERS
some theoretical reasons for non-algorithmic information processing

Ilkka Tuomi
Nokia Research Center Finland P.O.Box 780, 00101 Helsinki

ABSTRACT
There are theoretical limits for computers which can not be overcome without reconsidering the current ideas of information and its processing. In this article it is argued that some difficult questions in artificial intelligence can only be solved if seen as problems on dynamical systems theory. The natural way to implement information processing systems for these difficult tasks is in neural networks, constructed from analog elements. The starting problem of a Turing machine is introduced, and assumptions underlying most neural network simulations are analyzed in the context of modern statistical physics, algorithmic information theory, and syntactical systems. It is shown that typical neural network simulations use only an extremely limited amount of the processing power available to non-linear analog networks of artificial neurons.

1. Introduction: Problems in artificial intelligence

The most commonly discussed problem in artificial intelligence is the combinatorial explosion. If we have finite space and time, we can, in general, solve only problems which "size" grows polynomially with space and time [11]. The reason that this problem is widely recognized is that most simple strategies in artificial intelligence (AI) produce algorithms that explode with the size of the problem. Exhaustive search of possible solutions is a typical example.

Also commonly recognized problem in AI is the context dependency of natural language [2]. In other areas in AI this is seen as a problem of defining "frames" [26] or "scripts" [36]. In feature recognition tasks essentially the same problem arises, as an example, in making the difference between the figure and the ground. In this article, context-independece is taken to mean that the value, operation, or meaning of an entity can be defined with references to only its well defined neighbors. Such context-independence implicates, then, that the object under discussion is local, in some practically describable way.

From more logical and theoretical perspectives, the less discussed problems of decidability, or computability, seem to have an effect on the interpretation of the status, and future, of artificial intelligence. Especially, these perspectives are interesting in the context of artificial neural networks, recent developments in theoretical physics, and algorithmic information theory. It seems that a fresh look into the old questions of computability reveal severe limitations in conventional approaches to artificial intel¬ligence.

As a result, it appears that ideas presented in the 40's by the developers of the digital computer, especially Wiener and von Neumann, should be re-evaluated. It may be that we should take some lessons from history, and continue the discussion from the point it reached 40 years ago.

I will argue that von Neumann computers and conventional parallel computers are realizations of the same basic concepts of computing. I will also try to show that these concepts of computing allow us only a very limited subset of information processing methods that we need for artificial intelligence. To have intelligence equal to a fly, or a man, we need new concepts for computing and computers. These concepts are naturally implemented in artifi¬cial neural networks. At the same time, it will be argued that most current research on artificial neural networks is also based on untenable ideas of information processing.

I will be discussing algorithmic information processing, so it may be worth remembering that computers were developed for numerical work. There were especially one practical problem, solving partial differential equations, which dictated many important decisions on computer design. Both von Neumann and Wiener quite explicitly discuss this question in their writings [46], [47]. After four decades we don't always remember that our computer architectures are based on the idea that its main purpose is to take initial values as input, iterate great many small steps guided by exactly given differential form, and finally reach the result without losing any digits we don't want to lose.

"The ideal computing machine must then have all its data inserted at the beginning, and must be as free as possible from human interference to the very end," writes Wiener. And he continues:

"This means that not only must the numerical data be inserted at the beginning, but also all the rules for combining them, in the form of instructions covering every situation which may arise in the course of the computation. Thus the computing machine must be a logical machine as well as an arithmetic machine and must combine contingencies in accordance with a systematic algorithm." [47] p.118.

This citation should not be taken to indicate that Wiener would have been unaware of the limitations of algorithmic computers. This is not the case. Only later generations have been able to see the incomparable success of computers to the extent that they have become blinded with it. The idea of computer as a Turing machine, able to calculate everything, is more or less an indication of the phenomenal success of IBM in the 60's.

It is true that a digital computer is equivalent to a universal Turing machine. As discussed below, however, computations allowed by a Turing machine are only a subset of those needed in artificial intelligence.

2. The promise of distributed processing

It is easy to view the von Neumann machine as an informa¬tion channel. It has an input, a processing unit, an output, and some memory to store the details of data dependent "transmis¬sion equations." Typically, artificial intelligence implements this approach to attack its potential problems. For example, it is quite common to see the problem of visual pattern recognition as a problem of seeing the intensity changes of light (initial values), analyzing these intensities (transfor¬mation), and finally producing the output as some kind of "conclusions," "features" or "classifications."

The recent enthusiasm in artificial neural networks can be easily understood in this context. Many workers in artifi¬cial intelligence are frustrated after finding most of the most interesting problems too difficult to solve with conventional methods [24]. The strategy, then, is to divide difficult problems into many small subproblems. If you have potentially thousands or billions of simple "neural processors" these subproblems will be small and easily solv­able.

Neural networks are usually defined as networks of very many simple "neuron-like" elements, which, in spite of their simplicity, as by miraculous emergence can calculate complex functions. And, as if this would not be good enough, at the same time we can get rid of the problem of knowledge representation. If we have neural networks which create their own representations by learning, and distribute those representations in their hidden network structures, the problem of knowledge representation becomes so untractable that not many will be willing to talk about it.

However, it should be remembered that most artificial neural nets are based on elements that are essen¬tially, or exactly, McCulloch-Pitts model neurons, developed in the beginning of the 40's. These models take "neurons" to be binary logical elements (see e.g [25], [21]). This is most certainly an extremely trivializing assumption. There are quite probably more than 30 different transmitters in the brain [19], at least in simple invertebrates neurons are different enough to allow unique naming [20], and the electrical behavior of a single neuron is far from a simple threshold element [37]. Neural models which would use the current knowledge of real biological neurons are rare.

Information in an open system

It is easy, or at least usually possible, to create a toy problem in AI, and solve it using algorithmic methods. As soon as we want our system to function also in an undefined environment, we get frustrated. As Dreyfus has noticed, historically in AI there has been a tendency to define "progress very carefully as 'displacement toward the ultimate goal.' According to this definition, the first man to climb a tree could claim tangible progress reaching the moon" [5]. This is the fundamental scaling problem of AI.

The scaling problem results from the fact that the real world is an open system. Conventional concepts of computing don't provide methods to efficiently deal with open systems.

For example, think about a typical task in robotics. A mobile robot is searching a path through a building. It tries different possibilities, guided by heuristics in its knowledge base, and updates its database according to its findings. We know that this problem is unsolvable in general. If our heuristics are not powerful enough to cut, on average, all the branches in the robots search tree except one, the size of the problem will explode exponentially when the building gets bigger (Hogg et al. (1987)). However, this is only the classical side of the problem. In real situations we have two additional complications.

The first complication is that the world changes. If there is an unfriendly person in the robot laboratory, he can open a door after the robot has decided that it is closed. This is normally seen as a change in the data base. Equally well it can be seen as a change in the axiom system underlying the algorithm in question during the proof. It is not fair to change the rules of a game during the play, of course. In real world, however, situations, not only data but also the difference between data and noise, change faster than we can know. If we admit this, we are in trouble, so normally we don't.

A popular way to react to open systems is to develop non-monotonic logics (e.g. [10], [2]). In non-monotonic logics new assertions can invalidate prior conclusions. The assumption behind various "truth maintenance systems" needed for these logics, is that adding a new assertion to the system doesn't create avalances of exªtruths needing to be updated. Below it will be shown that this assumption is not valid in general.

The second complication is that it is not possible to exactly define the attributes in an open world. For example, a robot encountering a partially opened door will have to decide whether that counts as an open door for its purposes. This depends on the particular task at hand. The problem can be to see into the room or to get there. If the door is only partially open, getting into the room depends also on the size of the robot [16]. These attributes are exact, in the sense of typical logics, but only after we know the context.

Information can be defined in traditional information theory only if the possible messages communicated through the channel can be defined. If the difference between the noise and the signal is created only after interpretation, there is no way to calculate the amount of information trans¬mitted. This corresponds to the situation in classical logic, where axioms and syntax have to be formalized before theorems can be proved. Truths are true always, so there can be no time dependency in the frozen logical universe. Temporal logics deal with this problem by slicing the time into such small intervals that the logical universe becomes an enumerable succession of frozen universes. The collective effect of the needed infinitely many discontinous switchings from one point of time to the next, are taken to be nil, without any clear reasons.

In open systems the results of information processing arise from co-operative effects, or interactions of information flows. In statistical physics, open systems lead to discussions of the rate of production of entropy [29]. In AI this problem is approached from different directions, and leads, for example, to fuzzy logic [49]. Although information and entropy are closely connected concepts, the classical Shannonian information is weaker in the sense that it is inadequate to describe open systems.

Normal computing analogies are necessary misleading when we talk about open systems. The underlying difficulty is that we have to be able to define the program, the alphabet, and the transition rules of our Turing machine, before we can start the computation. If the definition takes infinite amount of time or space, or more than there exists in our universe, we will never start the machine. On the other hand, if the program or the initial data changes faster than we can describe it, we will never have the information needed for computation. This could be called the starting problem of a Turing machine.

3. Computer as a syntactic information processor

If we consider parallel computer architectures, we realize they are not practically viewed as simple informa¬tion channels. There are clearly two essentially different types of information around. We have some initial values, or flow of input data, that is distributed to the inputs of all processors. However, in addition there is very much processing of control information. This control information is used to coordinate and synchronize the numerous processors. As the number of processors increases, the work needed to write algorithms that transform input to output decreases, but the algorithms of the operating system get increasingly complex [39].

It is easy to recognize parallel computer programm­ing is difficult in general. When the computer hardware can not be connected to model our problem, we have to make great efforts to program control code in the operating system. We can take, and maybe even should take this difficulty as an anomaly, a symptom of some basic deficiency of our underlying model of computation, or programming. On the other hand, we can reasonably ask if a system with hundreds or thousands of processors should be discussed in terms of Turing machines. Is there not a more natural metaphor for such a complex machine?

Von Neumann computers and conventional parallel computers are realizations of the same basic concepts of computing. These concepts of computing allow us only a very limited subset of information processing methods which we need for artificial intelligence. To see why this is so, it is instructive to consider problems that can be solved by a computer.

If we have a differential equation:

dx/dt = f(x)

we can solve it if some initial value is known, and if we can tell the computer what is the function . To get a value for x at a certain moment of time, we just take the initial value, add a small time-step multiplied with  f , and iterate until the result emerges.

Now, two problems arise. The first is: what types of functions we can approximate the initial values with a limited number of digits, without making too big errors in our calculation. This question is discussed later in this article, in the contexts of deterministic chaos and algorithmic information theory. The second question is, what kind of functions we can describe to our computers.

It may seem, at first, that it is possible to write down an algorithm which describes any function we would like to specify. We certainly can write down functions which consist of limited number of additions, multiplications, and divisions. We quite easily also can use all elementary functions, be they trigonometric, logarithmic, or exponential. Even if we don't have a closed, algorithmic form for our function, we can use table lookup to find the needed values. As we have limited space and time, however, the table necessary is inaccurate.

So, we get the feeling that our computer is quite a practical and powerful machine. However, differential equation is a very special relation. When it is valid, it means that the rate at which the variable changes at some point , depends only on the value of the function in the immediate neighborhood of . Rosen calls these kinds of equations "purely syntactic objects" [34].

This amounts to saying that differential equa¬tions can be calculated because they are context-independent. Differential equations are local relations. We can compute them because, by definition, their values can be calculated without any reference to the values of the function in other points of space. In practice, of course, our computer is a discrete machine that needs values of the function in the immediate well defined neighborhood of the point in question.

To simulate an equation with a computer, we have to be able to write a string of symbols which describe the equation; its initial values and representation of the functional dependencies. If we want to manipulate a string of symbols with an algorithm, it has to be syntactic. There has to be a definite, describable way to declare the transforma¬tions which lead from the initial values to the result. We can not represent non-local effects without creating com¬binatorial explosion. This means that we can not represent non-local relations in general. It can be shown that most systems are not simple syntactic systems. For example, a general vector field cannot be described to a Turing machine in a systematic way [34]. When we compute these kinds of objects, we deform our original problem to a form where it is purely syntactic. At the same time, we lose all those phenomena that cannot be approximated this way.

If we consider information processing machines as dynamical systems, there is not much difference between multiprocessing machines and single proces¬sor von Neumann machines. They can simulate with a program only simple syntactic systems. The practical point is that von Neumann machines are simpler to program with the existing methods. While parallel transputer-systems or highly connected machines with more than 64 000 processors are valuable for, as an example, simulations of neural networks at some algorithmic and syntactic level, they can not solve any of the difficult problems in AI [27]. Any finite number of processors is inadequate if the processors are to be used to process initial data with an algorithm [11], and if the simulated system is not a syntactic object.

Digital computers process only syntactic objects

The feeling of computational power originates from the fact that all digital "information processing" is concerned with syntactical objects. Computer is typically used to manipulate and compare character strings or, what is essentially the same thing, to compute simple calculations or differential equations. In artificial intelligence we are not trying to calculate any special differential equation, given the initial values. We are trying to simulate or imitate the intelligent functions of man, which amounts to simulating the whole intelligent system on at least some level ( see e.g. [28], [36], [2], [26]). There seems to be no a priori reason to believe that these systems are syntactic and possible to simulate with an algorithmic program. In this respect we don't have to make many philosophi¬cal assumptions concerning AI. The situation is same if we are simulating a man or a fly.

Computers are machines that were developed to calculate differential equations. The computer theoreticians of the 40's were the best mathematicians in their time, and the assumptions underlying their suggestions were the best ones possible. The remarkable thing is that, as an example, Wiener takes seriously profoundly alternative assumptions, which only four decades later seem to receive more general support. Also von Neumann clearly recognizes the limits of the algorithmic method. When discussing the problem of recognizing different types of triangles as triangles he remarks:

"Now it is perfectly possible that the simplest and only practical way actually to say what constitutes a visual analogy consist in giving a description of the connections of the visual brain. We are dealing here with parts of logic with which we have practically no past experience. ... We have no right to assume that the logical notations and procedures used in the past are suited to this part of the subject. It is not at all that certain that in this domain a real object might not constitute the simplest description of itself, that is, any attempt to describe it by the usual literary or formal-logical method may lead to something less manageable and more involved.
... There is an equivalence between logical principles and their embodiment in a neural network, and while in the simpler cases the principles might furnish a simplified expression of the network, it is quite possible that in cases of extreme complexity the reverse is true" [45].

What von Neumann actually here says is that processing information in a neural network can locally create information. This means that an algorithm can not be used to simulate real neural networks. If we have a set of axioms, and some way to encode those axioms into symbols, the informa¬tion content of this logical system can not exceed the total informa¬tion in our string of axioms. Logical proof is an exactly defined set of transformations, and information created in this kind of syntactic logical system is zero. In proving new theorems we produce new ways to encode the original information con¬tained in our axioms. If the logical system is independent of context, all these codings are equivalent. This corresponds to classical deterministic dynamical system. If its state is known at some point of time, its future and past will be uniquely determined. The only information in a deterministic system is in its initial values. However, as will be discussed below, the amount of information embedded in initial values quite often is infinite. This is another reason that makes the use of algorithms impossible.

An algorithm is, at best, a perfect information channel. It can preserve information of its initial values, or act as an information sink. As it never, in the context of Turing machines and classical information theory, can act as an information source, it also is unable to simulate systems with information sources.

4. Systems in phase space

It has been known for some years now that extremely simple real systems are beyond our computers capac¬ity. Those problems for which we can have a solution by knowing their initial values, are extremely rare [42], [38]. If a dynamical system dissipates energy, if its description is on collective level or if it has more than two degrees of freedom, we have good possibilities to be thrown out of the area of classical mathematics and science. Differential equations won't help us. Algorithms won't help us. Traditional logic won't help us.

That this is the case, can be seen from many directions. On the most fundamental level, the underlying idea is based on the theory of deterministic chaos [7], [13], [32]. To see how the recent results in the theory of dynamical systems effect our view of computing, we need to discuss some properties of classical dynamical systems.

Deterministic chaos in dynamical systems

To completely define the position of a system of N classical physical particles in space, it is necessary to specify 3N co-ordinates. The number of independent quantities which must be specified in order to define uniquely the position of a system is called the number of degrees of freedom. If we have a system with s degrees of freedom and some s quantities which completely define the position of the system, these s quantities are called generalized co-ordinates of the system, and their time derivatives are called generalized velocities.

The classical physics is based on the idea that when we know the generalized co-ordinates and velocities, the dynamical system is completely defined. All higher derivatives, especially acceleration, are given as functions of the position and velocity. The relation between these quantities is called the equation of motion. Using this equation, we can calculate the future, as well as the history of the system with unlimited accuracy.

The classical way to write Newtonian dynamical equations is only one possible alternative. Dynamical systems can more generally be formulated in terms of Lagrangian or Hamiltonian functions [22]. This formalism has the advantage that it can, with small modifications, also be applied for other physical formalisms, e.g. statistical physics and quantum mechanics.

We don't need to go into details of these formalisms here. It suffices to realize that the only thing we need to describe a dynamical system is the knowledge of its generalized position co-ordinates and their first time derivatives. If we have N particles, and a co-ordinate system with 6N generalized co-ordinates, each point in this co-ordinate system completely specifies the state of the system.

This co-ordinate system is called the phase space. If we have our system in a certain initial state, it can be visualized as a single point in this 6N-dimensional phase space. When the system evolves in time it goes through succeeding points in the phase space, defining a trajectory in this space.

There are simple classifications of the behavior of a system in phase space. Typically the system will either be attracted by some point in the phase space or its trajectory will asymptotically evolve into a limit cycle. The future of the system will depend on its initial state. Each asymptotical¬ly stable point or cycle in the phase space will have a base of attraction, and if the system is in this base, it eventually will find itself in the close vicinity of its attractor [15].

The classical dream of determinism can be viewed as saying that two trajectories can never cross in phase space. If two systems are in the same state at a certain moment of time, they necessary have the same history and future.

When we want to predict future, in principle we have to know exactly the point where our system is at some defined moment of time. The interesting question is, how much our predictions will fail when our knowledge of the initial values is not perfect. The interesting answer is that model examples of classical physics are such that their trajectories converge to point attractors and limit cycles. If we make approximations in initial values, the effect of these ap¬proximations decrease in time [15].

On the other hand, the emphasis here is on the fact that only model examples of classical physics behave this way. Almost all dynamical systems are able to behave chaotically. A typical simple example of a potentially chaotic system is a pendulum in more than one dimension [40]. Two neighboring points in phase space will in most cases lead to exponentially diverging trajectories. It means that whatever finite amount of information we have of the initial state of the system, it will be inadequate to predict its future.

Under these conditions the initial value problem will be uninteresting. We can never have any physical system which could simulate by calculation the future of any chaotic physical system in this classical sense. As far as we are dealing with extremely simple text book systems, we can believe that our methods are adequate. As soon as we realize there is something wrong with the method, per se, we see that most of it is wrong. Mathematically speaking, the volume covered by classical trajectories in phase space is of measure zero.

Theoretically, then, there are no a priori reasons to believe that we can describe natural information processing systems in the classical way. It is not necessary so that the system is in a state, where the effects of imperfect initial information diminish with time. The classical Newtonian description is, as it typically leads to chaos, in most cases self-contradictory. If we want to have a theory of artificial intelligence it most probably has to be based on different philosophical assumptions.

It is not clear what kind of assumptions we need. The first natural alternative to Newtonian description could be a statisti¬cal description. If we can not follow the exact trajectories of real physical systems with any artificial machines, maybe we could still say something about the general properties of the different areas in their phase space. This is a new and very difficult question. As far as I know, there are only some speculations of possible answers. It is possible that the traditional AI assumptions have to be replaced with quite unconventional approaches (e.g. [35], [39], [3], [9]).

5. Computable numbers

It may be, at first, a little difficult to see the connec¬tion between classical Newtonian particles and computer algorithms, or artificial intelligence in general. To clarify the situation, I will rephrase the same fundamental connection in a more conventional context.

Half a century ago, Turing published his paper "On computable numbers, with an application to the Entscheidungsproblem" [44]. In that remarkable paper Turing constructs a universal Turing machine that can simulate any other Turing machines. By using the method of Cantor's he constructs an incomputable real, from which he deduces the unsolvability of the halting problem. As a corollary, he gets a form of Gödel's incompleteness theorem.

This means, at least in principle, that for half a century we have known there are correct results of computations which can not be calculated with a Turing machine. The recent results of Gregory Chaitin show that these numbers are not exceptional. In summary, the reals between zero and one constitute an interval of length one, and the subset that are computable can be covered by intervals whose total length is arbitrarily small. The probability of picking up a computable real is zero. And, what is more disturbing; there are well defined simple mathematical questions, for which answers fluctuate in a completely unpredictable manner. The truth or falsity of the assertion that there are infinitely many solutions to certain equations is indistinguishable from the result of independent tosses of a fair coin [1].

As Chaitin puts it: "Such questions escape the power of mathematical reasoning. This is a region in which mathematical truth has no discernible structure or pattern and appears to be completely random."

Our universal Turing machine appears to be not so universal after all.

If the structure of algebra is random, as it is according to Chaitin's results, it also means that generally valid technical realizations of non-monotonic logics are impossible. Each time we add a new assertion to a theory the truth of all previous theorems fluctuate in a completely random manner. The only way to find out the truth value of a well formed formula is by deducing it from the proper axioms. This is at least as complex a task as an exhaustive search. Such non-monotonic logics could be said to be chaotically sensitive to their initial values, or proper axioms.

If we think our logical theorems as points in the high dimensional space of well formed sentences, we see that each specific proof can be represented as a path in this space. A set of axioms generates the co-ordinate axes of this space. The assumption of completeness of the axiom system can be interpreted as saying that all points in this space that are true can be reached using paths starting from any single axiom point. That we can not reach all points in this "truth space" we have known since Gá”ádel. That the volume we can reach from any single point in this space seems to be zero simply indicates that the ergodic assumption, fundamental to statistical physics [33], [32], is invalid.

6. Dynamical information processing

As the necessary assumption of insensitivity to errors in initial values seem to be difficult to realize in any physical system, be it a system of billiard balls or neurons, we should think that assumption more carefully. Also, it seems that to simulate non-syntactic objects we need new kind of computers. Several questions arise. How can we describe physical systems? On what level we can predict their behavior? What could it mean to simulate perceptual or cognitive faculties?

To use the original Wiener-Caldwell terminology, we have two types of computers. The machines we now call digital computers are devices which compute by calculation. They can be analyzed in terms of Turing machines, and can simulate all logical machines which can be described with an algorithm. The other type of machines are computers which compute by measurement. More often they are called analogy machines. Wind tunnels are the classical examples of measure¬ment type computers. The reason we don't typically use analogy computers is that differential equations can be solved more accurately with digital computers. The signal to noise ratio of a digital computer exceeds the signal to noise ratios reachable in analogy machines of the 40's [12], if the problem can be described purely syntactically.

7. Neural networks as non-distributed information processors

It is quite easy to find articles on highly connected information processing networks that are based on the idea of neural networks as complex calculating systems (e.g. [8]). They are continuing in the tradition of symbolic and algorithmic information processing. A typical example definition of neural networks sounds like a parallel version of cognitive science:

"A new form of computational model involving large numbers of highly interconnected parallel computational units, a form that takes into account constraints from studies of the nervous system, constraints from psychological data, and a deep understanding of computational requirements." [30]

The trick that is commonly used in network simulations is that the time scale is divided in as many small steps as is needed to make the system syntactic. For example, Peretto and Niez develop an asyncronous model for neural networks which seems to be powerful enough to deal with non-sym¬metrical connections between neurons [31]. The device they employ is discretization of time, until in one time step only one neuron can change its state. This means that transitions happen only to neighboring points in the phase space of the system. This is the already familiar local and context independent system. In statistical physics it is called Markovian process. It is easily simulated in computers as it already is a discrete version of differential equa¬tions. Markov processes have only memory of the immediately preceding state, and so can not sustain oscillations; a result which also Peretto and Niez prove. What is more important, however, is that there is no self-organization, or complex collective phenomena in these systems. In effect, this is a model of neural networks in which no real distributed processing, or correlations between activities in distant parts of the system are allowed. The famous Little-Hopfield model is even simpler, as it needs symmetric connections between neurons [23], [17]. The reason for this symmetry is that valid energy functions are dependent only of the state of the system, and not of its history. To make energy of some stationary state independent of the path, along which the system entered its stationary state, evolution of the system has to be described with exact differentials. They guarantee that partial differentiations can be carried out in any order, locally, and independent of context. They also guarantee that the system can be simulated with a counting computer.

8. Conclusions

The main point of this article has been that the search for "the computational processes of brain" is most probably futile. It is not an indication of power to have neural networks which are equivalent to universal Turing machines. Power in neural networks, artificial or biological, comes from their dynamic properties. In practice this means that most powerful implementations of artificial neural networks will probably be analog devices.

All conventional algorithmic methods, as far as I can see, and also most work done in neural networks, result from the idea that we can "freeze" the world. I don't believe that this approach will be very general. It seems quite unprobable that, for example, associative memory would equal to teaching and finding global minima of some energy or cost functions. If the system really is dynamical, self-organizing system, it can't have well defined minima, at least of any function which is based on the idea that the system is static. In this context, also the work done on probabilistic and fuzzy logic can be seen as conservative, although natural and interesting, attempts to rescue the underlying tradition.

In typical neural network models the input information is divided among multiple processors which independently compute with their bits of information. The division to independent, i.e. context indepen¬dent, subtasks is done because we have to do it, if we want to see neural networks as implementations of the universal counting computer. If we would see a neural network as complex measurement type computer, it would be natural to discuss , for example, the effects of correlations of its activity. Only then we enter the realm of collective distributed processing. What we lose in that point, is our ability to simulate the network with a Turing machine. This may be frustrating, especially if we in the old Leibnizian tradition believe that explanation equals to different encodings of the original truth.

On the other hand, nothing deprives us from the possibility to model neural networks with measuring computers. It is quite probable that we can build artifacts that behave as biological neural networks do. The hard thing to digest is that we can build machines that work, but whose simplest complete description is the details of their construction. This, of course, contrasts strongly with conventional, symbolic or algorithmic approaches to AI. We have all around evidence that we can not formally define all our machines. A wind tunnel is a single example. That these examples are not often discussed in artificial intelligence literature is maybe an indication of the motivational basis of the Leibnizian tradi¬tion. The success of the research program or paradigm of Leibnizian method depends on people, who, like Turing, have an interest to find the order behind the apparently chaotic world.

If losing compressed descriptions of functioning of some technical devices seems like a dead end, it maybe is only an indication of the level on which re-evaluation is needed. It has not been typical to have to read about Newtonian mechanics in order to be involved or understand work in artificial intelligence. If you read articles on artificial neural networks this is much more common. It is also possible that you have to study divergences of path integrals in curved four-dimensional phase spaces [43]. We already have optical computers and at least speculation of biocomputers around. We have to face the fact that they don't speak Lisp.

It is quite clear that the information processing capability of biological neural networks is a result of co-operative phenomena. When some "driving forces" change the state of the system, new types and measures of order appear. As some parameters of the system are changed, at certain values of the parameters, the behavior and proper¬ties of the system qualitatively change. It is a basic result of modern statistical physics that in these points, called non-equi¬librium phase transition points, the system behaves collective¬ly (see e.g. [13]). Very small effects create global changes in the system. I believe this is the mechanism that gives neural networks their processing power.

Biological neural networks are complex self-organizing systems [43]. Only recent developments in science have produced methods which can be used to qualitatively analyze such complex systems (e.g. [14], [42]). It is possible that even these methods are not adequate to simulate the system. What they show, however, is that the conceptual basis of Leibnizian logic, Turing machines, al­gorithmic computer programs, and cognition as computation is not easily defended in real life.

The important questions will be concerned with the number and types of attractors, the statistical properties of the phase spaces of networks, and high level learning strategies (e.g. [4], [48]). For those, who want practical applications in AI, I express my sympathy, but also my belief that much work still has to be done on quite theoretical level. If we have well specified, practical problems, call them toy problems, micro-worlds, or whatever, [6] the existing methods can be more adequate.

If there is noise in the system, this noise can be seen as generating causal relations from the future to the present. We can not find any causal description which could determine the exact course of fluctuations of the system. However, the result of these fluctuations is that the system can test its potential futures and to evolve into any of its metastabile states.

In a Turing machine, or in its parallel implementations in artificial neural networks, there are no intrinsic sources of noise or fluctuations. The only possible evolution of the system is toward its predefined metastabile states. This means that the only ordered states which emerge in the system are equivalent to those that emerge in real physical systems when they freeze. This simple crystallization differs qualitatively from the order which is created in living systems through fluctuations [41]. It shouldn't be expected that the functioning of real neural networks could be modeled with essentially dead systems.

I think we will, within the next couple of years, see implementations of information processing machines which only very remotely will be based on the ideas behind conventional computers. There is already fundamental work done on these questions, although not widely referenced in technical or artificial intelligence literature. It is possible that we find ways to model interacting dynamic systems in conventional computers. At this time I don't see how this practically could be accomplished, without a new, dynamical, view of information, and its processing.

9. References

[1] G.J. Chaitin, Incompleteness theorems for random reals. Adv. in Applied Math. 8, (1987) 119-146.

[2] E. Charniak and D. McDermott, Introduction to Artificial Intelligence. Reading, Mass.: Addison Wesley (1985) chapter 6.

[3] M. Conrad, Molecular computer design: a synthetic approach to brain theory. in: Real Brains, Artificial Minds. Casti, J.L., Karlqvist, A. (eds.), New York, Amsterdam, London: North-Holland Publishers (1987) 197-226.

[4] M. Cottrell, Stability and attractivity in associative memory networks. Biol. Cybern. 58, (1988) 129-139.

[5] H.L. Dreyfus, What Computers Can't Do: The Limits of Artificial Intelligence (revised edition). New York, Hagerstown, San Francisco, London: Harper & Row (1979), p. 100.

[6] H.L. Dreyfus, From micro-worlds to knowledge representation: AI at an impasse. in: Mind Design. Philosophy, Psychology, Artificial Intelligence, Haugeland, John (ed.), Cambridge, Massachusetts, London: A Bradford Book, the MIT Press (1981) 161-204.

[7] M.I. Feigenbaum, The universal metric properties of nonlinear transformations. J. Stat. Phys. 21, (1979) 669-706.

[8] J.A. Feldman, M.A. Fanty, N.H. Goddard and K.J. Lynne, Computing with structured connectionist networks. Comm. ACM, 31, no. 2 (Feb. 1988) 170-187.

[9] A.D. Fisher, W.L. Lippincott and J.N. Lee, Optical implementations of associative networks with versatile adaptive learning capabilities. Appl. Optics, 26, no.23 (1987) 5039-5054.

[10] R.A. Frost, Introduction to Knowledge Base Systems. London: Collins (1986) chapter 6.

[11] M.R. Garey and D.S. Johnson, A Guide to the Theory of NP-Completeness. San Francisco: W.H. Freeman (1979).

[12] H.H. Goldstine and J. von Neumann, On the principles of large scale computing machines (1946). in: John von Neumann, Collected Works, vol V., ed. A.H. Taub, Oxford, London, New York, Paris: Pergamon Press (1963).

[13] H. Haken, (ed.) Chaos and Order in Nature. Proceed­ings of the International Symposium on Synergetics at Schloss Elmau, Bavaria, April 27- May 2, 1981. Berlin, Heidel¬berg, New York: Springer-Verlag (1981).

[14] H. Haken, (ed.) Complex Systems - Opera¬tional Approac¬hes, in Neurobiology, Physics, and Computers. Proceedings of the International Symposium on Synergetics at Schloss Elmau, Bavaria, May 6-11, 1985. Berlin, Heidelberg, New York, Tokyo: Springer-Verlag (1985).

[15] R.H.G. Helleman, Self generated chaotic behavior in nonlinear mechanics. in: Fundamental Problems in Statistical Mechanics, vol 5; ed. E.G.D. Cohen, Amsterdam, New York: North-Holland Publishers (1980) 165-233.

[16] T. Hogg and B.A. Huberman, Artificial intelligence and large scale computation: a physics perspec¬tive. Phys. Rep. 156, No. 5, (1987) 227-310.

[17] J.J. Hopfield, Neural networks with emergentÜv  Ü collective computational abilities. Proc. Natl. Acad. Sci. USA, 79 (April 1982) 2554-2558.

[19] L.L. Iversen, The chemistry of brain. Sci. Am. (September, 1979) 118-129.

[20] E.R. Kandel, Small systems of neurons. Sci. Am. (September, 1979) 61-70.

[21] T. Kohonen, An introduction to neural computing. Neural Networks (January 1988) (preprint).

[22] L.D. Landau, and E.M. Lifshitz, Course of Theoretical Physics, vol 1., Mechanics, 3rd ed. Oxford, New York, Toronto, Sydney, Paris, Frankfurt: Pergamon Press (1976).

[23] W.A. Little, The existence of persistent states in the brain. Math. Biosci., vol. 19 (1974) 101-120.

[24] A.J. Maren, The IEEE 1st International Conference on Neural Networks - a first-hand account. SIGART Newsletter, Number 103 (January 1988) 27-29.

[25] M. Minsky, and S. Papert, Perceptrons, An Introduction to Computational Geometry. Cambridge, Mass., London: The MIT Press (1969).

[26] M. Minsky, The Society of Mind. London: Pan Books Ltd (1988).

[27] B. Nebel, Computational complexity of terminological reasoning in BACK. Artificial Intelligence, 34, (1988) 371-383.

[28] A. Newell and H.A. Simon, Computer science as empirical inquiry: symbols and search. Comm. ACM, 19 (March 1976) 113-126.

[29] G. Nicolis and I. Prigogine, Self-Organiza¬tion in Nonequilibrium Systems: From Dissipative Structures to Order through Fluctuations. New York, London, Sydney, Toronto: A Wiley-Interscience Publication, John Wiley & Sons (1977).

[30] D.A. Norman, Reflections on cognition and parallel distributed processing. in: Parallel Distributed Processing, Explorations in the Microstructure of Cognition, vol 2. eds. J.L. McClelland, D.E. Rumelhart and the PDP Research Group. Cambridge, Mass., London: The MIT Press (1986) 531-546, p.533.

[31] P. Peretto and J. J. Niez, Stochastic dynamics of neural networks. IEEE Trans. Syst. Man, Cybern., vol. SMC-16, No. 1 (January/February 1986) 73-83.

[32] I. Prigogine, From Being to Becoming: Time and Complexity in Physical Sciences. San Francisco: W.H. Freeman and Company (1980).

[33] L.E. Reichl, A Modern Course in Statistical Physics. Austin: Univ. of Texas Press (1980).

[34] R. Rosen, On the scope of syntactics in mathematics and science: the machine metaphor. in: Real Brains, Artificial Minds. J.L. Casti, A. Karlqvist, (eds.), New York, Amsterdam, London: North-Holland Publishers (1987) 1-23.

[35] O.E. Röessler, Endophysics. in: Real Brains, Artificial Minds. J.L. Casti and A. Karlqvist, (eds.), New York, Amsterdam, London: North-Holland Publishers (1987) 25©46.

[36] R.C. Schank and P.G. Childers, The Cognitive Computer: On Language, Learning and Artificial Intelligence. Reading, Mass: Addison-Wesley (1984).

[37] A.C. Scott, Neurophysics. New York, London, Sydney, Toronto: John Wiley & Sons (1977).

[38] R. Shaw, Modeling chaotic systems. in: Chaos and Order in Nature., ed. H. Haken, Proceedings of the International Symposium on Synergetics at Schloss Elmau, Bavaria, April 27- May 2, 1981. Berlin, Heidelberg, New York, 1981: Springer-Verlag (1981) 218-231.

[39] H. Shimizu and Y. Yamaguchi, Synergetic computer and holonics -information dynamics of a semantic computer. Physica Scripta, vol. 36 (1987) 970-985.

[40] D. Tritton, Chaos in the swing of a pen¬dulum. New Scientist (24 July, 1986) 37-40.

[41] I. Tuomi, On the origins of life. (in Finnish), Arkhimedes 34œ, Helsinki, (1982) 106-116.

[42] I. Tuomi, The man who discovered the synergetics, Ilkka Tuomi's interview with professor Hermann Haken. (in Finnish), Arkhimedes 38, Helsinki (1986) 31-38.

[43] I. Tuomi, On the statistical physics of neural networks. (in Finnish), University of Helsinki, Department of Theoretical Physics. Helsinki (1986).

[44] A.M. Turing, On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc. 42, (1937) 230-265.

[45] J. von Neumann, The general and logical theory of automata. in: Cerebral Mechanisms in Behaviour -The Hixon Symposium: John Wiley, (1951), reprinted in: John von Neumann, Collected Works, vol. V, 288-328 (op. cit.), p. 311.

[46] J. von Neumann, Col¬lected Works, vol. V. Oxford, London, New York, Paris: Pergamon Press (1963).

[47] N. Wiener, Cybernetics, or Control and Communication in the Animal and the Machine. 2nd ed. Cambridge, Mass.: The MIT Press (1975).

[48] G.V. Wilson, G.S. Pawley, On the stabi¬lity of the travelling salesman problem algorithm of Hopfield and Tank. Biol. Cybern. 58 (1988) 63-70.

[49] L.A. Zadeh, The role of fuzzy logic in the management of uncertainty in expert systems. in: Approximate Reasoning in Expert Systems. M.M. Gupta, A. Kandel, W. Bandler, and J.B. Kiszka (eds.), Amsterdam, New York, Oxford: North-Holland (1985) 3-31.