Polynomial Neural Networks

by Ivan Galkin, U.Mass Lowell
(Materials for UML 91.550 Data Mining course)

 

1. Artificial Neural Networks

See here for a short introduction to the theory of artificial neural networks and terminology.
 

2. Success Story: ModelQuest

Year         Product                                    Price      Availability
1994         AIM for DOS 1.1                            $200       discontinued
1994         AIM for Windows 2.0                        $1,000     discontinued
1995         ModelQuest (TM) / ModelQuest Prospector    $1,000     discontinued
1996, Oct.   ModelQuest Expert                          $4,000     discontinued
1997, Apr.   ModelQuest Expert 2.0 w/StatNet Expert     $6,000
1997, June   ModelQuest Miner                                      discontinued
1997, Oct.   ModelQuest Enterprise                      $60,000
1997, Nov.   ModelQuest MarketMiner                     $60,000

3. GMDH: Group Method of Data Handling

In the hope of capturing the complexity of a process, artificial neural networks attempt to decompose it into many simpler relationships, each described by the processing function of a single neuron. The processing function of the neurons is quite simple; it is the configuration of the network itself that requires much work to design and adjust to the training data. In 1961, Frank Rosenblatt identified the key weakness of neurocomputing as the lack of means for effectively selecting the structure and the weights of the hidden layer(s) of the perceptron. In 1968, when the backpropagation technique was not yet known, a technique called the Group Method of Data Handling (GMDH) was developed by the Ukrainian scientist Aleksey Ivakhnenko, who was working at the time on better prediction of fish populations in rivers.

Ivakhnenko made the neuron a more complex unit featuring a polynomial transfer function. The interconnections between layers of neurons were simplified, and an automatic algorithm for structure design and weight adjustment was developed.

3.1 Ivakhnenko Polynomial

The GMDH neuron has two inputs, and its output is a quadratic combination of the two inputs (six weights in total).

[Figure: GMDH neuron]

Output of the GMDH neuron (writing the six weights as w0, ..., w5 and the two inputs as x1, x2):

    y' = w0 + w1*x1 + w2*x2 + w3*x1*x2 + w4*x1^2 + w5*x2^2

Thus, the GMDH network builds up a polynomial (actually, a multinomial) combination of the input components.
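As a minimal illustrative sketch in Python (the names gmdh_neuron and w0 ... w5 are simply the notation used above, not taken from any particular GMDH package), a single GMDH neuron can be evaluated as:

    def gmdh_neuron(x1, x2, w):
        # w holds the six weights of the Ivakhnenko polynomial:
        # constant, two linear terms, the cross term, and two quadratic terms.
        w0, w1, w2, w3, w4, w5 = w
        return w0 + w1*x1 + w2*x2 + w3*x1*x2 + w4*x1**2 + w5*x2**2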

3.2 GMDH Neural Network

A typical GMDH network maps a vector input x to a scalar output y'. Each GMDH neuron has two inputs and one output, evaluated as described above.
 
[Figure: GMDH network]
This example GMDH network has four inputs (the components of the input vector x) and one output y', which is an estimate of the true function f(x) = y.

3.3 Evolution of GMDH Networks

GMDH networks are arranged in a simple feed-forward manner, as perceptrons are. However, GMDH networks are not fully interconnected. The neurons of the first layer are simply fan-out units distributing the input values to the first hidden layer. The output y' can be expressed as a polynomial of degree 2^(K-1), where K is the total number of layers in the network (counting the fan-out layer); for example, a fan-out layer followed by three processing layers (K = 4) yields a polynomial of degree 2^3 = 8 in the inputs.

3.4 Design and Adjustment of GMDH Networks

The GMDH network is developed by starting at the input layer and growing the network progressively towards the output layer, one layer at a time. Each new layer k starts with the maximum possible number of neurons, C(M(k-1), 2), where M(k-1) is the number of outputs of the preceding layer; the layer is then adjusted by trimming extraneous neurons and determining the weights, and is finally frozen. This is different from the backpropagation/counterpropagation techniques, where all of the layers may participate in the training process simultaneously.

The basic idea of GMDH adjustment is that each neuron tries to produce y at its output (i.e., the overall desired output of the network). In other words, each neuron of the polynomial network fits its output to the desired value y for each input vector x from the training set. This approximation is accomplished through linear regression.

The training set is used to guide the process of adjusting the six weights of each neuron in the layer under construction. Each example in the training set gives one linear equation in the six unknowns. The least-squares technique is then used to derive the best combination of the six weights (for each neuron! plenty of matrix algebra...).
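A minimal sketch of this step, assuming NumPy is available (the function name fit_gmdh_neuron is illustrative): each training example contributes one row of the six polynomial terms, and the weights follow from an ordinary least-squares solve.

    import numpy as np

    def fit_gmdh_neuron(x1, x2, y):
        # x1, x2, y are 1-D arrays over the training examples.
        # Design matrix: one row [1, x1, x2, x1*x2, x1^2, x2^2] per example.
        A = np.column_stack([np.ones_like(x1), x1, x2, x1*x2, x1**2, x2**2])
        # Least-squares solution for the six weights of this neuron.
        w, *_ = np.linalg.lstsq(A, y, rcond=None)
        return w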

Usually, the mean square error of y' differs enormously from one neuron to another. The next step in adjusting the layer is to eliminate the neurons whose error is unacceptably large. The definition of "unacceptably large" is left to the user, although certain heuristics exist to help select the threshold automatically. The elimination of "bad" neurons effectively limits the otherwise overwhelming combinatorial explosion of building all possible C(M(k-1), 2) configurations.

The process of building the network continues layer by layer until a stopping criterion is satisfied. Usually, the mean square error of the best-performing neuron decreases with each subsequent layer until an absolute minimum is reached; if further layers are added, the error of the best-performing neuron actually rises. After the last layer is determined, each of the preceding layers undergoes another round of trimming to exclude those neurons that do not contribute to the final output.
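The loop below is a simplified sketch of this layer-by-layer construction, using the same least-squares fit as in the earlier sketch. For brevity it keeps a fixed number of best neurons per layer instead of a user-chosen error threshold, and it omits the final backward trimming pass; both simplifications, and all names (neuron_terms, build_gmdh_network, keep), are illustrative assumptions rather than part of the original GMDH formulation.

    from itertools import combinations
    import numpy as np

    def neuron_terms(u, v):
        # The six polynomial terms of a GMDH neuron for input columns u, v.
        return np.column_stack([np.ones_like(u), u, v, u*v, u**2, v**2])

    def build_gmdh_network(X, y, keep=8, max_layers=10):
        Z, best_err = X, np.inf
        for _ in range(max_layers):
            candidates = []
            for i, j in combinations(range(Z.shape[1]), 2):   # all C(M, 2) input pairs
                A = neuron_terms(Z[:, i], Z[:, j])
                w, *_ = np.linalg.lstsq(A, y, rcond=None)     # fit the six weights
                out = A @ w
                candidates.append((np.mean((y - out) ** 2), out))
            candidates.sort(key=lambda c: c[0])               # rank neurons by mean square error
            layer_best = candidates[0][0]
            if layer_best >= best_err:                        # best neuron stopped improving:
                break                                         # stop adding layers
            best_err = layer_best
            # Outputs of the surviving neurons feed the next layer.
            Z = np.column_stack([out for _, out in candidates[:keep]])
        # Output of the best neuron of the last accepted layer, and its error.
        return Z[:, 0], best_err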

3.5 Use of Control Training Set for Post-Fitting

A technique of "global post-fitting" the GMDH network against another large block of training data was found useful for further refinement of the weights. After the final configuration of the network is obtained, the output y' can be expressed as a Ivakhnenko polynomial of degree 2(K-1) in the components of input vector x. This polynomial can be then adjusted directly on the control set of data. Each input example gives a single linear equation on coefficients, and after assembling a large number of these equations, the weights can be readjusted one more time.
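A rough sketch of this post-fitting step (representing the expanded polynomial as a list of term functions is purely an illustrative device, not how any particular GMDH package stores the model): the control set yields one linear equation per example, and the coefficients are re-estimated with the same kind of least-squares fit used during training.

    import numpy as np

    def post_fit(terms, X_control, y_control):
        # terms: list of functions, each mapping the input matrix X to one term
        # of the expanded Ivakhnenko polynomial, e.g. lambda X: X[:, 0]*X[:, 1]**2.
        A = np.column_stack([t(X_control) for t in terms])
        coeffs, *_ = np.linalg.lstsq(A, y_control, rcond=None)
        return coeffs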

4. Analysis of GMDH Approach vs. ANN

GMDH networks are different creatures. They are capable of organizing themselves in response to features of the data; they are inductive, self-organizing networks. To build an ANN, it is necessary to somehow infer a priori knowledge about the process to be modelled and translate it into the language of ANN architectures. The alternative is trial and error, and that is what makes the ANN technique less attractive.

The following comparison is taken from here.
 
Neural networks vs. GMDH (statistical learning) networks:

Data analysis
  Neural networks: universal approximator.
  GMDH networks: universal structure identifier.

Analytical model
  Neural networks: indirect approximation.
  GMDH networks: direct approximation.

Architecture
  Neural networks: preselected, unbounded network structure; experimental selection of an adequate architecture demands time and experience.
  GMDH networks: bounded network structure, evolved during the estimation process.

Network synthesis
  Neural networks: globally optimized fixed network structure.
  GMDH networks: adaptively synthesized structure.

A priori information
  Neural networks: cannot be used without transformation into the concepts of neural networks.
  GMDH networks: can be used directly to select the reference functions and criteria.

Self-organization
  Neural networks: deductive; subjective choice of the number of layers and the number of nodes.
  GMDH networks: inductive; the number of layers and of nodes is estimated by the minimum of an external criterion (an objective choice).

Parameter estimation
  Neural networks: in a recursive way; demands long samples.
  GMDH networks: estimation on the training set by means of maximum-likelihood techniques, selection on the testing set (which may be extremely short or noisy).

Optimization
  Neural networks: global search in a highly multimodal space; the result depends on the initial solution; tedious, requiring the user to set various algorithmic parameters by trial and error; a time-consuming technique.
  GMDH networks: the structure and the dependencies in the model are optimized simultaneously; not time-consuming; inappropriate parameters are automatically left out.

Access to result
  Neural networks: available transiently in a real-time environment.
  GMDH networks: usually stored and repeatedly accessible.

Initial knowledge
  Neural networks: requires knowledge about the theory of neural networks.
  GMDH networks: requires knowledge about the kind of task (criterion) and the class of system (linear, non-linear).

Convergence
  Neural networks: global convergence is difficult to guarantee.
  GMDH networks: a model of optimal complexity is found.

Computing
  Neural networks: suitable for implementation in hardware with parallel computation.
  GMDH networks: efficient on ordinary computers and also suited to massively parallel computation.

Features
  Neural networks: general-purpose, flexible, non-linear (especially linear) static or dynamic models.
  GMDH networks: general-purpose, flexible, linear or non-linear, static or dynamic, parametric or non-parametric models.


5. "GMDH Algorithms" vs. "Algorithms of GMDH Type"

The original GMDH technique is based on an inductive approach that reduces the required a priori information as much as possible. It uses (1) an external criterion to weed out "bad" neurons in the layers, and (2) another external criterion to stop adding layers. So-called deductive GMDH, or "GMDH-type", algorithms are also used, in which the number of layers is selected by a human expert and the element of self-organization is used only to eliminate neurons within individual layers. GMDH-type algorithms are implemented in AIM (ModelQuest) by AbTech, NeuroShell 2 by Ward Systems, ASPN-II by Barron Associates, and SelfOrganize! by DeltaDesign Software.