Polynomial Neural Networks

by Ivan Galkin, U.Mass Lowell
(Materials for UML 91.550 Data Mining course)

 

1. Artificial Neural Networks

See here for a short introduction to the theory of artificial neural networks and terminology.
 

2. Success Story: ModelQuest

Year         Product                                    Price      Availability
1994         AIM for DOS 1.1                            $200       discontinued
1994         AIM for Windows 2.0                        $1,000     discontinued
1995         ModelQuest (TM) / ModelQuest Prospector    $1,000     discontinued
1996, Oct.   ModelQuest Expert                          $4,000     discontinued
1997, Apr.   ModelQuest Expert 2.0 w/StatNet Expert     $6,000
1997, June   ModelQuest Miner                                      discontinued
1997, Oct.   ModelQuest Enterprise                      $60,000
1997, Nov.   ModelQuest MarketMiner                     $60,000

3. GMDH: Group Method of Data Handling

In the hope of capturing the complexity of a process, artificial neural networks attempt to decompose it into many simpler relationships, each described by the processing function of a single neuron. The processing function of the neurons is quite simple; it is the configuration of the network itself that requires much work to design and adjust to the training data. In 1961, Frank Rosenblatt identified the key weakness of neurocomputing as the lack of means for effectively selecting the structure and the weights of the hidden layer(s) of the perceptron. In 1968, when the backpropagation technique was not yet known, a technique called the Group Method of Data Handling (GMDH) was developed by the Ukrainian scientist Aleksey Ivakhnenko, who was working at the time on better prediction of fish populations in rivers.

Ivakhnenko made the neuron a more complex unit featuring a polynomial transfer function. The interconnections between layers of neurons were simplified, and an automatic algorithm for structure design and weight adjustment was developed.

3.1 Ivakhnenko Polynomial

The GMDH neuron has two inputs, and its output is a quadratic combination of the two inputs (six weights in total).

[Figure: GMDH neuron]

Output of the GMDH neuron (writing the six weights as w0, ..., w5 and the two inputs as x1, x2):

    y' = w0 + w1*x1 + w2*x2 + w3*x1*x2 + w4*x1^2 + w5*x2^2

Thus, the GMDH network builds up a polynomial (actually, a multinomial) combination of the input components.
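As a minimal illustrative sketch in Python (the names gmdh_neuron and w0 ... w5 are simply the notation used above, not taken from any particular GMDH package), a single GMDH neuron can be evaluated as:

    def gmdh_neuron(x1, x2, w):
        # w holds the six weights of the Ivakhnenko polynomial:
        # constant, two linear terms, the cross term, and two quadratic terms.
        w0, w1, w2, w3, w4, w5 = w
        return w0 + w1*x1 + w2*x2 + w3*x1*x2 + w4*x1**2 + w5*x2**2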

3.2 GMDH Neural Network

A typical GMDH network maps a vector input x to a scalar output y'. Each GMDH neuron has two inputs and one output, evaluated as described above.
 
[Figure: GMDH network]
This example GMDH network has four inputs (the components of the input vector x) and one output y', which is an estimate of the true function f(x) = y.

3.3 Evolution of GMDH Networks

GMDH networks are arranged in a simple feed-forward manner, as perceptrons are. However, GMDH networks are not fully interconnected. The neurons of the first layer are simply fan-out units distributing the input values to the first hidden layer. The output y' can be expressed as a polynomial of degree 2^(K-1), where K is the total number of layers in the network (counting the fan-out layer); for example, a fan-out layer followed by three processing layers (K = 4) yields a polynomial of degree 2^3 = 8 in the inputs.

3.4 Design and Adjustment of GMDH Networks

The GMDH network is developed by starting at the input layer and growing the network progressively towards the output layer, one layer at a time. Each new layer k starts with the maximum possible number of neurons, C(M(k-1), 2), where M(k-1) is the number of outputs of the preceding layer; the layer is then adjusted by trimming extraneous neurons and determining the weights, and is finally frozen. This is different from the backpropagation/counterpropagation techniques, where all of the layers may participate in the training process simultaneously.

The basic idea of GMDH adjustment is that each neuron tries to produce y at its output (i.e., the overall desired output of the network). In other words, each neuron of the polynomial network fits its output to the desired value y for each input vector x from the training set. This approximation is accomplished through linear regression.

The training set is used to guide the process of adjusting the six weights of each neuron in the layer under construction. Each example in the training set gives one linear equation in the six unknowns. The least-squares technique is then used to derive the best combination of the six weights (for each neuron! plenty of matrix algebra...).
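A minimal sketch of this step, assuming NumPy is available (the function name fit_gmdh_neuron is illustrative): each training example contributes one row of the six polynomial terms, and the weights follow from an ordinary least-squares solve.

    import numpy as np

    def fit_gmdh_neuron(x1, x2, y):
        # x1, x2, y are 1-D arrays over the training examples.
        # Design matrix: one row [1, x1, x2, x1*x2, x1^2, x2^2] per example.
        A = np.column_stack([np.ones_like(x1), x1, x2, x1*x2, x1**2, x2**2])
        # Least-squares solution for the six weights of this neuron.
        w, *_ = np.linalg.lstsq(A, y, rcond=None)
        return w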

Usually, the mean square error of y' differs enormously from one neuron to another. The next step in adjusting the layer is to eliminate the neurons whose error is unacceptably large. The definition of "unacceptably large" is left to the user, although certain heuristics exist to help select the threshold automatically. The elimination of "bad" neurons effectively limits the otherwise overwhelming combinatorial explosion of building all possible C(M(k-1), 2) configurations.

The process of building the network continues layer by layer until a stopping criterion is satisfied. Usually, the mean square error of the best-performing neuron decreases with each subsequent layer until an absolute minimum is reached; if further layers are added, the error of the best-performing neuron actually rises. After the last layer is determined, each of the preceding layers undergoes another round of trimming to exclude those neurons that do not contribute to the final output.
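The loop below is a simplified sketch of this layer-by-layer construction, using the same least-squares fit as in the earlier sketch. For brevity it keeps a fixed number of best neurons per layer instead of a user-chosen error threshold, and it omits the final backward trimming pass; both simplifications, and all names (neuron_terms, build_gmdh_network, keep), are illustrative assumptions rather than part of the original GMDH formulation.

    from itertools import combinations
    import numpy as np

    def neuron_terms(u, v):
        # The six polynomial terms of a GMDH neuron for input columns u, v.
        return np.column_stack([np.ones_like(u), u, v, u*v, u**2, v**2])

    def build_gmdh_network(X, y, keep=8, max_layers=10):
        Z, best_err = X, np.inf
        for _ in range(max_layers):
            candidates = []
            for i, j in combinations(range(Z.shape[1]), 2):   # all C(M, 2) input pairs
                A = neuron_terms(Z[:, i], Z[:, j])
                w, *_ = np.linalg.lstsq(A, y, rcond=None)     # fit the six weights
                out = A @ w
                candidates.append((np.mean((y - out) ** 2), out))
            candidates.sort(key=lambda c: c[0])               # rank neurons by mean square error
            layer_best = candidates[0][0]
            if layer_best >= best_err:                        # best neuron stopped improving:
                break                                         # stop adding layers
            best_err = layer_best
            # Outputs of the surviving neurons feed the next layer.
            Z = np.column_stack([out for _, out in candidates[:keep]])
        # Output of the best neuron of the last accepted layer, and its error.
        return Z[:, 0], best_err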

3.5 Use of Control Training Set for Post-Fitting

A technique of "global post-fitting" the GMDH network against another large block of training data was found useful for further refinement of the weights. After the final configuration of the network is obtained, the output y' can be expressed as a Ivakhnenko polynomial of degree 2(K-1) in the components of input vector x. This polynomial can be then adjusted directly on the control set of data. Each input example gives a single linear equation on coefficients, and after assembling a large number of these equations, the weights can be readjusted one more time.
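A rough sketch of this post-fitting step (representing the expanded polynomial as a list of term functions is purely an illustrative device, not how any particular GMDH package stores the model): the control set yields one linear equation per example, and the coefficients are re-estimated with the same kind of least-squares fit used during training.

    import numpy as np

    def post_fit(terms, X_control, y_control):
        # terms: list of functions, each mapping the input matrix X to one term
        # of the expanded Ivakhnenko polynomial, e.g. lambda X: X[:, 0]*X[:, 1]**2.
        A = np.column_stack([t(X_control) for t in terms])
        coeffs, *_ = np.linalg.lstsq(A, y_control, rcond=None)
        return coeffs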

4. Analysis of GMDH Approach vs. ANN

GMDH networks are different creatures. They are capable of organizing themselves in response to features of the data; they are inductive, self-organizing networks. To build an ANN, it is necessary to somehow infer a priori knowledge about the process to be modelled and translate it into the language of ANN architectures. The alternative is trial and error, and that is what makes the ANN technique less attractive.

The following comparison is taken from here.
 
Neural networks vs. GMDH (statistical learning) networks:

Data analysis
  Neural networks: universal approximator.
  GMDH networks: universal structure identifier.

Analytical model
  Neural networks: indirect approximation.
  GMDH networks: direct approximation.

Architecture
  Neural networks: preselected, unbounded network structure; experimental selection of an adequate architecture demands time and experience.
  GMDH networks: bounded network structure, evolved during the estimation process.

Network synthesis
  Neural networks: globally optimized fixed network structure.
  GMDH networks: adaptively synthesized structure.

A priori information
  Neural networks: cannot be used without transformation into the concepts of neural networks.
  GMDH networks: can be used directly to select the reference functions and criteria.

Self-organization
  Neural networks: deductive; subjective choice of the number of layers and the number of nodes.
  GMDH networks: inductive; the number of layers and of nodes is estimated by the minimum of an external criterion (an objective choice).

Parameter estimation
  Neural networks: in a recursive way; demands long samples.
  GMDH networks: estimation on the training set by means of maximum-likelihood techniques, selection on the testing set (which may be extremely short or noisy).

Optimization
  Neural networks: global search in a highly multimodal space; the result depends on the initial solution; tedious, requiring the user to set various algorithmic parameters by trial and error; a time-consuming technique.
  GMDH networks: the structure and the dependencies in the model are optimized simultaneously; not time-consuming; inappropriate parameters are automatically left out.

Access to result
  Neural networks: available transiently in a real-time environment.
  GMDH networks: usually stored and repeatedly accessible.

Initial knowledge
  Neural networks: requires knowledge about the theory of neural networks.
  GMDH networks: requires knowledge about the kind of task (criterion) and the class of system (linear, non-linear).

Convergence
  Neural networks: global convergence is difficult to guarantee.
  GMDH networks: a model of optimal complexity is found.

Computing
  Neural networks: suitable for implementation in hardware with parallel computation.
  GMDH networks: efficient on ordinary computers and also suited to massively parallel computation.

Features
  Neural networks: general-purpose, flexible, non-linear (especially linear) static or dynamic models.
  GMDH networks: general-purpose, flexible, linear or non-linear, static or dynamic, parametric or non-parametric models.


5. "GMDH Algorithms" vs. "Algorithms of GMDH Type"

The original GMDH technique is based on an inductive approach that reduces the required a priori information as much as possible. It uses (1) an external criterion to weed out "bad" neurons in the layers, and (2) another external criterion to stop adding layers. So-called deductive GMDH, or "GMDH-type", algorithms are also used, in which the number of layers is selected by a human expert and the element of self-organization is used only to eliminate neurons within individual layers. GMDH-type algorithms are implemented in AIM (ModelQuest) by AbTech, NeuroShell 2 by Ward Systems, ASPN-II by Barron Associates, and SelfOrganize! by DeltaDesign Software.