The mechanism by which the weights are updated is known as the training algorithm. Several training algorithms have been proposed in the literature; I will briefly describe those that are relevant to the purposes of my study. The algorithms described here apply to feed-forward networks. A NN is characterized as a feed-forward network "if it is possible to attach successive numbers to the inputs and to all of the hidden and output units such that each unit only receives connections from inputs or units having a smaller number". All of these algorithms use the gradient of the cost function to determine how to adjust the weights so as to minimize that cost function. The gradient is determined using a technique called back propagation, which involves performing computations backwards through the network. The weights are then adjusted in the direction of the negative gradient.
The back propagation training algorithm involves four stages:
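As a minimal sketch of what adjusting the weights in the direction of the negative gradient means, the snippet below applies the update w <- w - learning_rate * gradient to a toy cost function. The cost function, its minimum at (1.0, -2.0), the learning rate and the function names are all assumptions chosen for illustration; in a real network the gradient would come from back propagation.

```python
def gradient_descent_step(weights, gradient, learning_rate):
    """Move each weight a small step against its gradient component."""
    return [w - learning_rate * g for w, g in zip(weights, gradient)]


def cost(weights):
    # Hypothetical cost: a simple bowl with its minimum at (1.0, -2.0).
    return (weights[0] - 1.0) ** 2 + (weights[1] + 2.0) ** 2


def cost_gradient(weights):
    # Analytic gradient of the hypothetical cost above.
    return [2.0 * (weights[0] - 1.0), 2.0 * (weights[1] + 2.0)]


weights = [0.0, 0.0]
for _ in range(100):
    weights = gradient_descent_step(weights, cost_gradient(weights), learning_rate=0.1)
```

After the loop, the weights sit very close to the minimum of the toy cost, because every step reduces the remaining distance by the same factor.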
1. Initialization of weights
2. Feed forward
3. Back Propagation of errors
4. Update of the weights and biases
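The four stages can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a network with one hidden layer and a single output unit, sigmoid activations, a squared-error cost, a learning rate of 0.5, and the logical OR function as made-up training data. None of these choices come from the text above.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in, n_hidden, rng):
    # Stage 1: random initial weights; the last entry of each row is the bias.
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output


def feed_forward(hidden, output, x):
    # Stage 2: propagate the input through the hidden layer to the output unit.
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in hidden]
    y = sigmoid(sum(w * v for w, v in zip(output, h + [1.0])))
    return h, y


def train_pattern(hidden, output, x, target, lr):
    h, y = feed_forward(hidden, output, x)
    # Stage 3: back-propagate the error, output unit first, then hidden units.
    delta_out = (y - target) * y * (1.0 - y)
    delta_hid = [delta_out * output[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    # Stage 4: update weights and biases along the negative gradient.
    for j, v in enumerate(h + [1.0]):
        output[j] -= lr * delta_out * v
    for j, row in enumerate(hidden):
        for i, v in enumerate(x + [1.0]):
            row[i] -= lr * delta_hid[j] * v


def total_error(hidden, output, data):
    return sum(0.5 * (feed_forward(hidden, output, x)[1] - t) ** 2 for x, t in data)


rng = random.Random(0)
# Made-up training data: the logical OR of two inputs.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
hidden, output = init_network(2, 2, rng)
error_before = total_error(hidden, output, data)
for _ in range(2000):
    for x, t in data:
        train_pattern(hidden, output, x, t, lr=0.5)
error_after = total_error(hidden, output, data)
```

Comparing error_before with error_after shows whether the repeated cycles of stages 2 to 4 have reduced the cost on the training patterns.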
A three-layer neural network consists of an input layer, a hidden layer and an output layer, interconnected by modifiable weights that are represented by links between the layers. The feed-forward operation consists of presenting a pattern to the input units and passing (or feeding) the signals through the network in order to obtain the values of the output units (no cycles!).
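The feed-forward operation for such a three-layer network might look like the sketch below. The sigmoid activation, the 2-3-1 layout and every weight value are hypothetical and serve only to make the example concrete.

```python
import math


def layer_forward(weights, biases, inputs):
    """Compute sigmoid(weights . inputs + bias) for every unit of one layer."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]


def feed_forward(pattern, hidden_w, hidden_b, output_w, output_b):
    # Signals only move forward: input -> hidden -> output, with no cycles.
    hidden = layer_forward(hidden_w, hidden_b, pattern)
    return layer_forward(output_w, output_b, hidden)


# Hypothetical 2-3-1 network; the numbers carry no special meaning.
outputs = feed_forward(
    [0.7, -0.2],
    hidden_w=[[0.5, -0.3], [0.1, 0.8], [-0.6, 0.4]],
    hidden_b=[0.0, 0.1, -0.1],
    output_w=[[0.3, -0.2, 0.5]],
    output_b=[0.05],
)
```

Each row of a weight list holds the incoming connections of one unit, so the number of rows equals the number of units in that layer.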
Weights Adjustment
The power of NN models lies in the way their weights (inter-unit connection strengths) are adjusted. The procedure of adjusting the weights of a NN on a specific data set is referred to as training the network on that set (the training set). The basic idea behind training is that the network is adjusted so that it learns the patterns that lie in the training set. When the adjusted network is later applied to unseen data, it can generalize from the patterns it has learned, which gives us the ability to make inferences. In my case I will train NN models on one part of my time series (the training set) and measure their ability to generalize on the remaining part (the test set). The size of the test set is usually selected to be 40% of the available samples. Each sample consists of two parts, the input and the target (supervised learning). Initially the weights of the network are assigned random values (usually within [-1, 1]). Then the input part of the first sample is presented to the network. The network computes an output based on the values of its weights, the number of its layers, and the type and number of neurons per layer.
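The setup described above could be sketched as follows. The way samples are built here (a sliding window of past values as input and the next value as target) is an assumption, since the text does not say how the input and target parts are formed. The series itself is placeholder data and the function names are hypothetical.

```python
import random


def make_samples(series, window):
    """Pair `window` past values (input) with the next value (target)."""
    return [(series[i:i + window], series[i + window]) for i in range(len(series) - window)]


def split_series(samples, test_fraction=0.4):
    """Keep time order: the earliest samples train, the latest ones test."""
    n_test = int(round(len(samples) * test_fraction))
    cut = len(samples) - n_test
    return samples[:cut], samples[cut:]


def random_weights(n, rng, low=-1.0, high=1.0):
    """Draw n initial weights uniformly from [low, high]."""
    return [rng.uniform(low, high) for _ in range(n)]


series = [float(t) for t in range(13)]        # placeholder data, not a real series
samples = make_samples(series, window=3)      # 10 samples of (input, target)
train_set, test_set = split_series(samples)   # 6 training samples, 4 test samples
weights = random_weights(4, random.Random(42))
```

Splitting by position rather than at random keeps the test set strictly later in time than the training set, which is the usual precaution with time series.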
Learning rate
The larger the learning rate, the bigger the step taken at each weight update. If the learning rate is made too large, the algorithm will become unstable and will not converge to the minimum of the error function. If the learning rate is set too small, the algorithm will take a long time to converge. Methods suggested for adapting the learning rate are as follows.
(i) Start with a high learning rate and steadily decrease it. As training progresses, changes in the weight vector must become small in order to reduce oscillations or any divergence.
(ii) A simple suggestion is to increase the learning rate when performance improves and to decrease it when performance worsens.
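Both suggestions can be written as small helper functions. The decay constant and the increase and decrease factors below are illustrative assumptions, not values prescribed by the text, and the function names are hypothetical.

```python
def decayed_rate(initial_rate, epoch, decay=0.01):
    """Suggestion (i): start high and shrink steadily as the epochs pass."""
    return initial_rate / (1.0 + decay * epoch)


def adapted_rate(rate, previous_error, current_error, up=1.05, down=0.7):
    """Suggestion (ii): grow the rate after an improvement, cut it after a setback."""
    if current_error < previous_error:
        return rate * up
    return rate * down


rate = 0.1
rate = adapted_rate(rate, previous_error=0.50, current_error=0.40)  # error fell, rate grows
rate = adapted_rate(rate, previous_error=0.40, current_error=0.45)  # error rose, rate shrinks
```

Making the decrease factor stronger than the increase factor is a common design choice, since an overly large rate destabilizes training faster than an overly small one slows it down.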
Learning in Back Propagation
There are two types of learning.
(i) Sequential learning, or the per-pattern method
(ii) Batch learning, or the per-epoch method
In sequential learning a given input pattern is propagated forward, the error is determined and back propagated, and the weights are updated before the next pattern is presented.
In batch learning the weights are updated only after the entire set of training patterns has been presented to the network. Thus the weight update is performed only once per epoch.
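The difference between the two methods is easiest to see on a single linear unit with one weight. The sketch below assumes a squared-error cost, a learning rate of 0.05 and made-up patterns that follow target = 2 * x; the function names are hypothetical.

```python
def gradient(weight, x, target):
    # Gradient of 0.5 * (weight * x - target) ** 2 with respect to weight.
    return (weight * x - target) * x


def sequential_epoch(weight, data, lr):
    """Per-pattern learning: update right after each pattern."""
    for x, target in data:
        weight -= lr * gradient(weight, x, target)
    return weight


def batch_epoch(weight, data, lr):
    """Per-epoch learning: accumulate over all patterns, then update once."""
    total = sum(gradient(weight, x, target) for x, target in data)
    return weight - lr * total


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # made-up patterns, target = 2 * x
w_seq = w_batch = 0.0
for _ in range(50):
    w_seq = sequential_epoch(w_seq, data, lr=0.05)
    w_batch = batch_epoch(w_batch, data, lr=0.05)
```

In this toy setting both variants approach the same weight, but they take different paths: after a single epoch the two weights already differ, because the sequential version computes each gradient with a weight that the previous pattern has already changed.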
How to Stop Training
A significant decision in training a NN is when to stop adjusting its weights. As I have explained so far, over-trained networks become over-fitted to the training set and are useless for generalizing and inferring from unseen data, while under-trained networks do not manage to learn all the patterns in the underlying data and for this reason underperform on unseen data. There is therefore a tradeoff between over-training and under-training our networks. The methodology used to overcome this problem is called validation of the trained network. Apart from the training set, a second set, the validation set, which contains the same number of samples, is used. The weights of the network are adjusted using the samples in the training set only. Each time the weights are adjusted, the performance of the network (in terms of the error function) is measured on the validation set. During the initial period of training, the errors on both the training and validation sets decrease, because the network starts to learn the patterns that exist in the data. After a certain number of iterations of the training algorithm, the network will start to over-fit to the training set. When this happens, the error on the validation set starts to rise. If this divergence continues for a number of iterations, training is stopped. The output of this procedure is a network that is not over-fitted. Having described the way a NN works and the parameters related to its performance, we select these parameters so as to achieve optimum performance in the task we aim to accomplish. The methodology that I will follow to define these parameters is described in the next paragraph.
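The stopping rule described above can be sketched as a small loop. The function name, the two callbacks and the patience parameter are hypothetical, and the validation errors in the usage example are simulated numbers rather than results from a real network.

```python
def train_with_early_stopping(train_step, validation_error, max_epochs=1000, patience=5):
    """Stop once validation error has failed to improve `patience` times in a row.

    train_step(): adjusts the weights using the training set only.
    validation_error(): returns the current error on the validation set.
    Returns (best_epoch, best_error).
    """
    best_error = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        error = validation_error()
        if error < best_error:
            best_error, best_epoch = error, epoch
            epochs_without_improvement = 0
            # A fuller implementation would also save a copy of the weights here.
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_error


# Simulated validation curve: it falls at first, then rises as over-fitting sets in.
curve = iter([0.9, 0.7, 0.5, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9])
best_epoch, best_error = train_with_early_stopping(
    train_step=lambda: None,
    validation_error=lambda: next(curve),
    max_epochs=10,
    patience=3,
)
```

With this simulated curve the loop stops after three consecutive epochs without improvement and reports epoch 3, where the validation error reached its lowest value of 0.4.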
One of the major advantages of neural nets is their ability to generalize. This means that a trained net can classify data it has never seen before, provided the data come from the same class as the training data. In real-world applications, developers normally have only a small part of all possible patterns available for building a neural net. To reach the best generalization, the data set should be split into three parts:
• The training set is used to train a neural net. The error on this data set is minimized during training.
• The validation set is used to determine the performance of a neural network on patterns that are not used during training.
• The test set is used for a final check of the overall performance of a neural net.
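A three-way split might be sketched as follows. The 60/20/20 proportions and the choice to split by position (rather than shuffling first) are assumptions for illustration; the text does not prescribe the sizes of the three parts.

```python
def three_way_split(samples, train_fraction=0.6, validation_fraction=0.2):
    """Split the samples, in order, into training, validation and test parts."""
    n = len(samples)
    train_end = int(round(n * train_fraction))
    validation_end = train_end + int(round(n * validation_fraction))
    return samples[:train_end], samples[train_end:validation_end], samples[validation_end:]


samples = list(range(20))   # placeholder samples
train_set, validation_set, test_set = three_way_split(samples)
```

The three parts do not overlap, so the test set stays untouched until the final check.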