Slide 2
Pachshenko Galina Nikolaevna
Associate Professor of the Information Systems Department, Candidate of Technical Sciences
Slide 4
Topics
Types of Optimization Algorithms Used in Neural Networks
Gradient descent
Slide 5
Have you ever wondered which optimization algorithm to use for your neural network model to produce slightly better and faster results by updating model parameters such as the weights and bias values?
Should we use Gradient Descent or Stochastic Gradient Descent?
Slide 6
What are Optimization Algorithms?
Slide 7
Optimization algorithms help us minimize (or maximize) an objective function (another name for the error function) E(x), which is simply a mathematical function of the model's internal learnable parameters, the parameters used in computing the target values (Y) from the set of predictors (X) used in the model.
Slide 8
For example, we call the weights (W) and the bias (b) values of the neural network its internal learnable parameters. They are used in computing the output values, and they are learned and updated by the network's training process in the direction of the optimal solution, i.e. towards minimizing the loss. They therefore play a major role in training the neural network model.
Slide 9
The internal parameters of a model play a very important role in training it efficiently and effectively and in producing accurate results.
Slide 10
This is why we use various optimization strategies and algorithms to update and calculate appropriate, optimal values of the model parameters that influence the model's learning process and its output.
Slide 11
Optimization algorithms fall into 2 major categories
Slide 12
First-Order Optimization Algorithms: these algorithms minimize or maximize a loss function E(x) using its gradient values with respect to the parameters. The most widely used first-order optimization algorithm is Gradient Descent.
Slide 13
The first-order derivative tells us whether the function is decreasing or increasing at a particular point. Geometrically, the first derivative gives us a line tangent to a point on the error surface.
Slide 14
What is the gradient of a function?
Slide 15
A gradient is simply a vector: the multi-variable generalization of the derivative (dy/dx), which is the instantaneous rate of change of y with respect to x.
Slide 16
The difference is that when a function depends on more than one variable, a gradient takes the derivative's place, and the gradient is calculated using partial derivatives. Another major difference between a gradient and a derivative is that the gradient of a function produces a vector field.
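The "vector of partial derivatives" idea can be made concrete with a small sketch. The function f and the finite-difference step below are illustrative choices, not from the slides: for f(x, y) = x² + y², the gradient is the vector (∂f/∂x, ∂f/∂y) = (2x, 2y), and each partial derivative can be approximated by holding the other variable fixed.

```python
# Illustrative example: approximate the gradient of f(x, y) = x^2 + y^2
# by taking a central finite difference in each variable separately.
# The analytic gradient is (2x, 2y).

def f(x, y):
    return x**2 + y**2

def gradient(x, y, h=1e-6):
    # Partial derivative w.r.t. x: vary x, hold y fixed.
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    # Partial derivative w.r.t. y: vary y, hold x fixed.
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

gx, gy = gradient(3.0, 4.0)  # analytic answer: (6, 8)
```

Evaluating this gradient at every point of the plane assigns a vector to each point, which is exactly the vector field the slide mentions.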
Slide 17
The gradient is represented by a Jacobian matrix, which is simply a matrix of first-order partial derivatives (gradients).
Slide 18
Summing up: a derivative is defined for a function of a single variable, whereas a gradient is defined for a function of multiple variables.
Slide 19
Second-Order Optimization Algorithms: second-order methods use the second-order derivative, also called the Hessian, to minimize or maximize the loss function.
Slide 20
The Hessian is a matrix of second-order partial derivatives. Since the second derivative is costly to compute, second-order methods are not used much.
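To see why the Hessian is costly, note that it holds one second-order partial derivative for every pair of parameters, so an n-parameter model needs n×n entries. A toy sketch (the function and step size are illustrative assumptions, not from the slides) for a two-variable function f(x, y) = x²·y + y³, whose analytic Hessian is [[2y, 2x], [2x, 6y]]:

```python
# Illustrative example: the 2x2 Hessian of f(x, y) = x^2 * y + y^3,
# approximated with central finite differences. For n parameters this
# matrix has n*n entries, which is why second-order methods are costly.

def f(x, y):
    return x**2 * y + y**3

def hessian(x, y, h=1e-4):
    # Pure second derivatives: (f(a+h) - 2 f(a) + f(a-h)) / h^2.
    dxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    dyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    # Mixed partial d^2f/dxdy via the standard 4-point stencil.
    dxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return [[dxx, dxy], [dxy, dyy]]

H = hessian(1.0, 2.0)  # analytic answer: [[4, 2], [2, 12]]
```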
Slide 21
The second-order derivative tells us whether the first derivative is increasing or decreasing, which hints at the function's curvature.
The second-order derivative provides us with a quadratic surface that touches the curvature of the error surface.
Slide 22
Some advantages of second-order optimization over first-order: although the second-order derivative may be costly to find and calculate, a second-order optimization technique does not neglect or ignore the curvature of the surface. Second-order methods are also better in terms of step-wise performance.
Slide 23
What are the different types of Optimization Algorithms used in Neural Networks?
Slide 24
Gradient Descent
Variants of Gradient Descent: Batch Gradient Descent; Stochastic Gradient Descent; Mini-Batch Gradient Descent
Slide 25
Gradient Descent is the most important technique and the foundation of how we train and optimize intelligent systems. What it does is:
Slide 26
"Gradient Descent: find the minima, control the variance, update the model's parameters, and finally lead us to convergence."
Slide 27
θ = θ − η·∇J(θ)
is the parameter-update formula, where 'η' is the learning rate and '∇J(θ)' is the gradient of the loss function J(θ) with respect to the parameters 'θ'.
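The update formula can be run directly on a toy loss. The loss J(θ) = (θ − 3)², its gradient 2(θ − 3), and the learning rate below are illustrative assumptions, not from the slides:

```python
# Minimal sketch of the update rule theta = theta - eta * grad_J(theta)
# on a toy loss J(theta) = (theta - 3)^2, minimized at theta = 3.

eta = 0.1        # learning rate (eta in the slide's formula)
theta = 0.0      # initial parameter value

for _ in range(50):
    grad = 2 * (theta - 3)      # gradient of J w.r.t. theta
    theta = theta - eta * grad  # step opposite the gradient
```

After a few dozen updates θ sits very close to the minimizer θ = 3, because each step moves opposite the gradient and the gradient shrinks as the minimum is approached.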
Slide 28
The parameter η is the training rate. Its value can either be set to a fixed value or found by one-dimensional optimization along the training direction at each step. An optimal training rate obtained by line minimization at each successive step is generally preferable. However, many software tools still use only a fixed training rate.
Slide 29
Gradient descent is the most popular algorithm used in optimizing a neural network. It is mainly used to update the weights of a neural network model, i.e. to update and tune the model's parameters in a direction that minimizes the loss function (or cost function).
Slide 30
We all know a neural network trains via a famous technique called Backpropagation. We first propagate forward, calculating the dot product of the input signals and their corresponding weights, and then apply an activation function to that sum of products. The activation transforms the input signal into an output signal; it also introduces the non-linearities that let the model represent complex non-linear functions and learn almost any arbitrary functional mapping.
Slide 31
After this we propagate backwards through the network, carrying error terms and updating the weight values using gradient descent: we calculate the gradient of the error function E with respect to the weights W (the parameters), and update the parameters in the direction opposite to the gradient of the loss function with respect to them.
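The forward and backward passes described on these two slides can be sketched for a single neuron. The input values, initial weights, target, and learning rate below are made-up illustrations, and the squared-error loss E = ½(y − t)² is an assumed choice; the backward pass uses the hand-derived gradient dE/dwᵢ = (y − t)·y·(1 − y)·xᵢ for a sigmoid activation:

```python
import math

# Toy single-neuron sketch: forward pass = dot product + sigmoid,
# backward pass = update each weight opposite its error gradient.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [1.0, 0.5]    # input signals (made-up values)
w = [0.2, -0.4]   # weights: the learnable parameters
b = 0.1           # bias
t = 1.0           # target output
eta = 0.5         # learning rate

for _ in range(100):
    # Forward: dot product of inputs and weights, then activation.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(z)
    # Backward: dE/dz for E = 0.5*(y - t)^2 with sigmoid' = y*(1 - y).
    delta = (y - t) * y * (1.0 - y)
    # Step opposite the gradient of E w.r.t. each weight and the bias.
    w = [wi - eta * delta * xi for wi, xi in zip(w, x)]
    b = b - eta * delta

y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
error = 0.5 * (y - t) ** 2
```

After training, the remaining error is small: the repeated steps opposite the gradient have pushed the output toward the target.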
Slide 33
The image above shows the process of weight updates in the direction opposite to the gradient vector of the error with respect to the weights of the network. The U-shaped curve is the error surface; the gradient is its slope.
Slide 34
As one can notice, if the weight values (W) are too small or too large we have large errors, so we want to update and optimize the weights so that they are neither too small nor too large. We therefore descend downwards, opposite to the gradients, until we find a local minimum.
Slide 35
Gradient Descent
We descend downwards, opposite to the gradients, until we find a local minimum.
Slide 36
1. find slope
2. x = x − slope
until slope = 0
Slide 38
1. find slope
2. alpha = 0.1 (or any number from 0 to 1)
3. x = x − (alpha * slope)
until slope = 0
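The steps above translate directly into a short loop. The curve y = x², its slope 2x, and the starting point are illustrative assumptions (the slides do not fix a function); "until slope = 0" becomes a small numerical tolerance:

```python
# Runnable translation of the slide's steps on a toy curve y = x^2,
# whose slope at x is 2*x, so the slope is 0 at the minimum x = 0.

def slope(x):
    return 2 * x                 # step 1: find slope

alpha = 0.1                      # step 2: alpha in (0, 1)
x = 5.0                          # made-up starting point

while abs(slope(x)) > 1e-6:      # "until slope = 0", to a tolerance
    x = x - alpha * slope(x)     # step 3: x = x - (alpha * slope)
```

Each pass multiplies x by (1 − 2·alpha), so x shrinks steadily toward the minimum; without alpha (as on Slide 36) a steep slope could overshoot it.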
Slide 42
The next picture is an activity diagram of the training process with gradient descent. As we can see, the parameter vector is improved in two steps: first, the gradient descent training direction is computed; second, a suitable training rate is found.
Slide 43
The gradient descent training algorithm has the severe drawback of requiring many iterations for functions with long, narrow valley structures. Indeed, the downhill gradient is the direction in which the loss function decreases most rapidly, but this does not necessarily produce the fastest convergence. The following picture illustrates this issue.
Slide 44
Gradient descent is the recommended algorithm when we have very big neural networks with many thousands of parameters, because the method stores only the gradient vector (size n) and does not store the Hessian matrix (size n²).
Slide 45
Optimization algorithms for neural network models:
Annealing
Stochastic Gradient Descent
AW-SGD
Momentum
Nesterov Momentum
AdaGrad
AdaDelta
ADAM
BFGS
L-BFGS