Gradient Descent

deep learning
math
gradient descent
Gradient Descent cover image

Introduction

Imagine that we have a function f(x)f(x) and we would like to find its minimum. What would you do ?

Simple right ? We only need to solve the following equation:

f(x)=0f'(x) = 0

The thing is that finding the formula of ff' is not always easy as they tend to be extremely complicated especially in deep learning where we deal with complex functions. So we need to find another method that can provide us the minimum of a function without the need of finding the formula of the derivative ff'.

Let's build some intuition

Let's suppose that we have a function f with the corresponding graph:

Graph 1

Let's start with a random point x0x_{0}. The goal is to move this point and make it closer and closer to xx* such that f(f'(x*)=0) = 0. So the problem can be divided in two parts:

  • In which direction should we move the point xx ? Left or Right ?

  • How much should we move it ?

The direction

Let’s build some intuition in order to answer the first question. Take a look at the following point:

Graph 2

Graph 3

Note that:

  • when the point x0x_{0} is to the right of the optimal point xx* its tangent line goes up.

  • when the point x0x_{0} is to the right of the optimal point xx* its tangent line goes down.

The direction of a line is determined by the sign of its slope:

  • A line goes up     \implies the slope aa is positive.

  • A line goes down     \implies the slope aa is negative.

Note that: \

The slope of the tangent line of a function in a certain point x0x_{0} is no more than the derivative in that point f(x0)f'(x_{0}):

tangent(x0):g(x)=f(x0).(xx0)+f(x0)tangent(x*{0}): g(x) = f'(x*{0}).(x-x*{0}) + f(x*{0})

So as an answer to the question "Where should we move x0x_{0} ?":

  • f(x0)<0f'(x_{0}) < 0     \implies x0x_{0} to the right of xx*     \implies We need to move x0x_{0} to the left.

  • f(x0)>0f'(x_{0}) > 0     \implies x0x_{0} to the left of xx*     \implies We need to move x0x_{0} to the right.

The steps

Now for the second question, How much should we move x0x_{0} ?

Take a look at the following examples:

Graph 4

Graph 5

We can conclude that:

  • x0x_{0} is close to xx* => The slope of the tangent is small => f(x0)f'(x_{0}) is small.

  • x0x_{0} is distant from xx* => The slope of the tangent is big => f(x0)f'(x_{0}) is big.

By answering both questions, we concluded that only the knowledge of the derivative in the point x0x_{0} can give us a lot of insight about the direction and the distance of the optimal point x0x_{0}.

Gradient descent

Gradient descent is the formulation of the answers of the previous two questions. It's an optimization iterative algorithm that approximates the minimum xx* of function starting from a random initial point x0x_{0}. The algorithm is stated as follow:

xn+1=xnlr×dfdx_nx*{n+1} = x*{n} - lr \times \frac{\mathrm{d} f}{\mathrm{d} x\_{n}}

where:

  • dfdxn\frac{\mathrm{d} f}{\mathrm{d} x*{n}} is no more than the derivative of ff in the point xnx*{n}.

  • lrlr is a positive constant that determines how big the steps are going to be.

Notice that:

  • xnx_{n} is to the right of xx* => dfdxn>0\frac{\mathrm{d} f}{\mathrm{d} x_{n}} > 0 => xn+1=xnpositivex_{n+1} = x_{n} - positive => xnx_{n} moves to the left.

  • xnx_{n} is to the left of xx* => dfdxn<0\frac{\mathrm{d} f}{\mathrm{d} x_{n}} < 0 => xn+1=xn+positivex*{n+1} = x*{n} + positive => xnx_{n} moves to the right.

  • xnx_{n} close to xx* => dfdxn\frac{\mathrm{d} f}{\mathrm{d} x_{n}} close to 00 => Small update to xnx_{n}.

Quizz

  • When does gradient descent stop iterating:

  • When xnx_{n} is small enough.

  • When xnx_{n} is close to x0x_{0} .

  • When dfdx_n=0\frac{\mathrm{d} f}{\mathrm{d} x\_{n}} = 0 . XXX

  • How do we choose x0x_{0}:

  • We pick it randomly. XXX

  • We take it in the neighborhood of xnx_{n}.

  • It depends on the problem.

  • Why do we need gradient descent:

  • Because computers are not powerful enough to calculate derivatives.

  • Because it's extremely hard to find the derivatives formulas of deep learning models. XXX

  • Because functions have more than one local minimum.


Master Data Science and AI with Code Labs Academy! Join Our Online Bootcamp – Flexible Part-Time & Full-Time Options Available.


Career Services background pattern

Career Services

Contact Section background image

Let’s stay in touch

Code Labs Academy © 2025 All rights reserved.