Gradient Descent

#deep-learning #math #gradient-descent

Introduction

Imagine that we have a function $f(x)$ and we would like to find its minimum. What would you do?

Simple, right? We only need to solve the following equation: $f'(x) = 0$.
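
For a simple function this analytical approach works nicely. Here is a tiny sketch (the function $f(x) = (x - 3)^2 + 1$ is just an illustrative choice, not tied to any particular model):

```python
# Toy case where solving f'(x) = 0 by hand is easy:
# f(x) = (x - 3)^2 + 1  ->  f'(x) = 2(x - 3)  ->  f'(x) = 0 at x = 3.
def f(x):
    return (x - 3) ** 2 + 1

def f_prime(x):
    return 2 * (x - 3)

print(f_prime(3.0))  # 0.0 -> the derivative vanishes at x = 3
print(f(3.0))        # 1.0 -> the minimum value of f
```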

The thing is that finding a formula for $f'$ is not always easy: derivatives tend to be extremely complicated, especially in deep learning where we deal with very complex functions. So we need another method that can find the minimum of a function without requiring a closed-form expression for the derivative $f'$.

Let's build some intuition

Let's suppose that we have a function $f$ with the following graph:

Graph 1

Let's start from a random point $x_0$. The goal is to move this point closer and closer to the optimal point $x^*$ at which $f'(x^*) = 0$. So the problem can be divided into two parts:

  • In which direction should we move the point $x$? Left or right?
  • How much should we move it?

The direction

Let's build some intuition in order to answer the first question. Take a look at the following points:

Graph 2 Graph 3

Note that:

  • When the point $x_0$ is to the right of the optimal point $x^*$, its tangent line goes up.
  • When the point $x_0$ is to the left of the optimal point $x^*$, its tangent line goes down.

The direction of a line is determined by the sign of its slope:

  • A line goes up $\implies$ its slope $a$ is positive.
  • A line goes down $\implies$ its slope $a$ is negative.

Note that:
The slope of the tangent line of a function at a point $x_0$ is exactly the derivative at that point, $f'(x_0)$:

$$\text{tangent}(x_0): \quad g(x) = f'(x_0)\,(x - x_0) + f(x_0)$$

For example, for $f(x) = x^2$ at $x_0 = 2$, the tangent is $g(x) = 4(x - 2) + 4$: its slope $4 = f'(2)$ is positive, so it goes up.

So, as an answer to the question "In which direction should we move $x_0$?" (see the short sketch after this list):

  • $f'(x_0) > 0$ $\implies$ $x_0$ is to the right of $x^*$ $\implies$ we need to move $x_0$ to the left.
  • $f'(x_0) < 0$ $\implies$ $x_0$ is to the left of $x^*$ $\implies$ we need to move $x_0$ to the right.
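
To make this rule concrete, here is a minimal sketch in Python (the example function $f(x) = (x - 3)^2$, with minimum at $x^* = 3$, is my own choice for illustration):

```python
# Pick the direction of the move from the sign of the derivative at x0.
# Illustrative function: f(x) = (x - 3)**2, whose minimum is at x* = 3.
def f_prime(x):
    return 2 * (x - 3)

for x0 in (5.0, 1.0):  # one point to the right of x*, one to the left
    direction = "left" if f_prime(x0) > 0 else "right"
    print(f"x0 = {x0}: f'(x0) = {f_prime(x0)}, move {direction}")
# x0 = 5.0: f'(x0) = 4.0, move left
# x0 = 1.0: f'(x0) = -4.0, move right
```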

The steps

Now for the second question: how much should we move $x_0$? Take a look at the following examples:

Graph 4 Graph 5

We can conclude that (a quick numerical check follows this list):

  • $x_0$ is close to $x^*$ => the slope of the tangent is small => $|f'(x_0)|$ is small.
  • $x_0$ is far from $x^*$ => the slope of the tangent is large => $|f'(x_0)|$ is large.
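
As a quick numerical check (again with an illustrative function, here $f(x) = x^2$ whose minimum is at $x^* = 0$), the magnitude of the derivative shrinks as the point approaches the minimum:

```python
# f(x) = x**2 has its minimum at x* = 0 and derivative f'(x) = 2x.
# The farther x0 is from the minimum, the larger |f'(x0)| is.
for x0 in (4.0, 2.0, 1.0, 0.5, 0.1):
    print(f"x0 = {x0}: |f'(x0)| = {abs(2 * x0)}")
```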

By answering both questions, we conclude that the derivative at the point $x_0$ alone gives us a lot of insight about both the direction of the optimal point $x^*$ and how far away it is.

Gradient descent

Gradient descent is the formalization of the answers to the previous two questions. It's an iterative optimization algorithm that approximates the minimum $x^*$ of a function starting from a random initial point $x_0$. The algorithm is stated as follows:

$$x_{n+1} = x_{n} - lr \times \frac{\mathrm{d} f}{\mathrm{d} x_{n}}$$

where:

  • $\frac{\mathrm{d} f}{\mathrm{d} x_{n}}$ is simply the derivative of $f$ at the point $x_{n}$.
  • $lr$ (the learning rate) is a positive constant that determines how big the steps are going to be.

Notice that:

  • $x_n$ is to the right of $x^*$ => $\frac{\mathrm{d} f}{\mathrm{d} x_{n}} > 0$ => $x_{n+1} = x_{n} - \text{positive}$ => $x_n$ moves to the left.
  • $x_n$ is to the left of $x^*$ => $\frac{\mathrm{d} f}{\mathrm{d} x_{n}} < 0$ => $x_{n+1} = x_{n} + \text{positive}$ => $x_n$ moves to the right.
  • $x_n$ is close to $x^*$ => $\frac{\mathrm{d} f}{\mathrm{d} x_{n}}$ is close to $0$ => small update to $x_n$.
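
Putting everything together, here is a minimal sketch of the update rule in Python. The stopping rule (derivative close to zero), the learning rate, and the example function are illustrative assumptions, not prescriptions from the article:

```python
# Minimal gradient descent: repeatedly apply x_{n+1} = x_n - lr * f'(x_n).
def gradient_descent(f_prime, x0, lr=0.1, tol=1e-6, max_iters=10_000):
    x = x0
    for _ in range(max_iters):
        grad = f_prime(x)      # df/dx at the current point x_n
        if abs(grad) < tol:    # derivative close to 0 -> close to the minimum
            break
        x = x - lr * grad      # move left if grad > 0, right if grad < 0
    return x

# Example: f(x) = (x - 3)**2, so f'(x) = 2(x - 3) and the true minimum is x* = 3.
x_star = gradient_descent(lambda x: 2 * (x - 3), x0=10.0)
print(x_star)  # ~3.0
```

In this particular example each step multiplies the distance to the minimum by $1 - 2 \times lr = 0.8$, which is why the iterates converge to $3$.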

Quiz

  • When does gradient descent stop iterating?
    • When $x_n$ is small enough.
    • When $x_n$ is close to $x_0$.
    • When $\frac{\mathrm{d} f}{\mathrm{d} x_{n}} = 0$. (correct)
  • How do we choose $x_0$?
    • We pick it randomly. (correct)
    • We take it in the neighborhood of $x_n$.
    • It depends on the problem.
  • Why do we need gradient descent?
    • Because computers are not powerful enough to calculate derivatives.
    • Because it's extremely hard to find the derivative formulas of deep learning models. (correct)
    • Because functions have more than one local minimum.
