Gradient Descent Explained for Machine Learning Beginners
Updated on December 10, 2025 · 9 minute read
Gradient descent is an iterative optimisation algorithm that repeatedly moves a model's parameters in the direction that most reduces a loss function, which is the direction of the negative gradient. At each step, the gradient tells us which way to move, and a step-size hyperparameter called the learning rate scales how far we go.
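To make this concrete, here is a minimal sketch in plain Python that minimises the one-dimensional loss f(w) = (w − 3)². The loss function, starting point, and learning rate here are illustrative choices, not anything prescribed by the algorithm itself:

```python
def grad(w):
    # Derivative of the toy loss f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0             # initial parameter value (arbitrary)
learning_rate = 0.1  # step-size hyperparameter

for step in range(25):
    # Move against the gradient, scaled by the learning rate
    w = w - learning_rate * grad(w)

print(w)  # approaches the minimiser w = 3
```

Each iteration nudges w towards 3, where the gradient is zero and the loss is at its minimum; that zero-gradient condition is exactly what "convergence" means here.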
Deep learning models define a loss function that measures how well the model fits the data. During training, gradient-based optimisers compute gradients of this loss with respect to the model parameters and apply update rules like gradient descent or Adam to reduce the loss over many iterations.
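In a framework such as PyTorch, this whole loop takes only a few lines. The sketch below fits a toy linear model with Adam; the synthetic data, model, and hyperparameters are illustrative assumptions chosen to keep the example self-contained:

```python
import torch

# Synthetic data: y = 2x + 1 plus a little noise (for illustration only)
x = torch.randn(100, 1)
y = 2.0 * x + 1.0 + 0.1 * torch.randn(100, 1)

model = torch.nn.Linear(1, 1)    # a tiny model with two parameters
loss_fn = torch.nn.MSELoss()     # loss measuring how well the model fits
optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(500):
    optimiser.zero_grad()          # clear gradients from the previous step
    loss = loss_fn(model(x), y)    # how badly the model currently fits
    loss.backward()                # compute gradients w.r.t. the parameters
    optimiser.step()               # apply Adam's update rule to reduce the loss

print(model.weight.item(), model.bias.item())  # should end up near 2 and 1
```

The structure is the same regardless of the optimiser: compute the loss, compute its gradients, apply an update rule, and repeat over many iterations.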
There is no single best learning rate. If it is too large, the algorithm can diverge or oscillate; if it is too small, training will be very slow. In practice, we choose a reasonable starting value, monitor training behaviour, and adjust or schedule the learning rate based on experiments.
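A quick way to see both failure modes is to rerun the toy quadratic from earlier with different rates. The three values below are illustrative, chosen to expose the typical behaviours rather than tuned for this problem:

```python
def run(learning_rate, steps=20):
    # Gradient descent on f(w) = (w - 3)^2, starting from w = 0
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * 2.0 * (w - 3.0)
    return w

print(run(1.1))    # too large: iterates overshoot, oscillate, and diverge
print(run(0.001))  # too small: w barely moves towards 3 in 20 steps
print(run(0.3))    # reasonable: w ends up very close to the minimiser 3
```

Running a cheap experiment like this, then watching the training loss curve on the real problem, is essentially what practitioners do when picking and scheduling a learning rate.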