Newton's Method

Newton's method for finding the zeros of a function

Let's imagine you have a function $y=f(x)$ and you want to find the zeros. That means find the value for $x$ such that $f(x)=0$. Newton's method is a very good way to do this providing that the function $f(x)$ is "well behaved" (more on what that means below). But first we should ask: why would we want to do this?

To answer that, imagine you have some kind of complex function of the variable $x$ that you want to solve, for example: $$7x^3 - 11\ln x + 12x = 0\nonumber$$ That function is certainly not solvable analytically. So you can solve it numerically by guessing a value for $x$, seeing what the left side comes to, and adjusting so that it converges to zero. Newton knew calculus, so what he did was to put this problem into another form and solve it that way, using calculus. So let's rewrite the function as $y=f(x)$: $$y = 7x^3 - 11\ln x + 12x\nonumber$$ and plot it for $0.05\le x\le 1.05$ as in the plot below.

$x_{min}$ $x_{max}$
$y_{min}$ $y_{max}$
Precision:

You can see that the function crosses the $x$ ($y=0$) at around $0.5$. Voila, you have solved the equation $7x^3 - 11\ln x + 12x = 0$ for $x\sim 0.5$. Of course the value of $0.5$ is just a guess by eye, in fact it looks like the line crosses at a poing slightly greater than $0.5$, so you could then blow up the plot, say plot it between $0.4$ and $0.6$, and keep iterating the blowup until you are ok with the uncertainty. Just change the values in the text windows and hit return and it will update. Note that as you get closer to the right value you will have to increase the precision.

Or, you could use Newton's method, using calculus ideas, and do it algorithmically. Here's how. The plot below shows the function $(x-a)^2 + (y-b)^2 = r^2$ where the circle is centered at $(a, b)$ and has radius r$$, and all are constants. The center of the circle is not shown (it's off the page), and the angle $\theta$ goes between $160^\circ$ and $90^\circ$. We want to find where it crosses the $x$ axis, which means finding the zeros of $y = b +\sqrt{r^2 - (x-a)^2}$ (remember, $b$ is negative so we want the positive square root).

So how do we find the value for the point $x_0$ that gives us $y=0$, where the blue curve crosses the $x$ axis? The first thing you do is pick a value for $x$ that you think might be close to $x_0$. Call this point $x_1$, and let's say it's the blue open circle in the drawing below. Since we know what $y=f(x)$ is, then we know the point on the curve at $x_1$ is $y(x_1)$. We also know the slope $m$ of the straight line tangent to the curve at $(x_1,y(x_1))$, which we get from differentiating, so $$\begin{align} m(x_1,y(x_1)) &= \frac{dy}{dx}\big|_{x_1}\nonumber\\ &= -\frac{x-a}{\sqrt{(x-a)^2 - r^2}}\big|_{x_1}\nonumber\\ &=-\frac{x_1-a}{\sqrt{(x_1-a)^2 - r^2}}\nonumber \end{align}$$ The straight line hits the $x$ axis at a point $(x_2,0)$, so by the definition of the slope (rise over the run), we have $$m = \frac{y(x_1)}{x_1-x_2}\nonumber$$ Solving for $x_2$ gives $$x_2 = x_1 - \frac{y(x_1)}{m} = x_1 - \frac{y(x_1)}{y'(x_1)}\nonumber$$ This is the heart of Newton's method for approximating zeros of function. Another way to see it, imagine we expanded $y=f(x)$ around the point $x_1$ using a Taylor expansion, and just kept the first 2 terms: $$y(x)\big|_{x_1} \sim f(x_1) + f'(x_1)(x-x_1)\nonumber$$ Now, if we evaluate that function at $x_2$, and propose that $f(x_2)$ is a zero of the function, then we would have $$0 = f(x_1) + f'(x_1)(x_1-x_2)\nonumber$$ Solving for $x_2$ gives $$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}\label{en}$$

Click the blue circlec at $x_1$. It will find the point $f(x_1)$, calculate the slope there using the derivative, draw it as a straight line to the $x$ axis ($y=0$), and will show you the new point, $x_2$. Then, click on $x_2$, and it will find the next point $x_3$, and so on. You will see that each time you iterate, it gets closer and closer to the real zero, which will be at the analytic value given below. You can see how fast this converges.

Analytic solution:
Iteration		1
Estimated zero is at
Distance from analytic is

Convergence

So how fast does Newton's method actually converge, and can we quantify that? First we have to understand we mean by convergence is this: let's say that the point $x=a$ is a zero of the function, and we want to find the value of $a$. Each successive $x_n$ will get, hopefully, closer and closer to $a$. Let's define $\epsilon_n\equiv a-x_n$. So by convergence we want to know how $\epsilon_{n+1}$ is related to $\epsilon_n$.

To see that, let's keep the next term in $f''(x)$ in the Taylor expansion and expand around the root $a$ by an amount $x_n$, where $x_n$ is the nth try in the iterative method here: $$y(a)\big|_{x_n} \sim f(x_n) + f'(x_n)(a-x_n) + \half f''(x_n)(a-x_n)^2\label{en2}$$ Dividing by $f'(x_n)$ and rearranging we get $$\frac{f(x_n)}{f'(x_n)} + (a-x_n) = -\frac{f''(x_n)}{2f'(x_n)}(a-x_n)^2\nonumber$$ Now we can take equation $\ref{en}$, let $x_1$ be $x_n$ and $x_2$ be $x_{n+1}$ and write it as $$\frac{f(x_n)}{f'(x_n)} = x_n - x_{n+1} \nonumber$$ and substitute it into equation $\ref{en2}$. This gives us $$(x_n-x_{n+1})-(a-x_n) = -\frac{f''(x_n)}{2f'(x_n)}(a-x_n)^2\nonumber$$ Collecting terms (notice that $x_n$ drops out of the left hand side) and using the defintion for $\epsilon_n$ gives: $$\epsilon_{n+1} = -\frac{f''(x_n)}{2f'(x_n)}\epsilon_n^2\label{econ}$$ This shows that each successive $\epsilon_{n+1}$ converges quadratically with how cloes the previous $\epsilon_n$ got.

Note that if your function is not analytic, you can still use Newton's method for finding the zeros by simply calculating the derivative $f'(x_n)$ numerically.

Conditions

The conditions for Newton's approximation to work are

The first "guess", $x_1$ should be "close enough". This will apply to functions that have more than 1 place where the derivative is zero. It also means that if your guess is too far from the final answer, depending on the function you won't converge.
If the first guess is at a place where the derivative is already zero, then the method doesn't work since at that point you will be dividing by zero in the estimate formula above.
The derivatives have to exist and be continuous for convergence. If the 1st derivative at the root happens to be zero, then it might still converge but it won't be quadratically
2nd derivatives have to exist, and just like 1st derivatives at the root, if the 2nd derivative there happens to be zero, it also won't converge quadratically

Minimization

Newton's method can be used for finding minimizations of a function, that is, find $x_n$ so that $f(x_n)$ is a minimum. This is easy to do, because we know from calculus that the minimum (or maximum, and we call the minimum or maximum an extrenum) of a function $f(x)$ occurs where the derivative is zero. So to build on Newton's method for finding the zeros to find the minimum, we can go back to the Taylor expansion in equation $\ref{eqn}$, and pick a value for $x_n$ (starting at $n=1$ of course) that we think might be close to the extremum $x_m$, and expand $f(x)$ around $x_n$: $$f(x) \sim f(x_n) + f'(x_n)(x-x_n) + \half f''(x_n)(x-x_n)^2\nonumber$$ and take the derivative (and remember, $f'(x_n)$ is a constant and not a function of $x$) $$f'(x) = f'(x_n) + f''(x_n)(x-x_n)\nonumber$$ If we evaluate this function at the extremum $x=x_m$ where the derivative is zero, we get $$x_m = x_n - \frac{f'(x_n)}{f''(x_n)}\nonumber$$ And just as we did above, the left hand side is usually not $x_m$ (unless the initial guess for $x_1$ was damn close to being correct) but will then be used as the next guess, $x_{n+1}$, which gives us the iterative formula for finding the minimum (or maximum) of a function as $$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}\label{eqmin}$$ In the figure below, the small solid black dot shows where the true minimum is. You can click on the open circle and successive interations. The blue line is a polynomial $f(x)$ up to $x^5$, the light dashed yellow line is the slope $f'(x)$. It also converges pretty fast. But you have to be "close to" the true answer. You can drag the initial $x_1$ along the $x$ axis and see for yourself what "close to" means - it means that the slope of the 1st derivative (the 2nd derivative) has to point in the direction of the true minimum. And of course you can't start where the second derivative is 0 (where the slope of the 1st derivative is 0) because then the slope won't intersect the $x$ axis at all.

		x	y
Function minimum
Iteration	1
Distance from $x_{min}$

All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law.