Any medium, be it air, water, solids, will have equilibrum conditions that characterize the medium in the absence of a disturbance. For instance, the equilibrium condition for a body of water on earth is that all points of the surface are at the same height: this is called "sea level". That there are surface waves is due to disturbances in the equilibrum condition due to things like wind and earth/sea quakes, mostly.
All media have what are called "natural" frequences of oscillation. Take a child on a swing, pull it back and release, and the child will oscillate back and forth at a fequency determined by the distance of the weight from the pivot, and of course gravity. Hit a piece of steel with a hammer and it "rings", with a frequency that has to do with how "stiff" the still is (think of the still as a spring with spring constant $k$) and the mass density. As a rule, the natural oscillation frequency of a medium is a function of the interplay between forces of inertia, and forces of elasticity. For for a spring, the natural oscillation fequency is given by $$\omega = \sqrt{k/m}\nonumber$$ where again $k$ is the "elastic" part and $m$ is the "interial" part. For sound, which is an oscillation due to pressure disturbance of air, or any "fluid", the speed of the sound is given by $$v = \sqrt{K/\rho}\nonumber$$ where again $K$ is the coefficient of stiffness, or bulk modulus, and $\rho$ is the density: elasticity vs inertia again.
So, all things oscillate when disturbed from equilibrium. What about for "driven" systems, where you are injecting energy into a physical system that can oscillate? For a simple system, like a pendulum (child on a swing), you inject energy at the natural oscillation energy to produce big amplitudes. That means you push the child at the right time, at the oscillation frequency. If you've ever sat in a bathtub and all of a sudden produce a wave by moving forward, the wave travels to the end of the tube near your feet, reflects off the end, and heads back. If you time it so that whenever the wave gets to you you give another push, you are then injecting more energy into the disturbance at a frequency and "phase" such that the amplitude of the wave builds up. By "phase" we mean at just the right time. For instance, you could stand easily stand in a position for a swinging child that injects energy at the natural frequency but against the motion, opposing it, and canceling it out, causing the swing to stop.
This whole concept of injecting energy at the right frequency and phase is called "resonance". It's one of the most important topics at the intersection of engineering and physics. People who build bridges want to make sure that they understand what the natural frequences of oscillation are for the bridge (the "resonance fequencies"), and what natural processes like wind can do to inject energy, and make sure that they do not match or you will get a situation like that for the Takoma Narrows Bridge that collapsed on November 7, 1940:
The video makes it look like the bridge fell down because of a large gust of wind. Actually, the bridge started oscillating due to a rather modest wind, around 40 mph, across the road. Bridges like that should be able to easily withstand such a wind. But this wind blowing laterally across the road caused an oscillation that was in resonance with the natural oscillation frequency. The energy built up over time, which means that the amplitude of oscillations built up, and eventually, the whole bridget fell apart. Such is the power of resonance: add a little bit of energy at the resonance frequency, and in phase, over and over, that energy will build up and cause increasing oscillation amplitudes, and mechanical systems can only absorb so much energy before structural failure occurs. The Takmoa Narrows Bridge was started in 1938 and opened on July 1, 1940. The Golden Gate Bridge connecting San Francisco to Marin County was a slightly older bridge, started in 1933 and opened in May of 1937, and when the Takoma Narrows Bridge fell, it caused quite a bit of consernation in San Francisco. Luckily, the conditions in the San Francisco Bay coupled with the natural frequency of the Golden Gate Bridge never caused a resonance, and the bridge stands to this day. Nowadays, people simulate things like bridges and buildings and get quite a good understanding of the resonance frequency (or frequencies, things can oscillate at more than 1 natural frequency), and make sure the environment doesn't come up with a driving frequency that will resonate. This is done all the time with buildings, especially in earthquake prone regions like California.
Of course, nature is capable of injecting energy in a great many ways, and not just sinusoidally like pushing a child on a swing. For instance, when you hit that piece of steel with a hammer, the ringing you hear is not just at one frequency, although it's possible in some situations that one frequency dominates greatly. The principle of superposition states that oscillations add linearly  if you add energy at two different frequencies with different amplitudes and phases, the resulting oscillation is the linear sum, each osillation acting independently of the other. To be precise, say you have a driving force that is given by: $$F_1(t) = A_0\sin(\omega_1 t)\nonumber$$ and another given by $$F_2(t) = B_0\sin(\omega_2 t)\nonumber$$ where $\omega_1 \equiv 2\pi f_1$ and $\omega_2 \equiv 2\pi f_2$, $f_1$ and $f_2$ being the two frequencies of oscillation. $A_0$ and $B_0$ are the 2 oscillation amplitudes. The principle of superposition says that the resulting oscillation is given by $$\begin{align} F_{tot} &= F_1 + F_2 \nonumber\\ &= A_0\sin(\omega_1 t) + B_0\sin(\omega_2 t)\nonumber \end{align}$$ This is a miraculous equation. It means that if you hit a piece of steel with a hammer, it could be setting up all kinds of wave oscillations at all kinds of fequencies, amplitudes, phases, and that all of those waves will all add up linearly. What we want to do here is to "reverse engineer" this miracle: if we were to know the shape in time of the input driving force $h(t)$, how do we figure out what the component waves are that constitute $h(t)$, with their corresponding frequencies and power. This is called "Fourier Analysis", and is one of the most powerful of all tools we have for studying and understanding physical systems.
The idea that a function $h(t)$ could be represented by the sum of "component" oscillatory functions should seem familiar, because there's a direct analogy with ordinary vectors. As shown in the figure below, a vector is a real object (blue) that has a length, and points from one location to another. Since a vector is a line, you need 2 points, labeled $A$ and $B$, to specify the vector. In other words, if you want to communicate how to get to $A$, you tell someone start at the origin and go some distance along $x$ (here the distance would be $0$), and similarly along $y$ (also $0$). Same for $B$: go $34.2$ units along $x$ and then turn left and go up $94$ units along $y$. Of course, all of this is relative to what we choose for the $x$ and $y$ axes. We could have chosen those 2 directions (always perpendicular so that they are completely independent) such that they are at some angle to our horizontal and vertical. That choice of axes, or "representation", is entirely up to you: you can choose the coordinate frame any way you like. So to illustrate, in the figure below, you can drag the frame around and change the angle of the $x$ and $y$ axes relative to horizontal and vertical, and that will change the amount of $x$ and $y$ needed to describe the point along those new axes. But it won't change the vector itself, the endpoints are anchored in space and are invariant with respect to coordinate choices.
If we use the rules for vector addition that says that to add 2 vectors, you add the tail of the 2nd to the head of the 1st, that means you can represent any real vector $\vec{v}$ as $$\vec v = a_x\ihat + a_y\jhat\nonumber$$ where $\ihat$ points along the $x$ axis and $\jhat$ points along the $y$ axis, and $a_x$ and $a_y$ are the lenghts along those axes respectively.
So, to represent a vector in some coordinate frame, all we need are the lengths $a_x$ and $a_y$. To get those, we use the fact that direction vectors $\ihat$ and $\jhat$ have a special property: $$\ihat\cdot\jhat = 0\nonumber$$ and that the lengths of $\ihat$ and $\jhat$ are 1 (unity vectors): $$\ihat\cdot\ihat = \jhat\cdot\jhat = 1\nonumber$$ A nice way to represent both properties is to use a general description, and define $\widehat k_i \equiv\ihat$ and $\widehat k_j \equiv\jhat$, and use the Kronecker delta: $$\widehat k_i\cdot\widehat k_j = \delta_{ij}\nonumber$$ where
$\delta_{ij}$  $=$  $1$  ($i=j$) 
$=$  $0$  ($i\ne j$) 
Now that we have these orthonormal vectors, we can use this property to calculate $a_x$. Take the scalar product of $\vec{v}$ and $\ihat$: $$\vec{v}\cdot\ihat = (a_x\ihat+a_y\jhat)\cdot\ihat=a_x\nonumber$$
And doing the same for calculating $a_y$ gives you: $$a_x = \vec{v}\cdot\ihat\nonumber$$ $$a_y = \vec{v}\cdot\jhat\nonumber$$
This is pretty interesting  it says that you can define two directions in space that are perpendicular ($\ihat$ and $\jhat$), and represent any vector $\vec{v}$ in that space using the "inner product" $\vec{v}\cdot\ihat$ and $\vec{v}\cdot\jhat$ to find "how much" along those 2 independent directions. (Note that the orthogonality principle guarantees that you cannot represent $\ihat$ in terms of $\jhat$ and vice versa, which is a way of saying that $\ihat$ and $\jhat$ are "independent".) Another way to say this is that we can expand any vector $\vec v$ as a sum over an orthonormal set $(\widehat k_i,\widehat k_j)$ with some coefficients ($a_i,a_j$), with the following prescription:
Where the inner product for vectors is defined as $\langle\vec{a},\vec{b}\rangle=\vec{a}\cdot\vec{b}$ and here $\widehat k_1=\ihat$ and $\widehat k_2=\jhat$.
This property of vectors can be extended to mathematical functions of continuous variables ($\theta$) in the following way: we define a set of "basis functions" $g_i(\theta)$ analogous to $\widehat k_i$, except here $i=\infty,+\infty$, and define an inner product of any two arbitrary functions $a(\theta)$ and $b(\theta)$, with notation $\langle a(\theta),b(\theta)\rangle$, as an integral over all $\theta$: $$\langle a(\theta),b(\theta)\rangle \propto \int_{\infty}^\infty a(\theta)b(\theta)d\theta\nonumber$$ (we use $\propto$ to be general, but from dimensional considerations, $\propto\to =$). By analogy with vectors, we shoujld be able to expand some arbitrary function $f(\theta)$: $$f(\theta) = \sum_n a_ng_n(\theta)\nonumber$$ with the coefficients $a_n$ given by $$a_n = \langle f(\theta),g_n(\theta)\rangle\nonumber$$
Note we use the variable $\theta$, but it is just a variable. By analogy with vectors, our prescription will look like this:
Does this really work? To see this, let's step back and generalize some more, and propose that we have some function $f(t)$ that is periodic in a complex way, and we want to approximate it by a series of sines and cosines (also periodic functions) added together in some way. So we try as a guess: $$f(t) \sim g(t) = a_0\cos\omega_0t + a_1\cos\omega_1t + ... + b_0\sin\omega_0t + b_1\sin\omega_1t +...\nonumber$$ where the cosines and sines are multiplied by some constants, $a_0, a_1...$ and $b_0, b_1...$. We want to find out the conditions for $g(t)$ to be arbitrarily close to $f(t)$, within some uncertainty. We do this by defining the variance $\sigma$ in the usual way: $$\sigma^2 = \int_0^T [g(t)f(t)]^2dt\nonumber$$ where $T$ is the period (the time over which the function is periodic), and we integrate over 1 period (if we minimize $\sigma$ over 1 period, it will be minimized over all periods). Also in the usual way, we solve for the constants $a_k$ and $b_k$ ($k=0,\infty$) by taking the derivative of $\sigma^2$ with respect to those constants and set it equal to 0. First, let's rewrite $g(t)$ in a more compact form: $$g(t) = \sum_{k=0}^\infty a_k\cos\omega_kt + b_k\sin\omega_kt\label{egt}$$ The derivative of $\sigma^2$ with respect to $a_k$ is: $$\frac{\partial\sigma^2}{\partial a_k} = 2\int_0^T [g(t)f(t)]\cdot \frac{\partial[g(t)f(t)]}{\partial a_k}dt=0\label{econ}$$ The partial derivative of $f(t)$ with respect to $a_k$ is zero, since $f(t)$ doesn't depend on anything in $g(t)$. The partial of $g(t)$ would be: $$\begin{align} \frac{\partial g(t)}{\partial a_k} &= \frac{\partial}{\partial a_k}\sum_{k=0}^\infty a_k\cos\omega_kt + b_k\sin\omega_kt\nonumber\\ &= \sum_{k=0}^\infty \frac{\partial}{\partial a_k} [a_k\cos\omega_kt + b_k\sin\omega_kt]\nonumber\\ &= \sum_{k=0}^\infty \cos\omega_kt\nonumber \end{align} $$ Similarly, we have $$\frac{\partial g(t)}{\partial b_k} = \sum_{k=0}^\infty \sin\omega_kt\nonumber$$ Plugging this result into equation $\ref{econ}$ gives: $$\frac{\partial\sigma^2}{\partial a_k} = 2\int_0^T [g(t)f(t)]\cdot \sum_{k=0}^\infty \cos\omega_kt \cdot dt=0\nonumber$$ Substituting $\ref{egt}$ for $g(t)$ and rearranging, we have to solve $$\int_0^T [\sum_{n=0}^\infty a_n\cos\omega_nt + b_n\sin\omega_nt] \sum_{k=0}^\infty \cos\omega_kt\cdot dt = \int_0^T f(t)\sum_{k=0}^\infty \cos\omega_kt \cdot dt\label{eqmin}$$ This is not so difficult, because we know that the integral of sines and cosines over 1 period always vanishes, except when the integrands have the same period: $$\int_0^T \cos\omega_n\cos\omega_m dt = \frac{T}{2}\delta_{nm}\nonumber$$ and is also true for any combination of sine and cosine. This means that the left hand side of equation $\ref{eqmin}$ will quickly converge to $$\int_0^T [\sum_{n=0}^\infty a_n\cos\omega_nt + b_n\sin\omega_nt] \sum_{k=0}^\infty \cos\omega_kt \cdot dt = \frac{T}{2}\sum_{n=0}^\infty a_k\nonumber $$ which leaves us with: $$\frac{T}{2}\sum_{n=0}^\infty a_k = \sum_{n=0}^\infty \int_0^T f(t)\cos\omega_kt\cdot dt\nonumber$$ after reversing the sum and the integral operations. If the above equation is true for the sum, it is true for each part of the sum, which gives us: $$a_k = \frac{2}{T}\int_0^T f(t)\cos\omega_kt\cdot dt\label{eak}$$ Similarly, it's easy to show that $$b_k = \frac{2}{T}\int_0^T f(t)\sin\omega_kt\cdot dt\label{ebk}$$ Voila! We have expanded $f(t)$ in a complete set of basis functions times some constants, and the constants are given by the inner product of $f(t)$ with the basis functions. The analogy between this kind of "functional analysis" and vectors is definitely true.
Putting this altogether, we have the following expansion for $f(t)$: $$f(t) = \sum_{k=0}^{\infty}a_k\cos\omega_kt + b_k\sin\omega_kt\label{fouriert}$$ where $\omega_k = 2\pi k/T$. The 1st term in $a_k$ is, $a_0\cos\omega_0t$, gives $$a_0\cos\omega_0t = a_0 = \frac{2}{T}\int_0^T f(t)dt\nonumber$$ which is the average of $f(t)$ over 1 period (imagine $f(t)$ is a square wave that is non zero for only half the period). The second term $b_0\sin\omega_0t = 0$. So you will also often see the Fourier series written as $$f(t) = \frac{a_0}{2}+\sum_{k=1}^{\infty}a_k\cos\omega_kt + \sum_{k=1}^{\infty}b_k\sin\omega_kt\nonumber$$ where the factor of 2 is just a convention, so that $a_0/2$ measures the average value for $f(t)$ over 1 cycle and $a_0$ is the amplitude of the incoming wave.
As an example, let's say we have a periodic square wave with period $T$ that swings between $0$ and $A$ as in the figure below:
$f(t)$  $=$  A  $0\le t\lt \half T$ 
$=$  0  $\half T\le t\lt T$ 
To illustrate, in the figure below, we start with 2 periods of a square wave (in white), going from $0$ to $A$, and the first term in the fourier expansion in blue. That first term represents a baseline (non periodic) approximation to $f(\theta)$ that has an amplitude at the average value for $f$. Click the up arrow once to see the first term: a sine wave with the same period as $f$. More clicks add higher terms (remember, for a square wave the even terms vanish as shown above). Plotted below the figure are each of the nonzero sine waves that go into the Fourier sum.
Highest term:
0 
As you increase the number of terms you will see an interesting pattern  the expansion becomes a better and better approximation, except at the edges where the wave transitions (where the derivative is infinite). There are a few things to note about this. Probably the most important point is that in the real world, derivatives are not infinite, and square waves don't have discontinuities. But the sharper the transition, the more terms you need to add to the Fourier expansion, which means you have to keep adding higher frequencies. This is a general rule  sharp transitions mean higher frequencies. So to turn it around, if you want to use a Fourier expansion to synthesize a square wave, you have to decide what kind of transition is acceptable, and generate enough higher frequency components to make it work. Note that we usually associate high frequency with high current in electronic devices, so generating "perfect" square waves from a Fourier composition is going to cost you!
In the plot below, we generate many points between $0$ to $2T$, or 2 periods. You can vary the total number of terms (we start with the 6th nonzero term, or $n=11$), and the graph will show you the "residuals", which is the difference between the square wave $f(\theta)$ and the Fourier composition for that number of terms. You can see that the difference between the "true" $f(t)$ and the Fourier composition converges to $0$ except at the transition (discontinuity) regions (or equivalently, the edges). These large residuals are called "Gibbs ears", and there's a thereom that as you have more and more terms, the Gibbs ears converge to a fixed number that has to do with the difference in the function $f(t)$ on either side of the discontinuity. What you can do to make the convergence almost perfect for all points is to also bandwidth limit your function $f(t)$, so that it's not a perfect square wave but something close. Then you can generate as many terms in the Fourier composition to get whatever conversion you desire.
11 
The following will let you "play" with Fourier analysis for square, triangle, and sawtooth waves. The figure shows the Fourier sum, and to the right it shows the values for all of the coefficients. If you want to explore what happens to the Fourier sum if you "tweak" the coefficients, just drag it around along it's yellow axis.
Square Triangle Sawtooth
Terms:  1  Cycles:  3 
Range 0  2

If the average power $\overline P$ in the period $T$ is some quantity divided by $T$, then that leads to the interpretation that each coefficient squared is proportional to the total energy at that particular value of $k$ (in this case where the incoming waveform has units of volts, the coefficients will have units of energy per ohms). If the waveform $V(t)$ is complex enough to need both $a_k$ and $b_k$ terms, and remembering that for a given frequency $\omega_k=k\omega$, the power will be given by $P_n = (a_n^2 + b_n^2)/T$ for that frequency.
This is one of the most powerful things about Fourier decomposition  it tells you how much of each frequency is present, and how much power there is at that frequency!
For the case of the square wave above, with $V(t)=V_0$ ($t\lt \half T$), we would get $$\overline P = \frac{V_0^2}{R}\Big( \half + \frac{2}{\pi^2}\sum_{n=odd}^\infty \frac{1}{n^2}\Big) \label{fourpowert}$$ Back to TOP
What is the significance of expanding using both $\sin$ and $\cos$ functions? This can be seen by first using the Euler formula relating sines, cosines, and exponentials for any angle $\phi$: $$\begin{align} \cos\phi & = \half(e^{i\phi}+e^{i\phi})\nonumber\\ \sin\phi & = i\half(e^{i\phi}e^{i\phi})\nonumber\\ \end{align}$$ We can then write the guts of equation $\ref{fouriert}$ as $$\begin{align} a_k\cos(k\theta)+b_k\sin(k\theta) &= \half[a_k(e^{ik\theta}+e^{ik\theta})ib_k(^{ik\theta}e^{ik\theta})]\nonumber\\ & = \half[(a_kib_k)e^{ik\theta}+(a_k+ib_k)e^{ik\theta}]\nonumber\\ \end{align}$$ Now we substitute the Euler formula relating trig functions to exponents of imaginary numbers and get $$\begin{align} f(t) &= a_0 + \sum_{k=1}^\infty a_k\cos\omega_kt + b_k\sin\omega_kt\nonumber\\ &= a_0 + \half\sum_{k=1}^\infty a_k\big(e^{i\omega_kt}+e^{i\omega_kt}\big) i b_k\big(e^{i\omega_kt}e^{i\omega_kt}\big) \end{align}$$ Collecting terms gives: $$f(t) = a_0 + \half\sum_{k=1}^\infty \big(a_kib_k\big)e^{i\omega_kt} + \half\sum_{k=1}^\infty\big(a_k+ib_k\big)e^{i\omega_kt} \nonumber$$ Now take the 2nd term (the one with the $i\omega_kt$ in the exponent), make the substitution $k=m$ and rewrite: $$\half\sum_{m=1}^{\infty}\big(a_{m}+ib_{m}\big)e^{i\omega_{m}t}\nonumber$$ This may look odd, however notice that $\omega_k=2\pi k/T$ which means $\omega_{m} = 2\pi m/T = \omega_m$, which means we can rewrite it as: $$\half\sum_{m=1}^{\infty}\big(a_{m}+ib_{m}\big)e^{i\omega_mt}\nonumber$$ This means that the operation $m\to m$ is equivalent to taking the complex conjugate of the exponential term. However, we have to be careful because $f(t)$ is a real signal, so if we take the complex conjugate of the exponential term we also have to take the complex conjugate of what it multiplies, which says that $$a_{m}+ib_{m} = a_m  ib_m\nonumber$$ So we can rewrite the 2nd term as $$\half\sum_{m=\infty}^{1}\big(a_mib_m\big)e^{i\omega_mt}\nonumber$$ This is the same term as for the sum that goes from $1$ to $\infty$, so we can change $m$ to $k$ (it's just a symbol) and rewrite the whole thing as: $$f(t) = \sum_{k=\infty}^{+\infty}c_ke^{i\omega_kt}\label{fourcom}$$ with $$c_k = \half (a_kib_k)\nonumber$$ Using the complex formulation allows you to find the coefficients easily. Take $\ref{fourcom}$ and multiply both sides by $exp(i\omega_nt)$ where $n$ is some integer, and integrate over a cycle: $$\begin{align} \int_0^T f(t)e^{i\omega_nt}dt &= \int_0^T\sum_{k=\infty}^{+\infty}c_ke^{i\omega_kt}e^{i\omega_nt}dt\nonumber\\ &=\sum_{k=\infty}^{+\infty}c_k\int_0^Te^{i\omega_kt}e^{i\omega_nt}dt\nonumber \end{align}$$ where we have interchanged the integral and sum on the right hand side, and brought $c_k$ out of the integrand since it does not depend on time. That integral on the right hand side vanishes for all values of $k$ and $n$ except when $n=k$, in which case it integrates to $T$, giving us the formula for calculating $c_k$: $$c_k = \frac{1}{T}\int_0^T f(t)e^{i\omega_kt}dt \label{cftcom}$$ This brings up a curious thing  since the coefficients $a_k$ and $b_k$ are associated with a frequency $\omega_k$, in the complex notation we have the same number of constants but we seem to have doubled the number of frequencies! This is a confusing thing about Fourier series  there are "negative frequencies". If you ask 12 engineers why that is you might get 13 answers, but basically, those negative frequencies are washed away at the end when you take the real part of any complex amplitude.
Any continuous input function of time $f(t)$ can be analyzed, not necessarily periodic. Imagine you have a function between two times, like the following:
We can start with equation $\ref{cftcom}$, and define $c_n = (1/T_0)F_n$ so that we don't have to worry about the period just yet. This gives us: $$F_n = \lim_{T_0\to\infty}\int_{T_0/2}^{T_0/2} f(t)e^{in\omega_0t}dt\label{dft0}$$ where we have written $\omega_n = n\omega_0 = 2\pi n/T_0$ and we are integrating from $T_0/2$ to $+T_0/2$ instead of $0$ to $T_0$. We would then have $$F_n(\omega_n) = \lim_{T_0\to\infty} \int_{T_0/2}^{T_0/2}f(t)e^{in\omega_0t}dt\nonumber$$ Taking the limit, we can set the limit as $\omega_0\to 0$ of $n\omega_0$ as a continuous variable $\omega$, the limits of the integrand go from $\infty$ to $+\infty$. Our function $F_n(\omega_n)$ is now a function of the continuous variable $\omega$, giving: $$F(\omega) = \int_{\infty}^{+\infty}f(t)e^{i\omega t}dt\label{fourtran}$$ This is called a Fourier transform, and here it is of a continuous function, so we call this a Continuous Fourier Transform, or CFT.
The Fourier coefficient $c_n$ is still a function of $n$ and $\omega_0$, so we can write $$c_n = \frac{1}{T_0}F(n\omega_0)\nonumber$$ We can then write the expansion for a function called $f_0(t)$ as $$f_0(t) = \frac{1}{T_0}\sum_{n=\infty}^{+\infty} F(n\omega_0)e^{in\omega_0t}\nonumber$$ and take the limit as $T_0\to\infty$ (or $\omega_0\to 0$) to get $f(t)$. Note that in doing so, the sum will become an integral, and the $1/T_0$ will go to $\omega_0/2\pi$, and so as $\omega_0\to 0$ we can write $\omega_0\to dt$: $$f(t) = \frac{1}{2\pi}\int_{\infty}^{+\infty} F(\omega)e^{i\omega t}d\omega\label{fourtinv}$$ This is called the continuous inverse Fourier transform. We will need this in the next section. It is important to note that the Fourier transform says that for every function in time space, there's an equivalent function in frequency space.
One interesting example: what is the Fourier transform for a continuous sine wave of constant frequency and amplitude, or $f(t) = A\cos\omega_c t$ where $\omega_c$ is a constant. Using equation $\ref{fourtran}$, we have $$\begin{align} F(\omega) &= \int_{\infty}^{+\infty}f(t)e^{i\omega t}dt \nonumber \\ &= \int_{\infty}^{+\infty}A\sin\omega_c te^{i\omega t}dt\nonumber\\ &= \frac{1}{2}\int_{\infty}^{+\infty}A\big(e^{i\omega_ct}+e^{i\omega_ct} \big)e^{i\omega t}dt\nonumber\\ &= \frac{A}{2}\int_{\infty}^{+\infty}e^{i(\omega\omega_c)t}dt+ \frac{A}{2}\int_{\infty}^{+\infty}e^{i(\omega+\omega_c)t}dt\nonumber\\ &= \frac{A}{2}\big( \delta(\omega\omega_c) + \delta(\omega+\omega_c)\big)\nonumber\\ \end{align}$$ where $\delta(x)$ is the famous Dirac delta function, always zero except at $x=0$ in which case it is infinite. This says that a pure incoming sine wave with frequency $\omega_c$ in frequency space will be a single "spike" at $\omega=\omega_c$. Actually, here you see explicitly the negative frequency come in, from the $\delta(\omega+\omega_c)$ term. There is no real information in the negative frequency, however. This will be important to remember below.
It is worth mentioning that the Fourier transform has some interesting properties:
If you have a function $h(t) = h_1(t) + h_2(t)$, then the transform of $h(t)$ is given by $F(\omega) = F_1(\omega) + F_2(\omega)$
The inverse transform of the function $H(a\omega)$ is given by $H(a\omega) = h(t/a)/a$ and the transform of the function $h(at)$ is given by $h(at)= H(\omega/a)/a$
$h(tt_0) = H(\omega)e^{i\omega t_0}$ and $H(\omega\omega_0) = h(t)e^{i\omega_0 t}$
Oftentimes, the incoming $h(t)$ is not completely continuous, but is instead sampled at discrete times before analyzed, producing a function $h(t_n)$ where $n$ labels the discrete data. This means that $h(t_n)$ will be a sequence of numbers, and the time variable $t_n$ will be given by the sequence index $n$ times the size of the time element measurements) $\delta t$ over which the measurement is made: $$t_n = n\delta t\label{dft_tn}$$ We can start with equation $\ref{fourtran}$ and "discretize" it. That means the integral becomes a sum, $dt\to \delta t$, $\omega\to\omega_n$ and $t\to k\delta t$: $$F(\omega_k) = \sum_{n=1}^N f(t_n)e^{i\omega_k t_n}\delta t\nonumber$$ where we are summing the $N$ measurements over the index $n$ to find the amount of power at a particular frequency $\omega_k$. The product of $\omega_k t_n$ becomes $$\omega_k t_n = (2\pi k/N\delta t)n\delta t = 2\pi kn/N\nonumber$$ which gives us the Discrete Fourier Transform (DFT) $$F(\omega_k) = \sum_{n=1}^N f(t_n)e^{i2\pi nk/N}\delta t\nonumber$$ Similary, we can write the inverse transform from equation $\ref{fourtinv}$ as: $$F(t_n) = \frac{1}{2\pi}\sum_{k=1}^N F(\omega_k)e^{i2\pi nk/N}\delta\omega\nonumber$$ where $\delta\omega = 2\pi/(N\delta t)$ is the frequency associated with 1 digitization step, $\delta t$. We can combine the $\frac{1}{2\pi}$ with the $\delta\omega$ term to get $$F(t_n) = \frac{1}{N\delta t}\sum_{k=1}^N F(\omega_k)e^{i2\pi nk/N}\nonumber$$ The fact that we have a $\delta t$ in the numerator for the equation for $F(\omega_k)$, and in the denominator for the above equation allows us to absorb it into the definition to give the final set of DFT equations: $$F(\omega_k) = \sum_{n=1}^N f(t_n)e^{i\omega_k t_n}\label{edft}$$ $$F(t_n) = \frac{1}{N}\sum_{k=1}^N F(\omega_k)e^{i2\pi nk/N}\label{edftinv}$$
The time $\delta t$ is your sampling time, and $1/\delta t$ is the sampling rate. If you sample over a total time $T=N\delta$, where $N$ is the number of samples, then the smallest frequency you could measure would be when the wave has a period $T$, or $f_{min}=1/(N\delta t)$. The largest frequency you could measure would be where the period is the smallest, however that smallest period is given by $T_{min}=2\delta t$ because you need to sample twice to see a full cycle of the wave. This means that the largest frequency you could measure (we call this $f_c$) would be given by $$f_c = \frac{1}{2\delta t}\label{nyquist}$$ This frequency, $f_c$, is called the "Nyquist" frequency, or "critical" frequency (hence the subscript). This is a very important parameter for digital transmission of time dependent signals. For instance, it is usually the case that modern day transmitters will only transmit in a limited range of frequencies, called the "bandwidth". If transmitters are bandwidth limited (usually by intention or due to electronics limitations), then by setting $\delta$ appropriately, you can recover the incoming signal frequencies exactly using the DFT algorithms below. What happens when you transmit signals with higher frequencies than $f_c$ (or in other words, you digitize at a slower time than would be necessary to see all of these higher frequencies) is that the DFT algorithms will not be able to see those higher frequencies, and they will show up at lower (beat) frequencies. This is called "aliasing". In video signals, it will manifest as distortions in the signal  you will miss the higher frequency components, and some lower frequency components will have more amplitude than they should. The cure of "aliasing" is to limit the bandwidth of the incoming signal, or better yet, match $\delta$ to what the transmitter is doing.
In fact, one can prove that if you have an incoming signal $h(t)$ that is bandwidth limited such that all frequences are less than $f_c$, then if you set the digitization time accordingly using $\delta t= 1/2f_c$, you will reproduce the incoming signal exactly, and the incoming signal $h(t)$ (before digitization) can be reproduced as $$h(t) = \delta t \sum_{k=\infty}^{+\infty} h(t_k) \frac{\sin \omega_c (tk\delta t)}{\pi(tk\delta t)}\nonumber$$ where it is understood that $h(t_k)$ means the incoming list of digitized values and $\omega_c = 2\pi f_c$.
Let's illustrate with an example. Imagine you are sampling the voltage across some load, producing a series of measurements $V_n$, $n=1,N$. If you were to plot these, you would see the the baseline voltage is $V=0$, with the sampling voltages $V_n$ swinging positive and negative with some period. The figure below plots the incoming signal, sampled every $\delta t=10ms$, oscillating at $4$Hz, and we make 100 samples ($N=100$). What we want to find is the incoming frequency and phase.
Before doing the DFT numerically, we can show that it gives the right answer because we can write the incoming signal as a cosine wave with a known frequency $f$ (or $\omega$) and phase $\phi$: $$V_n = V_0\cos(\omega t_n + \phi)\nonumber$$ and plug this into equation $\ref{edft}$ to get $$\begin{align} F(\omega_k) &= \sum_{n=1}^NV_ne^{i\omega_k t_n}\nonumber\\ &= \sum_{n=1}^N V_0\cos(\omega t_n + \phi)e^{i\omega_k t_n}\nonumber\\ &= \frac{V_0}{2}\sum_{n=1}^N (e^{i(\omega t_n+\phi)}+e^{i(\omega t_n+\phi})e^{i\omega_k t_n}\nonumber\\ &= \frac{V_0}{2}e^{i\phi}\sum_{n=1}^N e^{i(\omega \omega_k)t_n} + \frac{V_0}{2}e^{i\phi}\sum_{n=1}^N e^{i(\omega +\omega_k)t_n} \end{align}$$ Both of these sums will vanish except when the exponent is zero. You can see that the 1st one has the condition $\omega_k = \omega$ and the 2nd one has the condition $\omega_k = \omega$. We can ignore the 2nd one, as per the arguments above that the negative frequencies do not add any information here. This means that the final Fourier coefficients for $F_k=F(\omega_k)$ are all zero except for one special one that satisfies the condition that $\omega_k=\omega$, or: $$\omega_k = \frac{2\pi k}{N\delta t} = \omega=2\pi f\nonumber$$ Solving for $k$ and using $\omega = 2\pi/f$ where $f$ is the incoming frequence we want to calculate, we get $$k = N\delta t f\nonumber$$ Here we have $N=100$, $f=4$, $\delta_t=0.01$, giving $k=4$ as the only nonzero amplitude, and the frequency given by $f=4/N\delta t = 4$Hz.
Of course, you don't know that the incoming signal is going to be a nicely behaved cosine, so you have to do the Fourier sums and see what happens. To do that, you have to go back to using sines and cosines, and write equation $\ref{edft}$ and calculate all of the possible $F_k$ values: $$F(\omega_k) = \sum_{n=1}^NV_n(\cos\omega_k t_ni\sin\omega_k t_n)\nonumber$$ This will give you two numbers, the one of the left multiplying the $\cos\phi_k$ and the one on the right multiplying $\sin\phi_k$, so taking the ratio gives you $\tan\phi_k$ for each $k$.
In the simulation below, you will see a horizontal line below 3 buttons. What you can do is use the mouse (or your fingers on an iPhone or iPad) and draw a waveform. The black horizontal line is the basline, so if you draw a waveform above it, you are putting in positive values, below would be negative. Draw any waveform you like, but do it in a continual motion from left to right. Hit the "Digitize" button and it will turn the waveform into 100 data points and show them in yellow as a bar chart. This is how you input data into a DFT algorithm. Hit the "Fourier" button, and it will calculate all of the Fourier coefficients $F(\omega_k)$ and display it in a plot to the right. Hit the "Log" checkbox and it will display in a log scale. The plot will be of the 50 normalized real data points, meaning that each coefficient is divided by the sum of all coefficients. This allows you to see which coefficients are dominating. For instance, if you try to draw a perfect sine wave of 1 cycle that oscillates above and below the baseline, you should see the coefficient for $\omega_0$ close to 0, and the coefficient for $\omega_1$ dominate all coefficients, which tells you that you are seeing 1 wavelength. If you try to draw 2 wavelengths, then $\omega_2$ should dominate. You can draw any waveform you like, and see what frequencies are present. Hit the "Reset" button in between waveforms to clear the waveform canvas.
Log 
Let's take a pure sine wave, digitize it, and apply the DFT. We will start with an infinite sine wave with a known frequency, say $f=10$ Hz, or a period of $0.1$sec. Let's sample fine enough so that we get several measurements per period, so maybe sample every $\delta t=0.01$sec, or a sampling rate of $100$Hz, or in other words, around 10 samples per period. The wave over a total time of $1.0$sec will look like the following: