Histograms

A histogram is a very useful tool for displaying data that arrives as a set of numbers, like the ages of all the students in your class, or the heights of all the people in your town.

Let's call this data $x$ and say that $x_i$ is the $i^{th}$ data point (first, second, 16th, etc). You can take the average (call this $\bar x$) using: $$\bar x = \frac{\sum_{i=1}^N x_i}{N}$$ The sum is over all of the $x_i$ values, and you divide by the number of values $N$ to get the average.

But $\bar x$ doesn't tell you everything you need to know about how the $x_i$ are distributed. For instance, the following two sets of data have the same average, 10: $$x_i = 0, 10, 20$$ $$y_i = 9, 10, 11$$ Clearly $x_i$ has a "wider" distribution than does $y_i$. To characterize this we need a measure that quantifies how far the members of the set are from the average, and a sensible measure is the "standard deviation", $\sigma$, defined as: $$\sigma = \sqrt{\frac{\sum_{i=1}^N (\bar x - x_i)^2}{N}}$$ For the two sets above this gives $\sigma \approx 8.2$ for $x_i$ and $\sigma \approx 0.82$ for $y_i$.

However, even knowing $\bar x$ and $\sigma$ is sometimes not enough, and you need to see how the values $x_i$ are actually distributed. But plotting $x_i$ is not like the usual plot that shows the relationship between 2 variables, like $x$ and $y$, because here there is only one variable. What we want to do, then, is the following, and this is what a histogram does:

Divide the range of the data into $n$ intervals called "bins", where bin $i$ has a low edge $L_i$ and an upper edge $R_i$.
For each bin, keep a count $b_i$ of the number of data points that fall inside it, using the rule that a value $x$ belongs to bin $i$ if $L_i \le x < R_i$.

Once we have the binning and the rules for counting, we go through all of the data, increment the count $b_i$ of the bin that each data point falls into, and then plot $b_i$ vs, usually, the low edge of bin $i$. This is a histogram.
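
The filling procedure just described can be sketched in a few lines of Python. This is only an illustration, not the code behind this page: the function name `fill_histogram`, the sample ages, and the choice to skip values outside the histogram range are all assumptions made for the example. Bins are taken to be equal in width, and a value on a bin boundary is counted in the bin whose low edge it equals.

```python
def fill_histogram(data, n_bins, low, high):
    """Count how many values fall into each of n_bins equal-width bins."""
    width = (high - low) / n_bins
    counts = [0] * n_bins                  # one counter per bin, all start at zero
    for x in data:
        if low <= x < high:                # assumption: out-of-range values are skipped
            i = int((x - low) / width)     # 0-based index of the bin containing x
            i = min(i, n_bins - 1)         # guard against floating-point rounding
            counts[i] += 1                 # increment the appropriate bin
    return counts


# Made-up sample: ages of ten students.
ages = [18, 19, 19, 20, 21, 21, 21, 22, 24, 29]
counts = fill_histogram(ages, n_bins=4, low=18, high=30)
print(counts)  # [4, 4, 1, 1]
```

Here the four bins cover 18 to 21, 21 to 24, 24 to 27 and 27 to 30, so the age 21 lands in the second bin, not the first.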

If you then take each bin and divide the number of counts by the total number of data points, then each bin holds a number between 0 and 1, and the bins sum to exactly 1. For instance, say $N$ is the number of data points, and $n$ is the number of bins, then (as long as every data point falls in some bin) $$N = \sum_{i=1}^n b_i$$ So if you define $p_i = b_i/N$, then $\sum_{i=1}^n p_i = 1$. Then you can interpret $p_i$ as the probability that a data point $x$ falls in bin $i$, that is, $L_i \le x < R_i$, where $L_i$ and $R_i$ are the low and upper edges of bin $i$. Then when you plot $p_i$ vs, say, $L_i$, you are plotting the probability distribution of $x_i$. Voila!
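
As a small check of the normalization above, the following Python sketch divides each bin count by the total. The function name `normalize` and the bin contents are made up for the illustration.

```python
def normalize(counts):
    """Turn bin counts b_i into fractions p_i = b_i / N."""
    total = sum(counts)                    # N, the number of counted data points
    return [b / total for b in counts]


counts = [4, 4, 1, 1]                      # made-up bin contents b_i
probs = normalize(counts)
print(probs)
print(sum(probs))                          # equals 1 up to floating-point rounding
```

Each entry of `probs` lies between 0 and 1, and the largest entries mark the bins where a data point is most likely to be found.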

To make it easy to produce such a histogram, you can enter the data $x_i$ in the window below. Data should be a list separated by a space, or a newline:

Next, enter the number of bins $n$, the low edge of the first bin $L_0$, and the upper edge of the last bin $R_n$. Note that the bin width $w$ is derived from the relation $n\cdot w = R_n - L_0$. Then, numbering the bins $i = 1, \dots, n$, the bin edges are $L_i = L_0 + (i-1)\cdot w$ and $R_i = L_i + w$.
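
To make the bin-edge bookkeeping concrete, here is a short Python sketch that assumes equal-width bins, so that each low edge is the first low edge plus a whole number of bin widths. The function name `bin_edges` and the numbers used are illustrative assumptions, not part of this page's form.

```python
def bin_edges(n_bins, low, high):
    """Return the bin width and the low and upper edges of n_bins equal-width bins."""
    width = (high - low) / n_bins                     # from n * w = R_n - L_0
    lows = [low + k * width for k in range(n_bins)]   # list index k = 0 is the first bin
    highs = [edge + width for edge in lows]           # each upper edge is low edge + w
    return width, lows, highs


width, lows, highs = bin_edges(n_bins=4, low=18, high=30)
print(width)   # 3.0
print(lows)    # [18.0, 21.0, 24.0, 27.0]
print(highs)   # [21.0, 24.0, 27.0, 30.0]
```

Note that the upper edge of each bin is the low edge of the next one, so the bins tile the whole range from the first low edge to the last upper edge without gaps.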

You can also specify the:
histogram title
title for the horizontal axis

You can also specify the number of tick marks along the horizontal axis of the plot (an integer) here:

Histogram bin contents are of course subject to statistical fluctuations, which follow a Poisson distribution. That means that the uncertainty in the number of counts in each bin is given by the square root of the bin count: $\delta b_i = \sqrt{b_i}$. To have these uncertainties (also known as "error bars") drawn, click here:
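
As a rough illustration of the square-root rule, the Python sketch below computes the uncertainty for each bin of a made-up histogram. The function name `poisson_errors` and the bin contents are assumptions chosen for the example.

```python
import math


def poisson_errors(counts):
    """Return the square root of each bin count as its uncertainty."""
    return [math.sqrt(b) for b in counts]


counts = [4, 4, 1, 1]                      # made-up bin contents
errors = poisson_errors(counts)
print(errors)  # [2.0, 2.0, 1.0, 1.0]
```

A bin with 4 counts has an uncertainty of 2, which is half its content, while a bin with 100 counts would have an uncertainty of 10, only a tenth of its content. The relative size of the error bars therefore shrinks as bins fill up.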

To see the histogram, hit this button:


Email Drew Baden for further info. (26-Apr-2022)