WHY, WHAT AND HOW OF THE CENTRAL LIMIT THEOREM

Let’s say you work as a business analyst for a shipping company, and one fine day your manager calls to tell you that one of your loyal customers has asked for an emergency delivery. The only information they have given is that the consignment consists of 36 boxes, and we have only one plane left, which can carry up to 2630 lbs. Your manager asks you to find out the chances that we can ship it. What will you do?

Most of your counterparts would base the decision on the average alone, and chances are they would tell the manager it isn’t possible. You are different: by the end of this article you will have the central limit theorem, and the know-how to apply it, in your arsenal.

So, let us start from scratch and eventually solve this business problem.

POPULATION AND SAMPLE

POPULATION

In everyday language, population means the people in our city, state or country, along with characteristics such as gender, race or caste. In statistics, the population is the entire set of members we are studying or collecting data about. A population has its number of observations (N), mean (μ) and standard deviation (σ).

SAMPLE

A sample is a part of the population, a slice of it. Ideally it is a randomly drawn subset that is representative of the population. The beauty of a random sample is that you can generalize from it to the population you are studying. The power this brings to statistics is amazing: if you have the heights of 1,000 randomly chosen people, you can draw conclusions about a population of 10,000 without measuring everyone. Like the population, each sample has its own number of observations (n), mean (x̄) and standard deviation (s).
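To make the population/sample notation concrete, here is a minimal sketch using made-up data: 10,000 simulated heights stand in for a population (the numbers 170 cm and 10 cm are assumptions for illustration only), and a random sample of 1,000 is drawn from it. Note that the population standard deviation divides by N (`pstdev`), while the sample standard deviation divides by n - 1 (`stdev`).

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in cm (made-up data, illustration only).
random.seed(0)
population = [random.gauss(170, 10) for _ in range(10_000)]

# Population parameters: N, mu and sigma (pstdev divides by N).
N = len(population)
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# A simple random sample of 1,000 people, drawn without replacement.
sample = random.sample(population, 1_000)

# Sample statistics: n, x-bar and s (stdev divides by n - 1).
n = len(sample)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"population: N={N}, mu={mu:.2f}, sigma={sigma:.2f}")
print(f"sample:     n={n}, x_bar={x_bar:.2f}, s={s:.2f}")
```

The sample mean and standard deviation land close to the population values, which is exactly why a random sample lets us generalize.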

To build the sampling distribution described next, one sample is not enough: we need as many random samples as we can collect, and the more observations each sample has, the better the results. After all, we are trying to generalize from the data to a much bigger scale, so collecting a large number of observations should be one of our top priorities.

SAMPLING DISTRIBUTION

After collecting a large number of random samples, if we take the mean of each sample and plot those means, we get the sampling distribution of the mean.
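The procedure above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical skewed population of 1,000 exponentially distributed values; the helper name `sampling_distribution` is made up for this example. Samples are drawn with replacement (`random.choices`) so that each draw is independent.

```python
import random
import statistics

def sampling_distribution(population, sample_size, num_samples, seed=None):
    """Return the means of num_samples random samples (with replacement) of size sample_size."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = rng.choices(population, k=sample_size)  # one random sample
        means.append(statistics.fmean(sample))           # keep only its mean
    return means

# Hypothetical skewed population: 1,000 values from an exponential distribution.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(1_000)]

means = sampling_distribution(population, sample_size=36, num_samples=2_000, seed=2)
print(len(means), round(statistics.fmean(means), 3), round(statistics.fmean(population), 3))
```

Plotting the list `means` (for example as a histogram) gives the sampling distribution of the mean.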

CENTRAL LIMIT THEOREM

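The central limit theorem states that if we draw sufficiently large random samples from a population with a finite standard deviation, the sampling distribution of their means is approximately normal, regardless of the shape of the population’s own distribution. That sampling distribution is centred on the population mean μ, and its standard deviation is σ/√n, where n is the size of each sample. With this, we have scratched the surface of the central limit theorem.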
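A quick way to check the theorem numerically is sketched below, assuming an exponential population with mean 1 and standard deviation 1 (chosen only because it is strongly skewed). The sample means should centre on μ = 1, have a spread close to σ/√n = 1/6, and be far less skewed than the population itself; `skewness` is a small helper written for this example.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: close to 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(42)
n = 36               # observations per sample
num_samples = 5_000  # how many samples we draw

# Skewed population: exponential with mean 1 and standard deviation 1 (an assumption for illustration).
population_draws = [rng.expovariate(1.0) for _ in range(50_000)]
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 3))  # close to mu = 1
print("sd of sample means:  ", round(statistics.stdev(sample_means), 3))  # close to 1/sqrt(36)
print("population skewness: ", round(skewness(population_draws), 2))       # about 2
print("sample-mean skewness:", round(skewness(sample_means), 2))           # about 2/sqrt(36)
```

The remaining skewness of the sample means shrinks as n grows, which is why the theorem speaks of "sufficiently large" samples.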

The power of this theorem lies in its simplicity; it can be applied in almost every aspect of our lives. Take a book as an example. If we think of all the words in the book as our population, each page becomes a sample of words. If we compute an average for each page (for instance, the average word length) and plot those page averages, we will get an approximately normal distribution.

If you are still confused about what you just read, let’s go one step further and visualize it. There is a fantastic tool available online with which you can visualize populations of almost any shape.

In the image, the first plot shows the population, and it is easy to see that it is by no means a normal distribution. The third plot, however, shows its sampling distribution of the mean, and provided the samples are large enough and there are many of them, it is almost a normal distribution.
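If you cannot use the online tool, a rough text-only version of the same comparison can be produced locally. This sketch is not that tool; it assumes an exponential population and samples of size 36, and `text_histogram` is a hypothetical helper that prints one `#` bar per bin.

```python
import random
import statistics

def text_histogram(values, bins=10, width=40):
    """Return a crude text histogram as a list of lines, one per bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0               # avoid dividing by zero if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # the maximum value goes into the last bin
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / peak)        # scale so the tallest bin has `width` marks
        lines.append(f"{lo + i * step:7.2f} | {bar}")
    return lines

rng = random.Random(7)
population = [rng.expovariate(1.0) for _ in range(5_000)]
means = [statistics.fmean([rng.expovariate(1.0) for _ in range(36)]) for _ in range(5_000)]

print("Population (skewed):")
print("\n".join(text_histogram(population)))
print("Sampling distribution of the mean (n = 36):")
print("\n".join(text_histogram(means)))
```

The population histogram piles up in its first bin, while the histogram of the sample means peaks in the middle and tails off on both sides.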

68-95-99.7 RULE OF THE NORMAL DISTRIBUTION

I am pretty sure you have heard of the 68-95-99.7 rule in connection with the normal distribution and the central limit theorem, and if you haven’t, you will sooner or later. Many business problems, ours included, rest on the same areas under the normal curve that this rule summarizes.

The rule says that, for a normal distribution, approximately 68% of the data lies within 1 standard deviation of the mean, 95% within 2 standard deviations and 99.7% within 3 standard deviations.
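The rule can be checked empirically with a short simulation. This sketch simply draws 100,000 values from a standard normal distribution (mean 0, standard deviation 1) and counts how many fall within 1, 2 and 3 standard deviations of the mean; the seed and sample count are arbitrary choices.

```python
import random

rng = random.Random(3)
draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # simulated standard normal data

# Share of draws within k standard deviations of the mean (0) for k = 1, 2, 3.
shares = {k: sum(1 for x in draws if abs(x) <= k) / len(draws) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} sd: {share:.3f}")
```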

But where do these figures come from?

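For that, we have to look at the probability density function (PDF) of the normal distribution:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

For the standard normal distribution, mean μ = 0 and standard deviation σ = 1. Substituting these into the PDF leaves us with

$$\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$$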

Integrating this function from -1 to 1 gives an area of about 0.68, or 68%. Similarly, integrating from -2 to 2 and from -3 to 3 gives about 95% and 99.7% respectively.
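The integrals above can be approximated numerically. This is a minimal sketch using the composite Simpson's rule (one standard numerical integration method, chosen here for simplicity) on the standard normal density; `std_normal_pdf` and `simpson` are helpers written for this example.

```python
import math

def std_normal_pdf(x):
    """Standard normal density: exp(-x^2 / 2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def simpson(f, a, b, n=1_000):
    """Composite Simpson's rule on [a, b] with n subintervals (n is forced to be even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)  # weights 4, 2, 4, 2, ... on interior points
    return total * h / 3

for k in (1, 2, 3):
    area = simpson(std_normal_pdf, -k, k)
    print(f"area from -{k} to {k}: {area:.4f}")
```

The results agree with the closed form erf(k/√2), which Python exposes as `math.erf`.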

Z SCORE

The Z score measures distance from the mean in units of standard deviation: it tells us how many standard deviations a value lies from the mean. For example, a Z score of +2.5 means the value lies 2.5 standard deviations above the mean.

The Z score is calculated with the formula:

$$Z = \frac{X - \mu}{\sigma}$$
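The formula translates directly into a small function. In this sketch the box weight of 79.5 lbs is a made-up value, chosen so that with a mean of 72 and a standard deviation of 3 it reproduces the +2.5 example.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

print(z_score(79.5, 72, 3))  # 2.5 (hypothetical box weight of 79.5 lbs)
```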

With the help of a Z table, we can find the actual probability for our problem; we will come to that shortly.
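A Z table lists the cumulative probability P(Z ≤ z) of the standard normal distribution. As a sketch, the same values can be computed from the error function with the identity Φ(z) = (1 + erf(z/√2)) / 2; `normal_cdf` is a name chosen for this example.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z, the value a Z table lists."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (0.0, 1.0, 1.96, 2.5):
    print(f"z = {z:4.2f}: {normal_cdf(z):.4f}")
```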

So, with this, we finally come back to the problem we started this post with. Since this is a long-standing customer, your company should have data on the previous shipments it has made for them, and it’s not rocket science for you to work out the mean and standard deviation from that data.

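Let’s say you find that this customer’s boxes have a mean weight of 72 lbs and a standard deviation of 3 lbs. From here things are pretty straightforward. Because we care about the mean weight per box across 36 boxes, the relevant spread is the standard deviation of the sampling distribution, σ/√n, rather than σ itself:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$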
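The same arithmetic in code, as a trivial sketch using the numbers from the problem (σ = 3 lbs per box, n = 36 boxes):

```python
import math

sigma = 3.0  # standard deviation of a single box, in lbs
n = 36       # number of boxes in the consignment

standard_error = sigma / math.sqrt(n)  # standard deviation of the mean weight per box
print(standard_error)  # 0.5
```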

Still, the X in the Z score formula is unknown to us. In this situation it is the critical mass expressed as a mean: the highest average weight per box that the plane can carry with 36 boxes.

$$\bar{X}_{\text{critical}} = \frac{2630}{36} \approx 73.06 \text{ lbs/box}$$
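As a sketch, the critical mean follows from dividing the plane's 2630 lbs capacity (from the problem statement) by the number of boxes:

```python
capacity_lbs = 2630  # maximum load of the plane
n_boxes = 36

critical_mean = capacity_lbs / n_boxes  # highest average box weight the plane can take
print(f"{critical_mean:.2f} lbs/box")   # 73.06 lbs/box
```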

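Finally, placing the values of X, μ and σ_x̄ in the Z formula:

$$Z = \frac{73.06 - 72}{0.5} \approx 2.12$$

(Using the unrounded 73.056 gives Z ≈ 2.11; the conclusion below is the same.)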

Now let’s take out our Z table and look up Z = 2.12. The table gives about 0.98, which means there is roughly a 98% chance that the total weight of the 36 boxes stays within the plane’s capacity, so it can be safely loaded and transported.
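The whole calculation can be reproduced with Python's built-in `statistics.NormalDist`, which replaces the Z table lookup. The Monte Carlo part is only a sanity check and rests on an extra assumption that is not in the problem: that individual box weights are normally distributed with mean 72 and standard deviation 3. Under that assumption the total weight is exactly normal, so the two numbers should agree closely; for other box-weight distributions it is the central limit theorem that justifies the normal approximation.

```python
import random
from statistics import NormalDist

mu, sigma, n, capacity = 72.0, 3.0, 36, 2630.0

# CLT: the mean weight per box is approximately normal with mean mu and sd sigma / sqrt(n).
mean_dist = NormalDist(mu, sigma / n ** 0.5)
p_clt = mean_dist.cdf(capacity / n)  # P(mean weight per box <= critical mean)
print(f"CLT estimate: {p_clt:.3f}")

# Monte Carlo check, ASSUMING individual box weights are normal(72, 3).
rng = random.Random(0)
trials = 20_000
fits = sum(
    1 for _ in range(trials)
    if sum(rng.gauss(mu, sigma) for _ in range(n)) <= capacity
)
print(f"simulated:    {fits / trials:.3f}")
```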

You can call your manager and make him aware of the situation and now it’s up to the manager and customer to take a final call.

As far as I can remember, no other theorem in statistics has made my life simpler and my decision making more efficient than the central limit theorem. I hope it benefits you too. GOODBYE for now.