A statistical hypothesis is an assumption regarding a population parameter. Hypothesis testing is the methodology or procedure used to accept or reject the hypothesis. In this article we will learn Hypothesis testing for a single mean in Microsoft Excel.
Since it is practically not possible to test or examine the entire population, so a random sample from the population is examined. There are two statistical hypothesis:
- Null Hypothesis: This is hypothesis of no difference or equality between population parameters. It is denoted by H0 and represents the hypothesis that sample observations occurred purely by chance.
- Alternative Hypothesis: This is the hypothesis opposite to null hypothesis supporting a statistically significant result. It is denoted by H1 and represents the hypothesis that some non-random cause influenced the sample observations.
For instance, a null hypothesis for population mean may be that that the population mean is equal to zero, and the alternative hypothesis would be that the population mean is not equal to zero in case of a two-tail test or, is greater than or less than zero in case of a one-tail test.
The process of hypothesis testing consists of below steps:
- State the hypotheses: This involves stating our null hypothesis and alternative hypothesis in such a way that they are mutually exclusive, i.e. if one is true, then the other must be false.
- Set the significance level: This is the probability that the null hypothesis would be rejected. It is denoted by α and is usually set at 5% or 1% in most cases. Significance level= (1- Confidence level). So it is a measure of how confident we are about our conclusion.
- Find the value of test statistic and calculate the p-value. P-value is the probability of obtaining a test statistic atleast as extreme as the observed test statistic when the null hypothesis is true. A low p-value indicates a stronger support for alternative hypothesis.
- Interpret results and draw conclusion: If p-value is less than significance level, then the null hypothesis is rejected.
How to use Z.TEST for a single mean in Microsoft Excel?
Z test is a hypothesis test function which is used to test mean of a population against a known mean, or mean of two populations when variance is known and sample size is large (>=30). This test assumes that the population from which the sample is taken is normally distributed. We know that normal distribution is a probability distribution which is symmetric about the mean, indicating that data points near mean value occur more frequently than data points which are far from mean value.
The general formula for z test statistic is:
Z= (X-µ)/ (σ/ √n), where X is sample mean, µ is population mean, and (σ/ √n) is the population standard deviation.
One sample z test
Let us see below how this test can be useful in quality control so as to select or reject the sample lots.
Example1: Let us say we have a population dataset having life span of some 100W light bulbs manufactured by a particular company following normal distribution with a standard deviation of 80 hours. Now a random sample of 15 bulbs is selected from this:
900, 804, 960, 840, 827, 758, 862, 740, 890, 720, 880, 905, 855, 920, 790
With a 1% (0.01) significance level, we wish to test the hypothesis that sample lot is coming from a population whose life is guaranteed under warranty for a minimum of 800 hours. If yes, then the sample lot is accepted, else it is rejected:
H0: µ >= 800
H1: µ < 800
Now to test this hypothesis, we use Z.TEST in Excel to find the one-sided p-value. Let us see below how the function works in Excel:
Z.TEST is a statistical function in Excel that calculates one-tailed p-value of a z test.
General Syntax for Z.TEST function in Excel is as follows:
Z.TEST (array, x, [sigma])
The Z.TEST function syntax has following arguments:
Array: Required, represents the array or range of cells against which hypothesized sample mean is to be tested. The range of cells correspond to the location of sample data in our spreadsheet
x: Required, represents the hypothesized value to be tested, i.e. the value of mean that we are testing (800 in this case)
sigma: Optional, represents the known population standard deviation
NOTE: Some things that are to be kept in mind while working with this function are:
- P-value obtained from the function is one-tail or one-sided. In case of a two-tail test, the value must be doubled.
- The one-tail p-value output assumes that the value of sample mean is greater than the value of µ which we are testing. In case sample mean is less than value of second argument to the function, then output of the function is subtracted from 1 in order to get the true p-value.
- The last argument to the function: sigma is optional. If it is not entered, then then it is automatically replaced by sample standard deviation. In this case, a t-test should be used.
- If the first argument to the function or the range of cells is empty, then the function returns a #N/A error value.
- In case the first or third argument to the function (i.e. x or sigma) is non-numeric, then #VALUE! Is returned by the function.
- If value of sigma supplied is 0, then #NUM! is returned by the function.
Now let us see below how the function is used to test our required hypothesis: Let the sample values be stored in cells: A2: A10 as follows :
Now in order to obtain the one-tail p-value, enter =Z.TEST(A2:A10,800,80) in the cell B2 where we wish to obtain the result:
So the resultant p-value is 0.0178164:
Conclusion: Since our p-value is coming out to be more than the significance level, i.e. 0.0178164>0.01, so we fail to reject our null hypothesis thus concluding that the sample lot is coming from a population whose life is guaranteed under warranty for a minimum of 800 hours. Thus, the sample lot is accepted.
So we are 99% confident (since confidence level=1-significance level) that our sample data is from a population whose life is guaranteed under warranty for a minimum of 800 hours.
Example2: Let us say we have following data from a simple random sample selected from a population with average IQ scores of some Indians (we know that heights, blood pressures, IQ scores, birth weights etc. are some real life examples of a normal distribution, i.e. they follow normal distribution since most of the values cluster around same central value) with unknown mean and a standard deviation of 9:
90, 84, 96, 88, 87, 110, 112, 105, 89
We know that most (70%) of the Indian adults have an IQ between 85 and 115.
With a 5% (0.05) significance level, we wish to test the hypothesis that sample data is from population with a mean IQ greater than 85:
H0: µ = 85
H1: µ > 85
Now to test this hypothesis, let us use Z.TEST the same way as used in above example to find the p-value.
Conclusion: Since our p-value is coming out to be less than the significance level, i.e. 0.0001886<0.05, so we reject our null hypothesis and support the alternative hypothesis thus concluding that the sample data is from a population with mean IQ greater than 85.
So we are 95% confident (since confidence level=1-significance level) that our sample data is from a population with mean IQ greater than 85.
Now in case our alternative hypothesis would have been: H1: µ ≠ 85, we would have to calculate the two-tail p-value, which would be calculated as follows:
This p-value would then be compared with the significance level.
So we have seen in this article how to use the function Z.TEST for a one-sample z test in Microsoft Excel. In case of a two sample z test where we wish to test equality of two population means, we can use the ‘Data Analysis’ option.
If you want to learn more click here…
If you want to-learn Linear Regression in Excel click here….