Measures of dispersion describe how spread out a set of data is.
Standard Deviation
- The standard deviation is the square root of the sample variance.
- Defined so that it can be used to make inferences about the population variance.
- Calculated using the formula: s = sqrt( Σ (xi - xbar)^2 / (n - 1) ), where xbar is the mean and the sum runs over all n values.
- The values computed in the squared term, xi - xbar, are anomalies, which are discussed in another section.
- Not restricted to large sample datasets, unlike the root mean square anomaly discussed later in this section.
- Provides useful insight into how the data are distributed around the mean. When the data are approximately normally distributed:
- The mean ± one standard deviation contains approximately 68% of the measurements in the series.
- The mean ± two standard deviations contains approximately 95% of the measurements in the series.
- The mean ± three standard deviations contains approximately 99.7% of the measurements in the series.
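The bullets above can be sketched in Python. This is a minimal illustration, not a definitive implementation: it assumes the n - 1 (sample) form of the variance implied by the first bullet, and the data are hypothetical draws from a normal distribution, so the printed shares should land near 68%, 95% and 99.7% only approximately.

```python
import math
import random
import statistics

def sample_std(values):
    """Sample standard deviation: square root of the (n - 1) variance."""
    n = len(values)
    mean = sum(values) / n
    squared_anomalies = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_anomalies) / (n - 1))

# Hypothetical data: 10,000 normal draws, seeded so the run is repeatable.
rng = random.Random(0)
data = [rng.gauss(50, 10) for _ in range(10_000)]

mean = statistics.fmean(data)
s = sample_std(data)

def share_within(k):
    """Fraction of values lying within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * s for x in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.3f}")
```

For data that are far from normal (strongly skewed marks, for instance) these shares can differ noticeably from the quoted percentages.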
Example
Find the variance and standard deviation of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.
The mean = 46/8 = 5.75
(Step 1, subtract the mean from each value): (1 - 5.75), (3 - 5.75), (5 - 5.75), (5 - 5.75), (6 - 5.75), (7 - 5.75), (9 - 5.75), (10 - 5.75)
= -4.75, -2.75, -0.75, -0.75, 0.25, 1.25, 3.25, 4.25
(Step 2, square each difference, to 3 decimal places): 22.563, 7.563, 0.563, 0.563, 0.063, 1.563, 10.563, 18.063
(Step 3, add the squares): 22.563 + 7.563 + 0.563 + 0.563 + 0.063 + 1.563 + 10.563 + 18.063
= 61.504 (the unrounded squares sum to exactly 61.5; the extra 0.004 is rounding error)
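(Step 4): n = 8, therefore variance = 61.504/8 = 7.69 (3sf). This divides by n, which treats the eight numbers as the whole population; dividing by n - 1 = 7 instead gives the sample variance, 8.79 (3sf).
(Step 5): standard deviation = square root of 7.69 = 2.77 (3sf)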
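The five steps can be checked with a few lines of Python. This is only a sketch that mirrors the worked example, including its choice to divide by n rather than n - 1; the variable names are illustrative.

```python
import math

values = [1, 3, 5, 5, 6, 7, 9, 10]
n = len(values)
mean = sum(values) / n                     # 46 / 8 = 5.75

deviations = [x - mean for x in values]    # Step 1: subtract the mean
squared = [d ** 2 for d in deviations]     # Step 2: square each difference
total = sum(squared)                       # Step 3: add the squares
variance = total / n                       # Step 4: divide by n, as in the example
std_dev = math.sqrt(variance)              # Step 5: take the square root

print(round(variance, 2), round(std_dev, 2))  # prints: 7.69 2.77
```

Because the code keeps the squares unrounded, the total in Step 3 is exactly 61.5 rather than 61.504; the final answers agree to 3 significant figures.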
The Inter-quartile Range
The inter-quartile range is a measure that indicates the extent to which the central 50% of values within the dataset are dispersed. It is based upon, and related to, the median.
In the same way that the median divides a dataset into two halves, the dataset can be further divided into quarters by identifying the upper and lower quartiles. The lower quartile is found one quarter of the way along a dataset when the values have been arranged in order of magnitude; the upper quartile is found three quarters of the way along. In terms of position, therefore, the upper quartile lies halfway between the median and the highest value in the dataset, whilst the lower quartile lies halfway between the median and the lowest value. The inter-quartile range is found by subtracting the lower quartile from the upper quartile.
For example, the examination marks for 20 students following a particular module are arranged in order of magnitude.
The median lies at the mid-point between the two central values (10th and 11th)
= half-way between 60 and 62 = 61
The lower quartile lies at the mid-point between the 5th and 6th values
= half-way between 52 and 53 = 52.5
The upper quartile lies at the mid-point between the 15th and 16th values
= half-way between 70 and 71 = 70.5
The inter-quartile range for this dataset is therefore 70.5 - 52.5 = 18, whereas the range is 80 - 43 = 37.
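The quartile method used here (take the median of the lower half and the median of the upper half) can be sketched as follows. Since the 20 examination marks are not listed in the text, the sketch reuses the eight numbers from the standard deviation example; the function name quartiles_by_halves is hypothetical.

```python
import statistics

def quartiles_by_halves(values):
    """Return (lower quartile, median, upper quartile) using medians of each half.

    For an odd number of values the middle value is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
    )

data = [1, 3, 5, 5, 6, 7, 9, 10]
q1, med, q3 = quartiles_by_halves(data)
iqr = q3 - q1
print("lower quartile:", q1, "median:", med, "upper quartile:", q3, "IQR:", iqr)
```

With 20 values this picks the mid-point of the 5th and 6th values and of the 15th and 16th values, matching the positions used above. Other quartile conventions exist (for example those offered by statistics.quantiles) and can give slightly different results.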
The Range
The range is the most obvious measure of dispersion and is the difference between the lowest and highest values in a dataset. In figure 1, the size of the largest semester 1 tutorial group is 6 students and the size of the smallest group is 4 students, resulting in a range of 2 (6-4). In semester 2, the largest tutorial group size is 7 students and the smallest tutorial group contains 3 students, therefore the range is 4 (7-3).
- The range is simple to compute and is useful when you wish to evaluate the whole of a dataset.
- The range is useful for showing the spread within a dataset and for comparing the spread between similar datasets.
An example of the use of the range to compare spread within datasets is provided in table 1. The scores of individual students in the examination and coursework component of a module are shown.
To find the range in marks, the highest and lowest values need to be found from the table. The highest coursework mark was 48 and the lowest was 27, giving a range of 21. In the examination, the highest mark was 45 and the lowest 12, producing a range of 33. This indicates that there was wider variation in the students’ performance in the examination than in the coursework for this module.
Since the range is based solely on the two most extreme values within the dataset, if one of these is either exceptionally high or low (sometimes referred to as an outlier) it will result in a range that is not typical of the variability within the dataset. For example, imagine in the above example that one student failed to hand in any coursework and was awarded a mark of zero, but sat the exam and scored 40. The range for the coursework marks would now become 48 (48-0), rather than 21. The new range is not typical of the dataset as a whole and is distorted by the outlier in the coursework marks. In order to reduce the problems caused by outliers in a dataset, the inter-quartile range is often calculated instead of the range.
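The effect of a single outlier can be illustrated with a short Python sketch. The individual marks below are hypothetical: only the lowest (27), the highest (48) and the added zero come from the text. The helper names are also illustrative, and the inter-quartile range uses the median-of-halves method described earlier.

```python
import statistics

def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

def iqr_by_halves(values):
    """Inter-quartile range using the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return statistics.median(upper_half) - statistics.median(lower_half)

# Hypothetical coursework marks; only the extremes 27 and 48 are from the text.
coursework = [27, 31, 35, 38, 40, 42, 45, 48]
with_outlier = coursework + [0]   # one student awarded zero

print(value_range(coursework))    # 21
print(value_range(with_outlier))  # 48
print(iqr_by_halves(coursework), iqr_by_halves(with_outlier))
```

In this made-up dataset the range more than doubles when the zero is added, while the inter-quartile range moves much less, which is why it is often preferred when outliers are present.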
QUESTIONS
1. Find the variance and standard deviation for the following data: 2, 3, 6, 8, 10, 13, 16
2. Calculate the variance and standard deviation of the frequency distribution below:
VALUE X | 6 | 7 | 8 | 9 | 10 | 11
FREQUENCY | 4 | 6 | 10 | 11 | 8 | 1