Averages: Measures of Central Tendency and Their Types (Mean, Median, Mode)
None of the pictures used in this blog belong to me. Credit goes to their respective owners.
A "measure of central tendency" is a single number that can be used to describe a whole set of observations of the same kind but of different sizes. It locates the dataset's center of gravity. When a collection of data contains a range of observations, how can we draw any conclusion about it? This is where central tendency comes in. As an example, say someone asks you, "What time do you wake up in the morning?"
What if, on various days, you get up at 11 a.m., 11:15 a.m., 11:30 a.m., 11:05 a.m., 11:20 a.m., and so on, and you're not sure what time to say? This is the point at which central tendency lets us draw a conclusion: the complete collection can be represented by a single central value. If the central tendency is found to be, say, about 11:15 a.m., we can conclude that this is the typical time that you wake up. This might not seem like a big deal here, but with higher-stakes scenarios and a lot of data, the problem becomes both more complicated and more crucial.
Assume that there is a dartboard and that the goal is to throw the dart into the center. There are three players, and none of them is an expert. As a result, none of their throws hit the center; instead, they land around it. Assume, for example, that the center scores 10, the nearest ring scores 9, the next one 8, and so on. Let's say one of the judges is asked, "How good are the players?" Because the throws vary so much, this could be a little difficult to answer. Player A, however, is generally closer to the center than the other players, and an average score would make that comparison precise. But what steps do we follow to calculate the average?
The mean, median, and mode are the three main measures of central tendency. But which measure should we pick? To decide, we first need to understand a few things.
These are the desirable characteristics of a measure of central tendency:
- It should be strictly defined. A fixed formula or method of calculation should exist, so that the result does not differ from person to person. If x and y (two people or systems) have identical observations, both should use the same formula and obtain the same result.
- It should be based on all the data. If we want the average for a group of 1500 people, the calculation must take each of the 1500 observations into account. Only a measure built on every observation can represent all of them.
- It should be easy to calculate and understand. Both the formula and the resulting value should be simple to compute and interpret.
- It should be least affected by extreme values. Extreme values, or outliers, of a series are values that lie far from the rest of the values in the series. They can significantly alter our measure, produce misleading results, and cause us to interpret the data incorrectly.
- It should be least affected by sampling fluctuations. A population is the entire set of observations that forms the basis of our study. A sample is a subset of the population that reflects the characteristics of the population, and a sample is good only insofar as it does so. Many samples of different sizes can be drawn from one population, and each investigator may choose a different one. The measure should therefore vary little from sample to sample and give almost identical results for different samples.
- It should be amenable to further mathematical treatment. That is, it should be usable with a variety of other statistical tools so that further analysis can be carried out.
1.1. Simple Arithmetic Mean
The arithmetic mean (AM), or average, is calculated by dividing the sum of all the observations by the total number of observations.
For ungrouped data: let $n$ be the total number of observations and $x_i$ the ith observation of the collection. Then
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
For grouped data: let $x_i$ be the mid-value of the ith class, $f_i$ the frequency of the ith class, $n$ the number of classes, and $N = \sum_{i=1}^{n} f_i$ the total of the frequencies (that is, the total number of observations). Then
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$$
The arithmetic mean is used for numerical variables, not for categorical ones.
• It is strictly defined.
• It is based on every observation.
• It is easy to calculate and understand.
• It is amenable to further mathematical treatment.
• Extreme values have a significant effect on it.
• It cannot be computed for grouped data with open-ended classes.
• It is affected by both a change of origin and a change of scale.
• The sum of the deviations from the arithmetic mean is zero. The deviation of a point is obtained by subtracting the AM from its value, that is, the amount by which the point departs from the AM.
Application:
Assume Mr. X, the proprietor of a boutique, wants to introduce a new clothing line before the start of the next fiscal year. To that end, he conducted research to determine the target age range. For ease of calculation, assume the age data set consisted of 20, 25, 18, 19, 17, 21, and 23. He computed the arithmetic mean and found it to be about 20 years (143/7 ≈ 20.4). That seemed a fitting target, and he began producing clothing for that market. In this case, the AM helped him learn more about his clients.
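The boutique calculation can be checked with a few lines of Python. This is only a minimal sketch using the seven ages from the example; the variable names are illustrative, and the second print confirms the property that the deviations from the AM sum to zero (up to floating-point rounding).

```python
from statistics import mean

# Ages from the boutique example
ages = [20, 25, 18, 19, 17, 21, 23]

# Arithmetic mean: sum of the observations divided by their count
am = mean(ages)

# Deviation of each observation from the AM (value minus AM)
deviations = [x - am for x in ages]

print(round(am, 2))                 # 20.43
print(abs(sum(deviations)) < 1e-9)  # True: the deviations cancel out
```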
Two years later, he conducted a survey to learn what customers thought of his products. He created a Likert-scale question with answers numbered from 1 to 5 and gave it to his staff, investors, clients, and so on. The customers' answers are more valuable to him than the investors', and the investors' more than everyone else's. How can the AM take that into account? Here the weighted AM comes to the rescue.
1.2. Weighted Arithmetic Mean
If the observations under consideration are not all of equal importance, we can give each value an appropriate weight, with larger weights for the more important observations. The average computed this way is the weighted arithmetic mean.
Let $n$ be the total number of observations, $x_i$ the ith observation of the set, and $w_i$ the weight of that observation. Then
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
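As a rough illustration of the survey scenario, the sketch below computes a weighted mean as the sum of weight times value divided by the sum of the weights. The ratings and weights are made-up numbers (the post gives none), and `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical average Likert ratings (1 to 5) per group: customers, investors, staff
ratings = [4, 3, 5]
# Hypothetical weights: customers matter most, then investors, then staff
weights = [5, 3, 2]

simple = sum(ratings) / len(ratings)
weighted = weighted_mean(ratings, weights)

print(simple)              # 4.0
print(round(weighted, 2))  # 3.9
```

With these assumed numbers the weighted result sits closer to the customers' rating than the simple mean does, which is the effect the weights are meant to have.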
1.3. Geometric Mean
For ungrouped data, let $x_1, x_2, \dots, x_n$ be a series of $n$ observations. The GM is
$$GM = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$
For grouped data, let $x_1, x_2, \dots, x_n$ be the observations and $f_1, f_2, \dots, f_n$ the associated frequencies, with $N = \sum_{i=1}^{n} f_i$. The GM is then defined by
$$GM = (x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n})^{1/N}$$
- It is strictly defined.
- It is based on every observation.
- It is suitable for further mathematical treatment.
- It is less affected by extreme values than the AM.
- It is difficult to compute and understand.
- For the same set of positive data, it is never greater than the AM, and it is smaller unless all the observations are equal.
- It cannot be meaningfully computed if even a single observation is zero or negative.
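The properties above can be tried out with a short sketch. It computes the GM through logarithms (an implementation choice made here to avoid overflow in long products, not something the post prescribes), rejects zero or negative values, and compares the result with the AM. The data and function names are illustrative.

```python
import math
from statistics import mean


def geometric_mean(values):
    """n-th root of the product of n positive observations."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("GM needs at least one value, all strictly positive")
    # exp(mean of logs) equals (x1 * x2 * ... * xn) ** (1 / n)
    return math.exp(sum(math.log(x) for x in values) / len(values))


def grouped_geometric_mean(values, freqs):
    """(x1**f1 * x2**f2 * ... * xn**fn) ** (1 / N), with N = sum of the frequencies."""
    if any(x <= 0 for x in values):
        raise ValueError("all values must be strictly positive")
    total = sum(freqs)
    return math.exp(sum(f * math.log(x) for x, f in zip(values, freqs)) / total)


data = [2, 8]  # made-up data: GM = sqrt(2 * 8) = 4, AM = 5
gm = geometric_mean(data)

print(round(gm, 6))      # 4.0
print(gm <= mean(data))  # True: the GM does not exceed the AM
```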
Application:
- It is strictly defined.
- It is based on every observation.
- It is not much affected by sampling fluctuations.
- Extreme values do not affect it much.
- It is difficult to compute and understand.
Application:
1.5. Median
But is the mean always sufficient? Certainly not. Any measure of central tendency uses a single figure to represent the complete dataset. In a class of thirty students, it is possible that two of them scored zero and one hundred marks while the class average is eighty. If one looked only at the average, both of them would be regarded as having scored about 80. The average can therefore be described as "one represents all and all represent one."
Such outliers can be very problematic, but they are easily handled by using the median, which is the middle value of the series once all of the observations have been sorted into ascending or descending order. The median divides the dataset into two equal parts. Suppose there are five observations: 1, 5, 3, 8, and 6.
To calculate the median, we first sort the data into ascending or descending order, for example 8, 6, 5, 3, 1. We then pick the middle value, which in this case is 5.
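The sort-then-pick-the-middle procedure can be sketched as follows. For an even number of observations the sketch averages the two middle values, which is the usual convention; the function name and the outlier variant of the data are illustrative.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


observations = [1, 5, 3, 8, 6]
print(median(observations))       # 5

# Replacing the largest value with an extreme one leaves the median unchanged
print(median([1, 5, 3, 800, 6]))  # 5
```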
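In the case of ungrouped data, let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ be a series of $n$ observations arranged in ascending order. Then
$$\text{Median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases}$$
In the case of grouped data,
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - C\right)$$
where,
l = lower class boundary of the median class
h = size of the median class
f = frequency corresponding to the median class
N = sum of frequencies
C = cumulative frequency of the class preceding the median class
The median class is the class whose cumulative frequency is just greater than N/2.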
- It is strictly defined.
- Not every observation is used to calculate the median.
- Extreme values have little effect on it.
- It is simple to compute and understand.
Application:
The median is often used to describe the typical income or wage of a group of individuals. Monthly incomes in a group can range from less than 1,000 to 30,000, with some people making more than 3 lakh. With such a wide range, the median is very helpful because it is least affected by extreme values. It can also be used when detecting and removing outliers, when handling Likert-scale data, and so on.
1.6. Mode
The mode is one of the most widely used measures of central tendency. It answers questions such as: Where do the most crimes occur? Which age group is most vulnerable to a specific illness? The mode is the value (or values) that occurs most frequently in a series, or the class interval with the highest frequency. Suppose we gather information about park visitors and group them by age. If we find that age 16 has the greatest number of observations, then the mode of this series is 16 years.
For grouped data, the mode is computed as
$$\text{Mode} = l + h \cdot \frac{f_1 - f_0}{2f_1 - f_0 - f_2}$$
where,
l = lower limit of the modal class
h = size of the modal class
$f_1$ = frequency of the modal class
$f_0$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
The class with the highest frequency is called the modal class.
- It is strictly defined.
- Not every observation is used to calculate the mode.
- Extreme values have little effect on it.
- It is simple to compute and understand.
Application:
Mode is used to give information about the value or event that occurred the most times and to summarize data.
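For ungrouped data, finding the mode amounts to counting how often each value occurs. The sketch below returns every value tied for the highest count, since a series can have more than one mode. The visitor ages are made-up numbers in the spirit of the park example, and `modes` is a hypothetical helper name.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical ages of park visitors
visitor_ages = [14, 16, 16, 18, 16, 21, 18, 16, 30]

print(modes(visitor_ages))     # [16]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]: a series with two modes
```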
1.7. Examples of Measures of Central Tendency
Let us consider the following example and calculate the measures of central tendency.
Class limits (C.L.) | Mid-value (x) | Frequency (f)
0-10 | 5 | 11
10-20 | 15 | 12
20-30 | 25 | 10
30-40 | 35 | 5
40-50 | 45 | 6
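The three measures for this table can be worked out with the grouped-data formulas: mean = Σ f·x / N, median = l + (h/f)(N/2 − C), and mode = l + h(f1 − f0)/(2·f1 − f0 − f2). The following is a sketch under the assumption that the classes are continuous with the boundaries shown. The function names are illustrative, and returning the class mid-value when the mode formula's denominator is zero is a choice made here, not part of the formula.

```python
# (lower boundary, upper boundary, frequency) for each class in the table
classes = [(0, 10, 11), (10, 20, 12), (20, 30, 10), (30, 40, 5), (40, 50, 6)]


def grouped_mean(classes):
    total = sum(f for _, _, f in classes)  # N
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total


def grouped_median(classes):
    total = sum(f for _, _, f in classes)  # N
    cumulative = 0  # C: cumulative frequency before the current class
    for lo, hi, f in classes:
        # Median class: first class whose cumulative frequency exceeds N/2
        if cumulative + f > total / 2:
            return lo + (hi - lo) / f * (total / 2 - cumulative)
        cumulative += f
    raise ValueError("no observations")


def grouped_mode(classes):
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))  # modal class (the first one if there is a tie)
    lo, hi, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0               # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # class after the modal class
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        return (lo + hi) / 2  # assumption: fall back to the class mid-value
    return lo + (hi - lo) * (f1 - f0) / denom


print(round(grouped_mean(classes), 2))    # 21.14
print(round(grouped_median(classes), 2))  # 19.17
print(round(grouped_mode(classes), 2))    # 13.33
```

Here N = 44, so N/2 = 22. The cumulative frequencies are 11, 23, 33, 38, 44, which makes 10-20 the median class, and it is also the modal class because it has the highest frequency (12).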
A histogram can be used to locate the mode graphically. The first step is to identify the highest bin of the histogram. Then we draw one line from the upper-right corner of the highest bin to the upper-right corner of the bin before it, and another from the upper-left corner of the highest bin to the upper-left corner of the bin after it. From the point where the two lines intersect, a perpendicular is dropped to the X-axis. The mode is the value at which this perpendicular meets the X-axis.
An ogive can also be used to find the median graphically. One method is to draw either ogive (the less-than or the more-than cumulative frequency curve) and locate the value N/2 on the Y-axis. From there, we draw a horizontal line to the ogive, and from the point where it meets the ogive we drop a perpendicular to the X-axis. The median is the corresponding value on the X-axis.
An alternative method is to draw both ogives and drop a perpendicular from their point of intersection to the X-axis. The median is the corresponding value on the X-axis.
1.8. Moving Average
A moving average of order K is the arithmetic mean of K consecutive data points, recomputed as the window of K points slides along the series. It smooths out the fluctuations of the series and brings out its trend.
Let $X_1, X_2, \dots, X_n$ be a series of $n$ observations. Then the moving average of order K is
$$MA_t = \frac{X_t + X_{t+1} + \cdots + X_{t+K-1}}{K}, \qquad t = 1, 2, \dots, n-K+1$$
Application:
In time series data, the moving average is primarily used to identify trends. It is also used to fill gaps in data. Suppose, for example, that we are tracking the rate of inflation over time, or looking at trends in stock prices, crimes against women, and so on. A moving average can be quite helpful in these situations.
Year | Closing Price
2001 | 1220.5
2002 | 1230.6
2003 | 1300.5
2004 | 1120.3
2005 | 1221.2
Therefore, taking all five years as the window (K = 5), Moving Average = 6093.1 / 5 = 1218.62
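A sliding-window version of this calculation might look like the sketch below. It uses the closing prices from the table. With K = 5 there is a single window covering all five years, while K = 3 is an extra illustration (not from the post) of how the window moves along the series. The function name is hypothetical.

```python
def moving_average(series, k):
    """Arithmetic mean of every run of k consecutive points in the series."""
    if not 1 <= k <= len(series):
        raise ValueError("k must be between 1 and the length of the series")
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]


# Closing prices for 2001 to 2005, taken from the table
closing_prices = [1220.5, 1230.6, 1300.5, 1120.3, 1221.2]

# One window over all five years
print([round(v, 2) for v in moving_average(closing_prices, 5)])  # [1218.62]

# Three-year windows: 2001-03, 2002-04, 2003-05
print([round(v, 2) for v in moving_average(closing_prices, 3)])  # [1250.53, 1217.13, 1214.0]
```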