Averages: Measures of Central Tendency and Their Types (Mean, Median, Mode)

 

None of the pictures used in this blog belong to me. Credit goes to their respective owners.

A "measure of central tendency" is a single number that describes a whole set of observations of the same kind, whatever the size of the set. It locates the dataset's center of gravity. When a collection of data contains a range of observations, how can we draw any conclusion about it? This is where central tendency applies. As an example, say someone asks you, "What time do you wake up in the morning?"


What if, on various days, you get up at 11 a.m., 11:15 a.m., 11:30 a.m., 11:05 a.m., 11:20 a.m., and so on, and you're not sure what time to say? This is where central tendency lets us draw a conclusion: a single central value can represent the whole collection. If the central tendency is found to be about 11:15 a.m., we can conclude that this is the typical time you wake up. This might not seem like a big deal in the grand scheme of things, but with significant scenarios and a lot of data, finding a representative value becomes both complicated and crucial.

Assume that there is a dart board and that three players, A, B, and C, each try to throw darts into the center. None of the three is an expert, so none of their throws strike the center; instead, they land around it. Assume, for example, that the center scores 10, the nearest circle scores 9, the next 8, and so on. Let's say one of the judges is asked, "How good are the players?" Because the throws vary so much, this could be a little challenging to say. Suppose, however, that player A is generally closer to the center than the other players; an average score would capture this. But what steps do we follow to calculate the average?



The mean, median, and mode are the three main measures of central tendency. But which one should we pick? To decide, we must first go over and fully understand a few things.

 

These are the desired characteristics of the measure of central tendency:

 

  • It should be strictly defined. A fixed formula or method of calculation should exist. It shouldn't differ from person to person and should be the same for everyone who calculates it. If x and y (two people or systems) have identical observations, then both should use the same formula and obtain the same outcome.

  • It should be based on all the data. If we are to determine the average for a group of 1500 people, the method of calculation must take each of the 1500 observations into account. Only a measure founded on every observation can be representative of all the observations.

  • It should be easy to calculate and understand. The formula itself should be simple to compute and comprehend, as should the derived result.

  • It should be least affected by extreme values. Extreme values, or outliers, of a series are values that lie far from the rest of the values in the series. They can significantly alter our measure, produce misleading results, and cause us to interpret the data incorrectly.

  • It should be least affected by sampling fluctuations. A population is the entire set of observations that forms the basis of our study. A sample is a subset of the population that reflects the characteristics of the population, and a sample is good only if it does so faithfully. Many samples of different sizes can be drawn from one population, and each investigator may choose a different sample. The measure should therefore be least affected by this choice and should give almost identical results for different samples.

  • It should be amenable to further mathematical treatment. This means that it should be usable with a variety of other statistical tools, so that additional analysis can be carried out.

 

1.1. Simple Arithmetic Mean

 

The average, or arithmetic mean, is calculated by dividing the sum of all the observations by the total number of observations.

 

For ungrouped data: assume that n is the total number of observations and that $x_i$ represents the ith observation of the collection. Then

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

For grouped data: assume that there are n classes, $f_i$ is the frequency of the ith class, $x_i$ is the mid-value of the ith class, and $N = \sum_{i=1}^{n} f_i$ is the total of the frequencies, that is, the total number of observations. Then

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$$

 


The arithmetic mean is used for numerical variables, not for categorical ones.

• It is strictly defined.

• It is based on every observation.

• It is easy to calculate and understand.

• It is amenable to further mathematical treatment.

• Extreme values have a significant effect on it.

• It cannot be computed for grouped data with open-ended classes.

• It is affected by both a change of origin and a change of scale.

• The sum of the deviations from the arithmetic mean is zero. The deviation of an observation from the AM is the observation minus the AM, that is, the amount by which the point departs from the AM.

 

Application:

Assume Mr. X, the proprietor of a boutique, wants to introduce a new clothing line before the start of the next fiscal year. To that end, he conducted research to determine the target age range. Assume, for ease of calculation, that the age data set consisted of 20, 25, 18, 19, 17, 21, and 23. He calculated the arithmetic mean and found it to be about 20 years (143/7 ≈ 20.4). This fits the data well, and in line with it he began producing clothing for his intended market. In this case, the AM has helped him learn more about his clients.
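The boutique calculation can be checked with a short Python sketch. The ages are the ones given above; the extra 75-year-old customer is a hypothetical value, added only to illustrate how strongly a single extreme observation pulls the mean.

```python
from statistics import fmean

# Ages from the boutique example in the text.
ages = [20, 25, 18, 19, 17, 21, 23]

# Arithmetic mean: sum of the observations divided by their count.
mean_age = fmean(ages)

# Deviation of each observation from the mean; these sum to (numerically) zero.
deviations = [x - mean_age for x in ages]

# Hypothetical outlier, NOT part of the original data.
ages_with_outlier = ages + [75]
mean_with_outlier = fmean(ages_with_outlier)

print(f"mean age: {mean_age:.2f}")                        # 20.43
print(f"|sum of deviations| < 1e-9: {abs(sum(deviations)) < 1e-9}")
print(f"mean with the outlier: {mean_with_outlier:.2f}")  # 27.25
```

One extra value moves the mean from about 20 to about 27, which is the sensitivity to extreme values listed among the properties of the AM.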

Two years later, he conducted a survey to learn what customers thought of his items. To do that, he created a Likert-scale question with ratings from 1 to 5 and gave it to his staff, investors, clients, and so on. The answers from customers are more valuable to him than those from investors, followed by everyone else. How can he capture that with the A.M.? Here the weighted A.M. comes to the rescue.

 

1.2. Weighted Arithmetic Mean

If the observations under consideration are not all of equal importance, we can give them appropriate weights, assigning larger weights to the more important observations. The average computed in this way is the weighted average.


Assume n is the total number of observations, $x_i$ is the ith observation of the set, and $w_i$ is the weight of the ith observation. Then

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
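As a rough illustration of the survey scenario, the sketch below computes a weighted mean as the sum of weight times value divided by the sum of the weights. The group ratings and the weights 3, 2, and 1 are hypothetical values chosen for this example; the post does not give actual survey numbers.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical average Likert ratings (1 to 5) for each respondent group.
ratings = {"customers": 4.0, "investors": 3.0, "staff": 5.0}
# Hypothetical weights: customers count most, then investors, then staff.
weights = {"customers": 3, "investors": 2, "staff": 1}

groups = list(ratings)
score = weighted_mean([ratings[g] for g in groups], [weights[g] for g in groups])
simple = sum(ratings.values()) / len(ratings)

print(f"weighted mean: {score:.2f}")   # 3.83
print(f"simple mean:   {simple:.2f}")  # 4.00
```

The weighted figure sits closer to the customers' rating than the simple mean does, because their answers carry the largest weight.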

 

1.3. Geometric Mean

 

For ungrouped data, let $x_1, x_2, \dots, x_n$ be a sequence of n observations. Then the GM is

$$GM = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$

For grouped data, let $x_1, x_2, \dots, x_n$ be a sequence of n observations and let $f_1, f_2, \dots, f_n$ be the associated frequencies, with $N = \sum_{i=1}^{n} f_i$. Then the GM is mathematically defined by

$$GM = \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$$

  • It is strictly defined.
  • It is based on every observation.
  • It is amenable to further mathematical treatment.
  • It is less affected by extreme values than the A.M.
  • It is harder to compute and comprehend than the A.M.
  • For the same set of positive observations, it is never greater than the A.M.; the two are equal only when all the observations are equal.
  • It cannot be meaningfully computed if even a single observation is zero or negative.



Application:


It is very popular in the financial markets. Because money grows multiplicatively (each period's gain compounds on the previous balance), the AM of growth rates can be misleading. For this reason, the geometric mean is used in calculations such as growth rates and returns. People put money in banks as recurring deposits or in stocks through systematic investment plans (SIPs); these investments compound over time, and their average rate of return is found using a geometric mean.
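A small sketch can show why the arithmetic mean of returns can mislead. The two yearly returns below (+50% followed by -50%) are hypothetical figures picked to make the effect obvious; `statistics.geometric_mean` requires Python 3.8 or later.

```python
from statistics import fmean, geometric_mean

# Hypothetical yearly returns: +50% in year 1, -50% in year 2.
returns = [0.50, -0.50]
growth_factors = [1 + r for r in returns]  # 1.5 and 0.5

am_rate = fmean(returns)                      # 0.0, which suggests "no change"
gm_rate = geometric_mean(growth_factors) - 1  # average compound rate per year

# What actually happens to an investment of 100.
start = 100.0
end = start
for g in growth_factors:
    end *= g                                  # 100 -> 150 -> 75

# Growing at gm_rate every year reproduces the true final value.
end_from_gm = start * (1 + gm_rate) ** len(returns)

print(f"AM of returns: {am_rate:.2%}")
print(f"GM-based rate: {gm_rate:.2%}")
print(f"final value: {end:.2f}, from GM rate: {end_from_gm:.2f}")
```

The AM reports an average return of zero even though a quarter of the money is gone; the GM-based rate is negative and matches the final balance.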


1.4. Harmonic Mean

The harmonic mean is the reciprocal of the arithmetic mean (A.M.) of the reciprocals of the observations.

For ungrouped data, let $x_1, x_2, \dots, x_n$ be a sequence of n observations. Then the HM is given by

$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

For grouped data, let $x_1, x_2, \dots, x_n$ be a sequence of n observations and let $f_1, f_2, \dots, f_n$ be the associated frequencies, with $N = \sum_{i=1}^{n} f_i$. Then the HM is given by

$$HM = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$$

  • It is strictly defined.
  • It is based on every observation.
  • It is not much affected by sampling fluctuations.
  • Large extreme values do not affect it much.
  • It is difficult to compute and comprehend.

Application:

The harmonic mean is mostly used to average rates, for example the average speed over a journey whose legs cover equal distances. Similarly, it can be used to determine how quickly the price of a specific commodity is rising. It is also used in the computation of various ratios.
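To make the "average of rates" idea concrete, here is a minimal sketch built on a round trip driven at two different speeds. The speeds and the distance are hypothetical numbers, not data from this post.

```python
from statistics import fmean, harmonic_mean

# Hypothetical trip: the same distance driven out at 40 km/h and back at 60 km/h.
speeds = [40, 60]
distance_each_way = 120  # km, an assumed figure

# True average speed = total distance / total time.
total_time = sum(distance_each_way / s for s in speeds)  # 3 h + 2 h
true_average_speed = 2 * distance_each_way / total_time

hm = harmonic_mean(speeds)  # n / sum(1 / x_i)
am = fmean(speeds)

print(f"true average speed: {true_average_speed:.1f} km/h")  # 48.0
print(f"harmonic mean:      {hm:.1f} km/h")                  # 48.0
print(f"arithmetic mean:    {am:.1f} km/h")                  # 50.0
```

The harmonic mean matches the true average speed, while the arithmetic mean overstates it because more time is spent at the slower speed.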

1.5. Median

But is it always sufficient to use the mean? Certainly not. Any measure of central tendency summarises the complete dataset in a single figure. In a class of thirty students, it is possible that the average is eighty even though two of the students scored zero and one hundred marks, respectively. If one were to look only at the average, both of them would be regarded as having scored about 80. The average can therefore be described as "one represents all, and all represent one."

Such outliers can be very problematic, but they are easily handled by using the median, which is the middle value of the series once all of the observations have been sorted into ascending or descending order. The median divides the dataset into two equal parts.

Suppose there are five observations: 1, 5, 3, 8, and 6. To calculate the median, we must first sort the data into ascending or descending order, as follows: 8, 6, 5, 3, 1. Next, we choose the middle value, which in this case is 5.
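The same steps can be sketched in Python with the five observations from the text. The value 800 in the second list is a hypothetical outlier, included only to compare how the mean and the median react to it.

```python
from statistics import fmean, median

observations = [1, 5, 3, 8, 6]       # the five values from the text

print(sorted(observations))          # [1, 3, 5, 6, 8]
print(median(observations))          # 5, the middle value

# With an even number of values, the median is the mean of the two middle ones.
print(median([1, 3, 5, 6]))          # 4.0

# Hypothetical outlier: replace 8 by 800 and compare mean and median.
with_outlier = [1, 5, 3, 800, 6]
print(fmean(observations), fmean(with_outlier))    # 4.6 163.0
print(median(observations), median(with_outlier))  # 5 5
```

The mean jumps from 4.6 to 163 while the median stays at 5, which is why the median is preferred when extreme values are present.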

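For ungrouped data, let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ be the n observations sorted in ascending order. Then the median is

$$\text{Median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases}$$

For grouped data, let $x_1, x_2, \dots, x_n$ be a sequence of n observations and let $f_1, f_2, \dots, f_n$ be the associated frequencies. Then the median is

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - C\right)$$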



where,

l = lower class boundary of the median class.

h = size of the median class

f= frequency corresponding to the median class

N = sum of frequencies

C= cumulative frequency of the class preceding the median class.

 

The median class is the first class whose cumulative frequency reaches or exceeds N/2.


  • It is strictly defined.
  • It is not based on every observation.
  • Extreme values have little effect on it.
  • It is simple to compute and comprehend.

 

Application:

The median is commonly used to describe the typical income or wage of a group of individuals. The monthly incomes of a group of people can range from less than 1,000 to 30,000, with some making more than 3 lakh. The range is therefore wide, and because the median is least affected by extreme values, it is highly useful in such situations. It can also be used to detect and remove outliers and to handle Likert-scale data.

1.6. Mode

The mode is one of the most widely used measures of central tendency. It answers questions such as: Where do the most crimes occur? Which age group is most vulnerable to a specific illness? The mode is the value (or values) that occurs most frequently in a given series or, for grouped data, the interval with the highest frequency. Assume that we gather information about park visitors and categorize them by age. If we find that the age of 16 has more observations than any other, then the mode of this series is 16 years.

For grouped data, the mode is given by

$$\text{Mode} = l + h \cdot \frac{f_1 - f_0}{2f_1 - f_0 - f_2}$$

where,

l = lower limit of the modal class

h= size of the modal class

f1= frequency of the modal class

f0= frequency of the class preceding the modal class

f2= frequency of the class succeeding the modal class


The class with the highest frequency is called the modal class.


  • It is strictly defined.
  • It is not based on every observation.
  • Extreme values have little effect on it.
  • It is simple to compute and comprehend.

Application:

Mode is used to give information about the value or event that occurred the most times and to summarize data.
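For ungrouped data, finding the mode amounts to counting how often each value occurs. The sketch below uses hypothetical ages of park visitors (the post does not list actual data) and `statistics.multimode`, available from Python 3.8, which returns every value tied for the highest count.

```python
from collections import Counter
from statistics import multimode

# Hypothetical ages of park visitors, NOT data from the original post.
ages = [16, 18, 16, 21, 16, 18, 35, 16, 18]

counts = Counter(ages)    # frequency of each age
modes = multimode(ages)   # all values with the highest frequency

print(counts.most_common(2))        # [(16, 4), (18, 3)]
print(modes)                        # [16]

# A series can have more than one mode.
print(multimode([1, 1, 2, 2, 3]))   # [1, 2]
```

Returning a list instead of a single number keeps the bimodal case visible instead of silently choosing one of the tied values.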

1.7. Examples of Measures of Central Tendency

Let us consider the following example and calculate the measures of central tendency.

Class limits (C.L.)    Mid-value (x)    Frequency (f)
0-10                   5                11
10-20                  15               12
20-30                  25               10
30-40                  35               5
40-50                  45               6
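The grouped-data formulas for the mean, median, and mode can be applied to the table above as in the following sketch. It assumes the class limits are also the class boundaries (0-10, 10-20, and so on), and it treats a missing neighbouring class as having frequency 0 in the mode formula; both are assumptions made for this illustration.

```python
# Grouped data from the table: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 11), (10, 20, 12), (20, 30, 10), (30, 40, 5), (40, 50, 6)]

N = sum(f for _, _, f in classes)  # total frequency

# Mean: sum(f_i * x_i) / N, with x_i the class mid-value.
mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / N

# Median: l + (h / f) * (N/2 - C), using the first class whose
# cumulative frequency reaches N/2 (the median class).
cumulative = 0  # C, cumulative frequency before the current class
for lo, hi, f in classes:
    if cumulative + f >= N / 2:
        median = lo + (hi - lo) / f * (N / 2 - cumulative)
        break
    cumulative += f

# Mode: l + h * (f1 - f0) / (2*f1 - f0 - f2), using the class with the
# highest frequency (the modal class).
freqs = [f for _, _, f in classes]
m = freqs.index(max(freqs))
lo, hi, f1 = classes[m]
f0 = freqs[m - 1] if m > 0 else 0               # class before the modal class
f2 = freqs[m + 1] if m < len(freqs) - 1 else 0  # class after the modal class
mode = lo + (hi - lo) * (f1 - f0) / (2 * f1 - f0 - f2)

print(f"N = {N}")                # N = 44
print(f"mean   = {mean:.2f}")    # 21.14
print(f"median = {median:.2f}")  # 19.17
print(f"mode   = {mode:.2f}")    # 13.33
```

Here both the median class and the modal class are 10-20: the cumulative frequency first reaches N/2 = 22 there, and it has the highest frequency, 12.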

 





A histogram can be used to find the mode graphically. The first step is to identify the tallest bar of the histogram, which represents the modal class. Then we draw one line from the upper right corner of the tallest bar to the upper right corner of the bar before it, and another from the upper left corner of the tallest bar to the upper left corner of the bar after it. From the point where the two lines intersect, a perpendicular is dropped to the X-axis. The value at which this perpendicular meets the X-axis is the mode.




 

An ogive (cumulative frequency curve) can be used to find the median graphically. One method is to draw either of the two ogives and then locate the value N/2 on the Y-axis. From there, we draw a horizontal line to the ogive, and from the point where it meets the ogive we drop a perpendicular to the X-axis. The corresponding value on the X-axis is the median.

 


 

An alternative method is to draw both ogives (the less-than curve and the more-than curve) and drop a perpendicular from their point of intersection to the X-axis. The corresponding value on the X-axis is the median.

 



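1.8. Moving Average

 

A "moving average" is the arithmetic mean of K consecutive data points of a series, recomputed as the window of K points slides along the series. It smooths out the oscillations of the series and brings out its trend.

 

Let $X_1, X_2, \dots, X_n$ be a series of n observations. Then the moving average of period K starting at the tth observation is

$$MA_t = \frac{X_t + X_{t+1} + \dots + X_{t+K-1}}{K}, \qquad t = 1, 2, \dots, n-K+1$$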


Application:

In time series data, the moving average is primarily used to identify trends. It is also used to fill gaps in data. For example, we may be tracking the rate of inflation over time, or looking at trends in stock prices, crimes against women, and so on. A moving average can be quite helpful in these situations.

Year    Closing Price
2001    1220.5
2002    1230.6
2003    1300.5
2004    1120.3
2005    1221.2

 

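Therefore, taking all five years as a single window (K = 5), Moving Average = (1220.5 + 1230.6 + 1300.5 + 1120.3 + 1221.2) / 5 = 6093.1 / 5 = 1218.62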
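The calculation above can be reproduced, and extended to shorter windows, with a small helper. The function name `moving_average` and the choice of a 3-year window are illustrative assumptions; the closing prices are the ones from the table.

```python
def moving_average(series, k):
    """Return the mean of every window of k consecutive values."""
    if not 1 <= k <= len(series):
        raise ValueError("k must be between 1 and len(series)")
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]


# Closing prices for 2001 to 2005, taken from the table.
closing_prices = [1220.5, 1230.6, 1300.5, 1120.3, 1221.2]

ma5 = moving_average(closing_prices, 5)  # one window covering all five years
ma3 = moving_average(closing_prices, 3)  # three overlapping 3-year windows

print([round(v, 2) for v in ma5])  # [1218.62]
print([round(v, 2) for v in ma3])  # [1250.53, 1217.13, 1214.0]
```

With K = 5 there is only one window, so the result is a single number; with K = 3 the window slides and yields one average per position, and that sequence is what traces the trend.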



