Statistics 101: Measures of central tendency

More Than Mortal · November 15, 2015, 07:04:15 AM

Maths is one of those subjects where you either love it, or hate it. I tend to hate it, but I can get along with it well enough if it has some kind of practical application. Yet, despite many people hating it, a lot of people also wish they were better at it; it's obviously a skill people desire.

Since I do statistics (econometrics, actually, but it's pretty broad) I figured I'd address that desire to be better at maths by explaining its practical application in terms of a specific area of study. So, if you want to know more about statistics, then hopefully this will be a pretty decent guide. It will follow the same trajectory as my lectures have taken, and I will be using old notes and my textbook as a way of guiding myself. Also, me being able to explain certain things will probably help me.

Measures of Central Tendency

So, we're starting with the really basic stuff. Let's get some simple notation down:

- Observations of a variable are denoted by a letter such as X (or Y, or Z).

- The index "i" denotes a generic observation of that variable. i takes on the value of 1, 2, 3, 4 and so on and so forth. So, X₁ would be the first observation. X₄ the fourth.

When it comes to measures of central tendency, we are concerned with what the typical value of X is in a given data set. The most common answer is to compute the mean of the variable's observations, which is denoted by X with a bar above it. Or, when typed, as Xbar.

As most of you probably know, the mean is defined as the sum of all the given values (or observations) divided by the overall quantity of those values.

Written mathematically, Xbar = ΣX_i / n.

The E-looking letter is a capital sigma, which is a summation notation. All it means is that we sum every given instance of X in the data set. Then, of course, we divide it by n which is the quantity of observations.
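If it helps to see the formula as code, here is a minimal sketch in Python. The data values and the function name `mean` are made up for illustration; they are not from my notes or the textbook.

```python
# Hypothetical observations X_1 ... X_n (illustrative values only).
observations = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]


def mean(xs):
    """Return Xbar = (sum of all X_i) / n."""
    n = len(xs)  # n is the number of observations
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(xs) / n  # sum(xs) plays the role of the capital sigma


xbar = mean(observations)
print(xbar)
```

Here `sum(xs)` is doing the job of Σ, and `len(xs)` is n.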

The summation notation is a very useful tool; it can help organise calculations into a much more manageable layout. Say we have some constant, "a" (usually, constants are denoted by Greek letters such as alpha, but I can't be fucked to copy-paste it every time). If, for instance, we have Σ a X_i, this is essentially the equivalent to aX₁ + aX₂ + aX₃ + . . . + aX_n.

This, however, can be re-arranged into the much more manageable aΣX_i. This saves you having to compute every instance of aX_i individually. This is called simplifying the expression. Knowing how to rearrange equations in order to simplify them can be very useful and time saving, as I will later demonstrate. But, for now, have a go at rearranging some yourself and I'll put the answers in spoilers.

A) Simplify the expression Σ(8 + 3X_i + 7Y_i - 5Z_i).

Spoiler

First of all, the summation notation can be placed in front of each part of the expression. Thus, it becomes:

- Σ8 + Σ3X_i + Σ7Y_i - Σ5Z_i.

It can thus be simplified further to:

- 8n + 3ΣX_i + 7ΣY_i - 5ΣZ_i.

Remember, Σ8 simply means we need to add an 8 once for every observation, and the number of observations is denoted by n. Accordingly, Σ8 can simply be reduced to "an eight for each individual observation"; or, 8n. It can be difficult to remember that the summation notation includes the entire range of observations involved (unless denoted otherwise). In order to make this clearer, it is acceptable to write Σ with a subscript "i". This makes it clear you are summing for all instances.

For the final three parts of the simplification, it is worth moving the constant (either 3, 7 or 5 in this case) to before the summation notation. Allow me to prove they are equivalent, if you cannot see the logic:

Say we have three observations on variable X, and their values are 1, 2 and 3. And, we have a constant: 3.

- Written as "Σ3X_i", we are essentially performing this calculation: (1 x 3) + (2 x 3) + (3 x 3) = 3 + 6 + 9 = 18.

Or, we can simply move the constant to before the summation to make it (1 + 2 + 3) x 3 which again equals 18. This saves you multiplying every instance of X by 3, and allows you to simply multiply the entire summation.
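The same check can be run numerically. This is only a sketch in Python: the X values are the 1, 2, 3 from the example above, while the Y and Z values are invented so that the full expression from question A can be tested too.

```python
# X is the example from the text; Y and Z are hypothetical values.
X = [1, 2, 3]
Y = [2, 4, 6]
Z = [1, 1, 2]
n = len(X)  # number of observations

# Constant rule: summing 3 * X_i term by term ...
lhs_const = sum(3 * x for x in X)
# ... gives the same result as multiplying the whole sum by 3.
rhs_const = 3 * sum(X)

# Question A: the original expression, summed observation by observation ...
lhs_full = sum(8 + 3 * x + 7 * y - 5 * z for x, y, z in zip(X, Y, Z))
# ... and the simplified form 8n + 3ΣX_i + 7ΣY_i - 5ΣZ_i.
rhs_full = 8 * n + 3 * sum(X) + 7 * sum(Y) - 5 * sum(Z)

print(lhs_const, rhs_const)
print(lhs_full, rhs_full)
```

Both pairs should match whatever numbers you put in, as long as X, Y and Z have the same number of observations.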

So, let's return to our definition of the mean:

Xbar = ΣX_i / n.

This, however, is not the only measure of central tendency. The other common answer is the median, which is simply the middle value of a ranked set of observations. The mean is used more commonly than the median, but it's important to remember that sometimes the latter may be preferable; the mean is more easily distorted by extreme values.

For instance, say you have some data on income in a given town and you want to find the typical value. Yet, unfortunately, Donald Trump lives in this town. The mean would be skewed upwards due to the large value of Trump's income, whereas the median would remain the same as the middle observation remains the middle observation regardless of how high Trump's income may be in a given set of values.
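To put rough numbers on that, here is a small Python sketch using the standard library's `statistics` module. The incomes are entirely made up, and `town_incomes` is a hypothetical helper; the point is only that changing the top income moves the mean but not the median.

```python
import statistics

# Hypothetical incomes for five ordinary residents (invented figures).
ordinary = [21_000, 24_000, 26_000, 30_000, 35_000]


def town_incomes(top_income):
    """Return the town's incomes with one very rich resident added."""
    return ordinary + [top_income]


rich = town_incomes(1_000_000)
richer = town_incomes(50_000_000)

# The mean is dragged upwards as the top income grows ...
mean_rich = statistics.mean(rich)
mean_richer = statistics.mean(richer)

# ... but the median only depends on the middle of the ranked list.
median_rich = statistics.median(rich)
median_richer = statistics.median(richer)

print(mean_rich, mean_richer)
print(median_rich, median_richer)
```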

If n (the number of observations) is odd, then, with the observations ranked from smallest to largest, the median is as follows: M = X_{(n + 1) / 2}. Say n = 11, then M = X_{(11 + 1) / 2} = X₆. The median, therefore, is the sixth observation in the ranked list.

If n is even, then M = (X_{n / 2} + X_{(n / 2) + 1}) / 2. If n = 126, then n / 2 = 63 and (n / 2) + 1 = 64. Therefore, M = (X₆₃ + X₆₄) / 2. Or, the sum of the 63rd and 64th ranked observations divided by 2.
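The odd and even cases can be sketched in Python as below. The function name `median` and the sample data are made up for illustration. Note that Python lists count from 0, so position (n + 1) / 2 in the ranked list becomes index n // 2, and in the even case the two middle observations are added together before dividing by 2.

```python
def median(xs):
    """Return the middle value of the ranked observations."""
    ranked = sorted(xs)  # rank from smallest to largest first
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ranked[mid]
    # Even n: 1-based positions n / 2 and (n / 2) + 1
    # are 0-based indices mid - 1 and mid.
    return (ranked[mid - 1] + ranked[mid]) / 2


odd_sample = [7, 1, 9, 3, 5]   # ranked: 1, 3, 5, 7, 9
even_sample = [7, 1, 9, 3]     # ranked: 1, 3, 7, 9

print(median(odd_sample))
print(median(even_sample))
```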

Now that you've read through all of that, try some questions:

B) The percentage marks of a class of 12 students are as follows: 80, 16, 11, 71, 85, 95, 12, 71, 8, 15, 31, 25. Calculate both the mean and the median.

C) The amount of benefits, X, received by fifteen individuals in a given street, in a given week, in a given currency is: 67.73, 121.36, 54.32, 36.24, 176.56, 201.34, 97.26, 168.93, 35.61, 145.57, 76.58, 213.06, 232.55, 69.47 and 215.95. Calculate the mean and the median.

I'll wait for somebody to hit on the correct answers before posting them in a spoiler, so don't be lazy cunts. Next post, whenever it is, will deal with measures of variance and dispersion.

Anonymous (User Deleted) · November 15, 2015, 08:46:15 AM

>mfw

Spoiler

also bump so a qualified individual notices this thread

rC · November 15, 2015, 09:26:55 AM

i love me some stats nigga, let's take the natural log of our data and analyze that shit with the normal model cuz

Oh · November 15, 2015, 02:11:52 PM

Spoiler

43.333..., 28, 127.50, 121.36

Assuming I didn't shit things up when I put it into my calculator.

Although I'm doing maths at uni, so it's sort of cheating given I already know all that.