Today is the day we finally talk about the normal distribution! The normal distribution is incredibly important in statistics because distributions of sample means are approximately normally distributed even if populations aren't. We'll get into why this is so (the Central Limit Theorem), but it's useful because it allows us to make comparisons between different groups even if we don't know the underlying distribution of the population being studied.
With all due respect, I must take issue with your comment that the mean, median and mode are the same. To use your illustration of income: if you have a few people who make very high incomes (CEOs or neurosurgeons, for example) or very low incomes (people who make minimum wage), and a larger number of people who earn somewhere in between, the extreme values will pull the mean up in the first case and down in the second. Needless to say, the standard deviation will be distorted as well. The median is less influenced by extreme values, and the spread is less distorted as well. This statistic will give a much better average income. The mode, on the other hand, gives the clumps of incomes that occur most often. Just for the record, I am a sociologist who concentrated on methodology, lots of stats.
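To make the income point concrete, here is a minimal Python sketch. The incomes are made up purely for illustration (eight typical earners plus two very high earners); they are an assumption, not data from the video.

```python
import statistics

# Hypothetical annual incomes: eight typical earners plus two very high earners.
incomes = [32_000, 35_000, 38_000, 40_000, 42_000, 45_000, 48_000, 50_000,
           900_000, 2_500_000]

mean_income = statistics.mean(incomes)      # pulled upward by the two extremes
median_income = statistics.median(incomes)  # middle of the sorted values

print(f"mean:   {mean_income:,.0f}")
print(f"median: {median_income:,.0f}")
```

With these numbers the mean lands far above what eight of the ten people earn, while the median stays inside the typical range.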

I have to join the other commenters: a very interesting video, but a couple of concepts or arguments require more background or a little deeper dive.
This is a lot of concepts in one video. They're related, but I feel like some of the topics should maybe have been split up to go into more depth.

This is like five weeks of content crammed into 11 minutes. Slow down plz.

Who is the audience for this? Someone like me who already understood all of this at some point? It better not be for someone who's new to it, 'cause this is how you kill interest and make people feel dumb.

The sampling distribution still can't be perfectly normal, though. To use your example, if you measure gross income instead of net income, it is never negative no matter how many samples you take. For a true normal distribution, for any arbitrarily extreme value, if you keep drawing samples you will almost surely get something at least that extreme; for gross incomes, you will never find a sample with a negative mean no matter how big the sample is.
I haven't done the math to see how exactly that fails to contradict the central limit theorem, but my guess is just that the standard deviation gets smaller faster than the sample distribution approaches normal, or something like that.
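A rough way to poke at this guess is a small simulation. The sketch below assumes an exponential(1) population as a stand-in for a skewed, never-negative "income" (that choice, the seed, and the helper name `sample_means` are all my assumptions).

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_means(n, reps=2000):
    """Means of `reps` samples of size n from an exponential(1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

# Sampling distributions of the mean for three sample sizes.
results = {n: sample_means(n) for n in (1, 10, 100)}

for n, means in results.items():
    print(f"n={n:3d}  min={min(means):.3f}  "
          f"mean={statistics.fmean(means):.3f}  sd={statistics.stdev(means):.3f}")
```

The smallest sample mean is never negative, yet the spread of the means shrinks roughly like 1/sqrt(n) while the centre stays near 1. The Central Limit Theorem is a statement about the limiting shape, so for any finite n the normal curve is only an approximation, and the probability it assigns to negative means becomes vanishingly small as n grows.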

I still don't understand what a sample meanS is in practical terms other than a pair of dice. Where do we in real life ONLY analyse for sample meanS not sample meaN? Makes no sense the way it was presented.

Thank you, that makes it so much clearer.

In real life we nearly always take one sample, which has a single mean. But theoretically, if we were to re-sample the same population ten times, we'd have ten samples, each with its own mean (hence "sample means", plural).
The Central Limit Theorem tells us these sample means are approximately normally distributed around the true population mean (for large enough samples); the standard deviation of this distribution of sample means is what we call the standard error. The standard error can be estimated from the variation in our sample, and estimating it gives us a measure of how certain we are that our sample mean represents the true mean. If the standard error is small relative to the sample mean, that tells us we've got a decent estimate. If the standard error is large, we might need a larger sample.
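Here is a small Python sketch of that idea. The population (uniform between 0 and 100), the sample size of 50, and the seed are assumptions chosen only for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 values spread uniformly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(100_000)]
n = 50

# What we do in real life: ONE sample, and a standard error estimated from it.
one_sample = rng.sample(population, n)
se_estimate = statistics.stdev(one_sample) / math.sqrt(n)

# What the theory imagines: re-sampling many times and collecting the means.
means = [statistics.fmean(rng.sample(population, n)) for _ in range(2000)]
se_empirical = statistics.stdev(means)  # SD of the sample means

print(f"estimated from one sample: {se_estimate:.2f}")
print(f"SD of 2000 sample means:   {se_empirical:.2f}")
```

The two numbers should come out close to each other, which is the point: a single sample is enough to estimate how much the sample mean would wobble if we could repeat the study.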

What stops a population from having a nonstandard distribution?

I teach this stuff to 16-year-olds and I think they would have been able to follow everything up to here, but why pick up the pace like this here? It's too much info with too few 'quirky' examples imho. There are lots of people who can do math, but very few who can make videos like you can. Remember what you're good at!

Mohammed Shafei by "sample means" I’m referring to the sampling distribution of means. You take the mean of a sample, plot that point, and do that for all your samples; then you work with the data from that distribution.

Perhaps you can make me understand then what she means by "sample means" and why not sample mean? I use statistics in real life if not deeply and I never came across a plural sample means.

Very confusing video. Was there even an explanation of what the "normal distribution" even is? The video starts off talking about it and why it's useful, but I kept waiting for what it actually is mathematically. Also, many other terms and ideas are thrown out with no explanation at all. I feel like instead of 11 minutes, this topic needed to be much longer, and deal with these topics a bit slower and more thoroughly.

Could not understand a single thing. What is the point of Crash Course if it's just like any other boring lecture? You keep introducing new concepts but fail to explain ongoing ones. CrashCourse Statistics will not do well like this.

The sample means will fall near the sample means and therefore the sample means will be mean to the poor sample means

I remember Derek's video about regression to the mean. :)

*_...so in practice, if 100 voters cast 'randomly' (e.g. uninformed) they'll pass any Bill 50% of the time without meaning-to (i.e. uninformed)—and,—to reduce that to 5%, requires a Vote minimum of 58—but also, we can estimate that any Vote within the ±7 of their mean 50, is indistinguishable from 'random' (the 'drunk-walk' vote however-much they're informed)..._*
*_...so if statistics is worth anything it is that it tells us there is no game won by a majority..._*
*_...if athletes are drug-tested to prove they're not-'drunk-walking', Senators should be too..._*
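The coin-flip voting numbers at the top of this comment can be checked with an exact binomial tail. This is only a sketch under the commenter's own assumption (100 independent voters, each voting yes with probability 0.5); the helper name `tail_prob` is made up for the example.

```python
from math import comb

def tail_prob(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 100

# Chance that purely random voting produces a simple majority (51 or more).
p_simple_majority = tail_prob(n, 51)

# Smallest vote count that random voting reaches at most 5% of the time.
threshold = next(k for k in range(n + 1) if tail_prob(n, k) <= 0.05)

print(f"P(at least 51 of {n}) = {p_simple_majority:.3f}")
print(f"first count with tail <= 5%: {threshold}")
```

Worked by hand, the exact tail is about 6.7% at 58 votes and about 4.4% at 59, so the exact cutoff is 59; the figure of 58 matches the normal approximation 50 + 1.645 × 5 ≈ 58.2. A simple majority comes up about 46% of the time rather than exactly 50%, because a 50-50 tie does not pass.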

I think it's far more likely that politicians simply vote along party lines rather than randomly. The amount of deviation from the party line would be very small.

Normal? I have heard it described as taking both ends of a spectrum, and dividing it accordingly. Subjective bias often skews perfectly good data.

Gauss for the win!

A lot of concepts in this video that probably could have used some more room to breathe...

Statistics is the subject they chose. This one should have said more about standard error, and how 15 is close enough to 16 that it doesn't matter.
15 is way too far from 16 in this example, BTW.

Statistics is not a simple subject. I have watched entire classrooms grind to a halt over tiny details for half an hour. CC does not have the luxury of being able to address every issue every viewer has with it. This series was always going to suffer from such a feeling.

By "more room to breathe" I don't mean more pauses. I mean there is too much stuff crammed in and glossed over.

Nah I didn't need to pause. I've just heard better explanations elsewhere. The point of Crash Course (as far as I was aware) was to give you an overview of a concept so you can grasp the important foundational concepts. This video doesn't really do that. It's the weakest video in this series.

just pause whenever you need to! .. I know that's what I did. xD

"Normal" "sample" "distribution" and "mean" are all getting to that point where the word feels like it isn't real anymore.

I wish it were "sample mean", that's understandable. But sample means, as in the sample of many means of many pairs of dice... which applies to nothing else but dice... It's silly and meaningless.

Still, I have a problem with using standard deviation where the standard deviation is large compared to the mean and you get a large probability that some quantities that cannot be less than zero are. Example: if you have a mean height of 2 m and a standard deviation of 5 m, then a significant portion of the curve is less than zero meters in height. What is negative height? It is nonsense.
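To put a number on "significant portion", here is a short sketch using Python's standard library. It takes the mean of 2 m and standard deviation of 5 m from the example above at face value; the second, more realistic pair of values (1.7 m and 0.1 m) is my own assumption for contrast.

```python
from statistics import NormalDist

# The example above: mean 2 m, standard deviation 5 m.
wide = NormalDist(mu=2.0, sigma=5.0)
below_zero_wide = wide.cdf(0.0)   # share of the fitted curve below 0 m

# Hypothetical, more realistic heights: mean 1.7 m, standard deviation 0.1 m.
narrow = NormalDist(mu=1.7, sigma=0.1)
below_zero_narrow = narrow.cdf(0.0)

print(f"mean 2.0, sd 5.0 -> P(height < 0) = {below_zero_wide:.3f}")
print(f"mean 1.7, sd 0.1 -> P(height < 0) = {below_zero_narrow:.3g}")
```

With the mean only 0.4 standard deviations above zero, roughly a third of the fitted curve sits below zero, which is a sign that a normal model is a poor fit for that data. When the mean is many standard deviations above zero, the negative tail is negligible.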

fatsquirrel75 Here is where you need to assess your data to see if you have an outlier. A series of 1's and an 11 suggests the 11 may be an accident, and it may be thrown out by consensus.

fatsquirrel75 "It wouldn't be possible to have a mean near zero and large variance and still be normally distributed." this is simply not true.

pw where did I say otherwise? I said they will typically normalize the data. This is the process by which you shift and shape the data for easier comparison. Only way for length to end up in the negative would be to shift it there.
Picture a sample 1,1,1,1,1,1,1,1,1,11: mean = 2, variance = 10.
Does that look very normal to you? It wouldn't be possible to have a mean near zero and large variance and still be normally distributed.
If it isn't normal you don't have to worry about the existence of a symmetrical tail on the negative side.
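For anyone who wants to check the numbers in that sample, here is a quick sketch with Python's `statistics` module. The comparison against a normal curve with the same mean and variance is my own addition for illustration.

```python
import math
import statistics
from statistics import NormalDist

sample = [1] * 9 + [11]   # nine 1's and one 11

mean = statistics.mean(sample)
sample_var = statistics.variance(sample)   # divides by n - 1
pop_var = statistics.pvariance(sample)     # divides by n

# Share of a normal curve with the same mean and variance that lies below zero.
normal_below_zero = NormalDist(mean, math.sqrt(sample_var)).cdf(0.0)
actual_below_zero = sum(x < 0 for x in sample) / len(sample)

print(mean, sample_var, pop_var)
print(f"normal model below zero: {normal_below_zero:.3f}, data below zero: {actual_below_zero}")
```

The variance of 10 is the sample variance (n - 1 in the denominator); dividing by n gives 9. A normal curve with those parameters would put about a quarter of its area below zero while the data has none there, which is one way to see that this sample is not normally distributed.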

as fatsquirrel75 said, your distribution isn't normal. problem solved

Just like when solving for polynomial roots, you plug in the answer to see if it makes sense.
Sometimes you get a negative polynomial root that would not be the answer in real life. So you just throw it out.
Same with your example. If negative lengths don't make sense, throw them out.

Why should a hiring manager hire single mothers? Single mothers don't spend enough time at work, and are a bad investment.

For a cordial society, either have a requirement to hire indiscriminately, or extract taxes from the owners of capital to support universal basic services. Leaving everyone to their own devices trends toward feudalism, balanced on a revolutionary knife edge.

Great explanation. Thanks!

Their fate will be in their own hands as they decide whether to share or to shaft.

Crash Course Game Theory?

Not at all clear -- especially for a Crash Course video.

Yes, but you can rewatch it again and again and at your own pace which is still better than a lecture lol

All I could hear is "sample means, sample means." Why sample means not sample mean?

They seem to be doing alright with examples and visual cues to make the information a bit easier to relate to. The issue I am having is that the video is just drowning in jargon to the point of sickness. Take a drink every time "mean", "median", or "normal" is said without much levity. Except do not do that, as you will be dead from alcohol poisoning 2 minutes in. It is a fact that people just filter out jargon, which is why many traditional "boot camp" style lectures like this simply do not work. People naturally tune in and out while listening, and when you have something this information-dense using terminology that people are naturally less likely to listen to... well, you get drop-off and people stop caring. CC is about making people care about things and introducing them to unfamiliar subjects in an entertaining way. Nothing here is particularly entertaining or interesting. It is just being talked at. Ms. Hill is obviously knowledgeable and seems to be doing her best with the script, but holy hell... the script needed to be redone with the audience in mind.

The Statistics series has been suffering from confusing writing in many places, I think. Concepts are introduced in obtuse ways, or applied without explanation; complex examples are shown first and simpler examples later; mathematical notation is avoided altogether, and when it *is* shown it's barely explained.
There's a concept hiding just behind the examples and concepts discussed in this video that never *quite* comes across: that the normal distribution is the result of the sum of many small contributing processes. Your height is the result of many small genetic and environmental factors; these factors might have all sorts of distributions, but sum them all together, and you get a trait that's normally distributed. (This is true for many or most traits you can measure in nature, which is why the distribution is called "normal".) You can easily simulate this by imagining, for example, each gene that contributes to height as a coin toss (where heads is an allele that makes you taller, and tails is an allele that doesn't). Have each person in a large group toss a coin ten times and sum their tosses, and you'll end up with a fairly normal distribution of sums (with a mean of 5, since that's the most likely result of ten coin tosses). This is what the Central Limit Theorem says.
We see this again in the die rolls: the die results have a uniform distribution, but once you start adding dice together you approximate a normal distribution (the more dice, the more normal). But for some reason the video forgoes this simpler perspective and dives straight into distributions of sample means, which is a higher level of abstraction that makes *no* sense to a viewer who hasn't already grasped what a normal distribution is and why it happens. It's frustrating.
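The coin-toss version is easy to try in a few lines of Python. This is a minimal sketch: the group size of 10,000 people and the seed are arbitrary assumptions, and each toss counts 1 for heads and 0 for tails.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the run is repeatable
people = 10_000

# Each person tosses a coin ten times and sums the heads.
sums = [sum(rng.randint(0, 1) for _ in range(10)) for _ in range(people)]
counts = Counter(sums)

# Crude text histogram: one '#' per 50 people.
for total in range(11):
    print(f"{total:2d} {'#' * (counts[total] // 50)}")

print("average number of heads:", sum(sums) / people)
```

The histogram comes out as a roughly symmetric hump centred on 5, even though each individual toss is just a flat 50/50 choice. Swapping `rng.randint(0, 1)` for `rng.randint(1, 6)` gives the dice version.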
They also mistakenly said, again, that the standard deviation represents the "average distance from the mean". It doesn't. :(
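A tiny example shows the difference between the two quantities. The eight data values below are an assumption picked so the arithmetic comes out clean; the "average distance from the mean" is the mean absolute deviation, while the standard deviation is the square root of the average squared distance.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values with mean 5

mean = statistics.fmean(data)

# Average distance from the mean (mean absolute deviation).
mean_abs_dev = statistics.fmean(abs(x - mean) for x in data)

# Population standard deviation: root of the mean squared distance.
std_dev = statistics.pstdev(data)

print(f"mean absolute deviation: {mean_abs_dev}")
print(f"standard deviation:      {std_dev}")
```

Here the average distance from the mean is 1.5 but the standard deviation is 2. Squaring gives extra weight to the points far from the mean, so the standard deviation is never smaller than the mean absolute deviation.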

I was wondering if I missed an episode or three. I feel like we need an episode 18.5.

Is this college or high school level?

Moritz Ernst Jacob ah, you're from Germany. Then it's exactly the same as for me.

Moritz Ernst Jacob there's a 13th grade in the US?

High school, 12th and 13th grade. If you have a higher maths course in high school you might even integrate the Gauss curve in polar coordinates. At least we did that...

