[Math Review] Statistics Basic: Estimation

Two Types of Estimation

One of the major applications of statistics is estimating population parameters from sample statistics. There are two types of estimation:

  • Point Estimate: the value of a sample statistic, used as a single best guess of the population parameter.

Point estimates of average height with multiple samples (Source: Zhihu)

  • Confidence Intervals: intervals constructed using a method that contains the population parameter a specified proportion of the time.

95% confidence interval of average height with multiple samples (Source: Zhihu)

Confidence Interval for the Mean

Population Variance is Known

Suppose that M is the mean of a sample of N observations X1, X2, ..., XN, i.e.

$$M = \frac{1}{N} \sum_{i=1}^{N} X_i$$

According to the Central Limit Theorem, the sampling distribution of the mean M is approximately

$$M \sim N\left(\mu, \frac{\sigma^2}{N}\right)$$

where μ and σ² are the mean and variance of the population respectively. If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. To construct such an interval, start from the interval that is symmetric about the population mean μ and contains 95% of the area under the sampling distribution of M.

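That is, with probability 0.95,

$$\mu - 1.96 \frac{\sigma}{\sqrt{N}} \le M \le \mu + 1.96 \frac{\sigma}{\sqrt{N}}$$

Since we don't know the mean of the population, we center the interval on the sample mean M instead. Rearranging the inequality gives the 95% confidence interval

$$M - 1.96 \frac{\sigma}{\sqrt{N}} \le \mu \le M + 1.96 \frac{\sigma}{\sqrt{N}}$$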
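As a rough illustration of the known-variance case, the sketch below computes the interval M ± z·σ/√N and checks by simulation how often it contains the population mean. The function names, the population values (μ = 170, σ = 10) and the sample size are assumptions chosen for this example, not taken from the original article.

```python
import math
import random
from statistics import NormalDist, fmean


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Interval for the population mean when the population std dev is known."""
    n = len(sample)
    m = fmean(sample)  # point estimate of the population mean
    # z is about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)
    return m - half_width, m + half_width


def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_confidence_interval(sample, sigma)
        hits += low <= mu <= high
    return hits / trials


if __name__ == "__main__":
    # Hypothetical population: heights with mean 170 and std dev 10
    print(coverage(mu=170.0, sigma=10.0, n=25, trials=10000))
```

The simulated fraction should land close to 0.95, which is the "specified proportion of the time" in the definition of a confidence interval.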

Population Variance is Unknown

Degrees of Freedom

The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question. 

If the variance in a sample is used to estimate the variance in a population, we cannot calculate the sample variance as

$$\frac{1}{N} \sum_{i=1}^{N} (X_i - M)^2$$

That's because the sample mean M is itself estimated from the same data, which uses up one degree of freedom. The degrees of freedom are therefore N-1, and the previous formula underestimates the population variance. Instead, we should use the following formula

$$s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - M)^2$$

where s² is the estimate of the variance and M is the sample mean. The denominator of this formula is the degrees of freedom.
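The underestimation can be checked numerically. The sketch below draws many small samples from a normal population with a known variance and averages both estimates, one dividing by N and one dividing by N-1. The function names and the population settings (σ = 2, N = 5) are assumptions made for this example.

```python
import random
from statistics import fmean


def variance_divide_by_n(sample):
    """Sum of squared deviations from the sample mean, divided by N."""
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


def variance_divide_by_n_minus_1(sample):
    """Sum of squared deviations from the sample mean, divided by N-1."""
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)


def average_estimates(sigma, n, trials, seed=0):
    """Average both variance estimates over many simulated samples."""
    rng = random.Random(seed)
    by_n, by_n_minus_1 = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        by_n.append(variance_divide_by_n(sample))
        by_n_minus_1.append(variance_divide_by_n_minus_1(sample))
    return fmean(by_n), fmean(by_n_minus_1)


if __name__ == "__main__":
    # True variance is sigma**2 = 4.0
    print(average_estimates(sigma=2.0, n=5, trials=20000))
```

With these settings the N-1 version should average close to the true variance of 4, while the N version should average close to 4·(N-1)/N = 3.2.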

Student's t-Distribution

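Suppose that X is a normally distributed random variable, i.e., X ~ N(μ, σ²), and that X1, X2, ..., XN is a random sample of X. Then

$$M = \frac{1}{N} \sum_{i=1}^{N} X_i$$

is the sample mean and

$$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (X_i - M)^2}$$

is the sample standard deviation.

$$Z = \frac{M - \mu}{\sigma / \sqrt{N}}$$

is a random variable with a standard normal distribution.

$$T = \frac{M - \mu}{s / \sqrt{N}}$$

is a random variable with a Student's t distribution with N-1 degrees of freedom.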

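The probability density function of T is

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where ν is the degrees of freedom and Γ is the gamma function.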

The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but has relatively more scores in its tails when there are fewer degrees of freedom. Here are t distributions with 2, 4, and 10 degrees of freedom and the standard normal distribution. Notice that the normal distribution has relatively more scores in the center of the distribution and the t distribution has relatively more in the tails.

The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase. 
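To make the comparison concrete, the sketch below evaluates the standard Student's t density at the center (t = 0) and in the tail (t = 3) for 2, 4, and 10 degrees of freedom, next to the standard normal density. The function name t_pdf and the choice of evaluation points are assumptions made for this example.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


if __name__ == "__main__":
    normal = NormalDist()  # standard normal: mean 0, std dev 1
    for df in (2, 4, 10):
        print(f"df={df}: center={t_pdf(0.0, df):.4f} tail={t_pdf(3.0, df):.4f}")
    print(f"normal: center={normal.pdf(0.0):.4f} tail={normal.pdf(3.0):.4f}")
```

For each of these degrees of freedom the t density should be lower than the normal density at the center and higher in the tail, with the gap shrinking as the degrees of freedom increase.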

Confidence Interval of t Distribution

Now consider the case in which you have a normal distribution but you do not know the standard deviation. You sample N values and compute the sample mean (M) and estimate the standard error of the mean (σM) with sM. What is the probability that M will be within 1.96 sM of the population mean (μ)? This is a difficult problem because there are two ways in which M could be more than 1.96 sM from μ: (1) M could, by chance, be either very high or very low and (2) sM could, by chance, be very low. Intuitively, it makes sense that the probability of being within 1.96 standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated).
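The intuition above can be checked with a small simulation. The sketch below estimates how often M falls within 1.96 estimated standard errors of μ when the standard deviation is estimated from the sample. The function name, the standard normal population, and the sample sizes are assumptions made for this example.

```python
import math
import random
from statistics import fmean


def within_1_96_standard_errors(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Fraction of samples where |M - mu| <= 1.96 * s_M, with s_M estimated."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = fmean(sample)
        # Sample standard deviation with N-1 in the denominator
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        s_m = s / math.sqrt(n)  # estimated standard error of the mean
        hits += abs(m - mu) <= 1.96 * s_m
    return hits / trials


if __name__ == "__main__":
    print(within_1_96_standard_errors(n=5, trials=20000))
    print(within_1_96_standard_errors(n=100, trials=5000))
```

For a small sample such as N = 5 the fraction should come out noticeably below 0.95, while for N = 100 it should be close to 0.95.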

Luckily, however, we can prove that the random variable T follows a Student's t distribution. So we can use the t distribution to estimate the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown. For example, the 90% confidence interval can be calculated as

$$M - A \, s_M \le \mu \le M + A \, s_M$$

where sM is the estimated standard error of the mean and A is the value such that the interval from -A to A contains 90% of the area of the t distribution with N-1 degrees of freedom. We can look up A in a t table.
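A minimal sketch of this calculation follows. It looks up A in a small excerpt of a two-sided 90% t table instead of computing it, so it only supports the degrees of freedom listed. The function name, the table excerpt, and the sample values are assumptions made for this example.

```python
import math
from statistics import fmean, stdev

# Excerpt of a t table: critical values with 0.05 in each tail
# (two-sided 90%), keyed by degrees of freedom.
T_TABLE_90 = {4: 2.132, 9: 1.833, 14: 1.761, 19: 1.729, 29: 1.699}


def t_confidence_interval_90(sample):
    """90% confidence interval for the mean, population std dev unknown."""
    n = len(sample)
    df = n - 1
    if df not in T_TABLE_90:
        raise ValueError(f"no table entry for {df} degrees of freedom")
    m = fmean(sample)
    s_m = stdev(sample) / math.sqrt(n)  # stdev divides by N-1
    a = T_TABLE_90[df]
    return m - a * s_m, m + a * s_m


if __name__ == "__main__":
    # Hypothetical sample of 5 measurements
    print(t_confidence_interval_90([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For a general-purpose version, the table lookup could be replaced by a quantile function from a statistics library.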

Original article: https://www.cnblogs.com/sherrydatascience/p/10354428.html

Time: 2024-08-30 08:34:37
