UC Berkeley Stat2.2x Probability study notes: Section 2 Random sampling with and without replacement

The Stat2.2x Probability course was taught by the University of California, Berkeley on the edX platform in 2014.

PDF notes download (Academia.edu)

Summary

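  • Independence: events $A$ and $B$ are independent when $$P(A\cap B)=P(A)\cdot P(B)$$
  • Binomial Distribution: the chance of $k$ successes in $n$ independent trials, each with success chance $p$, is $$P(X=k)=C_{n}^{k}\cdot p^k\cdot(1-p)^{n-k}$$ R function:
    dbinom(k, n, p)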
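
As a quick check of the binomial formula, the sketch below writes it out by hand and compares it with `dbinom`. The helper name `binom_by_hand` is made up for this illustration and is not part of base R.

```r
# Binomial probability written out from C(n, k) * p^k * (1 - p)^(n - k).
# binom_by_hand is an illustrative name, not a base R function.
binom_by_hand <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

by_hand  <- binom_by_hand(k = 2, n = 4, p = 0.5)
built_in <- dbinom(x = 2, size = 4, prob = 0.5)

# Both values should agree (2 heads in 4 fair tosses).
print(c(by_hand = by_hand, built_in = built_in))
```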

UNGRADED EXERCISE SET A

PROBLEM 1

I toss a coin 4 times. Find the chance of getting:

1A the sequence $HTHT$

1B 2 heads

1C more heads than tails

Solution

1A) $$P(\text{HTHT})=\frac{1}{2^4}=0.0625$$

1B) Binomial distribution $n=4, k=2, p=0.5$: $$P(\text{two heads of four tosses})=C_{n}^{k}\cdot p^k\cdot (1-p)^{n-k}=C_{4}^{2}\times0.5^4=0.375$$ R code:

> dbinom(x = 2, size = 4, prob = 0.5)
[1] 0.375

1C) Binomial distribution $n=4, k=3,4, p=0.5$: $$P(\text{more heads than tails})=P(\text{3 heads of 4 tosses})+P(\text{4 heads of 4 tosses})$$ $$=\sum_{k=3}^{4}C_{4}^{k}\cdot 0.5^k\cdot (1-0.5)^{4-k}=0.25+0.0625=0.3125$$ R code:

> sum(dbinom(x = 3:4, size = 4, prob = 0.5))
[1] 0.3125
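
Because 4 tosses give only $2^4=16$ equally likely sequences, the three answers can also be checked by listing every sequence and counting. This is a brute-force sketch; the variable names are chosen here for illustration.

```r
# All 2^4 equally likely sequences of 4 tosses, one row per sequence.
tosses <- expand.grid(rep(list(c("H", "T")), 4), stringsAsFactors = FALSE)
heads  <- rowSums(tosses == "H")              # number of heads in each sequence
seqs   <- apply(tosses, 1, paste, collapse = "")

p_htht       <- mean(seqs == "HTHT")          # 1A: one specific sequence
p_two_heads  <- mean(heads == 2)              # 1B: exactly 2 heads
p_more_heads <- mean(heads > 4 - heads)       # 1C: more heads than tails

print(c(p_htht, p_two_heads, p_more_heads))
```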

PROBLEM 2

A random number generator draws at random with replacement from the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Find the chance that the digit 5 appears on more than 11% of the draws, if:

2A 100 draws are made

2B 1000 draws are made

Solution

2A) Binomial distribution $n=100, k=12:100, p=0.1$: $$P(\text{digit 5 appears on more than 11% of 100 draws})$$ $$=\sum_{k=12}^{100}C_{100}^{k}\cdot 0.1^k\cdot (1-0.1)^{100-k}=0.2969669$$ R code:

> sum(dbinom(x = 12:100, size = 100, prob = 0.1))
[1] 0.2969669
> # alternatively, using the "pbinom" function
> pbinom(q = 100, size = 100, p = 0.1) - pbinom(q = 11, size = 100, p = 0.1)
[1] 0.2969669

2B) Binomial distribution $n=1000, k=111:1000, p=0.1$: $$P(\text{digit 5 appears on more than 11% of 1000 draws})$$ $$=\sum_{k=111}^{1000}C_{1000}^{k}\cdot 0.1^k\cdot (1-0.1)^{1000-k}=0.1347765$$ R code:

> sum(dbinom(x = 111:1000, size = 1000, prob = 0.1))
[1] 0.1347765
> # Alternatively
> pbinom(q = 1000, size = 1000, p = 0.1) - pbinom(q = 110, size = 1000, p = 0.1)
[1] 0.1347765
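
Both parts of Problem 2 ask for an upper tail, $P(X>q)$. A slightly shorter way to get it, sketched here, is the `lower.tail = FALSE` argument of `pbinom`, which returns $P(X>q)$ directly. The variable names are illustrative.

```r
# P(X > 11) for 100 draws and P(X > 110) for 1000 draws, with p = 0.1.
tail_100  <- pbinom(q = 11,  size = 100,  prob = 0.1, lower.tail = FALSE)
tail_1000 <- pbinom(q = 110, size = 1000, prob = 0.1, lower.tail = FALSE)

# Same quantities as the sums of dbinom terms used in the solutions.
sum_100  <- sum(dbinom(x = 12:100,   size = 100,  prob = 0.1))
sum_1000 <- sum(dbinom(x = 111:1000, size = 1000, prob = 0.1))

print(c(tail_100 = tail_100, tail_1000 = tail_1000))
```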

PROBLEM 3

A die is rolled 12 times. Find the chance that the face with six spots appears once among the first 6 rolls, and once among the next 6 rolls.

Solution

The first six rolls and the second six rolls are independent, and the number of sixes in each group follows a binomial distribution with $n=6, k=1, p=\frac{1}{6}$: $$P(\text{once among first 6 rolls & once among second 6 rolls})$$ $$=P(\text{once among first 6 rolls})\times P(\text{once among second 6 rolls})$$ $$=C_{6}^{1}\times\frac{1}{6}\times(1-\frac{1}{6})^5\times C_{6}^{1}\times\frac{1}{6}\times(1-\frac{1}{6})^5=0.1615056$$ R code:

> dbinom(x = 1, size = 6, prob = 1/6) ^ 2
[1] 0.1615056
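
The product rule used here can be sanity-checked by simulating the 12 rolls directly. The sketch below is a rough Monte Carlo check, not an exact computation; the seed and the number of repetitions are arbitrary choices made for this illustration.

```r
set.seed(2014)      # arbitrary seed, only for reproducibility
n_sim <- 100000     # arbitrary number of simulated experiments

# Each row is one experiment of 12 die rolls.
rolls <- matrix(sample(1:6, size = 12 * n_sim, replace = TRUE), ncol = 12)

sixes_first  <- rowSums(rolls[, 1:6]  == 6)   # sixes among rolls 1 to 6
sixes_second <- rowSums(rolls[, 7:12] == 6)   # sixes among rolls 7 to 12

estimate <- mean(sixes_first == 1 & sixes_second == 1)
exact    <- dbinom(x = 1, size = 6, prob = 1/6)^2

# The simulated proportion should be close to the exact value.
print(c(estimate = estimate, exact = exact))
```

Note that this is not the same event as "exactly two sixes in 12 rolls", which would be `dbinom(2, 12, 1/6)` and also counts cases where both sixes fall in the same half.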

PROBLEM 4

A quiz consists of 20 true-false questions. The score for each question is 1 point if it is answered correctly, and 0 otherwise.

4A Suppose a student guesses the answer to Question 1 on the test by tossing a coin: if the coin lands Heads, she answers True, and if it lands Tails, she answers False. What is the chance that she gets the right answer?

4B Suppose a student guesses the answers to both Questions 1 and 2 as described in 4A, using a different toss for each question. Are the events “gets the right answer to Question 1” and “gets the right answer to Question 2” independent?

4C To get an A grade on the test, you need a total score of more than 16 points. One of the students knows the correct answer to 6 of the 20 questions. The rest she guesses at random by tossing a coin (one toss per question, as in 4B). What is the chance that she gets an A grade on the test?

Solution

4A) No matter what the right answer is, the chance that the coin picks that answer is $\frac{1}{2}$.

4B) Yes, they are independent. No matter what the pair of correct answers is $(TT, TF, FT, FF)$, the chance that the student gets both right is $$P(\text{Q1 & Q2 are right})=\frac{1}{4}=\frac{1}{2}\times\frac{1}{2}=P(\text{Q1 is right})\cdot P(\text{Q2 is right})$$

4C) From the remaining 14 questions she needs to get at least 11 points. Binomial distribution $n=14, k=11:14, p=0.5$: $$P(\text{at least 11 are right among 14 questions})$$ $$=\sum_{k=11}^{14}C_{14}^{k}\cdot0.5^k\cdot(1-0.5)^{14-k}=0.02868652$$ R code:

> sum(dbinom(x = 11:14, size = 14, prob = 0.5))
[1] 0.02868652
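
The reduction from "more than 16 of 20" to "at least 11 of 14" can be spelled out in code. This is a minimal sketch with illustrative variable names; it uses the upper tail of `pbinom` instead of summing `dbinom` terms.

```r
total_questions <- 20
known_answers   <- 6
points_needed   <- 17                                 # "more than 16 points"

guessed     <- total_questions - known_answers        # 14 questions left to guess
min_guessed <- points_needed - known_answers          # at least 11 lucky guesses

# P(X >= 11) = P(X > 10) for X ~ Binomial(14, 0.5)
p_grade_a <- pbinom(q = min_guessed - 1, size = guessed, prob = 0.5,
                    lower.tail = FALSE)
print(p_grade_a)
```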

PROBLEM 5

A die has one red face, two blue faces, and three green faces. It is rolled 5 times. Find the chance that the red face appears on one of the rolls and the remaining rolls are green. [Careful what you multiply. The most straightforward method is to follow the derivation of the binomial formula.]

Solution

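This follows the derivation of the binomial formula: there are $C_{n}^{k}$ ways to choose which rolls are red, and each such sequence has chance ${p_1}^k\cdot {p_2}^{n-k}$, giving $C_{n}^{k}\cdot {p_1}^k\cdot {p_2}^{n-k}$. Note that $p_2$ is the chance of green, not $1-p_1$. With $n=5, k=1, p_1=\frac{1}{6}, p_2=\frac{3}{6}$: $$P(\text{1 red and 4 green among 5 rolls})=C_{5}^{1}\times\frac{1}{6}\times(\frac{3}{6})^4=0.05208333$$ R code: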

> choose(5, 1) * (1/6) * (3/6)^4
[1] 0.05208333
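
The same number can be obtained from the multinomial distribution, which handles three outcomes (red, blue, green) at once. The sketch below uses `dmultinom` from base R as a cross-check; it is an alternative route, not the method used in the notes.

```r
# Counts of (red, blue, green) in 5 rolls: 1 red, 0 blue, 4 green.
counts <- c(red = 1, blue = 0, green = 4)
probs  <- c(red = 1, blue = 2, green = 3) / 6

p_multinom <- dmultinom(x = counts, prob = probs)
p_by_hand  <- choose(5, 1) * (1/6) * (3/6)^4

print(c(p_multinom = p_multinom, p_by_hand = p_by_hand))
```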

Summary

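  • Hypergeometric Distribution: the chance of $x$ good elements in $k$ draws without replacement from a population of $m$ good and $n$ bad elements is $$P(X=x)=\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}$$ R function:
    dhyper(x, m, n, k)
  • Geometric Distribution: the chance of $x$ failures before the first success, with success chance $p$ on each independent trial, is $$P(X=x)=p\cdot(1-p)^x$$ R function:
    dgeom(x, p)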
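
To see how the arguments of `dhyper` and `dgeom` line up with the two formulas, the sketch below writes both out by hand. The helper names `hyper_by_hand` and `geom_by_hand` and the test values are made up for this illustration.

```r
# Hypergeometric: x good elements in k draws from m good and n bad elements.
hyper_by_hand <- function(x, m, n, k) {
  choose(m, x) * choose(n, k - x) / choose(m + n, k)
}

# Geometric: x failures before the first success.
geom_by_hand <- function(x, p) {
  p * (1 - p)^x
}

hyper_check <- c(by_hand  = hyper_by_hand(x = 3, m = 26, n = 26, k = 5),
                 built_in = dhyper(x = 3, m = 26, n = 26, k = 5))
geom_check  <- c(by_hand  = geom_by_hand(x = 2, p = 0.3),
                 built_in = dgeom(x = 2, prob = 0.3))

print(hyper_check)
print(geom_check)
```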

UNGRADED EXERCISE SET B

PROBLEM 1

A poker hand consists of 5 cards dealt at random without replacement from a standard deck of 52 cards of which 26 are red and the rest black. A poker hand is dealt. Find the chance that the hand contains three red cards and two black cards.

Solution

Hypergeometric distribution $x=3, m=26, n=26, k=5$: $$P(\text{3 red and 2 black})=\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}=\frac{C_{26}^{3}\cdot C_{26}^{2}}{C_{52}^{5}}=0.3251301$$ R code:

> dhyper(x = 3, m = 26, n = 26, k = 5)
[1] 0.3251301

PROBLEM 2

In a population of 500 voters, 40% belong to Party X. A simple random sample of 60 voters is taken. What is the chance that a majority (more than 50%) of the sampled voters belong to Party X?

Solution

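Party X has $500\times 40\%=200$ members, and a majority of 60 sampled voters means at least 31. Hypergeometric distribution $x=31:60, m=200, n=300, k=60$: $$P(\text{a majority of the sampled voters belong to Party X})$$ $$=\sum_{x=31}^{60}\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}=\frac{\sum_{x=31}^{60}C_{200}^{x}\cdot C_{300}^{60-x}}{C_{500}^{60}}=0.0348151$$ R code: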

> sum(dhyper(x = 31:60, m = 200, n = 300, k = 60))
[1] 0.0348151
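
As with the binomial case, the upper tail can be read off the cumulative function instead of summing individual terms. A minimal sketch using `phyper` with `lower.tail = FALSE`; the variable names are illustrative.

```r
party_x <- 500 * 0.4          # 200 voters in Party X
others  <- 500 - party_x      # 300 voters not in Party X

# P(X > 30): more than half of the 60 sampled voters belong to Party X.
p_majority <- phyper(q = 30, m = party_x, n = others, k = 60,
                     lower.tail = FALSE)
p_by_sum   <- sum(dhyper(x = 31:60, m = party_x, n = others, k = 60))

print(c(p_majority = p_majority, p_by_sum = p_by_sum))
```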

PROBLEM 3

In an egg carton there are 12 eggs, of which 9 are hard-boiled and 3 are raw. Six of the eggs are chosen at random to take to a picnic (yes, the draws are made without replacement). Find the chance that at least one of the chosen eggs is raw.

Solution

Hypergeometric distribution $x=1:3, m=3, n=9, k=6$: $$P(\text{at least one is raw})=\sum_{x=1}^{3}\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}=\frac{\sum_{x=1}^{3}C_{3}^{x}\cdot C_{9}^{6-x}}{C_{12}^{6}}=0.9090909$$ R code:

> sum(dhyper(x = 1:3, m = 3, n = 9, k = 6))
[1] 0.9090909
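
"At least one" problems are often quicker through the complement: the chance that none of the six chosen eggs is raw is $C_{9}^{6}/C_{12}^{6}$. The sketch below checks that this route gives the same answer.

```r
# P(at least one raw) = 1 - P(no raw egg among the 6 chosen)
p_none_raw     <- dhyper(x = 0, m = 3, n = 9, k = 6)
p_at_least_one <- 1 - p_none_raw

# Same complement written with binomial coefficients.
p_by_counting <- 1 - choose(9, 6) / choose(12, 6)

print(c(p_at_least_one = p_at_least_one, p_by_counting = p_by_counting))
```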

PROBLEM 4

A box contains 8 dark chocolates, 8 white chocolates, and 8 milk chocolates. I choose chocolates at random (yes, without replacement; I’m eating them). What is the chance that I have chosen 20 chocolates and still haven’t got all the dark ones?

Solution

Hypergeometric distribution $x=0:7, m=8, n=16, k=20$: $$P(\text{fewer than 8 dark chocolates})=\sum_{x=0}^{7}\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}=\frac{\sum_{x=0}^{7}C_{8}^{x}\cdot C_{16}^{20-x}}{C_{24}^{20}}=0.828722$$ Equivalently, it is 1 minus the chance of getting all 8 dark chocolates. R code:

> sum(dhyper(x = 0:7, m = 8, n = 16, k = 20))
[1] 0.828722
> 1-dhyper(x = 8, m = 8, n = 16, k = 20)
[1] 0.828722

PROBLEM 5

I throw darts repeatedly. Assume that on each throw I have a 1% chance of hitting the bullseye, independently of all other throws. (Note that this implies for example that repetition doesn’t help my aim get any better; in my case that might not be such a bad assumption.) Find the chance that it takes me more than 100 throws to hit the bullseye.

Solution

Needing more than 100 throws means that the first 100 throws all miss, so $$P(\text{more than 100 throws to hit the bullseye})=(1-0.01)^{100}=0.3660323$$ Alternatively, take the complement of "hits the bullseye within the first 100 throws" (geometric distribution with $x=0:99$ misses before the first hit, $p=0.01$): $$P(\text{more than 100 throws to hit the bullseye})$$ $$=1-P(\text{at most 100 throws to hit the bullseye})$$ $$=1-\sum_{x=0}^{99}(1-0.01)^x\cdot0.01=0.3660323$$ R code:

> 1 - sum(dgeom(x = 0:99, prob = 0.01))
[1] 0.3660323
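
The two routes in this solution can be compared directly, and the geometric tail is also available from `pgeom`. A short sketch, keeping in mind that R's geometric distribution counts the misses before the first hit:

```r
# Direct argument: the first 100 throws all miss.
p_direct <- (1 - 0.01)^100

# Upper tail of the geometric distribution:
# P(more than 99 misses before the first hit).
p_tail <- pgeom(q = 99, prob = 0.01, lower.tail = FALSE)

print(c(p_direct = p_direct, p_tail = p_tail))
```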

PROBLEM 6

If you bet on “red” at roulette, you have chance 18/38 of winning. (There will be more on roulette later in the course; for now, just treat it as a generic gambling game.) Suppose you make a sequence of independent bets on “red” at roulette, with the decision that you will stop playing once you’ve won 5 times. What is the chance that after 15 bets you are still playing?

Solution

Still playing after 15 bets means that you have won at most 4 times in the first 15 bets, so the number of wins follows a binomial distribution with $n=15, k=0:4, p=\frac{18}{38}$: $$P(\text{at most 4 wins in 15 bets})$$ $$=\sum_{k=0}^{4}C_{15}^{k}\cdot(\frac{18}{38})^k\cdot(1-\frac{18}{38})^{15-k}=0.08739941$$ R code:

> sum(dbinom(x = 0:4, size = 15, prob = 18/38))
[1] 0.08739941
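
The same event can be described through the waiting time for the 5th win: you are still playing after 15 bets exactly when more than 10 losses occur before the 5th win. The sketch below checks this with the negative binomial function `pnbinom`, which in R counts failures before the `size`-th success. This is an alternative view, not the method used in the notes.

```r
p_win <- 18 / 38

# Binomial view: at most 4 wins in the first 15 bets.
p_binom <- pbinom(q = 4, size = 15, prob = p_win)

# Negative binomial view: more than 10 losses before the 5th win.
p_nbinom <- pnbinom(q = 10, size = 5, prob = p_win, lower.tail = FALSE)

print(c(p_binom = p_binom, p_nbinom = p_nbinom))
```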

PROBLEM 7

A school is running a raffle. There are 100 tickets, of which 3 are winners. You can assume that tickets are sold by drawing at random without replacement from the available tickets. Teacher X buys 10 raffle tickets, and so does Teacher Y. Find the chance that one of those two teachers gets all three winning tickets.

Solution

The two events are mutually exclusive (only one teacher can hold all three winning tickets), so their chances add. Hypergeometric distribution $x=3, m=3, n=97, k=10$: $$P(\text{teacher X or teacher Y gets all three winning tickets})$$ $$=P(\text{teacher X gets three winning tickets})+P(\text{teacher Y gets three winning tickets})$$ $$=2\times\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}=2\times\frac{C_{3}^{3}\cdot C_{97}^{7}}{C_{100}^{10}}=0.00148423$$ R code:

> 2 * dhyper(x = 3, m = 3, n = 97, k = 10)
[1] 0.00148423
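
The chance for a single teacher can be cross-checked by counting from the other side: of the $C_{100}^{3}$ equally likely positions of the three winning tickets, $C_{10}^{3}$ lie entirely within that teacher's 10 tickets. A small sketch of this check, with illustrative variable names:

```r
# One teacher holds all 3 winners: choose the 7 non-winning tickets among 97.
p_one_teacher <- choose(3, 3) * choose(97, 7) / choose(100, 10)

# Same chance, counting where the 3 winning tickets can fall.
p_one_alt <- choose(10, 3) / choose(100, 3)

# The two teachers' events cannot both happen, so the chances add.
p_either <- 2 * p_one_teacher

print(c(p_one_teacher = p_one_teacher, p_one_alt = p_one_alt,
        p_either = p_either))
```
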
Time: 2024-10-08 22:49:45
