Homework 8

Problem 1

Let $X$ be a Bernoulli random variable with success probability $p$. Consider the null hypothesis $H_0\colon p = \frac{1}{2}$ and alternative hypothesis $H_1\colon p = \frac{1}{3}$. For a sample size $n = 5$, find $C$, the best critical region of size $\alpha = 0.1875$. Find the power of the test associated with $C$.

Solution.

To find best critical regions, we need to:

  1. Write down the likelihood ratio $\frac{L\p{\theta_0}}{L\p{\theta_1}}$.

  2. Rewrite the inequality $\frac{L\p{\theta_0}}{L\p{\theta_1}} \leq k$ in terms of $u \leq c$ or $u \geq c$, where $u = u\p{X_1, \ldots, X_n}$ is some statistic (for example, $\sum_{i=1}^n X_i$). We typically take logs at this step.

  3. If given $\alpha$, you need to solve for $c$ in

    $$\P\p{u \leq c \given H_0} = \alpha \quad\text{or}\quad \P\p{u \geq c \given H_0} = \alpha$$

    using the distribution of $u$. The direction of the inequality is exactly the same as what you got in step 2.

First, let's write down the likelihood ratio.

$$\frac{L\p{\frac{1}{2}}}{L\p{\frac{1}{3}}} = \frac{\prod_{i=1}^5 \p{\frac{1}{2}}^{x_i} \p{1 - \frac{1}{2}}^{1-x_i}}{\prod_{i=1}^5 \p{\frac{1}{3}}^{x_i} \p{1 - \frac{1}{3}}^{1-x_i}} = \frac{\p{\frac{1}{2}}^5}{\p{\frac{1}{3}}^{\sum_{i=1}^5 x_i} \p{\frac{2}{3}}^{5-\sum_{i=1}^5 x_i}}.$$

I will use the notation $y = \sum_{i=1}^5 x_i$ for brevity. Next, we take logs on both sides of $\frac{L\p{\frac{1}{2}}}{L\p{\frac{1}{3}}} \leq k$, which gives

$$\begin{aligned}
\frac{\p{\frac{1}{2}}^5}{\p{\frac{1}{3}}^{\sum_{i=1}^5 x_i} \p{\frac{2}{3}}^{5-\sum_{i=1}^5 x_i}} \leq k
&\iff 5\ln\p{\frac{1}{2}} - y \ln\p{\frac{1}{3}} - \p{5 - y} \ln\p{\frac{2}{3}} \leq \ln k \\
&\iff y\ln 2 \leq \ln k - 5\ln\p{\frac{1}{2}} + 5\ln\p{\frac{2}{3}} && \p{*} \\
&\iff y \leq \frac{\ln k - 5\ln\p{\frac{1}{2}} + 5\ln\p{\frac{2}{3}}}{\ln 2} = c.
\end{aligned}$$

This means that a best critical region is of the form $Y = \sum_{i=1}^5 X_i \leq c$. Recall that a sum of $5$ independent Bernoulli trials (with the same success probability) has distribution $\operatorname{Bin}\p{5, p}$, so we need to solve

$$\P\p{Y \leq c \given p = \frac{1}{2}} = 0.1875$$

for $c$. Using the binomial cdf table, $\P\p{Y \leq 1 \given p = \frac{1}{2}} = \frac{1 + 5}{32} = 0.1875$, so we get

$$\boxed{c = 1}.$$
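
As a quick check of the table lookup, the following sketch tabulates the $\operatorname{Bin}\p{5, \frac{1}{2}}$ cdf exactly and looks for the cutoff whose size is $0.1875 = \frac{3}{16}$. The helper name `binom_cdf` is only an illustrative choice and does not come from the problem.

```python
from fractions import Fraction
from math import comb


def binom_cdf(c, n, p):
    """Return P(Y <= c) for Y ~ Bin(n, p), using exact arithmetic."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


n = 5
p0 = Fraction(1, 2)       # success probability under H0
alpha = Fraction(3, 16)   # 0.1875 written as an exact fraction

# Tabulate the cdf under H0 for every possible cutoff c = 0, ..., 5.
cdf_table = {c: binom_cdf(c, n, p0) for c in range(n + 1)}

# Keep the cutoffs whose size P(Y <= c | H0) equals alpha exactly.
matches = [c for c, value in cdf_table.items() if value == alpha]

for c, value in cdf_table.items():
    print(c, value, float(value))
print("cutoffs with size alpha:", matches)
```

Exact fractions avoid any rounding question when comparing against $\alpha$; with floats one would compare up to a small tolerance instead.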

The power is

$$\begin{aligned}
K\p{\frac{1}{3}} &= \P\p{Y \leq 1 \given p = \frac{1}{3}} \\
&= \sum_{k=0}^1 \binom{5}{k} \p{\frac{1}{3}}^k \p{1 - \frac{1}{3}}^{5-k} \\
&= \boxed{0.4609}.
\end{aligned}$$
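
The power can be reproduced the same way. This is a minimal sketch that evaluates the two-term binomial sum exactly; the helper name `binom_cdf` is again just an illustrative choice.

```python
from fractions import Fraction
from math import comb


def binom_cdf(c, n, p):
    """Return P(Y <= c) for Y ~ Bin(n, p), using exact arithmetic."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


# Power = probability of landing in the critical region {Y <= 1} under H1: p = 1/3.
power = binom_cdf(1, 5, Fraction(1, 3))

print(power, round(float(power), 4))
```

The exact value is $\frac{112}{243}$, which rounds to the boxed answer.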

Problem 2

Let $X$ have an exponential distribution with a mean of $\theta$; that is, the pdf of $X$ is

$$f\p{x; \theta} = \frac{1}{\theta} e^{-\frac{x}{\theta}}, \quad 0 < x < \infty.$$

Let $X_1, \ldots, X_n$ be a random sample from this distribution.

  1. Show that a best critical region for testing $H_0\colon \theta = 3$ against $H_1\colon \theta = 5$ can be based on the statistic $\sum_{i=1}^n X_i$.
  2. If $n = 12$, use the fact that $\frac{2}{\theta} \sum_{i=1}^{12} X_i$ is $\chi^2\p{24}$ to find a best critical region of size $\alpha = 0.1$.

Solution.

  1. The likelihood is

    $$L\p{\theta} = \prod_{i=1}^n \frac{1}{\theta} e^{-\frac{x_i}{\theta}} = \frac{1}{\theta^n} e^{-\frac{1}{\theta} \sum_{i=1}^n x_i}.$$

    Thus,

    $$\frac{L\p{3}}{L\p{5}} = \p{\frac{5}{3}}^n \exp\p{\p{-\frac{1}{3} + \frac{1}{5}} \sum_{i=1}^n x_i}.$$

    We get

    $$\begin{aligned}
    \p{\frac{5}{3}}^n \exp\p{\p{-\frac{1}{3} + \frac{1}{5}} \sum_{i=1}^n x_i} \leq k
    &\iff n\ln\p{\frac{5}{3}} + \p{-\frac{1}{3} + \frac{1}{5}} \sum_{i=1}^n x_i \leq \ln k \\
    &\iff \p{-\frac{1}{3} + \frac{1}{5}} \sum_{i=1}^n x_i \leq \ln k - n\ln\p{\frac{5}{3}} \\
    &\iff \sum_{i=1}^n x_i \geq \frac{\ln k - n\ln\p{\frac{5}{3}}}{-\frac{1}{3} + \frac{1}{5}} = c.
    \end{aligned}$$

    Note that the inequality flips in the last step because we divide by $-\frac{1}{3} + \frac{1}{5} < 0$.

  2. From part 1, we need to solve

    $$\P\p{\sum_{i=1}^n X_i \geq c \given \theta = 3} = 0.1.$$

    Here, we don't know the distribution of $\sum_{i=1}^{12} X_i$, but we do know the distribution of $\frac{2}{\theta} \sum_{i=1}^{12} X_i$, so we write

    $$\begin{gathered}
    \P\p{\sum_{i=1}^{12} X_i \geq c \given \theta = 3} = \P\p{\frac{2}{3} \sum_{i=1}^{12} X_i \geq \frac{2c}{3} \given \theta = 3} = 0.1 \\
    \implies \P\p{\frac{2}{3} \sum_{i=1}^{12} X_i \leq \frac{2c}{3} \given \theta = 3} = 0.9.
    \end{gathered}$$

    When using the chi-square table, you have to read it carefully. Even though the probability above equals $0.9$, the subscript in $\chi^2_{\alpha}$ refers to the upper-tail probability, so we need $\frac{2c}{3} = \chi^2_{0.1}\p{24} = 33.20$. Solving for $c$ gives

    $$c = \frac{3 \chi^2_{0.1}\p{24}}{2} = 49.8.$$

    Thus, a best critical region is

    $$\boxed{\sum_{i=1}^{12} X_i \geq 49.8}.$$
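
The table value can be sanity-checked without a statistics library, because a chi-square distribution with an even number of degrees of freedom $2m$ has the closed-form upper tail $e^{-x/2} \sum_{j=0}^{m-1} \frac{\p{x/2}^j}{j!}$. The sketch below uses that identity; the function name `chi2_sf_even` is a hypothetical helper, and $33.20$ is the table value quoted in the solution.

```python
from math import exp, factorial


def chi2_sf_even(x, df):
    """Upper-tail probability P(chi^2(df) > x), valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    m = df // 2
    half = x / 2
    # chi^2(2m) / 2 is Gamma(m, 1), whose upper tail is a finite Poisson-type sum.
    return exp(-half) * sum(half**j / factorial(j) for j in range(m))


theta0 = 3             # mean under H0
n = 12                 # sample size
table_value = 33.20    # chi^2_{0.1}(24) as read from the table

# Cutoff for the sum of the observations: c = theta0 * table_value / 2.
c = theta0 * table_value / 2

# Size of the region {sum X_i >= c} under H0, via (2 / theta0) * sum X_i ~ chi^2(2n).
size = chi2_sf_even(2 * c / theta0, 2 * n)

print(round(c, 1), round(size, 4))
```

The computed size should come out very close to $0.1$, differing only because the table value is rounded to two decimals.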

Problem 3

(If you finished your homework early, note that the problem was changed to use a two-sided alternative and one of the parts was removed.)

Let $X_1, \ldots, X_n$ be a random sample of size $n$ from a normal distribution $\mathcal{N}\p{\mu, 100}$.

  1. To test $H_0\colon \mu = 230$ against $H_1\colon \mu \neq 230$, what is the critical region specified by the likelihood ratio test?
  2. If a random sample of $n = 16$ yielded $\mean{x} = 232.6$, is $H_0$ accepted at a significance level of $\alpha = 0.1$?

Solution.

  1. The likelihood is

    $$\begin{aligned}
    L\p{\mu} &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi \sigma^2}} \exp\p{-\frac{1}{2\sigma^2} \p{x_i - \mu}^2} \\
    &= \frac{1}{\p{2\pi\sigma^2}^{\frac{n}{2}}} \exp\p{-\frac{1}{2\sigma^2} \sum_{i=1}^n \p{x_i - \mu}^2}.
    \end{aligned}$$

    To perform a likelihood ratio test, we need to maximize the likelihood over all $\mu$, so we essentially need to find the MLE. We can maximize the log-likelihood as usual.

    $$\begin{gathered}
    \ln L\p{\mu} = -\frac{n}{2} \ln\p{2\pi \sigma^2} - \frac{1}{2\sigma^2} \sum_{i=1}^n \p{x_i - \mu}^2 \\
    \implies \frac{\partial}{\partial \mu} \ln L\p{\mu} = \frac{1}{\sigma^2} \sum_{i=1}^n \p{x_i - \mu}.
    \end{gathered}$$

    Setting it equal to $0$ and solving gives $\widehat{\mu} = \mean{x}$. Thus,

    $$\begin{aligned}
    \lambda &= \frac{L\p{230}}{L\p{\mean{x}}} \\
    &= \exp\p{-\frac{1}{2\sigma^2} \sum_{i=1}^n \p{\p{x_i - 230}^2 - \p{x_i - \mean{x}}^2}} \\
    &= \exp\p{-\frac{1}{2\sigma^2} \sum_{i=1}^n \p{x_i - 230 - \p{x_i - \mean{x}}} \p{x_i - 230 + \p{x_i - \mean{x}}}} \\
    &= \exp\p{-\frac{1}{2\sigma^2} \p{\mean{x} - 230} \sum_{i=1}^n \p{2x_i - 230 - \mean{x}}} \\
    &= \exp\p{-\frac{1}{2\sigma^2} \p{\mean{x} - 230} \p{2n\mean{x} - 230n - n\mean{x}}} \\
    &= \exp\p{-\frac{n}{2\sigma^2} \p{\mean{x} - 230}^2}.
    \end{aligned}$$

    We get

    $$\begin{aligned}
    \lambda \leq k &\iff \exp\p{-\frac{n}{2\sigma^2} \p{\mean{x} - 230}^2} \leq k \\
    &\iff -\frac{n}{2\sigma^2} \p{\mean{x} - 230}^2 \leq \ln k \\
    &\iff \p{\mean{x} - 230}^2 \geq -\frac{2\sigma^2 \ln k}{n} \\
    &\iff \abs{\mean{x} - 230} \geq \sqrt{-\frac{2\sigma^2 \ln k}{n}} = c'.
    \end{aligned}$$

    (I write $c'$ here because we're going to replace the constant one more time.)

    Note that the inequality flips when we multiply by a negative number. Also, since $\lambda \leq 1$, we only need to consider $0 < k \leq 1$, which means $-\ln k \geq 0$. So the negative in the square root looks funny, but isn't actually a problem. This tells us that our critical region has the form

    $$\abs{\mean{X} - 230} \geq c'.$$

    Recall that under $H_0$,

    $$Z = \frac{\mean{X} - 230}{10/\sqrt{16}} \sim \mathcal{N}\p{0, 1}$$

    and that $\abs{\mean{X} - 230} \geq c'$ is equivalent to $\abs{Z} = \frac{\abs{\mean{X} - 230}}{10/\sqrt{16}} \geq \frac{c'}{10/\sqrt{16}} = c$. Thus, to ensure that the test has significance level $\alpha$, we need

    $$\begin{gathered}
    \P\p{\abs{\mean{X} - 230} \geq c'} = \P\p{\abs{Z} \geq c} = \alpha \\
    \implies c = z_{\frac{\alpha}{2}},
    \end{gathered}$$

    so the critical region is

    $$\boxed{\abs{Z} \geq z_{\frac{\alpha}{2}}}.$$

  2. With $\alpha = 0.1$, the $z$-table gives $z_{\frac{0.1}{2}} = z_{0.05} = 1.645$, so we reject if $\abs{z} \geq 1.645$. The observed $z$-statistic is

    $$\abs{z} = \frac{\abs{232.6 - 230}}{10/\sqrt{16}} = 1.04,$$

    so we fail to reject $H_0$ at $\alpha = 0.1$.
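
The numbers in part 2 can be reproduced with Python's standard library. This is a small sketch rather than part of the original solution: the variable names are illustrative, and the $p$-value and $\lambda$ lines are extra checks that the problem does not ask for.

```python
from math import exp, sqrt
from statistics import NormalDist

mu0 = 230.0     # mean under H0
sigma = 10.0    # known standard deviation (variance 100)
n = 16          # sample size
xbar = 232.6    # observed sample mean
alpha = 0.1     # significance level

# Two-sided critical value z_{alpha/2}, i.e. the upper alpha/2 quantile of N(0, 1).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Observed |z| statistic and the resulting decision.
z_obs = abs(xbar - mu0) / (sigma / sqrt(n))
reject = z_obs >= z_crit

# Extra checks: two-sided p-value, and lambda = exp(-n (xbar - mu0)^2 / (2 sigma^2)).
p_value = 2 * (1 - NormalDist().cdf(z_obs))
lam = exp(-n / (2 * sigma**2) * (xbar - mu0) ** 2)

print(round(z_crit, 3), round(z_obs, 2), reject)
print(round(p_value, 4), round(lam, 4))
```

Note that $\lambda = e^{-z^2/2}$, so rejecting for small $\lambda$ and rejecting for large $\abs{z}$ are the same rule.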

Problem 4

Let $X$ equal the number of female children in a three-child family. We shall use a chi-square goodness-of-fit statistic to test the null hypothesis that the distribution of $X$ is $\operatorname{Bin}\p{3, \frac{1}{2}}$.

  1. Define the test statistic and critical region, using an $\alpha = 0.05$ significance level.
  2. Among students who were taking a statistics course, $52$ came from families with three children. Let $x = 0, 1, 2$, and $3$ represent the number of female children, and for these $52$ families the frequencies for each possible outcome are $5, 17, 24$, and $6$, respectively. Calculate the value of the test statistic and state your conclusion.

Solution.

  1. Testing the goodness-of-fit of this model means we need to test the hypothesis

    $$H_0\colon p_i = p_{i0}, \quad p_{i0} = \P\p{\operatorname{Bin}\p{3, \frac{1}{2}} = i}, \quad 0 \leq i \leq 3.$$

    These are

    $$p_{00} = \frac{1}{8}, \quad p_{10} = \frac{3}{8}, \quad p_{20} = \frac{3}{8}, \quad p_{30} = \frac{1}{8}.$$

    There are $4$ probabilities to test, so $k = 4$. The test statistic is

    $$Q_{k-1} = Q_3 = \sum_{i=0}^3 \frac{\p{x_i - np_{i0}}^2}{np_{i0}},$$

    and we reject if $q_3 \geq \chi^2_{0.05}\p{3} = 7.815$.

  2. Our observed test statistic is

    $$q_3 = \frac{\p{5 - 52 \cdot \frac{1}{8}}^2}{52 \cdot \frac{1}{8}} + \frac{\p{17 - 52 \cdot \frac{3}{8}}^2}{52 \cdot \frac{3}{8}} + \frac{\p{24 - 52 \cdot \frac{3}{8}}^2}{52 \cdot \frac{3}{8}} + \frac{\p{6 - 52 \cdot \frac{1}{8}}^2}{52 \cdot \frac{1}{8}} = 1.7436,$$

    so we fail to reject $H_0$ at $\alpha = 0.05$.
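
The arithmetic for $q_3$ is easy to get wrong by hand, so here is a short sketch that recomputes it exactly. The variable names are illustrative, and the critical value $7.815$ is taken from the chi-square table as quoted above rather than computed.

```python
from fractions import Fraction
from math import comb

observed = [5, 17, 24, 6]   # families with 0, 1, 2, 3 female children
n = sum(observed)           # 52 families in total

# Null probabilities P(Bin(3, 1/2) = i) = C(3, i) / 8 for i = 0, ..., 3.
probs = [Fraction(comb(3, i), 8) for i in range(4)]
expected = [n * p for p in probs]

# Chi-square goodness-of-fit statistic: sum of (observed - expected)^2 / expected.
q3 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 7.815      # chi^2_{0.05}(3), from the table
reject = float(q3) >= critical_value

print(q3, round(float(q3), 4), reject)
```

The exact value is $\frac{68}{39}$, which rounds to $1.7436$.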

Problem 5

In the Michigan Lottery Daily3 Game, twice a day a three-digit integer is generated one digit at a time. Let $p_i$ denote the probability of generating digit $i$, $i = 0, 1, \ldots, 9$. Let $\alpha = 0.05$, and use the following 50 digits to test $H_0\colon p_0 = p_1 = \cdots = p_9 = \frac{1}{10}$.

$$\begin{array}{rrrrrrrrrr}
1 & 6 & 9 & 9 & 3 & 8 & 5 & 0 & 6 & 7 \\
4 & 7 & 5 & 9 & 4 & 6 & 5 & 6 & 4 & 4 \\
4 & 8 & 0 & 9 & 3 & 2 & 1 & 5 & 4 & 5 \\
7 & 3 & 2 & 1 & 4 & 6 & 7 & 1 & 3 & 4 \\
4 & 8 & 8 & 6 & 1 & 6 & 1 & 2 & 8 & 8
\end{array}$$

Solution.

We have $10$ probabilities to test, so $k = 10$. Thus, our test statistic is $Q_{k-1} = Q_9$ and we reject if $q_9 \geq \chi^2_{0.05}\p{9} = 16.92$. From counting the numbers in the list, our observed values are given by the following table.

$$\begin{array}{r|rrrrrrrrrr}
x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
\text{count} & 2 & 6 & 3 & 4 & 9 & 5 & 7 & 4 & 6 & 4
\end{array}$$

Under $H_0$, the expected count for each digit $i$ is $np_i = 50 \cdot \frac{1}{10} = 5$, so

$$q_9 = \frac{\p{2 - 5}^2}{5} + \frac{\p{6 - 5}^2}{5} + \cdots + \frac{\p{4 - 5}^2}{5} = 7.6.$$

Since $7.6 < 16.92$, we fail to reject $H_0$ at $\alpha = 0.05$.
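
Counting 50 digits by hand is error-prone, so the following sketch tallies them and recomputes $q_9$. The digits are typed in row by row from the table in the problem; the variable names are illustrative, and $16.92$ is the table value quoted above.

```python
from collections import Counter

# The 50 digits from the problem statement, one string per row of the table.
digits = (
    "1699385067"
    "4759465644"
    "4809321545"
    "7321467134"
    "4886161288"
)

# Observed count of each digit 0, ..., 9.
counts = Counter(int(d) for d in digits)
observed = [counts[d] for d in range(10)]

# Under H0 every digit has probability 1/10, so each expected count is n / 10.
expected = len(digits) / 10

# Chi-square goodness-of-fit statistic with k - 1 = 9 degrees of freedom.
q9 = sum((o - expected) ** 2 / expected for o in observed)

critical_value = 16.92   # chi^2_{0.05}(9), from the table
reject = q9 >= critical_value

print(observed)
print(round(q9, 4), reject)
```

The tallies agree with the count table in the solution, and the statistic stays well below the critical value.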