Distributions in R

In addition to the d and p functions mentioned in class, R includes q and r functions. What do these do? Explore the help files.

q: quantiles; returns the inverse of p.

Example: qnorm(0.2) gives you the value \(x\) such that 20% of the values of a standard normal are <= \(x\).

qnorm(0.2)

## [1] -0.842

pnorm(-0.84)

## [1] 0.2

r: provide random number generation from a distribution

Austria had a birth rate of approximately 9.9 births per 1000 people in 2019. Assume this rate is constant over the year, and that all births are independent.

For a randomly selected cohort of 1000 people, what is the probability of observing exactly 11 births over a 1-year period?

You can model this as a binomial process with \(n=1000,p=0.0099\), or Poisson, with \(\lambda = 9.9\).

c(binom = dbinom(11, 1000, 0.0099),
pois = dpois(11, 9.9))

## binom  pois 
## 0.113 0.113

What about observing 11 or more births?

Lots of ways to arrive at the same answer

# the same logic here applies for binom

c(
   ## ppois(10, ...) is the prob of 10 or less events
   ## 1 - ppois(10, ...) gives the prob of more than 10
   ppois1 = 1 - ppois(10, 9.9),  
   ## we can achieve the same thing with the lower.tail argument
   ppois2 = ppois(10, 9.9, lower.tail=FALSE),
   ## the poisson is discrete, so we can also sum the mass function
   dpois1 = 1 - sum(dpois(0:10, 9.9)),
   ## technically we need to go to infinity, but the error is small
   dpois2 = sum(dpois(11:1e6, 9.9))
)

## ppois1 ppois2 dpois1 dpois2 
##  0.404  0.404  0.404  0.404

Is a probability density the same as a probability?

Human height is normally distributed within populations. From 1980-1994 within 20 wealthy countries, mean female height was 164.7 cm, with a standard deviation of 7.1 cm. What is the maximum probability density of this normal distribution, and what is the x-value \(x_{max}\) at which maximum probability density occurs?

The maximum density of the normal is at the mean, so \(x_{max}=164.7\)

xmax = 164.7
mean_ht = 164.7
sd_ht = 7.1
dnorm(xmax, mean = mean_ht, sd = sd_ht)

## [1] 0.0562

What is the probability that a female in this time period has a height exactly equal to \(x_{max}\)
If the maximum probability density and the \(pr(x_{max})\) are not the same, why not? Do these answers make sense?

The probability is zero, because \(x\) is continuous. We computed the density in part a. We can compute the actual probability (probability mass) by integrating; clearly it will be zero, because the integral has a width of zero.

integrate(dnorm, lower = xmax, upper = xmax, mean = mean_ht, sd = sd_ht)

## 0 with absolute error < 0

What is the probability that a woman has a height in the range \(x_{max} \pm 3\)

There is nonzero mass between two different values. We can either integrate as before, or use pnorm**

c(pnorm = pnorm(xmax + 3, mean_ht, sd_ht) - pnorm(xmax - 3, mean_ht, sd_ht),
integrate = integrate(dnorm, lower = xmax - 3, upper = xmax + 3, mean = mean_ht, sd = sd_ht)["value"])

## $pnorm
## [1] 0.327
## 
## $integrate.value
## [1] 0.327

For the same distribution, what is the 40th percentile for height? In other words, what is the value \(x\) such that the probability of observing x or less is 0.4? — \(pr(X \le x) = 0.4\)

qnorm(0.4, 164.7, 7.1)

## [1] 163

What is \(x\) if \(pr(X > x) = 0.4\)?

qnorm(0.4, 164.7, 7.1, lower.tail=FALSE)

## [1] 166

qnorm(0.6, 164.7, 7.1, lower.tail=TRUE)

## [1] 166

Distributions in R

21.11.2024