Regression & the Generalised Linear Model

Lauren Talluto

29.11.2024

Regression problems

  • Many applied problems take the form of a set of observations \(y\) along with one or more covariates \(x\)
  • We want to know how \(y\) changes (on average) with \(x\), and how much variance there is in \(y\)
  • We can draw a line through the points that minimizes the sum of squared residuals \(\sum_i (y_i - \mathbb{E}[y|x_i])^2\)
  • \(\mathbb{E}[y|x_i]\) is the expectation of \(y\) conditional on \(x\)
    • what do we expect \(y\) to be, on average, for any value of \(x\)?
  • This line has two parameters, an intercept \(a\) and a slope \(b\)
    • \(\mathbb{E}(y|x_i) = a + bx_i\)
  • \(y \sim \mathcal{N}(a + bx, s)\)
    • the \(\sim\) symbol is read “\(y\) is distributed as”
  • Key assumptions:
    • the observations \(y\) are independent and identically distributed (iid)
    • \(y\) has constant variance with respect to \(x\)
    • the residuals are normally distributed (or, equivalently, \(y|x\) is normally distributed)
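
A quick way to see these pieces working together is to simulate data from this model and recover the line with least squares (a minimal sketch; all parameter values are illustrative):

# simulate from y ~ Normal(a + b*x, s) with known parameters
set.seed(123)
n = 100
a = 2; b = 0.5; s = 1   # illustrative "true" values
x = runif(n, 0, 10)
y = rnorm(n, a + b * x, s)

# least squares recovers a and b; the fit minimizes the sum of squared residuals
fit = lm(y ~ x)
coef(fit)
sum(residuals(fit)^2)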

Graphing the model

  • It is always a good idea to graph your model
  • The graph represents knowns and unknowns as nodes, with relationships as edges
  • Variables (nodes) may be either stochastic or deterministic
  • Your model must estimate all unknowns
    • stochastic unknowns are usually parameters
    • deterministic unknowns must be computed somewhere in your model
    • stochastic knowns usually appear as the first parameter in a probability statement like dnorm

Graphing the model

  • Bayesian models that can be fit in Stan are directed acyclic graphs
  • Can draw them by hand, or use igraph and ggnetwork
library("igraph")
library("ggnetwork")
library("ggplot2")  # ggnetwork extends ggplot2; attach it explicitly for the plotting code

# Here we define the edges
# -+ draws an arrow (+ is the arrowhead)
# Nodes are created implicitly; name a node here and it will be added
gr = graph_from_literal(a-+"E(y)", b-+"E(y)", s-+y, x-+"E(y)", "E(y)"-+y)

# Here we define two attributes, type and source
# These are assigned to nodes using the V (for Vertex) function
# The order is defined by the order the nodes appear when the graph is created
V(gr)$type = c("stochastic", "deterministic", "stochastic", 
               "stochastic", "stochastic", "deterministic")
V(gr)$source = c("unknown", "unknown", "unknown", 
                 "unknown", "known", "known")

# Here we define an optional layout, telling ggnetwork where to put each node
# The matrix should have one row per node, and columns are x and y coordinates
# If you skip this, ggnetwork will try to make a pretty graph by itself
layout = matrix(c(0.5,2,  0.5,1,  1,2,  1.5,2,  1.25,0, 0,2), 
                byrow=TRUE, ncol=2)


n = ggnetwork(gr, layout=layout)
(pl = ggplot(n, aes(x = x, y = y, xend = xend, yend = yend)) + 
   geom_edges(colour="gray50", arrow=arrow(length = unit(6, "pt"), 
                                type = "closed")) + 
   theme_blank() + geom_nodes(aes(color=type, shape = source), size=6) + 
   geom_nodelabel(aes(label = name), fontface = "bold", nudge_x=-0.1))

Using the graph

  • You can use the graph to help you design your Stan code!
  1. Start with an empty model
  2. Simple guideline: everything in your graph must do one of the following:
    • Appear on the left side of the ~ symbol (stochastic nodes)
    • Appear on the left side of the = symbol (deterministic unknowns)
    • Appear in the data block (everything else)
    • Stochastic variables that are also measured will appear both in data and on the left side of ~
data {

}
parameters {

}
transformed parameters {

}
model {

}

Using the graph

  • Start at the bottom: define the probabilistic model for y by placing it on the left side of ~.
    • In our graph, y has two dependencies (E(y), called eta in the code, and s); these should appear on the right side of the expression for y.
    • I also need to declare the variable (i.e., tell Stan what kind of variable y is: a vector with length n)
data {
    vector [n] y;
}
parameters {

}
transformed parameters {

}
model {
    y ~ normal(eta, s);
}

Using the graph

  • Everything I used to declare/define y now needs its own declaration and definition.
    • Here we do s and n, which have no dependencies.
data {
    int <lower = 1> n;
    vector [n] y;
}
parameters {
    real <lower = 0> s;
}
transformed parameters {

}
model {
    y ~ normal(eta, s);
    s ~ exponential(0.1);
}

Using the graph

  • Iterate, following each dependency until everything on the graph is in the model
data {
    int <lower = 1> n;
    vector [n] y;
}
parameters {
    real <lower = 0> s;
}
transformed parameters {
    vector [n] eta;
    eta = a + b * x;
}
model {
    y ~ normal(eta, s);
    s ~ exponential(0.1);
}

Using the graph

  • Iterate, following each dependency until everything on the graph is in the model
data {
    int <lower = 1> n;
    vector [n] y;
    vector [n] x;
}
parameters {
    real <lower = 0> s;
    real a;
    real b;
}
transformed parameters {
    vector [n] eta;
    eta = a + b * x;
}
model {
    y ~ normal(eta, s);
    s ~ exponential(0.1);
    a ~ normal(0, 10);
    b ~ normal(0, 5);
}

What to put where??

These are guidelines, not strict rules…

data {
    // declaration of all knowns
}
parameters {
    // declaration of all stochastic unknowns
}
transformed parameters {
    // declaration and definition of deterministic unknowns
}
model {
    // probabilistic definitions of all stochastic nodes (priors and likelihood)
}

Fitting to the Penguins

data {
    int <lower = 1> n;
    vector [n] bill_depth;
    vector [n] bill_len;
}
parameters {
    real <lower = 0> s;
    real a;
    real b;
}
transformed parameters {
    vector [n] eta;
    eta = a + b * bill_len;
}
model {
    bill_depth ~ normal(eta, s);
    s ~ exponential(0.1);
    a ~ normal(0, 10);
    b ~ normal(0, 5);
}

Fitting to the Penguins

library(rstan)
peng_lm = stan_model("stan/penguin_lm.stan")
penguins = as.data.frame(palmerpenguins::penguins)
penguins = subset(penguins, complete.cases(penguins) & species == "Gentoo")
peng_dat = with(penguins, list(
    bill_len = bill_length_mm,
    bill_depth = bill_depth_mm,
    n = length(bill_length_mm)
))
peng_fit = sampling(peng_lm, data = peng_dat, refresh = 0)
print(peng_fit, pars = c("a", "b", "s"))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##   mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
## a 5.02    0.03 1.07 2.91 4.28 5.02 5.76  7.17  1184    1
## b 0.21    0.00 0.02 0.16 0.19 0.21 0.23  0.25  1174    1
## s 0.76    0.00 0.05 0.67 0.72 0.76 0.79  0.87  1572    1
## 
## Samples were drawn using NUTS(diag_e) at Fri Nov 29 12:29:40 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
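
One way to inspect the fit (a sketch; assumes peng_fit and peng_dat from above) is to extract posterior draws with rstan::extract() and overlay sampled regression lines on the data:

samps = rstan::extract(peng_fit, pars = c("a", "b"))
plot(peng_dat$bill_len, peng_dat$bill_depth,
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
# a few posterior draws visualise uncertainty in the regression line
for(i in sample(length(samps$a), 50))
    abline(samps$a[i], samps$b[i], col = "#77777744")
# posterior mean line
abline(mean(samps$a), mean(samps$b), col = "blue", lwd = 2)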

More linear models

  • Our linear model used a single x-variable: \(\mathbb{E}(y) = a + bx\)
  • It is trivial to add additional predictors to a model: \(\mathbb{E}(y) = a + b_1x_1 + b_2x_2 + \dots + b_nx_n\)

Adding a Curve

  • Our linear model used a single x-variable: \(\mathbb{E}(y) = a + bx\)
  • It is trivial to add additional predictors to a model: \(\mathbb{E}(y) = a + b_1x_1 + b_2x_2 + \dots + b_nx_n\)
  • One reason to do this is to add a curve: we could add a new variable called \(x^2\)

\[ \mathbb{E}(y) = a + b_1x + b_2x^2 \]

Dataset: Nancy Howell (Univ. Toronto) | Measurements of the Dobe !Kung people | Republished in McElreath 2020

library(data.table)
kung = fread("https://github.com/rmcelreath/rethinking/raw/master/data/Howell1.csv")
(pl = ggplot(data = kung, aes(x = weight, y = height)) + geom_point(col = "#555555aa", size = 0.9) +
    geom_smooth(method = "lm", col = "blue", linewidth = 1.5, se = FALSE) +
    geom_smooth(method = "lm", col = "red", linewidth = 0.8, lty = 2, formula = y~x+I(x^2), se = FALSE) +
    theme_minimal())

  • Use caution: curves can predict silly things
    • does it make sense that height decreases after a certain weight?
  • This is still a linear model: \(\mathbb{E}(y)\) is linear with respect to transformations of x
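
To see that the quadratic model is linear in the transformed predictors, build the columns yourself (a sketch using the kung data from above):

# E(height) = a + b1*weight + b2*weight^2 is linear in these two columns
X = cbind(weight = kung$weight, weight2 = kung$weight^2)
fit_quad = lm(kung$height ~ X)
coef(fit_quad)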

Interactions

  • If we multiply predictors, we produce interactions

Important: Always include the main effects when including an interaction; omitting a main effect forces the effect of that variable to be exactly zero when the other variable is zero, which is rarely justified.

\[ \mathbb{E}(height) = a + b_1 \times weight + b_2 \times age + b_3 \times weight \times age \]
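
In R’s formula interface (a sketch; the kung data include an age column), the * operator expands to exactly this: both main effects plus their interaction.

# height ~ weight * age expands to weight + age + weight:age
fit_int = lm(height ~ weight * age, data = kung)
coef(fit_int)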

Categorical Variables

\[ depth = a + b_1 \times length + b_2 \times species?? \]

Categorical Variables

  • Categorical variables also fit into this scheme
  • However, we need to consider how we want to model the categories!
  • We use dummy coding
  • First assign a reference category; here we will use Adelie
    • The intercept becomes the intercept only for this category
    • So Adelie penguins have an intercept of \(a\)
  • Then create binary columns indicating the other two categories
    • The “intercept” for each category will be \(a\) plus some offset
    • E.g., for Chinstrap, the intercept is \(a + b_2\)
penguins$chinstrap = ifelse(penguins$species == 'Chinstrap', 1, 0)
penguins$gentoo = ifelse(penguins$species == 'Gentoo', 1, 0)

\[ depth = a + b_1 \times length + b_2 \times chinstrap + b_3 \times gentoo \]
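
R can also build these dummy columns automatically; model.matrix() (a sketch, re-loading the full dataset since penguins was subset to Gentoo earlier) uses the first factor level, Adelie, as the reference:

# one column per non-reference species; Adelie is absorbed into the intercept
peng_all = as.data.frame(palmerpenguins::penguins)
peng_all = peng_all[complete.cases(peng_all), ]
head(model.matrix(~ bill_length_mm + species, data = peng_all))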

Categorical Variables

  • We can use interactions to give each category its own slope:

\[ depth = a + b_1 \times length + b_2 \times chinstrap + b_3 \times gentoo + b_4 \times chinstrap \times length + b_5 \times gentoo \times length \]

Matrix Notation

  • Since our model is a linear system, we can use linear algebra to describe it
  • This will greatly simplify the Stan code!
  • X is a matrix of predictors, one row per observation, one column per variable
    • Include transformations! If you have \(x\) and \(x^2\), these are separate columns!
    • Dummy-coded categories also get one column each, except the reference category
  • B is a vector of parameters, one parameter per variable
  • XB (using matrix multiplication) returns \(b_1x_1 + b_2x_2 + \dots\)
  • Now you can change the predictors in your model without changing the model code!

\[ \eta = \mathbb{E}(y) = a + \mathbf{XB} \]

data {
    int <lower = 1> k; // number of predictor variables
    int <lower = 1> n;
    matrix [n, k] X;
    vector [n] y;
}
parameters {
    real a;
    real <lower = 0> s;
    vector [k] B;
}
transformed parameters {
    vector [n] eta = a + X * B;
}
model {
    y ~ normal(eta, s);
    s ~ exponential(0.1);
    a ~ normal(0, 10);
    B ~ normal(0, 5); // prior here is vectorised, we use the same for each B!
}
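
On the R side (a sketch; variable names illustrative), model.matrix() can build X; drop its intercept column, because the Stan model declares the intercept a as a separate parameter:

peng_all = as.data.frame(palmerpenguins::penguins)
peng_all = peng_all[complete.cases(peng_all), ]
# [, -1] drops model.matrix's intercept column
X = model.matrix(~ bill_length_mm + species, data = peng_all)[, -1]
stan_dat = list(n = nrow(X), k = ncol(X), X = X, y = peng_all$bill_depth_mm)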

Further generalising our model

  • A motivating example: Tsuga canadensis mortality in eastern North America.
    • Hemlock seems to prefer moderate temperatures and high precipitation.

Research question: Does mortality decrease with increasing precipitation?

tsuga = readRDS("../vu_advstats_students/data/tsuga.rds")
head(tsuga[, 4:8])
##   born died n annual_mean_temp tot_annual_pp
## 1    1    0 0             3.41          1085
## 2    0    3 5             3.85          1003
## 3    3    0 6             3.45          1076
## 4    1    0 3             3.62          1099
## 5    0    4 4             4.60           989
## 6    4    0 3             4.24          1116

We can visualise this by computing the proportion of trees that died in each plot.
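
A minimal sketch of that visualisation (plots with no trees are dropped to avoid dividing by zero):

library(ggplot2)
ggplot(subset(tsuga, n > 0), aes(x = tot_annual_pp, y = died / n)) +
    geom_point(col = "#555555aa") +
    labs(x = "Total annual precipitation", y = "Proportion died") +
    theme_minimal()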

Mortality in Tsuga

Research question: Does mortality decrease with increasing precipitation?

  • We already used a binomial likelihood to model mortality. What if we modify our linear model this way?
    • Our response variable is \(d\), the number of trees dying, which must be conditioned on \(n\), the number of trees that could die.
    • For the binomial distribution, we already saw that the expected proportion \(\mathbb{E}(d/n) = p\), where \(p\) is the probability of an event
    • We can simply model this with a linear function as before, right?

\[ \begin{aligned} \eta_i & = \mathbb{E}(d_i/n_i) = a + b \times precip_i \\ d_i & \sim \mathrm{Binomial}(n_i, \eta_i) \end{aligned} \]

Mortality in Tsuga: Stan code

Research question: Does mortality decrease with increasing precipitation? \[ \begin{aligned} \eta_i & = \mathbb{E}(d_i/n_i) = a + b \times precip_i \\ d_i & \sim \mathrm{Binomial}(n_i, \eta_i) \end{aligned} \]

data {
    int <lower = 1> nobs;
    vector [nobs] precip;
    int <lower = 1> n [nobs];
    int <lower = 0> died [nobs];
}
parameters {
    real a;
    real b;
}
transformed parameters {
    vector [nobs] eta = a + b * precip;
}
model {
    died ~ binomial(n, eta);
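    // note: eta is unconstrained here, but a binomial probability must lie in [0, 1]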
    a ~ normal(0, 10);
    b ~ normal(0, 10);
}

Mortality in Tsuga: Running the model

Research question: Does mortality decrease with increasing precipitation? \[ \begin{aligned} \eta_i & = \mathbb{E}(d_i/n_i) = a + b \times precip_i \\ d_i & \sim \mathrm{Binomial}(n_i, \eta_i) \end{aligned} \]

# drop NAs from the dataset, drop plots with no trees
tsuga_stan_dat = with(tsuga[n > 0 & complete.cases(tsuga), ], list(
    died = died,
    n = n,
    precip = tot_annual_pp
))
tsuga_stan_dat$nobs = length(tsuga_stan_dat$n)

# compile the model above (file name illustrative)
mort_binom1 = stan_model("stan/tsuga_mort_binom1.stan")
tsuga_fit1 = sampling(mort_binom1, data = tsuga_stan_dat, refresh = 0)
## Error in eval(expr, envir, enclos) : Initialization failed.
## character(0)
## error occurred during calling the sampler; sampling not done

tsuga_fit1 = sampling(mort_binom1, data = tsuga_stan_dat, refresh = 0, 
                      init = list(list(a = 0.46, b = -2.5e-4)), chains=1)
## Warning: There were 871 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 1.51, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#tail-ess

Improving the fit

Research question: Does mortality decrease with increasing precipitation?

Problem: This system cannot be linear! The expectation \(\eta\) is a probability, bounded between 0 and 1, but the linear predictor \(a + b \times precip\) is unbounded.

\[ \begin{aligned} \eta_i & = \mathbb{E}(d_i/n_i) = a + b \times precip_i \\ d_i & \sim \mathrm{Binomial}(n_i, \eta_i) \end{aligned} \]
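
The fix previewed on the next slide is the logit link; its inverse maps any real number into (0, 1). In R (a sketch), the inverse logit is available as plogis():

# plogis(eta) = exp(eta) / (1 + exp(eta)), the inverse logit
eta = seq(-6, 6, length.out = 200)
plot(eta, plogis(eta), type = "l", xlab = "linear predictor eta",
     ylab = "p = inverse logit(eta)")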

Improving the fit: Stan round 2

Research question: Does mortality decrease with increasing precipitation? \[ \begin{aligned} \ln \frac{p}{1-p} & = \eta = a + b \times precip \\ p & = \frac{e^{\eta}}{1 + e^{\eta}} \\ d & \sim \mathrm{Binomial}(n, p) \end{aligned} \]

data {
    int <lower = 1> nobs;
    vector [nobs] precip;
    int <lower = 1> n [nobs];
    int <lower = 0> died [nobs];
}
parameters {
    real a;
    real b;
}
transformed parameters {
    vector [nobs] eta = a + b * precip;
    vector <lower = 0, upper = 1> [nobs] p = inv_logit(eta);
}
model {
    died ~ binomial(n, p);
    a ~ normal(0, 10);
    b ~ normal(0, 10);
}

Improving the fit: Stan round 2

Research question: Does mortality decrease with increasing precipitation? \[ \begin{aligned} \ln \frac{p}{1-p} & = \eta = a + b \times precip \\ p & = \frac{e^{\eta}}{1 + e^{\eta}} \\ d & \sim \mathrm{Binomial}(n, p) \end{aligned} \]

# compile the updated model (file name illustrative)
mort_binom2 = stan_model("stan/tsuga_mort_binom2.stan")
tsuga_fit2 = sampling(mort_binom2, data = tsuga_stan_dat, refresh = 0)
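
The research question can then be answered directly from the posterior (a sketch; b is on the log-odds scale, so negative values mean mortality decreases with precipitation):

b_post = rstan::extract(tsuga_fit2, pars = "b")$b
# posterior probability that mortality decreases with precipitation
mean(b_post < 0)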

The Generalised Linear Model

For an observed variable \(y\) and a matrix of predictors \(\mathbf{X}\):

  1. We can define the conditional expectation of \(y\): \(\mathbb{E}(y | \mathbf{X}) = \xi\)
  2. This expectation is connected to a linear predictor \(\eta\) via a link function \(\mathcal{f}\)
    • With optional additional parameters \(\theta_{\mathcal{f}}\)
  3. Randomness in our observation of \(y\) is generated by some distribution function \(\mathcal{D}\)
    • \(\mathcal{D}\) has optional free parameters \(\theta_{\mathcal{D}}\)
    • Often \(\theta_{\mathcal{D}}\) will include dispersion parameters
\[ \begin{aligned} \mathbb{E}(y | \mathbf{X}) & = \xi \\ \mathcal{f}(\xi, \theta_{\mathcal{f}}) & = \eta \\ \eta & = \mathbf{XB}\\ \\ y & \sim \mathcal{D}(\xi, \theta_{\mathcal{D}}) \end{aligned} \]

Example: The Gaussian model

For an observed variable \(y\) and a matrix of predictors \(\mathbf{X}\):

  1. We can define the conditional expectation of \(y\).
  2. The link function is the identity function
    • Which has no additional parameters (\(\theta_{\mathcal{f}} = \varnothing\))
  3. Randomness in our observation of \(y\) is generated by the Gaussian (i.e., Normal) PDF
    • Which has the additional free standard deviation parameter \(s\): \(\theta_{\mathcal{D}} = \{s\}\)

This is identical to our linear model from earlier! Multiple regression is a subset of the GLM.

\[ \begin{aligned} \mathbb{E}(y | \mathbf{X}) & = \xi \\ \mathcal{I}(\xi) & = \eta \\ \eta & = \mathbf{XB} \\ & \therefore\\ \mathbb{E}(y | \mathbf{X}) & = \mathbf{XB}\\ \\ y & \sim \mathcal{N}(\mathbf{XB}, s) \end{aligned} \]

Count Data: Poisson

\[ \begin{aligned} \lambda & = \exp(a + \mathbf{XB}) \\ y & \sim \mathrm{Poisson}(\lambda) \end{aligned} \]

Assumption: \(\sigma^2_{y|\mathbf{X}} = \mathbb{E}(y|\mathbf{X}) = \lambda\)

If sampling effort or exposure \(u\) varies among observations, it can enter the model as a multiplicative offset:

\[ y \sim \mathrm{Poisson}(u\lambda) \]
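
A small simulation sketch of the mean-variance assumption (values illustrative):

# under the Poisson model, the mean and variance are both lambda
set.seed(42)
lambda = 4
y = rpois(1e5, lambda)
c(mean = mean(y), var = var(y))  # both approximately 4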

Overdispersed counts: Negative Binomial

\[ \begin{aligned} \xi & = \exp(a + \mathbf{XB}) \\ y & \sim \mathrm{NB}(\xi, \phi) \end{aligned} \]

\(\phi\) is called the dispersion parameter, and is related to the variance:

\[ \sigma^2_{y|\mathbf{X}} = \xi + \frac{\xi^2}{\phi} \]

Proportions: Beta

\[ \begin{aligned} \xi & = \mathrm{logit}^{-1}(a_\rho + \mathbf{X_\rho B_\rho}) \\ \phi & = \exp(a_\phi + \mathbf{X_\phi B_\phi}) & \mathrm{\small optionally} \\ \alpha & = \xi\phi \\ \beta & = (1 - \xi)\phi \\ y & \sim \mathrm{Beta}(\alpha, \beta) \end{aligned} \]

Caveats & Hints

Continuous, strictly positive: Gamma

\[ \begin{aligned} \xi & = \frac{1}{a + \mathbf{XB}} & \mathrm{OR} \\ \xi & = \exp(a + \mathbf{XB}) \\ \alpha & = \frac{\xi^2}{\phi} \\ \beta & = \frac{\xi}{\phi} \\ y & \sim \mathrm{Gamma}(\alpha, \beta) \end{aligned} \]
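
A quick numerical check of this parameterisation (a sketch; values illustrative): with \(\alpha = \xi^2/\phi\) and \(\beta = \xi/\phi\), draws have mean \(\xi\) and variance \(\phi\).

set.seed(1)
xi = 3; phi = 2                      # illustrative mean and variance
alpha = xi^2 / phi; beta = xi / phi  # shape and rate
y = rgamma(1e5, shape = alpha, rate = beta)
c(mean = mean(y), var = var(y))      # approximately xi and phi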