
We denote the raw empirical moments by $m_j = \frac1n \sum_{i=1}^n x_i^j$ and the centered empirical moments by $\mu_j = \frac1n \sum_{i=1}^n (x_i - m_1)^j$. Starting values are computed in R/util-startarg.R. We give below the starting values for discrete and continuous distributions and refer to the bibliography section for details.

1. Discrete distributions

1.1. Base R distributions

1.1.1. Geometric distribution

The MME is used: $\hat p = 1/(1+m_1)$.

1.1.2. Negative binomial distribution

The MME is used: $\hat n = m_1^2/(\mu_2 - m_1)$.

1.1.3. Poisson distribution

Both the MME and the MLE are $\hat\lambda = m_1$.

1.1.4. Binomial distribution

The MME is used: since $Var[X]/E[X] = 1-p$, we take $\hat p = 1 - \mu_2/m_1$. The size parameter is $\hat n = \lceil\max(\max_i x_i, m_1/\hat p)\rceil$.
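The moment formulas above can be sketched numerically. The snippet below is a Python illustration (the package itself implements this in R in R/util-startarg.R); the function and key names are hypothetical:

```python
import math

def discrete_start(x):
    """Sketch of the moment-based starting values above (geometric,
    Poisson, negative binomial, binomial); not the package code."""
    n = len(x)
    m1 = sum(x) / n                               # raw first moment
    mu2 = sum((xi - m1) ** 2 for xi in x) / n     # centered second moment
    start = {"geom": {"prob": 1 / (1 + m1)}, "pois": {"lambda": m1}}
    if mu2 > m1:      # over-dispersed sample: negative binomial size
        start["nbinom"] = {"size": m1 ** 2 / (mu2 - m1)}
    elif mu2 < m1:    # under-dispersed sample: binomial prob and size
        p = 1 - mu2 / m1
        start["binom"] = {"prob": p,
                          "size": math.ceil(max(max(x), m1 / p))}
    return start
```

For instance, `discrete_start([0, 1, 1, 2, 2, 2, 3, 5])` has $m_1 = 2$, so the geometric probability is $1/3$ and the Poisson rate is $2$.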

1.2. Logarithmic distribution

The expectation simplifies for small values of $p$: $$ E[X] = -\frac{1}{\log(1-p)}\frac{p}{1-p} \approx -\frac{1}{-p}\frac{p}{1-p} = \frac{1}{1-p}. $$ So the initial estimate is $\hat p = 1 - 1/m_1$.

1.3. Zero truncated distributions

These distributions are the distributions of $X \mid X>0$ when $X$ follows a given discrete distribution. Hence the initial estimates are those of the corresponding base R distribution applied to the shifted sample $x - 1$.

1.4. Zero modified distributions

The MLE of the zero-mass probability parameter is the empirical mass at 0: $\hat p_0 = \frac1n \sum_i 1_{x_i=0}$. For the other parameters we use the classical estimators with probability parameter $1-\hat p_0$.
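As a minimal sketch (in Python for illustration; the name is hypothetical), the zero-mass estimate is just the empirical frequency of zeros:

```python
def zero_modified_p0(x):
    """Empirical mass at zero: MLE of the probability parameter p0
    of a zero-modified distribution (sketch)."""
    return sum(1 for xi in x if xi == 0) / len(x)
```

The remaining parameters would then come from the classical estimator of the underlying distribution, with probability parameter `1 - zero_modified_p0(x)`.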

1.5. Poisson inverse Gaussian distribution

The first two moments are $$ E[X] = \mu, \quad Var[X] = \mu + \phi\mu^3, $$ so the initial estimates are $\hat\mu = m_1$, $\hat\phi = (\mu_2 - m_1)/m_1^3$.
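These two moment equations invert directly; a Python sketch (hypothetical names, not the package code):

```python
def pig_start(x):
    """Starting values for the Poisson inverse Gaussian distribution,
    matching E[X] = mu and Var[X] = mu + phi*mu^3 (sketch)."""
    n = len(x)
    m1 = sum(x) / n                             # empirical mean
    mu2 = sum((xi - m1) ** 2 for xi in x) / n   # empirical variance
    return {"mean": m1, "dispersion": (mu2 - m1) / m1 ** 3}
```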

2. Continuous distributions

2.1. Normal distribution

The MLE coincides with the MME, so we use the empirical mean and variance.

2.2. Lognormal distribution

The log sample follows a normal distribution, so we use the normal starting values on the log sample.

2.3. Beta distribution (of the first kind)

The density function of a beta distribution $\mathcal Be(a,b)$ is $$ f_X(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1}. $$ The initial estimate is the MME \begin{equation} \hat a = m_1 \delta, \quad \hat b = (1-m_1)\delta, \quad \delta = \frac{m_1(1-m_1)}{\mu_2} - 1. (\#eq:betaguessestimator) \end{equation}
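This beta MME is reused several times below for transformed samples, so a small numerical sketch is handy (Python illustration with hypothetical names; the package code is in R/util-startarg.R):

```python
def beta_start(x):
    """Beta MME above: delta = m1*(1-m1)/mu2 - 1, then
    shape1 = m1*delta, shape2 = (1-m1)*delta (sketch)."""
    n = len(x)
    m1 = sum(x) / n
    mu2 = sum((xi - m1) ** 2 for xi in x) / n
    delta = m1 * (1 - m1) / mu2 - 1
    return {"shape1": m1 * delta, "shape2": (1 - m1) * delta}
```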

2.4. Other continuous distributions in actuar

2.4.1. Log-gamma

Use the gamma initial values on the sample $\log(x)$.

2.4.2. Gumbel

The distribution function is $$ F(x) = \exp\left(-\exp\left(-\frac{x-\alpha}{\theta}\right)\right). $$ Let $q_1$ and $q_3$ be the first and the third quartiles, attained at probabilities $p_1=1/4$ and $p_3=3/4$. $$ \left\{\begin{array}{l} -\theta\log(-\log(p_1)) = q_1-\alpha \\ -\theta\log(-\log(p_3)) = q_3-\alpha \end{array}\right. \Leftrightarrow \left\{\begin{array}{l} -\theta\log(-\log(p_1))+\theta\log(-\log(p_3)) = q_1-q_3 \\ \alpha = \theta\log(-\log(p_3)) + q_3 \end{array}\right. \Leftrightarrow \left\{\begin{array}{l} \theta = \frac{q_1-q_3}{\log(-\log(p_3)) - \log(-\log(p_1))} \\ \alpha = \theta\log(-\log(p_3)) + q_3 \end{array}\right. $$ Using the median for the location parameter $\alpha$ yields the initial estimates $$ \hat\theta = \frac{q_1-q_3}{\log(\log(4/3)) - \log(\log(4))}, \quad \hat\alpha = \hat\theta\log(\log(2)) + q_2. $$
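The closed-form solution above can be sketched directly from the empirical quartiles and median (Python illustration; `gumbel_start` is a hypothetical name and the quartiles are taken as inputs to avoid fixing a quantile rule):

```python
import math

def gumbel_start(q1, q2, q3):
    """Gumbel starting values from the quartiles q1, q3 and the median q2,
    using the closed-form solution above (sketch)."""
    theta = (q1 - q3) / (math.log(math.log(4 / 3)) - math.log(math.log(4)))
    alpha = theta * math.log(math.log(2)) + q2
    return alpha, theta
```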

2.4.3. Inverse Gaussian distribution

The moments of this distribution are $$ E[X] = \mu, \quad Var[X] = \mu^3\phi. $$ Hence the initial estimates are $\hat\mu = m_1$, $\hat\phi = \mu_2/m_1^3$.

2.4.4. Generalized beta

This is the distribution of $\theta X^{1/\tau}$ when $X$ is beta distributed $\mathcal Be(a,b)$. The moments are $$ E[X] = \theta \beta(a+1/\tau, b)/\beta(a,b) = \theta \frac{\Gamma(a+1/\tau)}{\Gamma(a)}\frac{\Gamma(a+b)}{\Gamma(a+b+1/\tau)}, $$ $$ E[X^2] = \theta^2 \frac{\Gamma(a+2/\tau)}{\Gamma(a)}\frac{\Gamma(a+b)}{\Gamma(a+b+2/\tau)}. $$ Hence for large values of $\tau$, we have $$ E[X^2]/E[X] = \theta \frac{\Gamma(a+2/\tau)}{\Gamma(a+b+2/\tau)} \frac{\Gamma(a+b+1/\tau)}{\Gamma(a+1/\tau)} \approx \theta. $$ Note that the MLE of $\theta$ is the sample maximum. We use $$ \hat\tau = 3, \quad \hat\theta = \frac{m_2}{m_1}\max_i x_i 1_{m_2>m_1} + \frac{m_1}{m_2}\max_i x_i 1_{m_2\leq m_1}, $$ then we use the beta initial estimates on the sample $(\frac{x_i}{\hat\theta})^{\hat\tau}$.

2.5. Feller-Pareto family

The Feller-Pareto distribution is the distribution of $X = \mu + \theta(1/B - 1)^{1/\gamma}$ when $B$ follows a beta distribution with shape parameters $\alpha$ and $\tau$; see details at https://doi.org/10.18637/jss.v103.i06. Hence, letting $Y = (X-\mu)/\theta$, we have for $\gamma=1$ $$ \frac{Y}{1+Y} = \frac{X-\mu}{\theta+X-\mu} = 1-B. $$ For $\gamma$ close to 1, $\frac{Y}{1+Y}$ is thus approximately beta distributed with shape parameters $\tau$ and $\alpha$.

The log-likelihood is \begin{equation} \mathcal L(\mu, \theta, \alpha, \gamma, \tau) = (\tau \gamma - 1) \sum_{i} \log\left(\frac{x_i-\mu}\theta\right) - (\alpha+\tau)\sum_i \log\left(1+\left(\frac{x_i-\mu}\theta\right)^\gamma\right) + n\log(\gamma) - n\log(\theta) - n\log(\beta(\alpha,\tau)). (\#eq:fellerparetologlik) \end{equation} The MLE of $\mu$ is the sample minimum.

The gradient with respect to $\theta, \alpha, \gamma, \tau$ is \begin{equation} \nabla \mathcal L(\mu, \theta, \alpha, \gamma, \tau) = \begin{pmatrix} -(\tau \gamma - 1) \sum_{i} \frac{x_i}{\theta(x_i-\mu)} + (\alpha+\tau)\sum_i \frac{x_i\gamma(\frac{x_i-\mu}\theta)^{\gamma-1}}{\theta^2(1+(\frac{x_i-\mu}\theta)^\gamma)} - n/\theta \\ - \sum_i \log(1+(\frac{x_i-\mu}\theta)^\gamma) - n(\psi(\alpha) - \psi(\alpha+\tau)) \\ (\tau - 1) \sum_{i} \log(\frac{x_i-\mu}\theta) - (\alpha+\tau)\sum_i \frac{(\frac{x_i-\mu}\theta)^\gamma}{1+(\frac{x_i-\mu}\theta)^\gamma}\log(\frac{x_i-\mu}\theta) + n/\gamma \\ (\gamma - 1) \sum_{i} \log(\frac{x_i-\mu}\theta) - \sum_i \log(1+(\frac{x_i-\mu}\theta)^\gamma) - n(\psi(\tau) - \psi(\alpha+\tau)) \end{pmatrix}. (\#eq:fellerparetogradient) \end{equation} Cancelling the first component of the score for $\gamma=\alpha=2$, we get $$ -(2\tau - 1) \sum_{i} \frac{x_i}{\theta(x_i-\mu)} + (2+\tau)\sum_i \frac{2x_i(x_i-\mu)}{\theta^3(1+(\frac{x_i-\mu}\theta)^2)} = \frac{n}{\theta} \Leftrightarrow -(2\tau - 1)\theta^2\frac1n \sum_{i} \frac{x_i}{x_i-\mu} + (2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+(\frac{x_i-\mu}\theta)^2} = \theta^2 $$ $$ \Leftrightarrow (2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+(\frac{x_i-\mu}\theta)^2} = (2\tau - 1)\theta^2\left(\frac1n \sum_{i} \frac{x_i}{x_i-\mu} - 1\right) \Leftrightarrow \sqrt{ \frac{(2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+(\frac{x_i-\mu}\theta)^2}}{(2\tau - 1)\left(\frac1n \sum_{i} \frac{x_i}{x_i-\mu} - 1\right)} } = \theta. $$
Neglecting the unknown value of $\tau$ and the $\theta$ in the denominator, we get, with $\hat\mu$ set by (@ref(eq:pareto4muinit)), \begin{equation} \hat\theta = \sqrt{ \frac{ \frac1n\sum_i \frac{2x_i(x_i-\hat\mu)}{1+(x_i-\hat\mu)^2} }{ \frac1n \sum_{i} \frac{x_i}{x_i-\hat\mu} - 1 } }. (\#eq:fellerparetothetahat) \end{equation} Initial values of $\tau, \alpha$ are obtained on the sample $(z_i)_i$, $$ z_i = y_i/(1+y_i), \quad y_i = (x_i - \hat\mu)/\hat\theta, $$ with the initial values of a beta distribution based on the MME (@ref(eq:betaguessestimator)).

Cancelling the last component of the gradient leads to $$ (\gamma - 1) \frac1n\sum_{i} \log\left(\frac{x_i-\mu}\theta\right) - \frac1n\sum_i \log\left(1+\left(\frac{x_i-\mu}\theta\right)^\gamma\right) = \psi(\tau) - \psi(\alpha+\tau) \Leftrightarrow (\gamma - 1) \frac1n\sum_{i} \log\left(\frac{x_i-\mu}\theta\right) = \psi(\tau) - \psi(\alpha+\tau) + \frac1n\sum_i \log\left(1+\left(\frac{x_i-\mu}\theta\right)^\gamma\right). $$ Neglecting the value of $\gamma$ on the right-hand side, we obtain \begin{equation} \hat\gamma = 1 + \frac{ \psi(\tau) - \psi(\alpha+\tau) + \frac1n\sum_i \log(1+\frac{x_i-\mu}\theta) }{ \frac1n\sum_{i} \log(\frac{x_i-\mu}\theta) }. (\#eq:fellerparetogammahat) \end{equation}

2.5.1. Transformed beta

This is the Feller-Pareto with $\mu=0$. So the first component of @ref(eq:fellerparetogradient) with $\gamma=\alpha=2$ simplifies to $$ -(2\tau - 1) \sum_{i} \frac{x_i}{\theta x_i} + (2+\tau)\sum_i \frac{2x_i^2}{\theta^3(1+(\frac{x_i}\theta)^2)} = \frac{n}{\theta} \Leftrightarrow -(2\tau - 1)\theta^2 + (2+\tau)\frac1n\sum_i \frac{2x_i^2}{1+(\frac{x_i}\theta)^2} = \theta^2 \Leftrightarrow \theta^2 = \frac{2+\tau}{2\tau}\frac1n\sum_i \frac{2x_i^2}{1+(\frac{x_i}\theta)^2}. $$ Neglecting the unknown value of $\tau$ and the $\theta$ in the denominator, we get \begin{equation} \hat\theta = \sqrt{ \frac1n\sum_i \frac{2x_i^2}{1+x_i^2} }. (\#eq:trbetathetahat) \end{equation} Initial values of $\tau, \alpha$ are obtained on the sample $(z_i)_i$, $$ z_i = y_i/(1+y_i), \quad y_i = x_i/\hat\theta, $$ with the initial values of a beta distribution based on the MME (@ref(eq:betaguessestimator)). Similarly to the Feller-Pareto, we set \begin{equation} \hat\gamma = 1 + \frac{ \psi(\tau) - \psi(\alpha+\tau) + \frac1n\sum_i \log(1+\frac{x_i}\theta) }{ \frac1n\sum_{i} \log(\frac{x_i}\theta) }. (\#eq:trbetagammahat) \end{equation}

2.5.2. Generalized Pareto

This is the Feller-Pareto with $\mu=0$, $\gamma=1$. So the first component of @ref(eq:fellerparetogradient) with $\alpha=2$ simplifies to $$ -(\tau - 1) \frac{n}{\theta} + (2+\tau)\sum_i \frac{x_i}{\theta^2(1+\frac{x_i}\theta)} = n/\theta \Leftrightarrow -(\tau - 1)\theta + (2+\tau)\frac1n\sum_i \frac{x_i}{1+\frac{x_i}\theta} = \theta. $$ Neglecting the unknown value of $\tau$ leads to \begin{equation} \hat\theta = \frac1n\sum_i \frac{x_i}{1+x_i}. (\#eq:generalizedparetotheta) \end{equation}

Initial values of $\tau, \alpha$ are obtained on the sample $(z_i)_i$, $$ z_i = y_i/(1+y_i), \quad y_i = x_i/\hat\theta, $$ with the initial values of a beta distribution based on the MME (@ref(eq:betaguessestimator)).

2.5.3. Burr

Burr is a Feller-Pareto distribution with $\mu=0$, $\tau=1$.

The survival function is $$ 1-F(x) = (1+(x/\theta)^\gamma)^{-\alpha}. $$ Using the median $q_2$, we have $$ \log(1/2) = -\alpha \log(1+(q_2/\theta)^\gamma). $$ The initial value is \begin{equation} \alpha = \frac{\log(2)}{\log(1+(q_2/\theta)^\gamma)}. (\#eq:burralpharelation) \end{equation}

So the first component of @ref(eq:fellerparetogradient) with $\gamma=\alpha=2$, $\tau=1$, $\mu=0$ simplifies to $$ -n/\theta + 3\sum_i \frac{2x_i(\frac{x_i}\theta)}{\theta^2(1+(\frac{x_i}\theta)^2)} = n/\theta \Leftrightarrow \theta^2\frac1n\sum_i \frac{2x_i(\frac{x_i}\theta)}{1+(\frac{x_i}\theta)^2} = 2/3. $$ Neglecting the unknown $\theta$ in the denominator, we get \begin{equation} \hat\theta = \sqrt{ \frac{2}{3 \frac1n\sum_i \frac{2x_i^2}{1+x_i^2}} }. (\#eq:burrthetahat) \end{equation} We use for $\hat\gamma$ @ref(eq:fellerparetogammahat) with $\tau=1$, $\alpha=2$ and the previous $\hat\theta$.

2.5.4. Loglogistic

Loglogistic is a Feller-Pareto distribution with $\mu=0$, $\tau=1$, $\alpha=1$. The survival function is $$ 1-F(x) = (1+(x/\theta)^\gamma)^{-1}. $$ So $$ \frac1{1-F(x)} - 1 = (x/\theta)^\gamma \Leftrightarrow \log\left(\frac{F(x)}{1-F(x)}\right) = \gamma\log(x/\theta). $$ Let $q_1$ and $q_3$ be the first and the third quartiles. $$ \log\left(\frac{1/4}{3/4}\right) = \gamma\log(q_1/\theta), \quad \log\left(\frac{3/4}{1/4}\right) = \gamma\log(q_3/\theta) \Leftrightarrow -\log(3) = \gamma\log(q_1/\theta), \quad \log(3) = \gamma\log(q_3/\theta). $$ The difference of the previous equations simplifies to $$ \hat\gamma = \frac{2\log(3)}{\log(q_3/q_1)}. $$ The sum of the previous equations gives $$ 0 = \gamma\log(q_1) + \gamma\log(q_3) - 2\gamma\log(\theta), $$ hence \begin{equation} \hat\theta = e^{\frac12\log(q_1q_3)} = \sqrt{q_1q_3}. (\#eq:llogisthetahat) \end{equation}
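Matching the log-odds relation at two empirical quantiles gives a small numerical sketch (Python illustration; `llogis_start` is a hypothetical name and the quantiles are inputs):

```python
import math

def llogis_start(q1, q3, p1=0.25, p3=0.75):
    """Loglogistic starting values from log(F/(1-F)) = gamma*log(x/theta)
    evaluated at two empirical quantiles (sketch)."""
    o1 = math.log(p1 / (1 - p1))   # log-odds at p1
    o3 = math.log(p3 / (1 - p3))   # log-odds at p3
    gamma = (o3 - o1) / (math.log(q3) - math.log(q1))
    theta = q1 * math.exp(-o1 / gamma)   # invert the relation at q1
    return gamma, theta
```

With the default quartiles this reduces to $\hat\gamma = 2\log(3)/\log(q_3/q_1)$ and $\hat\theta = \sqrt{q_1 q_3}$.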

2.5.5. Paralogistic

Paralogistic is a Feller-Pareto distribution with $\mu=0$, $\tau=1$, $\alpha=\gamma$. The survival function is $$ 1-F(x) = (1+(x/\theta)^\alpha)^{-\alpha}. $$ So $$ \log(1-F(x)) = -\alpha \log(1+(x/\theta)^\alpha). $$ The log-likelihood is \begin{equation} \mathcal L(\theta, \alpha) = (\alpha - 1) \sum_{i} \log\left(\frac{x_i}\theta\right) - (\alpha+1)\sum_i \log\left(1+\left(\frac{x_i}\theta\right)^\alpha\right) + 2n\log(\alpha) - n\log(\theta). (\#eq:paralogisloglik) \end{equation} The gradient with respect to $\theta$, $\alpha$ is $$ \begin{pmatrix} (\alpha - 1)\frac{-n}{\theta} - (\alpha+1)\sum_i \frac{-x_i\alpha(x_i/\theta)^{\alpha-1}}{1+(\frac{x_i}\theta)^\alpha} - n/\theta \\ \sum_{i} \log\left(\frac{\frac{x_i}\theta}{1+(\frac{x_i}\theta)^\alpha}\right) - (\alpha+1)\sum_i \frac{(\frac{x_i}\theta)^\alpha \log(x_i/\theta)}{1+(\frac{x_i}\theta)^\alpha} + 2n/\alpha \end{pmatrix}. $$ The first component cancels when $$ -(\alpha+1)\sum_i \frac{-x_i\alpha(x_i/\theta)^{\alpha-1}}{1+(\frac{x_i}\theta)^\alpha} = \alpha n/\theta \Leftrightarrow (\alpha+1)\frac1n\sum_i \frac{x_i^{\alpha+1}}{1+(\frac{x_i}\theta)^\alpha} = \theta^\alpha. $$ The second component cancels when $$ \frac1n\sum_{i} \log\left(\frac{\frac{x_i}\theta}{1+(\frac{x_i}\theta)^\alpha}\right) = -2/\alpha + (\alpha+1)\frac1n\sum_i \frac{(\frac{x_i}\theta)^\alpha \log(x_i/\theta)}{1+(\frac{x_i}\theta)^\alpha}. $$ Choosing $\theta=1$, $\alpha=2$ in the sums leads to $$ \frac1n\sum_{i} \log\left(\frac{x_i}{1+x_i^2}\right) - \frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2} = -2/\alpha + \alpha\frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2}. $$
Initial estimators are \begin{equation} \hat\alpha = \frac{ \frac1n\sum_{i} \log(\frac{x_i}{1+x_i^2}) - \frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2} }{ \frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2} - 2 }, (\#eq:paralogisalphahat) \end{equation} \begin{equation} \hat\theta = \left((\hat\alpha+1)\frac1n\sum_i \frac{x_i^{\hat\alpha+1}}{1+x_i^{\hat\alpha}}\right)^{1/\hat\alpha}. (\#eq:paralogisthetahat) \end{equation}

2.5.6. Inverse Burr

Use the Burr initial values on the sample $1/x$.

2.5.7. Inverse paralogistic

Use the paralogistic initial values on the sample $1/x$.

2.5.8. Inverse Pareto

Use the Pareto initial values on the sample $1/x$.

2.5.9. Pareto IV

The survival function is $$ 1-F(x) = \left(1+ \left(\frac{x-\mu}{\theta}\right)^{\gamma} \right)^{-\alpha}, $$ see ?Pareto4 in actuar.

The first and third quartiles $q_1$ and $q_3$ verify $$ \left(\left(\frac34\right)^{-1/\alpha}-1\right)^{1/\gamma} = \frac{q_1-\mu}{\theta}, \quad \left(\left(\frac14\right)^{-1/\alpha}-1\right)^{1/\gamma} = \frac{q_3-\mu}{\theta}. $$ Hence we get two useful relations \begin{equation} \gamma = \frac{ \log\left( \frac{ (\frac43)^{1/\alpha}-1 }{ 4^{1/\alpha}-1 } \right) }{ \log\left(\frac{q_1-\mu}{q_3-\mu}\right) }, (\#eq:pareto4gammarelation) \end{equation} \begin{equation} \theta = \frac{q_1 - q_3}{ ((\frac43)^{1/\alpha}-1)^{1/\gamma} - (4^{1/\alpha}-1)^{1/\gamma} }. (\#eq:pareto4thetarelation) \end{equation}

The log-likelihood of a Pareto 4 sample (see Equation (5.2.94) of Arnold (2015), updated with Goulet et al. notation) is $$ \mathcal L(\mu,\theta,\gamma,\alpha) = (\gamma - 1) \sum_i \log\left(\frac{x_i-\mu}{\theta}\right) - (\alpha+1)\sum_i \log\left(1+ \left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right) + n\log(\gamma) - n\log(\theta) + n\log(\alpha). $$ Cancelling the derivative of $\mathcal L(\mu,\theta,\gamma,\alpha)$ with respect to $\alpha$ leads to \begin{equation} \alpha = n/\sum_i \log\left(1+ \left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right). (\#eq:pareto4alpharelation) \end{equation}

The MLE of the threshold parameter $\mu$ is the sample minimum. So the initial estimate is taken slightly below the minimum, so that all observations are strictly above it: \begin{equation} \hat\mu = \left\{ \begin{array}{ll} (1+\epsilon)\min_i x_i & \text{if } \min_i x_i < 0 \\ (1-\epsilon)\min_i x_i & \text{if } \min_i x_i \geq 0 \\ \end{array} \right., (\#eq:pareto4muinit) \end{equation} where $\epsilon = 0.05$.
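The sign-dependent shrinkage can be sketched in a few lines (Python illustration with a hypothetical name; the point is that $\hat\mu < \min_i x_i$ whatever the sign of the minimum):

```python
def pareto4_mu_init(x, eps=0.05):
    """Initial threshold slightly below the sample minimum so that all
    observations stay strictly above it, for either sign of min(x) (sketch)."""
    m = min(x)
    return (1 + eps) * m if m < 0 else (1 - eps) * m
```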

Initial parameter estimation is: $\hat\mu$ from (@ref(eq:pareto4muinit)); $\alpha^\star = 2$; $\hat\gamma$ from @ref(eq:pareto4gammarelation) with $\alpha^\star$; $\hat\theta$ from @ref(eq:pareto4thetarelation) with $\alpha^\star$ and $\hat\gamma$; $\hat\alpha$ from @ref(eq:pareto4alpharelation) with $\hat\mu$, $\hat\theta$ and $\hat\gamma$.

2.5.10. Pareto III

Pareto III corresponds to Pareto IV with $\alpha=1$: \begin{equation} \gamma = \frac{ \log\left( \frac{ \frac43-1 }{ 4-1 } \right) }{ \log\left(\frac{q_1-\mu}{q_3-\mu}\right) }, \label{eq:pareto3:gamma:relation} \end{equation}

\begin{equation} \theta = \frac{q_1 - q_3}{ (\frac13)^{1/\gamma} - 3^{1/\gamma} }. \label{eq:pareto3:theta:relation} \end{equation}

Initial parameter estimation is: $\hat\mu$ from (@ref(eq:pareto4muinit)); $\hat\gamma$ from \eqref{eq:pareto3:gamma:relation}; $\hat\theta$ from \eqref{eq:pareto3:theta:relation} with $\hat\gamma$.

2.5.11. Pareto II

Pareto II corresponds to Pareto IV with $\gamma=1$.

\begin{equation} \theta = \frac{q_1 - q_3}{ (\frac43)^{1/\alpha} - 4^{1/\alpha} }. \label{eq:pareto2:theta:relation} \end{equation}

Initial parameter estimation is: $\hat\mu$ from (@ref(eq:pareto4muinit)); $\alpha^\star = 2$; $\hat\theta$ from \eqref{eq:pareto2:theta:relation} with $\alpha^\star$ and $\gamma=1$; $\hat\alpha$ from @ref(eq:pareto4alpharelation) with $\hat\mu$, $\hat\theta$ and $\gamma=1$.

2.5.12. Pareto I

Pareto I corresponds to Pareto IV with $\gamma=1$, $\mu=\theta$.

The MLE is \begin{equation} \hat\mu = \min_i X_i, \quad \hat\alpha = \left(\frac1n \sum_{i=1}^n \log(X_i/\hat\mu) \right)^{-1}. \label{eq:pareto1:alpha:mu:relation} \end{equation}

This can be rewritten with the geometric mean of the sample, $G_n = (\prod_{i=1}^n X_i)^{1/n}$, as $\hat\alpha = \left(\log(G_n/\hat\mu)\right)^{-1}$.
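This MLE is a one-liner once the log geometric mean is computed; a Python sketch (hypothetical names, not the package code):

```python
import math

def pareto1_start(x):
    """Pareto I MLE via the geometric mean G_n (sketch):
    mu = min(x), alpha = 1 / log(G_n / mu)."""
    mu = min(x)
    mean_log = sum(math.log(xi) for xi in x) / len(x)  # log of G_n
    return {"min": mu, "shape": 1.0 / (mean_log - math.log(mu))}
```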

Initial parameter estimation is $\hat\mu$ and $\hat\alpha$ from \eqref{eq:pareto1:alpha:mu:relation}.

2.5.13. Pareto

Pareto corresponds to Pareto IV with $\gamma=1$, $\mu=0$. \begin{equation} \theta = \frac{q_1 - q_3}{ (\frac43)^{1/\alpha} - 4^{1/\alpha} }. \label{eq:pareto:theta:relation} \end{equation}

Initial parameter estimation is $$ \alpha^\star = \max(2, 2(m_2-m_1^2)/(m_2-2m_1^2)), $$ where $m_i$ is the empirical raw moment of order $i$; $\hat\theta$ from \eqref{eq:pareto:theta:relation} with $\alpha^\star$ and $\gamma=1$; $\hat\alpha$ from @ref(eq:pareto4alpharelation) with $\mu=0$, $\hat\theta$ and $\gamma=1$.
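The first two steps (the floored MME for $\alpha^\star$, then $\theta$ from the quartile relation $\theta\,((\frac43)^{1/\alpha} - 4^{1/\alpha}) = q_1 - q_3$) can be sketched numerically; Python illustration with hypothetical names and quartiles as inputs:

```python
def pareto_start(x, q1, q3):
    """Pareto (mu = 0, gamma = 1) starting values (sketch): alpha* from
    the MME floored at 2, theta by matching the quartiles q1 and q3."""
    n = len(x)
    m1 = sum(x) / n
    m2 = sum(xi ** 2 for xi in x) / n
    alpha = max(2.0, 2 * (m2 - m1 ** 2) / (m2 - 2 * m1 ** 2))
    # theta * ((4/3)**(1/alpha) - 4**(1/alpha)) = q1 - q3
    theta = (q1 - q3) / ((4 / 3) ** (1 / alpha) - 4 ** (1 / alpha))
    return alpha, theta
```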

2.6. Transformed gamma family

2.6.1. Transformed gamma distribution

The log-likelihood is given by $$ \mathcal L(\alpha,\tau,\theta) = n\log(\tau) + \alpha\tau\sum_i \log(x_i/\theta) - \sum_i (x_i/\theta)^\tau - \sum_i\log(x_i) - n\log(\Gamma(\alpha)). $$ The gradient with respect to $\alpha,\tau,\theta$ is given by $$ \begin{pmatrix} \tau\sum_i \log(x_i/\theta) - n\psi(\alpha) \\ n/\tau + \alpha\sum_i \log(x_i/\theta) - \sum_i (x_i/\theta)^{\tau} \log(x_i/\theta) \\ -n\alpha\tau/\theta + \sum_i \tau \frac{x_i}{\theta^2}(x_i/\theta)^{\tau-1} \end{pmatrix}. $$ We compute the moment estimators as in the gamma case $$ \hat\alpha = m_1^2/\mu_2, \quad \hat\theta = \mu_2/m_1. $$ Then, cancelling the first component of the gradient, we set $$ \hat\tau = \frac{\psi(\hat\alpha)}{\frac1n\sum_i \log(x_i/\hat\theta)}. $$

2.6.2. Gamma distribution

Transformed gamma with $\tau=1$.

We compute the moment estimators given by \begin{equation} \hat\alpha = m_1^2/\mu_2, \quad \hat\theta = \mu_2/m_1. \label{eq:gamma:relation} \end{equation}
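This is the standard gamma moment matching (mean $\alpha\theta$, variance $\alpha\theta^2$); a Python sketch with hypothetical names:

```python
def gamma_start(x):
    """Gamma MME (sketch): mean = shape*scale, variance = shape*scale^2,
    hence shape = m1^2/mu2 and scale = mu2/m1."""
    n = len(x)
    m1 = sum(x) / n
    mu2 = sum((xi - m1) ** 2 for xi in x) / n
    return {"shape": m1 ** 2 / mu2, "scale": mu2 / m1}
```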

2.6.3. Weibull distribution

Transformed gamma with $\alpha=1$.

Let $\tilde m = \frac1n\sum_i \log(x_i)$ and $\tilde v = \frac1n\sum_i (\log(x_i) - \tilde m)^2$. We use an approximate MME $$ \hat\tau = 1.2/\sqrt{\tilde v}, \quad \hat\theta = \exp(\tilde m + 0.572/\hat\tau). $$ Alternatively, we can use the distribution function $$ F(x) = 1 - e^{-(x/\theta)^\tau} \Rightarrow \log(-\log(1-F(x))) = \tau\log(x) - \tau\log(\theta). $$ Hence the QME for the Weibull is $$ \tilde\tau = \frac{ \log(-\log(1-p_1)) - \log(-\log(1-p_2)) }{ \log(x_1) - \log(x_2) }, \quad \tilde\theta = x_3/(-\log(1-p_3))^{1/\tilde\tau}, $$ with $p_1=1/4$, $p_2=3/4$, $p_3=1/2$ and $x_1, x_2, x_3$ the corresponding empirical quantiles.

Initial parameters are $\tilde\tau$ and $\tilde\theta$, unless the empirical quantiles satisfy $x_1 = x_2$, in which case we use $\hat\tau$, $\hat\theta$.
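The fallback branch, the approximate MME on the log sample, can be sketched as follows (Python illustration with hypothetical names; the QME branch would be preferred when the quartiles differ):

```python
import math

def weibull_start(x):
    """Approximate Weibull MME on the log sample (sketch):
    tau = 1.2/sqrt(v), theta = exp(m + 0.572/tau)."""
    n = len(x)
    logs = [math.log(xi) for xi in x]
    m = sum(logs) / n                            # mean of log sample
    v = sum((li - m) ** 2 for li in logs) / n    # variance of log sample
    tau = 1.2 / math.sqrt(v)
    return tau, math.exp(m + 0.572 / tau)
```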

2.6.4. Exponential distribution

The MLE coincides with the MME: $\hat\lambda = 1/m_1$.

2.7. Inverse transformed gamma family

2.7.1. Inverse transformed gamma distribution

Same as the transformed gamma distribution applied to the sample $(1/x_i)_i$.

2.7.2. Inverse gamma distribution

We compute the moment estimators as $$ \hat\alpha = (2m_2-m_1^2)/(m_2-m_1^2), \quad \hat\theta = m_1m_2/(m_2-m_1^2). $$
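These inverse gamma moment estimators use the raw moments $m_1, m_2$; a Python sketch with hypothetical names:

```python
def invgamma_start(x):
    """Inverse gamma moment estimators (sketch):
    shape = (2*m2 - m1^2)/(m2 - m1^2), scale = m1*m2/(m2 - m1^2)."""
    n = len(x)
    m1 = sum(x) / n
    m2 = sum(xi ** 2 for xi in x) / n
    d = m2 - m1 ** 2           # empirical variance, shared denominator
    return {"shape": (2 * m2 - m1 ** 2) / d, "scale": m1 * m2 / d}
```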

2.7.3. Inverse Weibull distribution

We use the QME.

2.7.4. Inverse exponential

Same as the exponential distribution applied to the sample $(1/x_i)_i$.

3. Bibliography

3.1. General books

  • N. L. Johnson, S. Kotz, N. Balakrishnan (1994). Continuous univariate distributions, Volume 1, Wiley.
  • N. L. Johnson, S. Kotz, N. Balakrishnan (1995). Continuous univariate distributions, Volume 2, Wiley.
  • N. L. Johnson, A. W. Kemp, S. Kotz (2008). Univariate discrete distributions, Wiley.
  • G. Wimmer (1999). Thesaurus of univariate discrete probability distributions.

3.2. Books dedicated to a distribution family

  • M. Ahsanullah, B.M. Golam Kibria, M. Shakil (2014). Normal and Student’s t Distributions and Their Applications, Springer.
  • B. C. Arnold (2015). Pareto Distributions, 2nd ed., Chapman and Hall/CRC.
  • A. Azzalini (2013). The Skew-Normal and Related Families.
  • N. Balakrishnan (2014). Handbook of the Logistic Distribution, CRC Press.

3.3. Books with applications

  • C. Forbes, M. Evans, N. Hastings, B. Peacock (2011). Statistical Distributions, Wiley.
  • Z. A. Karian, E. J. Dudewicz, K. Shimizu (2010). Handbook of Fitting Statistical Distributions with R, CRC Press.
  • K. Krishnamoorthy (2015). Handbook of Statistical Distributions with Applications, Chapman and Hall.
  • Klugman, S., Panjer, H. & Willmot, G. (2019). Loss Models: From Data to Decisions, 5th ed., John Wiley & Sons.