Bootstrap simulation of uncertainty for non-censored data
bootdist.Rd
Uses parametric or nonparametric bootstrap resampling in order to simulate uncertainty in the parameters of the distribution fitted to non-censored data.
Usage
bootdist(f, bootmethod = "param", niter = 1001, silent = TRUE,
parallel = c("no", "snow", "multicore"), ncpus)
# S3 method for bootdist
print(x, ...)
# S3 method for bootdist
plot(x, main = "Bootstrapped values of parameters", enhance = FALSE,
trueval = NULL, rampcol = NULL, nbgrid = 100, nbcol = 100, ...)
# S3 method for bootdist
summary(object, ...)
Arguments
- f
An object of class
"fitdist"
, output of thefitdist
function.- bootmethod
A character string coding for the type of resampling :
"param"
for a parametric resampling and"nonparam"
for a nonparametric resampling of data.- niter
The number of samples drawn by bootstrap.
- silent
A logical to remove or show warnings and errors when bootstraping.
- parallel
The type of parallel operation to be used,
"snow"
or"multicore"
(the second one not being available on Windows), or"no"
if no parallel operation.- ncpus
Number of processes to be used in parallel operation : typically one would fix it to the number of available CPUs.
- x
An object of class
"bootdist"
.- object
An object of class
"bootdist"
.- main
an overall title for the plot: see
title
, default to"Bootstrapped values of parameters"
.- enhance
a logical to get an enhanced plot.
- trueval
when relevant, a numeric vector with the true value of parameters (for backfitting purposes).
- rampcol
colors to interpolate; must be a valid argument to
colorRampPalette()
.- nbgrid
Number of grid points in each direction. Can be scalar or a length-2 integer vector.
- nbcol
an integer argument, the required number of colors
- ...
Further arguments to be passed to generic methods
Details
Samples are drawn by parametric bootstrap (resampling from the distribution fitted by
fitdist
) or nonparametric bootstrap (resampling with replacement from the
data set). On each bootstrap sample the function
mledist
(or mmedist
, qmedist
, mgedist
according to the component f$method
of the object of class "fitdist"
) is
used to estimate bootstrapped values of parameters. When that function fails
to converge, NA
values are returned. Medians and 2.5 and 97.5 percentiles are
computed by removing NA
values.
The medians and the 95 percent confidence intervals of parameters (2.5 and 97.5
percentiles) are printed in the summary.
If inferior to the whole number of iterations, the number of iterations for which
the function converges is also printed in the summary.
By default (when enhance=FALSE
), the plot of an object of class
"bootdist"
consists in a scatterplot or a matrix
of scatterplots of the bootstrapped values of parameters.
It uses the function stripchart
when the fitted distribution
is characterized by only one parameter, the function plot
when there
are two paramters and the function pairs
in other cases.
In these last cases, it provides a representation of the joint uncertainty distribution
of the fitted parameters.
When enhance=TRUE
, a personalized plot version of pairs
is used where
upper graphs are scatterplots and lower graphs are heatmap image using image
based on a kernel based estimator for the 2D density function (using kde2d
from
MASS package).
Arguments rampcol
, nbgrid
, nbcol
can be used to customize the plots.
Defautls values are rampcol=c("green", "yellow", "orange", "red")
, nbcol=100
(see colorRampPalette()
), nbgrid=100
(see kde2d
).
In addition, when fitting parameters on simulated datasets for backtesting purposes, an
additional argument trueval
can be used to plot a cross at the true value.
It is possible to accelerate the bootstrap using parallelization. We recommend you to
use parallel = "multicore"
, or parallel = "snow"
if you work on Windows,
and to fix ncpus
to the number of available processors.
Value
bootdist
returns an object of class "bootdist"
, a list with 6 components,
- estim
a data frame containing the bootstrapped values of parameters.
- converg
a vector containing the codes for convergence obtained if an iterative method is used to estimate parameters on each bootstraped data set (and 0 if a closed formula is used).
- method
A character string coding for the type of resampling :
"param"
for a parametric resampling and"nonparam"
for a nonparametric resampling.- nbboot
The number of samples drawn by bootstrap.
- CI
bootstrap medians and 95 percent confidence percentile intervals of parameters.
- fitpart
The object of class
"fitdist"
on which the bootstrap procedure was applied.
Generic functions:
print
The print of a
"bootdist"
object shows the bootstrap parameter estimates. If inferior to the whole number of bootstrap iterations, the number of iterations for which the estimation converges is also printed.summary
The summary provides the median and 2.5 and 97.5 percentiles of each parameter. If inferior to the whole number of bootstrap iterations, the number of iterations for which the estimation converges is also printed in the summary.
plot
The plot shows the bootstrap estimates with
stripchart
function for univariate parameters andplot
function for multivariate parameters.
See also
See fitdistrplus
for an overview of the package.
fitdist
, mledist
, qmedist
, mmedist
,
mgedist
,
quantile.bootdist
for another generic function to calculate
quantiles from the fitted distribution and its bootstrap results
and CIcdfplot
for adding confidence intervals on quantiles
to a CDF plot of the fitted distribution.
References
Cullen AC and Frey HC (1999), Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 181-241.
Delignette-Muller ML and Dutang C (2015), fitdistrplus: An R Package for Fitting Distributions. Journal of Statistical Software, 64(4), 1-34.
Examples
# We choose a low number of bootstrap replicates in order to satisfy CRAN running times
# constraint.
# For practical applications, we recommend to use at least niter=501 or niter=1001.
# (1) Fit of a gamma distribution to serving size data
# using default method (maximum likelihood estimation)
# followed by parametric bootstrap
#
data(groundbeef)
x1 <- groundbeef$serving
f1 <- fitdist(x1, "gamma")
b1 <- bootdist(f1, niter=51)
print(b1)
#> Parameter values obtained with parametric bootstrap
#> shape scale
#> 1 4.016505 18.62730
#> 2 4.213469 17.35657
#> 3 4.175406 17.22451
#> 4 4.117900 16.83053
#> 5 5.011972 13.90755
#> 6 4.461217 17.21746
plot(b1)
plot(b1, enhance=TRUE)
summary(b1)
#> Parametric bootstrap medians and 95% percentile CI
#> Median 2.5% 97.5%
#> shape 4.121219 3.322919 5.117759
#> scale 18.126660 13.952363 21.349067
quantile(b1)
#> (original) estimated quantiles for each specified probability (non-censored data)
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7 p=0.8
#> estimate 32.16365 42.32272 50.91374 59.1481 67.62286 76.87768 87.67198 101.5148
#> p=0.9
#> estimate 122.9481
#> Median of bootstrap estimates
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7 p=0.8
#> estimate 32.70591 42.82083 50.98702 59.24409 67.58948 76.41125 87.17545 100.828
#> p=0.9
#> estimate 121.5313
#>
#> two-sided 95 % CI of each quantile
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7 p=0.8
#> 2.5 % 27.77100 37.44132 45.72310 53.95982 62.25721 71.29931 81.64418 93.99445
#> 97.5 % 35.65307 45.22246 53.97198 62.57716 71.30748 81.30113 92.96276 107.73191
#> p=0.9
#> 2.5 % 113.9604
#> 97.5 % 130.6713
CIcdfplot(b1, CI.output = "quantile")
# (2) non parametric bootstrap on the same fit
#
b1b <- bootdist(f1, bootmethod="nonparam", niter=51)
summary(b1b)
#> Nonparametric bootstrap medians and 95% percentile CI
#> Median 2.5% 97.5%
#> shape 4.084541 3.477478 4.713645
#> scale 17.975719 15.861494 20.845535
quantile(b1b)
#> (original) estimated quantiles for each specified probability (non-censored data)
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7 p=0.8
#> estimate 32.16365 42.32272 50.91374 59.1481 67.62286 76.87768 87.67198 101.5148
#> p=0.9
#> estimate 122.9481
#> Median of bootstrap estimates
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7
#> estimate 32.24852 42.25393 51.01017 59.04628 67.47751 76.96222 87.66323
#> p=0.8 p=0.9
#> estimate 100.8619 121.7961
#>
#> two-sided 95 % CI of each quantile
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7 p=0.8
#> 2.5 % 28.76945 38.75936 47.17898 55.17432 63.30956 72.15180 82.20961 95.19234
#> 97.5 % 36.50182 46.75574 55.29043 63.38309 71.64079 80.59742 91.30562 105.91106
#> p=0.9
#> 2.5 % 114.9971
#> 97.5 % 128.1216
# (3) Fit of a normal distribution on acute toxicity values of endosulfan in log10 for
# nonarthropod invertebrates, using maximum likelihood estimation
# to estimate what is called a species sensitivity distribution
# (SSD) in ecotoxicology, followed by estimation of the 5 percent quantile value of
# the fitted distribution, what is called the 5 percent hazardous concentration (HC5)
# in ecotoxicology, with its two-sided 95 percent confidence interval calculated by
# parametric bootstrap
#
data(endosulfan)
ATV <- subset(endosulfan, group == "NonArthroInvert")$ATV
log10ATV <- log10(subset(endosulfan, group == "NonArthroInvert")$ATV)
fln <- fitdist(log10ATV, "norm")
bln <- bootdist(fln, bootmethod = "param", niter=51)
quantile(bln, probs = c(0.05, 0.1, 0.2))
#> (original) estimated quantiles for each specified probability (non-censored data)
#> p=0.05 p=0.1 p=0.2
#> estimate 1.744227 2.080093 2.4868
#> Median of bootstrap estimates
#> p=0.05 p=0.1 p=0.2
#> estimate 1.811067 2.156258 2.529461
#>
#> two-sided 95 % CI of each quantile
#> p=0.05 p=0.1 p=0.2
#> 2.5 % 1.187935 1.634263 2.095273
#> 97.5 % 2.276507 2.563692 2.917189
# (4) comparison of sequential and parallel versions of bootstrap
# to be tried with a greater number of iterations (1001 or more)
#
# \donttest{
niter <- 1001
data(groundbeef)
x1 <- groundbeef$serving
f1 <- fitdist(x1, "gamma")
# sequential version
ptm <- proc.time()
summary(bootdist(f1, niter = niter))
#> Parametric bootstrap medians and 95% percentile CI
#> Median 2.5% 97.5%
#> shape 4.023072 3.464168 4.716077
#> scale 18.301770 15.441187 21.642130
proc.time() - ptm
#> user system elapsed
#> 3.244 0.092 3.227
# parallel version using snow
require(parallel)
#> Loading required package: parallel
ptm <- proc.time()
summary(bootdist(f1, niter = niter, parallel = "snow", ncpus = 2))
#> Parametric bootstrap medians and 95% percentile CI
#> Median 2.5% 97.5%
#> shape 4.016668 3.447829 4.744621
#> scale 18.318954 15.352944 21.646403
proc.time() - ptm
#> user system elapsed
#> 0.038 0.003 3.381
# parallel version using multicore (not available on Windows)
ptm <- proc.time()
summary(bootdist(f1, niter = niter, parallel = "multicore", ncpus = 2))
#> Parametric bootstrap medians and 95% percentile CI
#> Median 2.5% 97.5%
#> shape 4.037716 3.479693 4.727559
#> scale 18.192659 15.365769 21.495557
proc.time() - ptm
#> user system elapsed
#> 0.035 0.012 1.787
# }