Bootstrap simulation of uncertainty for censored data
bootdistcens.Rd
Uses nonparametric bootstrap resampling in order to simulate uncertainty in the parameters of the distribution fitted to censored data.
Usage
bootdistcens(f, niter = 1001, silent = TRUE,
parallel = c("no", "snow", "multicore"), ncpus)
# S3 method for class 'bootdistcens'
print(x, ...)
# S3 method for class 'bootdistcens'
plot(x, ...)
# S3 method for class 'bootdistcens'
summary(object, ...)
# S3 method for class 'bootdistcens'
density(..., bw = nrd0, adjust = 1, kernel = "gaussian")
# S3 method for class 'density.bootdistcens'
plot(x, mar=c(4,4,2,1), lty=NULL, col=NULL, lwd=NULL, ...)
# S3 method for class 'density.bootdistcens'
print(x, ...)
Arguments
- f
An object of class
"fitdistcens"
, output of thefitdistcens
function.- niter
The number of samples drawn by bootstrap.
- silent
A logical to remove or show warnings and errors when bootstraping.
- parallel
The type of parallel operation to be used,
"snow"
or"multicore"
(the second one not being available on Windows), or"no"
if no parallel operation.- ncpus
Number of processes to be used in parallel operation : typically one would fix it to the number of available CPUs.
- x
An object of class
"bootdistcens"
.- object
An object of class
"bootdistcens"
.- ...
Further arguments to be passed to generic methods or
"bootdistcens"
objects fordensity
.- bw, adjust, kernel
resp. the smoothing bandwidth, the scaling factor, the kernel used, see
density
.- mar
A numerical vector of the form
c(bottom, left, top, right)
, seepar
.- lty, col, lwd
resp. the line type, the color, the line width, see
par
.
Details
Samples are drawn by
nonparametric bootstrap (resampling with replacement from the data set). On each bootstrap sample the function
mledist
is used to estimate bootstrapped values of parameters. When mledist
fails
to converge, NA
values are returned. Medians and 2.5 and 97.5 percentiles are computed by removing
NA
values. The medians and the 95 percent confidence intervals of parameters (2.5 and 97.5 percentiles)
are printed in the summary.
If inferior to the whole number of iterations, the number of iterations for which mledist
converges
is also printed in the summary.
The plot of an object of class "bootdistcens"
consists in a scatterplot or a matrix of scatterplots
of the bootstrapped values of parameters.
It uses the function stripchart
when the fitted distribution
is characterized by only one parameter, and the function plot
in other cases.
In these last cases, it provides
a representation of the joint uncertainty distribution of the fitted parameters.
It is possible to accelerate the bootstrap using parallelization. We recommend you to
use parallel = "multicore"
, or parallel = "snow"
if you work on Windows,
and to fix ncpus
to the number of available processors.
density
computes the empirical density of bootdistcens
objects using the
density
function (with Gaussian kernel by default).
It returns an object of class density.bootdistcens
for which print
and plot
methods are provided.
Value
bootdistcens
returns an object of class "bootdistcens"
, a list with 6 components,
- estim
a data frame containing the bootstrapped values of parameters.
- converg
a vector containing the codes for convergence of the iterative method used to estimate parameters on each bootstraped data set.
- method
A character string coding for the type of resampling : in this case
"nonparam"
as it is the only available method for censored data.- nbboot
The number of samples drawn by bootstrap.
- CI
bootstrap medians and 95 percent confidence percentile intervals of parameters.
- fitpart
The object of class
"fitdistcens"
on which the bootstrap procedure was applied.
Generic functions:
print
The print of a
"bootdistcens"
object shows the bootstrap parameter estimates. If inferior to the whole number of bootstrap iterations, the number of iterations for which the estimation converges is also printed.summary
The summary provides the median and 2.5 and 97.5 percentiles of each parameter. If inferior to the whole number of bootstrap iterations, the number of iterations for which the estimation converges is also printed in the summary.
plot
The plot shows the bootstrap estimates with the
stripchart
function for univariate parameters andplot
function for multivariate parameters.density
The density computes empirical densities and return an object of class
density.bootdistcens
.
See also
See fitdistrplus
for an overview of the package.
fitdistcens
, mledist
, quantile.bootdistcens
for another generic function to calculate
quantiles from the fitted distribution and its bootstrap results
and CIcdfplot
for adding confidence intervals on quantiles
to a CDF plot of the fitted distribution.
References
Cullen AC and Frey HC (1999), Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 181-241.
Delignette-Muller ML and Dutang C (2015), fitdistrplus: An R Package for Fitting Distributions. Journal of Statistical Software, 64(4), 1-34, doi:10.18637/jss.v064.i04 .
Examples
# We choose a low number of bootstrap replicates in order to satisfy CRAN running times
# constraint.
# For practical applications, we recommend to use at least niter=501 or niter=1001.
# (1) Fit of a normal distribution to fluazinam data in log10
# followed by nonparametric bootstrap and calculation of quantiles
# with 95 percent confidence intervals
#
data(fluazinam)
(d1 <-log10(fluazinam))
#> left right
#> 1 0.5797836 0.5797836
#> 2 1.5263393 1.5263393
#> 3 1.9395193 1.9395193
#> 4 3.2304489 NA
#> 5 2.8061800 2.8061800
#> 6 3.0625820 NA
#> 7 2.0530784 2.0530784
#> 8 2.1105897 2.1105897
#> 9 2.7678976 2.7678976
#> 10 3.2685780 NA
#> 11 0.2041200 0.2041200
#> 12 0.6812412 0.6812412
#> 13 1.9138139 1.9138139
#> 14 2.1903317 2.1903317
f1 <- fitdistcens(d1, "norm")
b1 <- bootdistcens(f1, niter = 51)
b1
#> Parameter values obtained with nonparametric bootstrap
#> mean sd
#> 1 2.148176 1.2301856
#> 2 2.359487 1.1144722
#> 3 1.886811 0.7960468
#> 4 1.983487 0.9941790
#> 5 1.912052 0.9906398
#> 6 2.189226 0.9088450
#> 7 2.287131 1.2049569
#> 8 2.288832 0.7645444
#> 9 1.787691 1.0077846
#> 10 2.893830 1.2229467
#> 11 2.569893 0.9597859
#> 12 2.343772 1.2402711
#> 13 2.645568 1.2934746
#> 14 1.942141 0.5982854
#> 15 1.932680 1.0077309
#> 16 1.824771 1.0653955
#> 17 2.983895 1.8018944
#> 18 2.347785 1.3994097
#> 19 1.845464 0.8555560
#> 20 2.427059 1.5893095
#> 21 1.948223 0.8705864
#> 22 1.692356 1.0223265
#> 23 2.275639 0.8147514
#> 24 2.148972 1.0345423
#> 25 2.348520 1.1739100
#> 26 1.893396 1.1106869
#> 27 1.911591 1.1574565
#> 28 2.610027 1.0803468
#> 29 2.080525 1.3340362
#> 30 1.985938 0.9870137
#> 31 1.742953 1.0956522
#> 32 2.549440 1.0330325
#> 33 2.268481 0.4832085
#> 34 2.144250 1.3228431
#> 35 2.184267 1.2698264
#> 36 1.821893 1.5316162
#> 37 2.085662 1.1654912
#> 38 1.868720 1.0912928
#> 39 2.138497 1.1356628
#> 40 2.119477 0.9868753
#> 41 2.153767 1.1818298
#> 42 1.933517 0.5773863
#> 43 2.074073 0.7280150
#> 44 2.421981 1.1254148
#> 45 2.486787 0.6096348
#> 46 2.030623 1.0934793
#> 47 1.938514 1.0258803
#> 48 1.678181 1.2224439
#> 49 2.339840 1.3061770
#> 50 2.278660 0.7921537
#> 51 2.195027 1.1382020
summary(b1)
#> Nonparametric bootstrap medians and 95% percentile CI
#> Median 2.5% 97.5%
#> mean 2.144250 1.7050054 2.831765
#> sd 1.091293 0.5826111 1.574886
plot(b1)
quantile(b1)
#> (original) estimated quantiles for each specified probability (censored data)
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7
#> estimate 0.6655064 1.179033 1.549321 1.86572 2.161449 2.457179 2.773577
#> p=0.8 p=0.9
#> estimate 3.143865 3.657392
#> Median of bootstrap estimates
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7
#> estimate 0.7210295 1.215519 1.593624 1.854354 2.14425 2.418499 2.691487
#> p=0.8 p=0.9
#> estimate 2.961351 3.394931
#>
#> two-sided 95 % CI of each quantile
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7
#> 2.5 % 0.1683713 0.6922166 1.066910 1.433480 1.705005 1.996046 2.241195
#> 97.5 % 1.5718878 1.8638800 2.141966 2.479624 2.831765 3.146062 3.482325
#> p=0.8 p=0.9
#> 2.5 % 2.472445 2.753590
#> 97.5 % 3.883480 4.463155
CIcdfplot(b1, CI.output = "quantile")
plot(density(b1))
#> List of 1
#> $ :List of 6
#> ..$ estim :'data.frame': 51 obs. of 2 variables:
#> .. ..$ mean: num [1:51] 2.15 2.36 1.89 1.98 1.91 ...
#> .. ..$ sd : num [1:51] 1.23 1.114 0.796 0.994 0.991 ...
#> ..$ converg: num [1:51] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ method : chr "nonparam"
#> ..$ nbboot : num 51
#> ..$ CI : num [1:2, 1:3] 2.144 1.091 1.705 0.583 2.832 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : chr [1:2] "mean" "sd"
#> .. .. ..$ : chr [1:3] "Median" "2.5%" "97.5%"
#> ..$ fitpart:List of 17
#> .. ..$ estimate : Named num [1:2] 2.16 1.17
#> .. .. ..- attr(*, "names")= chr [1:2] "mean" "sd"
#> .. ..$ method : chr "mle"
#> .. ..$ sd : Named num [1:2] 0.322 0.263
#> .. .. ..- attr(*, "names")= chr [1:2] "mean" "sd"
#> .. ..$ cor : num [1:2, 1:2] 1 0.135 0.135 1
#> .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. ..$ : chr [1:2] "mean" "sd"
#> .. .. .. ..$ : chr [1:2] "mean" "sd"
#> .. ..$ vcov : num [1:2, 1:2] 0.1039 0.0114 0.0114 0.0692
#> .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. ..$ : chr [1:2] "mean" "sd"
#> .. .. .. ..$ : chr [1:2] "mean" "sd"
#> .. ..$ loglik : num -20.4
#> .. ..$ aic : num 44.8
#> .. ..$ bic : num 46.1
#> .. ..$ n : int 14
#> .. ..$ censdata :'data.frame': 14 obs. of 2 variables:
#> .. .. ..$ left : num [1:14] 0.58 1.53 1.94 3.23 2.81 ...
#> .. .. ..$ right: num [1:14] 0.58 1.53 1.94 NA 2.81 ...
#> .. ..$ distname : chr "norm"
#> .. ..$ fix.arg : NULL
#> .. ..$ fix.arg.fun: NULL
#> .. ..$ dots : NULL
#> .. ..$ convergence: int 0
#> .. ..$ discrete : logi FALSE
#> .. ..$ weights : NULL
#> .. ..- attr(*, "class")= chr "fitdistcens"
#> ..- attr(*, "class")= chr "bootdistcens"
#> NULL
# (2) Estimation of the mean of the normal distribution
# by maximum likelihood with the standard deviation fixed at 1
# using the argument fix.arg
# followed by nonparametric bootstrap
# and calculation of quantiles with 95 percent confidence intervals
#
f1b <- fitdistcens(d1, "norm", start = list(mean = 1),fix.arg = list(sd = 1))
b1b <- bootdistcens(f1b, niter = 51)
summary(b1b)
#> Nonparametric bootstrap medians and 95% percentile CI
#> Median 2.5% 97.5%
#> 2.175510 1.729164 2.788799
plot(b1b)
quantile(b1b)
#> (original) estimated quantiles for each specified probability (censored data)
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7
#> estimate 0.8527461 1.292676 1.609897 1.880951 2.134298 2.387645 2.658698
#> p=0.8 p=0.9
#> estimate 2.975919 3.415849
#> Median of bootstrap estimates
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7 p=0.8
#> estimate 0.8939584 1.333889 1.651109 1.922163 2.17551 2.428857 2.69991 3.017131
#> p=0.9
#> estimate 3.457062
#>
#> two-sided 95 % CI of each quantile
#> p=0.1 p=0.2 p=0.3 p=0.4 p=0.5 p=0.6 p=0.7
#> 2.5 % 0.4476121 0.8875425 1.204763 1.475817 1.729164 1.982511 2.253564
#> 97.5 % 1.5072477 1.9471780 2.264399 2.535452 2.788799 3.042146 3.313200
#> p=0.8 p=0.9
#> 2.5 % 2.570785 3.010715
#> 97.5 % 3.630420 4.070351
# (3) comparison of sequential and parallel versions of bootstrap
# to be tried with a greater number of iterations (1001 or more)
#
# \donttest{
niter <- 1001
data(fluazinam)
d1 <-log10(fluazinam)
f1 <- fitdistcens(d1, "norm")
# sequential version
ptm <- proc.time()
summary(bootdistcens(f1, niter = niter))
#> Nonparametric bootstrap medians and 95% percentile CI
#> Median 2.5% 97.5%
#> mean 2.146743 1.5792689 2.877993
#> sd 1.129426 0.6853478 1.709083
proc.time() - ptm
#> user system elapsed
#> 4.688 0.001 4.693
# parallel version using snow
require("parallel")
ptm <- proc.time()
summary(bootdistcens(f1, niter = niter, parallel = "snow", ncpus = 2))
#> Nonparametric bootstrap medians and 95% percentile CI
#> Median 2.5% 97.5%
#> mean 2.144793 1.5914352 2.899763
#> sd 1.108123 0.6912424 1.673702
proc.time() - ptm
#> user system elapsed
#> 0.008 0.001 3.384
# parallel version using multicore (not available on Windows)
ptm <- proc.time()
summary(bootdistcens(f1, niter = niter, parallel = "multicore", ncpus = 2))
#> Nonparametric bootstrap medians and 95% percentile CI
#> Median 2.5% 97.5%
#> mean 2.163302 1.5524788 2.874380
#> sd 1.119044 0.7072572 1.656059
proc.time() - ptm
#> user system elapsed
#> 0.004 0.023 2.505
# }