PPC Calibration plots#352
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #352 +/- ##
==========================================
- Coverage 99.18% 98.97% -0.22%
==========================================
Files 35 36 +1
Lines 6132 6526 +394
==========================================
+ Hits 6082 6459 +377
- Misses 50 67 +17 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ExamplesThese should allow for some tests of these functions. Creating example datalibrary(bayesplot)
ymin <- range(example_y_data(), example_yrep_draws())[1]
ymax <- range(example_y_data(), example_yrep_draws())[2]
# Observations and posterior predictive probabilitites.
y <- rbinom(length(example_y_data()), 1, (example_y_data() - ymin) / (ymax - ymin))
prep <- (example_yrep_draws() - ymin) / (ymax - ymin)
groups <- example_group_data()PAVA Calibration overlayBasic ppc_calibration_overlay(y, prep[1:50,])Grouped ppc_calibration_overlay_grouped(y, prep[1:50,], groups)PAVA CalibrationThis isn't yet quite what we want. Now the interval is not what we show in the paper. There, we use consistency intervals, that is, intervals centered at the diagonal displaying, where the calibration curve should lie, i.e. the posterior mean should stay within these bounds. ppc_calibration(y, prep)ppc_calibration_grouped(y, prep, groups) |
jgabry
left a comment
There was a problem hiding this comment.
This all sounds good, thanks @TeemuSailynoja. I made a few small review comments/questions. In addition to those questions, when you say
This isn't yet quite what we want. Now the interval is not what we show in the paper.
you mean that we will want to change this to use the consistency intervals you use in the paper, right? Do you think it's at all useful to give the user the option to choose which kind of interval? Or just strictly better to use the consistency intervals? I hadn't really thought about that.
| if (requireNamespace("monotone", quietly = TRUE)) { | ||
| monotone <- monotone::monotone | ||
| } else { | ||
| monotone <- function(y) { | ||
| stats::isoreg(y)$yf | ||
| } | ||
| } |
There was a problem hiding this comment.
Is there an advantage to using monotone::monotone instead of stats::isoreg?
There was a problem hiding this comment.
That is, does it do something slightly better? Or the same thing more efficiently? I've seen stats::isoreg before but I had never seen the monotone package. If there's no difference then it's probably not worth checking for the monotone package. If it's better then we could put monotone in Suggests and then check for it like you do here.
There was a problem hiding this comment.
monotone offers an implementation of the algorithm that is noticeably faster for large samples.
I think it would be good to add it to the suggests.
| #' @rdname PPC-calibration | ||
| #' @export | ||
| ppc_calibration_overlay <- function( | ||
| y, prep, ..., linewidth = 0.25, alpha = 0.5) { |
There was a problem hiding this comment.
So for these functions prep is a matrix of probabilities and not actually a matrix of draws of binary outcomes from the posterior predictive distribution, right? I think in that case the argument name prep makes sense. But the description at the top of the file says
Assess the calibration of the predictive distributions
yrepin relation to the data `y'
which makes it sound like the user should give us yrep. So I think we just need to reconcile how we describe this to the user.
There was a problem hiding this comment.
Your interpretation is right. The description needs more clarity. yrep can be made to be accepted by ppc_calibration (without overlay).
Leaving an option to choose is perhaps the best, as long as the difference is explained in the documentation. Confidence = "Where do we think the calibration curve for our model lies." |
|
Ok great, thanks for the replies. Sounds good to me. |
|
I've now updated the plotting functions ( Please take a look at the changes whenever you have a moment @avehtari, @TeemuSailynoja. I'm happy to receive any feedback and improve things further based on your thoughts. Thank you! |
|
The first comment says
and then
It would be nice to list the loo versions in the list of implemented functions. We need also better documentation for what are the expected arguments and how the plots are computed. With where posterior_predict gives posterior draws of counts and not probabilities, and then E_loo compute LOO-expectation so that we get single probability for each observation and In addition, for |
done |
See corresponding vignette: vignettes/ppc-calibration.Rmd |
|
@jgabry @avehtari This repo is ready for final review. It would be nice if you can have a look. Probably you can pay specific attention to how I incorporated this vignette at the moment as I was not sure how to do this correctly. Thank you! |
jgabry
left a comment
There was a problem hiding this comment.
Here's a first round of review comments, sorry for the delay!
| ord = order(.data$value), | ||
| y_id = .data$ord, | ||
| value = .data$value[.data$ord], | ||
| cep = monotone(y[.data$ord]) |
There was a problem hiding this comment.
I think here:
- ord = order(.data$value) is position-within-group
- cep = monotone(y[.data$ord]) indexes the full y
For any group after the first, does this compute CEPs using the wrong observations? Do we need to preserve the original y_id from .ppd_data() and index y with the ordered original IDs?
There was a problem hiding this comment.
Yes, indeed here was a logical mistake.
I changed it now such that the original index is preserved and used to index y before applying the ordering:
d <- .ppd_data(prep, group = group) |>
group_by(.data$group, .data$rep_id) |>
mutate(
ord = order(.data$value),
y_id_orig = .data$y_id, # for selecting groups in y
y_id = .data$ord,
value = .data$value[.data$ord],
cep = monotone(y[.data$y_id_orig[.data$ord]])
) |>
ungroup() |>
dplyr::select(dplyr::all_of(c("group", "y_id", "rep_id", "value", "cep")))
| .loo_resampling_probs <- function(w) { | ||
| if (!all(is.finite(w))) { | ||
| abort("All values in 'lw' must be finite.") | ||
| } | ||
| p <- if (any(w < 0)) { | ||
| # Treat negative entries as log-weights and stabilize before exponentiating. | ||
| exp(w - max(w)) | ||
| } else { | ||
| w | ||
| } | ||
| total <- sum(p) | ||
| if (!is.finite(total) || total <= 0) { | ||
| rep(1 / length(w), length(w)) | ||
| } else { | ||
| p / total | ||
| } | ||
| } |
There was a problem hiding this comment.
If I understand correctly, this function treats weights as log weights only if any entry is negative. But lw is documented as log weights, and I think valid unnormalized log weights can all be positive, right? Are we always assuming already normalized log weights?
There was a problem hiding this comment.
yes, unnormalized log weights can all be positive
There was a problem hiding this comment.
I updated the logic:
- I assume log weights are provided as this is also what we document as required input
- change
.loo_resampling_probs()to.normalize_lwand focus only on normalization (takeexpout of the function) - normalization is done on all passed
lw(i.e., log-weights; when lw is already normalized this should simply do "nothing")
.normalize_lw <- function(lw) {
if (!all(is.finite(lw))) {
abort("All values in 'lw' must be finite.")
}
max_lw <- max(lw)
log_sum_exp <- max_lw + log(sum(exp(lw - max_lw)))
lw - log_sum_exp
}
.loo_resample_data <- function(yrep, lw, psis_object) {
lw <- .get_lw(lw, psis_object)
stopifnot(identical(dim(yrep), dim(lw)))
yrep <- as.matrix(yrep)
lw <- as.matrix(lw)
# Resample each column (observation) with its corresponding weights.
# Sampling indices directly is much faster than constructing draws objects.
n_obs <- ncol(yrep)
n_draws <- nrow(yrep)
yrep_resampled <- matrix(NA_real_, nrow = n_draws, ncol = n_obs)
for (i in seq_len(n_obs)) {
probs_i <- exp(.normalize_lw(lw[, i]))
idx_i <- sample.int(n_draws, size = n_draws, replace = TRUE, prob = probs_i)
yrep_resampled[, i] <- yrep[idx_i, i]
}
# Add observation names if available
if (!is.null(colnames(yrep))) {
colnames(yrep_resampled) <- colnames(yrep)
}
yrep_resampled
}
…nd update documentation




This is my work in progress of the pava calibration plots discussed in #343
Currently implemented:
ppc_calibration_overlay()ppc_calibration_overlay_grouped()ppc_calibration()ppc_calibration_grouped()ppc_loo_calibration()ppc_loo_calibration_grouped()ppc_calibration_data()Needs:
ppc_calibration().ppc_calibration_data()be exposed to users?