Dependence aware tests for ppc*ecdf #428
…improve input validation checks
jgabry
left a comment
Here are a few comments and questions. So far I've only looked at some of the code, so I may have more, but I thought I would give you these now so we could start discussing.
… corresponding unittests
…e-bockting/bayesplot into dependence-aware-LOO-PIT
@florence-bockting The changes look good, but I think there are the same issues (with the …

@jgabry I have refactored the code to remove duplicated code and documentation by creating new helper functions and template files.

@florence-bockting please also officially add yourself to the DESCRIPTION file as part of this PR!

@jgabry just a little reminder that this PR is still waiting for final review. Thank you!
jgabry
left a comment
Here are some review comments. Sorry for the delay! Overall I think this is in good shape, just a few relatively minor comments.
) {
  # pareto-pit values
  if (isTRUE(pareto_pit) && is.null(pit)) {
    suggested_package("rstantools")
I don't think rstantools is actually needed for this branch here, right?
  }
}

alpha <- 1 - prob
Should we validate that prob is a valid probability? Either here or before it's passed in? I guess whatever is consistent with how we've been checking other arguments.
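One way such a check could look is sketched below. The helper name `validate_prob()` is hypothetical and not part of bayesplot, and treating the interval as open (0 and 1 excluded) is an assumption; the package may prefer a different convention or an existing internal validator.

```r
# Hypothetical helper (not part of bayesplot): check that `prob` is a single
# finite number strictly between 0 and 1 before alpha is computed.
# Assumption: the interval is open, i.e. 0 and 1 themselves are rejected.
validate_prob <- function(prob) {
  ok <- is.numeric(prob) && length(prob) == 1L && is.finite(prob) &&
    prob > 0 && prob < 1
  if (!ok) {
    stop("`prob` must be a single number in (0, 1).", call. = FALSE)
  }
  invisible(prob)
}

# Usage: validate first, then derive alpha as in the reviewed code.
prob <- validate_prob(0.99)
alpha <- 1 - prob
```

Doing the check once at the top of the exported function, rather than inside each helper, would keep the error message close to the user's call.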
#' @param color When `method = "correlated"`, a vector with base color and
#' highlight color for the ECDF plot. Defaults to
#' `c(ecdf = "grey60", highlight = "red")`. The first element is used for
#' the main ECDF line, the second for highlighted suspicious regions.
The code that processes the color vector requires these exact names to be used in order to work properly (I think) but it doesn't check for the names. It indexes color["ecdf"] and color["highlight"]. The doc also mentions "first element" and "second element", but it's the names, not the position, that are used in the code. The user can pass incorrect names or unnamed vectors and I think the colors won't render properly. We should probably either check for the names or just use the position (first element and second element) and forget about the names.
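A rough sketch of a middle ground between the two options: accept an unnamed length-2 vector by position, accept a correctly named vector in any order, and reject anything else. The helper name `resolve_ecdf_colors()` is hypothetical and not taken from the PR; only the names "ecdf" and "highlight" and the default colors come from the documentation above.

```r
# Hypothetical helper (not part of bayesplot): always return a character
# vector with names "ecdf" and "highlight", in that order.
resolve_ecdf_colors <- function(color = c(ecdf = "grey60", highlight = "red")) {
  expected <- c("ecdf", "highlight")
  if (!is.character(color) || length(color) != 2L) {
    stop("`color` must be a character vector of length 2.", call. = FALSE)
  }
  if (is.null(names(color))) {
    # Unnamed input: fall back to position (first = ecdf, second = highlight).
    names(color) <- expected
    return(color)
  }
  if (!setequal(names(color), expected)) {
    stop("`color` must be unnamed or use the names 'ecdf' and 'highlight'.",
         call. = FALSE)
  }
  # Named input: reorder so downstream code can index by name or position.
  color[expected]
}

resolve_ecdf_colors(c("blue", "orange"))
resolve_ecdf_colors(c(highlight = "orange", ecdf = "blue"))
```

With this normalization the documentation could describe both the names and the positions without the two disagreeing.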
#' For `ppc_loo_pit_ecdf()` when `method = 'independent'`.
#' The default is to use interpolation if `K` is greater than 200.
#' @param pit An optional vector of probability integral transformed values for
#' which the ECDF is to be drawn. For `ppc_loo_pit_ecdf()`. If `NULL`, PIT
Should we also mention the other ppc_loo_* functions that take the pit argument in addition to ppc_loo_pit_ecdf()?
#' @note
#' Note that the default "independent" method is **superseded** by
#' the "correlated" method (Tesso & Vehtari, 2026) which accounts for dependent
#' PIT values.
Because this is documented on the same page as the other function that has the same @note, it shows up twice in the .Rd file. However, it also doesn't show up at all (at least for me) when rendering the doc in RStudio. Maybe the Note field isn't rendered? Do you see that too?
Another option would be to remove both uses of @note and instead use the Plot Descriptions section. I guess the same goes for the loo versions, if applicable.
Description
The current approach in ppc_loo_pit_ecdf and ppc_pit_ecdf assumes independence of LOO-PIT values, which is not valid (Marhunenda et al., 2005). The corresponding graphical test yields an envelope that is too wide, reducing the test's ability to reveal model miscalibration. Tesso & Vehtari (2026, see preprint) propose three testing procedures that can handle any dependent uniform values and provide an updated graphical representation that uses color coding to indicate influential regions or the most influential points of the ECDF. This PR implements the new development by adding the updated approach (method = "correlated") alongside the previous approach (method = "independent").
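For orientation, the role of the independence assumption can be sketched in base R. Under independence of n uniform values, n times the ECDF at a point z follows a Binomial(n, z) distribution, which gives a pointwise band. This is only an illustration: the function name `independent_ecdf_band()` is hypothetical, the band is pointwise rather than the simultaneous envelope used by the package, and it is not the procedure proposed by Tesso & Vehtari.

```r
# Pointwise band for the ECDF of n independent Uniform(0, 1) values.
# Assumption: independence, so n * ECDF(z) ~ Binomial(n, z) at each z.
# Hypothetical helper for illustration only (not part of bayesplot).
independent_ecdf_band <- function(n, z = seq(0, 1, length.out = 101),
                                  prob = 0.95) {
  alpha <- 1 - prob
  data.frame(
    z = z,
    lower = qbinom(alpha / 2, size = n, prob = z) / n,
    upper = qbinom(1 - alpha / 2, size = n, prob = z) / n
  )
}

set.seed(1)
pit <- runif(200)  # stand-in for PIT values; real LOO-PIT values are dependent
band <- independent_ecdf_band(length(pit))

# Compare the empirical CDF of the PIT values with the band at each z.
ecdf_vals <- ecdf(pit)(band$z)
inside <- ecdf_vals >= band$lower & ecdf_vals <= band$upper
```

When the values are dependent, the binomial model in the comment no longer holds, which is the situation the "correlated" method is meant to address.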
TODOs
ppc_loo_pit_ecdf() function in ppc-loo.R
ppc_pit_ecdf() and ppc_pit_ecdf_grouped() function in ppc-distributions.R
method argument
method argument