Fit a mixed-subjects 2PL calibration via marginal maximum likelihood
Source: R/fit.R
Estimates item parameters using the true IRT marginal likelihood for all
three loss terms. Unlike fit_mixed_subjects(), which freezes posterior
quadrature weights at the initial parameter estimates before optimizing,
this function recomputes posterior weights at every gradient evaluation.
This eliminates the gradient asymmetry that causes fit_mixed_subjects() to
converge to false minima at inflated discrimination values when LLM item
parameters differ from human parameters.
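The difference between frozen and recomputed posterior weights can be illustrated with a minimal base-R sketch. It is not the package's implementation: all helper names (quad_grid, log_joint, marginal_loglik, posterior_weights, frozen_bound) are hypothetical, the equally spaced grid is a simplification, and the slope-intercept form P(x = 1 | theta) = plogis(a * theta + d) is an assumption. The sketch shows that an objective built on frozen weights agrees with the marginal log-likelihood only at the parameters where the weights were computed.

```r
# Sketch only: helper names are hypothetical, and the slope-intercept form
# P(x = 1 | theta) = plogis(a * theta + d) is an assumption.

# Equally spaced nodes with normalized standard-normal weights.
quad_grid <- function(n_quad = 7, limit = 4) {
  theta <- seq(-limit, limit, length.out = n_quad)
  weight <- dnorm(theta)
  data.frame(theta = theta, weight = weight / sum(weight))
}

# Log joint of each response pattern and node: n_subjects x n_quad.
log_joint <- function(x, pars, grid) {
  eta <- outer(pars$a, grid$theta) + pars$d   # n_items x n_quad
  ll <- x %*% plogis(eta, log.p = TRUE) + (1 - x) %*% plogis(-eta, log.p = TRUE)
  sweep(ll, 2, log(grid$weight), "+")
}

# Marginal log-likelihood: theta is summed out over the grid.
marginal_loglik <- function(x, pars, grid) {
  sum(log(rowSums(exp(log_joint(x, pars, grid)))))
}

# Posterior node weights at the supplied parameters (rows sum to 1).
posterior_weights <- function(x, pars, grid) {
  joint <- exp(log_joint(x, pars, grid))
  joint / rowSums(joint)
}

# Lower bound on the marginal log-likelihood built from frozen weights.
frozen_bound <- function(x, pars, grid, w_frozen) {
  sum(w_frozen * (log_joint(x, pars, grid) - log(w_frozen)))
}

set.seed(1)
pars0 <- data.frame(a = c(1, 1.2, 0.9), d = c(0, -0.5, 0.3))
theta <- rnorm(40)
prob <- plogis(outer(theta, pars0$a) + rep(pars0$d, each = 40))
x <- matrix(rbinom(length(prob), 1, prob), nrow = 40)

grid <- quad_grid(7)
w0 <- posterior_weights(x, pars0, grid)   # frozen at the starting values
pars1 <- transform(pars0, a = 2 * a)      # move the discriminations

# The bound is tight where the weights were computed ...
gap_at_start <- marginal_loglik(x, pars0, grid) - frozen_bound(x, pars0, grid, w0)
# ... and strictly below the marginal log-likelihood after the parameters move.
gap_after_move <- marginal_loglik(x, pars1, grid) - frozen_bound(x, pars1, grid, w0)
```

Recomputing the weights at every evaluation, as this function is described as doing, corresponds to optimizing marginal_loglik() directly rather than frozen_bound() with fixed w0.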
Arguments
- observed
Human response matrix, with rows for subjects and columns for items. Values must be binary when initial_pars is omitted.
- predicted
Binary LLM responses (0/1) for the same rows and items as observed. Probabilities are not accepted: fractional values are not a valid likelihood input for the marginal IRT objective and break the PPI correction, so sample binary responses from any probabilities first (e.g. with rbinom()).
- generated
Binary generated or unlabeled LLM responses (0/1) for the same item columns. Probabilities are not accepted (see predicted).
- lambda
Power-tuning parameter in [0, 1].
- n_quad
Number of standard-normal quadrature nodes.
- initial_pars
Optional starting item parameters. If omitted, a 2PL model is fit to observed.
- quadrature
Optional quadrature grid with theta and weight columns.
- mml_pred_weights
How to compute posteriors for the paired predicted term. "own" uses posteriors from the predicted responses; "human" uses posteriors from the observed human responses. See Details.
- slope_lower
Lower bound for discrimination parameters during optimization. Use NULL for no lower bound.
- slope_upper
Upper bound on discrimination parameters. Unlike fit_mixed_subjects(), this function should not require capping for well-posed problems because the true marginal objective has no false minimum at large discrimination.
- control
Control list passed to stats::optim().
- ...
Additional arguments passed to fit_2pl() when initial_pars is omitted.
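Because predicted and generated must be binary, LLM outputs that arrive as probabilities need one Bernoulli draw per cell before fitting. A minimal sketch of that conversion follows; the matrix llm_prob and its dimensions are made up for illustration.

```r
set.seed(42)

# Hypothetical LLM output: probability of a correct response,
# 5 subjects (rows) by 3 items (columns).
llm_prob <- matrix(runif(15, min = 0.05, max = 0.95), nrow = 5,
                   dimnames = list(NULL, paste0("item", 1:3)))

# One Bernoulli draw per cell; `[]` keeps the dimensions and column names.
predicted <- llm_prob
predicted[] <- rbinom(length(llm_prob), size = 1, prob = llm_prob)

# Check that the result is strictly 0/1 and has the original shape.
stopifnot(all(predicted %in% c(0, 1)),
          identical(dim(predicted), dim(llm_prob)))
```

The sampled matrix can then be supplied as predicted (or generated), provided its rows and columns line up with observed as described above.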
Value
An object of class "mixedsubjects_fit" with the same structure as
fit_mixed_subjects(). For scalar lambda fits, the quadrature
summaries store posteriors at the converged parameters, and
stats::vcov() dispatches automatically to
vcov_mixed_subjects_mml() to compute the Louis-corrected marginal
sandwich covariance. Calling vcov_mixed_subjects() directly bypasses
the Louis correction. For vector lambda fits, the summaries store
the frozen posteriors used during optimization, and stats::vcov()
dispatches to vcov_mixed_subjects() (EM bread) for consistency with the
frozen Q-function objective.
Details
Why it matters for lambda selection. With the frozen expected-count
implementation, the gradient of L_pred uses concentrated human posteriors
while L_gen uses diffuse LLM posteriors, making
grad(L_pred) >> grad(L_gen) and systematically pushing discriminations
upward at any lambda > 0. In the marginal-MML formulation all three terms
use their own current-parameter posteriors, so the asymmetry is absent at the
true optimum. As a result, tune_lambda_ability_risk() selects lambda > 0
whenever the LLM predictions are genuinely informative (e.g. predicted = observed), rather than collapsing to lambda = 0 for all misaligned LLMs.
mml_pred_weights.
- "own" (default)
L_pred uses posteriors computed from the predicted response matrix at the current parameter values. All three terms are true marginal likelihoods; objective and gradient are internally consistent. Recommended for most applications and required for vcov_mixed_subjects_mml() to produce the fully correct Louis-formula bread.
- "human"
L_pred uses posteriors computed from the observed (human) response matrix, frozen at initial_pars. This is a fixed-nuisance Q-function: the predicted term is treated as a frozen expected-count lower bound rather than a true marginal likelihood. Objective and gradient are mutually consistent (both use the same frozen posteriors) so L-BFGS-B converges correctly. Useful when strong ability-level pairing is needed. Note that vcov_mixed_subjects_mml() applies Louis' formula to the stored fixed posteriors, which is approximately correct when initial_pars is close to conv_pars.
Per-item lambda (vector lambda). When lambda is a length-n_items
vector rather than a scalar, fit_mixed_subjects_mml() switches to a
frozen Q-function objective: expected counts are computed once from
initial_pars and held fixed during L-BFGS-B, with item j's counts
weighted by lambda[j]. This is a consistent (objective, gradient) pair,
but it is not the full marginal-MML objective; it is a frozen expected-count
approximation analogous to fit_mixed_subjects(). Per-item lambda values
obtained from tune_lambda_ability_risk_item() assign lambda_j near 0 to
items where the LLM correction is harmful, which limits the effect of the
frozen-posterior gradient asymmetry. Document per-item lambda results as approximate.
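The per-item weighting of expected counts can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the posterior weight matrix w is a random placeholder standing in for posteriors computed once at initial_pars, all object names are hypothetical, and the way the weighted counts enter the three loss terms is not shown.

```r
set.seed(3)
n_subjects <- 6
n_items <- 3
n_quad <- 5

# Binary responses: subjects in rows, items in columns.
x <- matrix(rbinom(n_subjects * n_items, 1, 0.5), nrow = n_subjects)

# Placeholder frozen posterior weights (each row sums to 1).
w <- matrix(runif(n_subjects * n_quad), nrow = n_subjects)
w <- w / rowSums(w)

# Expected number of subjects at each node (identical for every item) and
# expected number of correct responses per item and node.
n_k  <- matrix(colSums(w), nrow = n_items, ncol = n_quad, byrow = TRUE)
r_jk <- crossprod(x, w)                  # n_items x n_quad

# Per-item lambda: row j of each count matrix is scaled by lambda[j].
lambda <- c(0.8, 0, 0.3)
n_weighted <- lambda * n_k
r_weighted <- lambda * r_jk
```

An item with lambda[j] = 0 contributes no weighted counts, which is how values near 0 switch the correction off for that item in this sketch.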
Examples
set.seed(1)
pars <- data.frame(a = c(1, 1.2, 0.9), d = c(0, -0.5, 0.3))
observed <- simulate_2pl(rnorm(40), pars)
generated <- simulate_2pl(rnorm(100), pars)
# predicted is set to observed here, i.e. fully informative LLM predictions.
fit <- fit_mixed_subjects_mml(
  observed, observed, generated,
  lambda = 0.5, initial_pars = pars, n_quad = 7,
  control = list(maxit = 100)
)
fit$item_pars
#>   item        a          d          b
#> 1    1 1.486806 -0.2247580  0.1511684
#> 2    2 1.294408 -0.9772844  0.7550047
#> 3    3 0.480298  0.1161510 -0.2418311