Skip to contents

Fits item parameters using observed human responses, paired LLM responses/predictions for those same subjects, and generated or unlabeled LLM responses. This implements the expected-count objective

Usage

fit_mixed_subjects(
  observed,
  predicted,
  generated,
  lambda = 1,
  n_quad = 31,
  initial_pars = NULL,
  quadrature = NULL,
  common_predicted_weights = TRUE,
  paired_missing = c("match_observed", "allow"),
  slope_lower = 1e-04,
  slope_upper = NULL,
  control = list(maxit = 500),
  ...
)

Arguments

observed

Human response matrix, with rows for subjects and columns for items. Values must be binary when initial_pars is omitted.

predicted

Binary LLM responses (0/1) for the same rows and items as observed. Probabilities are not accepted: fractional values are not a valid likelihood input for the marginal IRT objective and break the PPI correction, so sample binary responses from any probabilities first (e.g. rbinom).

generated

Binary generated or unlabeled LLM responses (0/1) for the same item columns. Probabilities are not accepted (see predicted).

lambda

Power-tuning parameter in [0, 1].

n_quad

Number of standard-normal quadrature nodes.

initial_pars

Optional starting item parameters. If omitted, a 2PL model is fit to observed.

quadrature

Optional quadrature grid with theta and weight columns.

common_predicted_weights

Logical; if TRUE, reuse the observed human posterior weights for predicted.

paired_missing

How to handle missingness when common_predicted_weights = TRUE. The default, "match_observed", requires observed and predicted to have the same missingness pattern so the paired LLM correction is evaluated only where a human label is present. Use "allow" only for explicit sensitivity analyses.

slope_lower

Lower bound for discrimination parameters during optimization. Use NULL for no lower bound.

slope_upper

Upper bound for discrimination parameters during optimization. Use NULL (the default) for no upper bound. Setting a finite bound (e.g. 4) can stabilize the frozen expected-count fit when the LLM parameters differ substantially from the human parameters.

control

Control list passed to stats::optim().

...

Additional arguments passed to fit_2pl() when initial_pars is omitted.

Value

An object of class "mixedsubjects_fit" with fitted item_pars, optimizer details, quadrature summaries, and input settings.

Details

L_human + lambda * (L_generated - L_paired_llm).

By default the paired LLM responses reuse the posterior quadrature weights from the observed human responses. This keeps the paired human and LLM terms on the same latent covariate distribution, which is the closest analog to prediction-powered inference with paired labels.

Examples

set.seed(1)
pars <- data.frame(a = c(1, 1.2, 0.9), d = c(0, -0.5, 0.3))
observed <- simulate_2pl(rnorm(40), pars)
predicted <- observed
generated <- simulate_2pl(rnorm(100), pars)
fit <- fit_mixed_subjects(
  observed, predicted, generated,
  lambda = 0.5, initial_pars = pars, n_quad = 7,
  control = list(maxit = 50)
)
fit$item_pars
#>   item         a           d           b
#> 1    1 1.0503170 -0.08682871  0.08266905
#> 2    2 1.2202867 -0.82420815  0.67542171
#> 3    3 0.8047358  0.20886422 -0.25954384