Cluster Analysis for Differential Item Functioning
Thai, Alvie
Date
2025-05-19
Abstract
One of the major challenges in education is accurately quantifying a student's knowledge and skills. Since we cannot directly measure a student's true intelligence, we rely on test performance, which serves as an imperfect representation of their abilities. The same problem arises in many statistical applications: each individual in a population has an underlying ability or trait that cannot be observed directly and must be inferred from proxy variables. These proxies are often contaminated, so they provide only a noisy approximation of the true latent variable.
This project focuses on techniques for recovering latent variables from noisy data. In this context, "recovery" refers to estimating the latent variable using indirect observations. Assuming a linear relationship between the latent trait and the observed proxy variables, we can estimate model parameters and subsequently recover the values of the latent variables.
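As a rough illustration of what recovery under a linear model can look like, the sketch below simulates a one-factor model in which each observed score is a loading times the latent trait plus noise, and then estimates the trait from the first principal component. The loadings, the noise level, the sample size, and the helper name `recover_latent` are assumptions made for the example; this is not necessarily the estimator used in the project.

```python
import numpy as np

def recover_latent(X):
    """Estimate one latent score per row from the first principal component.

    Hypothetical helper for illustration only.
    """
    Xc = X - X.mean(axis=0)                  # center each observed variable
    # First right singular vector approximates the direction of the loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0]
    if v.sum() < 0:                          # fix the sign so higher score = higher trait
        v = -v
    scores = Xc @ v
    return (scores - scores.mean()) / scores.std()   # standardized estimate

# Simulated data (all values below are assumed, not taken from the project).
rng = np.random.default_rng(0)
n, p = 500, 5
theta = rng.normal(size=n)                          # true latent trait
loadings = np.array([0.9, 0.8, 0.7, 0.8, 0.6])      # assumed linear loadings
X = theta[:, None] * loadings + 0.5 * rng.normal(size=(n, p))

theta_hat = recover_latent(X)
corr = np.corrcoef(theta, theta_hat)[0, 1]          # agreement with the true trait
```

In this simulation the recovered scores can be compared with the true trait only because the data are synthetic; with real test data the latent variable is never observed, which is why the model assumptions matter.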
Specifically, we will examine statistical approaches to latent variable recovery when the test contains items that exhibit differential item functioning (DIF). This means that certain test items do not solely measure the intended knowledge or ability but are also biased toward specific groups. The objective is to develop methods that detect the presence of DIF and adjust for it, allowing for a more accurate estimation of the underlying abilities.
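To make the idea of DIF concrete, the following sketch adds a group-dependent shift to one simulated item and then screens each item by regressing it on the mean of the remaining items (a proxy for ability) together with a group indicator. This is a simple regression-based screen chosen for illustration, not the cluster-based approach named in the title; the shift size, loadings, and the helper name `dif_screen` are assumptions.

```python
import numpy as np

def dif_screen(X, group):
    """Return the group coefficient for each item, conditional on the rest score.

    Hypothetical helper for illustration only.
    """
    n, p = X.shape
    coefs = np.empty(p)
    for j in range(p):
        rest = np.delete(X, j, axis=1).mean(axis=1)      # ability proxy without item j
        A = np.column_stack([np.ones(n), rest, group])   # intercept, rest score, group
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        coefs[j] = beta[2]                               # effect of group on item j
    return coefs

# Simulated data (all values below are assumed).
rng = np.random.default_rng(1)
n = 2000
theta = rng.normal(size=n)                               # latent ability
group = rng.integers(0, 2, size=n).astype(float)         # group membership, independent of ability
loadings = np.array([0.9, 0.8, 0.7, 0.8, 0.6])
X = theta[:, None] * loadings + 0.5 * rng.normal(size=(n, 5))
X[:, 2] += 1.0 * group                                   # item 2 favors one group (DIF)

dif = dif_screen(X, group)
flagged = int(np.argmax(np.abs(dif)))                    # item with the largest group effect
```

In this setup the items without DIF also pick up small nonzero group coefficients, because their rest score includes the biased item. That contamination is one reason detection alone is not enough and an adjustment step is needed.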
To illustrate these methods, we will use the Holzinger-Swineford dataset, a well-known dataset in psychometrics used to analyze cognitive abilities across multiple domains. This dataset includes 88 observations with scores in five areas: Mechanical Comprehension, Verbal or Visual Comprehension, Algebra Operations, Analytical Operations, and Statistical Reasoning. By applying a linear contamination model, we aim to recover each student's latent ability while accounting for DIF.