Electronic Theses and Dissertation Database

Title page for ETD etd-07262006-090929


Type of Document Dissertation
Author Domaleski, Christopher Stephen
Author's Email Address domaleski@usa.net
URN etd-07262006-090929
Title Exploring the Efficacy of Pre-Equating a Large Scale Criterion-Referenced Assessment with Respect to Measurement Equivalence
Degree Ph.D.
Department Educational Policy Studies
Advisory Committee
  • T.C. Oshima, Committee Chair
  • Carolyn Furlow, Committee Member
  • John H. Neel, Committee Member
  • Malina K. Monaco, Committee Member
  • Toshi Kii, Committee Member
  • William L. Curlette, Committee Member
Keywords
  • assessment
  • equating
Date of Defense 2006-04-19
Availability unrestricted
Abstract
This investigation examined the practice of relying on field test item calibrations, obtained before the operational administration of a large-scale assessment, for purposes of equating and scaling. The effectiveness of this method, often termed “pre-equating,” is explored for a statewide, high-stakes assessment in grades three, five, and seven in the content areas of language arts, mathematics, and social studies.

Pre-equated scaling was based on Rasch model item calibrations from an off-grade field test in which the students tested were one grade above the target population. These calibrations were compared with those obtained from post-equating, which used the full statewide population of examinees.
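To make the Rasch-based scaling concrete, here is a minimal sketch (an illustration, not the procedure used in the study) that builds a raw-score-to-theta table from a set of item difficulties by solving for the ability at which the test characteristic curve equals each raw score. Running it once on field-test difficulties and once on operational difficulties yields the two conversion tables that pre-equating and post-equating would produce. The function names and difficulty values are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Rasch (1PL) probability of a correct response at ability theta for difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def theta_for_raw_score(r, difficulties, tol=1e-10, max_iter=100):
    """ML ability estimate for raw score r: solve sum_i P_i(theta) = r by Newton-Raphson."""
    n = len(difficulties)
    if not 0 < r < n:
        raise ValueError("zero and perfect scores have no finite ML estimate")
    # Start at the logit of proportion correct, centred on the mean item difficulty.
    theta = sum(difficulties) / n + math.log(r / (n - r))
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        expected = sum(probs)                      # test characteristic curve at theta
        info = sum(p * (1.0 - p) for p in probs)   # slope of the TCC (test information)
        step = max(-1.0, min(1.0, (r - expected) / info))  # damped Newton step
        theta += step
        if abs(step) < tol:
            break
    return theta

def score_table(difficulties):
    """Raw-score-to-theta conversion for every non-extreme raw score."""
    return {r: theta_for_raw_score(r, difficulties) for r in range(1, len(difficulties))}

field_test_b = [-1.2, -0.5, 0.0, 0.4, 1.1]    # hypothetical pre-equated difficulties
operational_b = [-1.0, -0.6, 0.1, 0.6, 1.0]   # hypothetical post-equated difficulties
pre_table, post_table = score_table(field_test_b), score_table(operational_b)
for r in sorted(pre_table):
    print(r, round(pre_table[r], 3), round(post_table[r], 3))
```

Because the Rasch raw score is a sufficient statistic for theta, a full conversion table can be fixed before the operational administration, which is what makes pre-equating practical; adding a constant to every difficulty shifts every entry in the table by that same constant.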

Item difficulty estimates and Test Characteristic Curves (TCCs) from the two approaches were compared and found to be similar. The Root Mean Square Error (RMSE) between the pre-equated and post-equated theta estimates ranged from .02 to .12. Classification accuracy of the pre-equated approach, judged against post-equating, was generally high: only 3 of the 9 tests examined showed differences in the percent of students classified as passing, and these differences ranged from 1.7 percent to 3 percent.
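The score comparisons summarized above can be sketched as follows. This assumes the RMSE is taken over paired pre-equated and post-equated theta estimates (optionally weighted, for example by the number of examinees at each raw score) and that "passing" means a theta at or above a single cut; the abstract does not state the exact weighting or cut, so both, like the theta values, are illustrative.

```python
import math

def rmse(pre, post, weights=None):
    """Root mean square difference between paired theta estimates.

    weights: optional counts (e.g. examinees at each raw score); unweighted if None.
    """
    if len(pre) != len(post) or not pre:
        raise ValueError("need two non-empty sequences of equal length")
    if weights is None:
        weights = [1.0] * len(pre)
    total = sum(weights)
    squared = sum(w * (x - y) ** 2 for w, x, y in zip(weights, pre, post))
    return math.sqrt(squared / total)

def percent_passing(thetas, cut):
    """Percent of examinees at or above the (hypothetical) cut on the theta scale."""
    return 100.0 * sum(t >= cut for t in thetas) / len(thetas)

def decision_agreement(pre, post, cut):
    """Percent of examinees given the same pass/fail decision under both conversions."""
    same = sum((x >= cut) == (y >= cut) for x, y in zip(pre, post))
    return 100.0 * same / len(pre)

pre_theta = [-1.30, -0.62, -0.05, 0.48, 1.21]   # hypothetical pre-equated estimates
post_theta = [-1.25, -0.60, 0.02, 0.51, 1.18]   # hypothetical post-equated estimates
cut = 0.0                                       # hypothetical cut score
print(rmse(pre_theta, post_theta))
print(percent_passing(pre_theta, cut) - percent_passing(post_theta, cut))
print(decision_agreement(pre_theta, post_theta, cut))
```

A small RMSE can still coexist with a shift in percent passing when the differences fall near the cut, which is why the two summaries are reported separately.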

Measurement equivalence between the field test and the operational assessment was also examined using the Differential Functioning of Items and Tests (DFIT) framework. Overall, about 20 to 40 percent of the items on each assessment exhibited statistically significant Differential Item Functioning (DIF), and Differential Test Functioning (DTF) was significant for 7 of the 9 tests. The magnitude of DTF was positively related to the degree of incongruence between pre-equated and post-equated results.
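A minimal sketch of the DFIT indices for dichotomous items: NCDIF averages the squared gap between an item's two response functions over focal-group abilities, DTF averages the squared gap between the two expected-score curves, and CDIF splits DTF across items so that the CDIF values sum to DTF. It assumes both sets of item parameters are already on a common scale and treats the field-test calibration as the reference group and the operational group as focal, which is an illustrative choice; the parameter values are hypothetical and the DFIT significance tests are not shown.

```python
import math

def p_item(theta, a, b, c=0.0):
    """Logistic item response function; a=1, c=0 gives the Rasch model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def _gap(theta, item_ref, item_focal):
    """Difference between focal- and reference-calibrated probabilities for one item."""
    return p_item(theta, *item_focal) - p_item(theta, *item_ref)

def ncdif(item_ref, item_focal, focal_thetas):
    """Non-compensatory DIF: mean squared item gap over focal examinees."""
    return sum(_gap(t, item_ref, item_focal) ** 2 for t in focal_thetas) / len(focal_thetas)

def dtf(items_ref, items_focal, focal_thetas):
    """Differential test functioning: mean squared gap in expected raw score."""
    total = 0.0
    for t in focal_thetas:
        score_gap = sum(_gap(t, r, f) for r, f in zip(items_ref, items_focal))
        total += score_gap ** 2
    return total / len(focal_thetas)

def cdif(i, items_ref, items_focal, focal_thetas):
    """Compensatory DIF for item i: mean of d_i * D, so the CDIF values sum to DTF."""
    total = 0.0
    for t in focal_thetas:
        d_i = _gap(t, items_ref[i], items_focal[i])
        score_gap = sum(_gap(t, r, f) for r, f in zip(items_ref, items_focal))
        total += d_i * score_gap
    return total / len(focal_thetas)

ref_items = [(1.0, -0.5, 0.0), (1.0, 0.3, 0.0), (1.0, 1.0, 0.0)]    # hypothetical (a, b, c)
focal_items = [(1.0, -0.2, 0.0), (1.0, 0.3, 0.0), (1.0, 0.8, 0.0)]  # hypothetical (a, b, c)
focal_thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]                          # hypothetical abilities
for i in range(len(ref_items)):
    print(i, ncdif(ref_items[i], focal_items[i], focal_thetas),
          cdif(i, ref_items, focal_items, focal_thetas))
print("DTF", dtf(ref_items, focal_items, focal_thetas))
```

Items that drift in opposite directions can offset one another at the test level, which is how a test can contain many items with significant DIF and still show modest DTF.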

Item calibrations, score consistency, and measurement equivalence were also examined for one test calibrated with the one-, two-, and three-parameter logistic models and equated with the TCC method. With this approach, departures from measurement equivalence and score table incongruence were slightly more pronounced.
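Assuming that "the TCC equating method" refers to the Stocking-Lord characteristic-curve criterion, the sketch below finds the linear transformation theta_base = A * theta_new + B that makes the common items' test characteristic curves agree as closely as possible over a grid of theta points. A shrinking grid search stands in for the numerical optimizer an operational program would use; the initial search ranges, the theta grid, the scaling constant, and the item parameters are all assumptions.

```python
import math

def p3(theta, a, b, c, D=1.0):
    """3PL item response function; D is the scaling constant (1.7 is also common)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected raw score at theta."""
    return sum(p3(theta, a, b, c) for a, b, c in items)

def sl_loss(A, B, base_tcc, new_items, grid):
    """Stocking-Lord criterion: squared TCC gap after moving new items onto the base scale."""
    moved = [(a / A, A * b + B, c) for a, b, c in new_items]
    return sum((bt - tcc(t, moved)) ** 2 for bt, t in zip(base_tcc, grid))

def stocking_lord(base_items, new_items, rounds=10, steps=21):
    """Estimate slope A and intercept B by a grid search whose window halves each round."""
    grid = [k / 4.0 for k in range(-16, 17)]          # theta points from -4 to 4
    base_tcc = [tcc(t, base_items) for t in grid]
    a_lo, a_hi, b_lo, b_hi = 0.5, 2.0, -2.0, 2.0      # assumed initial search ranges
    best = (float("inf"), 1.0, 0.0)
    for _ in range(rounds):
        for i in range(steps):
            A = a_lo + (a_hi - a_lo) * i / (steps - 1)
            for j in range(steps):
                B = b_lo + (b_hi - b_lo) * j / (steps - 1)
                loss = sl_loss(A, B, base_tcc, new_items, grid)
                if loss < best[0]:
                    best = (loss, A, B)
        _, A, B = best
        # Halve the search window, centred on the current best point.
        a_half, b_half = (a_hi - a_lo) / 4.0, (b_hi - b_lo) / 4.0
        a_lo, a_hi = max(1e-3, A - a_half), A + a_half
        b_lo, b_hi = B - b_half, B + b_half
    return best[1], best[2]

# Hypothetical common items (a, b, c) on the operational (base) scale.
base_items = [(1.0, -1.0, 0.2), (0.8, 0.0, 0.15), (1.3, 0.7, 0.25), (1.1, 1.5, 0.2)]
# The same items as a separate calibration might report them, simulated with A=1.2, B=-0.3.
new_items = [(a * 1.2, (b + 0.3) / 1.2, c) for a, b, c in base_items]
A_hat, B_hat = stocking_lord(base_items, new_items)
print(f"A = {A_hat:.3f}, B = {B_hat:.3f}")
```

Under the Rasch model every slope is fixed, so A stays at 1 and only the shift B is estimated; with freely estimated slopes and lower asymptotes the criterion has more to reconcile, which is one way calibration differences can carry through to the score table.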

Differences between the field test and operational results were hypothesized to stem from (1) recency of instruction, (2) cognitive growth, and (3) motivational factors. Further research on these factors is recommended.

Files
  • domaleski_christopher_s_200608_phd.pdf (2.43 MB)

