IRT Modeling Lab

Detection of DIF Using the SIBTEST Procedure

The simultaneous item bias test (SIBTEST) is a nonparametric method for detecting differential item functioning (DIF) and differential test functioning that was developed as an extension of Shealy and Stout's (1993) multidimensional item response theory. In this framework, DIF is conceptualized as a difference in the probability of endorsing a keyed item response. It occurs when individuals in different groups who have the same level of the latent attribute of interest possess different amounts of nuisance abilities that influence responding. For example, in the personality domain, one might compare the responses of job applicants and nonapplicants who had been administered an agreeableness scale. If the two groups had the same distributions of the agreeableness trait, but differed in their levels of impression management, a potential nuisance determinant, then DIF might result.

Features of SIBTEST

SIBTEST can be used to detect bias at either the item or testlet level (a testlet is a bundle of items). SIBTEST conducts DIF analyses using original item response data rather than parameter estimates from a program, such as BILOG, that makes strong assumptions about the processes underlying item responses.

Numerous options allow SIBTEST users to tailor their DIF analyses to the research problem. For example, SIBTEST can identify items that are biased against a particular group of examinees through the f (focal), r (reference), or e (either) options. Moreover, if a subset of items suspected of having DIF can be identified a priori, perhaps by inspection of item content or classical statistics, it is possible to conduct a limited number of planned comparisons rather than multiple single-item DIF studies. Unfortunately, such knowledge is not available in many applications. Consequently, for a scale containing n items, n single-item DIF statistics must be computed, each with a p-value indicating its significance. The observed p-values must then be compared to a critical p-value, such as .05 / n, that is adjusted for the number of comparisons made. If an observed p-value is less than this corrected critical p-value, the null hypothesis of no DIF can be rejected for that item. This method is analogous to the identification of DIF using Lord's chi-square procedure.
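The corrected-critical-value rule can be sketched in a few lines of Python. This is only an illustration of the comparison described above, not part of SIBTEST itself; the function name flag_dif_items and the example p-values are hypothetical.

```python
def flag_dif_items(p_values, family_alpha=0.05):
    """Return the 1-based numbers of items whose p-value is below family_alpha / n."""
    n = len(p_values)
    critical = family_alpha / n  # corrected critical p-value, e.g. .05 / n
    return [i + 1 for i, p in enumerate(p_values) if p < critical]


# Made-up p-values for a four-item scale (illustration only, not real results).
example_p = [0.001, 0.20, 0.004, 0.03]
flagged = flag_dif_items(example_p)
print(flagged)  # critical value is .05 / 4 = .0125, so items 1 and 3 are flagged
```

Note that the fourth item (p = .03) would be flagged at an uncorrected .05 level but not at the corrected level.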

SIBTEST vs. Mantel-Haenszel

In conjunction with the SIB statistics and the corresponding p-values provided in the program output, the SIBTEST program gives the results for the widely used Mantel-Haenszel (MH) DIF detection procedure (Holland & Thayer, 1988). The MH procedure, which is essentially nonparametric, involves a comparison of the log-odds ratio of endorsing keyed responses for the focal and reference groups, computed after partitioning the sample into categories on the basis of number correct scores. MH removes the bias associated with target ability differences in the reference and focal groups by including the studied item in the total test score used for partitioning. SIBTEST, in contrast, handles this bias by implementing a regression correction discussed in detail by Shealy and Stout (1993).
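For readers who want to see what the MH comparison involves, here is a minimal Python sketch of the MH common log-odds ratio. It assumes dichotomous (0/1) responses stored as one list per examinee, and it partitions examinees by number-correct score with the studied item included, as described above. The function name and data layout are assumptions made for illustration; this is not the code used by the SIBTEST program, and it omits the MH chi-square test.

```python
import math
from collections import defaultdict


def mh_log_odds_ratio(ref_responses, foc_responses, item):
    """MH common log-odds ratio for one item (0-based index).

    Each argument is a list of examinees; each examinee is a list of 0/1 scores.
    Returns None when the ratio is undefined (a zero numerator or denominator).
    """
    # One 2x2 table per total score: [ref keyed, ref not keyed, foc keyed, foc not keyed]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for row in ref_responses:
        tables[sum(row)][0 if row[item] == 1 else 1] += 1
    for row in foc_responses:
        tables[sum(row)][2 if row[item] == 1 else 3] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n_k = a + b + c + d
        num += a * d / n_k
        den += b * c / n_k
    if num == 0 or den == 0:
        return None
    return math.log(num / den)


# Tiny made-up data set: every examinee has a total score of 1.
ref = [[1, 0], [1, 0], [1, 0], [0, 1]]
foc = [[1, 0], [0, 1], [0, 1], [0, 1]]
print(mh_log_odds_ratio(ref, foc, item=0))  # positive: item 0 favors the reference group
```

With this sign convention, positive values indicate that the item favors the reference group and negative values that it favors the focal group.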

Summary of the SIBTEST Procedure

SIBTEST detects bias by comparing the responses of examinees in the reference and focal groups who have been allocated to bins using their scores on a "matching subtest" (Stout & Roussos, 1996). The matching subtest is a subset of items that, ideally, are known to be unbiased. In most practical applications, however, the user does not have accurate a priori knowledge regarding bias. Therefore, one has two options. First, the user can rely on a visual inspection of the items or on classical test statistics to identify the matching subtest before proceeding with DIF analyses. Alternatively, if neither visual inspection nor classical statistics is useful for identifying a valid matching subtest, one can conduct an "automatic DIF analysis". In that case, SIBTEST will be run successively for i = 1 to n items, where on a given trial, the ith item is the object of study and the remaining n - 1 items constitute the matching subtest.
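The looping structure of an automatic DIF analysis can be sketched as follows. This is a deliberately simplified illustration: it bins examinees by their score on the remaining n - 1 items and takes a pooled-weighted average of the reference-minus-focal difference in the studied item's mean score. It omits the regression correction of Shealy and Stout (1993) and the standard error needed for a p-value, so it is not the SIBTEST statistic itself. The function names and data layout (one list of 0/1 scores per examinee) are assumptions.

```python
from collections import defaultdict


def uncorrected_beta(ref, foc, studied):
    """Weighted reference-minus-focal difference on one item (0-based index).

    Examinees are binned by their score on all other items (the matching subtest).
    Returns None if no bin contains examinees from both groups.
    """
    n_items = len(ref[0])
    matching = [j for j in range(n_items) if j != studied]
    bins = defaultdict(lambda: {"ref": [], "foc": []})
    for group, rows in (("ref", ref), ("foc", foc)):
        for row in rows:
            score = sum(row[j] for j in matching)
            bins[score][group].append(row[studied])

    # Only bins with both groups represented can contribute a difference.
    usable = [b for b in bins.values() if b["ref"] and b["foc"]]
    total = sum(len(b["ref"]) + len(b["foc"]) for b in usable)
    if total == 0:
        return None

    beta = 0.0
    for b in usable:
        weight = (len(b["ref"]) + len(b["foc"])) / total  # pooled weighting
        mean_ref = sum(b["ref"]) / len(b["ref"])
        mean_foc = sum(b["foc"]) / len(b["foc"])
        beta += weight * (mean_ref - mean_foc)
    return beta


def automatic_dif(ref, foc):
    """Study each item in turn, matching on the remaining n - 1 items."""
    return [uncorrected_beta(ref, foc, i) for i in range(len(ref[0]))]


# Made-up data: identical groups, so every difference is zero.
group = [[1, 1], [1, 0], [0, 1], [0, 0]]
print(automatic_dif(group, group))
```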

The choice of a matching subtest is important for the identification of DIF, but clearly subject to error. If one relies on visual inspection of item content or on classical statistics that confound DIF with impact, then the matching subtest might contain a few biased items. On the other hand, if one conducts an automatic DIF analysis and DIF is present in at least one item of the scale, the matching subtest is certain to be contaminated on all but one analysis. Fortunately, simulation studies have shown that the SIBTEST and MH procedures are tolerant of small to moderate amounts of contamination of the matching criterion (Shealy & Stout, 1993). These studies have found that the Type I error rates are not inflated substantially when the matching subtest contains relatively few biased items, but Type II errors are more likely because the power to detect DIF is reduced.

Using SIBTEST When Pervasive DIF is Present

If an automatic DIF analysis reveals that nearly all n items in a scale are biased, a slight modification of the exploratory SIB procedure can be used in an attempt to identify a valid subset of items for further analyses. For example, if the first iteration of an automatic DIF analysis indicates that 10 of 12 items display DIF, it is likely that the matching subtest contains numerous biased items. Consequently, the item having the largest SIB statistic and the smallest corresponding p-value should be eliminated from the matching subtest and the analysis repeated. Through successive iterations, it is hoped that one can eventually identify a subset of items that are unbiased, so that one can define a valid matching subtest for an additional DIF analysis of the remaining items. However, if too few unbiased items remain (e.g., fewer than half the items in the scale), then it is unwise to proceed.
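One way to organize these successive iterations is sketched below. The sketch assumes a user-supplied function, here called run_dif, that stands in for a single SIBTEST run and returns a (statistic, p-value) pair for a studied item given a matching subtest; that function, the stopping rule based on a corrected critical value, and all names are hypothetical. In practice each run would be carried out with the SIBTEST program.

```python
def purify_matching_subtest(n_items, run_dif, family_alpha=0.05, min_fraction=0.5):
    """Iteratively drop the most strongly flagged item from the matching subtest.

    run_dif(studied, matching) must return (statistic, p_value).
    Returns the retained item indices, or None if too few items remain.
    """
    matching = list(range(n_items))
    critical = family_alpha / n_items
    while True:
        results = {}
        for item in matching:
            others = [j for j in matching if j != item]
            results[item] = run_dif(item, others)
        flagged = [i for i in matching if results[i][1] < critical]
        if not flagged:
            return matching  # no remaining item shows DIF
        worst = min(flagged, key=lambda i: results[i][1])  # smallest p-value
        matching.remove(worst)
        if len(matching) < min_fraction * n_items:
            return None  # too few unbiased items; unwise to proceed


# Stand-in for real SIBTEST runs: items 0 and 1 always look biased (made-up values).
def fake_run_dif(studied, matching):
    made_up_p = {0: 0.0001, 1: 0.001}
    return (1.0, made_up_p.get(studied, 0.5))


result = purify_matching_subtest(6, fake_run_dif)
print(result)  # items 0 and 1 are removed in turn, leaving [2, 3, 4, 5]
```

The items that survive can then serve as the matching subtest for a follow-up DIF analysis of the items that were removed.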

Running SIBTEST

SIBTEST is a commercial software package available for purchase from Assessment Systems. The DOS version of SIBTEST implements nonparametric DIF analysis using two executable files. First, you must run SIBIN.EXE. You will be prompted to answer several questions concerning your response data and the type of analysis you wish to conduct. SIBIN creates an input file containing this information, which is used by the SIBTEST.EXE program. Once an input file has been created, type SIBTEST to perform the DIF analysis.

Because different versions of SIBTEST are in circulation, we will provide only basic tips for responding to the questions asked by SIBIN.EXE. For many applications, using the default options is sufficient. For more detailed analyses, we urge you to consult the most recent SIBTEST manual.

GUIDELINES for SIBIN.EXE

  1. When prompted for the name of an input file for SIBTEST.EXE, enter 0 for the default option. This will save your input specifications in a file called SIB.INPUT.

  2. Enter a two-line title describing your analysis.

  3. Enter the total number of items in the dataset to be analyzed. This value is used for reading the response data.

  4. Enter the name of the reference group "test scores" (i.e., response data).

  5. Enter the name of the focal group "test scores" (i.e., response data).

  6. Enter 1 if there are no spaces between item responses in the data files; enter 2 if the responses are space- or comma-delimited.

  7. Enter 0 to write SIBTEST output to the default file, SIB.OUTPUT.

  8. SIBTEST creates "cells" (groups of respondents) for computing covariances. Enter 2 as the minimum required number of examinees per cell (this is the default value).

  9. You will be asked to enter a probability for guessing correctly. We recommend choosing a value between 0 and .25.

  10. You will be asked to make a decision about the items to include in the "matching" and "assessment" subtests. This can be quite complicated, so for inexperienced users, we recommend entering 1 to conduct single-item DIF analyses, where SIBTEST automatically selects the items. (Alternatively, at this step, you have the option of specifying groups of items (bundles) to be examined on successive runs. This is discussed in more detail in our description of SIBTEST.) Your answer to this question determines what information you will be prompted for subsequently.

  11. Assuming you entered 1 in the previous step, you must now enter the number of items to be included in the DIF study. For our example data, enter 11.

  12. You will be asked to choose the level of detail provided in the output file. For most purposes, the abbreviated output is sufficient, so enter 0.

  13. Enter 0 for the default pooled weighting.

  14. Enter 0 to choose a single type of p-value for all the runs.

  15. You can decide whether SIBTEST will detect DIF against the focal (f), reference (r), or either (e) groups. Unless you have a specific hypothesis about the direction of DIF, enter e for either.

To conduct the DIF analysis, type SIBTEST at a DOS prompt. Enter 0 to read input specifications from the default file, SIB.INPUT, or specify an alternative filename.
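If your response data are held in another program, the separate reference and focal group files requested in steps 4 through 6 might be prepared along the following lines. This sketch assumes one examinee per line and scored 0/1 responses; that layout, the function names, and any file names you pass in are assumptions, so check the SIBTEST manual for the format your version expects.

```python
def format_responses(rows, delimited=False):
    """Turn lists of 0/1 item scores into text lines, one examinee per line.

    delimited=False gives no spaces between responses (option 1 in step 6);
    delimited=True separates responses with a single space (option 2).
    """
    sep = " " if delimited else ""
    return [sep.join(str(int(x)) for x in row) for row in rows]


def write_group_file(path, rows, delimited=False):
    """Write one group's responses to a text file (path is chosen by the user)."""
    with open(path, "w") as handle:
        handle.write("\n".join(format_responses(rows, delimited)) + "\n")


# Made-up responses for two examinees on three items.
print(format_responses([[1, 0, 1], [0, 0, 1]]))
print(format_responses([[1, 0, 1]], delimited=True))
```

Each group's data would be written to its own file with write_group_file, and those file names entered at the corresponding SIBIN prompts.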

Examining SIBTEST Output

The first part of SIBTEST output summarizes the input specifications. Check to make sure that you entered the information correctly.

Next, you will see the table of DIF results. The beta-uni statistic may be viewed as an estimate of the magnitude of DIF (see the manual for discussion). The SIB-uni p-values are located in column 4. The E next to each p-value indicates that we chose to identify DIF against either the reference or focal group. To maintain a family alpha level of .05, we recommend choosing a critical value of .05 divided by the number of items. Thus, in this case, the critical value for DIF identification is .05/11 = .0045. Items 1, 2, 3, 7, and 11 have p-values below this critical value and therefore exhibit DIF.

The results for the nonparametric Mantel-Haenszel method are also presented for comparison in columns 5-7. The p-values in column 6 indicate which items exhibit DIF. Using a critical p-value of .05/11, items 1, 2, 3, and 11 exhibit DIF.


