The general idea is to have a simple "model" program that contains some buggy statements. There are multiple execution paths through the program, and some of the paths that execute buggy statements lead to failure. The details of the model are expressed in a C header (.h) file, which can be generated from a high-level description using some Prolog code. The C code uses the model to simulate SBFL and evaluate different approaches, notably different set similarity measures.
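To make the notion of a model concrete, here is a minimal sketch of what such a header might contain. It is not the project's actual .h format: the names Path, PATHS, BUGGY, N_STMTS, N_PATHS, model_is_consistent and path_executes are all hypothetical, and the four-statement program is invented for illustration. The sketch assumes that a path can only fail if it executes at least one buggy statement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model layout, NOT the project's real header format. */
#define N_STMTS 4   /* statements in the toy program */
#define N_PATHS 4   /* distinct execution paths in the model */

typedef struct {
    unsigned stmts; /* bit i set: the path executes statement i */
    int fails;      /* 1 if a test following this path fails */
} Path;

/* Assumption for this example: statement 2 is the only buggy statement. */
static const unsigned BUGGY = 1u << 2;

static const Path PATHS[N_PATHS] = {
    { 0xBu, 0 },  /* statements 0, 1, 3: misses the bug, passes */
    { 0xDu, 1 },  /* statements 0, 2, 3: executes the bug, fails */
    { 0xDu, 0 },  /* statements 0, 2, 3: executes the bug, still passes */
    { 0x9u, 0 },  /* statements 0, 3: misses the bug, passes */
};

/* Returns 1 if the path executes the given statement, 0 otherwise. */
static int path_executes(Path p, int stmt)
{
    assert(stmt >= 0 && stmt < N_STMTS);
    return (int)((p.stmts >> stmt) & 1u);
}

/* Returns 1 if every failing path executes at least one buggy statement. */
static int model_is_consistent(const Path *paths, size_t n, unsigned buggy)
{
    assert(paths != NULL);
    for (size_t i = 0; i < n; i++) {
        if (paths[i].fails && (paths[i].stmts & buggy) == 0)
            return 0;
    }
    return 1;
}
```

A bitmask per path keeps the model compact for small programs; a model with more statements than bits in an unsigned would need an array per path instead.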
The idea of SBFL is that we have a collection of test cases (corresponding to a multiset of execution paths), some of which fail. Set similarity measures are used to compare the set of failed tests with the set of tests in which each statement is executed. The resulting scores are used to rank the statements according to their "likelihood" of being buggy. Performance is measured by where the bugs appear in the ranking (e.g., the percentage of code that must be examined in order to find the top-ranked bug). Performance depends on the set of test cases, so we pick a fixed number of tests T and generate multiple (e.g., 10000000) multisets of T execution paths to compute average performance. For small models and small T we can use every possible multiset, but generally we need this sampling approach. The code supports a large number of similarity measures, and adding more is very easy. It also allows various other parameters to be adjusted and has been used for many experiments.
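The ranking step described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the names Spectrum, Measure, build_spectra, jaccard, ochiai_sq and rank_percent are hypothetical, N_STMTS is an invented toy size, and ties are resolved by assuming half of the tied statements are examined before the bug, which may differ from what the real code does. Jaccard is shown alongside the square of the Ochiai measure, which produces the same ranking as Ochiai without needing the math library.

```c
#include <assert.h>
#include <stddef.h>

#define N_STMTS 4   /* hypothetical: statements in the toy program */

/* Per-statement counts over one test suite. */
typedef struct {
    int ef;  /* failed tests that execute the statement */
    int ep;  /* passed tests that execute the statement */
    int nf;  /* failed tests that do not execute it */
    int np;  /* passed tests that do not execute it */
} Spectrum;

typedef double (*Measure)(Spectrum);

/* Jaccard: |failed AND executed| / |failed OR executed|. */
static double jaccard(Spectrum s)
{
    int denom = s.ef + s.nf + s.ep;
    return denom == 0 ? 0.0 : (double)s.ef / denom;
}

/* Squared Ochiai: ef^2 / (|failed| * |executed|); same ranking as Ochiai. */
static double ochiai_sq(Spectrum s)
{
    double denom = (double)(s.ef + s.nf) * (double)(s.ef + s.ep);
    return denom == 0.0 ? 0.0 : (double)s.ef * s.ef / denom;
}

/* cov[t][s] is 1 if test t executes statement s; failed[t] is 1 if t fails. */
static void build_spectra(size_t n_tests, const int cov[][N_STMTS],
                          const int failed[], Spectrum out[N_STMTS])
{
    assert(n_tests > 0);
    for (int s = 0; s < N_STMTS; s++) {
        Spectrum sp = { 0, 0, 0, 0 };
        for (size_t t = 0; t < n_tests; t++) {
            if (cov[t][s]) { if (failed[t]) sp.ef++; else sp.ep++; }
            else           { if (failed[t]) sp.nf++; else sp.np++; }
        }
        out[s] = sp;
    }
}

/* Percentage of statements examined to reach statement `bug`.
   Assumption: half of the statements tied with the bug are examined first. */
static double rank_percent(const Spectrum sp[N_STMTS], Measure m, int bug)
{
    assert(bug >= 0 && bug < N_STMTS);
    double bug_score = m(sp[bug]);
    int higher = 0, tied = 0;
    for (int s = 0; s < N_STMTS; s++) {
        if (s == bug) continue;
        double score = m(sp[s]);
        if (score > bug_score) higher++;
        else if (score == bug_score) tied++;
    }
    return 100.0 * (higher + tied / 2.0 + 1.0) / N_STMTS;
}
```

Passing the measure as a function pointer is one simple way to make new measures easy to add: each is a small function of the four counts. Averaging rank_percent over many sampled test multisets would then give the performance figure described above.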
More details are available via: