simibd V1.0
Last Modification: December 7, 1995


README.FIRST


INTRODUCTION

This directory and the files within contain the software implementation of the algorithms described in Davis et al., "Nonparametric simulation-based statistics for detecting linkage in general pedigrees," Am J Hum Genet 57(4):A190.

The software package is divided into two distinct executables. The executable "simibd" performs the actual statistical calculations and directs the second executable, "xslink", to perform the simulations necessary to describe the simulated null distribution. "xslink" is a version of "fastslink", which is further described in the README.XSLINK file.

REFERENCES

Please use the following four references when using results from these programs.

The algorithms used in simibd are based on:

  1) Davis S, Schroder M, Goldin LR, and Weeks DE, "Nonparametric 
     simulation-based statistics for detecting linkage in general
     pedigrees," Am J Hum Genet, 57(4): A190.

Also, please reference the following three papers that deal with the SLINK code:

SLINK implements a simulation algorithm developed by Jurg Ott and described in:

  2) Ott J (1989) Computer-simulation methods in human linkage
     analysis.  Proc Natl Acad Sci USA 86:4175-4178

The algorithm was implemented in the original SLINK computer program package by Weeks, Ott, and Lathrop:

  3) Weeks DE, Ott J, Lathrop GM (1990) SLINK: a general simulation 
     program for linkage analysis. Am J Hum Genet 47:A204 (abstr)

The SLINK simulation program has been modified by Schaffer and Weeks to use the algorithms developed by Cottingham et al.:

  4) Cottingham Jr RW, Idury RM, Schaffer AA (1993) Faster sequential
     genetic linkage computations. Am J Hum Genet 53:252-263

Please cite references 1-4 if you use this package. Thank you.

COMPILATION

To use the programs included here, you must first compile them for your machine. We recommend reading README.XSLINK before doing so, because it contains compilation instructions specific to "xslink". However, simply issuing the command:

        make

will produce the executables "simibd" and "xslink", which together perform the SimIBD and SimISO calculations (see reference 1). Note that if you have already built a version of the program, you must issue the command:

        make cleanall

before attempting to make a new version.

In our experience, producing optimized code gives a significant increase in speed while still producing correct answers. If you are unfamiliar with producing optimized code when compiling, see your system administrator for assistance. We recommend using gcc with at least one level of optimization, which is the default setting when compiling with make. The default compiler and options can be changed by editing the "Makefile" in this directory.

USE

To use simibd, you need a pedigree file and a locus file in LINKAGE format. Then issue the command:

        simibd <pedfile> <locfile>

where <pedfile> is the name of the pedigree file and <locfile> is the name of the locus or data file. You will be prompted for several pieces of information, including the family weighting function, the total number of replicates (this number determines the accuracy of the resulting p-value), the number of replicates per xslink run (this number allows the run to be broken up into smaller pieces, thereby reducing the amount of disk space necessary for the calculation), the number of the trait locus, the value of the trait signifying affection, and the marker locus to be analyzed.

We recommend using the weighting function 1/sqrt(p), where p is the population frequency of a given allele. The number of replicates depends on the level of accuracy desired for the p-value and, to a lesser extent, on the number of affected individuals in the pedigree. The number of replicates per xslink run depends on the amount of disk space that you wish to devote to storing the replicates generated by xslink. As a starting value, try 10% of the total number of replicates (e.g., if you are using 1000 replicates, try 100 replicates per xslink run).
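
As a rough illustration of two of the quantities discussed above, the following Python sketch computes the 1/sqrt(p) weight for an allele frequency and splits a total replicate count into per-run batches. It is not part of simibd. The function names are hypothetical, and the handling of a final partial batch is an assumption, since this file does not say what simibd does when the total is not a multiple of the per-run count.

```python
import math


def allele_weight(p):
    """Return the weight 1/sqrt(p) for an allele of population frequency p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("allele frequency must be in (0, 1]")
    return 1.0 / math.sqrt(p)


def split_replicates(total, per_run):
    """Split `total` replicates into batches of at most `per_run` each.

    Assumption: any remainder is placed in one final, smaller batch.
    """
    if total <= 0 or per_run <= 0:
        raise ValueError("replicate counts must be positive")
    full_runs, remainder = divmod(total, per_run)
    batches = [per_run] * full_runs
    if remainder:
        batches.append(remainder)
    return batches


if __name__ == "__main__":
    # Rarer alleles receive larger weights under 1/sqrt(p).
    for freq in (0.5, 0.25, 0.01):
        print(freq, allele_weight(freq))
    # 1000 replicates at 100 per run (the suggested 10% starting value).
    print(split_replicates(1000, 100))
```

With 1000 total replicates and 100 per run, the sketch yields ten batches of 100, so only 100 replicates' worth of simulated pedigrees would need to be on disk at any one time.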

If you would like to compute the SimISO or SimAPM statistics in addition to the SimIBD statistic, issue the same command as above, but with the optional "-a" or "-u" flag. For example, to get the SimAPM result, issue the command:

        simibd -a <pedfile> <locfile>

The command

        simibd -au <pedfile> <locfile>

will generate results for SimIBD, SimISO, and SimAPM.

Simibd attempts to keep you up to date about its progress and will estimate the time needed to complete the current task. When the run is complete, you will have a great deal of information available. For your convenience, the brief output is contained in the file "simibd.out". This contains only the summary values of the statistics and the associated p-values for each family. Other output files are available with more detailed results, including histograms of the data.

The output files are summarized below:

FILE                      CONTENTS

simibd.out                Brief output from all statistics run
                          & long output from the SimISO calculations
simibd.aff.out            Long output from the SimIBD calculations
simibd.un.out             Long output from the Unaffected calculations
apm.out                   Long output from the SimAPM calculations

Temporary files used for communication with xslink are:

FILE                      CONTENTS

simped.dat                Pedigree file for xslink to use
simdata.dat               Locus file for xslink to use
pedfile.dat               Simulated replicates generated by xslink

NUMBER OF REPLICATES TO SIMULATE

The accuracy of the p-value obtained is a function of the number of replicates simulated to produce the null distribution. For example, if the p-value is required to be accurate to three decimal places, then at least 1000 replicates must be simulated. However, it may be beneficial to use a larger number of replicates when the pedigrees being simulated have a large number of affected individuals, because covering the large sample space well enough to produce a reliable null distribution may take more replicates than are needed for p-value accuracy alone. In practice, if the p-value varies considerably when the same data are run several times, increase the number of replicates simulated.
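
The relationship between replicate count and p-value accuracy can be sketched in Python as follows. This is an illustration only, not simibd's code: it assumes the p-value is the simple fraction of simulated statistics at least as large as the observed one (simibd's exact estimator is not described in this file), and all function names are hypothetical. The standard-error formula is the usual binomial approximation for a proportion estimated from independent replicates.

```python
import math


def empirical_p_value(observed, simulated):
    """Fraction of simulated statistics that are >= the observed statistic.

    Assumption: a plain k/N estimator over N simulated replicates.
    """
    n = len(simulated)
    if n == 0:
        raise ValueError("at least one simulated replicate is required")
    exceed = sum(1 for s in simulated if s >= observed)
    return exceed / n


def p_value_resolution(n_replicates):
    """Smallest step by which a k/N p-value can change: 1/N."""
    if n_replicates <= 0:
        raise ValueError("replicate count must be positive")
    return 1.0 / n_replicates


def p_value_std_error(p, n_replicates):
    """Binomial standard error of a p-value estimated from N replicates."""
    return math.sqrt(p * (1.0 - p) / n_replicates)


if __name__ == "__main__":
    # Toy null distribution of four simulated statistics.
    print(empirical_p_value(2.5, [1.0, 2.0, 3.0, 4.0]))
    # Resolution and run-to-run spread for several replicate counts.
    for n in (100, 1000, 10000):
        print(n, p_value_resolution(n), p_value_std_error(0.05, n))
```

Under these assumptions, 1000 replicates give a resolution of 0.001, which matches the three-decimal-place example above, while the standard error gives a feel for how much the p-value may differ between repeated runs on the same data.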

QUESTIONS AND COMMENTS

If you have any comments about how to improve this code, please address them to:

      Sean Davis
      University of Pittsburgh
      Department of Human Genetics
      A300 Crabtree Hall
      130 DeSoto Street
      Pittsburgh, PA 15213
      davis@moriarty.hgen.pitt.edu
            
               or

      Daniel E. Weeks
      University of Pittsburgh
      Department of Human Genetics
      A310 Crabtree Hall
      130 DeSoto Street
      Pittsburgh, PA 15213
      daniel.weeks@well.ox.ac.uk