simibd v 1.0
Last mod: January 5, 1996


README.XSLINK


INTRODUCTION

In order to perform the simulations necessary for constructing the null distribution, SimIBD calls xslink, a modified version of SLINK described in the paper "Faster Sequential Genetic Linkage Computations" (American Journal of Human Genetics, 53 (1993), pp. 252-263).

Compiling the code here produces a file called xslink, which is the modified version of SLINK described above.

SOURCE CODE ORGANIZATION

The code is now broken up into .h files which have just definitions and declarations, and .c files that have just C procedures. Each file opens with a comment explaining what role that file plays. Here we summarize which files are needed for which programs:

File Name
commondefs.h
sldefs.h
moddefs.h
slautomodified.c
commongetvect.c
commonnuclear.c
iostuff.c
slgetvect.c
xslink.c
slinputcode.c
sloldnuclear.c
oldsegup.c
slsexmodified.c

Most of the modified code is in the following files:

    moddefs.h           definitions and declarations associated
                        with the modified code.

    slautomodified.c    contains the fast, but memory intensive 
                        version of the new code.

    slsexmodified.c     contains the sexlinked version of
                        slautomodified.c

    oldsegup.c          contains the old (p2c version of original
                        programs) versions of segup() and segsexup(),
                        which are needed for handling mutation data.

We have changed seg() in the original code so that the modified routines are called only for data without mutation. For data with mutation, the old routines, which have not been modified yet, are called for compatibility. These are the two routines oldsegup() and oldsegsexup(), which are the same as the old segup() and segsexup().

COMPILATION

Though typing make at the shell prompt will make all the files necessary to run SimIBD, there may be situations in which it is desirable to compile xslink only. Here are the instructions for doing so.

To compile xslink by itself, simply type:

        make xslink

Please read the discussion below about constant definitions before trying to compile.

The Makefile we are distributing uses the gcc compiler distributed by the Free Software Foundation. If gcc is not available to you, or you would like to use the cc compiler instead, add the tag CC=cc when you issue the make command; this will override the setting in the Makefile. For example:

        make xslink CC=cc

In general, we recommend using gcc instead of cc, because gcc produces faster machine code.

If you are unfamiliar with these concepts and want help, see your system administrator for information about how your particular system is organized.

Constant definitions - VERY IMPORTANT!!!

There are 4 constants that are defined in commondefs.h and 2 constants that are defined in moddefs.h that you will want to set before compiling. This means that you will edit the files to put in the appropriate numbers and then compile.

The constants in commondefs.h are:

        maxhap
        maxclasssize
        maxneed
        maxprobclass

maxhap and maxneed have been in the programs all along; the other two are new.

The constants in moddefs.h are:

        AUTOSOMAL_RUN

The reasons to set the constants correctly are:

  1. If the constants are too low, you may get the wrong answer.
  2. If the constants are too high, you may run out of space unnecessarily.
  3. If the constants are too high, it slows down the program unnecessarily.

So, set the constants!

MAXHAP is the maximum number of possible joint haplotypes. It is the product of the number of alleles at each locus in the analysis. For example, if you want to do a 3-locus SLINK run in which the 3 loci have 2, 3, and 5 alleles respectively, set maxhap to 30 (which is 2x3x5).
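
As a rough illustration, the product can be computed with a small C helper. This is only a sketch: compute_maxhap() is a hypothetical name and is not part of the xslink sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of xslink): returns the product of the
   allele counts over all loci, i.e. the value to use for maxhap. */
long compute_maxhap(const int *alleles, size_t nloci)
{
    long product = 1;
    size_t i;

    assert(alleles != NULL && nloci > 0);
    for (i = 0; i < nloci; i++) {
        assert(alleles[i] > 0);   /* every locus needs at least one allele */
        product *= alleles[i];
    }
    return product;
}
```

For the example above, passing the allele counts 2, 3, and 5 yields 30.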

MAXCLASSSIZE is the maximum size of an isozygote class for one individual. It should be set to 2^(k-1) where k is the number of loci. In numbers:

        Use  2 for 2 loci
        Use  4 for 3 loci
        Use  8 for 4 loci
        Use 16 for 5 loci
        and so on.
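
The same rule can be written as a one-line C function. This is a sketch only; compute_maxclasssize() is a hypothetical name, and the upper bound on the number of loci is an arbitrary guard against overflow rather than a limit taken from the programs.

```c
#include <assert.h>

/* Hypothetical helper (not part of xslink): returns 2^(k-1), the value
   to use for maxclasssize when there are k loci. */
long compute_maxclasssize(int nloci)
{
    assert(nloci >= 1 && nloci <= 31);   /* arbitrary guard, see above */
    return 1L << (nloci - 1);
}
```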

MAXNEED is the number of different recombination probabilities needed. Let k>=3 be the number of loci. The formula to compute maxneed is:

        maxneed = C(k,0)*1^2 + C(k,1)*1^2 + C(k,2)*2^2 + C(k,3)*4^2
                        + C(k,4)*8^2 + C(k,5)*16^2
                        + ... + C(k,k)*maxclasssize*maxclasssize

The notation C(k,i) stands for the binomial coefficient k choose i, that is, the number of ways of choosing i objects out of k when order doesn't matter and repetition is not allowed. Some important values for maxneed are:

        maxneed =   7 for 2 loci (use this value for LODSCORE)
        maxneed =  32 for 3 loci
        maxneed = 157 for 4 loci
        maxneed = 782 for 5 loci
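
For other numbers of loci, the sum can be evaluated directly. The following C sketch does so term by term; binomial() and compute_maxneed() are hypothetical names that do not appear in the xslink sources, and the bound of 10 loci is an arbitrary guard against overflow.

```c
#include <assert.h>

/* k choose i, built up multiplicatively; after step j the running value
   equals C(k-i+j, j), so every division is exact. */
static long binomial(int k, int i)
{
    long c = 1;
    int j;

    for (j = 1; j <= i; j++)
        c = c * (k - i + j) / j;
    return c;
}

/* Hypothetical helper (not part of xslink): evaluates
   C(k,0)*1^2 + C(k,1)*1^2 + C(k,2)*2^2 + ... + C(k,k)*(2^(k-1))^2. */
long compute_maxneed(int nloci)
{
    long total = 1;       /* the C(k,0)*1^2 term */
    long classsize = 1;   /* 2^(i-1), starting at i = 1 */
    int i;

    assert(nloci >= 1 && nloci <= 10);   /* arbitrary guard, see above */
    for (i = 1; i <= nloci; i++) {
        total += binomial(nloci, i) * classsize * classsize;
        classsize *= 2;
    }
    return total;
}
```

The function reproduces the values tabulated above for 2 through 5 loci.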

MAXPROBCLASS. The formula for maxprobclass is similar to the one for maxneed. It is:

        maxprobclass = C(k,0)*1 + C(k,1)*1 + C(k,2)*2 + C(k,3)*4
                        + C(k,4)*8 + ... + C(k,k)*maxclasssize

Some simple values are:

        maxprobclass =  14 for 3 loci
        maxprobclass =  41 for 4 loci
        maxprobclass = 122 for 5 loci

Both maxneed and maxprobclass satisfy simple recurrence relations on k.
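
One pair of recurrences that follows from the formulas above (by the binomial theorem) and reproduces the tabulated values is maxneed(k+1) = 5*maxneed(k) - 3 and maxprobclass(k+1) = 3*maxprobclass(k) - 1, with both equal to 2 for a single locus. The C sketch below iterates them; the function names are hypothetical and the bound of 10 loci is an arbitrary guard against overflow.

```c
#include <assert.h>

/* Hypothetical helper: maxneed via maxneed(k+1) = 5*maxneed(k) - 3. */
long maxneed_by_recurrence(int nloci)
{
    long value = 2;   /* k = 1: C(1,0)*1^2 + C(1,1)*1^2 */
    int k;

    assert(nloci >= 1 && nloci <= 10);   /* arbitrary guard, see above */
    for (k = 1; k < nloci; k++)
        value = 5 * value - 3;
    return value;
}

/* Hypothetical helper: maxprobclass via
   maxprobclass(k+1) = 3*maxprobclass(k) - 1. */
long maxprobclass_by_recurrence(int nloci)
{
    long value = 2;   /* k = 1: C(1,0)*1 + C(1,1)*1 */
    int k;

    assert(nloci >= 1 && nloci <= 10);   /* arbitrary guard, see above */
    for (k = 1; k < nloci; k++)
        value = 3 * value - 1;
    return value;
}
```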

AUTOSOMAL_RUN must be 1 for SimIBD.
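
To make the editing step concrete, here is what the settings might look like for the 3-locus example above (2, 3, and 5 alleles). This fragment is illustrative only: it assumes the constants are plain #define lines, and the exact form and surrounding text in your copies of commondefs.h and moddefs.h may differ.

```c
/* In commondefs.h (assumed layout; values for 3 loci with 2, 3, 5 alleles) */
#define maxhap          30   /* 2 x 3 x 5 joint haplotypes */
#define maxclasssize     4   /* 2^(3-1) */
#define maxneed         32   /* from the table for 3 loci */
#define maxprobclass    14   /* from the table for 3 loci */

/* In moddefs.h (assumed layout) */
#define AUTOSOMAL_RUN    1   /* must be 1 for SimIBD */
```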

Memory Requirements

These programs can require large amounts of memory. For instance, xslink as configured in this distribution and compiled with

        make xslink

requires less than 1 Mb. Of course, the amount of memory required depends heavily on the number of loci and the number of alleles at each locus. However, larger memory requirements are not a problem under Sun OS, for instance, because it is a virtual memory operating system. Ideally one would want to run a program of this size on a machine with 32 Mb of memory, but in our experience it is possible to run on machines with as little as 12 Mb.

It is also necessary to have a swap file with sufficient space to run the OS and still leave enough free space for the program.

To see how much space a program requires, use the unix command:

        size <program name>

for instance:

        unix> /usr/bin/size xslink

        text    data    bss     dec     hex
        131072  8192    830152  969416  ecac8

The value under "dec" is the size of the whole program in bytes, in decimal (the sum of the text, data, and bss columns). So we see in this case that less than 1 Mbyte is required.

Then compare this with the unix pstat command:

        unix> /etc/pstat -s

        2992k allocated + 688k reserved = 3680k used, 61676k available

This indicates that a total of 3680 Kbytes has been allocated by programs in execution on this system for swap space, and with the current job mix, another 61.7 Mb are available. So in this case xslink will be able to run.

To enlarge the swap space, consult your local system administrator.