simibd v 1.0
Last mod: January 5, 1996
To perform the simulations needed to construct the null distribution, SimIBD calls xslink, a modified version of SLINK that is described in the paper "Faster Sequential Genetic Linkage Computations" (American Journal of Human Genetics, 53 (1993), pp. 252-263).
Compiling the code here produces a file called xslink, which is
the modified version of SLINK described above.
SOURCE CODE ORGANIZATION
The code is now broken up into .h files which have just definitions and declarations, and .c files that have just C procedures. Each file opens with a comment explaining what role that file plays. Here we summarize which files are needed for which programs:
File Name
commondefs.h
sldefs.h
moddefs.h
slautomodified.c
commongetvect.c
commonnuclear.c
iostuff.c
slgetvect.c
xslink.c
slinputcode.c
sloldnuclear.c
oldsegup.c
slsexmodified.c
Most of the modified code is in the following files:
moddefs.h definitions and declarations associated
slautomodified.c contains the fast, but memory-intensive,
version of the new code.
slsexmodified.c contains the sex-linked version of
slautomodified.c
oldsegup.c contains the old (p2c version of original
programs) versions of segup() and segsexup(),
which are needed for handling mutation data.
We have changed seg() in the original code so that the modified
routines are called only for mutation-free data. For data with
mutation, the old routines, which have not been modified yet, are
called for compatibility. These are the routines oldsegup() and
oldsegsexup(), which are the same as the old segup() and
segsexup().
COMPILATION
Though typing make at the shell prompt will make all the files necessary to run SimIBD, there may be situations in which it is desirable to compile xslink only. Here are the instructions for doing so.
To compile xslink by itself, simply type:
make xslink
Please read the discussion below about constant definitions before trying to compile.
The Makefile we are distributing uses the gcc compiler distributed by the Free Software Foundation. If gcc is not available to you, or you would like to use the cc compiler instead, add the tag CC=cc when you issue the make command. This overrides the setting in the Makefile; e.g.,
make xslink CC=cc
In general, we recommend using gcc instead of cc, because gcc produces
faster machine code.
If you are unfamiliar with these concepts and want help, see your system administrator for information about how your particular system is organized.
There are 4 constants that are defined in commondefs.h and 2 constants that are defined in moddefs.h that you will want to set before compiling. This means that you will edit the files to put in the appropriate numbers and then compile.
The constants in commondefs.h are:
maxhap
maxclasssize
maxneed
maxprobclass
maxhap and maxneed have been in the programs all along; the other two are new.
The constants in moddefs.h are:
AUTOSOMAL_RUN
The reasons to set the constants correctly are:
So, set the constants!
MAXHAP is the maximum number of possible joint haplotypes. It is the product of the number of alleles at each locus in the analysis. For example, if you want to do a 3-locus SLINK run in which the 3 loci have 2, 3, and 5 alleles respectively, set maxhap to 30 (which is 2 x 3 x 5).
MAXCLASSSIZE is the maximum size of an isozygote class for one individual. It should be set to 2^(k-1) where k is the number of loci. In numbers:
Use 2 for 2 loci
Use 4 for 3 loci
Use 8 for 4 loci
Use 16 for 5 loci
and so on.
MAXNEED is the number of different recombination probabilities needed. Let k>=3 be the number of loci. The formula to compute maxneed is:
maxneed = C(k,0)*1^2 + C(k,1)*1^2 + C(k,2)*2^2 + C(k,3)*4^2
          + C(k,4)*8^2 + C(k,5)*16^2
          + ... + C(k,k)*maxclasssize*maxclasssize
The notation C(k,i) stands for the binomial coefficient k choose i, that is, the number of ways of choosing i objects out of k when order doesn't matter and repetition is not allowed. Some important values for maxneed are:
maxneed = 7 for 2 loci (use this value for LODSCORE)
maxneed = 32 for 3 loci
maxneed = 157 for 4 loci
maxneed = 782 for 5 loci
MAXPROBCLASS. The formula for maxprobclass is similar to maxneed. It is:
maxprobclass = C(k,0)*1 + C(k,1)*1 + C(k,2)*2 + C(k,3)*4
               + C(k,4)*8 + ... + C(k,k)*maxclasssize.
Some simple values are:
maxprobclass = 14 for 3 loci
maxprobclass = 41 for 4 loci
maxprobclass = 122 for 5 loci
Both maxneed and maxprobclass satisfy simple recurrence relations on k.
AUTOSOMAL_RUN must be 1 for SimIBD.
These programs can require large amounts of memory. For instance, xslink as configured in this distribution and compiled with
make xslink
requires less than 1 Mb. Of course, the amount of memory required depends heavily on the number of loci and the number of alleles at each locus. However, larger memory requirements are not a problem under Sun OS, for instance, because it is a virtual memory operating system. Ideally one would want to run a program of this size on a machine with 32 Mb of memory, but in our experience it is possible to run on machines with as little as 12 Mb.
Of course, it is necessary to have a swap file with enough space to run the OS and enough free space left over for the program.
To see how much space a program requires, use the unix command:
size <program name>
for instance:
unix> /usr/bin/size xslink
text    data    bss     dec     hex
131072  8192    830152  969416  ecac8
The value under "dec" is the decimal number of bytes for the whole program. So we see in this case that less than 1 Mbyte is required.
Then compare this with the unix pstat command:
unix> /etc/pstat -s
2992k allocated + 688k reserved = 3680k used, 61676k available
This indicates that a total of 3680 Kbytes has been allocated by programs in execution on this system for swap space, and with the current job mix, another 61.7 Mb are available. So in this case xslink will be able to run.
To enlarge the swap space consult your local system administrator.