by Dan Weeks
(1993)
DESCRIPTION
This program is designed to efficiently compute the APM statistic for a number of pedigrees over a number of marker loci. It is capable of storing the results of kinship calculations internally, greatly speeding up computations for different markers on the same pedigree. It can also store kinship information in a file to be used later.
Internally storing the kinship values requires large amounts of memory. So that the program can be used on systems with limited memory, it includes a scheme that limits the amount of information stored. Unfortunately, some Pascal compilers halt with an error when new() cannot allocate memory, which makes the limit necessary but only partly effective. Users with those compilers may have to set a very small limit.
The program has also been modified to exclude parent-child pairs from the APM statistic. It has been found that this yields greater power in detecting linkage.
LIMITS
As distributed, the program can handle up to 400 families with up to 60 members each and 20 marker loci of up to 40 alleles each. Pedigree titles can be up to 80 characters long and locus names can be up to 10 characters long.
The maximum number of families, members, loci, and alleles can all be changed by altering constants at the beginning of the program. There are other constants that depend on these, as described in the comments, and they should be changed as well.
The problem, of course, with just making these numbers huge is that some compilers have limits on the amount of memory used for variables in each block.
INPUT FILE FORMATS
The program uses a file format which supports multiple loci. It must include for each locus the locus name, the number of alleles, the allele frequencies, and the genotypes for each of the affecteds. The format of the pedigree data file is:
<number of pedigrees> <number of loci>
<number of alleles at locus #1> <name of locus #1>
<list of allele frequencies for locus #1>
<number of alleles at locus #2> <name of locus #2>
.
.
.
<title of pedigree #1>
<number of members> <number of affecteds> <number of typed loci>
<list of mothers>
<list of fathers>
<list of all affecteds>
<locus number> <list of genotypes>
<locus number> <list of genotypes>
.
.
.
<title of pedigree #2>
.
.
.
The locus numbers are the numerical IDs of the loci as determined from the order of their entries at the beginning of the file (the first locus is numbered 1, the second 2, etc.). The lines containing the locus number and list of genotypes must be ordered so that the locus numbers are increasing. The genotypes are thus arranged in the form of a table, with affected IDs along the top in increasing order (moving to the right) and locus IDs along the left in increasing order (moving downward); for example, a valid table of genotypes might be:
     3    4    5    7     <- list of all affecteds
1   1 1  2 1  1 1  1 2    <- locus number and list of genotypes
3   1 2  1 1  0 0  1 1    <- ""
4   1 1  1 2  1 1  1 1    <- ""
which means that, for the first locus, 3 has genotype 1/1, 4 has genotype 2/1, 5 has genotype 1/1, and 7 has genotype 1/2. For the third locus, 3 is typed as 1/2, 4 as 1/1, 5 is untyped, and 7 has genotype 1/1. And so on. The family is untyped for locus 2 and any loci after the fourth.
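As an illustration only, one line of the genotype table can be split into per-affected genotypes as sketched below in C. This is not apm's own reader; the names Genotype, parse_genotype_line, is_untyped, and MAX_AFFECTEDS are hypothetical, and treating the pair 0 0 as "untyped" follows the example table above.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_AFFECTEDS 60   /* assumed bound, taken from the distributed member limit */

/* One genotype: two allele numbers. The pair 0 0 marks an untyped affected. */
typedef struct { int a1, a2; } Genotype;

/*
 * Parse one "<locus number> <list of genotypes>" line.
 * Stores the locus number in *locus and the genotypes in out[].
 * Returns the number of genotypes read, or -1 if the line has no locus
 * number, an odd number of alleles, or more than max_out genotypes.
 * Parsing stops at the first token that is not an integer, so a trailing
 * comment on the line is ignored.
 */
int parse_genotype_line(const char *line, int *locus, Genotype *out, int max_out)
{
    int used = 0, count = 0;

    assert(line != NULL && locus != NULL && out != NULL);

    if (sscanf(line, "%d%n", locus, &used) != 1)
        return -1;
    line += used;

    for (;;) {
        int a1, a2;

        if (sscanf(line, "%d%n", &a1, &used) != 1)
            break;                      /* end of the genotype list */
        line += used;
        if (sscanf(line, "%d%n", &a2, &used) != 1)
            return -1;                  /* first allele without a second */
        line += used;
        if (count >= max_out)
            return -1;
        out[count].a1 = a1;
        out[count].a2 = a2;
        count++;
    }
    return count;
}

/* Nonzero if the genotype is the 0 0 placeholder. */
int is_untyped(Genotype g)
{
    return g.a1 == 0 && g.a2 == 0;
}
```

Called on the third-locus line of the table above ("3 1 2 1 1 0 0 1 1"), the function reports four genotypes, and is_untyped() is true only for the third one, which belongs to affected 5.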
Here is a real example data file:
2 3
3 ACK1
0.450 0.300 0.250
3 ACK2
0.550 0.200 0.250
2 ACK3
0.465 0.535
TESTPED 1
15 3 3
0 0 2 2 2 0 0 0 5 5 5 7 12 12 12
0 0 1 1 1 0 0 0 6 6 6 8 11 11 11
4 10 13
1 2 2 2 2 1 2
2 3 3 0 0 2 3
3 0 0 1 1 1 2
TESTPED 2
8 2 1
0 0 1 1 1 0 6 6
0 0 2 2 2 0 5 5
4 7
2 2 3 3 3
As previously mentioned, apm can produce a file containing kinship data, which it can later reuse. This file holds the results of summations accumulated in the kinship calculations for each family and for each list of affecteds. Care must be taken that it does not go out of date with respect to the pedigree data file. The only changes to the pedigree data file that will not invalidate the kinship data file are changes in the locus names, the number of alleles, the allele frequencies, and the list and number of affecteds for any or all pedigrees. If a pedigree name or overall structure is changed, the old kinship data file will no longer be valid. (If just the name is changed, it can also be changed in the kinship data file; doing so makes the file usable with the new pedigree file.)
This is the format of the file of kinship data:
<title of pedigree #1>
<number of records for this pedigree>
<first list of affected members>
<E(Z), the overall mean of the mean marker similarities E(Zij)>
<the sum of all Phi[(i,j,k,l)]>
<the sum of all Phi[(i,j)(k,l)]>
<the sum of all Phi[(i,j,k)(l)], Phi[(i,j,l)(k)], Phi[(i,k,l)(j)],
Phi[(j,k,l)(i)], Phi[(i,l)(j,k)], and Phi[(i,k)(j,l)]>
<the sum of all Phi[(i,j)(k)(l)] and Phi[(i)(j)(k,l)]>
<the sum of all Phi[(i,k)(j)(l)], Phi[(i,l)(j)(k)],
Phi[(j,l)(i)(k)], Phi[(j,k)(i)(l)]>
<the sum of all Phi[(i)(j)(k)(l)]>
<second list of affected members>
.
.
.
<title of pedigree #2>
.
.
.
where Phi[] is the kinship coefficient. An example file is:
TESTPED1
3
10 13
1.250000e-01
3.125000e-02
0.000000e+00
3.437500e-01
6.250000e-02
4.375000e-01
1.250000e-01
4 13
6.250000e-02
1.562500e-02
0.000000e+00
2.968750e-01
3.125000e-02
4.687500e-01
1.875000e-01
4 10 13
3.125000e-01
1.250000e-01
2.343750e-02
1.898438e+00
6.406250e-01
4.203125e+00
2.109375e+00
TESTPED2
1
4 7
1.250000e-01
3.125000e-02
0.000000e+00
3.437500e-01
6.250000e-02
4.375000e-01
1.250000e-01
OUTPUT FILE FORMATS
The output files out1.dat, outsqr.dat, and out1p.dat are intended for use with the sim program. The formats of all of these files are the same (they differ in the function used to weight the allele frequencies: f(p) = 1, f(p) = 1/sqrt(p), f(p) = 1/p, respectively):
<magic number>
<number of pedigrees> <number of loci>
<number of alleles for locus #1> <name of locus #1>
<frequency of allele #1>
<frequency of allele #2>
.
.
<number of alleles for locus #2> <name of locus #2>
.
.
<title of pedigree #1>
<number of members> <number of affecteds> <number of loci>
<list of mothers>
<list of fathers>
<locus number> <number of affecteds at this locus>
<affected #1>
<affected #2>
.
.
<cumulative mean> <cumulative variance>
<locus number> <number of affecteds at this locus>
.
.
<title of pedigree #2>
.
.
<statistic for first locus> <p-value>
<statistic for second locus> <p-value>
.
.
The output file out1.dat for the above pedigree file looks like this:
1
2 3
3 ACK1
0.45000
0.30000
0.25000
3 ACK2
0.55000
0.20000
0.25000
2 ACK3
0.46500
0.53500
TESTPED1
15 3 3
0 0 2 2 2 0 0 0 5 5 5 7 12 12 12
0 0 1 1 1 0 0 0 6 6 6 8 11 11 11
1 3
4
10
13
1.26656
0.27363
2 2
4
13
0.44219
0.07280
3 2
10
13
0.56464
0.05909
TESTPED2
8 2 1
0 0 1 1 1 0 6 6
0 0 2 2 2 0 5 5
2 2
4
7
0.47938
0.06961
1.40212 0.08043
0.20678 0.41808
-0.26594 0.60486
In addition to those three files, a summary file is produced, named 'table.out'. It contains the locus information, the statistics and their p-values for all families, and the overall statistics and their p-values. The format is obvious; here is the file produced alongside out1.dat above:
allele frequencies:
ACK1 0.45000 0.30000 0.25000
ACK2 0.55000 0.20000 0.25000
ACK3 0.46500 0.53500
family               mean      variance  observedx  Na  statistic
TESTPED1   <--- pedigree title
LOCUS 1 ACK1
  f(p) = 1           1.26656   0.27363   2.00000    3    1.40212
  f(p) = 1/sqrt(p)   2.12586   0.70359   3.65148    3    1.81882
  f(p) = 1/p         3.62500   2.19444   6.66667    3    2.05329
LOCUS 2 ACK2
  f(p) = 1           0.44219   0.07280   0.50000    2    0.21426
  f(p) = 1/sqrt(p)   0.68899   0.16435   1.00000    2    0.76717
  f(p) = 1/p         1.12500   0.54403   2.00000    2    1.18630
LOCUS 3 ACK3
  f(p) = 1           0.56464   0.05909   0.50000    2   -0.26594
  f(p) = 1/sqrt(p)   0.79652   0.11693   0.73324    2   -0.18508
  f(p) = 1/p         1.12500   0.23499   1.07527    2   -0.10259
TESTPED2   <--- pedigree title
LOCUS 2 ACK2
  f(p) = 1           0.47938   0.06961   0.50000    2    0.07817
  f(p) = 1/sqrt(p)   0.75565   0.15779   1.00000    2    0.61515
  f(p) = 1/p         1.25000   0.55682   2.00000    2    1.00509
f(p) = 1
The statistic for locus ACK1 for all 1 families is 1.40212 with p-value 0.08043
The statistic for locus ACK2 for all 2 families is 0.20678 with p-value 0.41808
The statistic for locus ACK3 for all 1 families is -0.26594 with p-value 0.60486
f(p) = 1/sqrt(p)
The statistic for locus ACK1 for all 1 families is 1.81882 with p-value 0.03447
The statistic for locus ACK2 for all 2 families is 0.97745 with p-value 0.16417
The statistic for locus ACK3 for all 1 families is -0.18508 with p-value 0.57342
f(p) = 1/p
The statistic for locus ACK1 for all 1 families is 2.05329 with p-value 0.02003
The statistic for locus ACK2 for all 2 families is 1.54955 with p-value 0.06062
The statistic for locus ACK3 for all 1 families is -0.10259 with p-value 0.54087
NOTE: The p-values may be unreliable for small numbers of families. We recommend that you use the simulation program "sim" along with the histogram program "hist" to generate empirical p-values.
BUGS
They are all still hiding under their rocks. If you find one, please catch it and mail it to us!
REFERENCES
See the accompanying REFERENCES file.
DESCRIPTION
This program is designed to efficiently compute the multilocus APM statistic for a number of pedigrees over a number of marker loci. It is capable of storing the results of kinship calculations internally, greatly speeding up computations.
Internally storing the kinship values requires large amounts of memory. So that the program can be used on systems with limited memory, it includes a scheme that limits the amount of information stored. Unfortunately, some Pascal compilers halt with an error when new() cannot allocate memory, which makes the limit necessary but only partly effective. Users with those compilers may have to set a very small limit.
The program has also been modified to exclude parent-child pairs from the APM statistic. It has been found that this yields greater power in detecting linkage.
At some time in the future, this program will probably be combined with apm.
LIMITS
As distributed, the program can handle up to 50 families with up to 50 members each and 20 marker loci of up to 50 alleles each. Pedigree titles can be up to 80 characters long.
The maximum number of families, members, loci, and alleles can all be changed by altering constants at the beginning of the program. There are other constants that depend on these, as described in the comments, and they should be changed as well.
The problem, of course, with just making these numbers huge is that some compilers have limits on the amount of memory used for variables in each block.
INPUT FILE FORMATS
All affecteds must be typed at ALL markers. The program takes MULT format APM files (see the INTRO file for more details). The file format looks like this:
<number of pedigrees> <number of loci>
<number of alleles at locus #1> <name of locus #1>
<list of allele frequencies for locus #1>
<number of alleles at locus #2> <name of locus #2>
.
.
.
<title of pedigree #1>
<number of members> <number of affecteds>
<list of mothers>
<list of fathers>
<affected #1> <genotype at locus #1> <genotype at locus #2> ...
<affected #2> <genotype at locus #1> <genotype at locus #2> ...
.
.
.
<title of pedigree #2>
.
.
.
Here is a real example data file:
2 3
3 ACK1
0.450 0.300 0.250
3 ACK2
0.550 0.200 0.250
2 ACK3
0.465 0.535
TESTPED1
15 3
0 0 2 2 2 0 0 0 5 5 5 7 12 12 12
0 0 1 1 1 0 0 0 6 6 6 8 11 11 11
4 2 2 2 2 1 2
10 3 3 1 3 2 2
13 3 2 1 1 1 2
TESTPED2
8 2
0 0 1 1 1 0 6 6
0 0 2 2 2 0 5 5
4 1 3 1 1 2 2
7 2 3 3 3 1 1
OUTPUT FILE FORMATS
The output files out1.dat, outsqr.dat, and out1p.dat are intended for use with the simmult program. The formats of all of these files are the same (they differ in the function used to weight the allele frequencies: f(p) = 1, f(p) = 1/sqrt(p), f(p) = 1/p, respectively):
<magic number>
<number of pedigrees> <number of loci>
<number of alleles for locus #1> <name of locus #1>
<frequency of allele #1>
<frequency of allele #2>
.
.
<number of alleles for locus #2> <name of locus #2>
.
.
<title of pedigree #1>
<number of members> <number of affecteds>
<list of mothers>
<list of fathers>
<affected #1>
<affected #2>
.
.
<list of cumulative means and variances for the loci>
<list of observed X values for all loci> <total multilocus variance>
<title of pedigree #2>
.
.
The output file out1.dat for the above pedigree file looks like this:
11
2 3
3 ACK1
0.45000
0.30000
0.25000
3 ACK2
0.55000
0.20000
0.25000
2 ACK3
0.46500
0.53500
TESTPED1
15 3
0 0 2 2 2 0 0 0 5 5 5 7 12 12 12
0 0 1 1 1 0 0 0 6 6 6 8 11 11 11
4
10
13
1.26656 0.27363 1.40094 0.35872
1.66283 0.25792
1.00000 0.50000 1.50000 0.89705
TESTPED2
8 2
0 0 1 1 1 0 6 6
0 0 2 2 2 0 5 5
4
7
0.43562 0.05997 0.47938 0.06961
0.56464 0.05909
0.25000 0.00000 0.00000 0.19026
In addition to those three files, a summary file is produced, named 'table.out'. It contains the locus information, the statistics and their p-values for all families, and the overall statistics and their p-values. The format is obvious; the example file to go with the above pedigree file is not included here as it is rather lengthy.
Finally, there is one other output file, apmmult.out. This file consists of lines of the following format:
<number identifying pedigree (1, 2, 3, ...)>
<multilocus covariance, observed X, mean, variance, and total variance>
(that's all on one line).
BUGS
Please let us know if you find any!
REFERENCES
See the accompanying REFERENCES file.
SYNTAX
Getting brief help:
    chapm {-help,-usage}
Reading from LINKAGE:
    chapm [-intype <type>]
          [-outtype <type>]
          [{-pedfile,-infile} <file>]
          [-locusfile <file>]
          [-outfile <file>]
          [-disease <locus #>]
          [-affdata "<string>"]
          [{-loci,-locus} "<string>"]
          [-check]
    chapm -quiet
          [-intype <type>]
          -outtype <type>
          [{-pedfile,-infile} <file>]
          -locusfile <file>
          [-outfile <file>]
          -disease <locus #>
          -affdata "<string>"
          [{-loci,-locus} "<string>"]
          [-check]
Reading from APM:
    chapm [-intype <type>]
          [-outtype <type>]
          [{-pedfile,-infile} <file>]
          [-outfile <file>]
          [{-loci,-locus} "<string>"]
          [-check]
    chapm -quiet
          -intype <type>
          -outtype <type>
          [{-pedfile,-infile} <file>]
          [-outfile <file>]
          [{-loci,-locus} "<string>"]
          [-check]
[]         = optional (otherwise required)
{}         = one of the items in this list
<type>     = a valid file type
<file>     = a file name
<locus #>  = a number between 1 and the number of loci
"<string>" = a string enclosed in quotes (in all cases the quotes may be omitted if there are no spaces or special characters in the string)
DESCRIPTION
Chapm can read LINKAGE or any of the APM file formats (SL, ML, and MULT) and write any of the APM formats (see the INTRO file for a description of these formats). When reading LINKAGE, it uses other necessary information provided by the user (either through interactive input or command-line arguments) to determine which pedigree members are affected. For LINKAGE files, the locus types that are supported are Affection Status, Binary Factor, Numbered Alleles, and Quantitative Variable. Any of these types may be used to determine who is affected.
Since chapm is capable of more extensive checking of the pedigree and locus data than the APM analysis programs themselves, we recommend that you use chapm with the -check option before doing any analyses. Chapm can also be used to polish a data file which has some simple format problem, for example an ML file whose loci are in the wrong order. The polished file that it writes will be compatible with the APM programs.
The program may be used interactively or non-interactively. With the -quiet option, it suppresses all unimportant output and bypasses certain safety features (such as the prompt to confirm that the user wishes to overwrite a file). It can also read the pedigree file from standard input and write the output to standard output if the -quiet option is used.
Chapm performs checks for obvious errors whenever it reads new data. These checks are automatic and do not require the -check option. Also, loops that have been broken in LINKAGE files are automatically reconnected.
After the file has been read in (and after it has been internally converted to APM if it was LINKAGE), chapm can, at the user's request, change the order and number of marker loci. It will also perform more extensive checks on pedigree and locus integrity if the -check option was specified (see the description of the -check argument below).
Before writing the output file, chapm checks to see if any of the pedigrees need to be renumbered. The APM programs require that the ancestors of any given member have smaller ID numbers than the given member; if this is not the case for all members in a pedigree (as often happens when reading LINKAGE), then the pedigree must be renumbered.
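The ordering requirement above can be tested with a single pass over the parent lists, as in the following C sketch. It is not chapm's code. It assumes that members are numbered 1..n in list order and that a parent entry of 0 means "no parent recorded", as in the example pedigree files; the function name needs_renumbering is hypothetical. Checking only the parents is enough, because if every parent has a smaller ID than its child, then every more distant ancestor does too.

```c
#include <assert.h>

/*
 * mother[i] and father[i] hold the parent IDs of member i + 1
 * (0 = no parent recorded). Returns 1 if some member has a parent
 * whose ID is not smaller than its own, 0 otherwise.
 */
int needs_renumbering(const int *mother, const int *father, int n_members)
{
    int i;

    assert(n_members >= 0);

    for (i = 0; i < n_members; i++) {
        int id = i + 1;

        if (mother[i] >= id || father[i] >= id)
            return 1;
    }
    return 0;
}
```

For the 15-member TESTPED1 pedigree used in the examples, every parent ID is smaller than the child's ID, so this check reports that no renumbering is needed.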
Also before writing, it deletes any pedigrees that have fewer than two typed affecteds. What actually happens depends on the output file format - when writing MULT and SL format files, a pedigree is deleted if it does not have at least two affecteds typed at all markers, but when writing ML format files, a pedigree is deleted if it does not have at least two affecteds typed at at least one marker. This is because the APM programs that use ML format can support affecteds which are not typed at all loci.
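The deletion rule can be written out as the C sketch below. It is an interpretation, not chapm's code: "typed at at least one marker" is read here as "there is some marker at which at least two affecteds are typed". The names ApmFormat, keep_pedigree, and MAX_LOCI are hypothetical.

```c
#include <assert.h>

#define MAX_LOCI 20   /* assumed bound, taken from the distributed locus limit */

typedef enum { FMT_SL, FMT_ML, FMT_MULT } ApmFormat;

/*
 * typed[a][l] is nonzero if affected a is typed at locus l.
 * Returns 1 if the pedigree would be kept for the given output format,
 * 0 if it would be deleted.
 */
int keep_pedigree(ApmFormat fmt, int typed[][MAX_LOCI], int n_affecteds, int n_loci)
{
    int a, l;

    assert(n_loci >= 1 && n_loci <= MAX_LOCI);

    if (fmt == FMT_ML) {
        /* ML: some locus must have at least two typed affecteds. */
        for (l = 0; l < n_loci; l++) {
            int count = 0;

            for (a = 0; a < n_affecteds; a++)
                if (typed[a][l])
                    count++;
            if (count >= 2)
                return 1;
        }
        return 0;
    } else {
        /* MULT and SL: at least two affecteds must be typed at every locus. */
        int complete = 0;

        for (a = 0; a < n_affecteds; a++) {
            int all = 1;

            for (l = 0; l < n_loci; l++)
                if (!typed[a][l]) {
                    all = 0;
                    break;
                }
            if (all)
                complete++;
        }
        return complete >= 2;
    }
}
```

With the typing pattern of TESTPED1 from the apm example (affected 4 untyped at locus 3, affected 10 untyped at locus 2, affected 13 fully typed), the pedigree is kept for ML output but deleted for MULT output, since only one affected is typed at all three loci.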
ARGUMENTS
For an Affection Status disease locus with liability classes: You might use -affdata "2-2" to declare all members with status 2 (affection) and in liability class 2 of that locus affected. Or you might use -affdata "2-*" to declare all members of status 2 (in all classes) affected. You can also make multiple specifications; -affdata "2-1 2-2" for example. Legal status numbers are 0, 1, and 2. Legal class numbers are between 0 and the number of classes (specified in the locus file).
For an Affection Status locus with no liability classes: This is the same as for the above, only without the class specification. For example, to declare all members of status 2 affected, you might use -affdata "2".
For a Binary Factor or Numbered Allele locus: Input for both of these locus types is the same. The specification is the pair of allele numbers that define the affected genotype. For example, for a locus with two alleles, you might use -affdata "1/2" or -affdata "2/1" for those with alleles 1 and 2, or -affdata "1/*" for all those with allele 1 present (the other can be anything), and so on.
For a Quantitative Variable disease locus: Input for this is a range or a single value. To mark as affected all those of quantitative value 100, you might use -affdata "100". To mark all those between 100 and 200 (inclusive), you might use -affdata "100-200". To mark all those below (or equal to) 100, you might use -affdata "*-100".
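The three Quantitative Variable forms shown above ("100", "100-200", and "*-100") could be parsed along the lines of the C sketch below. It is not chapm's parser. It handles only those three forms and assumes non-negative values, so that '-' is unambiguous as the range separator; the names QuantRange, parse_quant_spec, and quant_is_affected are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* Inclusive range of quantitative values that count as affected. */
typedef struct { double lo, hi; } QuantRange;

/*
 * Parse "V", "LO-HI", or "*-HI" into *r.
 * Returns 1 on success and 0 if the string has any other shape.
 */
int parse_quant_spec(const char *spec, QuantRange *r)
{
    char *end;
    const char *hi_text;

    assert(spec != NULL && r != NULL);

    if (spec[0] == '*') {               /* "*-HI": no lower bound */
        if (spec[1] != '-')
            return 0;
        hi_text = spec + 2;
        r->lo = -DBL_MAX;
        r->hi = strtod(hi_text, &end);
        return end != hi_text && *end == '\0';
    }

    r->lo = strtod(spec, &end);
    if (end == spec)
        return 0;                       /* no number at the start */
    if (*end == '\0') {                 /* "V": a single value */
        r->hi = r->lo;
        return 1;
    }
    if (*end != '-')
        return 0;

    hi_text = end + 1;                  /* "LO-HI" */
    r->hi = strtod(hi_text, &end);
    return end != hi_text && *end == '\0';
}

/* Nonzero if value v falls inside the inclusive range. */
int quant_is_affected(const QuantRange *r, double v)
{
    return v >= r->lo && v <= r->hi;
}
```

For example, after parsing "100-200" the values 100, 150, and 200 are marked affected and 201 is not; after parsing "*-100" every value up to and including 100 is marked affected.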
(The keywords used below whose meanings are not obvious are:
defined: has been assigned a value or is known to exist (if values are not applicable)
sensible: has been defined a value within reasonable bounds
complete: all required attributes have been defined)
In checking pedigree integrity:
For each pedigree:
In checking locus integrity:
For each locus:
As an example, consider a LINKAGE file that you wish to convert to ML format. Say that it has an Affection Status locus followed by two Numbered Allele loci, followed by a Quantitative Variable locus. You might use the Affection Status locus to determine affection, and you might want to save both marker loci (the second and third). To do this, you would simply supply this in the command line: -loci "2 3". Or, you could use -loci "3 2" to reverse the order, or you could save only the second (the third in the original file) via -locus "3", etc.
ERROR MESSAGES
The error messages are meant to be as concise and informative as possible. The program, of course, can't always tell you what is wrong; most of the time it can only say what it thinks is wrong and describe the symptoms that it found.
NOTES ABOUT LIMITATIONS OF FILE FORMATS
Most notably, there is a disparity between MULT and ML format: In ML format, affecteds may not be typed at all loci, whereas in MULT format, all affecteds must be typed at all loci. You may, therefore, find that you are losing affecteds or whole pedigrees when converting from ML to MULT, because of some affecteds being untyped at some loci. There is not anything inherently wrong with this, but it is something to keep in mind.
BUGS
As of this date, the support for Quantitative Variable loci has not been fully tested.
REFERENCES
See the accompanying REFERENCES file.
SYNOPSIS
#include <fcmap.h>
int fcmap(stream, format [ , arg ] ... )
FILE *stream;
char *format;
DESCRIPTION
fcmap() is a working fscanf() kind of routine that uses reasonable semantics to provide functional data input. It reads from stream data of the types specified in the format, using the same convention as printf(). The return value is the last character read from the stream, or 0 if the last character has been used for something (for example, if you have read a character into an argument, and nothing else, no other characters would have been read from the stream - read on and it should begin to make sense). It will also return EOF at the end of the file.
One of the differences between this function and fscanf() is that the terminal characters of numbers may be used in constructing the next argument. For example, when reading an integer followed by a string, the first character of the string is lost using fscanf() (on most systems). This is because the first character of the string is used by the number-reading routine to terminate the number. This has far-reaching consequences, and it has caused me much trouble in the past. Say, for example, you have a line that has a number and, after the number, may or may not have comments. You want to skip the comments if they are there (just skip to the end of the line), but if the character which terminates the number is the actual newline, you can't just skip to a newline because you will skip the entire _next_ line! Very annoying.
ARGUMENTS
The format string must be null-terminated. It may contain any characters, but the character '%' is reserved to indicate the type of the next input to be formatted. (If, however, the character following the '%' is another '%', then the combination is taken as a single literal '%'.)
The other arguments must be pointers to the correct types, except for those which correspond to "%=" and "%T" formats (see below).
Types are specified following a '%'; the functions of the supported type identifiers are:
Two other functions are supported. If the string "%=" appears in the format, the next argument, which must be an integer (_not_ a pointer), is assigned to the last-character-read. This is useful, for example, if you expect to read numbers on a line from within a loop, then after the loop skip the remainder of the line. Your code might look something like this:
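int lch = 0, entry, Entry[MAXENTRY], EntryCnt;   /* MAXENTRY: placeholder bound, not part of fcmap */
for (entry = 0; entry < EntryCnt; entry++)
    lch = fcmap(stdin, "%d", &(Entry[entry]));   /* lch = character that ended the number */
fcmap(stdin, "%=\n", lch);                       /* skip to the newline unless lch already was one */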
This reads EntryCnt numbers from the input stream, saves the terminating character of the last number, and uses it (in case it actually is a newline) to skip to the next newline.
Also, if the format string contains "%T", then the next argument (which must also be an integer) becomes the "generic terminator". When a read is attempted and this termination character is encountered, the reading is aborted and the function returns. The termination character itself is returned. So if, for example, you wanted to read a line of integers separated by any amount of white space (save newlines), but you didn't know how many integers, and the line is terminated by a newline, you might do something like this:
int n = 0, lch, i[MAXINTS], dum;   /* MAXINTS: placeholder bound, not part of fcmap */
do {
    lch = fcmap(stdin, "%T%d", '\n', &dum);
    if (lch == '\n') break;
    i[n++] = dum;
} while ((lch != '\n') && (lch != EOF));
After this little routine, n contains the number of integers and i[] contains the actual integers. It's still a little ugly, but it's better than most other ways.
OTHER INTERFACE BITS
There are also two integer variables that are used by fcmap() and that your programs can use, and for which macros are provided to help you:
Both of these are cleared each time fcmap() is called.
The macros provided are:
With these macros, you might have written the last example routine (the one which reads any number of integers terminated by a newline) like:
int n = 0, lch, i[MAXINTS], dum;   /* MAXINTS: placeholder bound, not part of fcmap */
do {
    lch = fcmap(stdin, "%T%d", '\n', &dum);
    if (FCMAP_NARGS < 2) break;
    i[n++] = dum;
} while ((lch != '\n') && (lch != EOF));
And you would probably get better results.
BUGS
Field widths and string lengths are currently not supported. Many other things are also unsupported (unsigned types, octal, hexadecimal, etc.); they will be added as the need arises.
DESCRIPTION
This program takes a file of numbers separated by spaces (or it takes them from stdin if requested) and generates some statistical figures and a histogram. It can also compute empirical p-values.
At least two samples are required. All output goes to the standard output.
ARGUMENTS
usage:
hist [-p {-,<file of statistics>}] [-s] {-,<file of samples>}
When using with APM, the "file of samples" is usually one of the tstat*.out files containing the simulated statistics, and the "file of statistics" is generally not used (though "-p -" is sometimes used to enter statistics manually to compute empirical p-values (see below)).
If -p is specified, the program will compute empirical p-values after everything else is done. If the p-value statistics are being taken from standard input (as directed by using '-' instead of the file name), it will prompt for them.
The -s option is required only if -p is specified, to avoid ambiguity. If -p is not used, just a file name may be given.
If -p is specified, then there must be at least one file specification (for either the p-value statistics or the samples). Both kinds of data cannot be read in from the standard input.
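This document does not spell out how hist computes an empirical p-value. A common definition, given here only as an assumption, is the fraction of simulated statistics that are at least as large as the observed statistic (one-sided, upper tail). A minimal C sketch of that definition follows; the function name empirical_p_value is hypothetical, and hist itself may differ in details such as how ties are counted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fraction of the n simulated statistics in samples[] that are
 * greater than or equal to the observed statistic.
 * At least two samples are required, as hist itself requires.
 */
double empirical_p_value(const double *samples, size_t n, double observed)
{
    size_t i, hits = 0;

    assert(samples != NULL && n >= 2);

    for (i = 0; i < n; i++)
        if (samples[i] >= observed)
            hits++;

    return (double)hits / (double)n;
}
```

Under this definition the resolution of the p-value is 1/n, so a small number of simulated statistics cannot produce a very small empirical p-value.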
EXAMPLES
'hist <file>'                    reads samples from <file> and does not compute empirical p-values
'hist -p <file> -s -'            reads samples from stdin and statistics for empirical p-values from <file>
'hist -p <file 1> -s <file 2>'   reads statistics for p-values from <file 1> and samples from <file 2>
'hist -p - -s <file>'            reads the samples from <file> and the statistics for the p-values from stdin
'hist'                           is illegal
'hist -p - -s -'                 is illegal (both files can't be unspecified)
BUGS
None known.
REFERENCES
Just myself. But for references regarding APM, see the accompanying REFERENCES file.
DESCRIPTION
This program simulates pedigrees by generating random genotypes for each member over a number of iterations. It computes the statistic over all families for each locus and for each iteration, and then it computes the mean and variance of these statistics.
The program has also been modified to exclude parent-child pairs from the APM statistic. It has been found that this yields greater power in detecting linkage, and since parent-child pairs are excluded in the apm program, they must be excluded here as well.
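The final step described above, taking the mean and variance of the simulated statistics for a locus, can be sketched in C as follows. This is not sim's code. The sketch assumes the sample variance with an n - 1 denominator, which this document does not specify, and the names Summary and summarize are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double mean, variance; } Summary;

/*
 * Mean and sample variance (n - 1 denominator, an assumption) of the
 * n simulated statistics in x[]. Two passes are used so that the
 * deviations are taken from the final mean.
 */
Summary summarize(const double *x, size_t n)
{
    Summary s;
    double sum = 0.0, squares = 0.0;
    size_t i;

    assert(x != NULL && n >= 2);

    for (i = 0; i < n; i++)
        sum += x[i];
    s.mean = sum / (double)n;

    for (i = 0; i < n; i++) {
        double d = x[i] - s.mean;

        squares += d * d;
    }
    s.variance = squares / (double)(n - 1);

    return s;
}
```

In this sketch x[] would hold the statistics for one locus, one entry per iteration, which is the same data that sim writes to the corresponding tstat<n>.out file.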
INPUT FILE FORMATS
The output files out1.dat, outsqr.dat, and out1p.dat produced by apm are intended for use with the sim program. The formats of all of these files are the same (they differ in the function used to weight the allele frequencies: f(p) = 1, f(p) = 1/sqrt(p), f(p) = 1/p, respectively):
<magic number>
<number of pedigrees> <number of loci>
<number of alleles for locus #1> <name of locus #1>
<frequency of allele #1>
<frequency of allele #2>
.
.
<number of alleles for locus #2> <name of locus #2>
.
.
<title of pedigree #1>
<number of members> <number of affecteds> <number of loci>
<list of mothers>
<list of fathers>
<locus number> <number of affecteds at this locus>
<affected #1>
<affected #2>
.
.
<cumulative mean> <cumulative variance>
<locus number> <number of affecteds at this locus>
.
.
<title of pedigree #2>
.
.
<statistic for first locus> <p-value>
<statistic for second locus> <p-value>
.
.
OUTPUT FILE FORMATS
There is one output file for each locus, named tstat<n>.out (where <n> is the locus number). Each file contains the simulated statistics for each iteration - useful if you need them for input to a statistics package.
There is also one output file, sim.out, which is appended to each time you run sim. It contains all the results of the runs that you normally see at the standard output.
BUGS
There aren't any that I know of.
REFERENCES
See the accompanying REFERENCES file.
Hall JM, Lee MK, Newman B, Morrow JE, Anderson LA, Huey B, King M-C (1990) Linkage of early-onset familial breast cancer to chromosome 17q21. Science 250:1684-1689
Lange K (1986a) The affected sib-pair method using identity by state relations. Am J Hum Genet 39:148-150
Lange K (1986b) A test statistic for the affected-sib-set method. Ann Hum Genet 50:283-290
Lange K, Weeks DE (1990) Linkage methods for identifying genetic risk factors. World Rev Nutr Diet 63:236-249
Weeks DE, Lange K (1988) The affected-pedigree-member method of linkage analysis. Am J Hum Genet 42:315-326
Weeks DE, Lange K (1991) An overview of the affected-pedigree-member method of linkage analysis. Proceedings of the 23rd symposium on the interface, Seattle, Interface Foundation of North America, pp. 386-391
Weeks DE, Lange K (1992) A multilocus extension of the affected-pedigree-member method of linkage analysis. Am J Hum Genet 50:859-868
Schroeder MD, Brown DL, Weeks DE (1994) Improved programs for the affected-pedigree-member method of linkage analysis. Genet Epidemiol 11:69-74