"MAN PAGE" for APM:
Affected-Pedigree-Member Method

by Dan Weeks

(1993)



1. apm : compute the APM statistic for multiple marker loci (from ML files)

DESCRIPTION

This program is designed to efficiently compute the APM statistic for a number of pedigrees over a number of marker loci. It is capable of storing the results of kinship calculations internally, greatly speeding up computations for different markers on the same pedigree. It can also store kinship information in a file to be used later.

Internally storing the kinship values requires large amounts of memory. So that the program can be used on systems with limited memory, it includes a scheme to limit the amount of information stored. Unfortunately, some Pascal compilers halt with an error when a new() cannot allocate memory; this makes the memory limit both essential and of limited use. Users with those compilers may have to set a very small limit.

The program has also been modified to exclude parent-child pairs from the APM statistic. It has been found that this yields greater power in detecting linkage.

LIMITS

As distributed, the program can handle up to 400 families with up to 60 members each and 20 marker loci of up to 40 alleles each. Pedigree titles can be up to 80 characters long and locus names can be up to 10 characters long.

The maximum number of families, members, loci, and alleles can all be changed by altering constants at the beginning of the program. There are other constants that depend on these, as described in the comments, and they should be changed as well.

The problem, of course, with just making these numbers huge is that some compilers have limits on the amount of memory used for variables in each block.

INPUT FILE FORMATS

The program uses a file format which supports multiple loci. It must include for each locus the locus name, the number of alleles, the allele frequencies, and the genotypes for each of the affecteds. The format of the pedigree data file is:

     <number of pedigrees>   <number of loci>
     <number of alleles at locus #1>   <name of locus #1>
     <list of allele frequencies for locus #1>
     <number of alleles at locus #2>   <name of locus #2>
     .
     .
     .
     <title of pedigree #1>
     <number of members>  <number of affecteds>  <number of typed loci>
     <list of mothers>
     <list of fathers>
     <list of all affecteds>
     <locus number> <list of genotypes>
     <locus number> <list of genotypes>
     .
     .
     .
     <title of pedigree #2>
     .
     .
     .

The locus numbers are the numerical IDs of the loci as determined from the order of their entries at the beginning of the file (the first locus is numbered 1, the second 2, etc.). The lines containing the locus number and list of genotypes must be ordered so that the locus numbers are increasing. The genotypes are thus arranged in the form of a table, with affected IDs along the top in increasing order (moving to the right) and locus IDs along the left in increasing order (moving downward); for example, a valid table of genotypes might be:

         3     4     5     7    <- list of all affecteds
     1  1 1   2 1   1 1   1 2   <- locus number and list of genotypes
     3  1 2   1 1   0 0   1 1   <-  ""
     4  1 1   1 2   1 1   1 1   <-  ""

which means that, for the first locus, 3 has genotype 1/1, 4 has genotype 2/1, 5 has genotype 1/1, and 7 has genotype 1/2. For the third locus, 3 is typed as 1/2, 4 as 1/1, 5 is untyped, and 7 has genotype 1/1. And so on. The family is untyped for locus 2 and any loci after the fourth.

Here is a real example data file:

          2   3
        3   ACK1
    0.450  0.300  0.250
        3   ACK2
    0.550  0.200  0.250
        2   ACK3
    0.465   0.535
        TESTPED 1
          15   3   3
        0 0 2 2 2 0 0 0 5 5 5 7 12 12 12
        0 0 1 1 1 0 0 0 6 6 6 8 11 11 11
          4     10    13
     1    2 2   2 2   1 2
     2    3 3   0 0   2 3
     3    0 0   1 1   1 2
        TESTPED 2
           8   2   1
        0 0 1 1 1 0 6 6
        0 0 2 2 2 0 5 5
          4     7
     2    2 3   3 3

As previously mentioned, apm can produce a file containing kinship data, which it can later reuse. This file maintains the results of summations accumulated in the kinship calculations for each family and for each list of affecteds. Care must be taken that it does not go out-of-date with respect to the pedigree data file; the only changes that can be made in the pedigree data file that will not render the file of kinship data invalid are changes in the locus names, the number of alleles, the allele frequencies, and the list and number of affecteds for any or all pedigrees. If a pedigree name or overall structure is changed, the old data file will no longer be valid (if just the name is changed, it can also be changed in the kinship data file; doing so will make the file usable in combination with the new pedigree file).

This is the format of the file of kinship data:

                                              
    <title of pedigree #1>
     <number of records for this pedigree>
     <first list of affected members>
     <E(Z), the overall mean of the mean marker similarities E(Zij)>
     <the sum of all Phi[(i,j,k,l)]>
     <the sum of all Phi[(i,j)(k,l)]>
     <the sum of all Phi[(i,j,k)(l)], Phi[(i,j,l)(k)], Phi[(i,k,l)(j)],
            Phi[(j,k,l)(i)], Phi[(i,l)(j,k)], and Phi[(i,k)(j,l)]>
     <the sum of all Phi[(i,j)(k)(l)] and Phi[(i)(j)(k,l)]>
     <the sum of all Phi[(i,k)(j)(l)], Phi[(i,l)(j)(k)],
            Phi[(j,l)(i)(k)], Phi[(j,k)(i)(l)]>
     <the sum of all Phi[(i)(j)(k)(l)]>
     <second list of affected members>
    .
    .
    .
    <title of pedigree #2>
    .
    .
    .

where Phi[] is the kinship coefficient. An example file is:

    TESTPED1
     3
     10 13
     1.250000e-01
     3.125000e-02
     0.000000e+00
     3.437500e-01
     6.250000e-02
     4.375000e-01
     1.250000e-01
     4 13
     6.250000e-02
     1.562500e-02
     0.000000e+00
     2.968750e-01
     3.125000e-02
     4.687500e-01
     1.875000e-01
     4 10 13
     3.125000e-01
     1.250000e-01
     2.343750e-02
     1.898438e+00
     6.406250e-01
     4.203125e+00
     2.109375e+00
    TESTPED2
     1
     4 7
     1.250000e-01
     3.125000e-02
     0.000000e+00
     3.437500e-01
     6.250000e-02
     4.375000e-01
     1.250000e-01

OUTPUT FILE FORMATS

The output files out1.dat, outsqr.dat, and out1p.dat are intended for use with the sim program. The formats of all of these files are the same (they differ in the function used to weight the allele frequencies: f(p) = 1, f(p) = 1/sqrt(p), f(p) = 1/p, respectively):

    <magic number>
    <number of pedigrees>   <number of loci>
    <number of alleles for locus #1>   <name of locus #1>
    <frequency of allele #1>
    <frequency of allele #2>
    .
    .
    <number of alleles for locus #2>   <name of locus #2>
    .
    .
    <title of pedigree #1>
    <number of members>   <number of affecteds>   <number of loci>
    <list of mothers>
    <list of fathers>
    <locus number>   <number of affecteds at this locus>
    <affected #1>
    <affected #2>
    .
    .
    <cumulative mean>   <cumulative variance>
    <locus number>   <number of affecteds at this locus>
    .
    .
    <title of pedigree #2>
    .
    .
    <statistic for first locus> <p-value>
    <statistic for second locus> <p-value>
    .
    .

The output file out1.dat for the above pedigree file looks like this:

             1
             2         3
             3    ACK1
       0.45000
       0.30000
       0.25000
             3    ACK2
       0.55000
       0.20000
       0.25000
             2    ACK3
       0.46500
       0.53500
       TESTPED1
                15             3             3
       0   0   2   2   2   0   0   0   5   5   5   7  12  12  12
       0   0   1   1   1   0   0   0   6   6   6   8  11  11  11
      1       3
           4
          10
          13
                 1.26656
                 0.27363
      2       2
           4
          13
                 0.44219
                 0.07280
      3       2
          10
          13
                 0.56464
                 0.05909
       TESTPED2
                 8             2             1
       0   0   1   1   1   0   6   6
       0   0   2   2   2   0   5   5
      2       2
           4
           7
                 0.47938
                 0.06961
       1.40212   0.08043
       0.20678   0.41808
      -0.26594   0.60486

In addition to those three files, a summary file is produced, named 'table.out'. It contains the locus information, the statistics and their p-values for all families, and the overall statistics and their p-values. The format is obvious; here is the file produced alongside out1.dat above:

    allele frequencies:

       ACK1       0.45000    0.30000    0.25000
       ACK2       0.55000    0.20000    0.25000
       ACK3       0.46500    0.53500

    family  mean  variance  observedx Na  statistic
       TESTPED1            <--- pedigree title
    LOCUS   1     ACK1
    f(p) = 1           1.26656   0.27363   2.00000    3   1.40212
    f(p) = 1/sqrt(p)   2.12586   0.70359   3.65148    3   1.81882
    f(p) = 1/p         3.62500   2.19444   6.66667    3   2.05329
    LOCUS   2     ACK2
    f(p) = 1           0.44219   0.07280   0.50000    2   0.21426
    f(p) = 1/sqrt(p)   0.68899   0.16435   1.00000    2   0.76717
    f(p) = 1/p         1.12500   0.54403   2.00000    2   1.18630
    LOCUS   3     ACK3
    f(p) = 1           0.56464   0.05909   0.50000    2  -0.26594
    f(p) = 1/sqrt(p)   0.79652   0.11693   0.73324    2  -0.18508
    f(p) = 1/p         1.12500   0.23499   1.07527    2  -0.10259
       TESTPED2            <--- pedigree title
    LOCUS   2     ACK2
    f(p) = 1           0.47938   0.06961   0.50000    2   0.07817
    f(p) = 1/sqrt(p)   0.75565   0.15779   1.00000    2   0.61515
    f(p) = 1/p         1.25000   0.55682   2.00000    2   1.00509

    f(p) = 1
    The statistic for locus    ACK1    for all    1 families is    1.40212  with p-value    0.08043
    The statistic for locus    ACK2    for all    2 families is    0.20678  with p-value    0.41808
    The statistic for locus    ACK3    for all    1 families is   -0.26594  with p-value    0.60486
    f(p) = 1/sqrt(p)
    The statistic for locus    ACK1    for all    1 families is    1.81882  with p-value    0.03447
    The statistic for locus    ACK2    for all    2 families is    0.97745  with p-value    0.16417
    The statistic for locus    ACK3    for all    1 families is   -0.18508  with p-value    0.57342
    f(p) = 1/p
    The statistic for locus    ACK1    for all    1 families is    2.05329  with p-value    0.02003
    The statistic for locus    ACK2    for all    2 families is    1.54955  with p-value    0.06062
    The statistic for locus    ACK3    for all    1 families is   -0.10259  with p-value    0.54087

NOTE: The p-values may be unreliable for small numbers of families. We recommend that you use the simulation program "sim" along with the histogram program "hist" to generate empirical p-values.

BUGS

They are all still hiding under their rocks. If you find one, please catch it and mail it to us!

REFERENCES

See the accompanying REFERENCES file.

APM Release 2.0 Last change: 5 Jul 1993


2. apmmult : compute the multiple-locus APM statistic (from MULT files)

DESCRIPTION

This program is designed to efficiently compute the multilocus APM statistic for a number of pedigrees over a number of marker loci. It is capable of storing the results of kinship calculations internally, greatly speeding up computations.

Internally storing the kinship values requires large amounts of memory. So that the program can be used on systems with limited memory, it includes a scheme to limit the amount of information stored. Unfortunately, some Pascal compilers halt with an error when a new() cannot allocate memory; this makes the memory limit both essential and of limited use. Users with those compilers may have to set a very small limit.

The program has also been modified to exclude parent-child pairs from the APM statistic. It has been found that this yields greater power in detecting linkage.

At some time in the future, this program will probably be combined with apm.

LIMITS

As distributed, the program can handle up to 50 families with up to 50 members each and 20 marker loci of up to 50 alleles each. Pedigree titles can be up to 80 characters long.

The maximum number of families, members, loci, and alleles can all be changed by altering constants at the beginning of the program. There are other constants that depend on these, as described in the comments, and they should be changed as well.

The problem, of course, with just making these numbers huge is that some compilers have limits on the amount of memory used for variables in each block.

INPUT FILE FORMATS

All affecteds must be typed at ALL markers. The program takes MULT format APM files (see the INTRO file for more details). The file format looks like this:

     <number of pedigrees>   <number of loci>
     <number of alleles at locus #1>   <name of locus #1>
     <list of allele frequencies for locus #1>
     <number of alleles at locus #2>   <name of locus #2>
     .
     .
     .
     <title of pedigree #1>
     <number of members>  <number of affecteds>
     <list of mothers>
     <list of fathers>
     <affected #1> <genotype at locus #1> <genotype at locus #2> ...
     <affected #2> <genotype at locus #1> <genotype at locus #2> ...
     .
     .
     .
     <title of pedigree #2>
     .
     .
     .

Here is a real example data file:

          2   3
       3   ACK1
    0.450  0.300  0.250
       3   ACK2
    0.550  0.200  0.250
       2   ACK3
    0.465   0.535
       TESTPED1
          15   3
       0 0 2 2 2 0 0 0 5 5 5 7 12 12 12
       0 0 1 1 1 0 0 0 6 6 6 8 11 11 11
     4    2 2   2 2   1 2
     10   3 3   1 3   2 2
     13   3 2   1 1   1 2
       TESTPED2
          8   2
       0 0 1 1 1 0 6 6
       0 0 2 2 2 0 5 5
     4    1 3   1 1   2 2
     7    2 3   3 3   1 1



OUTPUT FILE FORMATS

The output files out1.dat, outsqr.dat, and out1p.dat are intended for use with the simmult program. The formats of all of these files are the same (they differ in the function used to weight the allele frequencies: f(p) = 1, f(p) = 1/sqrt(p), f(p) = 1/p, respectively):

    <magic number>
    <number of pedigrees>   <number of loci>
    <number of alleles for locus #1>   <name of locus #1>
    <frequency of allele #1>
    <frequency of allele #2>
    .
    .
    <number of alleles for locus #2>   <name of locus #2>
    .
    .
    <title of pedigree #1>
    <number of members>   <number of affecteds>
    <list of mothers>
    <list of fathers>
    <affected #1>
    <affected #2>
    .
    .
    <list of cumulative means and variances for the loci>
    <list of observed X values for all loci>  <total multilocus variance>
    <title of pedigree #2>
    .
    .

The output file out1.dat for the above pedigree file looks like this:

          11
           2         3
           3   ACK1
     0.45000
     0.30000
     0.25000
           3   ACK2
     0.55000
     0.20000
     0.25000
           2   ACK3
     0.46500
     0.53500
     TESTPED1
              15             3
     0   0   2   2   2   0   0   0   5   5   5   7  12  12  12
     0   0   1   1   1   0   0   0   6   6   6   8  11  11  11
     4
     10
     13
               1.26656             0.27363             1.40094             0.35872
             1.66283             0.25792
               1.00000             0.50000             1.50000             0.89705
     TESTPED2
               8             2
     0   0   1   1   1   0   6   6
     0   0   2   2   2   0   5   5
     4
     7
               0.43562             0.05997             0.47938             0.06961
             0.56464             0.05909
               0.25000             0.00000             0.00000             0.19026

In addition to those three files, a summary file is produced, named 'table.out'. It contains the locus information, the statistics and their p-values for all families, and the overall statistics and their p-values. The format is obvious; the example file to go with the above pedigree file is not included here as it is rather lengthy.

Finally, there is one other output file, apmmult.out. This file consists of lines of the following format:

    <number identifying pedigree (1, 2, 3, ...)>
    <multilocus covariance, observed X, mean, variance, and total variance>

(that's all on one line).

BUGS

Please let us know if you find any!

REFERENCES

See the accompanying REFERENCES file.

APM Release 2.0 Last change: 5 Jul 1993


3. chapm: convert LINKAGE to APM and APM to APM

SYNTAX

Getting brief help:

     chapm {-help,-usage}

Reading from LINKAGE:

     chapm [-intype <type>]
           [-outtype <type>]
           [{-pedfile,-infile} <file>]
           [-locusfile <file>]
           [-outfile <file>]
           [-disease <locus #>]
           [-affdata "<string>"]
           [{-loci,-locus} "<string>"]
           [-check]
     chapm -quiet
           [-intype <type>]
           -outtype <type>
           [{-pedfile,-infile} <file>]
           -locusfile <file>
           [-outfile <file>]
           -disease <locus #>
           -affdata "<string>"
           [{-loci,-locus} "<string>"]
           [-check]

Reading from APM:

     chapm [-intype <type>]
           [-outtype <type>]
           [{-pedfile,-infile} <file>]
           [-outfile <file>]
           [{-loci,-locus} "<string>"]
           [-check]
     chapm -quiet
           -intype <type>
           -outtype <type>
           [{-pedfile,-infile} <file>]
           [-outfile <file>]
           [{-loci,-locus} "<string>"]
           [-check]

     [] = optional (otherwise required)
     {} = one of the items in this list
     <type> = a valid file type
     <file> = a file name
     <locus #> = a number between 1 and the number of loci
     "<string>" = a string enclosed in quotes (in all cases
                  the quotes may be omitted if there are no
                  spaces or special characters in the string)

DESCRIPTION

Chapm can read LINKAGE or any of the APM file formats (SL, ML, and MULT) and write any of the APM formats (see the INTRO file for a description of these formats). When reading LINKAGE, it uses other necessary information provided by the user (either through interactive input or command-line arguments) to determine which pedigree members are affected. For LINKAGE files, the locus types that are supported are Affection Status, Binary Factor, Numbered Alleles, and Quantitative Variable. Any of these types may be used to determine who is affected.

Since chapm is capable of more extensive checking of the pedigree and locus data than the APM analysis programs themselves, we recommend that you use chapm with the -check option before doing any analyses. Chapm can also be used to polish a data file which has some simple format problem, for example an ML file whose loci are in the wrong order. The polished file that it writes will be compatible with the APM programs.

The program may be used interactively or non-interactively. With the -quiet option, it suppresses all unimportant output and bypasses certain safety features (such as the prompt to confirm that the user wishes to overwrite a file). It can also read the pedigree file from standard input and write the output to standard output if the -quiet option is used.

Chapm performs checks for obvious errors whenever it reads new data. These checks are automatic and do not require the -check option. Also, loops that have been broken in LINKAGE files are automatically reconnected.

After the file has been read in (and after it has been internally converted to APM if it was LINKAGE), chapm can, at the user's request, change the order and number of marker loci. It will also perform more extensive checks on pedigree and locus integrity if the -check option was specified (see the description of the -check argument below).

Before writing the output file, chapm checks to see if any of the pedigrees need to be renumbered. The APM programs require that the ancestors of any given member have smaller ID numbers than the given member; if this is not the case for all members in a pedigree (as often happens when reading LINKAGE), then the pedigree must be renumbered.
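
The ordering requirement can be tested from the mother and father lists alone: if every parent has a smaller ID than its child, then so does every more distant ancestor. The following C sketch is not chapm's code. The function name is hypothetical, and it assumes (as the example files suggest) that members are numbered from 1 and that a parent ID of 0 marks a founder.

```c
/*
 * mothers[i] and fathers[i] hold the parent IDs of member i + 1
 * (0 = no parent recorded, assumed here to mean a founder).
 * Returns 1 if every parent has a smaller ID than the child, else 0.
 */
static int parents_precede_children(const int *mothers, const int *fathers,
                                    int n_members)
{
    int i;

    for (i = 0; i < n_members; i++) {
        int child = i + 1;

        if (mothers[i] >= child || fathers[i] >= child)
            return 0;   /* this pedigree would need renumbering */
    }
    return 1;
}
```

Applied to the 15-member TESTPED1 lists shown for apm, the function returns 1, so that pedigree would not need renumbering under this check.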

Also before writing, it deletes any pedigrees that have fewer than two typed affecteds. What actually happens depends on the output file format - when writing MULT and SL format files, a pedigree is deleted if it does not have at least two affecteds typed at all markers, but when writing ML format files, a pedigree is deleted if it does not have at least two affecteds typed at at least one marker. This is because the APM programs that use ML format can support affecteds which are not typed at all loci.
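
The two deletion rules can be sketched in C as follows. This is an illustration only, not chapm's implementation: the type and function names are hypothetical, the genotype table is assumed to be stored row by row (one row per locus, one column per affected), and a genotype is treated as typed only when both alleles are non-zero.

```c
/* One genotype: two allele numbers; "0 0" marks an untyped individual. */
typedef struct { int a1, a2; } Genotype;

static int is_typed(Genotype g) { return g.a1 != 0 && g.a2 != 0; }

/* ML rule: keep the pedigree if some locus has at least two typed affecteds.
 * geno[l * n_aff + k] is the genotype of affected k at locus l. */
static int keep_for_ml(const Genotype *geno, int n_loci, int n_aff)
{
    int l, k;

    for (l = 0; l < n_loci; l++) {
        int typed = 0;

        for (k = 0; k < n_aff; k++)
            if (is_typed(geno[l * n_aff + k]))
                typed++;
        if (typed >= 2)
            return 1;
    }
    return 0;
}

/* MULT/SL rule: keep if at least two affecteds are typed at every locus. */
static int keep_for_mult(const Genotype *geno, int n_loci, int n_aff)
{
    int l, k, complete = 0;

    for (k = 0; k < n_aff; k++) {
        int all = 1;

        for (l = 0; l < n_loci; l++)
            if (!is_typed(geno[l * n_aff + k]))
                all = 0;
        complete += all;
    }
    return complete >= 2;
}
```

For the first pedigree of the apm example file, only affected 13 is typed at all three loci, so under these assumptions the pedigree would be kept when writing ML but dropped when writing MULT.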

ARGUMENTS

ERROR MESSAGES

The error messages are meant to be as concise and informative as possible. The program, of course, can't always tell you what is wrong; most of the time it can only say what it thinks is wrong and describe the symptoms that it found.

NOTES ABOUT LIMITATIONS OF FILE FORMATS

Most notably, there is a disparity between MULT and ML format: in ML format, affecteds need not be typed at all loci, whereas in MULT format, all affecteds must be typed at all loci. You may, therefore, find that you are losing affecteds or whole pedigrees when converting from ML to MULT, because some affecteds are untyped at some loci. There is nothing inherently wrong with this, but it is something to keep in mind.

BUGS

As of this date, the support for Quantitative Variable loci has not been fully tested.

REFERENCES

See the accompanying REFERENCES file.

APM Release 2.0 Last change: 5 Jul 1993


4. fcmap: formatted data input

SYNOPSIS

#include <fcmap.h>

     int fcmap(stream, format [ , arg ] ... )
     FILE *stream;
     char *format;

DESCRIPTION

fcmap() is a working fscanf() kind of routine that uses reasonable semantics to provide functional data input. It reads from stream data of the types specified in the format, using the same convention as printf(). The return value is the last character read from the stream, or 0 if the last character has been used for something (for example, if you have read a character into an argument, and nothing else, no other characters would have been read from the stream - read on and it should begin to make sense). It will also return EOF at the end of the file.

One of the differences between this function and fscanf() is that the terminal characters of numbers may be used in constructing the next argument. For example, when reading an integer followed by a string, the first character of the string is lost using fscanf() (on most systems). This is because the first character of the string is used by the number-reading routine to terminate the number. This has far-reaching consequences, and it has caused me much trouble in the past. Say, for example, you have a line that has a number and, after the number, may or may not have comments. You want to skip the comments if they are there (just skip to the end of the line), but if the character which terminates the number is the actual newline, you can't just skip to a newline because you will skip the entire _next_ line! Very annoying.
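
The idea of handing the terminating character back to the caller can be illustrated in plain standard C. The sketch below is not fcmap() and makes no claim about how fcmap() is implemented; read_int and skip_rest_of_line are hypothetical names used only to show why knowing the terminator matters.

```c
#include <stdio.h>
#include <ctype.h>

/*
 * Skip to the next digit or '-', read a decimal integer, and return the
 * character that terminated it (or EOF if no number was found).
 */
static int read_int(FILE *fp, int *out)
{
    int ch, sign = 1, value = 0;

    while ((ch = getc(fp)) != EOF && !isdigit(ch) && ch != '-')
        ;                           /* skip anything before the number */
    if (ch == EOF)
        return EOF;
    if (ch == '-') {
        sign = -1;
        ch = getc(fp);
    }
    while (ch != EOF && isdigit(ch)) {
        value = value * 10 + (ch - '0');
        ch = getc(fp);
    }
    *out = sign * value;
    return ch;                      /* the terminator has been consumed */
}

/* Skip the rest of the line unless the terminator already was the newline. */
static void skip_rest_of_line(FILE *fp, int last_ch)
{
    while (last_ch != '\n' && last_ch != EOF)
        last_ch = getc(fp);
}
```

Calling skip_rest_of_line(fp, read_int(fp, &n)) after each number discards a trailing comment when there is one, and does nothing when the number was ended by the newline itself, so the following line is not lost.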

ARGUMENTS

The format string must be null-terminated. It may contain any characters, but the character '%' is reserved to indicate the type of the next input to be formatted. (If, however, the character following the '%' is another '%', then the combination is taken as a single literal '%').

The other arguments must be pointers to the correct types, except for those which correspond to "%=" and "%T" formats (see below).

Types are specified following a '%'; the functions of the supported type identifiers are:

d
Skip to the next digit (or '-') and read an integer terminated by a non-digit. Store in the integer argument.
f
Skip to the next digit (or '-') and read an integer terminated by a non-digit. If the last character read was a '.', read characters until a non-digit is reached, accumulating them into the fractional part. If the last character was an 'e' or an 'E', read the exponent as an integer. Store the result in the argument of type float.
g
Same as f except store the result in the argument of type double.
c
Read the next character and store in the character argument.
s
The string argument works slightly differently. The format is "%sT", where T is some character (it can also be the terminating null character, particularly if the "%s" is at the end of the format string). A string is read up to (but excluding) the character T, or, if T is the null character or a space, the next white space. The string is then null-terminated.

Other characters may appear in the format string. If they are spaces (exclusively ' ' characters) they are ignored, except in the cases where they follow a "%s" format qualifier (see above). This is to allow legibility of the format string. If they are anything else, the input stream is scanned until that character is read. Thus, you can scan to a marker which signifies the beginning of data, you can skip to the end of a line, and so on.

Two other functions are supported. If the string "%=" appears in the format, the next argument, which must be an integer (_not_ a pointer), is assigned to the last-character-read. This is useful, for example, if you expect to read numbers on a line from within a loop, then after the loop skip the remainder of the line. Your code might look something like this:

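          int lch = 0, entry, EntryCnt;   /* EntryCnt: number of entries expected */
          int Entry[MAXENTRY];            /* MAXENTRY: array bound chosen by the caller */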
          for (entry = 0; entry < EntryCnt; entry++)
              lch = fcmap(stdin, "%d", &(Entry[entry]));
          fcmap(stdin, "%=\n", lch);

This reads EntryCnt numbers from the input stream, saves the terminating character of the last number, and uses it (in case it actually is a newline) to skip to the next newline.

Also, if the format string contains "%T", then the next argument (which must also be an integer) becomes the "generic terminator". When a read is attempted and this termination character is encountered, the reading is aborted and the function returns. The termination character itself is returned. So if, for example, you wanted to read a line of integers separated by any amount of white space (save newlines), but you didn't know how many integers, and the line is terminated by a newline, you might do something like this:

          int n = 0, lch, dum;
          int i[MAXINTS];                 /* MAXINTS: array bound chosen by the caller */
          do {
              lch = fcmap(stdin, "%T%d", '\n', &dum);
              if (lch == '\n') break;
              i[n++] = dum;
          } while ((lch != '\n') && (lch != EOF));

After this little routine, n contains the number of integers and i[] contains the actual integers. It's still a little ugly, but it's better than most other ways.

OTHER INTERFACE BITS

There are also two integer variables that are used by fcmap() and that your programs can use, and for which macros are provided to help you:

fcmap_nargs
contains the number of operations performed (operations that count include all '%' args (even "%=" and "%T") and all characters that are searched for (the string termination character not included)). So, "%T%sX%d\n" will result in fcmap_nargs <= 4 (%T, %s, %d, and '\n'). If the termination character is read or EOF is encountered, you can use this variable to see how much data was actually successfully read.
fcmap_stat
contains status information (whether the last character and termination character are defined, and whether the routine stopped because of the termination character or EOF being reached)

Both of these are cleared each time fcmap() is called.

The macros provided are:

FCMAP_NARGS
simply the same as fcmap_nargs
FCMAP_TERM
non-zero if the termination character or EOF was encountered
FCMAP_TCHDEF
whether the termination character was/is defined (non-zero if true)
FCMAP_LCHDEF
whether there is a "last character read" that may be used for other input

With these macros, you might have written the last example routine (the one which reads any number of integers terminated by a newline) like:

          int n = 0, lch, dum;
          int i[MAXINTS];                 /* MAXINTS: array bound chosen by the caller */
          do {
              lch = fcmap(stdin, "%T%d", '\n', &dum);
              if (FCMAP_NARGS < 2) break;
              i[n++] = dum;
          } while ((lch != '\n') && (lch != EOF));

And you would probably get better results.

BUGS

Field widths and string lengths are currently not supported. Many other things are not supported (like unsigned's, octals, hexadecimals, etc.). (They'll be added in as the needs arise.)

Zoot Release 1.0 Last change: 17 May 1993


5. hist: generate statistics and a small ascii histogram

DESCRIPTION

This program takes a file of numbers separated by spaces (or it takes them from stdin if requested) and generates some statistical figures and a histogram. It can also compute empirical p-values.

At least two samples are required. All output goes to the standard output.

ARGUMENTS

usage:

hist [-p {-,<file of statistics>}] [-s] {-,<file of samples>}

When using with APM, the "file of samples" is usually one of the tstat*.out files containing the simulated statistics, and the "file of statistics" is generally not used (though "-p -" is sometimes used to enter statistics manually to compute empirical p-values (see below)).

If -p is specified, the program will compute empirical p-values after everything else is done. If the p-value statistics are being taken from standard input (as directed by using '-' instead of the file name), it will prompt for them.
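
An empirical p-value of this kind is usually the fraction of simulated statistics that are at least as large as the observed one. The C sketch below shows that calculation under that assumption; how hist itself treats ties or rounds the result is not stated here, and the name empirical_p is hypothetical.

```c
/*
 * Fraction of simulated statistics that are at least as large as the
 * observed statistic. Returns -1.0 if there are no samples.
 */
static double empirical_p(const double *samples, int n, double observed)
{
    int i, hits = 0;

    if (n <= 0)
        return -1.0;
    for (i = 0; i < n; i++)
        if (samples[i] >= observed)
            hits++;
    return (double)hits / (double)n;
}
```

With the five samples -1.0, 0.0, 0.5, 1.0, and 2.0 and an observed statistic of 1.0, two samples qualify and the function returns 0.4.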

The -s option is required only if -p is specified, to avoid ambiguity. If -p is not used, just a file name may be given.

If -p is specified, then there must be at least one file specification (for either the p-value statistics or the samples). Both kinds of data cannot be read in from the standard input.

EXAMPLES

    'hist <file>' reads samples from <file> and does not compute
       empirical p-values
    'hist -p <file> -s -' reads samples from stdin and statistics for
       empirical p-values from <file>
    'hist -p <file 1> -s <file 2>' reads statistics for p-values from
       <file 1> and samples from <file 2>
    'hist -p - -s <file>' reads the samples from <file> and the
       statistics for the p-values from stdin
    'hist' is illegal
    'hist -p - -s -' is illegal (both files can't be unspecified)

BUGS

None known.

REFERENCES

Just myself. But for references regarding APM, see the accompanying REFERENCES file.

APM Release 2.0 Last change: 23 Aug 1993


6. sim: simulate pedigrees using apm program output

DESCRIPTION

This program simulates pedigrees by generating random genotypes for each member over a number of iterations. It computes the statistic over all families for each locus and for each iteration, and then it computes the mean and variance of these statistics.
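
How sim draws its random genotypes is not described here. One common building block, shown below as a C sketch with the hypothetical name sample_allele, is to pick an allele from the population frequencies by walking the cumulative distribution with a uniform random number; a genotype for a founder would then be two such draws. The uniform value is passed in by the caller so that the function does not depend on any particular random number generator.

```c
/*
 * Map a uniform variate u in [0, 1) to an allele number (1-based)
 * using the cumulative allele frequencies.
 */
static int sample_allele(const double *freq, int n_alleles, double u)
{
    double cum = 0.0;
    int a;

    for (a = 0; a < n_alleles; a++) {
        cum += freq[a];
        if (u < cum)
            return a + 1;
    }
    return n_alleles;   /* guard against rounding when u is close to 1 */
}
```

With the ACK1 frequencies 0.45, 0.30, and 0.25, the values u = 0.2, 0.5, and 0.8 select alleles 1, 2, and 3 respectively.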

The program has also been modified to exclude parent-child pairs from the APM statistic. It has been found that this yields greater power in detecting linkage, and since parent-child pairs are excluded in the apm program, they must be excluded here as well.

INPUT FILE FORMATS

The output files out1.dat, outsqr.dat, and out1p.dat produced by apm are intended for use with the sim program. The formats of all of these files are the same (they differ in the function used to weight the allele frequencies: f(p) = 1, f(p) = 1/sqrt(p), f(p) = 1/p, respectively):

    <magic number>
    <number of pedigrees>   <number of loci>
    <number of alleles for locus #1>   <name of locus #1>
    <frequency of allele #1>
    <frequency of allele #2>
    .
    .
    <number of alleles for locus #2>   <name of locus #2>
    .
    .
    <title of pedigree #1>
    <number of members>   <number of affecteds>   <number of loci>
    <list of mothers>
    <list of fathers>
    <locus number>   <number of affecteds at this locus>
    <affected #1>
    <affected #2>
    .
    .
    <cumulative mean>   <cumulative variance>
    <locus number>   <number of affecteds at this locus>
    .
    .
    <title of pedigree #2>
    .
    .
    <statistic for first locus> <p-value>
    <statistic for second locus> <p-value>
    .
    .

OUTPUT FILE FORMATS

There is one output file for each locus, named tstat<n>.out (where <n> is the locus number). Each file contains the simulated statistics for each iteration - useful if you need them for input to a statistics package.

There is also one output file, sim.out, which is appended to each time you run sim. It contains all the results of the runs that you normally see on the standard output.

BUGS

There aren't any that I know of.

REFERENCES

See the accompanying REFERENCES file.

APM Release 2.0 Last change: 5 Jul 1993


References

Hall JM, Lee MK, Newman B, Morrow JE, Anderson LA, Huey B, King M-C (1990) Linkage of early-onset familial breast cancer to chromosome 17q21. Science 250:1684-1689

Lange K (1986a) The affected sib-pair method using identity by state relations. Am J Hum Genet 39:148-150

Lange K (1986b) A test statistic for the affected-sib-set method. Ann Hum Genet 50:283-290

Lange K, Weeks DE (1990) Linkage methods for identifying genetic risk factors. World Rev Nutr Diet 63:236-249

Weeks DE, Lange K (1988) The affected-pedigree-member method of linkage analysis. Am J Hum Genet 42:315-326

Weeks DE, Lange K (1991) An overview of the affected-pedigree-member method of linkage analysis. Proceedings of the 23rd symposium on the interface, Seattle, Interface Foundation of North America, pp. 386-391

Weeks DE, Lange K (1992) A multilocus extension of the affected-pedigree-member method of linkage analysis. Am J Hum Genet 50:859-868

Schroeder MD, Brown DL, Weeks DE (1994) Improved programs for the affected-pedigree-member method of linkage analysis. Genet Epidemiol 11:69-74


This html file is prepared by Frank Visser <fvisser@hgmp.mrc.ac.uk>
and slightly modified by Wentian Li <wli@linkage.cpmc.columbia.edu>
Sept 1995

