Perl as a Tool for Linkage Analysis

(Meeting Abstract)

W. Li, F. Haghighi

American Journal of Human Genetics, suppl.65, A260 (1999)


For an analyst, linkage analysis entails a series of steps: preparing input files that specify both the disease model and the pedigree structure, and manipulating output files to obtain the desired results. These steps are often repetitive, and when carried out manually they can be extremely time-consuming and error-prone. We propose to automate such tasks with the aid of Perl scripts. Perl is a programming language that allows easy manipulation of numbers and text, files and directories, and external programs. It is very portable: besides UNIX, it is available on MS-DOS, Windows 95/NT, Macintosh, and many other operating systems. Perl is widely used in computer system administration, Web-based applications, and bioinformatics.

We describe three examples where Perl scripts are used to facilitate linkage analysis: (1) Genome-wide linkage analysis may be automated given the adoption of a consistent file-naming convention. This involves analyzing a large pool of markers covering all 23 chromosomes, where we iterate over the chromosomes and analyze the marker data using programs from the LINKAGE package or GENEHUNTER. (2) Multiple models are often considered in the effort to find the gene(s) underlying complex disorders with unknown etiologies. These models can range in complexity and can include combinations of many factors, such as different levels of disease severity, age- and sex-dependent penetrances, and environmental effects. For traditional linkage analysis, the input files, namely the parameter file defining the disease model (e.g. "datafile.dat") and the pedigree file corresponding to the different liability classes (e.g. "pedfile.dat"), can be quickly generated by a Perl script from a user-defined disease model. (3) In assessing a putative linkage result for a marker, it is useful to examine the ranking and magnitude of individual family lod scores. The rankings may be obtained by using a Perl script to parse the linkage output file (e.g. "final.out" or "outfile.dat") and extract the maximum lod score for each family.
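As an illustration of example (1), the sketch below shows how a consistent file-naming convention lets a script enumerate the per-chromosome input files. It is written in Python purely for illustration and is not one of the authors' Perl scripts; the same loop is equally short in Perl. The names "chr<N>.ped" and "chr<N>.dat" and the helper functions are assumptions, not the convention used by the authors.

```python
from pathlib import Path

# Chromosomes 1..23, matching the "all 23 chromosomes" in the text.
CHROMOSOMES = range(1, 24)


def input_files(chrom, data_dir="."):
    """Return the (pedigree, parameter) file paths for one chromosome.

    The chr<N>.ped / chr<N>.dat pattern is a hypothetical convention
    chosen for this sketch.
    """
    base = Path(data_dir)
    return base / f"chr{chrom}.ped", base / f"chr{chrom}.dat"


def plan_genome_scan(data_dir="."):
    """List (chromosome, pedigree file, parameter file) for every
    chromosome whose two input files are both present."""
    jobs = []
    for chrom in CHROMOSOMES:
        ped, dat = input_files(chrom, data_dir)
        if ped.exists() and dat.exists():
            jobs.append((chrom, ped, dat))
    return jobs


if __name__ == "__main__":
    for chrom, ped, dat in plan_genome_scan():
        # A real script would launch the analysis program at this point.
        # The command line is left out because it depends on the local
        # installation and on the program being used.
        print(f"chromosome {chrom}: {ped} {dat}")
```

Because the file names are derived from the chromosome number alone, adding or removing a chromosome requires no change to the script, only to the files present in the directory.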

The Perl scripts featured here and others are available to the research community at http://linkage.rockefeller.edu/soft/perl.