| LECTURE 3: INTRODUCTION TO GENETIC ASSOCIATION ANALYSIS |
| | aa | aA | AA |
| cases | 10 | 190 | 800 |
| controls | 3 | 100 | 900 |

| | allele "a" | allele "A" |
| case | 210 | 1790 |
| control | 106 | 1900 |

| | aa+aA | AA |
| case | 200 | 800 |
| control | 103 | 900 |

| | aa | aA+AA |
| case | 10 | 990 |
| control | 3 | 1000 |
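The three 2-by-2 tables all follow from the genotype counts. As a minimal sketch (the function names are illustrative, not from the lecture), the collapsing can be written as:

```python
# Genotype counts copied from the genotype table, ordered (aa, aA, AA).
cases = (10, 190, 800)
controls = (3, 100, 900)

def allele_row(genotypes):
    """Count alleles: each aa carries two "a", each aA one "a" and one "A"."""
    aa, aA, AA = genotypes
    return (2 * aa + aA, aA + 2 * AA)

def carrier_row(genotypes):
    """Collapse genotypes to (aa+aA, AA)."""
    aa, aA, AA = genotypes
    return (aa + aA, AA)

def homozygote_row(genotypes):
    """Collapse genotypes to (aa, aA+AA)."""
    aa, aA, AA = genotypes
    return (aa, aA + AA)

allele_table = [allele_row(cases), allele_row(controls)]
carrier_table = [carrier_row(cases), carrier_row(controls)]
homozygote_table = [homozygote_row(cases), homozygote_row(controls)]

print(allele_table)      # [(210, 1790), (106, 1900)]
print(carrier_table)     # [(200, 800), (103, 900)]
print(homozygote_table)  # [(10, 990), (3, 1000)]
```

Note that the allele table counts chromosomes, so its total (4006) is twice the number of individuals (2003).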
Pearson's chi-square test on the 2-by-2 tables gives:
p-value = 9.1 x 10^-10 (allele 2-by-2 table) (or 1.3 x 10^-9 if Yates' correction is used)
p-value = 1.2 x 10^-9 (genotype, recessive; table aa+aA vs. AA) (or 1.8 x 10^-9 if Yates' correction is used)
p-value = 0.051 (genotype, dominant; table aa vs. aA+AA) (or 0.094 if Yates' correction is used)
min(p-value(rec), p-value(dom)) = 1.2 x 10^-9
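These p-values can be checked with a few lines of code. The sketch below is one way to do it, assuming the standard shortcut formula for a 2-by-2 chi-square statistic; the function name `chi_square_2x2` is illustrative. With 1 degree of freedom the tail probability reduces to the complementary error function, so only the standard library is needed.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square statistic and p-value (1 df) for [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction shrinks |ad - bc| by n/2 (not below 0).
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

tables = {
    "allele": (210, 1790, 106, 1900),
    "aa+aA vs. AA": (200, 800, 103, 900),
    "aa vs. aA+AA": (10, 990, 3, 1000),
}
for name, cells in tables.items():
    _, p = chi_square_2x2(*cells)
    _, p_yates = chi_square_2x2(*cells, yates=True)
    print(f"{name}: p = {p:.2g}, with Yates' correction p = {p_yates:.2g}")
```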
[COMMENTS: An alternative to Pearson's chi-square test is Fisher's exact test, which is more accurate for small sample sizes. ]
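As a rough illustration of Fisher's exact test, the sketch below uses one common two-sided convention: with all margins fixed, sum the hypergeometric probabilities of every table that is no more probable than the observed one. The function name is illustrative, and statistical packages may differ in how they define the two-sided p-value.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # The small relative tolerance keeps floating-point ties in the sum.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# The table with the smallest cell counts (aa vs. aA+AA) is where the
# exact test matters most.
print(fisher_exact_2x2(10, 990, 3, 1000))
```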
Odds ratios (OR) for the three 2-by-2 tables:
OR = (210/1790)/(106/1900) = 2.10 (allele)
OR = (200/800)/(103/900) = 2.18 (aa+aA vs. AA)
OR = (10/990)/(3/1000) = 3.37 (aa vs. aA+AA)
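The odds ratio arithmetic can be reproduced with a short sketch (the function name is illustrative). For the third table the ratio is 10000/2970, which comes to about 3.37.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a/b)/(c/d) for [[a, b], [c, d]]; rows are case and control."""
    return (a / b) / (c / d)

print(round(odds_ratio(210, 1790, 106, 1900), 2))  # 2.1
print(round(odds_ratio(200, 800, 103, 900), 2))    # 2.18
print(round(odds_ratio(10, 990, 3, 1000), 2))      # 3.37
```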
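[COMMENTS: The odds ratio is an approximation of the "relative risk", defined as Prob(affected|risk)/Prob(affected|no risk). The approximation is good when the disease is rare. Because of the case-control sample collection design, in which the numbers of cases and controls are fixed by the investigator, the relative risk cannot be calculated directly. ]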
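The link between the two quantities can be made explicit. In the sketch below, D (affected) and E (carrying the risk genotype or allele) are notation introduced here, not taken from the lecture:

```latex
\mathrm{RR} = \frac{P(D \mid E)}{P(D \mid \bar{E})},
\qquad
\mathrm{OR}
  = \frac{P(D \mid E) \,/\, \bigl(1 - P(D \mid E)\bigr)}
         {P(D \mid \bar{E}) \,/\, \bigl(1 - P(D \mid \bar{E})\bigr)}
  = \mathrm{RR} \cdot \frac{1 - P(D \mid \bar{E})}{1 - P(D \mid E)} .
```

When the disease is rare, both P(D|E) and P(D|not E) are small, the last factor is close to 1, and OR is approximately RR.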
The OR value by itself is not enough; we also need to know its range (confidence interval). An R/S-PLUS script is available for this calculation. The formula is due to: B Woolf (1955), "On estimating the relation between blood group and disease", Annals of Human Genetics, 19:251-253. For the above three 2-by-2 tables:
95%CI: (1.65, 2.68)
95%CI: (1.69, 2.82)
95%CI: (0.92, 12.27)
[COMMENTS: when the interval contains 1 (one bound is smaller than 1 and the other larger than 1),
the result is not significant at the 0.05 level. ]
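The intervals above can be reproduced with a short sketch of Woolf's method, assuming its usual form: the log odds ratio plus or minus 1.96 standard errors, with standard error sqrt(1/a + 1/b + 1/c + 1/d). The function name and the default `z=1.96` are illustrative choices.

```python
import math

def woolf_ci(a, b, c, d, z=1.96):
    """Woolf confidence interval for the odds ratio of [[a, b], [c, d]]."""
    log_or = math.log((a * d) / (b * c))
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

for cells in [(210, 1790, 106, 1900), (200, 800, 103, 900), (10, 990, 3, 1000)]:
    low, high = woolf_ci(*cells)
    verdict = "not significant" if low < 1 < high else "significant"
    print(f"95%CI: ({low:.2f}, {high:.2f}) -> {verdict} at the 0.05 level")
```

The formula needs all four cell counts to be non-zero; a table with an empty cell would need a different treatment.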
| From "Hydrogen" to "Oxygen": What to Expect from the Next Two Lectures? |
1. If we know which gene is responsible for the disease, and if we
can "see" the gene and know whether it contains a mutation or not,
the association analysis is direct. However, the marker we
examine may not have a one-to-one correspondence with the
mutation status at the disease gene. The marker lies at some
distance from the disease gene, and the linkage disequilibrium
between them may not be complete. The marker is typed in terms
of genotype, and "phase" is usually unknown.
2. What we consider a homogeneous group of people from
which the control samples are collected may not be homogeneous
after all. The same holds for the case sample population.
How do we deal with heterogeneity?