I am more than flabbergasted by calculations of Hardy-Weinberg equilibria. The theory behind the formula assumes a binomial distribution of allele frequencies in a population, and hence allows the comparison of observed phenotype frequencies to an ideal model distribution.
$$p^2 + 2pq + q^2 = 1$$
with
$p^2$ = Frequency of homozygous phenotype PP
$2pq$ = Frequency of heterozygous phenotype PQ
$q^2$ = Frequency of homozygous phenotype QQ
There are two different methods of working with the HW equation that I have come across, and they do not seem to be compatible; used on the same data sets they yield wildly different results. I feel like I'm not seeing the forest for the trees, to be honest.
Is either of these methods (technically) wrong? How do I decide which one to use for each data set (i.e. is it valid to use method #1 on 3-allele-data sets)? Or do I just use whatever method is easiest? Doesn't this lead to discrepancies in the results? This seems like a big thing, especially when comparing real-world observation data to assumed HW distributions.
Example data set (EST/California) and results of HW calculations. I'm using observed phenotype in the place of the original data's observed genotype.
As suggested by above-linked data set source and Beebee/Rowe's Introduction to Molecular Ecology, alleles are simply added up to get values for p and q respectively:
From this we can directly infer $p$ and $q$: $p = 94/(94+34) \approx 0.73$ and $q = 34/(94+34) \approx 0.27$, with the condition $p + q = 1$ remaining true.
Using these values for p and q in the HW equation yields:
$$p^2 + 2pq + q^2 = 1$$
$$(0.73)^2 + 2(0.73)(0.27) + (0.27)^2 = 1$$
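And a population-wide phenotype distribution of $p^2 = (0.73)^2 \approx 0.53$ (PP), $2pq = 2(0.73)(0.27) \approx 0.39$ (PQ) and $q^2 = (0.27)^2 \approx 0.07$ (QQ).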
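To make the allele-counting arithmetic easier to check, here is a minimal Python sketch of this first method. The genotype counts (37 PP, 20 PQ, 7 QQ) are an assumption on my part: they are inferred from the 94/34 allele totals and the 37/64 and 7/64 figures quoted in this post, not copied from the source table. The variable names are only illustrative.

```python
# Assumed genotype counts, inferred from the figures quoted in the post
# (94 P alleles, 34 Q alleles, 37/64 PP, 7/64 QQ); not taken from the source table.
n_PP, n_PQ, n_QQ = 37, 20, 7
n = n_PP + n_PQ + n_QQ            # number of individuals

# Allele counting: each individual carries two alleles,
# so homozygotes contribute two copies and heterozygotes one of each.
count_P = 2 * n_PP + n_PQ
count_Q = 2 * n_QQ + n_PQ
total_alleles = count_P + count_Q

p = count_P / total_alleles
q = count_Q / total_alleles

# Hardy-Weinberg expectation for these allele frequencies.
expected_freq = {"PP": p ** 2, "PQ": 2 * p * q, "QQ": q ** 2}
expected_count = {g: f * n for g, f in expected_freq.items()}

print(f"p = {p:.3f}, q = {q:.3f}, p + q = {p + q:.3f}")
for genotype, freq in expected_freq.items():
    print(f"{genotype}: frequency {freq:.3f}, expected count {expected_count[genotype]:.1f}")
```

With this method $p + q = 1$ holds by construction, because both frequencies are divided by the same total allele count.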
The second method of utilising the HW equation is a lot more direct, and just about every textbook I could find uses this method to solve problems with three or more alleles (blood types, for example); Wikipedia commends its usefulness in calculating heterozygote frequencies of genetic diseases in large populations.
In this calculation frequencies are directly inferred as per the definition of the HW equation. This means $p^2$ is equal to the frequency of the phenotype PP (see above).
$$p^2 = \frac{37}{64}$$
$$p = \sqrt{0.578}$$
$$p \approx 0.76$$
$$q = 1 - p = 0.24$$
Leading to a HW distribution of
$$(0.76)^2 + 2(0.76)(0.24) + (0.24)^2 = 1$$
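With a new ideal phenotype distribution: $p^2 = (0.76)^2 \approx 0.58$ (PP), $2pq = 2(0.76)(0.24) \approx 0.36$ (PQ) and $q^2 = (0.24)^2 \approx 0.06$ (QQ).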
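The same calculation as a short Python sketch, in case it helps to reproduce the numbers. It only uses the 37 out of 64 PP individuals quoted above; the variable names are my own.

```python
import math

# Observed: 37 of 64 individuals show the homozygous PP phenotype (as quoted above).
n_PP, n = 37, 64

# Second method: treat the observed PP frequency as p^2 and take the square root.
p = math.sqrt(n_PP / n)
q = 1 - p                         # q follows from p + q = 1, not from the data

# Hardy-Weinberg expectation for these values of p and q.
expected_freq = {"PP": p ** 2, "PQ": 2 * p * q, "QQ": q ** 2}

print(f"p = {p:.2f}, q = {q:.2f}")
for genotype, freq in expected_freq.items():
    print(f"{genotype}: frequency {freq:.3f}, expected count {freq * n:.1f}")
```

Note that the expected PP count reproduces the observed 37 exactly here, simply because $p$ was derived from that one observation.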
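Inverting the calculation by starting with $q^2 = 7/64$ (so $q = \sqrt{7/64} \approx 0.33$ and $p = 1 - q \approx 0.67$) leads to even more divergent results: $p^2 \approx 0.45$ (PP), $2pq \approx 0.44$ (PQ) and $q^2 \approx 0.11$ (QQ).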
Taking $p$ from $p^2$ and $q$ from $q^2$ at the same time inevitably violates the condition $(p+q)^2 = 1$: here $p + q \approx 0.76 + 0.33 = 1.09$.
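To put the discrepancy side by side, here is a rough Python sketch that computes $p$ in all three ways from the same data. The heterozygote count of 20 is again an assumption, inferred from the 64 individuals and the 94/34 allele totals rather than read from the source table.

```python
import math

# Assumed genotype counts (PP, PQ, QQ); 37 and 7 are quoted in the post,
# 20 is inferred from the total of 64 individuals.
n_PP, n_PQ, n_QQ = 37, 20, 7
n = n_PP + n_PQ + n_QQ

# Three ways of estimating p from the same observations.
p_counting = (2 * n_PP + n_PQ) / (2 * n)     # method 1: count alleles
p_from_PP = math.sqrt(n_PP / n)              # method 2: p = sqrt(freq(PP))
p_from_QQ = 1 - math.sqrt(n_QQ / n)          # method 2 inverted: q = sqrt(freq(QQ))

# Taking both square roots at once: does p + q still equal 1?
root_sum = math.sqrt(n_PP / n) + math.sqrt(n_QQ / n)

print(f"p (allele counting)   = {p_counting:.3f}")
print(f"p (from PP frequency) = {p_from_PP:.3f}")
print(f"p (from QQ frequency) = {p_from_QQ:.3f}")
print(f"sqrt(PP) + sqrt(QQ)   = {root_sum:.3f}")
```

The three estimates only coincide when the observed heterozygote frequency equals $2\sqrt{f_{PP}\,f_{QQ}}$, which is not the case for the counts assumed here.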