Publication | Closed Access
Automated binning of microsatellite alleles: problems and solutions
Citations: 1.5K
References: 16
Year: 2006
Genetic Testing, Genetics, Genomics, Raw Allele Lengths, Effective Repeat Length, Genetic Analysis, Automated Binning, Molecular Ecology, Computational Genomics, Biostatistics, Public Health, Haplotype Determination, DNA Sequencing, Allele Classes, Sequence Analysis, Statistical Genetics, Genetic Variation, Population Genetics, Bioinformatics, Computational Biology, Medicine
Automated microsatellite genotyping must guard against a rise in allele-calling errors. Standard binning methods usually assume collinearity between expected and measured fragment lengths, but this collinearity is often only approximate, so alleles do not follow a perfect 2-, 3- or 4-base-pair periodicity. To reduce the resulting binning errors, the authors introduce a method that allows repeat units to be fractionally shorter or longer than their theoretical length instead of forcing whole-base increments. Tested on a large human data set, the algorithm performs well across a wide range of dinucleotide repeat loci. The scale of the underlying problem is shown by the fact that the effective repeat length was within 5% of the assumed length only 68.3% of the time.
Abstract: As genotyping methods move ever closer to full automation, care must be taken to ensure that there is no equivalent rise in allele‐calling error rates. One clear source of error lies with how raw allele lengths are converted into allele classes, a process referred to as binning. Standard automated approaches usually assume collinearity between expected and measured fragment length. Unfortunately, such collinearity is often only approximate, with the consequence that alleles do not conform to a perfect 2‐, 3‐ or 4‐base‐pair periodicity. To account for these problems, we introduce a method that allows repeat units to be fractionally shorter or longer than their theoretical value. Tested on a large human data set, our algorithm performs well over a wide range of dinucleotide repeat loci. The size of the problem caused by sticking to whole numbers of bases is indicated by the fact that the effective repeat length was within 5% of the assumed length only 68.3% of the time.
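The idea of a fractional repeat length can be illustrated with a minimal Python sketch. This is not the authors' published algorithm, whose details are not given here. It is a simple grid search, under the assumption that a good effective repeat length and offset are the ones that place raw fragment lengths closest to evenly spaced bin centres. The function name `fit_effective_repeat` and all of its parameters are hypothetical.

```python
def fit_effective_repeat(lengths, nominal_repeat=2.0, max_dev=0.15,
                         step=0.005, offset_steps=40):
    """Illustrative grid search (an assumption, not the published method).

    Tries repeat lengths within +/- max_dev (as a fraction) of the nominal
    repeat, and offsets within one repeat unit, and keeps the pair that
    minimises the squared distance from each raw length to its nearest
    bin centre (offset + k * repeat).
    """
    best = None
    n_steps = int(round(2 * max_dev * nominal_repeat / step))
    for i in range(n_steps + 1):
        repeat = nominal_repeat * (1 - max_dev) + i * step
        for j in range(offset_steps):
            offset = j * repeat / offset_steps
            cost = 0.0
            for x in lengths:
                k = round((x - offset) / repeat)  # nearest bin index
                cost += (x - (offset + k * repeat)) ** 2
            if best is None or cost < best[0]:
                best = (cost, repeat, offset)

    _, repeat, offset = best
    # Allele classes expressed as integer bin indices.
    bins = [round((x - offset) / repeat) for x in lengths]
    return repeat, offset, bins


# Synthetic dinucleotide locus whose alleles are spaced 1.9 bp apart
# rather than the nominal 2 bp (made-up values for illustration only).
raw = [100.3 + 1.9 * m for m in (0, 1, 2, 5, 8, 10, 15)]
repeat, offset, bins = fit_effective_repeat(raw)
relative_bins = [b - bins[0] for b in bins]
```

On this synthetic input, the fitted repeat lands close to the 1.9 bp spacing and `relative_bins` recovers the repeat-count differences. Assuming a fixed 2 bp spacing would drift by 0.1 bp per repeat, which is how alleles far from the reference end up in the wrong class.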