President's Corner - Paralogous Genes

One of the greatest technical challenges facing clinical DNA testing labs these days is the sequencing of genes that are present in more than one copy in the haploid genome. Such genes are often called paralogs. Perhaps the toughest situation is a functional gene adjacent to a pseudogene with nearly identical sequence, where the similarity extends throughout both exons and introns. Such genes are susceptible to gene conversion events, to the formation of hybrids between the functional gene and the pseudogene, and to copy number variation in the functional gene. Testing such genes is very complicated. My strong hunch is that no clinical lab today does a perfect job of assaying them.
The problem of paralogs is highly relevant to exome and genome sequencing. Labs need to know which genes can be sequenced accurately and which cannot. At PreventionGenetics we have tackled this problem head-on.
We have carefully constructed a list of paralogous genes using multiple informatics approaches. These approaches involve both BLAT and next-generation short-read mapping algorithms. Our geneticists know when a called variant is in a paralogous region. In cases where we have validated an accurate Sanger sequencing test for one of these paralogous regions, we of course confirm the variant using Sanger sequencing. We have already validated tests for some of the paralogous genes most commonly involved in clinical genetics, such as PKD1, CYP21A2 and NEB. In cases where we have not yet validated a test, we do not report the variants. We are also gradually whittling down the list of paralogous genes that we cannot yet accurately sequence by tweaking our analysis pipelines and validating new tests.
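For readers who think in code, the reporting rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not our production pipeline: the `ParalogousRegion` record, the `triage_variant` function, the action labels, the placeholder gene `GENE_X` and all coordinates are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ParalogousRegion:
    """One entry in a hypothetical curated list of paralogous regions."""
    gene: str
    chrom: str
    start: int              # 1-based, inclusive (placeholder value)
    end: int                # 1-based, inclusive (placeholder value)
    sanger_validated: bool  # True if a validated Sanger test exists


# Placeholder entries only: the coordinates are NOT real genomic positions.
REGIONS: List[ParalogousRegion] = [
    ParalogousRegion("PKD1", "chr16", 1000, 2000, sanger_validated=True),
    ParalogousRegion("GENE_X", "chr1", 5000, 6000, sanger_validated=False),
]


def find_region(chrom: str, pos: int,
                regions: List[ParalogousRegion]) -> Optional[ParalogousRegion]:
    """Return the first listed paralogous region containing the position."""
    for region in regions:
        if region.chrom == chrom and region.start <= pos <= region.end:
            return region
    return None


def triage_variant(chrom: str, pos: int,
                   regions: List[ParalogousRegion] = REGIONS) -> str:
    """Decide how a called variant is handled under the rule in the text."""
    region = find_region(chrom, pos, regions)
    if region is None:
        return "standard_review"    # not in a listed paralogous region
    if region.sanger_validated:
        return "confirm_by_sanger"  # validated test exists: confirm first
    return "do_not_report"          # no validated test yet: withhold
```

With these placeholder entries, a variant inside the `PKD1` interval would be routed to Sanger confirmation, one inside `GENE_X` would be withheld, and one outside every listed interval would follow the ordinary review path.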
In the more distant future, we hope that long-read sequencing technology and de novo sequence assembly will come to the rescue. Until then, both clinicians and testing labs will need to use extreme caution.