Exomiser - Using model organisms for deeper genomic insights
Every one of us possesses variants throughout our genetic code. Millions of them. Most are harmless, some may even be beneficial, but a tragic fraction carry the devastating consequences of rare and potentially deadly diseases. It is the identification of these dangerous, pathogenic variants that is the basis for genomic medicine.
Whereas a genome contains all of the genetic material of an organism - 3.2 billion letters, or ‘base pairs’, in a human - an exome is the part formed by exons. These are the parts of the genome that are retained in mature RNA after splicing, and that largely encode proteins. The typical exome makes up just 2% of the genome, yet commonly contains more than 30,000 variants. When an exome is sequenced and analysed, we therefore identify tens of thousands of exomic variants relative to the human reference (or ‘normal’) genome.
The challenge therefore becomes sorting through them and deciding which variants are harmless and which one is causing the disease.
SIFTing for a solution
One solution has come in the form of the Java-based, open-source tool Exomiser, which uses algorithms to annotate and prioritise variants from whole-exome sequencing. The program was developed in 2014 by the Monarch Initiative, a cross-institutional collaboration between the UK’s Wellcome Trust Sanger Institute, Berlin’s Charité Universitätsmedizin and a number of leading American and British universities.
Exomiser compares genetic variants with Human Phenotype Ontology (HPO) terms. HPO terms are standardised descriptions of how genetic variants may manifest in a patient, as well as in common model organisms, such as mice or zebrafish. For example, the HPO term HP:0004925 codes for chronic lactic acidosis, a chronic build-up of lactic acid in the body. By using these standardised terms, symptoms can be compared between patients, and even between patients and model organisms.
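To see why standardised terms make this kind of comparison possible, consider the minimal sketch below. It measures the overlap between two sets of term identifiers with a simple Jaccard index. This is an assumption made for illustration only: it is not Exomiser's own similarity method, the function name is hypothetical, and apart from HP:0004925 the identifiers are placeholders whose meanings do not matter here.

```python
def jaccard_similarity(terms_a, terms_b):
    """Return the share of phenotype terms two profiles have in common (0.0 to 1.0)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # two empty profiles: treat as no evidence of similarity
    return len(a & b) / len(a | b)


# Illustrative profiles; only HP:0004925 comes from the text above.
patient = {"HP:0004925", "HP:0001250", "HP:0001263"}
known_disease = {"HP:0004925", "HP:0001250"}
unrelated_disease = {"HP:0000510"}

print(jaccard_similarity(patient, known_disease))
print(jaccard_similarity(patient, unrelated_disease))
```

Because every profile uses the same vocabulary, the comparison is a plain set operation. A real tool would probably also take the structure of the ontology into account, so that closely related terms count as partial matches.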
Genetic research has highlighted a huge number of associations between specific genes and their phenotypic symptoms. One example is the BBS5 gene, which is required for the healthy development of cilia and plays a role in Bardet-Biedl Syndrome. By using model organisms alongside humans, we can add almost 30,000 more genes with known phenotype associations to the dataset, allowing for faster and more accurate identification of causative genes.
Exomiser also uses a ‘semantic comparison methodology’, which compares genes based on the similarity of their function rather than their position in the genome sequence. In addition, it uses two pre-computed scores, taken from the Sorting Intolerant from Tolerant (SIFT) and Polymorphism Phenotyping version 2 (PolyPhen-2) databases, both of which predict how an amino acid substitution affects protein function.
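The two predictors run in opposite directions: SIFT scores close to 0 suggest a damaging substitution, whereas PolyPhen-2 scores close to 1 do. The sketch below shows one way the two could be put on a common scale. The function name, the choice of taking the maximum and the fallback of 0.0 when no prediction exists are all assumptions made for illustration, not a description of Exomiser's internals.

```python
def pathogenicity_score(sift=None, polyphen=None):
    """Combine optional SIFT and PolyPhen-2 predictions into one 0-1 score.

    SIFT is inverted because low SIFT values predict damage, while high
    PolyPhen-2 values do. Taking the maximum is an assumption of this sketch.
    """
    candidates = []
    if sift is not None:
        candidates.append(1.0 - sift)
    if polyphen is not None:
        candidates.append(polyphen)
    # Hypothetical fallback: with no prediction available, score the variant 0.0.
    return max(candidates) if candidates else 0.0


# Illustrative values only, not taken from any real variant.
print(pathogenicity_score(sift=0.01, polyphen=0.85))
print(pathogenicity_score(polyphen=0.30))
```

Inverting SIFT first means a higher number always signals a more worrying variant, which keeps any later ranking step simple.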
Exomiser then assigns a Phenotypic Relevance Score by cross-referencing any variants found in the patient against the effects of that gene in model organisms, as well as in other humans. The score is based upon the variant’s known pathogenicity and the likelihood of it being linked to the patient’s recorded phenotypes. Together these form the basis of Exomiser’s final hiPHIVE score, which designates the variant’s potential pathogenicity.
The Exomiser suite assigns further scores to support its final hiPHIVE score. The first of these is the ‘variant score’, which is based on allele frequency - how often a variant is seen in the population. Next comes a ‘gene phenotype score’, which reflects how relevant the gene in question is to the patient’s recorded HPO terms. For example, if an individual carries a mutation in both the BRCA1 and BRCA2 genes, the risk of breast cancer leaps to as much as 90% in later life. Finally, a ‘gene variant score’ is applied to the most potentially dangerous candidate genes. The various scores are brought together to form a gene combined score, upon which each gene is ultimately ranked.
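The way separate scores feed into a single ranking can be sketched as follows. Everything specific here is an assumption made for illustration: the 1% frequency cut-off, the linear frequency penalty, the unweighted average, the placeholder gene names and the invented numbers. Exomiser's actual formula for combining scores is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class GeneCandidate:
    symbol: str
    allele_frequency: float  # population frequency of the variant, 0-1
    phenotype_score: float   # 0-1 match to the patient's HPO terms


MAX_FREQUENCY = 0.01  # hypothetical cut-off: more common variants score 0


def variant_score(allele_frequency):
    """Score rarer variants closer to 1 and common ones 0."""
    if allele_frequency >= MAX_FREQUENCY:
        return 0.0
    return 1.0 - allele_frequency / MAX_FREQUENCY


def combined_score(candidate):
    """Unweighted mean of the variant and phenotype scores (an assumption)."""
    return (variant_score(candidate.allele_frequency) + candidate.phenotype_score) / 2


def rank(candidates):
    """Return candidates ordered from most to least likely to be causative."""
    return sorted(candidates, key=combined_score, reverse=True)


# Invented example values; GENE_A and GENE_B are placeholder names.
candidates = [
    GeneCandidate("GENE_A", allele_frequency=0.02, phenotype_score=0.9),
    GeneCandidate("BBS5", allele_frequency=0.0001, phenotype_score=0.8),
    GeneCandidate("GENE_B", allele_frequency=0.001, phenotype_score=0.2),
]

for candidate in rank(candidates):
    print(candidate.symbol, round(combined_score(candidate), 3))
```

In this toy data the gene with the best phenotype match, GENE_A, still ranks last because its variant is too common in the population to explain a rare disease. That interplay between rarity and phenotype fit is the point of combining the scores.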
This score and its related phenotypic information are then checked against the Online Mendelian Inheritance in Man (OMIM) and Orphanet databases, which catalogue human genes linked to disorders and disease.
The Challenge of Genomic Data
Despite the great strides that scientists have been making in recent years, only around 35% of human coding genes have been linked to a known phenotype. Exomiser adds tens of thousands of genes from model organisms used throughout research, and uses HPO terms to bring uniformity to their descriptions. It then uses this data to boost the identification rates of various phenotypes and their causative genes.
The technology works across exome sequences and gene panels, and can even be used on the vastly larger and less widely understood whole genome sequences, though it may still struggle with elements such as structural or copy number variants. Exomiser is more than capable of helping a clinical scientist drill swiftly down through the millions of possibilities to find the variant that is causing a disease.
Exomiser performs best in situations where a patient’s phenotypes (or ‘symptoms’) have been well defined throughout their health record. Whilst Exomiser can be run in isolation on any computer, it is at its most streamlined when built into a wider suite of tools, ideally integrated into a comprehensive clinical genomics analysis platform such as Sapientia™, from Cambridge-based Sanger spin-out Congenica.
Such platforms enable clinicians to apply Exomiser alongside numerous other open-source and proprietary solutions in their clinical or research investigations. In the case of Sapientia, the platform also brings its own extensive knowledge base on top of integrations with sources such as OMIM and Orphanet, putting more diagnostic power at the clinician’s fingertips than any other solution.
Ever since the first successful identification of a disease-causing variant from Whole Exome Sequencing (WES) in 2010, the industry as a whole has made impressive advances. By essentially doubling the number of gene associations available, Exomiser can provide the extra diagnostic power that proves critical for a clinician. It can save time and money for institutes and experts, provide the diagnoses and treatment possibilities for which rare disease patients are so desperate and, in the long run, save lives.