A Blog on Cutting Edge Molecular Biosciences Research-Weekly Digests from High Impact Journals

Monday, November 26, 2012

Can Genome-Wide Association Studies (GWAS) Solve Complex Diseases?


The field of human genetics developed as the study of Mendelian (monogenic) diseases. Identifying the causal variants not only enables diagnosis and prognosis of a given disease but also uncovers the molecular mechanism of the underlying condition. At the other end of the spectrum are the common, complex diseases. These multifactorial diseases, such as cancer, heart disease, diabetes and stroke, also have some genetic component, as shown by the family pedigrees of patients. However, they are caused by complex interactions among several genetic variants as well as environmental factors.

Figure: The frequency of variants and the size of their effects are inversely correlated (image from http://www.sciencemag.org)

Mendelian diseases and their associated variants/mutations are rare because they are selected against in the population (unless the condition is not harmful, in which case it is classified as a trait). Because Mendelian variants are simple and have large effects, they predict the disease with high confidence. Variants for diseases such as cancer or diabetes, on the other hand, have small effects and may be either rare or common. Since these diseases account for a large fraction of mortality, morbidity and the associated health care costs, ways to improve patient care are highly sought after. Genome-wide association studies, which look for the variants associated with a given disease, have therefore been a major hope against complex diseases. However, as a news article in this week's Science points out, the success rates are below expectations. For example, a diabetes study carried out on 2,700 patients did not find any new variants above 1.5% in frequency with strong effects on diabetes risk. Low-effect variants not only fail to predict the disease with high confidence but also fall short of explaining the mechanism of the disease the way Mendelian variants do.
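To make the contrast between low-effect and high-effect variants concrete, the sketch below computes an odds ratio, with an approximate 95% confidence interval, from a 2x2 table of carriers and non-carriers among cases and controls. All counts are invented for illustration and are not taken from the Science article or from any study; the function names are hypothetical as well.

```python
import math


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds of disease among carriers divided by odds among non-carriers."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval using the standard error of log(OR)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: (case carriers, case non-carriers,
#                       control carriers, control non-carriers)
common_low_effect = (600, 400, 550, 450)   # common variant, small effect
rare_high_effect = (30, 970, 3, 997)       # rare variant, large effect

for label, table in [("common/low effect", common_low_effect),
                     ("rare/high effect", rare_high_effect)]:
    low, high = odds_ratio_ci(*table)
    print(f"{label}: OR = {odds_ratio(*table):.2f} (95% CI {low:.2f} to {high:.2f})")
```

In this toy data the common variant is carried by more than half of the controls as well as the cases, so knowing a person's genotype says little about whether they will develop the disease.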



One strategy to overcome these challenges is to combine the expression values of multiple genes and use gene panels or gene signatures to predict disease. Developments in microarray and next-gen RNA sequencing technologies can move the field in this direction, at least on the diagnosis/prognosis side of the problem.
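As a rough illustration of the gene-signature idea, the sketch below combines the expression values of several genes into a single score using a weighted sum. The gene names, weights, expression values and threshold are all hypothetical placeholders; real signatures are derived and validated on patient cohorts.

```python
def signature_score(expression, weights):
    """Weighted sum of expression values over the genes in the signature.

    Raises KeyError if a signature gene is missing from the sample.
    """
    return sum(weight * expression[gene] for gene, weight in weights.items())


def classify(expression, weights, threshold=0.0):
    """Label a sample by comparing its signature score to a threshold."""
    return "high risk" if signature_score(expression, weights) > threshold else "low risk"


# Hypothetical signature: positive weights for risk genes, negative for protective ones.
weights = {"GENE_A": 1.0, "GENE_B": -0.5, "GENE_C": 2.0}

# Hypothetical normalized expression values for one sample.
sample = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 0.5}

print(signature_score(sample, weights), classify(sample, weights))
```

The point of the design is that no single gene needs to carry a strong effect; the prediction comes from the combined pattern.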

Genetic Influences on Disease Remain Hidden
  • Jocelyn Kaiser
Science 23 November 2012: 338 (6110), 1016-1017. [DOI:10.1126/science.338.6110.1016]

Tuesday, May 15, 2012

Personalized Cancer Medicine: Individual Patient Therapy through Next-Gen Sequencing



On Dancey et al., 2012 in Cell and
On Roychowdhury et al., 2011 in Sci Transl Med

Human cancer is a complex, multifactorial and heterogeneous disease. These features, among others, make it challenging to manage and treat. However, in its most simplified form, cancer is a disease of acquired or inherited mutations.

There can be hundreds of thousands of mutations in a given cancer. Most of these are "passenger" mutations, which are gained during the progression of the cancer and have no functional effect. The "driver" mutations, on the other hand, are important for the progression of cancer and are causally responsible for disturbing the normal balance of the cell and leading to tumor formation. Driver mutations are also important for diagnostic and prognostic purposes and even for therapeutic targeting. A couple of thousand recurrent mutations have been reported in human cancer (International Cancer Genome Consortium (ICGC) database).

However, although it is not the norm, as few as 4 or 5 driver mutations can be enough to sustain a given cancer. It therefore makes sense that perturbing the pathways affected by the few mutations a cancer depends on can have a big therapeutic effect. Because the disease is very heterogeneous, though, these few driver mutations differ from patient to patient. With Sanger sequencing it is not practical to find the mutated genes among the 3 billion bases that make up the genome. High-throughput next-generation sequencing (NGS) is a game changer in this sense: it is now possible to obtain the bulk sequence of a given human genome in a couple of weeks.

The current standard chemotherapy for cancer involves harsh treatment that targets all the dividing cells of the body. This one-size-fits-all therapy takes into account the histopathology and the clinical history of the patient, but not the fact that every cancer is otherwise different.


Personalized Medicine (PM) is a term that gained a new meaning with the advent of next-generation sequencing (NGS) technologies. It depends on the ability to sequence all or almost all of the mutational repertoire of a given patient and to make an informed treatment choice based on this knowledge (which is different for each patient).

However, determining the mutation landscape of a cancer is far from enough. Among the thousands of driver mutations identified in human cancer, only a small subset has been studied well enough to have any diagnostic, prognostic or therapeutic implications. Therefore, substantial clinical and basic research is needed to increase the effectiveness of PM and improve cancer patient care.

Although the cost of the high-throughput sequencing associated with PM has been a major concern, several competing companies and technologies now keep driving the cost down. In a recently published pilot study by Roychowdhury et al., a package including (i) shallow genome sequencing of the tumor, (ii) exome sequencing of the tumor and the matched normal tissue, and (iii) paired-end transcriptome sequencing of the tumor initially cost $5400 and fell to $3600 over the 6 months of the study (the study was published in November 2011).
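The size of that price drop is easy to check. The short sketch below uses the two figures quoted above ($5400 and $3600); the function name is just an illustrative helper.

```python
def fractional_drop(start_cost, end_cost):
    """Fraction by which the cost fell relative to the starting cost."""
    return (start_cost - end_cost) / start_cost


# Figures quoted from the pilot study: package cost at the start and end of the study.
drop = fractional_drop(5400, 3600)
print(f"Cost fell by {drop:.0%} over the 6 months of the study")
```

That is a reduction of one third in half a year.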


The closest alternative to NGS, Sanger sequencing of a single gene or a small gene panel, has some value if the tumor carries the mutation being tested, but this approach gives no additional information. Even if the tumor is positive for the mutation, other mutations might complicate the treatment associated with it. For example, in colorectal cancer patients EGFR antibodies are ineffective in the background of KRAS mutations, and RTK activation makes some cancer cells resistant to inhibitors of mutated BRAF.


One additional novelty that arrived alongside PM is the xenograft tumor model. A piece of the patient's tumor is grafted into a test animal to create a personalized model of that patient's cancer. This enables trial and error to find the best treatment before testing on the patient. In addition, when combined with NGS, it enables research on the individual patient's disease even after the treatment is over, and helps to find the best treatment in cases with a similar genomic profile.

The hope is that, as cancer genome sequencing gets cheaper and more prevalent in basic and clinical research, the repertoire of recurrent and actionable mutations will continue to grow and personalized medicine will become more effective in treating this deadly disease.


Personalized Oncology Through Integrative High-Throughput Sequencing: A Pilot Study
Sameek Roychowdhury, Matthew K. Iyer, Dan R. Robinson, Robert J. Lonigro, Yi-Mi Wu, Xuhong Cao, Shanker Kalyana-Sundaram, Lee Sam, O. Alejandro Balbin, Michael J. Quist, Terrence Barrette, Jessica Everett, Javed Siddiqui, Lakshmi P. Kunju, Nora Navone, John C. Araujo, Patricia Troncoso, Christopher J. Logothetis, Jeffrey W. Innis, David C. Smith, Christopher D. Lao, Scott Y. Kim, J. Scott Roberts, Stephen B. Gruber, Kenneth J. Pienta, Moshe Talpaz, and Arul M. Chinnaiyan 

Sci Transl Med 30 November 2011


The Genetic Basis for Cancer Treatment Decisions
Janet E. Dancey, Philippe L. Bedard, Nicole Onetto, Thomas J. Hudson
Cell, Volume 148, Issue 3, 3 February 2012

Wednesday, March 7, 2012

Clonal Evolution of a Tumor in Two Separate Hosts


On Weigert et al., 2012  in Cancer Discovery

Follicular lymphoma is a tumor of follicle center B-cells. Although Bcl2-IGH rearrangement is the main hallmark of this disease, additional genetic events are required for tumor formation. Studies on the molecular evolution and the time course of these genetic events have been lacking because of the indolent nature of the disease and the scarcity of patient specimens. In a recent issue of Cancer Discovery, Weigert et al. present a rare and interesting case that demonstrates clonal evolution in follicular lymphoma.

In 2000, a 41-year-old female chronic myelogenous leukemia (CML) patient began receiving bone marrow transplantation and leukocyte infusions from her HLA-matched sister. This myeloablative therapy, combined with cytokine and Gleevec therapies, resulted in complete molecular remission of her CML. Nine years after the first transplant and 7 years after the last leukocyte infusion, the donor sister was diagnosed with grade II follicular lymphoma. Interestingly, within 6 months the recipient sister was also diagnosed with follicular lymphoma of the same grade. Molecular genetic analyses showed that both lymphomas originated from tumorigenic cells of the donor sister, which were dormant at the time of the infusions.

Donor-derived malignancies of this kind after hematopoietic transfer are rare but have been reported in the literature. What makes this study unique is that the authors had access to DNA samples from the leukocyte infusions as well as to tumor DNA samples from both patients. By deep sequencing the gene-coding regions, they were able to observe clonal evolution of the genetic changes in two separate hosts.

With deep sequencing they detected 12 non-synonymous mutations and 2 coding insertions/deletions common to both patients. All but one of these changes were present in the leukocyte infusion, and thus they were acquired before the transplantation. The authors also detected 3 mutations unique to the donor's lymphoma and 4 mutations unique to the recipient's, suggesting divergent evolution after delivery of the last leukocyte infusion.
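The bookkeeping behind these numbers amounts to set comparisons between three samples. The sketch below reproduces the counts from the paragraph above using placeholder mutation identifiers (the names `shared_1`, `donor_1` and so on are invented; the actual mutations are listed in the paper).

```python
# Placeholder identifiers standing in for the mutations described in the post.
shared = {f"shared_{i}" for i in range(1, 15)}     # 12 non-synonymous + 2 indels
infusion = {f"shared_{i}" for i in range(1, 14)}   # all but one shared change

donor_lymphoma = shared | {f"donor_{i}" for i in range(1, 4)}
recipient_lymphoma = shared | {f"recipient_{i}" for i in range(1, 5)}

# Changes found in both lymphomas, and the subset already present in the infusion.
common = donor_lymphoma & recipient_lymphoma
present_before_transfer = common & infusion

# Changes private to each host, i.e. candidates for divergent evolution.
donor_only = donor_lymphoma - recipient_lymphoma
recipient_only = recipient_lymphoma - donor_lymphoma

print(len(common), len(present_before_transfer), len(donor_only), len(recipient_only))
```

Mutations in the intersection that also appear in the infusion sample must predate the transfer, while the private sets can only have arisen (or expanded to detectable levels) in each host afterwards.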

One of the mutations unique to the recipient caused a premature stop codon in the ARID1A gene and loss of the protein product. Intriguingly, although the donor did not have this mutation, she also had reduced ARID1A protein. Further analyses showed that the donor had a copy-number loss (deletion) at this locus in at least a fraction of the tumor cells. In this way, the authors reported convergent evolution of a tumor in two hosts by showing the loss of the same protein via different routes.

Another implication of the study is that the tumors developed with surprisingly similar latency. The donor presented with the tumor 7 years after the last leukocyte infusion, and the recipient developed the disease only 6 months later, despite the immunosuppression associated with her initial CML treatment.


Oliver Weigert, Nadja Kopp, Andrew A. Lane, Akinori Yoda, Suzanne E. Dahlberg, Donna Neuberg, Anita Y. Bahar, Bjoern Chapuy, Jeffery L. Kutok, Janina A. Longtine, Frank C. Kuo, Terry Haley, Maura Salois, Timothy J. Sullivan, David C. Fisher, Edward A. Fox, Scott J. Rodig, Joseph H. Antin, and David M. Weinstock
Molecular Ontogeny of Donor-Derived Follicular Lymphomas Occurring after Hematopoietic Cell Transplantation
Cancer Discovery; 2(1); 47–55

Wednesday, February 22, 2012

An Epigenetic Factor Directs Meiotic Recombination

Image Courtesy of Science Photo Library C009/9397

On Grey et al., 2011 in PLoS Biology

Although the principles of inheritance as described by Gregor Mendel opened the era of genetics, the work of another prominent scientist, Thomas Hunt Morgan, helped establish the still enigmatic concepts of genetic linkage and homologous recombination (i.e. crossing over). Morgan discovered that linked genes (genes that are on the same chromosome) do not always cosegregate, and he hypothesized the phenomenon of crossing over. Although he had assumed that the crossover frequency would be similar between evenly spaced loci, it was later discovered that there are "recombination hotspots" where the crossover frequency is higher than elsewhere on a given chromosome.

Using genome-wide linkage disequilibrium studies, about 30,000 hotspots have been identified in the human genome. Recombination events cluster in 1-2 kb regions, which mark the locations of the hotspots. Although a consensus site of 7-13 bases was shown to be present in a fraction of these regions, the biological mechanism of hotspot formation remained elusive.

As a significant step towards elucidating the biology of recombination hotspots, a group led by Bernard de Massy of CNRS, France, discovered a mechanistic factor behind hotspot formation in a series of papers published in Science and in PLoS Biology. First (PLoS Biology, 2009), the group used mouse genetics to narrow down the genetic locus required for the recombination phenotype of a particular hotspot. They then showed that this locus encodes a trans-acting factor affecting crossovers genome-wide.

As recombination hotspots are associated with histone H3K4 trimethylation, PRDM9, a histone methyltransferase expressed during meiotic prophase, was selected as a candidate gene in the locus. Next, the authors correlated variations in the PRDM9 gene with the recombination phenotypes observed in mouse strains as well as in genotyped individuals. Notably, these variations are mostly found in the DNA-binding zinc-finger domain, which is proposed to determine the location of the hotspot in the genome. In addition, they showed in vitro interactions between PRDM9 variants and the hotspot consensus sequences, which supports the recruitment of the histone modification activity to the hotspot.

There is no genome-wide binding analysis for PRDM9 yet, but a next direction in this research field might involve genome-wide Chromatin IP for PRDM9, which could reveal any overlap between recombination hotspots and PRDM9 binding.

In the last paper of the series, in PLoS Biology, they used the same mouse strains to confirm the role of PRDM9 in hotspot formation. The PRDM9 variants in these strains contain 24 non-synonymous changes, all but one of them in the DNA-binding domain. By engineering the PRDM9 allele in transgenic mice, they switched the recombination phenotype as scored by chromosome-wide crossover analysis. However, they noted that the PRDM9 target sequence was also important for the recombination phenotype, as changes in the DNA sequence affected PRDM9 binding in vitro.

These exciting advances in meiotic recombination will have implications for the fields of population genetics and natural selection, and will further prime progress in understanding meiosis and recombination at the molecular level.


Grey C, Barthès P, Chauveau-Le Friec G, Langa F, Baudat F, de Massy B. Mouse PRDM9 DNA-Binding Specificity Determines Sites of Histone H3 Lysine 4 Trimethylation for Initiation of Meiotic Recombination. PLoS Biol, 2011 Oct;9(10):e1001176. doi: 10.1371/journal.pbio.1001176


Baudat F, Buard J, Grey C, Fledel-Alon A, Ober C, Przeworski M, Coop G, de Massy B. PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science, 2010 Feb 12;327(5967):836-40

Grey C, Baudat F, de Massy B. Genome-wide control of the distribution of meiotic recombination. PLoS Biol, 2009;7(2):e1000035. doi: 10.1371/journal.pbio.1000035

Sunday, January 22, 2012

Stable (Heritable) DNA Transfer without Antibiotic Selection?


On Mendiburo et al., 2011 in Science

Centromeres are the chromatin regions specialized to carry out chromosome segregation during cell division. Because the spindle attachment sites are stable and inherited, initial analyses suggested that centromeres were sequence specific and that centromere formation is dictated by DNA sequence alone. However, on very rare occasions a centromere can leave this stable spot and move to a new chromosomal site. These "neocentromeres" are as valid as natural centromeres and segregate the chromosomes with high fidelity. This finding implies that the source of the specificity and heritability of centromeres might not be genetic but rather epigenetic.
A clue comes from the finding that a specialized histone H3 variant, called centromere protein A (CENP-A), is incorporated into the nucleosomes at centromeric loci. Mendiburo et al. hypothesized that this special histone could mark the centromeres on the chromosome (Mendiburo et al., 2011). To test this, they fused CENP-A to the bacterial LacI protein and engineered a LacO site (to which LacI binds) into their cell culture model. Strikingly, the CENP-A fusion localized to the LacO site, was incorporated into the local nucleosomes and even recruited endogenous CENP-A to the new site. This neocentromere was inherited and associated normally with the spindle. Although adding a second centromere causes mitotic failure and cell death, they also show that plasmids engineered in this way can act as artificial chromosomes.
This raises the idea that recombinant DNA transfected into cultured eukaryotic cells could be made heritable and stable without the need for heavy antibiotic selection and genome integration. Antibiotic selection is very strong: it kills all cells that lack the plasmid as well as cells that lose it, so that only cells with genome-integrated copies remain. Some cells are fragile and do not survive this process. Another disadvantage is that genome integration is random and might cause aberrations if the integration site is vital. This is especially important for gene therapy, because genome integration could cause additional defects while mending the disease.



María José Mendiburo, Jan Padeken, Stefanie Fülöp, Aloys Schepers and Patrick Heun
Drosophila CENH3 Is Sufficient for Centromere Formation
Science 4 November 2011: 334 (6056), 686-690. [DOI:10.1126/science.1206880]

Saturday, January 7, 2012

Long Non-Coding RNAs


On Ørom and Shiekhattar, 2011 in Curr Opin Genet Dev

Approximately 1.5% of our genome is transcribed into protein-coding mRNA. The existence of noncoding RNAs such as ribosomal and transfer RNAs, as well as small nucleolar RNAs, has been known for a long while. More recently, the RNA interference machinery and its microRNAs have been linked to important functions. Now, thanks to next-gen sequencing, it is becoming clear that the remainder of the genome is not all junk but could code for long noncoding RNAs (long ncRNA or lncRNA).

Ørom and Shiekhattar review the current research on these RNAs and elaborate on their link with enhancers (Ørom and Shiekhattar, 2011). The authors note that the current understanding of long ncRNAs is very primitive. lncRNAs are very heterogeneous, which makes it hard to classify them and to decode the information they carry. Their distinctive properties are that they lack protein-coding ORFs and that they are longer than small regulatory RNAs.

Some of the complexity comes from variation in size and copy number (the average length is ~1 kb). Specific studies, such as those on the HOX gene cluster and the beta-globin locus, show that some lncRNAs can regulate the expression of nearby protein-coding genes. Some of the studied lncRNAs act in cis while others act in trans, which adds to the heterogeneity. In addition, some lncRNAs have features of protein-coding mRNAs, such as polyadenylation and splicing.

In both mouse and human cells, some lncRNAs are associated with enhancers. The elucidation of histone marks around active enhancers supports this hypothesis. Although correlative analyses implicate these lncRNAs in the positive regulation of nearby genes, a mechanistic understanding is missing. What kind of code or information do lncRNAs carry to mediate their functions? This seems to be the foremost challenge in the field. Granting their heterogeneity and the possibility of multiple mechanisms, one hypothesis suggests that the 3-dimensional structures of lncRNAs might interact with transcription factors and thereby alter gene expression. However, the published studies are not yet sufficient to reach a conclusion about this hypothesis.



Ulf Andersson Ørom, Ramin Shiekhattar, Long non-coding RNAs and enhancers, Current Opinion in Genetics & Development, Volume 21, Issue 2, April 2011, Pages 194-198, ISSN 0959-437X, doi: 10.1016/j.gde.2011.01.020.