Publication

Features of Recent Codon Evolution: A Comparative Polymorphism-Fixation Study

Features of amino-acid and codon changes can provide important insights into protein evolution. So far, investigators have often examined mutation patterns at either the interspecies fixed-substitution level or the intraspecies nucleotide-polymorphism level, but not both. Here, we performed a unique analysis of a combined set of intraspecies polymorphisms and interspecies substitutions in human codons. Strong differences in mutational pattern were found at codon positions 1, 2, and 3 between the polymorphism and fixation data. Fixation was strongly biased towards increasing the rarest codons and decreasing the most frequently used codons, suggesting that codon equilibrium has not yet been reached. We detected a strong CpG effect on CG-containing codons and its subsequent suppression by fixation. Finally, we detected the signature of purifying selection against A|U dinucleotides at synonymous dicodon boundaries. Overall, the fixation process could effectively and quickly correct the volatile changes introduced by polymorphisms, so that codon changes could be gradual and directional and codon composition could be kept relatively stable during evolution.
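
As a small illustration of what "CG-containing codons" covers, the sketch below enumerates the codons that contain a CG dinucleotide and lists the codons each would become after a single CpG transition (CG to TG on the coding strand, or CG to CA when the change occurs on the opposite strand). This is not the authors' analysis. The function name `cpg_transitions` and the use of the DNA alphabet (T rather than U) are assumptions made for the example, and CG dinucleotides that span a codon boundary are not considered.

```python
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def cpg_transitions(codon):
    """Return the codons reachable by one CpG transition inside the codon.

    A CpG transition turns CG into TG (C->T on this strand) or into
    CA (C->T on the opposite strand, seen here as G->A).
    """
    out = []
    i = codon.find("CG")
    while i != -1:
        out.append(codon[:i] + "TG" + codon[i + 2:])
        out.append(codon[:i] + "CA" + codon[i + 2:])
        i = codon.find("CG", i + 1)
    return out


# Codons with a CG dinucleotide at positions 1-2 or 2-3.
cg_codons = [c for c in CODONS if "CG" in c]

for c in cg_codons:
    print(c, "->", cpg_transitions(c))
```

Eight of the 64 codons contain CG: four of the form CGN and four of the form NCG.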

A compiled and systematic reference map of nucleosome positions across the Saccharomyces cerevisiae genome

Nucleosomes have position-specific functions in controlling gene expression. A complete systematic genome-wide reference map of absolute and relative nucleosome positions is needed to minimize potential confusion when referring to the function of individual nucleosomes (or nucleosome-free regions) across datasets. We compiled six high-resolution genome-wide maps of Saccharomyces cerevisiae nucleosome positions from multiple labs and detection platforms, and report new insights. Data downloads, reference position assignment software, queries, and a visualization browser are available online at http://atlas.bx.psu.edu/.

Identification and Nomenclature of the Consensus Nucleosomes Across the Yeast Genome

Interaction of Transcriptional Regulators with Specific Nucleosomes across the Saccharomyces Genome

A canonical nucleosome architecture around promoters establishes the context in which proteins regulate gene expression. Whether gene regulatory proteins that interact with nucleosomes are selective for individual nucleosome positions across the genome is not known. Here, we examine on a genomic scale several protein-nucleosome interactions, including those that (1) bind histones (Bdf1/SWR1 and Srm1), (2) bind specific DNA sequences (Rap1 and Reb1), and (3) potentially collide with nucleosomes during transcription (RNA polymerase II). We find that the Bdf1/SWR1 complex forms a dinucleosome complex that is selective for the +1 and +2 nucleosomes of active genes. Rap1 selectively binds to its cognate site on the rotationally exposed first and second helical turn of nucleosomal DNA. We find that a transcribing RNA polymerase creates a delocalized state of resident nucleosomes. These findings suggest that nucleosomes around promoter regions have position-specific functions and that some gene regulators have position-specific nucleosomal interactions.

Nucleosome positioning and gene regulation: advances through genomics

Knowing the precise locations of nucleosomes in a genome is key to understanding how genes are regulated. Recent 'next generation' ChIP-chip and ChIP-Seq technologies have accelerated our understanding of the basic principles of chromatin organization. Here we discuss what high-resolution genome-wide maps of nucleosome positions have taught us about how nucleosome positioning demarcates promoter regions and transcriptional start sites, and how the composition and structure of promoter nucleosomes facilitate or inhibit transcription. A detailed picture is starting to emerge of how diverse factors, including underlying DNA sequences and chromatin remodelling complexes, influence nucleosome positioning.

A barrier nucleosome model for statistical positioning of nucleosomes throughout the yeast genome

Most nucleosomes are well-organized at the 5' ends of S. cerevisiae genes where "-1" and "+1" nucleosomes bracket a nucleosome-free promoter region (NFR). How nucleosomal organization is specified by the genome is less clear. Here we establish and inter-relate rules governing genomic nucleosome organization by sequencing DNA from more than one million immunopurified S. cerevisiae nucleosomes (displayed at http://atlas.bx.psu.edu/). Evidence is presented that the organization of nucleosomes throughout genes is largely a consequence of statistical packing principles. The genomic sequence specifies the location of the -1 and +1 nucleosomes. The +1 nucleosome forms a barrier against which nucleosomes are packed, resulting in uniform positioning, which decays at farther distances from the barrier. We present evidence for a novel 3' NFR that is present at 95% of all genes. 3' NFRs may be important for transcription termination and anti-sense initiation. We present a high-resolution genome-wide map of TFIIB locations that implicates 3' NFRs in gene looping.

Association of ADH and ALDH genes with alcohol dependence in the Irish Affected Sib Pair Study of alcohol dependence (IASPSAD) sample

Background: The genes coding for ethanol metabolism enzymes [alcohol dehydrogenase (ADH) and aldehyde dehydrogenase (ALDH)] have been widely studied for their influence on the risk of developing alcohol dependence (AD). However, the relation between polymorphisms of these metabolism genes and AD in Caucasian subjects has not been clearly established. The present study examined evidence for the association of alcohol metabolism genes with AD in the Irish Affected Sib Pair Study of alcohol dependence. Methods: We conducted a case-control association study with 575 independent subjects who met the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, AD diagnosis and 530 controls. A total of 77 single nucleotide polymorphisms (SNPs) in the seven ADH (ADH1-7) and two ALDH genes (ALDH1A1 and ALDH2) were genotyped using the Illumina GoldenGate protocols. Several statistical procedures were implemented to control for false discoveries. Results: All markers with minor allele frequency greater than 0.01 were in Hardy-Weinberg equilibrium. Numerous SNPs in ADH genes showed association with AD, including one marker in the coding region of ADH1C (rs1693482 in exon 6, Ile271Gln). Haplotypic association was observed in the ADH5 and ADH1C genes, and in a long haplotype block formed by the ADH1A and ADH1B loci. We detected two significant interactions between pairs of markers in intron 6 of ADH6 and intron 12 of ALDH2 (p = 5 x 10^-5), and 5' of both ADH4 and ADH1A (p = 2 x 10^-4). Conclusion: We found evidence for the association of several ADH genes with AD in a sample of Western European origin. The significant interaction effects between markers in ADH and ALDH genes suggest possible epistatic roles between alcohol metabolic enzymes in the risk for AD.
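
The Hardy-Weinberg check mentioned in the Results can be illustrated with a minimal sketch. The abstract does not say which test the authors used; the code below assumes the common chi-square goodness-of-fit test with one degree of freedom for a biallelic SNP. The function name `hwe_test` and the genotype counts are hypothetical and chosen only for illustration.

```python
import math


def hwe_test(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Takes observed counts of the three genotypes (AA, AB, BB) and returns
    (frequency of allele A, chi-square statistic, p-value with 1 d.f.).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, chi2, p_value


# Hypothetical counts: the first SNP matches HWE expectations exactly,
# the second shows a deficit of heterozygotes.
print(hwe_test(360, 480, 160))
print(hwe_test(400, 400, 200))
```

In a screening step of this kind, markers with a small p-value would typically be flagged for review before association testing. Any threshold is a study-specific choice and is not given in the abstract.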

GeneTrack - a genomic data processing and visualization framework

Motivation: High-throughput ChIP-chip and ChIP-seq methodologies generate sufficiently large data sets that analysis poses significant informatics challenges, particularly for research groups with modest computational support. To address this challenge, we devised a software platform for storing, analyzing and visualizing high resolution genome-wide binding data. GeneTrack automates several steps of a typical data processing pipeline, including smoothing and peak detection, and facilitates dissemination of the results via the web. Our software is freely available via the Google Project Hosting environment at http://genetrack.googlecode.com.
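
The two automated steps named above, smoothing and peak detection, can be sketched as follows. This is not GeneTrack's code or API. It is a minimal, standard-library illustration that assumes Gaussian smoothing of per-base read counts followed by greedy peak calling with an exclusion zone. The function names, the parameters (`sigma`, `width`, `exclusion`, `min_height`) and the toy read counts are all hypothetical.

```python
import math


def gaussian_kernel(sigma, width):
    """Return a normalized Gaussian kernel covering offsets -width..+width."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-width, width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(counts, sigma=2.0, width=6):
    """Convolve per-base read counts with a Gaussian kernel (zero-padded ends)."""
    kernel = gaussian_kernel(sigma, width)
    n = len(counts)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - width
            if 0 <= j < n:
                acc += w * counts[j]
        out[i] = acc
    return out


def call_peaks(signal, exclusion=10, min_height=0.0):
    """Greedy peak calling: repeatedly take the highest unmasked position,
    then mask every position within +/- exclusion of it."""
    order = sorted(range(len(signal)), key=lambda i: signal[i], reverse=True)
    blocked = [False] * len(signal)
    peaks = []
    for i in order:
        if signal[i] <= min_height:
            break  # everything that remains is at or below the threshold
        if blocked[i]:
            continue
        peaks.append(i)
        for j in range(max(0, i - exclusion), min(len(signal), i + exclusion + 1)):
            blocked[j] = True
    return sorted(peaks)


# Toy data: two clusters of read starts on a 60 bp stretch.
counts = [0] * 60
for pos, n_reads in [(14, 3), (15, 8), (16, 4), (40, 2), (41, 6), (42, 2), (44, 1)]:
    counts[pos] = n_reads

peaks = call_peaks(smooth(counts), exclusion=10, min_height=0.5)
print(peaks)  # [15, 41]
```

The exclusion zone stops the shoulders of a strong peak from being reported as separate peaks. Its width would normally be chosen to match the footprint of the feature being mapped.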

NELF and GAGA factor are linked to promoter-proximal pausing at many genes in Drosophila

Recent analyses of RNA polymerase II (Pol II) revealed that Pol II is concentrated at the promoters of many active and inactive genes. NELF causes Pol II to pause in the promoter-proximal region of the hsp70 gene in Drosophila melanogaster. In this study, genome-wide location analysis (chromatin immunoprecipitation-microarray chip [ChIP-chip] analysis) revealed that NELF is concentrated at the 5' ends of 2,111 genes in Drosophila cells. Permanganate genomic footprinting was used to determine if paused Pol II colocalized with NELF. Forty-six of 56 genes with NELF were found to have paused Pol II. Pol II pauses 30 to 50 nucleotides downstream from transcription start sites. Analysis of DNA sequences in the vicinity of paused Pol II identified a conserved DNA sequence that probably associates with TFIID but detected no evidence of RNA secondary structures or other conserved sequences that might directly control elongation. ChIP-chip experiments indicate that GAGA factor associates with 39% of the genes that have NELF. Surprisingly, NELF associates with almost one-half of the most highly expressed genes, indicating that NELF is not necessarily a repressor of gene expression. NELF-associated pausing of Pol II might be an obligatory but sometimes transient checkpoint during the transcription cycle.

Nucleosome organization in the Drosophila genome

Comparative genomics of nucleosome positions provides a powerful means for understanding how the organization of chromatin and the transcription machinery co-evolve. Here we produce a high-resolution reference map of H2A.Z and bulk nucleosome locations across the genome of the fly Drosophila melanogaster and compare it to that from the yeast Saccharomyces cerevisiae. Like Saccharomyces, Drosophila nucleosomes are organized around active transcription start sites in a canonical -1, nucleosome-free region, +1 arrangement. However, Drosophila does not incorporate H2A.Z into the -1 nucleosome and does not bury its transcriptional start site in the +1 nucleosome. At thousands of genes, RNA polymerase II engages the +1 nucleosome and pauses. How the transcription initiation machinery contends with the +1 nucleosome seems to be fundamentally different across major eukaryotic lines.