The transcription, or expression, of a gene (the process by which the DNA sequence is converted into a functional product such as a protein or RNA) is controlled by a region of DNA that is generally located upstream of the gene. This region consists of several short segments (also known as motifs) that act as binding sites for proteins called transcription factors. It is generally believed that genes sharing the same set of regulators should show similar expression profiles and, conversely, that genes showing similar expression patterns could be regulated by the same set of transcription factors.
If we look more closely at the regulatory regions of a known set of genes that are co-expressed in a particular tissue, will we find a rule for what the architecture of such regulatory regions looks like (the minimum or maximum number of binding sites, their spacing, their orientation, etc.), and will it tell us something about their evolution?
This is exactly what the authors have done in this paper in Science (subscription required). They used 19 genes that are co-expressed in the muscle cells of the developing embryo of the urochordate Ciona. Of these 19 genes, 17 function in the same macromolecular complex, underscoring the requirement for tight co-expression. These 19 genes include six single-copy loci (sequences in a genome that do not share homology with any other sequence in the same genome). Seven genes are composed of two or three members (paralogs) of multicopy gene families. We also know that genes that are expressed in Ciona are predominantly regulated by three different binding elements in their regulatory regions. These elements are 1) the cAMP response element, called CRE, 2) the MyoD motif, and 3) the Tbx6 motif. Each element can be described in terms of the DNA bases it is composed of.
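To make the idea of describing a motif by its bases concrete, here is a minimal sketch of scanning an upstream sequence for consensus strings on both strands. The consensus strings are illustrative assumptions (a textbook CRE palindrome, a generic E-box for MyoD, and a placeholder T-box-like site for Tbx6), not the motif models used in the paper, and the function names and the example sequence are made up for this post.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# ASSUMPTION: illustrative consensus strings only, not taken from the paper.
MOTIFS = {"CRE": "TGACGTCA", "MyoD": "CANNTG", "Tbx6": "GGTGTGA"}


def reverse_complement(seq):
    """Reverse complement of a DNA string (IUPAC codes allowed)."""
    return seq.translate(COMPLEMENT)[::-1]


def to_regex(consensus):
    """Compile a consensus into a lookahead regex so overlapping sites are found."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in consensus) + ")")


def find_motifs(upstream, motifs=MOTIFS):
    """Return sorted (position, name, strand) hits.

    Positions are relative to the transcription start site, assuming the
    sequence ends right at the start site (so all positions are negative).
    """
    seq = upstream.upper()
    hits = []
    for name, consensus in motifs.items():
        patterns = {"+": consensus}
        rc = reverse_complement(consensus)
        if rc != consensus:  # palindromic motifs are reported once
            patterns["-"] = rc
        for strand, pattern in patterns.items():
            for match in to_regex(pattern).finditer(seq):
                hits.append((match.start() - len(seq), name, strand))
    return sorted(hits)


# Made-up 30 bp sequence containing one site of each type.
example = "AAATGACGTCAAACAGCTGAATCACACCAA"
for position, name, strand in find_motifs(example):
    print(position, name, strand)
```

A real analysis would more likely score sites with position weight matrices rather than exact consensus matches, since the paper considers motif strength as well as position.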
The authors study the distribution, composition and strength of these motifs in the upstream regulatory regions of the 19 genes. They found a high degree of heterogeneity in these regulatory regions: there was no common feature they could discern across all 19 loci. So how do we account for the co-expression of these genes? The authors show that it is achieved by conserving the locus-specific distribution of these features. This can be seen more clearly in the following picture (B).
(B) Distribution of cis-regulatory function at the 19 loci of this study. Cs, Ciona savignyi; Ci, Ciona intestinalis. Labels below axes indicate distance to transcription start site. Area of circle is proportional to estimated motif activity. Motifs are depicted as circles, and color indicates motif type: CRE (red), MyoD (green), and Tbx6 (blue).
We do not see any commonality across loci, but we do see that the architecture of the regulatory region is conserved within a specific locus. For example, Ck.ci (the creatine kinase gene from Ciona intestinalis) and Ck.cs (from Ciona savignyi) have the same distribution of motifs. The authors saw this locus-specific conservation only in the six single-copy genes. There was a higher degree of heterogeneity in the paralogous clusters of genes, in terms of both sequence and functional turnover.
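One way to picture "same distribution of motifs" is to treat each regulatory region as a list of (position, motif type) pairs and ask whether two regions have the same motif order with roughly the same spacing. The sketch below is only an illustration of that idea, not the authors' method: the function name, the `max_shift` tolerance and all positions are invented, not values from the paper.

```python
def same_architecture(hits_a, hits_b, max_shift=20):
    """Return True if two regions share motif order and approximate positions.

    Each argument is a list of (position, motif_name) pairs, with positions
    given relative to the transcription start site. `max_shift` is a
    hypothetical tolerance, in base pairs, for how far a site may move.
    """
    a, b = sorted(hits_a), sorted(hits_b)
    # The motif types must appear in the same order along the sequence.
    if [name for _, name in a] != [name for _, name in b]:
        return False
    # Each corresponding pair of sites must sit at a similar distance.
    return all(abs(pos_a - pos_b) <= max_shift
               for (pos_a, _), (pos_b, _) in zip(a, b))


# INVENTED positions, for illustration only.
ortholog_ci = [(-120, "CRE"), (-80, "MyoD"), (-40, "Tbx6")]
ortholog_cs = [(-115, "CRE"), (-85, "MyoD"), (-42, "Tbx6")]
unrelated_locus = [(-200, "Tbx6"), (-60, "CRE")]

print(same_architecture(ortholog_ci, ortholog_cs))
print(same_architecture(ortholog_ci, unrelated_locus))
```

In this toy picture, orthologous single-copy loci would pass the comparison while two different loci generally would not, which is the pattern the post describes.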
The authors conclude: “Thus, the syntactical rules governing this regulatory function are flexible but become highly constrained evolutionarily once they are established in a particular element.”
Brown, C.D., Johnson, D.S., Sidow, A. (2007). Functional Architecture and Evolution of Transcriptional Elements That Drive Gene Coexpression. Science, 317(5844), 1557-1560. DOI: 10.1126/science.1145893