More About gClusters
For a quick look at a genomics data-mashup created with gClusters, here is an example for the neural-precursor specific gene, neuralized: neur_gene_data_mashup. Click once on the image to make it move out of the way, so that you can see the full mashup.
gClusters, a Genomics Data Mashup web application: Content Management Systems (CMSes), such as Drupal and Frontier, are flexible and powerful platforms for building web portals that integrate complex data ("data-mashups") from different online databases for use in genomics research. One such web app that my colleagues and I have previously developed is gClusters (short for Genes with Clusters), which was designed specifically for transcriptional genomics, in particular "DNA transcription code" analysis. We and others have found that DNA transcription codes are specific combinations and arrangements of DNA binding site clusters for specific transcription factors. Once the code for one gene in a pathway has been identified (in this case, the proneural gene achaete), it can be used to define a search pattern for computationally searching the entire genome for genes with similar codes in their promoter regions.
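To make the idea of searching for a "code" more concrete, here is a minimal Python sketch of one way such a search could work: find every binding site match in a sequence, then report windows that contain the required combination of sites. This is not the gClusters implementation. The motif patterns, the names siteA and siteB, the window size, and the function names are all hypothetical placeholders chosen for illustration.

```python
import re

def find_sites(sequence, motifs):
    """Return a sorted list of (position, motif_name) for every motif match."""
    hits = []
    for name, pattern in motifs.items():
        # A lookahead is used so that overlapping matches are also reported.
        for m in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((m.start(), name))
    return sorted(hits)

def find_clusters(sequence, motifs, window=50, min_counts=None):
    """Return (first_site_pos, last_site_pos, sites) for each window of
    `window` bases, anchored at a site, that holds the required site counts."""
    # Assumption: by default, require at least one copy of every motif.
    min_counts = min_counts or {name: 1 for name in motifs}
    hits = find_sites(sequence, motifs)
    clusters = []
    for i, (start, _) in enumerate(hits):
        in_window = [h for h in hits[i:] if h[0] < start + window]
        counts = {}
        for _, name in in_window:
            counts[name] = counts.get(name, 0) + 1
        if all(counts.get(n, 0) >= c for n, c in min_counts.items()):
            clusters.append((start, in_window[-1][0], in_window))
    return clusters

# Hypothetical motifs and a toy sequence, for illustration only.
motifs = {"siteA": "CA[ACGT]{2}TG", "siteB": "GTGGGAA"}
sequence = "AAAA" + "CAGCTG" + "TTTT" + "GTGGGAA" + "A" * 100 + "CAGCTG"
for first, last, sites in find_clusters(sequence, motifs, window=30):
    print(first, last, sites)
```

A real search would also need to handle both DNA strands, site spacing and order, and mapping each cluster to its nearest gene, none of which is attempted here.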
After reading the introduction on the demo page, click on the various tabs at the top, such as Early Neural Genes. These are the same set of genes as shown in the Alphabetical tab, but they have been prioritized according to how much and what type of gene function info each of them has that is related to the pathway of interest. This is further described on the home page for the app. Early Neural Genes have been prioritized because their Gene Function data indicates that they are involved in early steps of the neural differentiation pathway, such as neuroblast formation, Notch signaling, etc. By contrast, Late Neural Genes are genes whose Gene Function data indicates that they are involved in later steps, such as synaptogenesis and axonogenesis. This automated prioritization allows the user to quickly identify which of the genes associated with the computationally identified clusters are likely to function in the pathway of interest. This is critical, because only a very small percentage of clusters can be tested for function in transgenic experiments.
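As a rough illustration of this kind of prioritization, the Python sketch below ranks genes by how many of their Gene Function terms match a keyword list for the early or late steps. The actual scoring used by gClusters is not described here, so the simple match count, the keyword lists (taken from the examples above), and the gene names and annotations are all assumptions made for the example.

```python
# Keyword lists based on the examples mentioned in the text (assumed, not exhaustive).
EARLY_TERMS = ("neuroblast", "notch signaling")
LATE_TERMS = ("synaptogenesis", "axonogenesis")

def count_matches(annotations, keywords):
    """Count annotation terms that contain at least one keyword."""
    return sum(
        1 for term in annotations
        if any(k in term.lower() for k in keywords)
    )

def prioritize(genes, keywords):
    """Return gene names with at least one matching term,
    highest match count first, ties broken alphabetically."""
    scored = [(count_matches(terms, keywords), name) for name, terms in genes.items()]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [name for score, name in ranked if score > 0]

# Hypothetical genes and annotations, for illustration only.
genes = {
    "geneA": ["neuroblast fate determination", "Notch signaling pathway"],
    "geneB": ["synaptogenesis"],
    "geneC": ["neuroblast formation"],
    "geneD": ["lipid metabolism"],
}

print(prioritize(genes, EARLY_TERMS))  # ['geneA', 'geneC']
print(prioritize(genes, LATE_TERMS))   # ['geneB']
```

Genes with no matching terms, such as geneD here, drop out of both lists but would still appear in an alphabetical listing.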
After you click on the Early Neural Genes tab, just under that tab you will see the "Links to Genes", and then the prioritized list of genes that have binding site clusters in their promoters that match the search pattern derived from the model promoter. Notice that the second gene in that prioritized list is "neur". Clicking on the neur gene link takes you to the same neur gene page as the neur_gene_data_mashup shown above. (Click once on the enlarged graphical image to make it collapse down to reveal all of the data.) Each gene page provides a "data-mashup" of three types of data: 1) "Cluster Info", which shows the computationally identified binding site clusters near that gene, 2) the graphical display, which shows the same clusters juxtaposed with their associated gene, and 3) the Gene Function data, which is the full set of available genetic, molecular, and biochemical data for that gene, programmatically mined from FlyBase, an online genomics database for fruit flies (a well-studied genetic model organism). This includes all available data in 5 ontology categories: Biological Process, Molecular Function, Genetic Interactions, Gene Expressed In, and Mutants Affect. Note that each of these controlled vocabulary terms is an active link to the online Gene Ontology database for FlyBase, where those terms are defined.
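One way to picture the data behind a gene page is as a single record that holds the clusters alongside the Gene Function terms, grouped under the five categories listed above. The Python sketch below is only an illustration of that shape. The class and field names are hypothetical and do not reflect how gClusters or Drupal actually stores the data.

```python
from dataclasses import dataclass, field

# The five categories named in the text.
CATEGORIES = (
    "Biological Process",
    "Molecular Function",
    "Genetic Interactions",
    "Gene Expressed In",
    "Mutants Affect",
)

@dataclass
class Cluster:
    """A computationally identified binding site cluster (hypothetical fields)."""
    start: int
    end: int
    sites: list

@dataclass
class GenePage:
    """Combined record for one gene: its clusters plus its function terms."""
    symbol: str
    clusters: list = field(default_factory=list)
    gene_function: dict = field(
        default_factory=lambda: {c: [] for c in CATEGORIES}
    )

    def add_term(self, category, term):
        """Attach a controlled vocabulary term under one of the five categories."""
        if category not in self.gene_function:
            raise KeyError(f"unknown category: {category}")
        self.gene_function[category].append(term)

# Placeholder values, for illustration only.
page = GenePage("neur")
page.clusters.append(Cluster(start=100, end=180, sites=["siteA", "siteB"]))
page.add_term("Biological Process", "example term")
print(page.symbol, len(page.clusters), page.gene_function["Biological Process"])
```

The graphical display could then be drawn from the same cluster coordinates, so the list view and the image stay in step.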
I was at a GMOD (Generic Model Organism Database) meeting in San Diego the week the earthquake hit, giving a talk called "Using Drupal and Flex to build custom user-configurable interfaces for transcriptional genomics research".
The next week we were using Drupal to build data mashup apps to aid Haiti, like this one: Haiti need-have data-mashup
Take Home: We have the technology and expertise to greatly speed up the rescue effort in Haiti, but we need full access to supply and delivery system data to be effective.