TFmotifView is a web server for studying the distribution of known transcription factor (TF) motifs in genomic regions of interest.
In recent years, the binding specificities of many TFs have been deciphered and summarised as position weight matrices, also called TF motifs. Despite the availability of thousands of known TF motifs in databases, it remains non-trivial to quickly query and visualise the enrichment of known TF motifs in genomic regions of interest. To this end, we have developed TFmotifView, for which we have processed all vertebrate TF motifs from the JASPAR database.
Based on genomic regions of interest and selected TF motifs (settings), TFmotifView performs an overlap of the genomic regions with TF motif occurrences to generate three different outputs: (1) an enrichment table and scatterplot calculating the significance of motif occurrences in genomic regions compared to control regions, (2) a genomic view of the organisation of TF motifs in each genomic region, and (3) a metaplot summarising the position of TF motifs relative to the center of the regions.
TFmotifView requires four inputs on the left panel to start the analysis:
- Genome assembly: The genome assembly should match the genomic coordinates of your regions of interest. It can be selected from the drop-down list and is available for the human (hg38, hg19) and mouse (mm10, mm9) genomes.
- Genomic regions: The genomic coordinates of your regions of interest, in which you want to search for TF motifs. They should be in a minimal BED format, with one region per line giving the chromosome, region start and region end. Regions can be copied and pasted or uploaded from a BED file.
- TF motifs: Motifs of interest from the JASPAR 2020 database that you want to use to scan your genomic regions of interest. They can be selected from the drop-down list, based on the TF names. The list of available motifs can be filtered by entering the beginning of a motif name. Additionally, exact motif sequences (using the letters A, C, G and T; mismatches are not allowed) can be generated. Once selected, motif names and logos appear on the left panel. It is then possible to change the colour used for visualisation, change the p-value threshold used to scan the regions of interest, or remove the motif from the selection. If no motifs are selected, the analysis will run using the representative motif of each motif cluster.
- Control regions: Since TF motifs are present in high numbers all over the genome, it is important to contrast the presence of motifs in your regions of interest with their presence in control regions. Control regions can be either generated randomly (with matched CpG content) or provided by the user as genomic coordinates in BED format. Good control regions should have genomic characteristics similar to those of the genomic regions of interest used as input.
The analysis can then be launched by clicking the “start analysis” button.
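As an illustration of the minimal BED format described above, the following sketch checks a block of pasted regions before submission. It is not part of TFmotifView; the names `Region` and `parse_bed` are hypothetical, and the validation rules (at least three whitespace-separated fields, start smaller than end) are assumptions based on the general BED convention.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_bed(text: str) -> List[Region]:
    """Parse minimal BED text: chromosome, start and end on each line."""
    regions = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skip blank lines, comments and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected at least 3 fields")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            raise ValueError(f"line {line_number}: invalid interval {start}-{end}")
        # Extra BED columns (name, score, strand, ...) are ignored here.
        regions.append(Region(chrom, start, end))
    return regions
```

Calling `parse_bed` on the text you intend to paste raises a `ValueError` naming the first malformed line, which can help catch header rows or swapped coordinates.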
An example can be loaded by clicking the “load example data” button. The input genomic regions are NRF1 ChIP-seq peaks from mouse embryonic stem cells (Domcke/Bardet et al. 2015). Raw data are available on GEO (GSM1891641). The data were aligned to the mouse reference genome assembly mm10 using bowtie2 (Langmead et al. 2012) and peaks were called using peakzilla (Bardet/Steinmann et al. 2013). The input peak file is available here. The selected motifs are the motif of the corresponding TF, NRF1, as well as those of other TFs expressed in mouse embryonic stem cells: MYCN, GABPA, CTCF, REST and SP1. Control regions are generated randomly. Results from the example data can be loaded directly from http://bardet.u-strasbg.fr/tfmotifview/?results=example
The genomic regions provided as input, as well as the generated global control regions, are available for download in BED format. The G+C content of the regions is analysed and summarised as histograms for the input and control regions, as well as for all bins in the corresponding genome for comparison. The plots can be downloaded as a PDF file. A shift in the distribution of G+C content between input and control regions might bias the results; in that case, randomly generated control regions with matched G+C content might be a more suitable solution.
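To make the G+C comparison concrete, here is a minimal sketch of how a G+C content histogram could be computed from region sequences. It is an illustration only, not the server's implementation; `gc_fraction`, `gc_histogram` and the choice of ten equal-width bins are assumptions.

```python
from typing import Iterable, List


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # e.g. a sequence made only of N
    return (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(sequences: Iterable[str], n_bins: int = 10) -> List[int]:
    """Count sequences per equal-width G+C bin between 0 and 1."""
    counts = [0] * n_bins
    for sequence in sequences:
        fraction = gc_fraction(sequence)
        # A fraction of exactly 1.0 is placed in the last bin.
        index = min(int(fraction * n_bins), n_bins - 1)
        counts[index] += 1
    return counts
```

Computing one histogram for the input sequences and one for the control sequences, then comparing them bin by bin, gives a quick indication of whether the two sets have comparable G+C content.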
The enrichment table shows, for each motif (one per row), the number and percentage of input and control regions that contain at least one occurrence of the motif. It uses two separate sets of control regions. The global controls correspond to the provided or randomly generated regions (with matched CpG content). The local controls correspond to regions directly flanking the genomic regions of interest, which are expected to have similar genomic features. The enrichment of motifs in genomic regions over global or local control regions is then calculated as a fold change with an associated hypergeometric p-value. Motif names link to their JASPAR page. Motifs are ranked according to the global p-value, and the ranking can be changed using the column arrows. The table can be downloaded as a tab-delimited text file. The results can also be visualised in a scatterplot showing the percentage of regions of interest that contain at least one motif versus their fold change over the global control regions (in log2), with points coloured by the global p-value. Hovering over a point with the cursor displays the corresponding motif and p-value.
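The fold change and hypergeometric p-value can be sketched as follows. The exact parameterisation used by TFmotifView is not stated here, so this example assumes a one-sided test in which input and control regions are pooled and the input set is treated as a draw from that pool; the function names are hypothetical.

```python
from math import comb
from typing import Tuple


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n items from N, of which K are successes."""
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible combinations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def motif_enrichment(k_in: int, n_in: int, k_ctl: int, n_ctl: int) -> Tuple[float, float]:
    """Fold change and p-value for regions containing at least one motif.

    k_in / n_in: input regions with the motif / all input regions.
    k_ctl / n_ctl: control regions with the motif / all control regions.
    """
    frac_in = k_in / n_in
    frac_ctl = k_ctl / n_ctl
    # Assumption: report an infinite fold change when no control region has the motif.
    fold = frac_in / frac_ctl if frac_ctl > 0 else float("inf")
    p_value = hypergeom_upper_tail(k_in, n_in + n_ctl, k_in + k_ctl, n_in)
    return fold, p_value
```

For example, `motif_enrichment(80, 100, 20, 100)` describes a motif found in 80% of input regions and 20% of control regions; the fold change is about 4 (2 in log2) and the p-value is very small.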
The genomic view shows the position and organisation of the selected motifs within the genomic regions used as input. Only regions that contain at least one motif are displayed. Genomic regions are represented as grey bars and individual motifs as coloured boxes. Clicking on a motif displays information about it, such as the motif name, strand, exact sequence and p-value score. Transcription start sites of protein-coding transcripts are indicated as directional arrows. The legend indicates the motif sequence logos and the colours used for display. If many genomic regions are used as input, the next and previous buttons can be used to navigate through the pages. The results can be downloaded in PDF format (limited to a maximum of 10 pages).
Additionally, in order to integrate this motif information with other genomic datasets, a button is available to generate genomic tracks representing the position of the selected motifs along the whole genome as BED files that are directly uploaded to the UCSC genome browser.
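For readers who want to build a comparable track from their own motif coordinates, the sketch below writes motif occurrences as a six-column BED track with a UCSC `track` header line. It only illustrates the file format; the function name `motif_hits_to_bed`, the default track name and the fixed score of 0 are assumptions and do not describe the files produced by the server.

```python
from typing import Iterable, Tuple

# (chromosome, start, end, motif name, strand)
MotifHit = Tuple[str, int, int, str, str]


def motif_hits_to_bed(hits: Iterable[MotifHit], track_name: str = "TF_motifs") -> str:
    """Format motif occurrences as BED6 text with a track header line."""
    lines = [f'track name="{track_name}" description="TF motif occurrences"']
    # Sort by chromosome name, then start and end, as genome browsers expect.
    for chrom, start, end, name, strand in sorted(hits, key=lambda h: (h[0], h[1], h[2])):
        # Columns: chrom, start, end, name, score (placeholder 0), strand.
        lines.append("\t".join([chrom, str(start), str(end), name, "0", strand]))
    return "\n".join(lines) + "\n"
```

The returned text can be saved to a file and loaded as a custom track alongside other genomic datasets.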
The metaplot shows the enrichment of the selected motifs relative to the center of the regions. This is useful for genomic regions that have a defined center, such as ChIP-seq peak regions, where the center/summit of the peaks is expected to correspond to the position of the TF binding to its motif. For each motif, the percentage of regions having a motif is calculated at each position along the centered genomic regions. The left panel compares the enrichment of all selected motifs in the genomic regions of interest, whereas the right panel shows the enrichment of each motif separately compared to the control regions (grey line). The legend indicates the motif sequence logos and the colours used for display. The results can be downloaded in PDF format.
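The per-position calculation behind such a metaplot can be sketched as follows, for a single motif. This is an illustrative reconstruction rather than the server's code: the function name `metaplot_profile`, the half-open coordinate convention and the use of the region midpoint as the center are assumptions.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open (start, end) in genomic coordinates


def metaplot_profile(
    regions: Sequence[Interval],
    motif_hits: Sequence[Sequence[Interval]],
    half_width: int,
) -> List[float]:
    """Percentage of regions with a motif at each offset from the region center.

    motif_hits[i] lists the motif occurrences found in regions[i].
    The result covers offsets -half_width .. +half_width.
    """
    if not regions:
        raise ValueError("at least one region is required")
    counts = [0] * (2 * half_width + 1)
    for (start, end), hits in zip(regions, motif_hits):
        center = (start + end) // 2
        covered = set()  # offsets covered by any hit, counted once per region
        for hit_start, hit_end in hits:
            first = max(hit_start - center, -half_width)
            last = min(hit_end - 1 - center, half_width)
            covered.update(range(first, last + 1))
        for offset in covered:
            counts[offset + half_width] += 1
    return [100.0 * count / len(regions) for count in counts]
```

Running the same function on the control regions gives the baseline profile against which the input profile can be compared.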
- How many instances of TFmotifView can I run in parallel?
Only one instance of TFmotifView can be run per user. If you leave the page, you will be able to reconnect using the provided result URL only once the calculations have finished.
23/04/2020 TFmotifView paper published in NAR
24/03/2020 Update to the JASPAR 2020 motif database
19/07/2019 Stable version online
04/04/2019 Beta version online
Leporcq C, Spill Y, Balaramane D, Toussaint C, Weber M, Bardet AF. TFmotifView: a webserver for the visualization of transcription factor motifs in genomic regions. Nucleic Acids Research (2020) PMID: 32324215 DOI: 10.1093/nar/gkaa252
TFmotifView is free to use for everyone.
Any question or suggestion? Do not hesitate to contact us or check out our website!