Tags: pipeline
There are several alternative R packages and tools to perform motif enrichment analysis for RNA-binding proteins (RBPs), beyond PWMEnrich::motifEnrichment(). Here are the most notable ones:
| Tool / Package | Enrichment | Custom Motifs | CLI or R? | RNA-specific? |
| ------------------------ | ----------------- | --------------- | --------- | -------------- |
| **PWMEnrich** | ✅ | ✅ | R | ✅ |
| **RBPmap** | ⚠ (scanning only, add your own test) | ❌ (uses own db) | Web/CLI | ✅ |
| **Biostrings/TFBSTools** | ❌ (only scanning) | ✅ | R | ❌ |
| **rmap** | ✅ (CLIP-based) | ❌ | R | ✅ |
| **Homer** | ✅ | ✅ | CLI | ⚠ RNA optional |
| **MEME (AME, FIMO)** | ✅ | ✅ | Web/CLI | ⚠ Generic |
Notes:
* RBPmap: try RBPmap results + a separate enrichment test!
* Biostrings/TFBSTools: use together with ATtRACT motifs (ATtRACT + Biostrings / TFBSTools).
Get 3UTR.fasta, 5UTR.fasta, CDS.fasta and transcripts.fasta
mRNA Transcript
┌────────────┬────────────┬────────────┐
│ 5′ UTR │ CDS │ 3′ UTR │
└────────────┴────────────┴────────────┘
↑ ↑ ↑ ↑
Start Start Stop End
of Codon Codon of
Transcript Transcript
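The diagram above can be sketched in a few lines of Python. This is only an illustration of the three regions, not the code of extract_transcript_parts.py; the function name and the 0-based, end-exclusive CDS offsets are assumptions chosen to match Python slicing.

```python
def split_transcript(seq, cds_start, cds_end):
    """Split a spliced mRNA sequence into (5' UTR, CDS, 3' UTR).

    cds_start / cds_end are 0-based, end-exclusive offsets into seq
    (an assumed convention, not taken from the original script).
    """
    if not 0 <= cds_start <= cds_end <= len(seq):
        raise ValueError("CDS offsets fall outside the transcript")
    return seq[:cds_start], seq[cds_start:cds_end], seq[cds_end:]


# Toy transcript: GGG | ATG AAA TAG | CCC
utr5, cds, utr3 = split_transcript("GGGATGAAATAGCCC", 3, 12)
print(utr5, cds, utr3)
```

The CDS here runs from the start codon through the stop codon, so everything after offset 12 is the 3' UTR.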
✅ Option 1: Use GENCODE and python scripts (CHOSEN!)
~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/MKL-1_wt.EV_vs_parental-up.txt #20086
~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/MKL-1_wt.EV_vs_parental-down.txt #634
~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/WaGa_wt.EV_vs_parental-up.txt #23832
~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/WaGa_wt.EV_vs_parental-down.txt #375
#Filter the gene lists to protein_coding genes before extracting 3' UTRs, because:
#1. Only protein_coding genes have well-annotated 3' UTRs.
#   A 3' UTR is the region after the CDS (coding sequence) and before the poly-A tail.
#   Non-coding RNAs (e.g., lncRNA, snoRNA, miRNA precursors) have no CDS and therefore no canonical 3' UTR.
#2. In GENCODE, most UTR annotations are provided only for transcripts of gene_type = "protein_coding".
grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/MKL-1_wt.EV_vs_parental-up.txt > MKL-1_wt.EV_vs_parental-up_protein_coding.txt
grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/MKL-1_wt.EV_vs_parental-down.txt > MKL-1_wt.EV_vs_parental-down_protein_coding.txt
grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/WaGa_wt.EV_vs_parental-up.txt > WaGa_wt.EV_vs_parental-up_protein_coding.txt
grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/WaGa_wt.EV_vs_parental-down.txt > WaGa_wt.EV_vs_parental-down_protein_coding.txt
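The grep pattern above matches "protein_coding" in any column of the line. A stricter alternative is to test one named column. The sketch below assumes the DE tables are comma-separated with a header row and a column called gene_type; that column name is a guess and should be adjusted to the real header.

```python
import csv
import io


def filter_protein_coding(handle, column="gene_type"):
    """Return the rows whose `column` equals 'protein_coding'.

    `column` is an assumed header name; change it to match the DE table.
    """
    reader = csv.DictReader(handle)
    return [row for row in reader if row.get(column) == "protein_coding"]


# Toy table with made-up gene IDs
demo = io.StringIO(
    '"gene_id","gene_type","log2FoldChange"\n'
    '"GENE_A","protein_coding","-2.1"\n'
    '"GENE_B","lncRNA","-1.5"\n'
)
for row in filter_protein_coding(demo):
    print(row["gene_id"], row["log2FoldChange"])
```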
#Visit and Download: GENCODE FTP site https://www.gencodegenes.org/human/
* GTF annotation file (e.g., gencode.v48.annotation.gtf.gz)
* Corresponding genome FASTA (e.g., GRCh38.primary_assembly.genome.fa.gz)
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_48/gencode.v48.annotation.gtf.gz
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_48/GRCh38.primary_assembly.genome.fa.gz
gunzip gencode.v48.annotation.gtf.gz
gunzip GRCh38.primary_assembly.genome.fa.gz
python extract_transcript_parts.py MKL-1_wt.EV_vs_parental-down_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa MKL-1_down
python extract_transcript_parts.py MKL-1_wt.EV_vs_parental-up_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa MKL-1_up #5988
python extract_transcript_parts.py WaGa_wt.EV_vs_parental-down_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa WaGa_down #93
python extract_transcript_parts.py WaGa_wt.EV_vs_parental-up_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa WaGa_up #6538
✅ Options 2-5: see the end of this post!
Why 3' UTR?
🧬 miRNA, RBP, or translation/post-transcriptional regulation
➡️ Use 3' UTR sequences
Because:
Most miRNA binding and many RBP motifs are located in the 3' UTR.
It’s the primary region for mRNA stability, localization, and translation regulation.
🧠 Example: You're looking for binding enrichment of miRNAs or RNA-binding proteins (PUM, HuR, etc.)
✅ Input = 3UTR.fasta
🧪 If you're testing RBPs related to:
- Translation initiation, upstream ORFs, or 5' cap interaction:
➡️ Use 5' UTR
- Coding mutations, protein-level motifs, or translational efficiency:
➡️ Use CDS
- General transcriptome-wide motif search (no preference):
➡️ Use transcripts, or test all regions separately to localize signal
Recommended Workflow with RBPmap https://rbpmap.technion.ac.il (Too slow!)
RBPmap itself does not compute enrichment p-values or FDR; it's a motif scanning tool.
To get statistically meaningful RBP enrichments, combine RBPmap with custom permutation testing or Fisher’s exact test + multiple testing correction.
1. Prepare foreground (target) and background sequences
Extract 3′ UTRs of:
📉 Downregulated mRNAs (foreground) — likely targeted by upregulated miRNAs
⚪ A control set of 3′ UTRs — e.g., non-differentially expressed protein-coding genes
grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/MKL-1_wt.EV_vs_parental-all.txt > MKL-1_wt.EV_vs_parental-all_protein_coding.txt
grep ",\"protein_coding\"," ~/DATA/Data_Ute/Data_RNA-Seq_MKL-1+WaGa/results_2025_1/degenes/WaGa_wt.EV_vs_parental-all.txt > WaGa_wt.EV_vs_parental-all_protein_coding.txt
cut -d',' -f1 MKL-1_wt.EV_vs_parental-all_protein_coding.txt | sort > all_genes.txt #19239
cut -d',' -f1 MKL-1_wt.EV_vs_parental-up_protein_coding.txt | sort > up_genes.txt #5988
cut -d',' -f1 MKL-1_wt.EV_vs_parental-down_protein_coding.txt | sort > down_genes.txt #112
cat up_genes.txt down_genes.txt | sort | uniq > regulated_genes.txt
comm -23 all_genes.txt regulated_genes.txt > background_genes.txt
grep -Ff background_genes.txt MKL-1_wt.EV_vs_parental-all_protein_coding.txt > MKL-1_wt.EV_vs_parental-background_protein_coding.txt #13139
cut -d',' -f1 WaGa_wt.EV_vs_parental-all_protein_coding.txt | sort > all_genes.txt #19239
cut -d',' -f1 WaGa_wt.EV_vs_parental-up_protein_coding.txt | sort > up_genes.txt #6538
cut -d',' -f1 WaGa_wt.EV_vs_parental-down_protein_coding.txt | sort > down_genes.txt #93
cat up_genes.txt down_genes.txt | sort | uniq > regulated_genes.txt
comm -23 all_genes.txt regulated_genes.txt > background_genes.txt
grep -Ff background_genes.txt WaGa_wt.EV_vs_parental-all_protein_coding.txt > WaGa_wt.EV_vs_parental-background_protein_coding.txt #12608
python extract_transcript_parts.py MKL-1_wt.EV_vs_parental-background_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa MKL-1_background
python extract_transcript_parts.py WaGa_wt.EV_vs_parental-background_protein_coding.txt ~/REFs/gencode.v48.annotation.gtf ~/REFs/GRCh38.primary_assembly.genome.fa WaGa_background
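foreground.fasta: your target (foreground) sequences, e.g. the 3′ UTRs of downregulated genes.
background.fasta: your background control sequences, e.g. the 3′ UTRs of genes that are not significantly differentially expressed.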
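The sort / comm / grep steps used to build the background set can also be written with Python sets. This is a minimal sketch with a hypothetical function name; unlike grep -Ff, which matches a pattern anywhere in a line, it compares whole gene IDs only.

```python
def background_ids(all_ids, up_ids, down_ids):
    """Genes that were tested but are neither up- nor downregulated."""
    regulated = set(up_ids) | set(down_ids)
    return sorted(set(all_ids) - regulated)


# Toy IDs for illustration only
all_genes = ["G1", "G2", "G3", "G4", "G5"]
print(background_ids(all_genes, up_ids=["G2"], down_ids=["G5"]))
```

The resulting ID list can then be used to select rows from the "all" table before running extract_transcript_parts.py on the background.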
2. Run RBPmap separately on each set (six runs in total)
* Submit both sets of UTRs to RBPmap.
* Use the same settings (e.g., “human genome”, “high stringency”, "Apply conservation filter" etc.)
* Choose all RBPs
* Download motif match outputs for both sets
3. Count motif hits per RBP in each set
You now have:
For each RBP:
a: number of target 3′ UTRs with a motif match
b: number of background UTRs with a motif match
c: total number of target UTRs
d: total number of background UTRs
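As a rough sketch of the counting step: if the scan results have already been reduced to, for each RBP, the set of sequence IDs with at least one match, then a, b, c and d follow directly. The function name and input layout are assumptions, not part of RBPmap.

```python
def contingency_counts(fg_hits, bg_hits, fg_total, bg_total):
    """Return {rbp: (a, b, c, d)} as defined above.

    fg_hits / bg_hits: dict mapping RBP name -> set of sequence IDs
    with at least one motif match (assumed input layout).
    """
    table = {}
    for rbp in sorted(set(fg_hits) | set(bg_hits)):
        a = len(fg_hits.get(rbp, ()))  # target UTRs with a match
        b = len(bg_hits.get(rbp, ()))  # background UTRs with a match
        table[rbp] = (a, b, fg_total, bg_total)
    return table


# Toy input with made-up sequence IDs
fg = {"ELAVL1": {"utr1", "utr2"}, "PUM1": {"utr1"}}
bg = {"ELAVL1": {"bg7"}}
print(contingency_counts(fg, bg, fg_total=3, bg_total=5))
```

Counting sequences rather than individual matches keeps one long UTR with many hits from dominating the table.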
4. Perform Fisher’s Exact Test per RBP
For each RBP, construct a 2x2 table:
| | Motif Present | Motif Absent |
| --- | --- | --- |
| Foreground (targets) | a | c - a |
| Background | b | d - b |
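For illustration, the one-sided (enrichment) Fisher p-value for this table can be computed with the standard library alone. In practice scipy.stats.fisher_exact with alternative="greater" or R's fisher.test() would normally be used; the function below is a dependency-free sketch with a hypothetical name.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher p-value for the table [[a, c - a], [b, d - b]].

    Probability of seeing >= a motif-positive foreground sequences when
    c sequences are drawn at random from all c + d sequences, of which
    a + b carry the motif (hypergeometric upper tail).
    """
    with_motif = a + b
    n_all = c + d
    upper = min(c, with_motif)
    tail = sum(
        comb(with_motif, k) * comb(n_all - with_motif, c - k)
        for k in range(a, upper + 1)
    )
    return min(1.0, tail / comb(n_all, c))


# 3 of 4 targets vs 1 of 4 background sequences carry the motif
print(fisher_enrichment_p(3, 1, 4, 4))
```

With only eight sequences the example is far from significant, which is the expected behaviour for such a small table.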
5. Adjust p-values for multiple testing
Use Benjamini-Hochberg (FDR) correction (e.g., in Python or R) across all RBPs tested.
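The Benjamini-Hochberg adjustment itself is short. The sketch below is a plain-Python version for illustration (statsmodels' multipletests() or R's p.adjust(method="BH") do the same job); the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # one p-value per RBP (toy numbers)
print(benjamini_hochberg(pvals))
```

The correction should be applied once across all RBPs tested in a comparison, not separately per RBP.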
6. ✅ Summary
| Step | Tool |
| --- | --- |
| Prepare database of RNA-binding motifs | ATtRACT |
| 3′ UTR extraction | extract_transcript_parts.py |
| Motif scan | RBPmap or FIMO |
| Count motif hits | Your own parser (Python or R) |
| Fisher’s exact test | scipy.stats or fisher.test() |
| FDR correction | multipletests() or p.adjust() |
python rbp_enrichment.py rbpmap_downregulated.tsv rbpmap_background.tsv rbp_enrichment_results.csv
Quick Drop-In Plan (RBPmap Alternative with FIMO for motif scan)
1. [ATtRACT + FIMO (MEME suite)]
ATtRACT: Database of RNA-binding motifs.
FIMO: Fast and scriptable motif scanning tool.
#Download RBP motifs (PWM) from ATtRACT DB; Convert to MEME format (if needed); Use FIMO to scan UTR sequences
grep "Homo_sapiens" ATtRACT_db.txt > attract_human.txt
#cut -f12 attract_human.txt | sort | uniq > valid_ids.txt
python convert_attract_pwm_to_meme.py
fimo --thresh 1e-4 --oc fimo_foreground_MKL-1_down attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/MKL-1_down.3UTR.fasta
fimo --thresh 1e-4 --oc fimo_foreground_MKL-1_up attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/MKL-1_up.3UTR.fasta
fimo --thresh 1e-4 --oc fimo_background_MKL-1_background attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/MKL-1_background.3UTR.fasta
fimo --thresh 1e-4 --oc fimo_foreground_WaGa_down attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/WaGa_down.3UTR.fasta
fimo --thresh 1e-4 --oc fimo_foreground_WaGa_up attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/WaGa_up.3UTR.fasta
fimo --thresh 1e-4 --oc fimo_background_WaGa_background attract_human.meme ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/WaGa_background.3UTR.fasta
#end
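The PWM-to-MEME conversion step is not shown in this post. The sketch below writes probability matrices in the MEME minimal motif format; it is not the code of convert_attract_pwm_to_meme.py. The input layout (a dict of rows in A, C, G, T order), the uniform background, and the nsites= 20 placeholder are all assumptions.

```python
def pwm_to_meme(motifs):
    """Render motifs as MEME minimal-format text.

    motifs: dict name -> list of rows, each row = [pA, pC, pG, pT]
    (assumed layout; U is written as T so a DNA alphabet can be used).
    """
    lines = [
        "MEME version 4", "",
        "ALPHABET= ACGT", "",
        "strands: +", "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25", "",
    ]
    for name, rows in motifs.items():
        lines.append(f"MOTIF {name}")
        # nsites is a placeholder value, not derived from real data
        lines.append(
            f"letter-probability matrix: alength= 4 w= {len(rows)} nsites= 20 E= 0"
        )
        for row in rows:
            total = sum(row)  # renormalise so each row sums to 1
            lines.append(" ".join(f"{p / total:.6f}" for p in row))
        lines.append("")
    return "\n".join(lines)


print(pwm_to_meme({"demo_motif": [[1, 0, 0, 0], [0.5, 0.5, 0, 0]]}))
```

Since the UTR FASTA files are transcript-strand sequences, FIMO's --norc option (do not score the reverse-complement strand) may be worth considering for the scans above.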
#TODO_TOMORROW: mv PBS_analysis RBP_analysis
#Test
python run_enrichment.py \
--attract ATtRACT_db.txt \
--fimo_fg fimo_foreground_WaGa_down/fimo.tsv \
--fimo_bg fimo_foreground2/fimo.tsv \
--output rbp_enrichment_test.csv
python run_enrichment.py \
--attract ATtRACT_db.txt \
--fimo_fg fimo_foreground_MKL-1_up/fimo.tsv \
--fimo_bg fimo_background_MKL-1_background/fimo.tsv \
--output rbp_enrichment_MKL-1_up.csv
python run_enrichment.py \
--attract ATtRACT_db.txt \
--fimo_fg fimo_foreground_MKL-1_down/fimo.tsv \
--fimo_bg fimo_background_MKL-1_background/fimo.tsv \
--output rbp_enrichment_MKL-1_down.csv
python run_enrichment.py \
--attract ATtRACT_db.txt \
--fimo_fg fimo_foreground_WaGa_up/fimo.tsv \
--fimo_bg fimo_background_WaGa_background/fimo.tsv \
--output rbp_enrichment_WaGa_up.csv
python run_enrichment.py \
--attract ATtRACT_db.txt \
--fimo_fg fimo_foreground_WaGa_down/fimo.tsv \
--fimo_bg fimo_background_WaGa_background/fimo.tsv \
--output rbp_enrichment_WaGa_down.csv
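The internals of run_enrichment.py are not shown here. As a hedged sketch of its likely first step, the function below reads a fimo.tsv and collects, per motif, the set of sequences with at least one match. It relies on the standard FIMO column names (motif_id, sequence_name, p-value); the function name and the threshold default are assumptions.

```python
import csv
import io
from collections import defaultdict


def sequences_per_motif(handle, max_p=1e-4):
    """Map motif_id -> set of sequence names with >= 1 match."""
    hits = defaultdict(set)
    # fimo.tsv ends with a blank line and '#' comment lines; skip them.
    rows = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        if float(row["p-value"]) <= max_p:
            hits[row["motif_id"]].add(row["sequence_name"])
    return dict(hits)


header = ["motif_id", "motif_alt_id", "sequence_name", "start", "stop",
          "strand", "score", "p-value", "q-value", "matched_sequence"]
demo_rows = [
    header,
    ["M1", "", "utrA", "5", "11", "+", "12.1", "1e-5", "0.01", "TTTTTTT"],
    ["M1", "", "utrA", "40", "46", "+", "11.8", "2e-5", "0.01", "TTTATTT"],
    ["M2", "", "utrB", "7", "14", "+", "10.2", "5e-5", "0.02", "TGTAAATA"],
]
demo = io.StringIO("\n".join("\t".join(r) for r in demo_rows) + "\n\n# FIMO footer\n")
print(sequences_per_motif(demo))
```

Mapping motif IDs to gene names via ATtRACT_db.txt would be a separate step before the per-RBP Fisher test.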
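| Tool | Function | Focus | Use case |
| --- | --- | --- | --- |
| FIMO | Locates individual motif occurrences | Where a motif occurs | Finding specific binding sites |
| AME | Tests motif enrichment statistically | Which motifs are enriched in a set of sequences | Checking whether a motif occurs significantly more often |
If you are doing RBP enrichment analysis after differential expression, you can first scan with FIMO and then use your own code + Fisher’s exact test to do AME-like work, or run AME directly.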
# Generate attract_human.meme incl. gene names!
#python generate_named_meme.py pwm.txt attract_human.txt
python generate_attract_human_meme.py pwm.txt ATtRACT_db.txt
#ERROR during running ame --> DEBUG!
#--control ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/WaGa_all.3UTR.fasta \
ame --control --shuffle-- \
--oc ame_out \
--scoring avg \
--method fisher --verbose 5 ../Data_RNA-Seq_MKL-1+WaGa/motif_analysis/WaGa_down.3UTR.fasta attract_human.meme
2. GraphProt2 (ALTERNATIVE_TODO)
ML-based tool using sequence + structure
Pre-trained models for many RBPs
✅ Advantages:
Local, GPU/CPU supported
More biologically realistic (includes structure)
miRNAs motif analysis using ATtRACT + FIMO
✅ Goal
* Extract their sequences
* Generate a background set
* Run RBP enrichment (e.g., with RBPmap or FIMO)
* Get p-adjusted enrichment stats (e.g., Fisher + BH)
5.1 (Optional)
Input_1. DE results (differential expression file from smallRNA-seq)
Example file: smallRNA_upregulated.txt
Format: 1st column = miRNA ID (e.g., hsa-miR-21-5p), optionally with other stats.
Input_2. Reference FASTA (Reference sequences from miRBase or GENCODE)
From miRBase:
mature.fa.gz → contains mature miRNA sequences
hairpin.fa.gz → for pre-miRNAs
python extract_miRNA_fasta.py smallRNA_upregulated.txt mature.fa up_mature_miRNAs.fa
python extract_miRNA_fasta.py smallRNA_downregulated.txt hairpin.fa down_precursor_miRNAs.fa
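The script extract_miRNA_fasta.py is not listed in this post. A minimal sketch of the same idea, assuming the miRNA ID is the first whitespace-separated token of each FASTA header (as in miRBase files), could look like this. The optional U-to-T conversion is an assumption for use with a DNA-alphabet motif file; all names are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (id, sequence) pairs; id = first token of the header."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def select_records(handle, wanted_ids, to_dna=False):
    """Keep records whose ID is in wanted_ids, in file order."""
    wanted = set(wanted_ids)
    return [
        (name, seq.replace("U", "T") if to_dna else seq)
        for name, seq in read_fasta(handle)
        if name in wanted
    ]


# Toy records with made-up IDs and sequences
demo = io.StringIO(">mir-demo-1 ACC0001 toy\nUAGCUUAU\nCAGA\n>mir-demo-2 ACC0002 toy\nGGGUUU\n")
for name, seq in select_records(demo, ["mir-demo-1"], to_dna=True):
    print(f">{name}\n{seq}")
```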
5.2 (Advanced)
Extract Sequences + Background Set
Inputs:
* up_miRNA.txt and down_miRNA.txt: DE results (first column = miRNA name, e.g., hsa-miR-21-5p)
* mature.fa or hairpin.fa from miRBase
Outputs:
* mirna_up.fa
* mirna_down.fa
* mirna_background.fa
python prepare_miRNA_sets.py up_miRNA.txt down_miRNA.txt mature.fa mirna
🔬 What You Can Do Next
| Goal | Tool | Input |
| --- | --- | --- |
| RBP motif enrichment in pre-miRNAs | RBPmap, FIMO, AME | up_precursor_miRNAs.fa |
| Motif comparison (up vs down miRNAs) | DREME, MEME, HOMER | Up/down mature miRNAs |
| Build background for enrichment | Random subset of other miRNAs | Filtered from hairpin.fa |
✅ RBP Enrichment from RBPmap Results
🔹 Use RBPmap output (typically CSV or TSV)
🔹 Compare hit counts in input vs background
🔹 Perform Fisher's exact test + Benjamini-Hochberg correction
🔹 Plot significantly enriched RBPs
📁 Requirements
You’ll need:
| File | Description |
| --- | --- |
| rbpmap_up.tsv | RBPmap result file for upregulated set |
| rbpmap_background.tsv | RBPmap result file for background set |
📝 These should have columns like:
Motif Name or Protein
Sequence Name or Sequence ID
(If your column names differ, adjust the script accordingly.)
python analyze_rbpmap_enrichment.py rbpmap_up.tsv rbpmap_background.tsv enriched_up.csv enriched_up_plot.png
✅ Output
enriched_up.csv
| RBP | FG_hits | BG_hits | pval | padj | enriched |
| --- | --- | --- | --- | --- | --- |
| ELAVL1 | 24 | 2 | 0.0001 | 0.003 | ✅ |
| HNRNPA1 | 15 | 10 | 0.04 | 0.06 | ❌ |
enriched_up_plot.png
Barplot showing top significant RBPs (lowest FDR)
RBP enrichment via FIMO (same as the workflow in 4)
1. Collect the 3′ UTR sequences: Use the 3UTR.fasta file generated earlier, filtered to protein-coding and downregulated genes.
2. Prepare Motif Database (MEME format)
* ATtRACT: https://attract.cnic.es
* RBPDB: http://rbpdb.ccbr.utoronto.ca
* Ray2013 (CISBP-RNA motifs) — available via MEME Suite
* [RBPmap motifs (if downloadable)]
#Example format: rbp_motifs.meme
2. Run FIMO to Scan for RBP Motifs (Similar to RBPmap)
fimo --oc fimo_up rbp_motifs.meme mirna_up.fa
fimo --oc fimo_down rbp_motifs.meme mirna_down.fa
fimo --oc fimo_background rbp_motifs.meme mirna_background.fa
#This produces fimo.tsv in each output folder.
3. Run RBP motif enrichment with AME (Analysis of Motif Enrichment) from the MEME Suite:
ame \
--control control_3UTRs.fasta \
--oc ame_out \
--scoring avg \
--method fisher \
3UTR.fasta \
rbp_motifs.meme
Where:
* 3UTR.fasta = your downregulated genes’ 3′ UTRs
* control_3UTRs.fasta = background UTRs (e.g., random protein-coding genes not downregulated)
* rbp_motifs.meme = motif file from RBPDB or Ray2013
4. Interpret Results: Output includes RBP motifs enriched in your downregulated mRNAs' 3′ UTRs.
You can then link enriched RBPs to known interactions with your upregulated miRNAs, or explore their regulatory roles.
5. ✅ Bonus: Predict Which mRNAs Are Targets of Your miRNAs
Use tools like: miRanda, TargetScan, miRDB
Then intersect predicted targets with your downregulated genes to identify likely functional interactions.
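The intersection step can be sketched as follows, assuming the predictions have already been parsed into (miRNA, gene) pairs. The parsing differs per tool and is not shown; the function name is hypothetical.

```python
from collections import defaultdict


def likely_targets(predicted_pairs, down_genes):
    """Map each miRNA to its predicted targets that are also downregulated."""
    down = set(down_genes)
    by_mirna = defaultdict(set)
    for mirna, gene in predicted_pairs:
        if gene in down:
            by_mirna[mirna].add(gene)
    return {mirna: sorted(genes) for mirna, genes in by_mirna.items()}


# Toy pairs with made-up names
pairs = [("mir-demo-1", "G1"), ("mir-demo-1", "G2"), ("mir-demo-2", "G3")]
print(likely_targets(pairs, down_genes=["G2", "G3", "G9"]))
```

Gene identifiers must use the same namespace (for example gene symbols or Ensembl IDs) in both inputs for the intersection to be meaningful.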
6. Summary
| Goal | Input | Tool / Approach |
| --- | --- | --- |
| RBP enrichment | 3UTR.fasta of downregulated genes | AME with RBP motifs |
| Background/control | 3′ UTRs from non-differential or upregulated genes | |
| Link miRNA to targets | Use TargetScan / miRanda | Intersect with down genes |
Other options to get sequences of 3UTR, 5UTR, CDS and mRNA transcripts
✅ Option 2: Use Ensembl BioMart (web-based, no coding) --> takes too long!
Go to Ensembl BioMart https://www.ensembl.org/biomart/martview/7b826bcbd0cec79021977f8dc12a8f61
Select:
Database: Ensembl Genes
Dataset: Homo sapiens genes (GRCh38 or latest)
Click on “Filters” → expand Region or Gene to limit your selection (optional).
Click on “Attributes”:
Under Sequences, check:
Sequences
3' UTR sequences
Optionally add gene IDs, transcript IDs, etc.
Click “Results” to view/download the FASTA of 3' UTRs.
✅ Option 3: Use GENCODE (precompiled annotations) and gffread
Use a tool like gffread (from the Cufflinks or gffread package) to extract 3' UTRs:
#gffread gencode.v44.annotation.gtf -g GRCh38.primary_assembly.genome.fa -w all_utrs.fa -U
#gffread -w three_prime_utrs.fa -g GRCh38.fa -x cds.fa -y proteins.fa -U -F gencode.gtf
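#Check the feature names first (cut -f3 gencode.v48.annotation.gtf | sort | uniq -c): GENCODE GTF files typically label both UTR types as plain "UTR", whereas "three_prime_utr" is the Ensembl GTF name. Adjust the pattern if this grep returns nothing.
grep -P "\tthree_prime_utr\t" gencode.v48.annotation.gtf > three_prime_utrs.gtf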
gtf2bed < three_prime_utrs.gtf > three_prime_utrs.bed
bedtools getfasta -fi GRCh38.primary_assembly.genome.fa -bed three_prime_utrs.bed -name -s > three_prime_utrs.fa
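gffread gencode.v48.annotation.gtf -g GRCh38.primary_assembly.genome.fa -w all_with_utrs.fa
Note: gffread's -U flag discards single-exon transcripts; it does not extract UTRs, so it is omitted here. The -w output contains the full spliced transcripts (UTRs included); filter post hoc for only 3' UTRs if needed.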
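What bedtools getfasta -s does for each BED interval can be sketched in Python: slice the chromosome with 0-based, end-exclusive coordinates and reverse-complement the result for minus-strand features. The in-memory genome dict and function name are assumptions for illustration; real genomes would be read through an indexed FASTA.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fetch_interval(genome, chrom, start, end, strand):
    """Return the sequence of a BED interval, strand-aware.

    genome: dict chromosome name -> sequence string (toy stand-in for
    an indexed FASTA). start/end follow BED conventions.
    """
    seq = genome[chrom][start:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
    return seq


genome = {"chrDemo": "AAACCCGGGTTT"}
print(fetch_interval(genome, "chrDemo", 3, 6, "+"))
print(fetch_interval(genome, "chrDemo", 0, 4, "-"))
```

A 3' UTR that spans several exons appears as several intervals, so the per-interval sequences of one transcript still need to be joined in transcript order.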
✅ Option 4: Use Bioconductor in R (UCSC-ID, not suitable!)
# Install if not already installed
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("GenomicFeatures")
BiocManager::install("txdbmaker")
#sudo apt-get update
#sudo apt-get install libmariadb-dev
#(optional)sudo apt-get install libmysqlclient-dev
install.packages("RMariaDB")
# Load library
library(GenomicFeatures)
# Create TxDb object for human genome
txdb <- txdbmaker::makeTxDbFromUCSC(genome="hg38", tablename="refGene")
# Extract 3' UTRs by transcript
utr3 <- threeUTRsByTranscript(txdb, use.names=TRUE)
# View or export as needed
✅ Option 5: Extract 3′ UTRs Using UCSC Table Browser (GUI method)
🔗 Website:
UCSC Table Browser
🔹 Step-by-Step Instructions
1. Set the basic parameters:
Clade: Mammal
Genome: Human
Assembly: GRCh38/hg38
Group: Genes and Gene Predictions
Track: GENCODE v44 (or latest)
Table: knownGene or wgEncodeGencodeBasicV44
Choose knownGene for RefSeq-like models or wgEncodeGencodeBasicV44 for GENCODE
2. Region:
Select: genome (default)
3. Output format:
Select: sequence
4. Click "get output"
🔹 Sequence Retrieval Options:
On the next page (after clicking "get output"), you’ll see sequence options.
Configure as follows:
✅ Output format: FASTA
✅ Which part of the gene: Select only
→ UTRs → 3' UTR only
✅ Header options: choose if you want gene name,
⚡️ Bonus: Combine with miRNA-mRNA predictions
Once you have RBPs enriched in downregulated mRNAs, you can intersect:
* Which RBPs overlap miRNA binding regions (e.g., via CLIPdb or POSTAR)
* Check if miRNAs and RBPs compete or co-bind
This can lead to identifying miRNA-RBP regulatory modules.