Genome build: hg19

Data download link: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/ 
Documentation: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3 
Motif data: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/factorbookMotifPos.txt.gz  
PFM data: https://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/factorbookMotifPwm.txt.gz 
_________________________________________________________________________________________________________

Motif processing - motif_processing.sh 

*  Motifs with a -log10(q-value) score below 0.69 are filtered out (-log10(0.2) ≈ 0.699, so this keeps motifs with a q-value of roughly 0.2 or lower) 
*  Motifs lacking PFM files are filtered out
*  The q-value is calculated from the score column as 10^(-score) 
*  Binding sequences are obtained using samtools 
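
The score filter and q-value conversion above can be sketched as follows. This is a minimal illustration, not the contents of motif_processing.sh: the BED6-like row layout, the `score_index` parameter, and the function names are assumptions made for the example, and the rows are made-up data.

```python
# Threshold from the notes above: -log10(0.2), truncated to 0.69.
MIN_SCORE = 0.69


def qvalue_from_score(score):
    """Convert a -log10(q-value) score back to a q-value."""
    return 10 ** (-score)


def filter_motifs(rows, score_index=4):
    """Keep rows scoring at least MIN_SCORE and append their q-value.

    score_index is a hypothetical column position; adjust it to the
    real layout of the motif file.
    """
    kept = []
    for row in rows:
        score = float(row[score_index])
        if score < MIN_SCORE:
            continue  # q-value too high, drop the motif
        kept.append(row + [qvalue_from_score(score)])
    return kept


# Made-up rows: chrom, start, end, motif name, score, strand.
example = [
    ["chr1", "100", "115", "MOTIF_A", "2.0", "+"],
    ["chr1", "200", "212", "MOTIF_B", "0.5", "-"],
]
filtered = filter_motifs(example)
```

Here only MOTIF_A survives, because its score of 2.0 corresponds to a q-value of 0.01, while MOTIF_B (score 0.5) falls below the threshold.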

PFM processing - pwm_processing.sh 

*  Generated an individual PFM file for each motif in the motif BED file, using the PFM data linked above 
*  Added a > prefix to each motifID  
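
A rough sketch of the per-motif split is shown below. It assumes each record of factorbookMotifPwm.txt is tab-separated as name, width, and four comma-separated probability lists (A, C, G, T), possibly with trailing commas; that layout, the one-row-per-position output format, and all function names are assumptions for illustration and may differ from what pwm_processing.sh does.

```python
def parse_pwm_line(line):
    """Parse one tab-separated record into (name, columns).

    Assumed layout: name, width, then four comma-separated lists
    (A, C, G, T), each possibly ending with a trailing comma.
    """
    name, width, *base_lists = line.rstrip("\n").split("\t")
    columns = [[float(v) for v in lst.split(",") if v] for lst in base_lists]
    if len(columns) != 4 or any(len(col) != int(width) for col in columns):
        raise ValueError(f"unexpected matrix shape for {name}")
    return name, columns


def format_motif(name, columns):
    """Render a motif with a '>' header and one A/C/G/T row per position."""
    lines = [f">{name}"]
    for position in zip(*columns):
        lines.append("\t".join(f"{p:.6f}" for p in position))
    return "\n".join(lines) + "\n"


def split_motifs(lines, wanted):
    """Return {motif name: file text} for motifs named in `wanted`."""
    out = {}
    for line in lines:
        if not line.strip():
            continue
        name, columns = parse_pwm_line(line)
        if name in wanted:
            out[name] = format_motif(name, columns)
    return out


# Made-up two-position motif, not real Factorbook data.
record = "MOTIF_A\t2\t0.7,0.1,\t0.1,0.2,\t0.1,0.6,\t0.1,0.1,"
files = split_motifs([record], wanted={"MOTIF_A"})
```

Passing the set of motif names found in the motif BED file as `wanted` restricts the output to motifs that are actually used; each value in `files` could then be written to its own file.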

Peak processing - tf_peak_processing.sh 

*  wgEncodeRegTfbsClusteredInputsV3.tab.gz: lists, for each experiment, the underlying Uniform TFBS table name, the factor targeted, the antibody used, the cell type, the treatment (if any), and the laboratory source. 
*  wgEncodeRegTfbsClusteredV3.bed.gz: standard BED5 fields, followed by an experiment count field (expCount) and two fields containing comma-separated lists: 
   *  The first list field (expNums) contains numeric identifiers for experiments, keyed to the wgEncodeRegTfbsClusteredInputsV3 table 
   *  The second list field (expScores) contains the scores for the corresponding experiments. 
*  Normalized the expNums and expScores in wgEncodeRegTfbsClusteredV3.bed
*  Numbered the experiments in wgEncodeRegTfbsClusteredInputsV3.tab.gz 
*  Mapped the expNums between the wgEncodeRegTfbsClusteredInputsV3 and wgEncodeRegTfbsClusteredV3 files 
*  Split by cell types 
*  Split by Transcription Factors 
*  Added canonical TF terms from factorbookMotifCanonical, which links the different terms used for the same factor 
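
The expansion and mapping steps above might look like the sketch below. It reads "normalized" as splitting the comma-separated expNums and expScores lists into one record per experiment, and it numbers the input rows by file order starting at 0; both are assumptions, as are the function names, the column positions taken from the field descriptions above, and the made-up example rows.

```python
from collections import defaultdict


def expand_cluster(bed_line):
    """Yield (chrom, start, end, tf, exp_num, exp_score) per experiment."""
    chrom, start, end, name, _score, exp_count, exp_nums, exp_scores = (
        bed_line.rstrip("\n").split("\t")
    )
    nums = [int(n) for n in exp_nums.split(",") if n]
    scores = [float(s) for s in exp_scores.split(",") if s]
    if not (len(nums) == len(scores) == int(exp_count)):
        raise ValueError("expCount does not match the list lengths")
    for num, score in zip(nums, scores):
        yield chrom, int(start), int(end), name, num, score


def number_experiments(input_lines, first_id=0):
    """Map a row-order experiment number to (factor, cell type).

    Assumes columns ordered as table name, factor, antibody, cell type,
    treatment, lab, and that numbering starts at first_id.
    """
    experiments = {}
    for offset, line in enumerate(input_lines):
        fields = line.rstrip("\n").split("\t")
        experiments[first_id + offset] = (fields[1], fields[3])
    return experiments


def split_by_cell_type(bed_lines, experiments):
    """Group expanded peak records by the cell type of their experiment."""
    peaks = defaultdict(list)
    for line in bed_lines:
        for chrom, start, end, tf, num, score in expand_cluster(line):
            _factor, cell_type = experiments[num]
            peaks[cell_type].append((chrom, start, end, tf, score))
    return dict(peaks)


# Made-up rows for illustration only.
inputs = [
    "tableA\tCTCF\tCTCF\tGM12878\tNone\tLabX",
    "tableB\tCTCF\tCTCF\tK562\tNone\tLabY",
]
bed = ["chr1\t100\t200\tCTCF\t1000\t2\t0,1\t500,1000"]
by_cell = split_by_cell_type(bed, number_experiments(inputs))
```

Splitting by transcription factor follows the same pattern, keyed on the factor instead of the cell type.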

Motif overlap with peak data - overlap.py 

*  Ran bedtools intersect between motifs and peaks: 
*  bedtools intersect -a motif_data -b peak_data -wo 
*  Added a header 
*  Rearranged columns: concatenated the peak file columns into a single column
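
To make the shape of the result concrete, here is a small pure-Python illustration of what a `-wo` style intersect reports (each overlapping motif/peak pair plus the number of shared bases) and of the column rearrangement. It is not a replacement for bedtools and is not the code in overlap.py; the `#` separator, the function names, and the example records are assumptions.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of shared bases between two half-open BED intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def intersect_wo(motifs, peaks):
    """Pair each motif with every overlapping peak on the same chromosome.

    Uses a simple nested loop, which is fine for a demo but far too slow
    for genome-scale files; bedtools does this efficiently.
    """
    rows = []
    for motif in motifs:
        for peak in peaks:
            if motif[0] != peak[0]:
                continue
            shared = overlap_bp(
                int(motif[1]), int(motif[2]), int(peak[1]), int(peak[2])
            )
            if shared > 0:
                rows.append((motif, peak, shared))
    return rows


def rearrange(rows, sep="#"):
    """Keep motif columns and collapse the peak columns into one field."""
    return [
        list(motif) + [sep.join(peak), str(shared)]
        for motif, peak, shared in rows
    ]


# Made-up records: motifs as (chrom, start, end, name),
# peaks as (chrom, start, end, factor, cell type).
motifs = [("chr1", "150", "160", "MOTIF_A"), ("chr2", "10", "20", "MOTIF_B")]
peaks = [("chr1", "100", "200", "CTCF", "GM12878")]
table = rearrange(intersect_wo(motifs, peaks))
```

In this example only MOTIF_A overlaps a peak, so `table` holds a single row whose last two fields are the joined peak columns and the overlap length in bases.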