The **genome_aln** workflow follows FAIR principles. It is written in Nextflow DSL2, runs its software dependencies in a Singularity container, and is tuned for computing resources (CPU, memory) on a compute cluster managed by the Slurm scheduler.
- Quality control and adapter trimming of paired reads are carried out with FastQC [1] and Trim Galore [2], respectively.
- Reads are mapped to the genome with BWA-MEM2 [3].
- Alignments are then filtered (using SAMtools [4], BAMTools [5] and Pysam [6]) to remove reads:
    + that map to mitochondrial DNA
    + that are marked as duplicates by Picard [7]
    + that aren’t marked as primary alignments
    + that are unmapped
    + that map to multiple locations
    + that contain > 4 mismatches
    + that are soft-clipped
    + that have an insert size > 2 kb
    + whose mate maps to a different chromosome
    + that aren’t in FR orientation
    + that belong to a pair in which only one read fails the above criteria
- Normalised bigWig files, scaled to 1 million mapped reads, are generated with BEDTools [8] and bedGraphToBigWig [9].
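
The filtering criteria and the bigWig scaling listed above can be sketched in plain Python. This is an illustration only, not the workflow's own code: the `Read` class, the mitochondrial contig names (`chrM`, `MT`), the use of MAPQ 0 as a proxy for multi-mapping reads, and the use of the NM tag for the mismatch count are all assumptions made for the example. The flag values come from the SAM specification.

```python
from dataclasses import dataclass

# SAM flag bits (SAM specification)
UNMAPPED = 0x4
REVERSE = 0x10
MATE_REVERSE = 0x20
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


@dataclass
class Read:
    """Hypothetical minimal view of one alignment record."""
    flag: int
    chrom: str
    pos: int
    mate_chrom: str
    mate_pos: int
    insert_size: int  # TLEN, signed
    cigar: str
    mismatches: int   # assumed to come from the NM tag
    mapq: int


def is_fr(read):
    """True when the forward mate lies upstream of the reverse mate."""
    forward = not read.flag & REVERSE
    mate_forward = not read.flag & MATE_REVERSE
    if forward == mate_forward:
        return False
    if forward:
        return read.pos <= read.mate_pos
    return read.mate_pos <= read.pos


def passes(read, mito_names=("chrM", "MT"), max_mismatches=4, max_insert=2000):
    """Apply the per-read criteria from the list above."""
    if read.flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY | DUPLICATE):
        return False
    if read.chrom in mito_names:
        return False
    if read.mapq == 0:  # assumption: MAPQ 0 stands in for "multiple locations"
        return False
    if read.mismatches > max_mismatches:
        return False
    if "S" in read.cigar:  # soft-clipped
        return False
    if abs(read.insert_size) > max_insert:
        return False
    if read.chrom != read.mate_chrom:
        return False
    return is_fr(read)


def keep_pair(read1, read2):
    """A pair is kept only when both mates pass every criterion."""
    return passes(read1) and passes(read2)


def scale_factor(mapped_reads, target=1_000_000):
    """Factor that scales coverage to `target` mapped reads."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return target / mapped_reads
```

With 2 million mapped reads, for instance, `scale_factor(2_000_000)` gives 0.5, so every coverage value would be halved before the bigWig is written.
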
The [peak_calling](https://forgemia.inra.fr/lpgp/peak_calling) workflow is available for the next step.
## Install the workflow and build the Singularity image
Clone the genome_aln repository and build a local Singularity image (this requires system administrator rights) from the provided Singularity definition file.
```bash
git clone https://forgemia.inra.fr/lpgp/genome_aln.git
sudo singularity build ./genome_aln/singularity/genome_aln.sif ./genome_aln/singularity/genome_aln.def
```
The design.csv file must be comma-separated and contain the header columns *ID*, *R1* and *R2*.

|ID|R1|R2|
|:-|:-|:-|
|A|/path/to/targetA_R1.fa.gz|/path/to/targetA_R2.fa.gz|
|B|/path/to/targetB_R1.fa.gz|/path/to/targetB_R2.fa.gz|
|C|/path/to/targetC_R1.fa.gz|/path/to/targetC_R2.fa.gz|
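
Before launching the workflow, it can help to check the design file against the layout above. The sketch below uses only the Python standard library; the function name `read_design` is hypothetical and the check is not part of the workflow itself.

```python
import csv
import io

REQUIRED_COLUMNS = ("ID", "R1", "R2")


def read_design(handle):
    """Parse a comma-separated design file and check the required columns."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in fieldnames]
    if missing:
        raise ValueError(f"missing header column(s): {', '.join(missing)}")
    samples = []
    for row in reader:
        for name in REQUIRED_COLUMNS:
            if not row[name]:
                raise ValueError(f"line {reader.line_num}: empty value for {name}")
        samples.append({name: row[name] for name in REQUIRED_COLUMNS})
    return samples


# In-memory example; a real file would be opened with open(path, newline="").
example = io.StringIO(
    "ID,R1,R2\n"
    "A,/path/to/targetA_R1.fa.gz,/path/to/targetA_R2.fa.gz\n"
)
samples = read_design(example)
```

A file written with another separator (a semicolon, for example) is read as a single column, so it fails the header check instead of being passed to the workflow.
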
### Usage example: ATAC-seq
```bash
nextflow run /work/project/lpgp/Nextflow/genome_aln/ \
-profile slurm \
--clip_r1 10 \
--clip_r2 10 \
--three_prime_clip_r1 3 \
--three_prime_clip_r2 3 \
```
## Default parameters
Please refer to the [Trim Galore](https://github.com/FelixKrueger/TrimGalore) and [BWA mem2](https://github.com/bwa-mem2/bwa-mem2) documentation for a complete explanation of their arguments.
```
# sequences
input = false
genome = false
# bam input
bam = false
# method (ATAC or CHIP)
method = false
# fastqc
skip_fastqc = false
# trimming
skip_trimming = false
clip_r1 = 0
clip_r2 = 0
three_prime_clip_r1 = 0
three_prime_clip_r2 = 0
# bwa_mem2 options
bwa_mem2_index = false
keep_bwa_mem2_index = false
bwa_mem2_min_score = false
# skip markduplicates
skip_markduplicates = false
# skip filters
skip_filters = false
# skip bigwig
skip_bigwig = false
# save directory
out_dir = "${PWD}/results"
```
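
In Nextflow, each default above can be overridden on the command line with `--<name> <value>`, and a bare `--<name>` sets a parameter to true. The sketch below assembles such a command from the parameter names listed in this section and rejects misspelled names. The helper `build_command` is hypothetical and not shipped with the workflow; the pipeline path and profile are those of the usage example.

```python
import shlex

# Parameter names taken from the defaults listed above.
KNOWN_PARAMS = {
    "input", "genome", "bam", "method",
    "skip_fastqc", "skip_trimming",
    "clip_r1", "clip_r2", "three_prime_clip_r1", "three_prime_clip_r2",
    "bwa_mem2_index", "keep_bwa_mem2_index", "bwa_mem2_min_score",
    "skip_markduplicates", "skip_filters", "skip_bigwig",
    "out_dir",
}


def build_command(pipeline, profile, **overrides):
    """Return the argument list for `nextflow run` with parameter overrides."""
    unknown = sorted(set(overrides) - KNOWN_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    argv = ["nextflow", "run", pipeline, "-profile", profile]
    for name, value in overrides.items():
        if value is True:
            argv.append(f"--{name}")  # a bare flag is read as true
        else:
            argv += [f"--{name}", str(value)]
    return argv


command = build_command(
    "/work/project/lpgp/Nextflow/genome_aln/",
    "slurm",
    method="ATAC",
    clip_r1=10,
    skip_bigwig=True,
)
print(shlex.join(command))
```

Checking names against the known set catches typos such as `clip_r3` before a job is submitted to the scheduler.
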
## References
1. FastQC - a quality control application for FastQ files [Internet]. Available from: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
2. Krueger F. Trim Galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files [Internet]. Available from: http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
3. Vasimuddin Md, Misra S, Li H, Aluru S. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. IEEE International Parallel and Distributed Processing Symposium (IPDPS). 2019. Available from: https://github.com/bwa-mem2/bwa-mem2
4. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10.
5. Barnett DW, Garrison EK, Quinlan AR, Strömberg MP, Marth GT. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics. 2011;27(12):1691-2.
6. Pysam [Internet]. Available from: https://github.com/pysam-developers/pysam
7. Picard toolkit [Internet]. Broad Institute; 2019. Available from: https://broadinstitute.github.io/picard/
8. Bedtools: A powerful toolset for genome arithmetic [Internet]. Available from: https://bedtools.readthedocs.io/en/latest/
9. bedGraphToBigWig [Internet]. Available from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v385/bedGraphToBigWig