Aurelien Brionne committed
# genome_aln V1.4

The **genome_aln** workflow follows the FAIR principles. It is written in the Nextflow DSL2 language, runs its software from a Singularity container, is optimized in terms of computing resources (CPU, memory), and is designed for use on a computing cluster with a Slurm scheduler.

- Quality control and adapter trimming of paired reads were carried out using FastQC [1] and Trim Galore [2], respectively.
- Reads were mapped to the genome using BWA-MEM2 [3].
- Alignments were then filtered (using SAMtools [4], BAMTools [5] and Pysam [6]) in order to remove reads:
    + with mapping to mitochondrial DNA
    + that are marked as duplicates with Picard [7]
    + that aren’t marked as primary alignments
    + that are unmapped
    + that map to multiple locations
    + containing > 4 mismatches
    + that are soft-clipped
    + that have an insert size > 2kb
    + that map to different chromosomes
    + that aren’t in FR orientation
    + where only one read of the pair fails the above criteria
- Normalised bigWig files, scaled to 1 million mapped reads, were created with BEDTools [8] and bedGraphToBigWig [9].
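
The scaling to 1 million mapped reads comes down to a single factor, `1000000 / mapped_reads`. The sketch below is only an illustration of that arithmetic: `scale_factor` is a hypothetical helper, the file names in the comments are placeholders, and the exact commands run inside the workflow may differ.

```shell
# Hypothetical helper (not part of the workflow): factor that rescales
# coverage to 1 million mapped reads.
scale_factor() {
    local mapped_reads="$1"
    awk -v n="$mapped_reads" 'BEGIN {
        if (n <= 0) exit 1          # refuse empty or invalid counts
        printf "%.6f\n", 1000000 / n
    }'
}

scale_factor 2000000   # prints 0.500000

# How the factor could feed the tools named above (not run here;
# sample.bam, chrom.sizes and the output names are placeholders):
#   bedtools genomecov -ibam sample.bam -bg -scale "$(scale_factor "$mapped")" > sample.bedGraph
#   sort -k1,1 -k2,2n sample.bedGraph > sample.sorted.bedGraph
#   bedGraphToBigWig sample.sorted.bedGraph chrom.sizes sample.bigWig
```

A sample with 2 million mapped reads therefore has its coverage halved, and a sample with 500,000 mapped reads has it doubled, so tracks from libraries of different depth become comparable.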


The [peak_calling](https://forgemia.inra.fr/lpgp/peak_calling) workflow is available for the next step.


## Install the workflow and build the Singularity image

Clone the genome_aln repository and build a local Singularity image (this requires system administrator rights) from the provided Singularity definition file.

```bash
git clone https://forgemia.inra.fr/lpgp/genome_aln.git
sudo singularity build ./genome_aln/singularity/genome_aln.sif ./genome_aln/singularity/genome_aln.def
```
## Usage examples

The design.csv file must have the *ID*, *R1* and *R2* headers and use a comma as separator.

|ID|R1|R2|
|:-|:-|:-|
|A|/path/to/targetA_R1.fa.gz|/path/to/targetA_R2.fa.gz|
|B|/path/to/targetB_R1.fa.gz|/path/to/targetB_R2.fa.gz|
|C|/path/to/targetC_R1.fa.gz|/path/to/targetC_R2.fa.gz|
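
As a plain comma-separated file, the table above can be written and checked as in the following sketch. The paths are the same placeholders as in the table, and `check_design` is a hypothetical helper that is not part of the workflow; it only verifies the header and the number of fields per line.

```shell
# Write a design file with the required ID,R1,R2 header (placeholder paths).
cat > design.csv <<'EOF'
ID,R1,R2
A,/path/to/targetA_R1.fa.gz,/path/to/targetA_R2.fa.gz
B,/path/to/targetB_R1.fa.gz,/path/to/targetB_R2.fa.gz
C,/path/to/targetC_R1.fa.gz,/path/to/targetC_R2.fa.gz
EOF

# Hypothetical check: the header must be exactly ID,R1,R2 and every
# line must contain three comma-separated fields.
check_design() {
    local file="$1"
    if [ "$(head -n 1 "$file")" != "ID,R1,R2" ]; then
        echo "unexpected header in $file" >&2
        return 1
    fi
    awk -F',' '
        NF != 3 { printf "line %d: expected 3 fields, got %d\n", NR, NF; bad = 1 }
        END     { exit bad }
    ' "$file" >&2
}

check_design design.csv && echo "design.csv looks valid"
```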

### Usage example: ATAC-seq

```bash
#!/bin/bash
#SBATCH -J atacseq
#SBATCH -p unlimitq

# Load Singularity and Nextflow (module names are specific to the cluster)
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/21.10.6

# Run the workflow in ATAC mode on the samples listed in design.csv
nextflow run /work/project/lpgp/Nextflow/genome_aln/ \
    -profile slurm \
    --input "${PWD}/design.csv" \
    --genome "genome.fa.gz" \
    --method "ATAC" \
    --clip_r1 10 \
    --clip_r2 10 \
    --three_prime_clip_r1 3 \
    --three_prime_clip_r2 3 \
    --out_dir "${PWD}/results"
```

### Usage example: ChIP-seq

```bash
#!/bin/bash
#SBATCH -J chipseq
#SBATCH -p unlimitq

# Load Singularity and Nextflow (module names are specific to the cluster)
module load containers/singularity/3.9.9
module load bioinfo/Nextflow/21.10.6

# Run the workflow in CHIP mode on the samples listed in design.csv
nextflow run /work/project/lpgp/Nextflow/genome_aln/ \
    -profile slurm \
    --input "${PWD}/design.csv" \
    --genome "genome.fa.gz" \
    --method "CHIP" \
    --clip_r1 10 \
    --clip_r2 10 \
    --three_prime_clip_r1 3 \
    --three_prime_clip_r2 3 \
    --out_dir "${PWD}/results"
```

## Default parameters

Please refer to the [Trim Galore](https://github.com/FelixKrueger/TrimGalore) and [BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2) documentation for a complete explanation of their arguments.

```bash
# sequences
input = false
genome = false

# bam input
bam = false

# method (ATAC or CHIP)
# with ATAC, the filtering step also excludes soft-clipped reads (CIGAR S code)
method = false

# fastqc
skip_fastqc = false

# trimming
skip_trimming = false
clip_r1 = 0
clip_r2 = 0
three_prime_clip_r1 = 0
three_prime_clip_r2 = 0

# bwa_mem2 options
bwa_mem2_index = false
keep_bwa_mem2_index = false
bwa_mem2_min_score = false

# skip markduplicates
skip_markduplicates = false

# skip filters
skip_filters = false

# skip bigwig
skip_bigwig = false

# save directory
out_dir = "${PWD}/results"
```
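
Any of these defaults can be overridden on the command line, as in the usage examples. As an alternative, Nextflow's `-params-file` option reads the same values from a YAML file. The sketch below assumes the workflow takes all of the listed settings from `params`; the file name `params.yaml` and the wrapper function `run_genome_aln` are hypothetical, and the values are those of the ATAC-seq example.

```shell
# Collect the overrides in one file (values from the ATAC-seq example).
cat > params.yaml <<EOF
input: "${PWD}/design.csv"
genome: "genome.fa.gz"
method: "ATAC"
clip_r1: 10
clip_r2: 10
three_prime_clip_r1: 3
three_prime_clip_r2: 3
out_dir: "${PWD}/results"
EOF

# Hypothetical wrapper, defined here but not called: launch the workflow
# with the parameters file instead of individual --options.
run_genome_aln() {
    nextflow run /work/project/lpgp/Nextflow/genome_aln/ \
        -profile slurm \
        -params-file "$1"
}

# In the Slurm script, after the module load lines:
#   run_genome_aln params.yaml
```

Keeping the parameters in a file makes it easier to archive the exact settings next to the results.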

## References

1. FastQC - a quality control application for FastQ files [Internet]. Available from: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
2. Krueger F. Trim Galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files [Internet]. Available from: http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
3. Vasimuddin M, Misra S, Li H, Aluru S. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. IEEE International Parallel and Distributed Processing Symposium (IPDPS). 2019. Available from: https://github.com/bwa-mem2/bwa-mem2
4. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10.
5. Barnett DW, Garrison EK, Quinlan AR, Strömberg MP, Marth GT. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics. 2011;27(12):1691-2.
6. Pysam [Internet]. Available from: https://github.com/pysam-developers/pysam
7. Picard toolkit. Broad Institute, GitHub repository. https://broadinstitute.github.io/picard/; Broad Institute; 2019.
8. Bedtools: A powerful toolset for genome arithmetic [Internet]. Available from: https://bedtools.readthedocs.io/en/latest/
9. bedGraphToBigWig [Internet]. Available from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v385/bedGraphToBigWig