Jonathan Kitt committed
# SVSNiPeR

## Structural Variations detection using SNP genotyping data in R
*Last updated 2021-11-02*

### What this package does

The *SVSNiPeR* package provides helper functions for analysing SNP genotyping array
data. The goal is to detect genomic regions affected by structural variations.

### How to install this package
To install SVSNiPeR, use the following command:

`devtools::install_git("https://forgemia.inra.fr/jonathan.kitt/svniper")`
#### Single genotyping array

If your project consists of a single genotyping array, use the following
analysis pipeline.

`library(svsniper)`

**1) Read list of SNPs and list of samples**

The analysis requires a list of physical positions for the SNPs, as shown below:

| probeset_id | chromosome | position |
| ----------- | ---------- | -------- |
| AX-12345678 | chr1A      | 123456   |

We also recommend you use a list of genotyped samples, as shown below:

| unique_id | file_name     | sample_name | definition |
| --------- | ------------- | ----------- | ---------- |
| id01      | sample01.CEL  | sample01    | reference  |
| id02      | sample02.CEL  | sample02    | sample     |

A genotyped sample is defined either as a *sample*, or as a *reference*, which
will be used to normalise calculations in further steps.
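
As a minimal sketch of what these two input tables might look like once loaded in R, the block below builds them from inline text with base R and checks them before use. The object names (`snp_positions`, `sample_list`, `reference_files`, `sample_files`) are illustrative assumptions and are not part of the package.

```r
# Hypothetical in-memory copies of the two input tables described above.
snp_positions <- read.delim(
  text = "probeset_id\tchromosome\tposition\nAX-12345678\tchr1A\t123456",
  stringsAsFactors = FALSE
)

sample_list <- read.delim(
  text = paste(
    "unique_id\tfile_name\tsample_name\tdefinition",
    "id01\tsample01.CEL\tsample01\treference",
    "id02\tsample02.CEL\tsample02\tsample",
    sep = "\n"
  ),
  stringsAsFactors = FALSE
)

# Basic sanity checks before starting the analysis:
# probeset IDs are unique and every sample has a known definition.
stopifnot(
  !anyDuplicated(snp_positions$probeset_id),
  all(sample_list$definition %in% c("reference", "sample"))
)

# File names of the references and of the other samples.
reference_files <- sample_list$file_name[sample_list$definition == "reference"]
sample_files <- sample_list$file_name[sample_list$definition == "sample"]
```

With real data, the same `read.delim()` calls would take a file path instead of the `text` argument.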

**1) Read Axiom output files**

Three files are obtained using the Axiom genotyping pipeline:  
- AxiomGT1.calls.txt  
- AxiomGT1.confidences.txt  
- AxiomGT1.summary.txt

To read these files, use the following commands:

`axiom_calls <- readr::read_tsv(path_to_axiom_calls_file)`  
`axiom_confidences <- svsniper::read_confidences(path_to_axiom_confidences_file)`  
`axiom_summary <- svsniper::read_summary(path_to_axiom_summary_file)`  
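
The sketch below shows one way to check the layout of a calls file with base R before running the pipeline. It assumes, which may not hold for your files, that the file starts with `#%` metadata lines followed by a header row with `probeset_id` and one column per CEL file. The file content is made up for illustration. If you read the file with `readr::read_tsv()`, its `comment` argument can play the same role as `comment.char` here.

```r
# Write a tiny, made-up file that mimics the layout assumed here:
# '#%' metadata lines, then a header row, then one row per probeset.
calls_path <- tempfile(fileext = ".txt")
writeLines(
  c(
    "#%example-metadata=hypothetical",
    "probeset_id\tsample01.CEL\tsample02.CEL",
    "AX-12345678\t0\t2"
  ),
  calls_path
)

# comment.char = "#" skips the metadata lines; check.names = FALSE keeps
# the column names exactly as written in the file.
axiom_calls <- read.delim(
  calls_path,
  comment.char = "#",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column except probeset_id is expected to be a genotyped sample,
# so these names can be compared with the file_name column of the sample list.
sample_columns <- setdiff(names(axiom_calls), "probeset_id")
```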


**Optional step: filter SNPs**

You may want to remove SNPs with bad confidence scores, and/or, depending on
the type of analysis you want to run, SNPs with high minor allele frequencies.  
In order to filter out SNPs, three functions are available:

a) `svsniper::count_confidences(axiom_confidences, threshold = 0.15)`

This function counts, for each SNP, the number of samples with a confidence score above the
defined threshold (defaults to 0.15, the value used in the Affymetrix Axiom
tools), and returns a table as shown below:

| probeset_id | threshold_pass | threshold_fail |
| ----------- | -------------- | -------------- |
| AX-12345678 | 96             | 0              |
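
To make the shape of this table concrete, here is a base R sketch of a per-SNP pass/fail count. It is not the package's implementation. It assumes that a score at or below the threshold counts as a pass, and the package may define the direction or the boundary case differently. The input values are made up.

```r
# Hypothetical confidence table: one row per SNP, one column per sample.
axiom_confidences <- data.frame(
  probeset_id = c("AX-12345678", "AX-87654321"),
  sample01.CEL = c(0.01, 0.20),
  sample02.CEL = c(0.05, 0.02),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
threshold <- 0.15

# Assumption: a score at or below the threshold counts as a pass.
scores <- as.matrix(axiom_confidences[, -1])
confidence_counts <- data.frame(
  probeset_id = axiom_confidences$probeset_id,
  threshold_pass = rowSums(scores <= threshold),
  threshold_fail = rowSums(scores > threshold),
  stringsAsFactors = FALSE
)

# Keep only the SNPs for which every sample passes.
snps_passing <- confidence_counts$probeset_id[confidence_counts$threshold_fail == 0]
```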

b) `svsniper::count_alleles(axiom_calls)`  

This function counts the genotype calls for each SNP and returns a table as shown below:

| probeset_id | count_aa | count_ab | count_bb | count_na | count_otv |
| ----------- | -------- | -------- | -------- | -------- | --------- |
| AX-12345678 | 41       | 2        | 53       | 0        | 0         |
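
The following base R sketch illustrates how such per-SNP counts could be derived from a calls table. It is not the package's code. The numeric coding used here (0 = AA, 1 = AB, 2 = BB, -1 = no call, -2 = OTV) is an assumption that you should check against your own calls file, and the input values are made up.

```r
# Hypothetical calls table: one row per SNP, one column per sample.
axiom_calls <- data.frame(
  probeset_id = c("AX-12345678", "AX-87654321"),
  sample01.CEL = c(0, 1),
  sample02.CEL = c(2, -1),
  sample03.CEL = c(2, 1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumed coding: 0 = AA, 1 = AB, 2 = BB, -1 = no call, -2 = OTV.
calls <- as.matrix(axiom_calls[, -1])
allele_count <- data.frame(
  probeset_id = axiom_calls$probeset_id,
  count_aa = rowSums(calls == 0),
  count_ab = rowSums(calls == 1),
  count_bb = rowSums(calls == 2),
  count_na = rowSums(calls == -1),
  count_otv = rowSums(calls == -2),
  stringsAsFactors = FALSE
)
```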

c) `svsniper::calculate_maf(allele_count)`

This function takes as argument a table obtained using the `svsniper::count_alleles()`
function, and returns a table as shown below:

| probeset_id | count_aa | count_ab | count_bb | count_na | count_otv | maf   |
| ----------- | -------- | -------- | -------- | -------- | --------- | ----- |
| AX-12345678 | 41       | 2        | 53       | 0        | 0         | 0.436 |
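
The `maf` value in the example row is consistent with dividing the smaller homozygous count by the sum of both homozygous counts: 41 / (41 + 53) ≈ 0.436. The base R sketch below reproduces that number. The formula is inferred from the example row only, not from the package source, so treat it as an assumption. An allele-based frequency that also counts heterozygotes, (2 × 41 + 2) / (2 × 96), would give 0.4375 instead.

```r
# The example row from the table above.
allele_count <- data.frame(
  probeset_id = "AX-12345678",
  count_aa = 41, count_ab = 2, count_bb = 53,
  count_na = 0, count_otv = 0,
  stringsAsFactors = FALSE
)

# Inferred from the example row only (assumption):
# smaller homozygous count divided by the sum of both homozygous counts.
# Heterozygous calls are not used in this variant.
allele_count$maf <- with(
  allele_count,
  round(pmin(count_aa, count_bb) / (count_aa + count_bb), 3)
)
```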

The `count_alleles` and `calculate_maf` functions can be called in a pipe:

`svsniper::count_alleles(axiom_calls) %>% svsniper::calculate_maf()`

We recommend saving a list of filtered SNPs for use in downstream analysis.
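
One possible way to build and save such a list is sketched below in base R. The table `snp_stats`, the cut-off `max_maf`, and the output file name are hypothetical. Choose filtering rules that suit your analysis.

```r
# Hypothetical per-SNP statistics combining the outputs of the steps above.
snp_stats <- data.frame(
  probeset_id = c("AX-12345678", "AX-87654321", "AX-11112222"),
  threshold_fail = c(0, 0, 3),
  maf = c(0.436, 0.250, 0.010),
  stringsAsFactors = FALSE
)

# Hypothetical rule: keep SNPs where no sample fails the confidence
# threshold and the minor allele frequency is not above max_maf.
max_maf <- 0.4
keep <- snp_stats$threshold_fail == 0 & snp_stats$maf <= max_maf
filtered_snps <- snp_stats$probeset_id[keep]

# Save one probeset ID per line so the list can be reloaded later.
filtered_path <- file.path(tempdir(), "filtered_snps.txt")
writeLines(filtered_snps, filtered_path)
filtered_snps_reloaded <- readLines(filtered_path)
```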

**2) Extract a and b signal values**

In order to calculate the signal intensity, we must first extract a and b
signal values for each SNP and each genotyped sample. We can then remove the SNPs
we filtered out in the previous step.

`signal_a <- svsniper::extract_a(axiom_summary)`  
`signal_b <- svsniper::extract_b(axiom_summary)`
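
To illustrate the idea, the base R sketch below splits a summary table into a and b signals and then keeps only the retained SNPs. It is not the package's implementation. It assumes that the summary table stores each SNP on two rows whose `probeset_id` ends in `-A` or `-B`, and that the saved list contains the SNPs to keep. The helper `extract_channel()`, the object `snps_to_keep`, and the signal values are hypothetical.

```r
# Hypothetical summary table: two rows per SNP (suffix -A and -B).
axiom_summary <- data.frame(
  probeset_id = c("AX-12345678-A", "AX-12345678-B",
                  "AX-87654321-A", "AX-87654321-B"),
  sample01.CEL = c(1200.5, 310.2, 150.0, 980.7),
  sample02.CEL = c(1100.1, 295.8, 175.3, 1010.4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows of one channel and strip the suffix.
extract_channel <- function(summary_table, channel) {
  suffix <- paste0("-", channel, "$")
  rows <- grepl(suffix, summary_table$probeset_id)
  out <- summary_table[rows, , drop = FALSE]
  out$probeset_id <- sub(suffix, "", out$probeset_id)
  rownames(out) <- NULL
  out
}

signal_a <- extract_channel(axiom_summary, "A")
signal_b <- extract_channel(axiom_summary, "B")

# Assumption: the saved list holds the SNPs to keep.
snps_to_keep <- "AX-12345678"
signal_a <- signal_a[signal_a$probeset_id %in% snps_to_keep, , drop = FALSE]
signal_b <- signal_b[signal_b$probeset_id %in% snps_to_keep, , drop = FALSE]
```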