## Structural Variations detection using SNP genotyping data in R
### What this package does
The *SVSNiPeR* package provides helper functions for analysing SNP genotyping array
data. The goal is to detect genomic regions affected by structural variations.
### How to install this package
`devtools::install_git("https://forgemia.inra.fr/jonathan.kitt/svniper")`
### How to use this package
#### Single genotyping array
If your project consists of a single genotyping array, use the following
analysis pipeline.
**1) Read list of SNPs and list of samples**
The analysis requires a list of physical positions for the SNPs, as shown below:
| probeset_id | chromosome | position |
| ----------- | ---------- | -------- |
| AX-12345678 | chr1A | 123456 |
We also recommend you use a list of genotyped samples, as shown below:
| unique_id | file_name | sample_name | definition |
| --------- | ------------- | ----------- | ---------- |
| id01 | sample01.CEL | sample01 | reference |
| id02 | sample02.CEL | sample02 | sample |
A genotyped sample is defined either as a *sample* or as a *reference*. References
are used to normalise calculations in later steps.
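As a minimal sketch, both lists could be loaded with base R, assuming they are stored as tab-separated text files with the column names shown above. The file names and the temporary files written here are hypothetical placeholders that keep the example self-contained; they are not part of the package.

```r
# Hypothetical example files, written to a temporary directory for illustration
snp_file <- file.path(tempdir(), "snp_positions.tsv")
sample_file <- file.path(tempdir(), "samples.tsv")

writeLines(c(
  "probeset_id\tchromosome\tposition",
  "AX-12345678\tchr1A\t123456"
), snp_file)

writeLines(c(
  "unique_id\tfile_name\tsample_name\tdefinition",
  "id01\tsample01.CEL\tsample01\treference",
  "id02\tsample02.CEL\tsample02\tsample"
), sample_file)

# Read both lists as data frames, keeping identifiers as character strings
snp_list <- read.delim(snp_file, stringsAsFactors = FALSE)
sample_list <- read.delim(sample_file, stringsAsFactors = FALSE)

# Check that every sample is labelled either "sample" or "reference"
stopifnot(all(sample_list$definition %in% c("sample", "reference")))

# Identifiers of the reference samples used for normalisation
reference_ids <- sample_list$unique_id[sample_list$definition == "reference"]
```

Checking the `definition` column early avoids silent errors later, when references are used for normalisation.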
**1) Read Axiom output files**
Three files are obtained using the Axiom genotyping pipeline:
- AxiomGT1.calls.txt
- AxiomGT1.confidences.txt
- AxiomGT1.summary.txt
To read these files, use the following commands:
`axiom_confidences <- svsniper::read_confidences(path_to_axiom_confidences_file)`
`axiom_summary <- svsniper::read_summary(path_to_axiom_summary_file)`
**Optional step: filter SNPs**
You may want to remove SNPs with bad confidence scores, and/or, depending on
the type of analysis you want to run, SNPs with high minor allele frequencies.
In order to filter out SNPs, three functions are available:
a) `svsniper::count_confidences(axiom_confidences, threshold = 0.15)`
This function counts the number of samples with a confidence score above the
defined threshold (defaults to 0.15, the value used in the Affymetrix Axiom
tools) and returns a table as shown below:
| probeset_id | threshold_pass | threshold_fail |
| ----------- | -------------- | -------------- |
| AX-12345678 | 96 | 0 |
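To illustrate the kind of counting involved, here is a base R sketch on a made-up confidence matrix. It is not the package's implementation. It assumes that a score at or below the threshold counts as a pass and a score above it as a fail, following the Axiom convention that lower confidence scores indicate more reliable calls. The second probeset ID and all values are hypothetical.

```r
# Toy confidence matrix: rows are SNPs, columns are samples (values are made up)
confidences <- matrix(
  c(0.01, 0.02, 0.30,
    0.05, 0.10, 0.12),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("AX-12345678", "AX-87654321"),
                  c("sample01", "sample02", "sample03"))
)

threshold <- 0.15

# Assumption: a score at or below the threshold passes, a score above it fails
confidence_count <- data.frame(
  probeset_id    = rownames(confidences),
  threshold_pass = rowSums(confidences <= threshold),
  threshold_fail = rowSums(confidences > threshold),
  row.names      = NULL
)

# SNPs with at least one failing sample could be flagged for removal
flagged_snps <- confidence_count$probeset_id[confidence_count$threshold_fail > 0]
```

How many failing samples to tolerate per SNP is an analysis choice; the cut-off of zero used here is only an example.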
b) `svsniper::count_alleles(axiom_calls)`
This function returns a table as shown below:
| probeset_id | count_aa | count_ab | count_bb | count_na | count_otv |
| ----------- | -------- | -------- | -------- | -------- | --------- |
| AX-12345678 | 41 | 2 | 53 | 0 | 0 |
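A table of this shape can be built from a matrix of genotype calls with a few `rowSums()` calls. The sketch below is an illustration in base R, not the package's code. It assumes the numeric coding commonly found in Axiom calls files (0 = AA, 1 = AB, 2 = BB, -1 = no call, -2 = OTV); the second probeset ID and all values are made up.

```r
# Toy genotype calls: rows are SNPs, columns are samples (values are made up)
# Assumed coding: 0 = AA, 1 = AB, 2 = BB, -1 = no call, -2 = OTV
calls <- matrix(
  c(0, 0, 1, 2, -1,
    2, 2, 2, 0, -2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("AX-12345678", "AX-87654321"), paste0("sample0", 1:5))
)

# Count each genotype class per SNP
allele_count <- data.frame(
  probeset_id = rownames(calls),
  count_aa    = rowSums(calls == 0),
  count_ab    = rowSums(calls == 1),
  count_bb    = rowSums(calls == 2),
  count_na    = rowSums(calls == -1),
  count_otv   = rowSums(calls == -2),
  row.names   = NULL
)
```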
c) `svsniper::calculate_maf(allele_count)`
This function takes as its argument a table obtained using the `svsniper::count_alleles()`
function, and returns a table as shown below:
| probeset_id | count_aa | count_ab | count_bb | count_na | count_otv | maf |
| ----------- | -------- | -------- | -------- | -------- | --------- | ----- |
| AX-12345678 | 41 | 2 | 53 | 0 | 0 | 0.436 |
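The `maf` value in the example row can be reproduced by hand. The formula below is inferred from the numbers in the table, not taken from the package source: 0.436 matches `min(count_aa, count_bb) / (count_aa + count_bb)`, that is 41 / 94, which uses homozygous calls only. The conventional allele-based frequency, which also counts heterozygotes, is shown for comparison and gives a slightly different value for the same row.

```r
allele_count <- data.frame(
  probeset_id = "AX-12345678",
  count_aa = 41, count_ab = 2, count_bb = 53, count_na = 0, count_otv = 0
)

# Assumed formula (inferred from the example table): homozygous calls only
maf_homozygous <- with(allele_count,
                       pmin(count_aa, count_bb) / (count_aa + count_bb))

# Conventional allele-based frequency, shown for comparison
maf_allelic <- with(allele_count, {
  a_alleles <- 2 * count_aa + count_ab
  b_alleles <- 2 * count_bb + count_ab
  pmin(a_alleles, b_alleles) / (a_alleles + b_alleles)
})

round(maf_homozygous, 3)  # 0.436, as in the example table
maf_allelic               # 0.4375
```

If exact agreement with `svsniper::calculate_maf()` matters for your filtering, check its output on a small table before relying on either formula.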
The `count_alleles` and `calculate_maf` functions can be called in a pipe:
`svsniper::count_alleles(axiom_calls) %>% svsniper::calculate_maf()`
We recommend saving a list of filtered SNPs for use in downstream analysis.
**2) Extract a and b signal values**
To calculate the signal intensity, we must first extract the a and b
signal values for each SNP and each genotyped sample. We can then remove the SNPs
we filtered out in the previous step.
`signal_a <- svsniper::extract_a(axiom_summary)`
`signal_b <- svsniper::extract_b(axiom_summary)`
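The two steps described above (splitting signals by allele, then dropping filtered SNPs) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: it assumes a summary table with one row per SNP and allele, where `-A` or `-B` is appended to the probeset ID, as is typical of Axiom summary files. The helper `split_allele()`, the `kept_snps` vector, the second probeset ID, and all signal values are hypothetical.

```r
# Toy summary table (values are made up); assumed layout: one row per SNP and
# allele, with "-A" or "-B" appended to the probeset ID
summary_table <- data.frame(
  probeset_id = c("AX-12345678-A", "AX-12345678-B",
                  "AX-87654321-A", "AX-87654321-B"),
  sample01 = c(1200.5, 310.2, 800.0, 795.3),
  sample02 = c(1150.0, 298.7, 120.4, 1500.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows for one allele and strip the suffix
split_allele <- function(tbl, allele) {
  suffix <- paste0("-", allele, "$")
  out <- tbl[grepl(suffix, tbl$probeset_id), , drop = FALSE]
  out$probeset_id <- sub(suffix, "", out$probeset_id)
  rownames(out) <- NULL
  out
}

signal_a <- split_allele(summary_table, "A")
signal_b <- split_allele(summary_table, "B")

# Hypothetical list of SNPs kept after the optional filtering step
kept_snps <- c("AX-12345678")

# Remove the SNPs that were filtered out
signal_a <- signal_a[signal_a$probeset_id %in% kept_snps, , drop = FALSE]
signal_b <- signal_b[signal_b$probeset_id %in% kept_snps, , drop = FALSE]
```

Stripping the allele suffix gives both tables the same `probeset_id` values, so they can be filtered with the same SNP list and matched row by row afterwards.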