# Week 3 Seq DB and BLAST Algorithm
<p align="justify"> Statement before class: all the content recorded here belongs to the course [<font color=red> ` Bioinformatics: Introduction and Methods`</font>](https://www.coursera.org/learn/bioinformatics-pku/home/welcome). If you like this high-quality course content, please turn to the official course for more details. I recorded what I learnt here for convenience. If there is any copyright problem, please tell me and I'll delete everything immediately. Thanks!</p>
<br>
## Sequence Database Search
<br>
### Seq DB
Query seq (unknown) --> seq in DB (known)

<font color=red>`c*m*n operations needed in total (one pairwise alignment)`</font>
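
To see where the m·n factor comes from, here is a minimal sketch of Smith-Waterman local alignment scoring. It assumes m and n are the lengths of the two sequences and fills one dynamic-programming cell per pair of positions, each cell costing a constant amount of work. The function name `sw_score` and the scoring values (match +5, mismatch -4, linear gap -6) are illustrative assumptions, not values taken from the course.

```python
def sw_score(query, subject, match=5, mismatch=-4, gap=-6):
    """Return (best local alignment score, number of DP cells filled).

    Scoring values are illustrative assumptions; a linear gap penalty is used.
    """
    m, n = len(query), len(subject)
    prev = [0] * (n + 1)          # previous row of the DP matrix
    best = 0
    cells = 0
    for i in range(1, m + 1):
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            # Local alignment: a cell never drops below 0.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
            cells += 1
        prev = curr
    return best, cells


score, cells = sw_score("ACGTTGCA", "TTACGTTGCATT")
print(score, cells)  # 40 96: eight matches, and 8 * 12 cells filled
```

Repeating this for every sequence in a large database is what makes an exhaustive search expensive and motivates a heuristic such as BLAST.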
<br>
### BLAST
BLAST: Basic Local Alignment Search Tool
BLAST finds the highest-scoring <font color=red>`locally optimal alignments`</font>

- fast
- can search large DBs
- sensitive and selective
- robust (the defaults work well)

<br>
## BLAST Algorithm: A Primer
<br>
**General idea**
1. Find matches (<u>seed</u>) between <u>query and subject</u>
2. Extend each seed into <font color=red>`High-scoring Segment Pairs (HSPs)`</font> by running Smith-Waterman on the specified region only
3. Assess the reliability of the alignment
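
The first two steps can be sketched roughly as follows. This is a simplified illustration, not the actual BLAST code: seeds are exact word matches looked up in a hash-table index of the subject, and the extension step swaps in a simple ungapped extension with a drop-off cutoff instead of running Smith-Waterman on the region. The function names, the scoring values (+5/-4) and the `x_drop` parameter are assumptions made for the example.

```python
from collections import defaultdict


def build_index(subject, w):
    """Map every length-w word of the subject to its start positions."""
    index = defaultdict(list)
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    return index


def find_seeds(query, index, w):
    """Return (query_pos, subject_pos) pairs where a query word occurs exactly."""
    seeds = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, subject, i, j, w, match=5, mismatch=-4, x_drop=10):
    """Extend the seed at (i, j) in both directions without gaps.

    Extension stops once the running score falls more than x_drop below
    the best score seen. Returns (score, query_start, query_end_exclusive).
    """
    def walk(qi, sj, step):
        best = score = 0
        best_len = k = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            k += 1
            if score > best:
                best, best_len = score, k
            elif best - score > x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = walk(i + w, j + w, +1)
    left_score, left_len = walk(i - 1, j - 1, -1)
    return w * match + left_score + right_score, i - left_len, i + w + right_len


query, subject, w = "CCATGGCTAACC", "TTTTATGGCTAATTTT", 4
seeds = find_seeds(query, build_index(subject, w), w)
print(seeds)  # [(2, 4), (3, 5), (4, 6), (5, 7), (6, 8)]
score, start, end = extend_ungapped(query, subject, *seeds[0], w)
print(score, query[start:end])  # 40 ATGGCTAA
```

All five seeds lie on the same diagonal, which is the kind of hit cluster that a real implementation would extend only once.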
<br>
**Details:**
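1. The query sequence is split into contiguous “seed words”
2. The DB is pre-indexed so that all seeds can be located quickly
3. Hit clusters -> extend the seed -> keep the extension length that gives the maximum score
4. Low-complexity sequences -> false-positive hits; the complexity K of a window is measured as: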
<center>
$$
K = \frac{1}{L} \log_N \left(\frac{L!}{\prod_i n_i!}\right)
$$
</center>
<br>
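where L is the window length, N is the alphabet size (4 for nucleotides), and n_i is the number of occurrences of letter i in the window.

E.g. CACACACACACACACA with window length 6: every window contains three A and three C, so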
<center>
$$
K = \frac{1}{6} \log_4 \left(\frac{6!}{n_A! \cdot n_C! \cdot n_G! \cdot n_T!}\right)
$$
</center>
<center>
$$
= \frac{1}{6} \log_4 \left(\frac{6!}{3! \cdot 3! \cdot 0! \cdot 0!}\right)
$$
</center>
<center>
$$
= \frac{1}{6} \log_4 \left(\frac{6!}{3! \cdot 3!}\right)
$$
</center>
<center>
$$
= \frac{1}{6} \log_4 20
$$
</center>
<center>
$$
\approx 0.36
$$
</center>
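
The worked example can be checked with a few lines of Python. This is a minimal sketch of the complexity formula; the function name `complexity` and its `alphabet_size` parameter are chosen here for illustration.

```python
import math
from collections import Counter


def complexity(window, alphabet_size=4):
    """K = (1/L) * log_N( L! / prod_i n_i! ) for one sequence window."""
    L = len(window)
    arrangements = math.factorial(L)
    for count in Counter(window).values():
        arrangements //= math.factorial(count)   # exact integer division
    return math.log(arrangements, alphabet_size) / L


print(round(complexity("CACACA"), 2))  # 0.36, as in the worked example
print(complexity("AAAAAA"))            # 0.0, the lowest possible complexity
```

A filter could slide such a window along the query and mask every window whose K falls below a chosen cutoff; the cutoff itself is a tuning choice and is not given in these notes.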
<br>
5. Improve sensitivity -> seed word -> neighbourhood words
6. Quality assessment -> statistical significance
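
As a rough illustration of neighbourhood words, the sketch below enumerates every word of the same length whose ungapped score against a seed word is greater than a threshold T. The toy nucleotide scoring (+5 match, -4 mismatch) and the name `neighbourhood` are assumptions for the example; protein searches would score candidates with a substitution matrix instead.

```python
from itertools import product


def neighbourhood(word, threshold, alphabet="ACGT", match=5, mismatch=-4):
    """Return (candidate, score) for all words scoring > threshold against `word`.

    Candidates are compared position by position, so no gaps are allowed.
    """
    result = []
    for cand in product(alphabet, repeat=len(word)):
        score = sum(match if a == b else mismatch for a, b in zip(word, cand))
        if score > threshold:
            result.append(("".join(cand), score))
    return result


words = neighbourhood("ACG", threshold=5)
print(len(words))  # 10: the word itself plus 9 single-mismatch neighbours
```

Lowering T admits more neighbours, which raises sensitivity at the cost of more hits to extend.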

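The E-value is the expected number of chance alignments with a score of at least S:
<center>
$$
E = Kmne^{-\lambda S}
$$
</center>
where m is the length of the query seq, n is the size of the DB, e is the base of the natural logarithm, S is the score of the alignment, and K and λ are statistical parameters of the scoring system. The P-value, the probability of finding at least one such alignment by chance, follows from E: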
<center>
$$
p = 1 - e ^ {-E}
$$
</center>
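
The conversion from E to p is a one-liner. The sketch below uses `math.expm1` so that very small E-values do not lose precision; the function name `p_value` is chosen here for illustration.

```python
import math


def p_value(e_value):
    """p = 1 - exp(-E), computed stably for small E."""
    return -math.expm1(-e_value)


print(round(p_value(1.0), 4))  # 0.6321
```

For small E the two quantities are nearly equal (p ≈ E), which is why E-values below about 0.01 are often read as if they were probabilities.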
<br>
## Report

BLAST work pipeline

- <font color=red>`Filtering:`</font> filter low-complexity regions and repeats in the query; the -F parameter sets the filter
- <font color=red>`Seeding:`</font> set the seed length <font color=red>`w`</font> (<u>default</u>: 3 for AA, 11 for nucleotides); if <font color=red>`n`</font> is the query length, the number of seed words is <font color=red>`n-w+1`</font>; the -W parameter sets the seed length
- <font color=red>`Search word hits`</font>: the default scoring matrix is BLOSUM62, match = 5, mismatch = -4/+2/-3; the -T parameter keeps only the hits scoring greater than T, <font color=red>`while not allowing gaps`</font>
- <font color=red>`Scanning`</font>: hash table, a direct addressing method (<font color=red>`keys and values`</font>); deterministic finite automaton / finite state machine: much faster
- <font color=red>`Extending -> HSP`</font>: the -S parameter sets the cutoff; if the extended seed still scores greater than S, it is kept as an HSP
- <font color=red>`Significance evaluation`</font>: raw scores (little meaning on their own); bit scores:
<center>
$$
S' = \frac{\lambda S - \ln K}{\ln 2}
$$
</center>
E values:
<center>
$$
E = mn \cdot 2^{-S'}
$$
</center>
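
The two formulas above translate directly into code. In this sketch the function names are made up for the example, and the values λ = 0.318 and K = 0.134, as well as the raw score, query length and DB size, are illustrative inputs only; in practice λ and K depend on the scoring system in use.

```python
import math


def bit_score(raw_score, lam, K):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value(bits, m, n):
    """E = m * n * 2^(-S'), with m the query length and n the DB size."""
    return m * n * 2.0 ** (-bits)


# Illustrative inputs (assumptions, not values from the course).
lam, K = 0.318, 0.134
S, m, n = 80, 250, 1_000_000

bits = bit_score(S, lam, K)
E = e_value(bits, m, n)

# Substituting S' back in gives E = K * m * n * exp(-lambda * S).
assert math.isclose(E, K * m * n * math.exp(-lam * S))
print(f"bit score = {bits:.1f}, E = {E:.2e}")
```

Because the bit score already absorbs λ and K, bit scores from searches with different scoring systems can be compared directly, which raw scores do not allow.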

BLAST tools:

[Applied Bioinformatics Course web site](http://abc.cbi.pku.cn/)
<br>
**<center>Related posts**
<br>
[Bioinformatics/ Introduction and Methods (Week 1 Bioinformatics Introduction)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethosweek1md)
[Bioinformatics/ Introduction and Methods (Week 2 Sequence Alignment)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethosweek2md)
[Bioinformatics/ Introduction and Methods (Week 3 Seq DB and BLAST Algorithm)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek3md)
[Bioinformatics/ Introduction and Methods (Week 4 Markov Model)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek4md)
[Bioinformatics/ Introduction and Methods (Week 5 From Sequencing to NGS)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek5md)
[Bioinformatics/ Introduction and Methods (Week 6 Variant Database)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek6md)
[Bioinformatics/ Introduction and Methods (Week 7 Transcriptome Analysis, and RNA-Seq)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek7md)
[Bioinformatics/ Introduction and Methods (Week 8 Prediction and Analysis of Noncoding RNA)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek8md)
[Bioinformatics/ Introduction and Methods (Week 9 Ontology, Gene Ontology and KEGG Pathway Database)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek9md)
[Bioinformatics/ Introduction and Methods (Week 10 Bioinformatics Database and Resources)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek10md)
[Bioinformatics/ Introduction and Methods (Week 11 New Gene Evolution Detected by Genomic Computation: Basic Concepts and Examples)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek11md)
[Bioinformatics/ Introduction and Methods (Week 12 From Dry to Wet, an Evolutionary Story. Evolution function analysis of DNA methyltransferase)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek12md)</center>
