# Week 2 Sequence Alignment
<p align="justify"> Statement before class: All the contents here marked are belong to the course [<font color=red> `Bioinformatics: Introduction and Methods `</font>](https://www.coursera.org/learn/bioinformatics-pku/home/welcome). If you like this high-quality course contents, please turn to the official course for more details. Here, i recored the knowledge learnt for convenience. If any copyright problem exist, please tell me, i’ll delete all immediately, Thanks!</p>
<br>
## Sequence Alignment, Part I
<br>
### Essential Concepts
<table><tr><td bgcolor=#ebffeb>620 databases and 1459 web server tools to aid computational research in the life sciences.</td></tr></table>
- Biology
— What is the biological question or problem?
- Data
— What is the input data?
— What other <u>supprotive data</u> can be used?
- Model
— How is the problem formulated computationally?/What’s the data model?
- Algorithm
— What is the computational algorithm? How ablout its performance/limiation?
<br>
Two seqs similarity how detemine?
1. Similar seq -> structure -> function
line up all residues -> max level of similarity -> function/evolutionary relationship

2. gap = insertion or deletion (indel)
gap penalty = d + (n-1)*e (d = <font color=red>`opening a gap`</font> ; e = <font color=red>`extending a gap`</font> )
3. Final Score = sum(substitution scores) + (-1)*(sum(gap penalty))
### Global Alignment by Dynamic Programming
#### Pairwaise Seq Alignment: in Maths
“align” -> “比对过程”,“alignment” -> “比对结果”
seq length = n , then possible number kinds of alignment score will be :
<br>
<center>
$$
(\frac{2n}{2}) = \frac{(2n)!}{(n!)^2}
$$
</center>
<br>
#### Steps
1. Problem -> sub-problems
2. Solve sub-problems optimally recursively
3. <u>Optimal solutions</u> to constrcut an optimal solution for <u>original problem</u>
<br>
<center>
$$
F(0,0) = 0
$$
$$
F(i,y) = max\begin{cases}
F(i-1,j+1) + s(x_i,y_j) \\\\
F(i-1,j)+d \\\\
F(i,j-1)+d
\end{cases}
$$
</center>
<br>
Xi refer to ith base in seqA, Yj refer to yth base in seqB


<font color=red>`F(1,1), is only related to its immediate left, upper, and upper-left cells (F(1,0), F(0,1), and F(0,0), respectively`</font>
<font color=red>`For F(2,1)=-3,First,if from F(1,1) = AtoA + AtoG = 2 + (-5)= -3;Second, if from F(1,0)=A to gap +AtoA = (-5)+2 = -3; Thirdly, if from F(2,0) = two gap +AtoG =(-10)+(-5)=-15 `</font>
``` shell
F(2,1) = -3
1. F(1,1) => A A = 2 + (-5) = -3
A G
2. F(1,0) => A A = (-5) + 2 = -3
_ A
3. F(2,0) => A _ A = (-5) + (-5) + (-5)
_ A G
############### !important! ###############
Global alignmnet (Needleman-Wunsch) :
Horizontal and vertical add gap penalty;
diagonal add match point.
```

Which is called [**<u>Needleman-Wunsch algorithm</u>** ](https://www.wikiwand.com/zh/尼德曼-翁施算法?wprov=srpw1_0)
<br>
## Sequence Alignment, Part II
<br>
### From Global to Local
Which is called [**<u>Smith-Waterman algorithm</u>**](https://www.wikiwand.com/zh/史密斯-沃特曼算法)


<br>
### Alignment with Affine Gap Penalty and Calculation of Time Complexity for the Needleman-Wunsch Algorithm
| M | Match (not nessarily identified) |
| :--: | :------------------------------: |
| X | Insert at seq X (delet at seq Y) |
| Y | Insert at seq Y (delet at seq Y) |
| d | Gap open |
| e | Gap extension |

Time Complexity for the Needleman-Wunsch Algorithm:
Seq X length is m, Seq Y length is n, then the <u>O(mn)</u> operations needed in total.
<br>
<font color=red>`After that, Needleman-Wunsch and Smith-Waterman algorithms need to be learnt from beginning!`</font>


<br>
## Supplementary
<br>
### Homology
<br>
Common ancestor
ortholog -> speciation
paralog -> duplication
<br>
### Similarity & Identity
<br>
Like AA classes, which is useful to learn molecular biology:

1. Similarity Matrix
nucleotides <- match/mismatch seq alignment
phylogeny reconstruction <- complicated model
AA <- PAM matrix (<font color=red>`1PAM = one step of evolution`</font> ),BLOSUM++62++ (conserved seq multiple alignment)
PAM2 = (PAM1)^2,
log odds of PAM250: log odds = log(p/(1-p))
2. Dot matrix
Focus on the local word match or nor.
<br>
**<center>Correlated**
<br>
[Bioinformatics/ Introduction and Methos (Week 1 Bioinformatics Introduction)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethosweek1md)
[Bioinformatics/ Introduction and Methos (Week 2 Sequence Alignment)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethosweek2md)
[Bioinformatics/ Introduction and Methos (Week 3 Seq DB and BLAST Algorithnm)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek3md)
[Bioinformatics/ Introduction and Methos (Week 4 Markov Model)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek4md)
[Bioinformatics/ Introduction and Methos (Week 5 From Sequencing to NGS)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek5md)
[Bioinformatics/ Introduction and Methos (Week6 Variant Database)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek6md)
[Bioinformatics/ Introduction and Methos (Week7 Transcriptome Analysis, and RNA-Seq)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek7md)
[Bioinformatics/ Introduction and Methos (Week8 Prediction and Analysis of Noncoding RNA)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek8md)
[Bioinformatics/ Introduction and Methos (Week9 Ontology, Gene Ontology and KEGG Pathway Database)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek9md)
[Bioinformatics/ Introduction and Methos (Week 10 Bioinformatics Database and Resources)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek10md)
[Bioinformatics/ Introduction and Methos (Week 11 New Gene Evolution Detected by Genomic Computation: Basic Concepts and Examples)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek11md)
[Bioinformatics/ Introduction and Methos (Week 12 From Dry to Wet, an Evolutionary Story. Evolution function analysis of DNA methyltransferase)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek12md)</center>

Bioinformatics: Introduction and Methods (Week 2)