# Week 11 New Gene Evolution Detected by Genomic Computation: Basic Concepts and Examples
<br>
<p align="justify"> Statement before class: All the content marked here belongs to the course [<font color=orange>Bioinformatics: Introduction and Methods</font>](https://www.coursera.org/learn/bioinformatics-pku/home/welcome). If you like this high-quality course content, please turn to the official course for more details. Here, I recorded what I learned for convenience. If there is any copyright problem, please tell me and I will delete everything immediately. Thanks!</p>
<br>
## New Gene Evolution Detected by Genomic Computation: Basic Concepts and Examples
This comparison immediately told us that organisms evolve in the number of genes and the size of genomes. This suggests that there is a general process of birth and death of genes in evolution, i.e., the origination of new genes becomes a very important general problem.

What is a new gene?




Using this definition, we can now identify new genes in the 12 Drosophila species, because all of their genomic sequences have been deposited in databases and are publicly available to everybody. The left side shows the computational pipeline we defined to identify new genes.
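
One way to picture the dating step of such a pipeline: once syntenic alignment has shown which species carry an ortholog, the gene can be assigned to the smallest clade that contains all of those species. The sketch below is a minimal illustration under stated assumptions, not the course's actual pipeline: the tree is a simplified toy subset of the 12 species, the presence set is invented, and the names `TREE`, `leaves` and `origination_clade` are hypothetical. It also assumes that absence in the outgroups reflects a gain rather than a loss.

```python
# Toy rooted tree as nested tuples (assumption: simplified subset of species).
TREE = ((("dmel", ("dsim", "dsec")), ("dyak", "dere")), "dana")


def leaves(node):
    """Return the set of species names found under a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def origination_clade(node, present):
    """Return the smallest clade containing every species that has the gene."""
    if isinstance(node, str):
        return node
    for child in node:
        # Descend while a single child still contains all carriers.
        if present <= leaves(child):
            return origination_clade(child, present)
    return node


# Hypothetical presence calls from syntenic alignment.
present = {"dmel", "dsim", "dsec"}
clade = origination_clade(TREE, present)
print(sorted(leaves(clade)))
```

A gene found only in this clade would be dated to the branch leading to the common ancestor of those three species.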

<br>
<font color=orange>`11 molecular mechanisms which can create new genes`</font>

<br>
**Summary**
- A new gene is a gene that originated recently in a genome and can be identified by syntenic alignment of genomic sequences from a group of closely related species.
- A number of molecular mechanisms can generate new genes, and more than one mechanism can be involved in making a new gene.
- New genes can be as biologically important as old or ancient genes. In insects, essential functions can evolve rapidly at any time in evolution.
<br>
## New Gene Evolution Detected by Genomic Computation: A Driver for Human Brain Evolution
<br>
Bioinformatics pipeline

<br>
**Summary**
- Evolution of the brain was accompanied by the origin of new genes.
- New genes are upregulated in the neocortex, in particular the prefrontal cortex regions, throughout the evolution of vertebrates.
- Many new genes, in particular human-specific new genes, are expressed in the prefrontal cortex and temporal lobe, the brain structures involved in cognitive functions.
<br>
## A Human-Specific de novo Gene Associated with Addiction
> microRNA binding sites tend to occur in tandem

FLJ33706 -> SNP -> exons -> splicing sites -> mRNA structure -> alignment tree -> UCSC genome browser -> enhancer and transcription factor binding sites -> promoter region -> ORF (Open Reading Frame) -> PI and enrichment -> Predicted Secondary Structure -> Predicted 3D Structure -> Nevertheless, FLJ33706 was the first human-specific de novo protein-coding gene associated with brain functions.
<br>
## Origination of de novo Genes from Noncoding RNAs
<br>
Using bioinformatic analysis of high-throughput data to infer the origination of de novo genes.
<font color=orange>`Genome-wide identification of human- and human-chimpanzee-specific de novo genes`</font>

<br>
## Presentation
<br>
<font color=orange>`What is phylogeny?`</font>
- <font color=orange>`Phylogenetics:`</font> study of <u>evolutionary relationships</u> among groups of organisms or genes, which are discovered through <u>molecular sequencing data and morphological data matrices.</u>
- <font color=orange>`Phylogenetic tree:`</font> A <u>graph</u> depicting the <u>ancestor-descendant</u> relationships between organisms or gene seqs. The seqs are the tips of the tree. **Branches** of the tree connect the tips to their ancestral seqs.
<br>
<font color=orange>`Why do phylogeny estimation?`</font>
- Detection of orthology and paralogy
- Estimating divergence times
- Reconstructing ancient proteins
- Finding the residues that are important to natural selection
- Detecting recombination points
- Identifying mutations likely to be associated with disease
- Determining the identity of new pathogens
<br>
<font color=orange>`How to estimate phylogeny?`</font>
- Assumption
  - As the time since two seqs diverged from their last common ancestor increases, so does the number of differences between them
- Basic idea
  - Count the number of differences between seqs and group those that are most similar
- Complexity
  - The rate of seq evolution is not constant over time
  - Natural selection or changing mutational biases exist
  - Many of the sites in a DNA seq are not helpful
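
The basic idea of counting differences and grouping the most similar seqs can be sketched as follows. This is a minimal illustration, assuming pre-aligned sequences of equal length; the toy sequences and the function name `p_distance` are made up for the example.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites that differ (columns with a gap are skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


# Hypothetical aligned sequences.
seqs = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "ACGAACCTAT",
    "s4": "TCGAACCTTT",
}

# All pairwise distances, then the most similar pair to group first.
dist = {(i, j): p_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}
closest = min(dist, key=dist.get)
print(closest, dist[closest])
```

Raw proportions like these underestimate the true divergence once multiple substitutions hit the same site, which is one reason the complications listed above matter.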

<br>
**Neighbour-joining (NJ) algorithm**

Tool: PAUP, MEGA, PHYLIP
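
As a rough illustration of what these tools do internally, here is a compact sketch of the standard neighbour-joining procedure on a distance matrix. It is a teaching sketch under stated assumptions, not the implementation used by PAUP, MEGA or PHYLIP: the function name, the `frozenset`-keyed distance dictionary, the internal node labels and the toy distances are all choices made for this example, and at least three taxa are assumed.

```python
from itertools import combinations


def neighbor_joining(names, distances):
    """Neighbour joining on a dict keyed by frozenset({i, j}).

    Returns (child, parent, branch_length) edges of an unrooted tree.
    """
    d = dict(distances)
    nodes = list(names)
    edges = []
    counter = 0

    def dist(i, j):
        return d[frozenset((i, j))]

    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {i: sum(dist(i, k) for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]])
        u = f"node{counter}"
        counter += 1
        len_i = dist(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        len_j = dist(i, j) - len_i
        edges += [(i, u, len_i), (j, u, len_j)]
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (dist(i, k) + dist(j, k) - dist(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Three nodes remain: attach them to one central node.
    a, b, c = nodes
    centre = f"node{counter}"
    edges.append((a, centre, (dist(a, b) + dist(a, c) - dist(b, c)) / 2))
    edges.append((b, centre, (dist(a, b) + dist(b, c) - dist(a, c)) / 2))
    edges.append((c, centre, (dist(a, c) + dist(b, c) - dist(a, b)) / 2))
    return edges


# Hypothetical additive distances among four taxa.
toy = {frozenset(p): v for p, v in {
    ("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 5,
    ("B", "C"): 9, ("B", "D"): 6, ("C", "D"): 5,
}.items()}
for edge in neighbor_joining(["A", "B", "C", "D"], toy):
    print(edge)
```

Because the toy distances are additive (they were generated from a tree), neighbour joining recovers that tree's branch lengths exactly; with real data the result is only an estimate.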
<br>
**Parsimony**
Tool: PAUP, NONA, MEGA, PHYLIP
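
Parsimony prefers the tree that needs the fewest changes to explain the data. A minimal sketch of scoring a fixed tree with the Fitch algorithm is given below; the rooted binary trees (nested tuples), the toy alignment and the function names are assumptions made for illustration, and real tools additionally search over many candidate trees.

```python
def fitch(node, site):
    """Return (state_set, changes) for one alignment column.

    node is a leaf name (str) or a (left, right) tuple; site maps leaf -> base.
    """
    if isinstance(node, str):
        return {site[node]}, 0
    left_set, left_cost = fitch(node[0], site)
    right_set, right_cost = fitch(node[1], site)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical alignment and two candidate topologies.
alignment = {"s1": "AAGT", "s2": "AAGC", "s3": "ACGC", "s4": "ACTC"}
tree_a = (("s1", "s2"), ("s3", "s4"))
tree_b = (("s1", "s3"), ("s2", "s4"))
print(parsimony_score(tree_a, alignment), parsimony_score(tree_b, alignment))
```

Here `tree_a` needs 3 changes and `tree_b` needs 4, so parsimony would prefer `tree_a` for this toy data.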
<br>
**Maximum likelihood**
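
Maximum likelihood picks the tree and branch lengths that make the observed seqs most probable under a substitution model. A full tree search does not fit in a snippet, so the sketch below shows the smallest possible case: estimating the single branch length between two seqs. The Jukes-Cantor model, the site counts and the grid search are assumptions chosen for illustration; they are not taken from the course material.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d (substitutions per site) under Jukes-Cantor."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a site shows the same base
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    return (n_sites * math.log(0.25)
            + (n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_diff))


# Hypothetical data: 100 aligned sites, 10 of which differ.
n_sites, n_diff = 100, 10

# Crude numerical maximisation over a grid of candidate distances.
grid = [i / 10000 for i in range(1, 20001)]
d_hat = max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

# Known closed-form maximum for this model, used as a cross-check.
closed_form = -0.75 * math.log(1 - 4 * (n_diff / n_sites) / 3)
print(d_hat, closed_form)
```

The estimate is slightly larger than the raw proportion of differences (0.1), because the model accounts for repeated substitutions at the same site.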
<br>
**Assessing confidence - the bootstrap**
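Randomly resample bases (alignment columns) and build a tree from the resampled sequences, then check whether it agrees with the original tree. This procedure is usually repeated 100 times, i.e. bootstrap = 100.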
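
The resampling step can be sketched as follows. Columns are drawn with replacement, as in the standard bootstrap. To keep the example short, tree building is replaced by a stand-in (finding the most similar pair of seqs), so the reported support is for that grouping only; the function names and toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the length fixed."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for tree building: the pair of seqs with the fewest differences."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(
        pairs,
        key=lambda p: sum(x != y for x, y in zip(alignment[p[0]], alignment[p[1]])),
    )


def bootstrap_support(alignment, replicates=100, seed=0):
    """Fraction of replicates that recover the grouping seen in the original data."""
    rng = random.Random(seed)
    reference = closest_pair(alignment)
    hits = sum(
        closest_pair(bootstrap_alignment(alignment, rng)) == reference
        for _ in range(replicates)
    )
    return reference, hits / replicates


# Hypothetical aligned sequences.
alignment = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "ACGAACCTAT",
    "s4": "TCGAACCTTT",
}
print(bootstrap_support(alignment))
```

In a real analysis each replicate would be passed to the same tree-building method as the original alignment, and support would be reported per branch.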

<br>
**Hypothesis testing**

<br>
**Bayesian phylogenetics**
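- To maximize the posterior probability *P(tree|data)* of a tree topology $T$ with branch lengths $t_\cdot$ given the sequence data $x^\cdot$,
<center>

$$
P(T,t_\cdot|x^\cdot)=\frac{P(x^\cdot|T,t_\cdot)P(T,t_\cdot)}{P(x^\cdot)}
$$

</center>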
- Strong connection to the maximum likelihood method
- Primary analysis **produces a measure of uncertainty**
- Allows **complex models** of seq evolution to be implemented
- Does not rely on the **molecular clock** assumption to estimate divergence times
- Nuisance parameters are integrated out to obtain the **marginal posterior** probability of a tree

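Tools: MrBayes, BAMBE
- Markov Chain Monte Carlo (MCMC): change one parameter or element a step at a time; if the result is better than before, adopt the new setting. It is hard to know when to stop this stepwise process: stopping too early misses the global optimum, stopping too late wastes time, and it is also unclear how large the range of each parameter adjustment should be.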
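
A minimal Metropolis sampler illustrates the idea. Unlike a pure "keep it only if it is better" search, the standard Metropolis rule also accepts a worse proposal with probability equal to the posterior ratio, which is what lets the chain sample the whole posterior instead of climbing to a single peak. Everything specific here is an assumption for illustration: the single parameter is the distance between two seqs under a Jukes-Cantor model, the prior is exponential, and the step size, burn-in and chain length are arbitrary choices, not MrBayes or BAMBE settings.

```python
import math
import random


def log_posterior(d, n_sites=100, n_diff=10, prior_rate=1.0):
    """Unnormalised log posterior of distance d (hypothetical toy data)."""
    if d <= 0:
        return -math.inf
    e = math.exp(-4.0 * d / 3.0)
    log_lik = ((n_sites - n_diff) * math.log(0.25 + 0.75 * e)
               + n_diff * math.log(0.25 - 0.25 * e))
    return log_lik - prior_rate * d   # exponential prior, up to a constant


def metropolis(n_steps=20000, step=0.05, burn_in=2000, seed=1):
    """Sample d with a symmetric uniform proposal and the Metropolis rule."""
    rng = random.Random(seed)
    d = 0.5
    current = log_posterior(d)
    samples = []
    for i in range(n_steps):
        proposal = d + rng.uniform(-step, step)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            d, current = proposal, candidate
        if i >= burn_in:
            samples.append(d)   # samples before burn-in are discarded
    return samples


samples = metropolis()
mean = sum(samples) / len(samples)
print(round(mean, 3))
```

The choices the note worries about show up directly as `n_steps`, `burn_in` and `step`: too few steps and the chain has not converged, too many wastes time, and a badly chosen step size makes the chain mix slowly.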

<br>
**Summary**
- Estimation of phylogenies has become a regular step in the analysis of a new gene seq
- **MCMC-based approaches** are extending the field by answering previously intractable questions
- These new techniques seem poised to teach us a great deal about the tree of life and molecular genetics
<br>
**<center>Correlated**
<br>
[Bioinformatics: Introduction and Methods (Week 1 Bioinformatics Introduction)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethosweek1md)
[Bioinformatics: Introduction and Methods (Week 2 Sequence Alignment)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethosweek2md)
[Bioinformatics: Introduction and Methods (Week 3 Seq DB and BLAST Algorithm)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek3md)
[Bioinformatics: Introduction and Methods (Week 4 Markov Model)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek4md)
[Bioinformatics: Introduction and Methods (Week 5 From Sequencing to NGS)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek5md)
[Bioinformatics: Introduction and Methods (Week 6 Variant Database)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek6md)
[Bioinformatics: Introduction and Methods (Week 7 Transcriptome Analysis, and RNA-Seq)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek7md)
[Bioinformatics: Introduction and Methods (Week 8 Prediction and Analysis of Noncoding RNA)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek8md)
[Bioinformatics: Introduction and Methods (Week 9 Ontology, Gene Ontology and KEGG Pathway Database)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek9md)
[Bioinformatics: Introduction and Methods (Week 10 Bioinformatics Database and Resources)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek10md)
[Bioinformatics: Introduction and Methods (Week 11 New Gene Evolution Detected by Genomic Computation: Basic Concepts and Examples)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek11md)
[Bioinformatics: Introduction and Methods (Week 12 From Dry to Wet, an Evolutionary Story. Evolution function analysis of DNA methyltransferase)](https://www.haoxi.info/archives/bioinformaticsintroductionandmethodsweek12md)</center>
