Hence, the descriptions above show that the ADLDs obtained by our newly proposed method are visual and intuitive, and may be a powerful and effective tool for the visual comparison of protein sequences and of numerical sequences in other research fields.
In this paper, a novel ADLD-based graphical representation of protein sequences is proposed and used to analyze the similarity/dissimilarity of protein sequences.
Number  Name               Abbreviation  Accession number  Length
1       Human              Human         ADT80430.1        603
2       Gorilla            Gorilla       NP_008222         603
3       Pigmy chimpanzee   Pi-chim       NP_008209         603
4       Common chimpanzee  C-chim        NP-008196         603
5       Fin-whale          Fin-whale     NP_006899         606
6       Blue-whale         Blue-whale    NP_007066         606
7       Rat                Rat           AP_004902.1       610
8       Mouse              Mouse         NP_904338         607
9       Opossum            Opossum       NP_007105         602
10      Sheep              Sheep         ABW22903.1        606
11      Goat               Goat          BAN59258.1        606
12      Lemur              Lemur         CAD13431.1        603
13      Cattle             Cattle        ADN11902.1        606
14      Hare               Hare          CAD13291.1        603
15      Gallus             Gallus        BAE16036.1        605
16      Rabbit             Rabbit        NP_007559.1       603

TABLE 6: The similarity/dissimilarity matrix for the 16 ND5 proteins based on the ADLD-based method.
Therefore, to overcome the main drawbacks of existing methods, this paper introduces a novel graphical representation of protein sequences, the ADLD (Alignment Diagonal Line Diagram), based on PCA, and then proposes a new ADLD-based method for analyzing the similarity/dissimilarity of protein sequences.
(4) For any two numerical sequences, we can draw a graph, named the ADLD, and then extract some numerical characteristics from it, which can be used to analyze the similarity/dissimilarity of the two sequences.
Therefore, to make the ASD more intuitive, we propose a simplified variant of the ASD, which is called the Alignment Diagonal Line Diagram (ADLD).
In an ASD, if we keep only the SFs and FPs and omit all other APs, we obtain a simplified variant of the ASD; for convenience, we call it the Alignment Diagonal Line Diagram (ADLD).
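The idea of keeping only diagonal fragments can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in this excerpt: an SF is treated as a maximal run of identical residues along one diagonal of the pairwise comparison grid, the main diagonal plays the role of the AT, every other diagonal plays the role of a BT, and a diagonal is indexed by the offset y - x. FPs are not modelled. The function name `diagonal_runs` and the parameter `min_len` are hypothetical and are not part of the proposed method.

```python
from collections import defaultdict


def diagonal_runs(seq_a, seq_b, min_len=2):
    """Collect maximal runs of matching symbols along each diagonal.

    Assumption (not from the paper): a run of at least `min_len`
    consecutive matches on one diagonal stands in for an SF.
    Keys are diagonal offsets (y - x); values are lists of
    ((x_start, y_start), (x_end, y_end)) in 1-based coordinates,
    with x indexing seq_a and y indexing seq_b.
    """
    runs = defaultdict(list)
    n, m = len(seq_a), len(seq_b)
    for offset in range(-(n - 1), m):
        # First grid cell on this diagonal, in 0-based coordinates.
        x = max(0, -offset)
        y = x + offset
        start = None
        while x < n and y < m:
            if seq_a[x] == seq_b[y]:
                if start is None:
                    start = (x, y)
            elif start is not None:
                # The run ended at the previous cell on this diagonal.
                if x - start[0] >= min_len:
                    runs[offset].append(
                        ((start[0] + 1, start[1] + 1), (x, y))
                    )
                start = None
            x += 1
            y += 1
        if start is not None and x - start[0] >= min_len:
            # The run reached the border of the grid.
            runs[offset].append(((start[0] + 1, start[1] + 1), (x, y)))
    return dict(runs)


# Toy sequences: an insertion of "X" shifts the second fragment
# from the main diagonal to the diagonal with offset -1.
fragments = diagonal_runs("ABCDXEFG", "ABCDEFG")
```

In this toy case one fragment lies on the main diagonal and a second one lies on a neighbouring diagonal after the insertion, which is the kind of picture a diagonal line diagram is meant to make visible at a glance.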
For convenience of analysis, suppose that an ADLD has [K.sub.1] different SFs and [K.sub.2] different FPs on its AT, K different BTs located above its AT, and K different BTs located below its AT; then we get the following.
(1) For the [K.sub.1] different SFs and [K.sub.2] different FPs on the AT of the ADLD, we number the [K.sub.1] SFs and the [K.sub.2] FPs from left to right and use [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] to represent the [K.sub.1] SFs and the [K.sub.2] FPs, respectively.
In addition, we call the SFs on the BTs of the ADLD the BSFs.
From Figure 2(a), we can see that there are two SFs in the ADLD of the sequence pair (chimpanzee, human): one is [ASF.sub.1], the line segment from the point (1,1) to the point (32,32), and the other is [BSF.sup.1.sub.-4], the line segment from the point (35,31) to the point (125,121).
Figure 2(b) shows that there are also two SFs in the ADLD of the sequence pair (human, gorilla).
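The two segments quoted from Figure 2(a) can be checked with a short Python sketch. It assumes, as a reading of the notation rather than a statement from the paper, that the subscript of a BSF equals the diagonal offset y - x of its endpoints and that the extent of a segment is counted in grid points. The helper `describe_segment` is hypothetical.

```python
def describe_segment(start, end):
    """Return the diagonal offset (y - x) and the number of grid points
    covered by a segment that runs parallel to the main diagonal."""
    (x0, y0), (x1, y1) = start, end
    offset = y0 - x0
    if y1 - x1 != offset:
        raise ValueError("segment is not parallel to the main diagonal")
    return {"offset": offset, "points": x1 - x0 + 1}


# Endpoints as given in the text for the pair (chimpanzee, human).
asf = describe_segment((1, 1), (32, 32))      # offset 0, 32 points
bsf = describe_segment((35, 31), (125, 121))  # offset -4, 91 points
```

Under these assumptions the first segment lies on the main diagonal, while the second lies on the diagonal with offset -4, which agrees with the subscript of the second SF.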