Creating a Distance Matrix

곰탱이장 2024. 9. 14. 10:12

ROSALIND | Creating a Distance Matrix

rosalind.info

Problem

For two strings $s_{1}$ and $s_{2}$ of equal length, the p-distance between them, denoted $d_{p}(s_{1}, s_{2})$, is the proportion of corresponding symbols that differ between $s_{1}$ and $s_{2}$.
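
As a quick illustration of this definition, here is a minimal sketch of the p-distance in Python. The function name `p_distance` is made up for this example and is not part of the problem statement; the two strings are the first two records of the sample dataset.

```python
def p_distance(s1: str, s2: str) -> float:
    """Proportion of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("p-distance is only defined for strings of equal length")
    if not s1:
        raise ValueError("p-distance is undefined for empty strings")
    # zip pairs up corresponding symbols; each mismatch counts as 1
    mismatches = sum(a != b for a, b in zip(s1, s2))
    return mismatches / len(s1)


# Rosalind_9499 vs Rosalind_0942: 4 of the 10 positions differ
print(p_distance("TTTCCATTTA", "GATTCATTTC"))  # 0.4
```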

For a general distance function $d$ on $n$ taxa $s_{1}, s_{2}, \dots, s_{n}$ (taxa are often represented by genetic strings), we may encode the distances between pairs of taxa via a distance matrix $D$ in which $D_{i, j} = d(s_{i}, s_{j})$.
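
The matrix definition works with any distance function, which can be sketched as below. Both `distance_matrix` and the toy `hamming` function are hypothetical names chosen for this example; the only assumption is that `d` takes two taxa and returns a number.

```python
from typing import Callable, List, Sequence


def distance_matrix(taxa: Sequence[str],
                    d: Callable[[str, str], float]) -> List[List[float]]:
    """Build D with D[i][j] = d(taxa[i], taxa[j]) for every pair (i, j)."""
    n = len(taxa)
    return [[d(taxa[i], taxa[j]) for j in range(n)] for i in range(n)]


def hamming(a: str, b: str) -> int:
    """Toy distance for the demo: number of mismatching positions."""
    return sum(x != y for x, y in zip(a, b))


for row in distance_matrix(["AAA", "AAT", "TTT"], hamming):
    print(row)
# [0, 1, 3]
# [1, 0, 2]
# [3, 2, 0]
```

This version evaluates `d` for all $n^2$ pairs and makes no symmetry assumption; when `d` is symmetric, computing only the pairs with $i < j$ and mirroring them is enough.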

Given: A collection of $n$ ($n \leq 10$) DNA strings $s_{1}, \dots, s_{n}$ of equal length (at most 1 kbp). Strings are given in FASTA format.

Return: The matrix $D$ corresponding to the p-distance $d_{p}$ on the given strings. As always, note that your answer is allowed an absolute error of 0.001.

Sample Dataset

>Rosalind_9499
TTTCCATTTA
>Rosalind_0942
GATTCATTTC
>Rosalind_6568
TTTCCATTTT
>Rosalind_1833
GTTCCATTTA

Sample Output

0.00000 0.40000 0.10000 0.10000
0.40000 0.00000 0.40000 0.30000
0.10000 0.40000 0.00000 0.20000
0.10000 0.30000 0.20000 0.00000
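
The sample output prints every entry with five decimal places. One way to produce rows in that shape is sketched below; `format_row` is a hypothetical helper name. Since an absolute error of 0.001 is allowed, five decimals are more than sufficient.

```python
def format_row(row, decimals=5):
    """Join the values of one matrix row with spaces, using fixed decimals."""
    return " ".join(f"{value:.{decimals}f}" for value in row)


# First row of the sample output
print(format_row([0.0, 0.4, 0.1, 0.1]))  # 0.00000 0.40000 0.10000 0.10000
```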

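This problem asks us to print a matrix that records how much each pair of sequences differs. The p-distance is (number of differing bases) / (total length). Once you know that, the problem is very simple to solve.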

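from Bio import SeqIO

if __name__ == '__main__':
    # Read every FASTA record ("file path" is a placeholder for the input file)
    seqs = []
    with open(r"file path", 'r') as fa:
        for s in SeqIO.parse(fa, 'fasta'):
            seqs.append(s.seq)

    # n x n matrix; the diagonal stays 0 because a sequence never differs from itself
    ans = [[0.0 for _ in range(len(seqs))] for _ in range(len(seqs))]

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):  # the matrix is symmetric, so compute only half
            diff = 0
            for idx in range(len(seqs[i])):
                if seqs[i][idx] != seqs[j][idx]:
                    diff += 1
            err = diff / len(seqs[i])
            ans[i][j] = err
            ans[j][i] = err  # and store it in both positions

    # Write one row per line with five decimals, as in the sample output
    # ("file path" is a placeholder for the output file)
    with open(r"file path", 'w') as wr:
        for k in ans:
            wr.write(' '.join(f'{x:.5f}' for x in k))
            wr.write('\n')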
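
If Biopython is not installed, the FASTA input can also be read with the standard library alone. The following is only a rough sketch under that assumption: `parse_fasta` is a hypothetical helper, it keeps just the sequences (not the record names), and it joins sequences that are wrapped over several lines.

```python
def parse_fasta(text):
    """Return the sequences of a FASTA-formatted string, in file order."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records.append([])  # a header line starts a new record
        elif records:
            records[-1].append(line)  # sequence line (records may span lines)
    return ["".join(parts) for parts in records]


sample = """>Rosalind_9499
TTTCCATTTA
>Rosalind_0942
GATTCATTTC
"""
print(parse_fasta(sample))  # ['TTTCCATTTA', 'GATTCATTTC']
```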