Creating a Distance Matrix
https://rosalind.info/problems/pdst/
Problem
For two strings $s_1$ and $s_2$ of equal length, the p-distance between them, denoted $d_p(s_1, s_2)$, is the proportion of corresponding symbols that differ between $s_1$ and $s_2$.
For a general distance function $d$ on $n$ taxa $s_1, s_2, \ldots, s_n$ (taxa are often represented by genetic strings), we may encode the distances between pairs of taxa via a distance matrix $D$ in which $D_{i,j} = d(s_i, s_j)$.
Given: A collection of $n$ ($n \leq 10$) DNA strings $s_1, \ldots, s_n$ of equal length (at most 1 kbp). Strings are given in FASTA format.
Return: The matrix $D$ corresponding to the p-distance $d_p$ on the given strings. As always, note that your answer is allowed an absolute error of 0.001.
Sample Dataset
>Rosalind_9499
TTTCCATTTA
>Rosalind_0942
GATTCATTTC
>Rosalind_6568
TTTCCATTTT
>Rosalind_1833
GTTCCATTTA
Sample Output
0.00000 0.40000 0.10000 0.10000
0.40000 0.00000 0.40000 0.30000
0.10000 0.40000 0.00000 0.20000
0.10000 0.30000 0.20000 0.00000
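To see where the entries in the sample output come from, here is a minimal sketch that computes the first row directly from the definition. The helper name `p_distance` is my own choice for illustration and is not part of the problem statement.

```python
def p_distance(s1: str, s2: str) -> float:
    """Proportion of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("p-distance is only defined for strings of equal length")
    # Booleans count as 0/1, so sum() gives the number of mismatched positions.
    mismatches = sum(a != b for a, b in zip(s1, s2))
    return mismatches / len(s1)


# The four strings from the sample dataset, in file order.
sample = ["TTTCCATTTA", "GATTCATTTC", "TTTCCATTTT", "GTTCCATTTA"]

# First row of the matrix: distances from the first string to every string.
first_row = [p_distance(sample[0], other) for other in sample]
print(" ".join(f"{d:.5f}" for d in first_row))
# prints: 0.00000 0.40000 0.10000 0.10000
```

For example, the first two strings differ at positions 1, 2, 4 and 10, so their p-distance is 4/10 = 0.4.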
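This problem asks us to print a matrix that records how much each pair of sequences differs. The p-distance is (number of differing bases) / (total length). Once you know that, the problem is very simple to solve.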
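from Bio import SeqIO

if __name__ == '__main__':
    # Read every sequence from the FASTA file
    seqs = []
    with open(r"file path", 'r') as fa:  # replace with the input FASTA path
        for s in SeqIO.parse(fa, 'fasta'):
            seqs.append(s.seq)

    # n x n matrix; the diagonal stays 0
    ans = [[0 for _ in range(len(seqs))] for _ in range(len(seqs))]
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):  # the matrix is symmetric, so compute only half
            diff = 0
            for idx in range(len(seqs[i])):
                if seqs[i][idx] != seqs[j][idx]:
                    diff += 1
            err = diff / len(seqs[i])
            ans[i][j] = err
            ans[j][i] = err  # and fill in both entries

    # Write one row per line; the with block closes the file when done
    with open(r'file path', 'w') as wr:  # replace with the output path
        for k in ans:
            wr.write(' '.join(map(str, k)))
            wr.write('\n')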
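The solution above relies on Biopython to read the FASTA file. If that dependency is not available, the same idea can be sketched with the standard library alone. The function names `parse_fasta`, `distance_matrix` and `format_matrix` below are my own, and the parser is a simplified one that only assumes header lines start with `>`. This sketch also prints five decimal places so the result looks like the sample output, although any format within the 0.001 tolerance should be accepted.

```python
from itertools import combinations


def parse_fasta(text):
    """Return the sequences of a FASTA-formatted string, in file order."""
    seqs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seqs.append("")       # a header line starts a new record
        elif seqs:
            seqs[-1] += line      # sequence lines may wrap, so join them
    return seqs


def distance_matrix(seqs):
    """Build the symmetric p-distance matrix for equal-length strings."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    # Visit each unordered pair once and mirror the value across the diagonal.
    for i, j in combinations(range(n), 2):
        diff = sum(a != b for a, b in zip(seqs[i], seqs[j]))
        matrix[i][j] = matrix[j][i] = diff / len(seqs[i])
    return matrix


def format_matrix(matrix):
    """Render the matrix as space-separated rows with five decimals."""
    return "\n".join(" ".join(f"{x:.5f}" for x in row) for row in matrix)


sample_fasta = """>Rosalind_9499
TTTCCATTTA
>Rosalind_0942
GATTCATTTC
>Rosalind_6568
TTTCCATTTT
>Rosalind_1833
GTTCCATTTA
"""

print(format_matrix(distance_matrix(parse_fasta(sample_fasta))))
```

Run on the sample dataset, this should reproduce the four rows shown under Sample Output.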