https://rosalind.info/problems/kmer/
ROSALIND | k-Mer Composition
It appears that your browser has JavaScript disabled. Rosalind requires your browser to be JavaScript enabled. k-Mer Composition solved by 3385 2012년 7월 13일 12:00:00 오전 by Gabriel Valiente Topics: String Algorithms Generalizing GC-Content Figure
rosalind.info
Problem
For a fixed positive integer kk, order all possible k-mers taken from an underlying alphabet lexicographically.
Then the k-mer composition of a string ss can be represented by an array AA for which A[m]A[m] denotes the number of times that the mmth k-mer (with respect to the lexicographic order) appears in ss.
Given: A DNA string ss in FASTA format (having length at most 100 kbp).
Return: The 4-mer composition of ss.
Sample Dataset
>Rosalind_6431
CTTCGAAAGTTTGGGCCGAGTCTTACAGTCGGTCTTGAAGCAAAGTAACGAACTCCACGG
CCCTGACTACCGAACCAGTTGTGAGTACTCAACTGGGTGAGAGTGCAGTCCCTATTGAGT
TTCCGAGACTCACCGGGATTTTCGATCCAGCCTCAGTCCAGTCTTGTGGCCAACTCACCA
AATGACGTTGGAATATCCCTGTCTAGCTCACGCAGTACTTAGTAAGAGGTCGCTGCAGCG
GGGCAAGGAGATCGGAAAATGTGCTCTATATGCGACTAAAGCTCCTAACTTACACGTAGA
CTTGCCCGTGTTAAAAACTCGGCTCACATGCTGTCTGCGGCTGGCTGTATACAGTATCTA
CCTAATACCCTTCAGTTCGCCGCACAAAAGCTGGGAGTTACCGCGGAAATCACAG
Sample Output
4 1 4 3 0 1 1 5 1 3 1 2 2 1 2 0 1 1 3 1 2 1 3 1 1 1 1 2 2 5 1 3 0 2 2 1 1 1 1 3 1 0 0 1 5 5 1 5 0 2 0 2 1 2 1 1 1 2 0 1 0 0 1 1 3 2 1 0 3 2 3 0 0 2 0 8 0 0 1 0 2 1 3 0 0 0 1 4 3 2 1 1 3 1 2 1 3 1 2 1 2 1 1 1 2 3 2 1 1 0 1 1 3 2 1 2 6 2 1 1 1 2 3 3 3 2 3 0 3 2 1 1 0 0 1 4 3 0 1 5 0 2 0 1 2 1 3 0 1 2 2 1 1 0 3 0 0 4 5 0 3 0 2 1 1 3 0 3 2 2 1 1 0 2 1 0 2 2 1 2 0 2 2 5 2 2 1 1 2 1 2 2 2 2 1 1 3 4 0 2 1 1 0 1 2 2 1 1 1 5 2 0 3 2 1 1 2 2 3 0 3 0 1 3 1 2 3 0 2 1 2 2 1 2 3 0 1 2 3 1 1 3 1 0 1 1 3 0 2 1 2 2 0 2 1 1
이 문제는 4-mer compostion을 사전순으로 구하고 각 composition이 서열에 몇개나 들어가는지 알아내는 문제이다. 4-mer compostion는필자는 DFS를 응용하여 구했다. 그리고 이렇게 구한 것을 딕셔너리의 키로 하고 밸류를 0으로 하여, 서열을 돌며 4개의 그 서브 서열을 키로 넣어 1씩 더하게 하여 개수를 구했다.
from Bio import SeqIO
def making_mer(s):
if len(s)==4:
mer[s]=0
return
for i in 'ACGT':
n=s+i
making_mer(n)
n=s
if __name__=='__main__':
with open(r'파일경로','r') as fa:
seq=str(next(SeqIO.parse(fa,'fasta')).seq)
mer={}
making_mer('')
ans=[]
for i in range(len(seq)-3):
s=seq[i:i+4]
mer[s]=mer[s]+1
for i in mer.values():
print(i,end=' ')
'문제해결(PS) > ROSALIND' 카테고리의 다른 글
Finding a Shared Spliced Motif (0) | 2024.09.11 |
---|---|
Speeding Up Motif Finding (2) | 2024.09.08 |
Counting Phylogenetic Ancestors (0) | 2024.09.07 |
Error Correction in Reads (1) | 2024.09.06 |
Catalan Numbers and RNA Secondary Structures(다시 풀기) (3) | 2024.09.04 |