문제해결(PS)/ROSALIND

Genome Assembly with Perfect Coverage

곰탱이장 2024. 11. 18. 19:22

https://rosalind.info/problems/pcov/

 

ROSALIND | Genome Assembly with Perfect Coverage

It appears that your browser has JavaScript disabled. Rosalind requires your browser to be JavaScript enabled. Genome Assembly with Perfect Coverage solved by 825 Cyclic Chromosomes Recall that although chromosomes taken from eukaryotes have a linear struc

rosalind.info

Problem

A circular string is a string that does not have an initial or terminal element; instead, the string is viewed as a necklace of symbols. We can represent a circular string as a string enclosed in parentheses. For example, consider the circular DNA string (ACGTAC), and note that because the string "wraps around" at the end, this circular string can equally be represented by (CGTACA), (GTACAC), (TACACG), (ACACGT), and (CACGTA). The definitions of substrings and superstrings are easy to generalize to the case of circular strings (keeping in mind that substrings are allowed to wrap around).

Given: A collection of (error-free) DNA kk-mers (k50k≤50) taken from the same strand of a circular chromosome. In this dataset, all kk-mers from this strand of the chromosome are present, and their de Bruijn graph consists of exactly one simple cycle.

Return: A cyclic superstring of minimal length containing the reads (thus corresponding to a candidate cyclic chromosome).

Sample Dataset

ATTAC
TACAG
GATTA
ACAGA
CAGAT
TTACA
AGATT

Sample Output

GATTACA

 

 이 문제는 하나의 genome에서 떼어져나온 서열들을 다시 재조합하여 genome을 역추산하는 문제이다. 주어진 서열들은 서로 하나의 prefix, 하나의 suffix만 남겨두고 겹쳐진다. 그리고 이 겹쳐진 것들이 한 바퀴를 돌아오는 느낌이다. 

이 경우에는 ATTAC - TTACA에서는 A(TTAC)A 인식으로 겹친다. 필자는 주어진 서열 한개마다 끝에 있는 suffix를 한개씩 이어붙여서 circular 문자열을 만들 수 있겠다고 생각해서 그런식으로 코드를 구현했다.

def making_graph(kmers):
    graph=dict()
    for i in kmers:
        for j in kmers:
            if i[1:] == j[:-1]:
                graph[i]=j
    
    return graph

def making_circle(graph):
    start = kmers[0]
    circle = start[-1]
    now_node=graph[start]

    while True:
        if now_node == start:
            break
        circle+=now_node[-1]
        next_node = graph[now_node]
        now_node = next_node

    return circle


if __name__ == "__main__":

    with open(r"파일경로",'r') as rf:
        kmers = [i.rstrip() for i in rf.readlines()]

    graph = making_graph(kmers)
    with open(r"파일경로",'w') as wf:
        print(making_circle(graph),file=wf)

'문제해결(PS) > ROSALIND' 카테고리의 다른 글

Quartets  (1) 2024.11.19
Matching a Spectrum to a Protein  (1) 2024.11.18
Global Alignment with Scoring Matrix  (1) 2024.11.18
Counting Unrooted Binary Trees  (2) 2024.11.17
Counting Optimal Alignments  (0) 2024.11.16