문제해결(PS)/ROSALIND

Constructing a De Bruijn Graph

곰탱이장 2024. 10. 20. 13:34

https://rosalind.info/problems/dbru/

 

ROSALIND | Constructing a De Bruijn Graph

It appears that your browser has JavaScript disabled. Rosalind requires your browser to be JavaScript enabled. Constructing a De Bruijn Graph solved by 1148 2012년 7월 2일 12:00:00 오전 by Mikhail Dvorkin Topics: Genome Assembly Wading Through the Rea

rosalind.info

Problem

Consider a set SS of (k+1)(k+1)-mers of some unknown DNA string. Let SrcSrc denote the set containing all reverse complements of the elements of SS. (recall from “Counting Subsets” that sets are not allowed to contain duplicate elements).

The de Bruijn graph BkBk of order kk corresponding to SSrcS∪Src is a digraph defined in the following way:

  • Nodes of BkBk correspond to all kk-mers that are present as a substring of a (k+1)(k+1)-mer from SSrcS∪Src.
  • Edges of BkBk are encoded by the (k+1)(k+1)-mers of SSrcS∪Src in the following way: for each (k+1)(k+1)-mer rr in SSrcS∪Src, form a directed edge (r[1:k]r[1:k], r[2:k+1]r[2:k+1]).

Given: A collection of up to 1000 (possibly repeating) DNA strings of equal length (not exceeding 50 bp) corresponding to a set SS of (k+1)(k+1)-mers.

Return: The adjacency list corresponding to the de Bruijn graph corresponding to SSrcS∪Src.

Sample Dataset

TGAT
CATG
TCAT
ATGC
CATC
CATC

Sample Output

(ATC, TCA)
(ATG, TGA)
(ATG, TGC)
(CAT, ATC)
(CAT, ATG)
(GAT, ATG)
(GCA, CAT)
(TCA, CAT)
(TGA, GAT)

 

 이 문제는 주어진 DNA와 이들의 역상보적 DNA의 k+1-mer 들의 de Brujin graph의 edge들을 출력하는 문제이다. de Brujin graph의 node는 (k+1)-mer DNA를 함유하는 set S와 그들의 역상보적인 DNA를 함유하는 set Src의 합집합의 요소로 만든 k-mer들이고 edge는 k+1-mer인 r에서 r[1:k] -> r[2:k+1]인 방향성이 있는 edge이다.

 위의 개념을 그대로 차분히 코드로 구현해내면 쉽게 구할 수 있다.

from Bio.Seq import Seq

if __name__ == "__main__":
    with open(r"파일경로",'r') as f:
        seqs=set()
        for i in f.readlines():
            seqs.add(i.rstrip())
            ri = Seq(i.rstrip())
            seqs.add(str(ri.reverse_complement()))

anss=[]
for node in seqs:
    anss.append((node[0:len(node)-1],node[1:len(node)]))

wf = open(r"파일경로",'w')
anss.sort()
for ans in anss:
    print(f'({ans[0]}, {ans[1]})',file=wf)