Inferring Peptide from Full Spectrum

문제해결(PS)/ROSALIND

Inferring Peptide from Full Spectrum

곰탱이장 2024. 11. 2. 10:56

Problem

Say that we have a string $s$ containing $t$ as an internal substring, so that there exist nonempty substrings $s_{1}$ and $s_{2}$ of $s$ such that $s$ can be written as $s_{1} t s_{2}$ . A t-prefix contains all of $s_{1}$ and none of $s_{2}$ ; likewise, a t-suffix contains all of $s_{2}$ and none of $s_{1}$ .

Given: A list $L$ containing $2 n + 3$ positive real numbers ( $n \leq 100$ ). The first number in $L$ is the parent mass of a peptide $P$ , and all other numbers represent the masses of some b-ions and y-ions of $P$ (in no particular order). You may assume that if the mass of a b-ion is present, then so is that of its complementary y-ion, and vice-versa.

Return: A protein string $t$ of length $n$ for which there exist two positive real numbers $w_{1}$ and $w_{2}$ such that for every prefix $p$ and suffix $s$ of $t$ , each of $w (p) + w_{1}$ and $w (s) + w_{2}$ is equal to an element of $L$ . (In other words, there exists a protein string whose $t$ -prefix and $t$ -suffix weights correspond to the non-parent mass values of $L$ .) If multiple solutions exist, you may output any one.

Sample Dataset

1988.21104821
610.391039105
738.485999105
766.492149105
863.544909105
867.528589105
992.587499105
995.623549105
1120.6824591
1124.6661391
1221.7188991
1249.7250491
1377.8200091

Sample Output

KEKEP

이 문제는 단백질 질량의 스펙트럼과 잘려진 B-ion,Y-ion의 질량을 나타내는 스펙트럼으로 잘려진 부분들의 펩타이드를 구하는 문제이다. 이 문제에서 유의할 점은 크기 순서대로 주어지나, 이는 B-ion,Y-ion 순이 아니란 점이다. 가장 가까이 붙어있는 스펙트럼이 실은 각각 y-ion, b-ion으로 아미노산 하나의 차이가 아닐 수도 있다. 또한 어떠한 b-ion을 사용하여 아미노산을 구했다면 그에 대응하는 y-ion도 쓰면 안된다.

b-ion, y-ion은 특정한 곳을 중심으로 잘렸을 때의 ion들을 얘기한다. 더 쉽게 얘기하면 b-ion + y-ion은 전체 단백질이다.

그러므로 이 문제는 가장 짧은 펩타이드를 가장 짧은 b-ion이라 가정하고 각 스펙트럼에서 하나의 아미노산만큼 차이가 나는 스펙트럼으로 옮겨가며 b-ion,y-ion을 제거해가면서 펩타이드를 찾아야한다.

필자는 이를 DFS로 풀었고 쓴 b-ion,y-ion을 방문처리 하는 식으로 풀었다.

monoisotopic_mass_table={
    'A':71.03711, 'C':103.00919, 'D':115.02694, 'E':129.04259,
    'F':147.06841, 'G':57.02146, 'H':137.05891, 'I':113.08406, 'K':128.09496,
    #'L':113.08406,
    'M':131.04049, 'N':114.04293, 'P':97.05276, 'Q':128.05858,
    'R':156.10111, 'S':87.03203, 'T':101.04768, 'V':99.06841, 
    'W':186.07931, 'Y':163.06333 
}

def make_pep(now_spec,pep,used_specs):
    if len(pep) == n:
        print(pep)
        return
    for next_spec in by_specs:
        if next_spec not in used_specs:#안 쓴 ion이라면
            diff = round(next_spec-now_spec,5)
            if round(next_spec-now_spec,5) in new_table_keys:#하나의 아미노산 차이라면
                new_used_specs = used_specs.copy()#카피해서 써야 다시 돌아와도 또 씀
                new_used_specs.append(round(total_spec-next_spec,5))
                new_used_specs.append(next_spec)

                make_pep(next_spec,pep+new_table[diff],new_used_specs)#아미노산 추가 후 다음 스펙트럼



new_table={}
for k,v in monoisotopic_mass_table.items():
    new_table[round(v,5)]=k
new_table_keys = new_table.keys()



if __name__ == '__main__':
    specs=[]
    with open(r"파일경로",'r') as f:
        for i in f.readlines():
            specs.append(round(float(i.rstrip()),5))
        total_spec = specs[0]
        by_specs = specs[1:]


n=(len(specs)-3)//2
used_specs=[by_specs[0],round(total_spec-by_specs[0],5)]
pep=''
now_spec = by_specs[0]
make_pep(now_spec,pep,used_specs)

'문제해결(PS) > ROSALIND' 카테고리의 다른 글

Finding Disjoint Motifs in a Gene (2)	2024.11.03
Independent Segregation of Chromosomes (1)	2024.11.02
Constructing a De Bruijn Graph (1)	2024.10.20
Creating a Character Table (1)	2024.10.19
Comparing Spectra with the Spectral Convolution (2)	2024.10.17

현재글Inferring Peptide from Full Spectrum

곰탱이의 공부정리

공부한 것 정리하기

생물정보학, 바이오파이썬, 다이나믹 프로그래밍, 순열, BFS, 확률, 문자열, 브루트포스, phylogeny, 데이크스트라, Rosalind, rosalin, 트리, 백준, 백트랙킹, 조합, 그래프, dfs, DP, 계통학,

Today :
Yesterday :

곰탱이의 공부정리