Longest Increasing Subsequence(가장 긴 증가 서열)

문제해결(PS)/ROSALIND

Longest Increasing Subsequence(가장 긴 증가 서열)

곰탱이장 2024. 8. 22. 21:52

ROSALIND | Longest Increasing Subsequence

It appears that your browser has JavaScript disabled. Rosalind requires your browser to be JavaScript enabled. Longest Increasing Subsequence solved by 3342 A Simple Measure of Gene Order Similarity In “Enumerating Gene Orders”, we started talking abou

rosalind.info

Problem

A subsequence of a permutation is a collection of elements of the permutation in the order that they appear. For example, (5, 3, 4) is a subsequence of (5, 1, 3, 4, 2).

A subsequence is increasing if the elements of the subsequence increase, and decreasing if the elements decrease. For example, given the permutation (8, 2, 1, 6, 5, 7, 4, 3, 9), an increasing subsequence is (2, 6, 7, 9), and a decreasing subsequence is (8, 6, 5, 4, 3). You may verify that these two subsequences are as long as possible.

Given: A positive integer $n \leq 10000$ followed by a permutation $π$ of length $n$ .

Return: A longest increasing subsequence of $π$ , followed by a longest decreasing subsequence of $π$ .

Sample Dataset

5
5 1 4 2 3

Sample Output

1 2 3
5 4 2

이 문제는 언뜻 봤을 때 쉬운 문제인줄 알았다. 처음에는 DFS로 접근해서 모든 증가 서열을 만들었고 이 중 가장 길이가 긴 것을 저장하는 식으로 했다. 하지만 제출해야 되는 파일에서 수열의 길이가 8천,9천 단위이고 계산하는데에 시간이 매우 오래 걸리는걸 보고 잘못 되었다는 것을 깨달았다.

그래서 생각하다 결국 해답을 찾지 못하여 깃허브에서 여러 정답 코드들을 보았고, 개 중 가장 마음에 드는 풀이를 서술하도록 하겠다. 아래는 그 풀이의 코드의 링크이다.

https://github.com/danhalligan/rosalind.info/blob/main/rosalind/bioinformatics_stronghold/lgis.py#L3

rosalind.info/rosalind/bioinformatics_stronghold/lgis.py at main · danhalligan/rosalind.info

Solutions to the bioinformatic coding challenges at rosalind.info - danhalligan/rosalind.info

github.com

이 풀이는 다이나믹 프로그래밍(DP)를 이용하였다. DP를 이용한다면 증가하는 서열을 저장할 일도, 그 길이를 일일이 저장할 일도 없이, 간편하고 리니어하게 증가하는 서열을 구성할 수 있다. 필자는 다이나믹 프로그래밍을 써야겠다고 생각해야 쓸 수 있는 수준인지라, 다이나믹 프로그래밍이 어디에서 자주 쓰이는지 더 공부할 필요가 있다고 생각했다.

def lgis(seq):
    l=len(seq)
    m=[1]*l#최장거리 저장
    p=[-1]*l#직전 숫자 저장

    for i in range(l):
        for j in range(i):
            if seq[j] < seq[i] and m[i] < m[j]+1:#더 작은거 확인/최단거리 더 짧음
                m[i] = m[j]+1#최장거리 갱신
                p[i] = j#최장거리 직전 숫자 저장
    
    long = max(m)#최장거리
    point = m.index(long) #최장의 끝의 인덱스

    sub=[]#역추적용
    while point != -1:
        sub.append(seq[point])
        point = p[point]
    
    sub.reverse()#뒤에서부터 넣었으니 뒤집어주기
    return(sub)

def wrt(seq):#파일로 써주는 형식으로
    st=''
    for i in seq:
        st=st+str(i)+' '
    st=st+'\n'
    wfile.write(st)

file = open(r'파일경로','r')
wfile=open(r'파일경로','w')
l=next(file)#길이인데 귀찮아서 안 씀
data=str(next(file))
data=list(map(int,(data.rstrip()).split(' ')))#원하는 형식으로 바꿔줌

s1=lgis(data)
wrt(s1)

s2=lgis([-x for x in data])#역으로 하면 작은 순으로 됨
s2 = [-x for x in s2]
wrt(s2)

'문제해결(PS) > ROSALIND' 카테고리의 다른 글

Partial Permutations(부분 순열) (2)	2024.08.30
Genome Assembly as Shortest Superstring(가장 짧은 문자열로 게놈합성) (2)	2024.08.29
Locating Restriction Sites(제한부위 위치) (0)	2024.08.16
Enumerating Gene Orders(유전자 순서 열거하기) (0)	2024.08.15
Open Reading Frames (오픈 리딩 프레임) (0)	2024.08.14

현재글Longest Increasing Subsequence(가장 긴 증가 서열)

곰탱이의 공부정리

공부한 것 정리하기

다이나믹 프로그래밍, 트리, 확률, 생물정보학, DP, 조합, phylogeny, 브루트포스, 데이크스트라, rosalin, 계통학, BFS, 백준, 그래프, 백트랙킹, 문자열, Rosalind, 순열, 바이오파이썬, dfs,

Today :
Yesterday :

일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

곰탱이의 공부정리