Problem
A partial split of a set SS of nn taxa models a partial character and is denoted by A∣BA∣B, where AA and BB are still the two disjoint subsets of taxa divided by the character. Unlike in the case of splits, we do not necessarily require that A∪B=SA∪B=S; (A∪B)c(A∪B)c corresponds to those taxa for which we lack conclusive evidence regarding the character.
We can assemble a collection of partial characters into a generalized partial character table CC in which the symbol xx is placed in Ci,jCi,j if we do not have conclusive evidence regarding the jjth taxon with respect to the iith partial character.
A quartet is a partial split A∣BA∣B in which both AA and BB contain precisely two elements. For the sake of simplicity, we often will consider quartets instead of partial characters. We say that a quartet A∣BA∣B is inferred from a partial split C∣DC∣D if A⊆CA⊆C and B⊆DB⊆D (or equivalently A⊆DA⊆D and B⊆CB⊆C). For example, {1,3}∣{2,4}{1,3}∣{2,4} and {3,5}∣{2,4}{3,5}∣{2,4} can be inferred from {1,3,5}∣{2,4}{1,3,5}∣{2,4}.
Given: A partial character table CC.
Return: The collection of all quartets that can be inferred from the splits corresponding to the underlying characters of CC.
Sample Dataset
cat dog elephant ostrich mouse rabbit robot
01xxx00
x11xx00
111x00x
Sample Output
{elephant, dog} {rabbit, robot}
{cat, dog} {mouse, rabbit}
{mouse, rabbit} {cat, elephant}
{dog, elephant} {mouse, rabbit}
이 문제는 완성되지 않은 캐릭터 테이블에서 추론될 수 있는 quartet을 구하는 문제이다. quartet이라 함은 {a,b} {c,d} 형식으로 4개가 묶인 형태를 말한다. 이러한 quartet은 C|D 일때 A<C , B<D 일때 A|B라고 추론할 수 있는 점을 이용하여 구할 수 있다. 즉 우리는 캐릭터 테이블에서 전체 split을 구한 다음, 그 split의 두 부분에서 각 2개씩 뽑아서 추론할 수 있는 모든 quartet을 출력하면 된다. 그러면 이 문제는 단순한 고르기 문제가 된다.
def chose_two(q):#두개씩 고르는 함수
return {tuple(sorted([q[i], q[j]])) for i in range(len(q) - 1) for j in range(i + 1, len(q))}
def making_quartets(now_split):#콰르텟을 만드는 함수
quartet_set = set()
q0, q1 = now_split
if len(q0) < 2 or len(q1) < 2:#2개 고를 수 없음
return quartet_set
q0_chosen_two = chose_two(q0)
q1_chosen_two = chose_two(q1)
for pair1 in q0_chosen_two:
for pair2 in q1_chosen_two:
quartet_set.add(tuple(sorted([pair1, pair2])))#중복방지
return quartet_set
if __name__ == "__main__":
with open(r'파일경로', 'r') as rf:
ele_list = rf.readline().strip().split()
partial_charactor = [line.strip() for line in rf]
total_split = []
for char in partial_charactor:
now_split = [[], []]
for idx, val in enumerate(char):
if val == '0':
now_split[0].append(ele_list[idx])
elif val == '1':
now_split[1].append(ele_list[idx])
total_split.append(now_split)
total_quartets = set()
for now_split in total_split:
total_quartets.update(making_quartets(now_split))
with open(r"파일경로", 'w') as wf:
for quartet in sorted(total_quartets):
wf.write(f"{{{quartet[0][0]}, {quartet[0][1]}}} {{{quartet[1][0]}, {quartet[1][1]}}}\n")
'문제해결(PS) > ROSALIND' 카테고리의 다른 글
Encoding Suffix Trees (0) | 2024.11.23 |
---|---|
Using the Spectrum Graph to Infer Peptides (0) | 2024.11.23 |
Matching a Spectrum to a Protein (1) | 2024.11.18 |
Genome Assembly with Perfect Coverage (0) | 2024.11.18 |
Global Alignment with Scoring Matrix (0) | 2024.11.18 |