Proceedings A2B2C - Transfer learning to annotate (a part of) the Protein Universe
Congress
Authorship:
STEGMAYER, GEORGINA SILVIA
Date:
2022
Publishing House and Editing Place:
A2B2C
Summary:
The automatic annotation of proteins is still an unresolved problem. For example, as of August 2022, of the 232,000,000 entries in UniProtKB, fewer than 1% have been reviewed by expert curators. State-of-the-art annotation methods in Pfam, the protein family database, are based on hidden Markov models that predict family domains from laborious hand-crafted sequence alignments. This approach has grown the Pfam annotations at a very low rate (<5% in the last 5 years). Alternative proposals based on deep learning (DL) models have appeared recently to accurately predict functional annotations for unaligned amino acid sequences. However, since many Pfam families contain only a few sequences, training such models with so few examples is challenging.
Information provided by the agent in SIGEVA
Key Words:
deep learning
transfer learning