Science and Technology Production

Proceedings A2B2C - Transfer learning to annotate (a part of) the Protein Universe

Congress

Authorship:

STEGMAYER, GEORGINA SILVIA

Date:

2022

Publishing House and Editing Place:

A2B2C

Summary *

The automatic annotation of proteins is still an unresolved problem. For example, as of August 2022, of the 232,000,000 entries in UniProtKB, fewer than 1% have been reviewed by expert curators. State-of-the-art annotation methods in Pfam, the protein family database, are based on hidden Markov models that predict family domains from laborious hand-crafted sequence alignments. This approach has grown the Pfam annotations at a very low rate (<5% in the last 5 years). Alternative proposals based on deep learning (DL) models have appeared recently to accurately predict functional annotations for unaligned amino acid sequences. However, since many Pfam families contain very few sequences, training such models with so few examples is challenging.

* Information provided by the agent in SIGEVA

Key Words

deep learning; transfer learning