Conference
Authors
APTEKMANN, ARIEL ALEJANDRO; Alejandro Nadra; Ignacio Sanchez
Date
2018
Publisher and Place of Publication
A2B2C
Abstract
Information supplied by the agent in SIGEVA
DNA linear motif classes, how many? how different?

Sequence motifs are relatively short, recurring patterns. When found in DNA, they are presumed to have a biological function. Some of them indicate sequence-specific binding sites for proteins, such as nucleases or transcription factors (TFs). Conservation of a sequence frequently implies selective pressure, which in turn suggests a function, although some motifs have no apparent functionality.

In this work we study sequence motif databases by modelling sequence motifs as regular expressions, which specify the length of the motif and which bases are allowed at each motif position. We develop a method for building a regular expression from position-specific scoring matrices. Using this representation we tackle three questions: How many linear motif classes remain to be discovered in nature? How many classes coexist on a genome? How different are the motifs on a genome?

As a measure of motif specificity for a pair of linear motif classes, we quantify how many motif-discriminating positions prevent a subsequence from being an instance of the two classes at once. Naturally occurring pairs of DNA linear motif classes most often present one motif-discriminating position, which maximizes the potential number of coexisting linear motif classes. Increasing the size of the alphabet by means of modifications increases the potential number of coexisting linear motif classes. We calculate the fraction of all possible protein subsequences that would belong to a linear motif class if the potential number of coexisting linear motif classes came into actual existence. This number is highest if the specificity requirement is no motif-discriminating positions. We propose that naturally occurring DNA linear motif classes operate under mild specificity requirements that maximize the potential number of coexisting linear motif classes.
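The two operations the abstract relies on, turning a position-specific scoring matrix into a regular expression and counting motif-discriminating positions for a pair of classes, can be illustrated with a minimal Python sketch. The abstract does not say how the authors decide which bases are allowed at a position, so the probability cutoff `threshold`, the function names, and the toy matrices below are all assumptions made for illustration. The pairwise comparison also assumes two motifs of equal length in a single alignment, which is a simplification.

```python
import re

BASES = "ACGT"


def allowed_sets(pssm, threshold=0.25):
    """Return the set of allowed bases at each position.

    `pssm` is a list of dicts mapping base -> probability. A base counts as
    allowed when its probability is >= `threshold` (a hypothetical rule,
    not taken from the abstract).
    """
    sets = []
    for column in pssm:
        allowed = frozenset(b for b in BASES if column.get(b, 0.0) >= threshold)
        if not allowed:
            raise ValueError("no base passes the threshold at some position")
        sets.append(allowed)
    return sets


def sets_to_regex(sets):
    """Write per-position allowed sets as a fixed-length regular expression."""
    parts = []
    for allowed in sets:
        ordered = "".join(b for b in BASES if b in allowed)
        if len(ordered) == len(BASES):
            parts.append(".")            # any base is allowed
        elif len(ordered) == 1:
            parts.append(ordered)        # a single fixed base
        else:
            parts.append("[" + ordered + "]")
    return "".join(parts)


def discriminating_positions(sets_a, sets_b):
    """Count positions where the two motifs share no allowed base.

    Assumes equal-length motifs compared in one alignment (a simplification).
    """
    if len(sets_a) != len(sets_b):
        raise ValueError("this sketch only compares motifs of equal length")
    return sum(1 for a, b in zip(sets_a, sets_b) if not (a & b))


# Toy matrices, invented for illustration only.
motif_a = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.45, "C": 0.05, "G": 0.45, "T": 0.05},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
motif_b = [
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    {"A": 0.40, "C": 0.10, "G": 0.40, "T": 0.10},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
]

sets_a = allowed_sets(motif_a)
sets_b = allowed_sets(motif_b)
regex_a = sets_to_regex(sets_a)
regex_b = sets_to_regex(sets_b)
n_discriminating = discriminating_positions(sets_a, sets_b)

print(regex_a, regex_b, n_discriminating)
print(bool(re.fullmatch(regex_a, "AGC")), bool(re.fullmatch(regex_b, "AGC")))
```

In this toy pair only the first position has disjoint allowed sets, so a single position is enough to keep any subsequence from matching both expressions at once.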
Keywords
theoretical upper bound / empirical lower bound; transcription factor; DNA motif; specificity