Arabinogalactan protein mining and diversity - the case of Centaurium erythraea
Abstract
Centaurium erythraea (common centaury) is a medicinal plant with extraordinary developmental plasticity in vitro, and our lab uses it as a model organism for studying in vitro morphogenesis. Several lines of experimental evidence identify arabinogalactan proteins (AGPs) as key players in centaury morphogenesis; however, the roles of specific genes have yet to be determined. AGPs are ubiquitous plant cell surface glycoproteins associated with various physiological functions. AGP sequences are characterized by non-contiguous hydroxyproline residues, which serve as O-glycosylation anchor sites for branched arabinogalactans. Because their amino acid composition is biased toward disorder-promoting residues, AGP sequences lack a stable structure and are consequently under relaxed evolutionary constraints. Homology-based approaches to AGP sequence mining therefore have limited success. We have recently developed ragp, a bioinformatics pipeline for AGP sequence mining that exploits their key feature: the presence of hydroxyprolines. The pipeline combines a machine learning model, which estimates proline hydroxylation from local sequence context, with a flexible motif search. Applying this pipeline to the centaury transcriptome showed that AGP regions associate with a variety of conserved domains. Here we introduce a streamlined way to train models for predicting Pro hydroxylation, analyze the protein sequence features that determine Pro hydroxylation status, present some of the AGP types found in centaury, and discuss model limitations and future prospects.
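The two-stage idea described in the abstract (predict which prolines are hydroxylated, then search for motifs around them) can be illustrated with a minimal Python sketch. This is not the ragp implementation. The function names, the dipeptide pattern, and the thresholds `min_dipeptides` and `max_gap` are assumptions chosen for illustration, and the hydroxylation positions are supplied by hand where a trained model's predictions would normally go.

```python
import re

# Assumed glycomodule pattern: Ala/Ser/Thr/Gly next to hydroxyproline (written "O").
# The residue set is an illustrative choice, not taken from the abstract.
AG_DIPEPTIDE = re.compile(r"[ASTG]O|O[ASTG]")


def mark_hydroxyprolines(seq, hyp_positions):
    """Replace P with O at the 0-based positions predicted to be hydroxylated.

    In a real pipeline hyp_positions would come from a machine learning model;
    here they are passed in directly (hypothetical input).
    """
    chars = list(seq)
    for i in hyp_positions:
        if chars[i] != "P":
            raise ValueError(f"position {i} is {chars[i]!r}, not proline")
        chars[i] = "O"
    return "".join(chars)


def find_ag_regions(seq, min_dipeptides=3, max_gap=10):
    """Return (start, end) spans holding at least min_dipeptides dipeptide
    matches, where consecutive matches are at most max_gap residues apart."""
    regions = []
    run = []  # spans of the dipeptides in the current cluster
    for m in AG_DIPEPTIDE.finditer(seq):
        # Close the current cluster when the gap to the previous match is too large.
        if run and m.start() - run[-1][1] > max_gap:
            if len(run) >= min_dipeptides:
                regions.append((run[0][0], run[-1][1]))
            run = []
        run.append(m.span())
    if len(run) >= min_dipeptides:
        regions.append((run[0][0], run[-1][1]))
    return regions


# Toy sequence with three prolines assumed to be hydroxylated.
marked = mark_hydroxyprolines("MASPAPSPAKK", [3, 5, 7])
regions = find_ag_regions(marked)
```

Because the motif search runs only on prolines flagged as hydroxylated, the same toy sequence yields no region when no positions are marked, which is the dependence on hydroxylation status that the abstract emphasizes.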