In silico glycomics
Where there is life, there are sugars. Carbohydrates, and the biosynthetic machinery that builds glycans (the glycosylation metabolic network), are found in every domain of life. In eukaryotes, the basic structure of the secretory glycosylation network is shared, yet the diversity of glycans between species is massive. Glycans are largely found in the extracellular space on proteins and lipids, where they serve many general functions: assembly of the glycocalyx and extracellular matrix, protection and interaction with the environment, and lubrication and clearance of microorganisms. Glycans also play highly specific roles in myriad fundamental protein functions, such as co-regulation of proprotein convertase processing and ectodomain shedding, modulation of receptor activation and interactions, and modulation of the stability and ligand-binding propensities of peptide hormones. Inside the cell, in the endoplasmic reticulum and Golgi, glycans serve non-specific roles in ensuring the correct folding of proteins and their sorting, e.g. to lysosomal compartments. Glycosylation also takes place in the nucleus and cytoplasm where, through cross-talk with phosphorylation, it co-regulates most cell signalling, including regulation of the cell cycle. Thus, most cellular proteins undergo one or more types of glycosylation, and there is great potential for the discovery of specific roles of glycosylation in defined cellular contexts.
The biosynthesis of glycans is a complex, non-template-driven process that involves the orchestrated expression of over 700 genes, including glycosyltransferases, glycoside hydrolases, nucleotide sugar transporters and other enzymes. Of these genes, the builders (approximately 250 glycosyltransferases and sulfotransferases, together termed glycogenes) are arguably the most important, because they directly catalyse the stepwise synthesis and modification of glycans. A naïve reading of the capabilities of these glycogenes predicts millions of potential glycans.
The biosynthesis of glycans takes place within a single cell, and the result of the glycosylation process for that cell (the glycome) is tailored to suit the functional needs of that cell. Dysregulation of the glycosylation process results in aberrant glycosylation and impairs those cellular functions that depend on glycosylation. To understand the myriad functions of glycans, we need to understand not only how glycosylation is regulated from cell to cell, but also how this process can be dysregulated. Direct analysis of glycans is difficult because of both the heterogeneity of glycans and technological challenges.
In the Joshi group, we use computational and data science approaches to uncover patterns of regulation within the glycosylation process, taking advantage of large amounts of publicly available transcriptomic, proteomic and glycomic data. For example, by mining transcriptomic data we can bring order to this large family of genes, so that we can predict the activity of these genes, their patterns of regulation, and the potential impact of their dysregulation upon health and disease.
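As a rough illustration of what mining transcriptomic data for patterns of regulation can look like, the sketch below flags pairs of genes whose expression rises and falls together across samples. It is a minimal example and not a description of the group's actual methods: the gene names (`gene_a`, `gene_b`, `gene_c`), the expression values, the function names and the correlation threshold are all hypothetical, and Pearson correlation is used only as one simple, commonly used measure of co-expression.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # A gene with constant expression carries no co-expression signal.
        return 0.0
    return cov / (sd_x * sd_y)


def coexpressed_pairs(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold.

    `expression` maps a gene name to its expression values, one per sample,
    with samples in the same order for every gene.
    """
    genes = sorted(expression)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expression[g1], expression[g2])
            if r >= threshold:
                pairs.append((g1, g2, r))
    return pairs


# Hypothetical toy data: three made-up genes measured in four samples.
toy_expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
}

pairs = coexpressed_pairs(toy_expression)
for g1, g2, r in pairs:
    print(f"{g1} and {g2} are co-expressed (r = {r:.3f})")
```

In this toy data only `gene_a` and `gene_b` pass the threshold, while `gene_c` is anti-correlated with both. On real datasets, groups of co-expressed genes found this way are candidates for shared regulation; a practical analysis would also need normalisation, many more samples, and correction for multiple testing.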