The Joshi group uses machine learning and data science to uncover how the regulation of hundreds of glycosylation-related genes contributes to the overall glycosylation process that takes place in the endoplasmic reticulum and Golgi.

Research interests

In silico glycomics

Where there is life, there are sugars. Carbohydrates and the biosynthetic machinery that builds glycans (the glycosylation metabolic network) are found in every domain of life. In eukaryotes, even though the basic structure of the secretory glycosylation network is shared, glycans vary enormously between species. Glycans are largely found in the extracellular space on proteins and lipids, where they serve many general functions: assembly of the glycocalyx and extracellular matrix, protection from and interaction with the environment, lubrication, and clearance of microorganisms. Glycans also play highly specific roles in many fundamental protein functions, such as co-regulation of proprotein convertase processing and ectodomain shedding, modulation of receptor activation and interactions, and modulation of peptide hormone stability and ligand-binding propensities. Inside the cell, in the endoplasmic reticulum and Golgi, glycans serve non-specific roles in ensuring the correct folding of proteins and their sorting, e.g. to lysosomal compartments. Glycosylation also takes place in the nucleus and cytoplasm where, through cross-talk with phosphorylation, it co-regulates most cell signalling, including regulation of the cell cycle. Thus, most cellular proteins undergo one or more types of glycosylation, and there is great potential for discovering specific roles of glycosylation in defined cellular contexts.

The biosynthesis of glycans is a complex, non-template-driven process that involves the orchestrated expression of over 700 genes, including glycosyltransferases, glycoside hydrolases, nucleotide sugar transporters and other enzymes. Of these 700 genes, the builders (approximately 250 glycosyltransferases and sulfotransferases, collectively termed glycogenes) are arguably the most important, because they directly catalyse the synthesis and modification of glycans in a stepwise manner. A naïve reading of these glycogenes predicts millions of potential glycans.
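To give a feel for how quickly such numbers grow, here is a minimal counting sketch in Python. It is an illustration only and not the group's method: it ignores enzyme specificity and branching, and simply counts unbranched chains built from a hypothetical alphabet of 10 monosaccharide types joined by 8 hypothetical linkage types. The function name `count_linear_glycans` and both numbers are assumptions chosen for the example.

```python
def count_linear_glycans(n_sugars: int, n_linkages: int, max_length: int) -> int:
    """Count unbranched chains of 1..max_length residues.

    A chain of length L has L residue choices and L - 1 linkage choices,
    so there are n_sugars**L * n_linkages**(L - 1) chains of that length.
    """
    return sum(
        n_sugars ** length * n_linkages ** (length - 1)
        for length in range(1, max_length + 1)
    )


if __name__ == "__main__":
    # Hypothetical inputs: 10 monosaccharide types, 8 linkage types.
    for max_length in range(1, 6):
        total = count_linear_glycans(10, 8, max_length)
        print(f"chains of up to {max_length} residues: {total:,}")
```

Under these toy assumptions, chains of at most four residues already number more than five million. Real enzymes are far more selective, which is why knowing how the glycogenes are regulated narrows the space of glycans a given cell can actually make.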

The biosynthesis of glycans takes place within a single cell, and the result of the glycosylation process for that cell (the glycome) is tailored to suit its functional needs. Dysregulation of the glycosylation process results in aberrant glycosylation and impairs those cellular functions that depend on glycosylation. To understand the myriad functions of glycans, we need to understand not only how glycosylation is regulated from cell to cell, but also how this process can be dysregulated. Direct analysis of glycans is difficult because of both the heterogeneity of glycans and technological challenges.

In the Joshi group, we use computational and data science approaches to uncover patterns of regulation within the glycosylation process, taking advantage of large amounts of publicly available transcriptomic, proteomic and glycomic data. For example, by mining transcriptomic data we can bring order to this large family of genes, so that we can predict the activity of these genes, their patterns of regulation, and the potential impact of their dysregulation on health and disease.

Glycocalyx of a bacterium

The bacterium Bacillus subtilis, imaged with a Tecnai T-12 TEM. Image by Allon Weiner, Creative Commons.

SnapShot: O-Glycosylation Pathways across Kingdoms

A snapshot of the O-glycosylation pathways as conserved across the kingdoms of life. Publication

Current project areas

Taking Glycomics to the single cell level

We have recently performed a first analysis of how the capacity to perform glycosylation varies between cell types derived from many organs in humans. Our analysis revealed the overall patterns of regulation for glycosyltransferases, and enabled prediction of glycosylation capacity for individual cell types. The tools we have developed enable translation of any single-cell dataset into a prediction of the overall glycosylation capacity of a cell.
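As a rough illustration of what translating expression data into a capacity prediction could look like, the sketch below scores how far along a simplified pathway a cell could proceed. It is not the group's tool. The gene symbols are real human glycosyltransferases, but the three-step grouping, the expression threshold, the function name `pathway_capacity` and all expression values are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical, simplified pathway: each step lists genes able to catalyse it.
CORE1_O_GLYCAN_STEPS: List[List[str]] = [
    ["GALNT1", "GALNT2"],  # initiation: GalNAc added to Ser/Thr
    ["C1GALT1"],           # core 1 extension: Gal added to GalNAc
    ["ST3GAL1"],           # capping: sialic acid added to Gal
]


def pathway_capacity(
    expression: Dict[str, float],
    steps: List[List[str]],
    threshold: float = 1.0,
) -> int:
    """Return how many consecutive steps, starting from the first, are covered.

    A step counts as covered when at least one of its genes is expressed at
    or above `threshold` (an assumed cut-off in arbitrary expression units).
    """
    completed = 0
    for isoenzymes in steps:
        if any(expression.get(gene, 0.0) >= threshold for gene in isoenzymes):
            completed += 1
        else:
            break  # a missing step blocks everything downstream
    return completed


if __name__ == "__main__":
    # Made-up expression profiles for three hypothetical cells.
    cells = {
        "cell_A": {"GALNT2": 12.0, "C1GALT1": 5.0, "ST3GAL1": 3.0},
        "cell_B": {"GALNT1": 8.0, "C1GALT1": 0.2, "ST3GAL1": 4.0},
        "cell_C": {"ST3GAL1": 6.0},
    }
    for name, profile in cells.items():
        print(name, pathway_capacity(profile, CORE1_O_GLYCAN_STEPS))
```

The design choice worth noting is that a step with several isoenzymes needs only one of them expressed, whereas a step with no expressed gene blocks all later steps, which is why `cell_B` stops after initiation even though it expresses the capping enzyme.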

Data science to uncover patterns of regulation of glycosylation

Based upon our success in analysis of glycosylation capacities at the single cell level, we are investigating how to reveal even more detail about how glycosylation is regulated, and what the hallmarks of glycosylation dysregulation are in different diseases.

Prediction of protein glycosylation

At the Copenhagen Center for Glycomics, we hold in-house the largest repository of glycoproteomic data covering O-linked glycosylation, which gives us a unique opportunity to mine this information and build useful prediction models. We first used this kind of data to build the NetOGlyc4.0 tool, and more recently we have been using language models to develop the next generation of predictors for protein glycosylation.
