Data analytics and machine learning for omics

Multidimensional attractor states in biology

An attractor state is a stable system behaviour to which a set of variables or mechanisms converges despite a wide variety of initial conditions. In biology, these states appear as cell fate decisions during differentiation or as state transitions, e.g. aerobic to anaerobic. The mechanisms guiding cells to these attractor states, however, are not well understood. In studying neutrophil differentiation, we identified specific genome elements, named "genome vehicles", which are responsible for the neutrophil attractor. Genes with low or moderate expression changes, which are often considered noisy and insignificant, are essential components of these genome vehicles. In studying biofilms, transcriptome-wide data analysis of a Saccharomyces cerevisiae biofilm showed that low- to middle-expressed genes are key to convergence on stable biofilm states. A basic non-linear Kuramoto model further corroborates that a biological network connectivity structure with low coupling strength, or expression levels, is sufficient for synchronised behaviour. Using multi-dimensional statistical analyses, 100 sharply changing genes were identified as attractor genes guiding the Escherichia coli anaerobic-to-aerobic state transition.
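As a rough illustration of the Kuramoto argument, the sketch below integrates a generic all-to-all Kuramoto model and reports the order parameter r (0 for incoherent phases, 1 for full synchrony). It is not the group's model: the all-to-all coupling, the Gaussian spread of natural frequencies, and every parameter value are assumptions chosen only to show that a modest coupling strength can be enough for synchronisation.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; r near 1 means synchronised phases."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def simulate_kuramoto(n=100, coupling=0.5, freq_sd=0.1, dt=0.05, steps=3000, seed=0):
    """Euler integration of the all-to-all Kuramoto model in mean-field form.

    All parameter values are illustrative assumptions, not fitted to any data.
    """
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, freq_sd) for _ in range(n)]           # natural frequencies
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]   # random initial phases
    for _ in range(steps):
        mean_field = sum(cmath.exp(1j * p) for p in theta) / n
        r, psi = abs(mean_field), cmath.phase(mean_field)
        # d(theta_i)/dt = omega_i + K * r * sin(psi - theta_i)
        theta = [p + dt * (w + coupling * r * math.sin(psi - p))
                 for p, w in zip(theta, omega)]
    return order_parameter(theta)


if __name__ == "__main__":
    for k in (0.0, 0.5):
        print(f"coupling={k}: r={simulate_kuramoto(coupling=k):.2f}")
```

Without coupling the oscillators stay incoherent, whereas a coupling that is small in absolute terms, but larger than the spread of natural frequencies, pulls them into a common phase.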

Statistical laws and biological noise in omics

The sheer size of omics datasets raises the problem of making sense of the noise they contain. Notably, while single cells display highly variable expression from cell to cell, cell populations present deterministic global patterns. Statistical analyses show that increasing the cell ensemble size reduces scatter in transcriptome-wide expression and noise values, with corresponding increases in Pearson and Spearman correlations. After removing lowly expressed transcripts to account for technical variability, we demonstrate that transcriptome-wide variability decreases in a manner that approximates the law of large numbers. Additionally, the entire gene expression profile of cell populations, but only the highly expressed portion of single-cell profiles, is Gaussian distributed, following the central limit theorem. Molecular heterogeneity has also been shown to be crucial for cell fate diversification. To understand the effect of molecular variability across embryonic developmental stages, we evaluated Pearson correlation, Shannon entropy and noise patterns on RNA-seq datasets from oocytes to blastocysts. Our investigations reveal a phase transition from low to saturating levels of diversity and variability of transcriptome-wide expression through the developmental stages. Simulating the global gene expression pattern for each developmental stage using a stochastic transcriptional model, we demonstrate that transcriptome-wide regulation begins at the 2-cell stage and becomes strikingly variable from the 8-cell stage due to amplification and quantal transcriptional activity.
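The ensemble-size effect can be reproduced on synthetic data. The sketch below is a toy model rather than the analysis used in the studies: it assumes log-normally distributed gene means and log-normal cell-to-cell variation (both hypothetical choices), pools expression over ensembles of different sizes, and compares two replicate ensembles through their Pearson correlation on a log scale and a simple squared-coefficient-of-variation noise estimate.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pooled_profile(gene_means, ensemble_size, rng, cell_sd=0.8):
    """Average expression of every gene over `ensemble_size` simulated cells."""
    profile = []
    for mu in gene_means:
        # Assumed log-normal cell-to-cell variation around the gene mean.
        cells = [mu * math.exp(rng.gauss(0.0, cell_sd)) for _ in range(ensemble_size)]
        profile.append(sum(cells) / ensemble_size)
    return profile


def compare_ensembles(ensemble_size, n_genes=300, seed=0):
    """Return (Pearson r on log expression, mean noise) for two replicate ensembles."""
    rng = random.Random(seed)
    gene_means = [math.exp(rng.gauss(2.0, 1.0)) for _ in range(n_genes)]  # hypothetical
    a = pooled_profile(gene_means, ensemble_size, rng)
    b = pooled_profile(gene_means, ensemble_size, rng)
    r = pearson([math.log(x) for x in a], [math.log(y) for y in b])
    # Noise estimate per gene: variance between replicates over squared mean.
    noise = sum((x - y) ** 2 / (2.0 * ((x + y) / 2.0) ** 2) for x, y in zip(a, b)) / n_genes
    return r, noise


if __name__ == "__main__":
    for size in (1, 10, 100):
        r, noise = compare_ensembles(size)
        print(f"ensemble size {size:>3}: Pearson r = {r:.3f}, noise = {noise:.4f}")
```

In this toy setting, the correlation between replicates rises and the noise estimate falls as more cells are pooled, which is the behaviour expected from the law of large numbers.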

Secondary metabolism in Malassezia and its role in skin-microbe interaction

Malassezia is a lipid-dependent fungus found on human skin; this lipid dependence accounts for its niche adaptation to the oily regions of the skin. Whether it acts as a commensal or a pathogen is debated. However, the condition of 50% of dandruff sufferers improved when an antifungal was applied to the scalp, which suggests an association between dandruff and the presence of Malassezia on the skin. Little is known about the interactions between the mycobiome and the skin, which may play a role in skin homeostasis. The aim of the project is to identify genes involved in the secondary lipid metabolism of Malassezia and to determine how they affect skin responses. RNA-seq, LC-MS, and cytokine response measurements from co-cultures will be used in the investigation.

Investigating the role of overflow metabolism in biofilm communities and pathogenic potential of Vibrio vulnificus

We observed a unique form of metabolism in a strain of Vibrio vulnificus and are interested in better understanding the mechanism and control of this phenomenon. We would also like to investigate the potential effects of this metabolism on the bacterial community and to determine whether it is transferable to other organisms.

Dynamic modelling of biological networks

Systems biology to regulate cancer resistance

TRAIL and its death receptors have previously been identified as a potential cancer therapeutic target because of their ability to selectively trigger apoptotic cell death in cancer cells. However, several malignant cancer types, such as fibrosarcoma (HT1080), remain insensitive to TRAIL. To increase their TRAIL sensitivity, a dynamic model based on the perturbation-response approach was developed. The model simulations suggest that TRAIL treatment combined with PKC inhibition by bisindolylmaleimide (BIS) I is able to induce 95% cell death, which was subsequently confirmed experimentally. To understand the effect of the remaining 5% of cells, a discrete spatiotemporal cellular automata model using simple rules modified from Conway's Game of Life was developed. This model simulates cell proliferation in time and space. It demonstrates that when all cells are fixed in their initial positions, proliferation is rapid for high and moderate cell numbers but slow and steady for low cell numbers. When mesenchymal-like random movement is introduced, proliferation becomes significant even for low cell numbers. Experimental verification showed a high proportion of mesenchymal cells after TRAIL and BIS I treatment compared with untreated or TRAIL-only conditions. In agreement with the model with cell movement, we observed rapid proliferation of the remnant cells over time after TRAIL and BIS I treatment. Nevertheless, re-treatment of the proliferating cancer cells with TRAIL and BIS I is still largely effective.
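The role of cell movement can be illustrated with a toy cellular automaton. The rules below are hypothetical and are not the published ones: an empty grid site becomes a cell when at least `birth_threshold` of its eight neighbours are occupied, there is no cell death, and motile cells attempt one random step per time step on a periodic grid.

```python
import random

# Offsets of the eight surrounding sites (Moore neighbourhood).
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def step(cells, size, rng, motile, birth_threshold=2):
    """Advance the automaton by one time step on a periodic size x size grid."""
    if motile:
        # Each cell attempts one random step; moves into occupied sites are rejected.
        for cell in rng.sample(sorted(cells), len(cells)):
            dy, dx = rng.choice(NEIGHBOURS)
            target = ((cell[0] + dy) % size, (cell[1] + dx) % size)
            if target not in cells:
                cells.remove(cell)
                cells.add(target)
    # Count occupied neighbours of every empty site that touches at least one cell.
    counts = {}
    for y, x in cells:
        for dy, dx in NEIGHBOURS:
            site = ((y + dy) % size, (x + dx) % size)
            if site not in cells:
                counts[site] = counts.get(site, 0) + 1
    # Hypothetical birth rule: enough occupied neighbours turn an empty site into a cell.
    cells.update(site for site, k in counts.items() if k >= birth_threshold)
    return cells


def simulate(initial_cells, size=30, steps=40, motile=False, seed=0):
    """Return the number of cells after each time step, starting from `initial_cells`."""
    rng = random.Random(seed)
    cells = set(initial_cells)
    history = [len(cells)]
    for _ in range(steps):
        step(cells, size, rng, motile)
        history.append(len(cells))
    return history


if __name__ == "__main__":
    isolated = [(5, 5), (20, 20)]  # two cells far apart, i.e. a low cell number
    print("fixed :", simulate(isolated, motile=False)[-1])
    print("motile:", simulate(isolated, motile=True)[-1])
```

Under these assumed rules, isolated cells that stay fixed never satisfy the birth rule, so a sparse population cannot expand; random movement lets cells meet, after which growth can start. This only mirrors the qualitative argument and makes no claim about the actual rule set or growth rates.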

Systems biology to control proinflammatory response

The proinflammatory response is involved in many diseases, such as rheumatoid arthritis and some cancers, and is activated by many signalling pathways. Using a perturbation-response approach coupled with the law of mass conservation, we can model these signalling pathways to gain further understanding of how to modulate proinflammation for treatment purposes. For instance, modelling of TNF signalling identifies RIP1, which is inhibited by Necrostatin-1, as a suitable therapeutic target for suppressing, but not abolishing, proinflammatory signalling. Modelling of TLR4 signalling reveals signalling flux redistribution, which explains why removal and addition of a signalling molecule at one pathway branch enhance and impair, respectively, signalling at another branch. A TLR4 signalling model also uncovers temporal signalling features that hint at novel intermediates and at previously unknown expression of TNF-α in MyD88 KOs. This modelling approach also enables the discovery of missing intermediary steps between extracellular poly(I:C) stimulation and intracellular TLR3 binding, and of a pathway that is essential for JNK and p38, but not NF-κB, activation.

Systems biology for biofilm

Biofilm formation, a self-organising cooperative behaviour of microorganisms, can be modelled by a discrete spatiotemporal cellular automata model based on simple physical rules, similar to Conway's Game of Life. The time-evolution model is experimentally verified for Pseudomonas aeruginosa under both control and azithromycin (AZM)-treated conditions. Our model suggests that AZM regulates single-cell motility, thereby resulting in delayed, but not abolished, biofilm formation. In addition, the model highlights that reproduction through cell-to-cell interaction is key for biofilm formation. This shows that biologically complex and non-linear behaviours may be interpreted using rules taken from theoretical disciplines.

Systems biology for pathway optimisation

Microorganisms can be used as bio-factories to produce target compounds such as chemicals, food components, and pharmaceuticals, instead of relying on traditional approaches based on synthetic chemistry or natural product extraction. In this project, genetically modified bacteria are cultured to produce a target product. Dynamic metabolomics analysis is carried out, whereby target metabolites are quantified at various time points using analytical instruments. A metabolic network is created using the COPASI software, and the experimental data are modelled kinetically using ordinary differential equations (ODEs) based on enzyme kinetics to understand how the metabolic pathway functions during bacterial growth. Modelling of the data may reveal metabolic bottlenecks upstream of the target compound, which will suggest strategies to improve the yield of the target metabolites.
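To make the idea of a kinetic bottleneck concrete, the sketch below integrates a hypothetical two-step pathway (substrate to intermediate to product) with Michaelis-Menten rate laws. It is a hand-written stand-in using a fixed-step Runge-Kutta integrator, not a COPASI model, and the pathway, parameter names, and parameter values are all invented for illustration.

```python
def michaelis_menten(vmax, km, conc):
    """Michaelis-Menten rate v = Vmax * [X] / (Km + [X])."""
    return vmax * conc / (km + conc)


def derivatives(state, params):
    """ODE right-hand side for substrate -> intermediate -> product."""
    substrate, intermediate, _product = state
    v1 = michaelis_menten(params["vmax1"], params["km1"], substrate)
    v2 = michaelis_menten(params["vmax2"], params["km2"], intermediate)
    return (-v1, v1 - v2, v2)


def rk4_step(state, params, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, factor):
        return tuple(x + factor * dx for x, dx in zip(state, k))

    k1 = derivatives(state, params)
    k2 = derivatives(shifted(k1, dt / 2), params)
    k3 = derivatives(shifted(k2, dt / 2), params)
    k4 = derivatives(shifted(k3, dt), params)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(params, initial=(10.0, 0.0, 0.0), dt=0.01, t_end=20.0):
    """Return the list of (substrate, intermediate, product) states over time."""
    state = initial
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, params, dt)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameter sets: the second enzyme is ten times slower in `slow`.
    balanced = {"vmax1": 2.0, "km1": 1.0, "vmax2": 2.0, "km2": 1.0}
    slow = {"vmax1": 2.0, "km1": 1.0, "vmax2": 0.2, "km2": 1.0}
    for name, params in (("balanced", balanced), ("bottleneck", slow)):
        traj = simulate(params)
        peak = max(s[1] for s in traj)
        print(f"{name}: final product = {traj[-1][2]:.2f}, peak intermediate = {peak:.2f}")
```

When the second enzyme is slow, the intermediate accumulates and less product is formed within the same time. Comparing such simulated time courses with measured metabolite concentrations is one way a bottleneck could show up in a kinetic model.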

AI method for biological networks

Development of a biochemical-laws-informed AI tool for efficient yet explainable optimisation of metabolite production

The production of valuable or novel molecules through the customisation of biological pathways has been enabled to an unprecedented degree by DNA manipulation technologies. However, it remains challenging to evaluate the combinatorial effect of modifying both enzyme and substrate concentrations on productivity and yield. The systems biology approach requires the reconstruction of a detailed mechanistic model, which may not be fully available and is inefficient and time-consuming to build, limiting its industrial applicability. Black-box AI predictions, in contrast, are hard to interpret mechanistically and therefore hard to trust. To combine the strengths of both, we are developing a biochemical-laws-informed AI tool for efficient yet explainable optimisation of pathways that also allows the exploration of reaction mechanisms.

State-transition modelling

State-transition modelling is a systemic approach in which the transformation from one stable state to another is viewed as a change in the state of the entire system, as opposed to a change resulting from one or a few key molecules in living cells, such as oncogenes. This is because a state transition is expected to occur through the interplay of the entire transcriptome or a large portion of it: the expression of a multitude of RNAs is expected to contribute to the transition, not only the change in expression of a single RNA or a few RNAs. Although a single mutation or molecule may be sufficient to cause a perturbation that induces a state transition, in the holistic view the perturbation alters the underlying gene regulatory network, which results in a transition of the entire transcriptional state. Here, we will develop non-linear models to understand state transitions of living cells, such as those in development or at the origins of cancer.