Poster Presentation 29th Annual Lorne Proteomics Symposium 2024

Machine Learning Assisted Bioactive Peptide Function Prediction in Food Crops (#162)

Kenneth Ugochukwu Agbo 1 2 , Syed Afaq Ali Shah 3 , Utpal Bose 2 , Michelle Colgrave 1 2 , Angela Juhasz 1
  1. Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, School of Science, Edith Cowan University, Joondalup, Perth, Western Australia, Australia
  2. CSIRO Agriculture and Food, 306 Carmody Rd, St Lucia, QLD 4067, Queensland, Australia
  3. Centre for AI and Machine Learning, School of Science, Edith Cowan University, 270 Joondalup Dr, Joondalup, Perth, Western Australia, Australia

Bioactive peptides are short amino acid polymers that play various biological roles in plants, animals, and microbes. Some of these molecules have been identified as therapeutic agents with potential benefits in treating human diseases such as cancer, cardiovascular diseases, and infections, and they also function as important components in plant growth and stress responses. The emergence of LC-MS/MS-based discovery proteomics has led to massive data generation over the years. Machine learning, a branch of artificial intelligence, has been explored as a data mining tool to predict the functions of bioactive peptides from historical proteomics and peptidomics data. Prediction of bioactive peptide function relies on Quantitative Structure-Activity Relationship models, which use the physicochemical properties, amino acid compositions, transitions, and secondary structure of peptides to reveal relationships between chemical structure and biological activity. Various machine learning models, such as support vector machines, K-nearest neighbour, logistic regression, extreme gradient boosting, decision tree, and random forest, have been explored with the aim of accurately predicting the functions of bioactive peptides. However, prediction accuracy remains limited because existing models suffer from overfitting or bias. Two of the major challenges facing this high-throughput approach are identifying validated negative datasets and choosing feature selection strategies. The present work is designed to explore an alternative approach to both feature selection and the sourcing of negative datasets in order to improve the prediction efficiency of these models. Publicly available online bioactive peptide databases will serve as the source of model datasets. The Python peptides package will be used to obtain various peptide features and compute their values.
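As an illustration of what computing peptide features involves, the following minimal sketch derives two common sequence descriptors by hand: amino acid composition and the mean Kyte-Doolittle hydropathy (GRAVY). It deliberately does not call the peptides package named above; the function names (`composition`, `gravy`, `featurize`) and the toy sequence are assumptions made for illustration only, not the project's actual feature set.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (one value per standard residue).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each of the 20 standard residues in the sequence."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def gravy(seq):
    """Mean Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def featurize(seq):
    """Hypothetical helper: build one flat feature dictionary per peptide."""
    seq = seq.upper()
    if not seq or set(seq) - set(AMINO_ACIDS):
        raise ValueError(f"not a standard peptide sequence: {seq!r}")
    features = {f"frac_{aa}": value for aa, value in composition(seq).items()}
    features["length"] = len(seq)
    features["gravy"] = gravy(seq)
    return features


# Toy sequence, chosen only to make the arithmetic easy to follow.
toy_features = featurize("AAKK")
```

Applying `featurize` to every sequence in a dataset would give one row of descriptors per peptide, which is the tabular form that classifiers expect.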
To reduce overfitting, feature selection will be performed on the dataset. Various classifiers will be built and evaluated for predictive accuracy. The best-performing models will be selected and used to build an ensemble classifier for each function. The functions of the bioactive peptides from each of the selected plant samples will be predicted with the ensemble classifiers and experimentally validated via in vitro analysis. The results of this project will contribute to reducing the time and cost of in vitro functional analysis of the large number of peptides generated from proteomics experiments. In the long term, this work is expected to improve the design of therapeutic or nutritive bioactive peptides.
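The workflow of feature selection, classifier comparison, and ensembling can be sketched with scikit-learn as follows. This is a sketch under stated assumptions rather than the project's pipeline: the data are synthetic (generated with `make_classification`, not peptide measurements), and the choice of univariate `SelectKBest` selection, the three candidate classifiers, the value `k=10`, and soft voting over the top two models are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a peptide feature matrix
# (rows = peptides, columns = descriptors, y = active / inactive).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=6, random_state=0
)

# Candidate classifiers (an assumed subset of the model families listed above).
candidates = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}


def with_selection(clf, k=10):
    """Scale, keep the k highest-scoring features, then classify.

    Selection sits inside the pipeline so it is refitted on each
    training fold and never sees the held-out fold.
    """
    return make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), clf)


# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(with_selection(clf), X, y, cv=5).mean()
    for name, clf in candidates.items()
}

# Keep the two best-scoring candidates and combine them by soft voting.
best = sorted(scores, key=scores.get, reverse=True)[:2]
ensemble = VotingClassifier(
    [(name, with_selection(candidates[name])) for name in best],
    voting="soft",
)
ensemble_score = cross_val_score(ensemble, X, y, cv=5).mean()
```

Placing the selection step inside each pipeline is one way to limit the optimistic bias that arises when features are chosen using the full dataset before cross-validation. In this sketch the same folds are used both to pick the best models and to score the ensemble, so `ensemble_score` is itself optimistic; a separate held-out set would be needed for an unbiased estimate.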