Metabolic footprint analysis of samples from an enzyme recovery process: A combined workflow from sampling to compound identification

Research output: Book/Report, Ph.D. thesis, Research

  • Sofie Bruun Knudsen
In modern biotechnology production, many measurements are conducted that differ widely in information content and quality. In bulk liquid enzyme production, simple parameters such as pH and conductivity are measured, but these measurements cannot detect changes at the molecular level in the process stream. An in-depth investigation of the constituents of the process stream at the molecular level during production may reveal new knowledge that improves the understanding of the optimisation parameters and gives new insight into batch and product variation. This knowledge could be converted into process control adjustments for increased quality assurance and greater process efficiency. This thesis examined the chemical composition of samples from an enzyme recovery stage in order to solve some of the challenges encountered during this process. A workflow for footprint analysis of metabolites in samples from the recovery stream was developed. The steps in the workflow were: an initial sampling strategy; a sample preparation strategy with minimal discrimination of compounds; an analytical method capable of separating the compounds in solution; a pixel-based multivariate statistical analysis to determine the significant differences within the chemical footprint of the samples; and a feature detection strategy to identify unknown compounds in solution. The steps are described throughout the thesis using different sample sets, all of which originated from a Bacillus fermentation. Because representative samples from a running enzyme recovery batch were important, an autosampler was attached to the recovery stream at a Novozymes factory site. Different types of sample pre-treatment were studied, and protein precipitation was judged the best of those examined because of its non-specificity and the possibility of using the precipitated pellet for a proteomic study (not conducted).
The collected samples were pre-treated with different protein precipitation agents and compared using a multicriteria approach (Paper I). Acetonitrile was the best protein precipitation agent because of practical considerations, the highest protein content in the precipitated pellet, fewer created compounds than in the other samples, coherence between high-quality extracted ion chromatograms, and robustness. Liquid chromatography – high-resolution mass spectrometry (LC-HRMS) was selected for analysis to cover the non-volatile fraction of the samples. As prior compound information was sparse, a non-target analysis approach was selected to acquire the chemical profile of the samples, i.e. the footprint of the samples. The amount of data generated by a non-target analysis approach necessitates methodologies to analyse and interpret the large data volume. Two different data processing workflows were developed in the thesis. The first workflow was a novel data analysis strategy for non-target analysis of LC-HRMS data (Manuscript I). The workflow used the CHEMometric analysis of Selected Ion Chromatograms (CHEMSIC) method together with Durbin-Watson – COmponent Detection Algorithm (DW-CODA) selection of extracted ion chromatograms, which were combined into a metabolite profile for each sample. The workflow was tested and validated using samples from a 20-hour batch. It was shown that no steady state occurred within the 20 hours and that samples were separated based on sampling time. The work highlighted that careful consideration of the data pre-processing steps, i.e. noise removal, retention time alignment, normalisation and scaling, is necessary to reach meaningful conclusions that reflect the chemical differences between the samples rather than non-sample-related variation. In the second data processing workflow, a feature detection and identification approach was established (Manuscript II).
A non-target analysis by LC-HRMSE was conducted to identify unknown compounds in the samples, and feature detection was applied. A prioritised list of features was obtained, and their identification was based on a spectral match from a spectral library. When a spectral match was not available, the in silico fragmenter MetFrag was used to annotate molecular candidates based on compounds from several open-source databases. This resulted in an overall annotation of > 95 % of the features in both positive and negative ionisation mode. The annotations were supported by an identification confidence level and a chemical classification of the annotated features. This showed that the main class in positive ionisation mode was amino acids and derivatives, and that the main class in negative ionisation mode was organic heterocyclic compounds. The pixel-based analysis approach and the feature detection approach were combined to find the significant differences between samples from different Novozymes sites (6.8.2). The samples were analysed with the workflow proposed in Manuscript I, and the peaks were annotated based on the identification library created in Manuscript II. The most significant peaks separating the samples were found to be several peptides, but other small metabolites also contributed to the difference. Based on the research conducted in this PhD study, it can be concluded that comprehensive chemical analysis can provide valuable information on samples from the recovery process. The findings presented in this thesis enable a better process understanding and will hopefully help control the recovery process at Novozymes in the future.
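As an illustration of the Durbin-Watson criterion behind the DW-CODA step of the first workflow, the following minimal Python sketch scores extracted ion chromatograms (EICs) and keeps those that look signal-rich. It is not the implementation used in the thesis. The function names, the use of the first difference of each trace, and the threshold of 1.0 are assumptions made for this example only.

```python
import numpy as np


def durbin_watson(x):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    x = np.asarray(x, dtype=float)
    denom = np.sum(x ** 2)
    if denom == 0.0:
        # A completely flat trace carries no information; report NaN.
        return float("nan")
    return float(np.sum(np.diff(x) ** 2) / denom)


def dw_of_chromatogram(eic):
    """Score one EIC. Assumption: the statistic is computed on the first
    difference of the trace, which suppresses a constant baseline."""
    return durbin_watson(np.diff(np.asarray(eic, dtype=float)))


def select_eics(eics, threshold=1.0):
    """Return indices of EICs whose score is below a hypothetical threshold.
    Smooth chromatographic peaks give values near 0, random noise gives
    larger values, and NaN scores are never selected."""
    return [i for i, eic in enumerate(eics) if dw_of_chromatogram(eic) < threshold]


if __name__ == "__main__":
    t = np.arange(200)
    peak = np.exp(-0.5 * ((t - 100) / 10.0) ** 2)        # synthetic smooth peak
    noise = np.random.default_rng(0).normal(size=200)    # synthetic noise trace
    print(select_eics([peak, noise]))
```

In this sketch a low score marks a smooth, autocorrelated trace, so only such EICs would be carried forward into the per-sample metabolite profile. A suitable threshold would have to be tuned on real data.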
Original language: English
Publisher: Department of Plant and Environmental Sciences, Faculty of Science, University of Copenhagen
Number of pages: 232
Publication status: Published - 2021

ID: 273011327