Supervisor: Dr Nicolás Hernández
Project description:
In recent years there has been a deluge of population-level data arising from wide-ranging fields. Near-infrared (NIR) spectra consist of numerous overlapping absorption bands, each corresponding to different vibrational modes of the molecular components. These vibrations are highly sensitive to the physical and chemical properties of the compounds involved. As a result, spectroscopic data exhibit a strongly correlated structure due to the complex nature of spectral absorption bands, with the underlying information changing smoothly across wavelengths. This characteristic distinguishes spectral data from typical high-dimensional statistical data.
These large collections of complex data, popularly known as Big Data, must necessarily be represented in high-dimensional coordinate spaces and require sophisticated analytical techniques to be transformed into valuable information. Data of this type can be treated as what is now called Functional Data. In this context, one promising research avenue is the development of novel methodologies for robust inference, that is, statistical tools that can handle the inherent complexity and variability of functional data.
Traditionally, spectral data analysis has relied mainly on multivariate techniques such as partial least squares (PLS) regression, given its ability to deal effectively with high-dimensional and correlated datasets. However, this method treats a spectrum as a series of discrete variables rather than as a continuous function. From a physical standpoint, it is more meaningful to view the spectrum as a smooth function, composed of absorption peaks that reflect the various chemical constituents in the sample, where the absorbance at nearby wavelengths is strongly correlated. In this sense, Functional Partial Least Squares (FPLS) regression is an extension of PLS regression designed for functional data. In FPLS, the predictor and/or response variables are not scalars or vectors but functions (e.g., curves, surfaces). The key idea behind FPLS is to generalize the PLS approach to a functional setting, allowing the extraction of relevant components from high-dimensional functional data.
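The contrast between the discrete and the functional view can be illustrated with a minimal sketch. It assumes a scalar response and spectra observed on a common, possibly unevenly spaced, wavelength grid. The function names are hypothetical and this is not the methodology the project will develop. The first FPLS weight function is taken proportional to the pointwise covariance between the spectra and the response, and the scores are integrals approximated by trapezoidal quadrature rather than plain sums over wavelengths.

```python
import math


def trapezoid_weights(grid):
    """Quadrature weights q_j so that sum(q_j * f(t_j)) approximates the integral of f."""
    weights = [0.0] * len(grid)
    for j in range(len(grid) - 1):
        step = grid[j + 1] - grid[j]
        weights[j] += step / 2
        weights[j + 1] += step / 2
    return weights


def first_fpls_component(curves, y, grid):
    """First PLS weight function and scores for a scalar response (illustrative only).

    curves: list of n spectra, each sampled at the wavelengths in `grid`.
    y:      list of n scalar responses.
    """
    n, p = len(curves), len(grid)
    q = trapezoid_weights(grid)

    # Centre the curves pointwise and centre the response.
    mean_curve = [sum(c[j] for c in curves) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[c[j] - mean_curve[j] for j in range(p)] for c in curves]
    yc = [v - y_mean for v in y]

    # Weight function: pointwise cross-covariance between X(t) and y, up to scale.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]

    # Normalise in the L2 sense (integral of w(t)^2 equals 1), not the Euclidean sense.
    # Assumes the cross-covariance is not identically zero.
    norm = math.sqrt(sum(q[j] * w[j] ** 2 for j in range(p)))
    w = [v / norm for v in w]

    # Scores: inner products <X_i, w> approximated by quadrature.
    scores = [sum(q[j] * xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, scores


# Toy usage: three synthetic "spectra" that differ only in peak height.
grid = [0.0, 0.1, 0.2, 0.5, 1.0]
amplitudes = [1.0, 2.0, 3.0]
curves = [[a * math.sin(math.pi * t) for t in grid] for a in amplitudes]
w, scores = first_fpls_component(curves, amplitudes, grid)
```

On an evenly spaced grid the quadrature weights are constant in the interior, so the result is close to ordinary PLS up to scaling. On an uneven grid, the functional version avoids over-weighting densely sampled regions of the spectrum.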
The main objectives of this project are:
Food Quality Assessment. In food quality control, NIR (near-infrared) spectroscopy is used to measure properties of products like wine or olive oil. The spectrometric data can be classified into ordinal quality categories such as Premium (High Quality), Standard (Medium Quality) and Below Standard (Low Quality). However, practitioners often misinterpret these data, either treating them as quantitative by assigning integer values to the categories or ignoring the order altogether and treating them as nominal. Given the importance of FPLS in the analysis of spectroscopic data and the relevance of ordinal data in the field, it is key to develop an appropriate model for this type of setting.
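The two misinterpretations described above can be made concrete with a small sketch. It is illustrative only, with hypothetical function names. It contrasts them with a cumulative coding of the kind used by ordinal regression models, which model whether quality exceeds each threshold.

```python
# Quality categories from the text, ordered from lowest to highest.
CATEGORIES = ["Below Standard", "Standard", "Premium"]


def as_integer(label):
    """Quantitative coding: imposes equal spacing between adjacent categories."""
    return CATEGORIES.index(label) + 1


def as_nominal(label):
    """One-hot coding: discards the ordering of the categories."""
    return [int(label == c) for c in CATEGORIES]


def as_cumulative(label):
    """Threshold coding: one indicator per cut-point, 'quality exceeds category k'."""
    rank = CATEGORIES.index(label)
    return [int(rank > k) for k in range(len(CATEGORIES) - 1)]


for label in CATEGORIES:
    print(label, as_integer(label), as_nominal(label), as_cumulative(label))
```

The cumulative coding keeps the order of the categories without assuming that the step from Below Standard to Standard is the same size as the step from Standard to Premium.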
This is a methodological and applied project in one of the current hot research topics in statistics, with a strong computational focus and the potential for significant impact in the Food, Pharmaceutical, and Textile industries.