Upcoming: Data Management Workshop, February 25–26 in Jena
Here you will find the program. The following keynotes are scheduled:
Rasmus Bro
How PARAFAC and PARAFAC2 have changed the analysis of fluorescence and chromatographic data
Multiway data are central in modern analytical chemistry, yet they are often “flattened” into matrices, losing structure and chemical interpretability. This talk highlights how PARAFAC and PARAFAC2 address two core application domains, each with its own data challenges and payoffs.
First, I will show how PARAFAC has transformed the analysis of fluorescence excitation–emission matrices (EEMs) by decomposing EEM datasets into chemically meaningful components that can be interpreted as underlying fluorophore “fingerprints” and sample-wise contributions.
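For readers who want to try this themselves, here is a minimal sketch using the open-source tensorly library; the tensor dimensions, the rank of three components, and the random data are illustrative assumptions, not details from the talk. (Fluorescence work typically also constrains the factors to be non-negative, e.g. with tensorly's non_negative_parafac.)

```python
# Minimal sketch: PARAFAC on a hypothetical EEM data cube with tensorly.
# Sizes, rank, and the random data are illustrative assumptions.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Hypothetical EEM cube: samples x excitation wavelengths x emission wavelengths
eems = tl.tensor(np.random.rand(20, 61, 121))

# Decompose into 3 trilinear components ("fluorophore fingerprints")
weights, (scores, excitation, emission) = parafac(eems, rank=3, n_iter_max=200)

print(scores.shape)      # (20, 3)  sample-wise contributions
print(excitation.shape)  # (61, 3)  excitation loadings per component
print(emission.shape)    # (121, 3) emission loadings per component
```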
Second, I will demonstrate why PARAFAC2 is often the enabling model for GC–MS data, where retention-time shifts and peak shape variability violate standard trilinear assumptions. Using real GC–MS examples, I will illustrate how PARAFAC2 handles time misalignment while still producing interpretable component profiles and improved quantitative/qualitative insight compared with unfolding-based workflows.
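A corresponding minimal sketch for PARAFAC2, again with tensorly and invented dimensions and data, shows how slices of different lengths along the elution axis can be decomposed directly, without prior alignment:

```python
# Minimal sketch: PARAFAC2 on simulated GC-MS-like slices with tensorly.
# Sizes, rank, and the random data are illustrative assumptions.
import numpy as np
from tensorly.decomposition import parafac2

rng = np.random.default_rng(0)
# One matrix per sample: elution-time points x m/z channels; the slices may
# differ in length along the (shifted) elution axis.
slices = [rng.random((80 + 5 * i, 50)) for i in range(10)]

# rank=3: assumed number of co-eluting chemical components
weights, factors, projections = parafac2(slices, rank=3, n_iter_max=200,
                                         random_state=0)

# A: sample mode, B: shared elution basis (realized per slice via the
# projections), C: mass-spectral mode
A, B, C = factors
print(A.shape, C.shape, len(projections))
```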
Birgit Götzinger
Design of Experiments (DOE) is a potent tool for process optimization in the natural sciences and engineering. However, teaching statistics and advanced methods for data analysis to chemistry students can be rather frustrating for both sides. In this talk, an approach is presented for embedding practical use cases and “hands-on” data analysis from the beginning of, and throughout, a course on statistics and DOE. In addition, a “quick and dirty” application of DOE in an elective module on product design at the bachelor level will be shown. Finally, current feedback and reception by the course participants, as well as lessons learned, will be discussed.
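As a flavor of such a hands-on exercise, here is a minimal sketch of a two-level full factorial design with main-effect estimation in plain NumPy; the three factors and the simulated yields are invented for illustration and are not taken from the course.

```python
# Minimal sketch: 2^3 full factorial design and main-effect estimation.
# Factors and response values are invented for illustration.
import itertools
import numpy as np

# 2^3 full factorial in coded units (-1/+1) for factors A, B, C
design = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)

# Hypothetical responses, e.g. a yield measured at each of the 8 runs
y = np.array([62.1, 65.3, 70.4, 74.0, 61.8, 66.0, 71.1, 75.2])

# Model matrix: intercept + main effects (interactions omitted for brevity)
X = np.column_stack([np.ones(len(design)), design])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, c in zip(["intercept", "A", "B", "C"], coef):
    print(f"{name}: {c:+.2f}")
```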
Natalie Gerhardt
From Spectra to Decisions: Practical Validation of Classification Models in Chemometrics
Spectroscopic data offer powerful opportunities for classification, but turning them into reliable decisions remains a challenge. This talk takes an application-oriented look at how classification models can be developed and properly validated for real-world discrimination tasks in chemometrics. Starting with exploratory data analysis, it illustrates how methods such as PCA, HCA, and k-means help uncover structure, assess clusters, and identify outliers. Building on this, supervised approaches including LDA, logistic regression, and SVM are introduced, with an emphasis on robust performance assessment, regularization, and the risk of overfitting.
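As an illustration of the exploratory step, the following minimal sketch runs PCA and k-means on synthetic “spectra” with scikit-learn; the data, the two retained components, and the three clusters are assumptions made for the example.

```python
# Minimal sketch: exploratory PCA + k-means on hypothetical spectra.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
spectra = rng.random((60, 500))   # 60 samples x 500 wavelengths (hypothetical)

# Project to 2 components for a structure/outlier view, then cluster
scores = PCA(n_components=2).fit_transform(spectra)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(scores)
print(labels[:10])
```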
Particular attention is given to cross-validation as a key tool for reliable model evaluation, including the importance of proper validation strategies and avoiding data leakage, especially in the context of high-dimensional, noisy spectroscopic data. A clear and practical workflow from exploration to validation is presented, highlighting common pitfalls and strategies for developing trustworthy classification models.
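The leakage point can be made concrete in code: in the sketch below, all preprocessing (scaling, PCA) is wrapped in a scikit-learn Pipeline so that it is re-fitted inside every cross-validation fold, never on the full dataset beforehand. The data and fold settings are illustrative assumptions.

```python
# Minimal sketch: leakage-safe cross-validation via a Pipeline.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(2)
X = rng.random((60, 500))          # hypothetical spectra
y = rng.integers(0, 2, size=60)    # two classes

# Scaling and PCA are fitted per training fold, not on the full data
model = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(C=1.0))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(model, X, y, cv=cv)
print(scores.mean(), scores.std())
```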
Thomas Rose
Insights from archaeometry on the creation and publication of high-quality and reusable data
Many archaeometric methods rely on the comparison of artefact data with reference data, which represent distinct material signatures, geographic regions, or raw material sources. A successful correlation of artefact data with reference data therefore allows the reconstruction of, for example, past exchange relations and economic activities, extending our knowledge of the past.
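As a toy illustration of this matching idea (not a method presented in the talk), the sketch below assigns an artefact to the nearest reference signature by simple Euclidean distance over three invented lead isotope ratios; real provenance studies rely on considerably more careful statistics.

```python
# Toy sketch: match an artefact to the nearest reference signature.
# All ratios and group names are invented for illustration.
import numpy as np

references = {
    "Source A": np.array([18.20, 15.60, 38.10]),  # 206/204, 207/204, 208/204
    "Source B": np.array([18.80, 15.68, 38.90]),
}
artefact = np.array([18.25, 15.61, 38.15])

best = min(references, key=lambda k: np.linalg.norm(references[k] - artefact))
print("closest reference group:", best)
```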
Because archaeometry is highly interdisciplinary, the quality and reusability of data as reference data require not only high analytical quality but also high quality, or “richness,” of their metadata, i.e., contextual information about the archaeological background and other material-specific information. Moreover, the highly variable academic backgrounds of researchers often result in data of high quality in many but not all aspects, limiting their reusability outside the original research context.
Unfortunately, guidance for researchers on turning analytically high-quality data into rich and easily reusable data has been sparse. Only recently did numerous representatives of the scientific community and the TerraLID team develop such guidelines for lead isotope data in archaeology. This metadata profile aims to ensure the best possible reusability of data according to the FAIR data principles, and its design facilitates extension to other data types. Consequently, the metadata profile is also the first minimal record recommendation for lead isotope data. The TerraLID metadata profile further serves as the foundation for a research data infrastructure that provides open access to lead isotope data. To increase the coverage of the database, the TerraLID team and colleagues compile published data, structure them according to the metadata profile, and enrich them with information from other sources.
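Purely as an illustration of what a structured, metadata-rich record might look like in practice, here is a hypothetical sketch in Python; the field names below are invented for this example and are NOT the actual TerraLID metadata profile.

```python
# Hypothetical sketch of a metadata-rich lead isotope record with a
# minimal completeness check. Field names are invented for illustration
# and do NOT reproduce the TerraLID metadata profile.
record = {
    "sample_id": "XYZ-001",                 # invented identifier
    "material": "copper ore",
    "find_context": "settlement, layer 3",  # archaeological context
    "analytical_method": "MC-ICP-MS",
    "pb206_pb204": 18.42,
    "pb207_pb204": 15.63,
    "pb208_pb204": 38.51,
    "reference": "doi:10.xxxx/xxxxx",       # placeholder, not a real DOI
}

required = ["sample_id", "material", "analytical_method",
            "pb206_pb204", "pb207_pb204", "pb208_pb204"]
missing = [k for k in required if record.get(k) in (None, "")]
print("record complete" if not missing else f"missing: {missing}")
```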
The experience gained during data compilation and the development of the TerraLID metadata profile serves as this contribution’s starting point for exploring challenges in the creation and handling of archaeometric (reference) data as particularly complex data. Strategies for overcoming these challenges, as well as their limitations in handling legacy data, are presented. They are summarized into recommendations that can easily be implemented in the research workflow to ensure the production of high-quality and reusable data at all stages of a project.