Journal of Chemometrics

Current research reports and chronological list of recent articles.


The international scientific Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics.

The journal is published by Wiley, which holds the copyright and publishing rights for the products listed below and is responsible for the content shown.


For additional research articles, see Current Chemistry Research Articles. See also: information resources on chemometrics.



Journal of Chemometrics - Abstracts



Closure constraint in multivariate curve resolution

Multivariate curve resolution techniques aim to estimate physically and/or chemically meaningful profiles underlying a set of chemical or related measurements. However, the estimated profiles are generally not unique, and the estimation is often complicated by intensity and rotational ambiguities. Constraints, which encode additional information about the chemical entities, can be imposed to reduce the extent of these ambiguities. A long list of constraints has been introduced, and some of them can be applied in different ways. Investigating how constraints affect the extent of rotational ambiguity, and how they can be applied during curve resolution, can shed light on curve resolution studies. The motivation behind this contribution is to clarify the closure constraint. Using simulated equilibrium and kinetic spectrophotometric data sets, different approaches to implementing closure were applied to demonstrate the geometrical interpretation of the closure constraint and its effect on multivariate curve resolution-alternating least squares results. In addition, the closure constraint is compared with normalization, and it is shown that closure is a Borgen norm and has the same effect as other Borgen norms in multivariate curve resolution. Finally, to further examine the closure constraint, a real data set was investigated.
Date: 23.11.2017
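
The closure (mass-balance) constraint discussed above can be sketched in a few lines of NumPy. The two-component kinetic system, rate constant, and the random perturbation standing in for an unconstrained ALS iterate are purely illustrative, not the paper's simulated data sets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-component kinetics A -> B in a closed system: cA + cB = 1.
t = np.linspace(0.0, 10.0, 50)
cA = np.exp(-0.5 * t)
C_true = np.column_stack([cA, 1.0 - cA])

def apply_closure(C, total=1.0):
    """Rescale each row of C so that the concentrations sum to `total`."""
    return C * (total / C.sum(axis=1, keepdims=True))

# An unconstrained ALS iterate typically violates closure (simulated here by
# random row-wise scaling); the constraint restores the mass balance.
C_est = C_true * rng.uniform(0.8, 1.2, size=C_true.shape)
C_closed = apply_closure(C_est)
```

Because the rescaling fixes every row sum to the same constant, closure acts like a normalization of the concentration profiles, which is the sense in which it behaves as a Borgen norm.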


Diagnostics of sintering processes on the basis of PCA and two-level neural network model

The application of chemometric methods to continuous monitoring and diagnostics of sintering process faults, with the aim of improving iron-ore sinter quality, is considered in this article. The sintering process is a complex multivariate process. Many agglomeration process faults have similar symptoms, resulting in late fault detection by an operator and, as a consequence, wrong process control decisions. To support efficient operational decision making, a process fault monitoring and diagnostics system is proposed. The proposed system uses a two-level neural network (NN) diagnostic model. The high-level neural network is used to localize process faults, whereas their causes are determined by the low-level neural networks. To substantially reduce the time needed to train and retrain the high-level network, the dimensionality of the task is first reduced with the principal component analysis (PCA) method, so that the scores obtained from the initial data are fed into the inputs of the high-level network. The use of PCA allows detection of sintering process faults with T2 and Q statistics. Only upon detection of a fault does the NN diagnostic model start working to determine its cause. The system algorithm provides special measures to prevent the NN from possibly “losing” the identified fault due to operator inactivity. To increase the diagnostic depth for fault symptoms that are evident on the sinter cake surface, optical digital cameras are installed, and their images are processed with the proposed algorithms based on fuzzy clustering to account for uncertainties in the initial information.
Date: 20.11.2017
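
The PCA-based detection step described above (Hotelling's T2 on the scores, Q/SPE on the residuals) can be sketched as follows; the process data, number of components, and fault magnitude are all simulated for illustration and are not the sintering data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "normal operation" data driven by 2 latent factors (illustrative).
T_lat = rng.normal(size=(200, 2))
P_lat = rng.normal(size=(2, 8))
X = T_lat @ P_lat + 0.05 * rng.normal(size=(200, 8))

mu, sd = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sd

# PCA via SVD of the scaled data; keep k components.
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
k = 2
P = Vt[:k].T                        # loadings (8 x 2)
lam = (s[:k] ** 2) / (len(Xs) - 1)  # variances of the retained components

def t2_q(x):
    """Hotelling's T2 (in-model distance) and Q/SPE (residual) for one sample."""
    xs = (x - mu) / sd
    t = xs @ P                      # scores
    t2 = np.sum(t ** 2 / lam)
    resid = xs - t @ P.T
    return t2, resid @ resid

t2_normal, q_normal = t2_q(X[0])
# A fault pushes the sample off the model plane and inflates Q.
x_fault = X[0] + 5 * sd * rng.normal(size=8)
t2_fault, q_fault = t2_q(x_fault)
```

In practice control limits for T2 and Q would be set from the training data, and only samples exceeding them would be passed to the NN diagnostic model.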


Issue Information

No abstract is available for this article.
Date: 17.11.2017


Multimodal image analysis in tissue diagnostics for skin melanoma

Early diagnosis is a cornerstone of successful treatment for most diseases, including melanoma, but it cannot be achieved by traditional histopathological inspection alone. In this respect, multimodal imaging, the combination of two-photon excited fluorescence (TPEF) and second harmonic generation (SHG), offers high diagnostic potential as an alternative approach. Multimodal imaging generates molecular contrast, but to use this technique in clinical practice, the optical signals must be translated into diagnostically relevant information. This translation requires automatic image analysis techniques. Within this contribution, we established an analysis pipeline for multimodal images to achieve melanoma diagnostics of skin tissue. The first step of the image analysis was pretreatment, in which mosaicking artifacts were corrected and a standardization was performed. Afterwards, local histogram-based first-order texture features and local gray-level co-occurrence matrix (GLCM) texture features were extracted at multiple scales. Thereafter, we constructed a local hierarchical statistical model to distinguish melanoma, normal epithelium, and other tissue types. The results demonstrated the capability of multimodal imaging combined with image analysis to differentiate these tissue types. Furthermore, we compared the histogram-based and GLCM-based texture feature sets according to Fisher's discriminant ratio (FDR) and classification performance, which demonstrated that the histogram-based texture features are superior to the GLCM features for the given task. Finally, we performed a global classification to achieve patient-level diagnostics with the clinical diagnosis as ground truth. The agreement between the predictions and the clinical results demonstrates the great potential of multimodal imaging for melanoma diagnostics.
Date: 16.11.2017
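
A minimal sketch of the GLCM texture features mentioned above, computed with plain NumPy on toy patches. The single pixel offset, two gray levels, and three features are illustrative choices, not the paper's full multiscale pipeline:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    M = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            M[img[y, x], img[y + dy, x + dx]] += 1
    return M / M.sum()

def glcm_features(P):
    """Classic Haralick-style features: contrast, energy, homogeneity."""
    i, j = np.indices(P.shape)
    contrast = np.sum(P * (i - j) ** 2)
    energy = np.sum(P ** 2)
    homogeneity = np.sum(P / (1.0 + np.abs(i - j)))
    return contrast, energy, homogeneity

# A flat patch vs a checkerboard patch (toy stand-ins for tissue textures).
flat = np.zeros((8, 8), dtype=int)
check = np.indices((8, 8)).sum(axis=0) % 2
c_flat, e_flat, h_flat = glcm_features(glcm(flat, levels=2))
c_check, e_check, h_check = glcm_features(glcm(check, levels=2))
```

The flat patch has zero contrast and maximal energy, while the checkerboard maximizes contrast, which is exactly the kind of local texture difference such features capture.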


Baseline and interferent correction by the Tikhonov regularization framework for linear least squares modeling

Spectroscopic data are usually perturbed by noise from various sources that should be removed prior to model calibration. After a preprocessing step to eliminate unwanted multiplicative effects (effects that scale the pure signal in a multiplicative manner), we discuss how to correct a model for unwanted additive effects in the spectra. Our approach is described within the Tikhonov regularization (TR) framework for linear regression model building, and our focus is on ignoring the influence of noninformative polynomial trends. This is achieved by including an additional criterion in the TR problem that penalizes the resulting regression coefficients away from a selected set of possibly disturbing directions in the sample space. The presented method builds on the extended multiplicative signal correction, and we compare the two approaches on several real data sets, showing that the suggested TR-based method may improve the predictive power of the resulting model. We discuss the possibilities of imposing smoothness on the regression coefficients as well as selecting wavelength regions within the TR framework. To implement TR efficiently in the model building, we use an algorithm based heavily on the singular value decomposition (SVD). Because of some favorable properties of the SVD, it is possible to explore the models (including their generalized cross-validation error estimates) associated with a large number of regularization parameter values at low computational cost.
Date: 14.11.2017
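
The idea of penalizing coefficients away from disturbing directions can be sketched as a stacked least-squares problem. The Gaussian band, linear baselines, and regularization parameters below are illustrative assumptions, not the authors' data or their EMSC-based setup:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 120
wl = np.linspace(0, 1, p)

# Illustrative spectra: a Gaussian analyte band plus sample-wise linear baselines.
band = np.exp(-((wl - 0.5) / 0.08) ** 2)
conc = rng.uniform(0, 1, n)
X = np.outer(conc, band) + rng.normal(size=(n, 1)) * wl \
    + 0.01 * rng.normal(size=(n, p))
y = conc

# Disturbing directions to suppress: constant and linear trends in wavelength.
Ppoly = np.vstack([np.ones(p), wl])

def tikhonov(X, y, lam, mu):
    """Minimize ||y - Xb||^2 + lam*||b||^2 + mu*||Ppoly @ b||^2
    by stacking the penalties as extra rows of a least-squares system."""
    A = np.vstack([X, np.sqrt(lam) * np.eye(p), np.sqrt(mu) * Ppoly])
    rhs = np.concatenate([y, np.zeros(p + len(Ppoly))])
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

b_plain = tikhonov(X, y, lam=1e-3, mu=0.0)
b_corr = tikhonov(X, y, lam=1e-3, mu=1e3)
trend_plain = np.linalg.norm(Ppoly @ b_plain)
trend_corr = np.linalg.norm(Ppoly @ b_corr)  # driven toward zero by mu
```

With the polynomial penalty active, the coefficient vector is nearly orthogonal to constant and linear baselines, so additive trends of that form barely change the predictions.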


The hybrid of semisupervised manifold learning and spectrum kernel for classification

Manifold learning classification, an advanced semisupervised learning approach, has gained great popularity in a variety of fields in recent years. Kernel methods, in turn, are a group of algorithms for pattern analysis whose task is to find and study general types of relations in data sets. Thus, within the framework of kernel methods, a manifold learning classifier is introduced and explored to directly detect the intrinsic similarity given by the local and global information hidden in data sets. Two validation approaches were used to evaluate the performance of our models. Experiments indicate that the proposed model can be considered an effective alternative modeling algorithm and could be further applied in areas such as biochemical science, environmental analysis, and clinical studies.
Date: 10.11.2017


Application of image moments in MIA-QSAR

Because conventional images of molecular structures carry significant chemical information, they have been used in quantitative structure-activity relationship studies by multivariate image analysis (MIA-QSAR). In this contribution, we propose using Tchebichef moments (TMs), calculated directly from grayscale images of molecular structures, as molecular descriptors to build linear QSAR models by stepwise regression. The proposed approach was applied to QSAR research on a series of HIV-1 non-nucleoside reverse transcriptase inhibitors, and satisfactory results were obtained. Compared with several published methods, the results indicate that the TM method possesses higher accuracy and reliability. The TMs effectively decompose the image information of molecular structures at different levels without any pretreatment, owing to their very favorable multiresolution, holographic, and inherent invariance properties. Our study successfully extends the application of image moments to MIA-QSAR research.
Date: 10.11.2017


Impact of time and temperature of storage on the spoilage of swordfish and the evolution of biogenic amines through a multiway model

A new multiway/multivariate approach is proposed to study and model the spoilage of swordfish with time and temperature of storage through the profiles of putrescine, spermidine, histamine, tyramine, tryptamine, cadaverine, spermine, and 2-phenylethylamine. The evolution of these biogenic amines in food is a complex process that cannot be characterized by a single parameter but rather by a modification of the amine profiles. An experimental strategy is designed to determine these profiles in such a way that the data are structurally 3-way. Modeling the joint evolution of the biogenic amines with a PARAFAC model, which explains 97.8% of the variability (CORCONDIA index equal to 100%), provides estimates of the storage time, storage temperature, and biogenic amine profiles. A multiple regression (determination coefficient of 0.98) based on the loadings of the 2 factors of the time profile of the PARAFAC model enables estimation of the storage time with an error of 0.5 days.
Date: 10.11.2017


Structure-based statistical modeling and analysis of peptide affinity and cross-reactivity to human senile osteoporosis OSF SH3 domain

Human osteoclast-stimulating factor (OSF) induces osteoclast formation and bone resorption in senile osteoporosis by recruiting multiple signaling complexes with cognate interacting partners through its N-terminal Src homology 3 (SH3) peptide-recognition domain. The domain can recognize and bind to the polyproline regions of its partner proteins, giving it broad ligand specificity and cross-reactivity. Here, the structural basis and physicochemical properties of peptide affinity and cross-reactivity to the OSF SH3 domain were investigated systematically by integrating statistical analysis and molecular modeling. A structure-based quantitative structure-activity relationship (QSAR) method called cross-nonbonded interaction characterization and statistical regression was used to characterize the intermolecular interactions involved in computationally modeled domain-peptide complex structures and then to correlate the interactions with affinity for a panel of collected SH3-binding peptide samples. Both the structural stability and generalization ability of the obtained QSAR regression models were examined rigorously via internal cross-validation and external testing, confirming that the models can properly describe even single-residue mutations at the domain-peptide complex interface and give a reasonable extrapolation of the mutation effect on peptide affinity. Subsequently, the best model was used to investigate the promiscuity and cross-reactivity of the OSF SH3 domain binding to its various peptide ligands. It was found that a few key residues in the peptide ligands are primarily responsible for the domain affinity and selectivity, while most other residues play only a minor role in domain-peptide binding affinity and stability. The peptide residues can be classified into 3 groups in terms of their contribution to ligand selectivity: key, assistant, and marginal residues. Because the key residues are so few, many domain-interacting partners share a similar binding profile, and additional factors such as in vivo environments and biological contexts would also contribute to the specificity and cross-reactivity of the OSF SH3 domain.
Date: 09.11.2017


Quantitative structure-property relationship modeling of small organic molecules for solar cells applications

Despite the need for reliable solar energy harvesting technology, research on new materials for third-generation photovoltaics is slowed by the widespread use of trial and error rather than rational material design approaches. The proposed study investigates alternative material-discovery strategies inspired by drug design and molecular modeling. In particular, a training set and a test set (for validation purposes) comprising well-known small-molecule bulk-heterojunction organic photovoltaics were built. Molecules were characterized by semiempirically calculated descriptors and descriptors based on 3D molecular interaction fields. The partial least squares (PLS) algorithm was then applied to rationalize structure-photovoltaic activity relationships, and the coefficients were investigated to clarify the contributions of the different molecular properties to the final performance. In addition, a photovoltaic desirability function (PhotD) was proposed as a versatile alternative tool for ranking potential candidates. The PLS model and PhotD function were both internally and externally validated, demonstrating their ability to estimate the performance of new candidates. The proposed approach demonstrates that, in the context of computational materials science, chemometrics and molecular modeling tools could effectively boost the discovery of novel promising candidates for photovoltaic applications.
Date: 09.11.2017


Automated data mining of secondary ion mass spectrometry spectra

Time-of-flight secondary ion mass spectrometry (ToF-SIMS) allows the reliable analytical determination of organic and polymeric materials. Since a typical raw data set may contain thousands of peaks, the amount of information to deal with is correspondingly large, so data reduction techniques become indispensable for extracting the most significant information from a given data set. Here, the use of wavelet and principal component analysis-based signal processing of the very large raw data sets acquired during ToF-SIMS experiments is presented. The proposed procedure provides a straightforwardly “manageable” data set without any binning procedure or detailed peak integration. By studying the principal component analysis results, detailed and reliable information about the chemical composition of polymeric samples has been gathered.
Date: 09.11.2017


Blessing of randomness against the curse of dimensionality

Modern hyperspectral images, especially those acquired in remote sensing and from on-field measurements, can easily contain from hundreds of thousands to several million pixels. This often leads to quite long computation times when, eg, the images are decomposed by principal component analysis (PCA) or similar algorithms. In this paper, we show how randomization can tackle this problem. The main idea, described in detail by Halko et al in 2011, can be used to speed up most low-rank matrix decomposition methods. The paper explains this approach using a visual interpretation of its main steps and shows how the use of randomness influences the speed and accuracy of the PCA decomposition of hyperspectral images.
Date: 09.11.2017
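
The randomized approach of Halko et al can be sketched as follows: project the data onto a small random sketch, orthonormalize, and run an exact SVD on the much smaller projected matrix. Matrix sizes, oversampling, and the number of power iterations below are illustrative choices:

```python
import numpy as np

def randomized_svd(X, k, oversample=10, n_iter=2, seed=0):
    """Halko-style randomized SVD: capture the range of X with a random
    sketch, then decompose the small projected matrix exactly."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    Omega = rng.normal(size=(n, k + oversample))   # random test matrix
    Y = X @ Omega
    for _ in range(n_iter):                        # power iterations sharpen
        Y = X @ (X.T @ Y)                          # the captured subspace
    Q, _ = np.linalg.qr(Y)                         # orthonormal range basis
    B = Q.T @ X                                    # small (k+oversample) x n
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Low-rank "hyperspectral" data: 5 latent components plus noise (toy sizes).
rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 5)) @ rng.normal(size=(5, 50)) \
    + 0.01 * rng.normal(size=(2000, 50))
U, s, Vt = randomized_svd(X, k=5)
s_exact = np.linalg.svd(X, compute_uv=False)[:5]
```

Because all heavy multiplications involve only k + oversample columns, the cost scales with the target rank rather than with the full pixel count.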


Hybrid central composite design for simultaneous optimization of removal of methylene blue and alizarin red S from aqueous solutions using Vitis tree leaves

Vitis tree leaf powder was used for the efficient removal of dyes (eg, alizarin red and methylene blue) from water samples in binary batch systems. The influence of various parameters, such as initial pH, initial dye concentration, and sorbent mass, on the biosorption process was investigated. Statistical experimental design was used to optimize the biosorption process. A regression model was derived using response surface methodology by performing the 416B model of hybrid central composite design. Model adequacy was checked by means of analysis of variance, a lack-of-fit test, and consideration of the residual distribution. The quadratic model resulting from the hybrid design approach fitted the experimental data very well. The optimal conditions for dye biosorption were as follows: pH = 3.0, sorbent mass = 0.05 g, initial alizarin red concentration (CAR) = 999.6 mg L−1, and initial methylene blue concentration (CMB) = 878.5 mg L−1. Evaluation of the biosorption data with the Langmuir and Freundlich isotherms showed that the Langmuir model gave the best fit to the equilibrium data, with maximum adsorption capacities of 66.4 and 53.5 mg g−1 in the single system and 54.6 and 43.9 mg g−1 in the binary system for AR and MB, respectively. The kinetics of the biosorption process was also investigated.
Date: 23.10.2017
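
A quick sketch of fitting the Langmuir isotherm mentioned above via its standard linearization C/q = 1/(qmax·KL) + C/qmax. The synthetic equilibrium data and the KL value are hypothetical; only the AR capacity of 66.4 mg/g is taken from the abstract:

```python
import numpy as np

def langmuir(C, qmax, KL):
    """Langmuir isotherm: q = qmax*KL*C / (1 + KL*C)."""
    return qmax * KL * C / (1.0 + KL * C)

# Synthetic noise-free equilibrium data (qmax from the abstract, KL assumed).
qmax_true, KL_true = 66.4, 0.05
C = np.linspace(5, 500, 20)          # equilibrium concentrations, mg/L
q = langmuir(C, qmax_true, KL_true)  # adsorbed amounts, mg/g

# Linearized fit: C/q is linear in C with slope 1/qmax and
# intercept 1/(qmax*KL).
slope, intercept = np.polyfit(C, C / q, 1)
qmax_est = 1.0 / slope
KL_est = slope / intercept
```

On real data the points scatter around the line, and the recovered qmax and KL come with fitting uncertainty; here the data are noise-free, so the parameters are recovered exactly.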


Nonlinear classification of commercial Mexican tequilas

Discriminant partial least squares (PLS-DA), a de facto standard classification method, was found to behave poorly when 3 classes of tequilas were modeled in a study of 170 commercial Mexican spirits measured by UV-Vis spectroscopy. This result was compared with other linear and nonlinear supervised classification methods: PLS with variable selection by the SRI index and genetic algorithms; kernel PLS, modified in this paper to handle several classes simultaneously; quadratic discriminant analysis (QDA); support vector machines; and counter-propagation artificial neural networks. All linear models performed worse than the nonlinear ones, which was attributed to the quite different inner dispersion of the classes and the intermediate position of 1 class. Considering the overall classification results and parsimony, QDA was selected for routine assessments thanks to its simplicity and broad availability.
Date: 17.10.2017
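
A minimal QDA sketch with class-specific covariance matrices, which is the property that lets it handle the "quite different inner dispersion of the classes" noted above; the two toy classes and their dispersions are illustrative, not the tequila spectra:

```python
import numpy as np

def qda_fit(X, y):
    """Per-class mean, covariance, and prior; QDA keeps a separate
    covariance for each class (unlike LDA, which pools them)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc.T), len(Xc) / len(X))
    return params

def qda_predict(params, x):
    """Assign x to the class with the largest Gaussian log-density + log prior."""
    best, best_score = None, -np.inf
    for c, (mu, S, prior) in params.items():
        d = x - mu
        score = (-0.5 * np.log(np.linalg.det(S))
                 - 0.5 * d @ np.linalg.solve(S, d) + np.log(prior))
        if score > best_score:
            best, best_score = c, score
    return best

rng = np.random.default_rng(10)
# Two toy classes with clearly different dispersions.
X0 = rng.normal(0, 0.5, size=(100, 2))
X1 = rng.normal(3, 1.5, size=(100, 2))
X = np.vstack([X0, X1])
y = np.r_[np.zeros(100), np.ones(100)]
params = qda_fit(X, y)
pred0 = qda_predict(params, np.array([0.1, -0.2]))
pred1 = qda_predict(params, np.array([3.2, 2.8]))
```

The class-specific covariance makes the decision boundary quadratic, which is what gives QDA its edge when class dispersions differ strongly.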


Ensemble calibration for the spectral quantitative analysis of complex samples

Ensemble strategies have gained increasing attention in multivariate calibration for the quantitative analysis of complex samples. The aim of ensemble calibration is to obtain a more accurate, stable, and robust prediction by combining the predictions of multiple submodels. The generation of training subsets, the calibration of the submodels, and the integration of the submodels are the three keys to the success of ensemble calibration. Many training-subset generation and submodel integration strategies have been developed, forming numerous ensemble calibration methods that improve the performance of the basic calibration method. This contribution focuses on recent ensemble strategies in relation to calibration, especially ensemble modeling for the quantitative analysis of complex samples. The limitations and perspectives of ensemble strategies are also discussed.
Date: 17.10.2017
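
A generic ensemble-calibration sketch showing the three keys in order: bootstrap generation of training subsets, calibration of each submodel, and integration by averaging. The data, submodel type (ridge rather than, say, PLS), and aggregation rule are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative calibration data: y depends linearly on a few channels.
X = rng.normal(size=(100, 20))
true_b = np.zeros(20)
true_b[:3] = [1.0, -0.5, 0.25]
y = X @ true_b + 0.1 * rng.normal(size=100)

def ridge_fit(X, y, lam=1e-2):
    """One submodel: ridge regression coefficients."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ensemble_fit(X, y, n_models=50):
    """Key 1 + 2: train each submodel on a bootstrap resample."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), len(X))
        models.append(ridge_fit(X[idx], y[idx]))
    return models

def ensemble_predict(models, X):
    """Key 3: integrate the submodels by averaging their predictions."""
    return np.mean([X @ b for b in models], axis=0)

models = ensemble_fit(X, y)
X_test = rng.normal(size=(50, 20))
y_test = X_test @ true_b
rmse = np.sqrt(np.mean((ensemble_predict(models, X_test) - y_test) ** 2))
```

Swapping the resampling scheme (eg, subsets of variables instead of samples) or the aggregation rule (median, weighted mean) yields many of the ensemble variants the review surveys.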


Comparative chemometric analysis for classification of acids and bases via a colorimetric sensor array

With the increasing availability of digital imaging devices, colorimetric sensor arrays are rapidly becoming a simple yet effective tool for the identification and quantification of various analytes. Colorimetric arrays combine data from many colorimetric sensors, and the multidimensional nature of the resulting data necessitates chemometric analysis. Herein, an 8-sensor colorimetric array was used to analyze selected acidic and basic samples (0.5-10 M) to determine which chemometric methods are best suited for classification and quantification of analytes within clusters. PCA, HCA, and LDA were used to visualize the data set. All three methods showed well-separated clusters for each of the acid or base analytes and moderate separation between analyte concentrations, indicating that the sensor array can be used to identify and quantify samples. Furthermore, PCA could be used to determine which sensors provided the most effective analyte identification. LDA, KNN, and HQI were used for identification of analyte and concentration. HQI and KNN correctly identified the analytes in all cases, while LDA identified 95 of 96 analytes correctly. Additional studies demonstrated that controlling for solvent and image effects was unnecessary for all chemometric methods used in this study.
Date: 13.10.2017
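
The hit quality index (HQI) used above is commonly computed as a squared correlation between a measured response vector and each library entry. A sketch with a hypothetical 3-entry library of 8-sensor responses (the analyte names and values are made up, not the paper's data):

```python
import numpy as np

def hqi(sample, reference):
    """Hit quality index: squared Pearson correlation of two response vectors."""
    s = sample - sample.mean()
    r = reference - reference.mean()
    return (s @ r) ** 2 / ((s @ s) * (r @ r))

rng = np.random.default_rng(5)

# Hypothetical 8-sensor colour-change responses for three analytes.
library = {name: rng.uniform(0, 255, 8) for name in ["HCl", "NaOH", "NH3"]}

def identify(sample):
    """Return the library entry with the highest HQI."""
    return max(library, key=lambda name: hqi(sample, library[name]))

# A noisy measurement of the "NaOH" pattern should still match NaOH.
sample = library["NaOH"] + rng.normal(0, 5, 8)
best = identify(sample)
```

HQI lies in [0, 1], with 1 meaning a perfect shape match, which makes it insensitive to overall intensity scaling of the sensor responses.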


Accurate model based on artificial intelligence for prediction of carbon dioxide solubility in aqueous tetra-n-butylammonium bromide solutions

This study highlights the application of radial basis function (RBF) neural networks, adaptive neuro-fuzzy inference systems (ANFIS), and gene expression programming (GEP) to the estimation of the solubility of CO2 in aqueous solutions of tetra-n-butylammonium bromide (TBAB). The experimental data were gathered from previously published work. The proposed RBF network was coupled with a genetic algorithm (GA) to achieve better prediction performance. The ANFIS model was trained using a hybrid method. The input parameters of the models were temperature, pressure, mass fraction of TBAB in the feed aqueous solution (wTBAB), and mole fraction of TBAB in the aqueous phase (xTBAB). The solubility of CO2 (xCO2) was the output parameter. Statistical and graphical analyses of the results showed that the proposed GA-RBF, hybrid-ANFIS, and GEP models are robust and precise in estimating the literature solubility data.
Date: 13.10.2017


Selecting the number of factors in principal component analysis by permutation testing—Numerical and practical aspects

Selecting the correct number of factors in principal component analysis (PCA) is a critical step in achieving reasonable data modelling, and the optimal strategy depends strictly on the objective PCA is applied for. In recent decades, much work has been devoted to methods like Kaiser's eigenvalue-greater-than-1 rule, Velicer's minimum average partial rule, Cattell's scree test, Bartlett's chi-square test, Horn's parallel analysis, and cross-validation. However, limited attention has been paid to the possibility of assessing the significance of the calculated components via permutation testing. That may represent a feasible approach when the focus of the study is discriminating relevant from nonsystematic sources of variation and/or the aforementioned methodologies cannot be resorted to (eg, when the analysed matrices do not fulfil specific properties or statistical assumptions). The main aim of this article is to provide practical insights for an improved understanding of permutation testing: highlighting its pros and cons, mathematically formalising the numerical procedure to be followed when applying it to PCA factor selection (through the description of a novel algorithm developed to this end), and proposing ad hoc solutions for optimising computational time and efficiency.
Date: 06.10.2017
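
One common form of the permutation test discussed above compares each component's explained-variance fraction with its null distribution under independent column-wise permutations, which destroy between-variable correlation while preserving marginal variances. This sketch is a generic illustration, not the authors' novel algorithm:

```python
import numpy as np

def var_fractions(X):
    """Fraction of total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s ** 2 / (s ** 2).sum()

def permutation_test(X, n_perm=200, seed=6):
    """p-value per component: how often a column-permuted data set reaches
    the observed variance fraction for that component."""
    rng = np.random.default_rng(seed)
    obs = var_fractions(X)
    count = np.zeros_like(obs)
    for _ in range(n_perm):
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        count += var_fractions(Xp) >= obs
    return (count + 1) / (n_perm + 1)   # add-one correction

rng = np.random.default_rng(7)
# Two genuine factors in 10 variables plus noise (simulated for illustration).
loadings = np.zeros((2, 10))
loadings[0, :5] = 2.0                   # factor 1 drives variables 0-4
loadings[1, 5:] = 2.0                   # factor 2 drives variables 5-9
X = rng.normal(size=(100, 2)) @ loadings + 0.3 * rng.normal(size=(100, 10))
pvals = permutation_test(X)
```

With two real factors, the first two p-values come out small while the later components are indistinguishable from the permuted null, suggesting retaining 2 factors.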


Simultaneous construction of dual Borgen plots. I: The case of noise-free data

In 1985, Borgen and Kowalski [DOI:10.1016/S0003-2670(00)84361-5] introduced a geometric construction algorithm for the regions of feasible nonnegative factorizations of spectral data matrices for three-component systems. The resulting Borgen plots represent the so-called area of feasible solutions (AFS). The AFS can be computed either for the spectral factor or for the factor of the concentration profiles; in the latter case, the construction algorithm is applied to the transposed spectral data matrix. The AFS is a low-dimensional representation of all possible nonnegative solutions, either for the possible spectra or for the possible concentration profiles. This work presents an improved algorithm for the simultaneous construction of the two dual Borgen plots for the spectra and for the concentration profiles. The new algorithm makes it possible to compute the two Borgen plots at roughly the cost of a single classical Borgen plot, with no loss of precision or spatial resolution. The new method is benchmarked against various program codes for geometric-constructive and numerical optimization-based AFS computation.
Date: 02.10.2017


To correlate and predict the potential and new functions of traditional Chinese medicine formulas based on similarity indices

A typical traditional Chinese medicine (TCM) formula (or prescription) is composed of 1 or several single herbs. The number of possible TCM formulas is nearly as large as the number of chemical structures, so developing quantitative formula-activity relationship models is as appealing as building quantitative structure-activity relationship models. In this work, a formula descriptor system based on the TCM holistic medical model is generated to correlate and predict formula functions by using similarity indices. First, 73 general descriptors of 78 formulas from the Chinese Pharmacopeia (2010) are computed. Second, 6 different similarity indices are used to evaluate the similarities among the 78 formulas. As the main functions of the 78 formulas are known and annotated, a significant similarity implies that a formula is likely to have some new functions owned by its “analogue.” Finally, the different similarity measures are compared with reference to the results of experimental and clinical studies. The consistency between some predictions and the literature results indicates that the proposed method can provide clues for mining and investigating the unknown functions of TCM formulas.
Date: 28.09.2017
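
Similarity between formulas encoded as binary descriptors can be illustrated with the Tanimoto (Jaccard) index. The abstract does not specify which 6 indices were used, so the herb-presence descriptors and the choice of index below are purely hypothetical:

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto/Jaccard index for binary descriptor vectors:
    |intersection| / |union| of the set bits."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    inter = np.sum(a & b)
    union = np.sum(a | b)
    return inter / union if union else 1.0

# Hypothetical herb present/absent descriptors for three formulas.
formula_A = [1, 1, 0, 1, 0, 0, 1, 0]
formula_B = [1, 1, 0, 1, 0, 1, 1, 0]   # shares most herbs with A
formula_C = [0, 0, 1, 0, 1, 1, 0, 1]   # disjoint from A

sim_AB = tanimoto(formula_A, formula_B)
sim_AC = tanimoto(formula_A, formula_C)
```

A high index between an annotated formula and an unannotated "analogue" is the kind of signal the method uses to hypothesize shared functions.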


Introducing special issue on chemical image analysis


Date: 15.09.2017


Post-modified non-negative matrix factorization for deconvoluting the gene expression profiles of specific cell types from heterogeneous clinical samples based on RNA-sequencing data

The application of supervised algorithms in clinical practice has been limited by the lack of information on pure cell types. Several supervised algorithms have been proposed to estimate the gene expression patterns of specific cell types from heterogeneous samples. Post-modified non-negative matrix factorization (NMF), the unsupervised algorithm proposed here, is capable of estimating the gene expression profiles and contents of the major cell types in cancer samples without any prior reference knowledge. Post-modified NMF was first evaluated using simulated data sets and then applied to the deconvolution of gene expression profiles of cancer samples. It exhibited satisfactory performance with both the validation and application data. In applications to 3 types of cancer, the differentially expressed genes (DEGs) identified from the deconvoluted gene expression profiles of tumor cells were highly associated with cancer-related gene sets. Moreover, the estimated proportions of tumor cells showed a significant difference between the 2 patient groups compared on clinical endpoints. Our results indicate that post-modified NMF can efficiently extract the gene expression patterns of specific cell types from heterogeneous samples for subsequent analysis and prediction, which will greatly benefit clinical prognosis.
Date: 31.08.2017
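
The core factorization V ≈ WH behind such deconvolution can be sketched with the classic Lee-Seung multiplicative updates (a generic NMF, not the authors' post-modified variant); the "expression" matrix, cell-type profiles, and mixing proportions are all simulated:

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=8):
    """Lee-Seung multiplicative updates for V ~ W @ H (Frobenius loss).
    Updates keep W and H nonnegative given a positive initialization."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.uniform(0.1, 1.0, (m, k))
    H = rng.uniform(0.1, 1.0, (k, n))
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(9)
# Mixed matrix: genes x samples = (genes x cell types) @ (proportions).
profiles = rng.uniform(0, 10, (200, 3))        # per-cell-type expression
props = rng.dirichlet(np.ones(3), size=40).T   # mixing proportions (3 x 40)
V = profiles @ props
W, H = nmf(V, k=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the deconvolution setting, the columns of W play the role of cell-type expression profiles and the rows of H the per-sample proportions, recovered up to scaling and permutation of the components.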


FTIR-ATR adulteration study of hempseed oil of different geographic origins

Adulteration of hempseed (H) oil, a nutrient with well-known health benefits, is studied in this work by mixing it with cheap and widely used oils: rapeseed (R), sesame (Se), and sunflower (Su) oil. Many samples of different geographic origins were taken into account. Binary mixture sets of hempseed oil with these 3 oils (HR, HSe, and HSu) were considered. FTIR spectra of the pure oils and their mixtures were recorded, and quantitative analyses were performed using partial least squares regression (PLS) and the first-break forward interval PLS method (FB-FiPLS). The results show that each particular oil can be quantified very successfully (R2(val) > 0.995, RMSECV 0.9%-2.9%, RMSEP 1.0%-3.2%). This means that FTIR coupled with multivariate methods can rapidly and effectively determine the level of adulteration of hempseed oil with these frequently used adulterant oils. The relevant variables selected by FB-FiPLS could also be used for verification of hempseed oil adulteration.
Date: 31.08.2017


Prediction of pitting corrosion status of EN 1.4404 stainless steel by using a 2-stage procedure based on support vector machines

The excellent properties of EN 1.4404 have made this material one of the most popular types of austenitic stainless steel for many applications. However, in aggressive environments, this alloy may suffer corrosion. Electrochemical analyses have been used extensively to evaluate the pitting corrosion behaviour of stainless steel. These techniques may be followed by microscopic analysis to determine the resistance of the passive layer. This step requires human interpretation, and subjectivity may therefore affect the results. This work aims to overcome this drawback by developing an automatic model capable of predicting the pitting corrosion status of this material. A combined model based on support vector machines (SVMs) is presented. To improve the prediction performance, the model considers the breakdown potential values estimated by itself in a first stage. The performance is evaluated with receiver operating characteristic (ROC) curves. The area under the curve (AUC) and accuracy results (0.998 and 0.952, respectively) demonstrate the utility of the proposed model as an efficient and accurate tool for automatically predicting the pitting behaviour of EN 1.4404.
Date: 29.08.2017


Quinolone carboxylic acid derivatives as HIV-1 integrase inhibitors: Docking-based HQSAR and topomer CoMFA analyses

Quinolone carboxylic acid derivatives, inhibitors of HIV-1 integrase, were investigated as a potential class of drugs for the treatment of acquired immunodeficiency syndrome (AIDS). Hologram quantitative structure-activity relationships (HQSAR) and topomer comparative molecular field analysis (topomer CoMFA) were applied to a series of 48 quinolone carboxylic acid derivatives. The most effective HQSAR model was obtained using atoms and bonds as fragment distinctions: cross-validated q2 = 0.796, standard error of prediction SDCV = 0.36, non-cross-validated r2 = 0.967, non-cross-validated standard error SD = 0.17, external validation correlation coefficient Qext2 = 0.955, and best hologram length HL = 180. Topomer CoMFA models were built based on different fragment cutting schemes, with the most effective model giving q2 = 0.775, SDCV = 0.37, r2 = 0.967, SD = 0.15, Qext2 = 0.915, and F = 163.255. These results show that the models generated from HQSAR and topomer CoMFA were able to effectively predict the inhibitory potency of this class of compounds. Molecular docking was also used to study the interactions of these drugs by docking the ligands into the HIV-1 integrase active site, which revealed the likely bioactive conformations. This study showed that there are extensive interactions between the quinolone carboxylic acid derivatives and the THR80, VAL82, GLY27, ASP29, and ARG8 residues in the active site of HIV-1 integrase. These results provide useful insights for the design of potent new inhibitors of HIV-1 integrase.
Datum: 29.08.2017


Chemometrics optimization for simultaneous adsorptive removal of ternary mixture of Cu(II), Cd(II), and Pb(II) by Fraxinus tree leaves

Fraxinus tree leaves were successfully used to remove a ternary mixture of Cu(II), Cd(II), and Pb(II) from aqueous solution in a batch system. The simplex-centroid mixture design was used for optimization of the biosorption process. The factors affecting the biosorption process, such as pH, sorbent mass (s), and the initial metal-ion concentrations (Ci), were considered via a crossed mixture-process design. Optimal conditions were found to be as follows: pH = 5, s (sorbent mass) = 0.05 g, CCu (initial Cu(II) concentration) = 100.0 mg/L, CCd (initial Cd(II) concentration) = 129.1 mg/L, and CPb (initial Pb(II) concentration) = 70.9 mg/L. The results clearly show competitive effects between the mixture ingredients in favor of Pb(II), and an interaction between process and mixture variables was also observed. It was found that, with increasing Pb(II) contribution, the removal efficiency increases to its highest value. The pH has a positive effect and the sorbent mass a negative effect on the response. To characterize the biosorption, Fourier transform infrared analysis was performed; according to the results, the main functional groups of the sorbent were involved in the biosorption process.
Datum: 29.08.2017


Fault detection based on weighted difference principal component analysis

Recently, multivariate statistical methods, such as principal component analysis (PCA), have drawn increasing attention for fault detection applications in industrial processes. However, industrial processes typically have complex multimodal and nonlinear characteristics. In these situations, the traditional PCA method performs poorly due to its assumption that the process data are linear and unimodal. To improve fault detection performance in nonlinear and multimode industrial processes, this paper proposes a new fault detection method based on weighted difference principal component analysis (WDPCA). Weighted difference principal component analysis first eliminates the multimodal and nonlinear characteristics of the original data by using the weighted difference method. Then, PCA is applied to the preprocessed data, neglecting the influences of multimodality and nonlinearity. Two numerical examples and an industrial application in a semiconductor manufacturing process are used to verify the effectiveness of WDPCA. The simulation results demonstrate that WDPCA shows better fault detection performance than the PCA, kernel principal component analysis (KPCA), independent component analysis (ICA), k-nearest neighbor rule (kNN), and local outlier factor (LOF) methods.
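The weighted difference preprocessing is specific to the paper, but the PCA monitoring step that WDPCA builds on is standard. A minimal sketch of Hotelling's T² fault detection with plain PCA, on synthetic data (not the paper's processes), might look like:

```python
import numpy as np

def pca_t2(X_train, X_new, n_comp=2):
    """Hotelling's T^2 of new samples in a PCA model fitted on
    mean-centred training data (a conventional monitoring statistic)."""
    mu = X_train.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    P = Vt[:n_comp].T                           # retained loadings
    lam = s[:n_comp] ** 2 / (len(X_train) - 1)  # score variances
    T = (X_new - mu) @ P                        # scores of the new samples
    return np.sum(T ** 2 / lam, axis=1)         # T^2 per sample

rng = np.random.default_rng(0)
scale = np.array([5.0, 3.0, 1.0, 1.0, 1.0])    # variance concentrated in 2 PCs
X = rng.normal(size=(100, 5)) * scale
normal = rng.normal(size=(1, 5)) * scale
fault = normal + np.array([50.0, 0.0, 0.0, 0.0, 0.0])  # large shift along PC1
t2 = pca_t2(X, np.vstack([normal, fault]))
# the shifted sample yields a far larger T^2 than the in-control one
```

A fault is flagged when T² exceeds a control limit; WDPCA applies the same statistic after its weighted-difference preprocessing has removed multimodal and nonlinear structure.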
Datum: 24.08.2017


Independent component analysis based on data-driven reconstruction of multi-fault diagnosis

Independent component analysis based on data-driven reconstruction has been widely used in online fault diagnosis for industrial processes. As an alternative to conventional contribution plots, the reconstruction-based fault diagnosis method has been drawing special attention. The method detects fault information with a specific reconstruction model based on historical fault data. In this paper, a novel method was proposed that focuses on handling multiple fault cases in abnormal processes. First, reconstruction-based fault subspaces were extracted based on monitoring statistics in 2 different monitoring subspaces to enclose the major fault effects. Independent component analysis was then used to recover the main fault features from the selected fault subspaces, which represent the joint effects from multiple faults for online diagnosis. The simulation results showed the feasibility and performance of the proposed method with simulated multi-fault cases in the Tennessee Eastman (TE) benchmark process.
Datum: 24.08.2017


Identification of hit compounds for squalene synthase: Three-dimensional quantitative structure-activity relationship pharmacophore modeling, virtual screening, molecular docking, binding free energy calculation, and molecular dynamic simulation

Squalene synthase (SQS) is a key enzyme in the synthesis of cholesterol. Because it lies downstream of hydroxymethylglutaryl coenzyme A reductase and has no influence on the formation of biologically necessary isoprenoids, it is an interesting target for the development of cholesterol-lowering drugs with fewer side effects. To discover novel SQS inhibitors, three-dimensional quantitative structure-activity relationship pharmacophore models were built and further validated by cost function analysis, test set validation, and decoy set validation to obtain a reliable model for virtual screening against a database of 5.5 million compounds. The interactions between SQS and the ligands were predicted by an integrated protocol comprising molecular docking, molecular mechanics/generalized Born surface area calculations, and molecular dynamics simulation. Five compounds with the best binding affinities and binding modes were then obtained as potential hits for further study, and three of them showed inhibitory effects against SQS.
Datum: 24.08.2017


Calculation of topological indices from molecular structures and applications

This mini review presents a brief description of the research efforts for new topological indices of organic molecular structures undertaken in the authors' laboratory at Changchun Institute of Applied Chemistry, Chinese Academy of Sciences. They were used for the processing of chemical information, as highly selective topological indices for uniqueness determination, as highly selective atomic chiral indices for chiral center recognition, in the exhaustive generation of isomers, in a stereo code for the exhaustive generation of stereoisomers, in the prediction of C-13 nuclear magnetic resonance spectra, and in studies on rare earth extractions. The topological indices Ami, 3D descriptors, and chiral descriptors are described, as well as their applications in quantitative structure activity/property relationship studies.
Datum: 24.08.2017


Sampling error profile analysis (SEPA) for model optimization and model evaluation in multivariate calibration

A novel method called sampling error profile analysis (SEPA), based on Monte Carlo sampling and error profile analysis, is proposed for outlier detection, cross-validation, pretreatment and wavelength selection, and model evaluation in multivariate calibration. With the Monte Carlo sampling in SEPA, a number of submodels are prepared, and the subsequent error profile analysis yields a median and a standard deviation of the root-mean-square error (RMSE) for the submodels. The median coupled with the standard deviation is an estimation of the RMSE that is more predictive and robust because it uses representative submodels produced by Monte Carlo sampling, unlike the usual approach, which uses only 1 model. The error profile analysis also calculates skewness and kurtosis as an auxiliary judgment of the estimated RMSE, which is useful for model optimization and model evaluation. The proposed method is evaluated with 3 near-infrared datasets for wheat, corn, and tobacco. The results show that SEPA can diagnose outliers with more parameters, select a more reasonable pretreatment method and wavelength points, and evaluate the model more accurately and precisely. Compared with the results reported in published papers, a better model could be obtained with SEPA in terms of RMSECV, RMSEC, and RMSEP estimated with an independent prediction set.
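A rough sketch of the Monte Carlo error-profile idea, with ordinary least squares standing in for the calibration model and purely synthetic data (skewness and kurtosis of the RMSE distribution could be added analogously):

```python
import numpy as np

def sepa_profile(X, y, n_sub=200, frac=0.7, seed=0):
    """Monte Carlo error profile: fit a least-squares submodel on a random
    calibration subset, record the RMSE on the left-out samples, and
    summarise the RMSE distribution by its median and spread."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rmses = []
    for _ in range(n_sub):
        idx = rng.permutation(n)
        cal, val = idx[: int(frac * n)], idx[int(frac * n):]
        coef, *_ = np.linalg.lstsq(X[cal], y[cal], rcond=None)
        resid = y[val] - X[val] @ coef
        rmses.append(np.sqrt(np.mean(resid ** 2)))
    rmses = np.asarray(rmses)
    return np.median(rmses), rmses.std()

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)
med, spread = sepa_profile(X, y)
# the median RMSE tracks the injected noise level (0.1); the spread
# indicates how stable the submodels are
```

Comparing the (median, spread) pair across pretreatment methods or wavelength subsets is the basis for the selections described above.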
Datum: 24.08.2017


Selectivity-relaxed classical and inverse least squares calibration and selectivity measures with a unified selectivity coefficient

Two popular calibration strategies are classical least squares (CLS) and inverse least squares (ILS). Underlying CLS is the assumption that the net analyte signal used for quantitation is orthogonal to the signal from other components (interferents). The CLS orthogonality avoids analyte prediction bias from modeled interferents. Although this orthogonality condition ensures full analyte selectivity, it may increase the mean squared error of prediction. Under certain circumstances, it can be beneficial to relax the CLS orthogonality requisite, allowing a small interferent bias if, in return, there is a reduction in the mean squared error of prediction. The bias magnitude introduced by an interferent for a relaxed model depends on the analyte and interferent concentrations in conjunction with the analyte and interferent model sensitivities. Presented in this paper is relaxed CLS (rCLS), allowing flexibility in the CLS orthogonality constraints. While ILS models do not inherently maintain orthogonality, relaxed ILS is also presented. From the development of rCLS, a significant expansion of the univariate selectivity coefficient definition broadly used in analytical chemistry is presented. The defined selectivity coefficient is applicable to univariate and multivariate CLS and ILS calibrations. As with the univariate selectivity coefficient, the multivariate expression characterizes the bias introduced in a particular sample prediction by interferent concentrations relative to model sensitivities. Specifically, it answers the question of when a prediction can be made for a sample even though the analyte selectivity is poor. Also introduced are new component-wise selectivity and sensitivity measures. Trends in several rCLS figures of merit are characterized for a near-infrared data set.
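For readers unfamiliar with the CLS baseline that rCLS relaxes, a minimal two-step CLS calibration on a hypothetical noise-free two-component system might look like this (the spectra and concentrations are invented for illustration):

```python
import numpy as np

# Hypothetical two-component system: rows of K are pure-component spectra.
K = np.array([[1.0, 0.5, 0.0, 0.2],
              [0.1, 0.8, 1.0, 0.3]])
C = np.array([[1.0, 0.0],      # calibration concentrations
              [0.0, 1.0],
              [0.5, 0.5],
              [0.2, 0.8]])
A = C @ K                       # noise-free calibration spectra (A = C K)

# CLS step 1: estimate the pure-component spectra from calibration data
K_hat = np.linalg.lstsq(C, A, rcond=None)[0]

# CLS step 2: predict the concentrations of a new mixture spectrum
a_new = np.array([0.3, 0.7]) @ K
c_hat = np.linalg.lstsq(K_hat.T, a_new, rcond=None)[0]
print(np.round(c_hat, 3))       # recovers the true composition [0.3, 0.7]
```

Because the least-squares prediction step implicitly uses signal orthogonal to the interferent's spectrum, predictions here are interferent-bias-free; rCLS trades some of that orthogonality for lower prediction variance.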
Datum: 17.08.2017


Quantitative structure-selectivity relationship (QSSR)-based molecular insight into the cross-reactivity and specificity of chemotherapeutic inhibitors between PI3Kα and PI3Kβ

Selective inhibition of phosphoinositide 3-kinase (PI3K) isoforms α and β with small-molecule inhibitors can result in distinct biological effects on anticancer chemotherapy. However, many existing PI3K inhibitors have moderate or high promiscuity and cross-reactivity between the 2 kinase isoforms. Here, a quantitative structure-selectivity relationship–based statistical modeling scheme was used to characterize the relative contribution of independent kinase residues to inhibitor selectivity and to predict the selectivity and specificity of existing PI3K inhibitors. It is found that the residue type and distribution in the kinase's active site play an important role in inhibitor selectivity, while the rest of the kinase may contribute to selectivity through long-range chemical interactions and indirect allosteric effects. The selectivity is also determined by the configuration difference between the PI3Kα and PI3Kβ kinase domains. Larger inhibitor compounds have lower binding potency to PI3Kβ than to PI3Kα and thus possess higher selectivity for PI3Kα over PI3Kβ.
Datum: 28.07.2017


Ensemble partial least squares regression for descriptor selection, outlier detection, applicability domain assessment, and ensemble modeling in QSAR/QSPR modeling

In QSAR/QSPR modeling, building an accurate partial least squares (PLS) model usually involves descriptor selection, outlier detection, applicability domain assessment, nonlinear relationships, and model stability problems. In the present study, we present an ensemble PLS (EnPLS) method for solving these modeling tasks within a unified methodological framework. EnPLS aims at developing a consistent algorithmic framework by means of ensemble learning and statistical distributions. The approach exploits the fact that the distribution of PLS model coefficients provides a mechanism for ranking and interpreting the effects of variables, whereas the distribution of prediction errors provides a mechanism for differentiating outliers from normal samples and assessing the applicability domain of models. The use of statistics of these distributions, namely, the mean/median value and standard deviation, inherently provides a feasible way to effectively describe the information contained in the original samples. Furthermore, ensemble modeling and prediction based on several cross-predictive PLS models could effectively improve the model prediction performance and increase the model stability to a certain extent. The aqueous solubility data are used to demonstrate the ability of our proposed EnPLS method to solve various modeling tasks such as descriptor selection, outlier detection, applicability domain assessment, performance improvement, and model stability. Finally, a freely available R package implementing EnPLS is developed to facilitate its use by chemists and pharmacologists. The R package is freely available at https://github.com/wind22zhu/enpls1.2.
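The authors' R package is linked above. As a simplified sketch of the ensemble idea for outlier detection, with ordinary least squares standing in for PLS and invented data, the per-sample distribution of out-of-bag prediction errors can be collected like this:

```python
import numpy as np

def ensemble_errors(X, y, n_models=100, frac=0.7, seed=0):
    """For each sample, collect its out-of-bag prediction errors over an
    ensemble of submodels; a sample whose mean |error| is far above the
    rest is flagged as an outlier. (OLS stands in here for PLS.)"""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = [[] for _ in range(n)]
    for _ in range(n_models):
        idx = rng.permutation(n)
        cal, oob = idx[: int(frac * n)], idx[int(frac * n):]
        coef, *_ = np.linalg.lstsq(X[cal], y[cal], rcond=None)
        for i in oob:
            errs[i].append(abs(y[i] - X[i] @ coef))
    return np.array([np.mean(e) for e in errs])

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 1.0, -1.0]) + 0.05 * rng.normal(size=50)
y[7] += 5.0                       # implant one gross outlier
mean_err = ensemble_errors(X, y)
print(int(np.argmax(mean_err)))   # the implanted outlier stands out
```

The same per-sample error distributions also delimit the applicability domain: samples whose errors are consistently large across submodels lie outside it.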
Datum: 19.07.2017


Robust variable selection based on bagging classification tree for support vector machine in metabonomic data analysis

In metabonomics, metabolic profiles of high complexity pose tremendous challenges to existing chemometric methods. Variable selection (ie, biomarker discovery) and pattern recognition (ie, classification) are two important tasks of chemometrics in metabonomics, especially biomarker discovery, which can potentially be used for disease diagnosis and pathology discovery. Typically, the informative variables are elicited from a single classifier; however, this is often unreliable in practice. To rectify this, in the current study, bagging and classification trees (CT) were combined to form a general framework (ie, BAGCT) for robustly selecting the informative variables, based on the advantages of CT in automatically carrying out variable selection and measuring variable importance, and the properties of bagging in improving the reliability and robustness of a single model. In BAGCT, a set of parallel CT models was established based on the idea of bagging, each CT providing information such as the splitting variables and their corresponding importance values. The informative variables can be identified by inspecting the variable importance values over all CTs in BAGCT. Taking the promising properties of support vector machines (SVM) into account, we used the informative variables identified by BAGCT as the inputs of an SVM, forming a new classification tool abbreviated as BAGCT-SVM. A metabonomic dataset obtained by hydrogen-1 nuclear magnetic resonance from patients with lung cancer and healthy controls was used to validate BAGCT-SVM, with CT and SVM as comparisons. Results showed that BAGCT-SVM with fewer variables gave better predictive ability than CT and SVM.
Datum: 19.07.2017


The EIS-based Kohonen neural network for high strength steel coating degradation assessment

The electrochemical impedance spectroscopy (EIS) method is used for long-term, in-depth study of the failure of polymer coatings. With the assistance of neural networks, deeper insight into the changing states of corrosion under given exposure conditions was obtained by applying Kohonen intelligent learning networks. The Kohonen artificial network was trained on 4 sets of samples, from sample 1# to sample 4#, with unsupervised competitive learning; each sample includes up to 14 cycles of EIS data. The trained network was tested using the impedance data of sample 0# at 0.1 Hz. All sample data were collected during exposure to accelerated corrosion environments, and the rate of change of impedance in each cycle was taken as an input training sample. Compared with the traditional classification, the Kohonen artificial network method classifies the corrosion process into 5 subprocesses, a refinement of the 3 typical corrosion processes. Two newly defined subprocesses of corrosion, namely, the premiddle stage and the postmiddle stage, were introduced. The EIS data and macro-morphology for both subprocesses were analyzed through accelerated experiments that considered general atmospheric environmental factors such as UV radiation, thermal shock, and salt fog. The classification results of the Kohonen artificial network are highly consistent with predictions based on the impedance magnitude at low frequency, which shows that Kohonen network classification is an effective method to predict the failure cycles of polymer coatings.
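To illustrate the unsupervised competitive learning used above, a minimal 1-D Kohonen (self-organizing) map on synthetic two-cluster data (not the EIS dataset) might look like:

```python
import numpy as np

def train_som(data, n_nodes=5, n_iter=500, lr0=0.5, seed=0):
    """Minimal 1-D Kohonen map: nodes compete for each sample; the winner
    and its neighbours move toward that sample, with the learning rate
    and neighbourhood width decaying over time."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_nodes, data.shape[1]))            # node weights
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        win = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # best-matching node
        lr = lr0 * (1.0 - t / n_iter)                        # decaying learning rate
        sigma = max(0.5, (n_nodes / 2.0) * (1.0 - t / n_iter))
        h = np.exp(-((np.arange(n_nodes) - win) ** 2) / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                       # neighbourhood update
    return W

# two well-separated clusters: after training, every sample should sit
# close to some map node (small quantization error)
rng = np.random.default_rng(3)
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
W = train_som(data)
qerr = np.min(np.linalg.norm(data[:, None, :] - W[None], axis=2), axis=1).mean()
```

In the coating study, the inputs are impedance-change features per exposure cycle, and the map nodes a cycle activates define which of the 5 corrosion subprocesses it belongs to.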
Datum: 13.07.2017


Quantitative analysis based on spectral shape deformation: A review of the theory and its applications

Most of the commonly used calibration methods in quantitative spectroscopic analysis are established on, or derived from, the assumption of a linear relationship between the concentrations of the analytes of interest and the corresponding absolute spectral intensities. They are not applicable to heterogeneous samples, where potential uncontrolled variations in optical path length due to changes in the samples' physical properties undermine the basic assumption behind them. About a decade ago, a unique calibration strategy was proposed to extract chemical information from spectral data contaminated with multiplicative light scattering effects. Since then, this calibration strategy has been attentively examined, modified, and used by its developers. After more than 10 years of development, some important features of the calibration strategy have been identified. It has been shown that the calibration strategy can solve many complex problems in quantitative spectroscopic analysis. But, because of the relatively low awareness of the calibration strategy within the chemometrics community, its potential has not yet been fully exploited. This paper reviews the theory of the calibration strategy and its applications, with a view to introducing this unique and powerful calibration strategy to a wider audience.
Datum: 15.06.2017


Design matrices and modelling


Datum: 02.06.2017


The O-PLS methodology for orthogonal signal correction—is it correcting or confusing?

The separation of predictive and nonpredictive (or orthogonal) information in linear regression problems is considered an important issue in chemometrics. Approaches including net analyte preprocessing methods and various orthogonal signal correction (OSC) methods have been studied in a considerable number of publications. In the present paper, we focus on the simplest single-response versions of some of the early OSC approaches, including Fearn's OSC, the orthogonal projections to latent structures (O-PLS), the target projection (TP), and the projections to latent structures (PLS) postprocessing by similarity transformation. These methods are claimed to yield improved model building and interpretation compared with ordinary PLS by filtering off the response-orthogonal parts of the samples in a dataset. We point out some fundamental misconceptions made in the justification of the PLS-related OSC algorithms and explain the key properties of the resulting modelling.
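To make the filtering idea concrete, here is a sketch of a single-component O-PLS-style orthogonal filter on invented data (a simplified illustration, not any of the specific algorithms examined in the paper):

```python
import numpy as np

def opls_filter(X, y):
    """Single-component orthogonal signal correction in the spirit of
    O-PLS: form the y-predictive weight vector from X'y, take the part of
    the first loading orthogonal to it, and strip the corresponding
    score/loading component from X. Returns the filtered matrix and the
    removed (response-orthogonal) score."""
    w = X.T @ y
    w = w / np.linalg.norm(w)        # y-predictive weights
    t = X @ w                        # predictive score
    p = X.T @ t / (t @ t)            # its loading
    w_o = p - (w @ p) * w            # loading part orthogonal to w
    w_o = w_o / np.linalg.norm(w_o)
    t_o = X @ w_o                    # orthogonal score: t_o @ y == 0
    p_o = X.T @ t_o / (t_o @ t_o)
    return X - np.outer(t_o, p_o), t_o

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 6))
y = rng.normal(size=30)
Xf, t_o = opls_filter(X, y)
# the removed score carries no linear y-information: t_o @ y is ~0
```

Since X'y is proportional to w and w_o is constructed orthogonal to w, the removed score t_o is exactly uncorrelated with y; the debate the paper addresses is whether removing such components actually improves, or merely rearranges, the subsequent PLS model.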
Datum: 11.04.2017






Information about this site:

Last update: 08.02.2016

The author- or copyrights of the listed Internet pages are held by the respective authors or site operators, who are also responsible for the content of the presentations.

(C) 1996 - 2017 Internetchemistry
