Posted

Let me know if you wish me to summarise it for you. 

:lookaround:

 

https://www.nature.com/articles/s41598-025-88091-4

 

Retrieval of nicotine content in cigar leaves by remote analysis of aerial hyperspectral combining machine learning methods

 

Abstract

Cigar leaf is a special type of tobacco that serves as the raw material for producing high-quality cigars. The content and proportion of nicotine and other constituents of cigar leaves have a crucial impact on their quality and vary greatly with the time of harvest. It is therefore important to monitor the nicotine content of field crops accurately and in a timely manner in the production of high-quality cigar leaf. Hyperspectral remote sensing technology has been widely used in crop monitoring because of its large area coverage, fast information acquisition, short cycle turnover, strong real-time performance and high efficiency. To this end, this study measured crop reflectance spectra of tobacco field crops from hyperspectral images acquired by UAV. MSC, SG, and SNV were combined and applied to the raw data. The output of these operations was then further processed by the CARS, SPA, and UVE algorithms to determine the nicotine-sensitive bands. Four machine learning algorithms were then used to analyze the data: PLS, BP, RF, and SVM. An inversion model of nicotine content was established, and the model was evaluated for accuracy. The main research conclusions are as follows: (1) With the increase in the rate of application of nitrogen fertilizer, the nicotine content of cigar leaves increased; (2) Processing data by the CARS, SPA, and UVE methods reduces the degree of data redundancy and information collinearity in the screening of nicotine-sensitive bands; (3) The MSC-SNV-SG-CARS-BP model has the best predictive accuracy for nicotine content. The prediction accuracy on the testing set was R2 = 0.797, RMSE = 0.078, RPD = 2.182.

Introduction

Tobacco is an important cash crop. Among tobacco types, cigar leaf is a special type characterized by unique aroma, strong taste, great strength and strong satisfaction1. The intrinsic substances of tobacco leaf directly determine its smoking quality. The quality of cigar leaves includes appearance quality, chemical composition, and human sensory response to aroma2. Among these, the chemical composition of tobacco, that is, the type, proportion, and content level of sugars, total nitrogen, nicotine and other substances, plays a role in the quality of cigar leaves. Fast and accurate monitoring of the quality of cigar leaves is important for the production of high-quality cigar leaves3. However, in the current production of cigar leaves in China, quality monitoring is still based on traditional methods of manual investigation, and modern information technology is rarely used.

Hyperspectral remote sensing refers to the technology of obtaining multiple, highly dense, and spectrally resolved images of optical targets in four bands of the electromagnetic spectrum (ultraviolet, visible, near infrared and middle infrared). Compared with conventional manual ground surveys, hyperspectral remote sensing has the advantages of low cost and high efficiency owing to its large area coverage, fast image acquisition, short period, and strong real-time performance. At the same time, hyperspectral remote sensing produces a large amount of information spread over a large number of derived variables. In crop monitoring, especially for rice, wheat, corn and other crops, multiple parameters have been used to characterize crop growth and product yields. By acquiring continuous spectral images of a target object in different wavelength bands, hyperspectral remote sensing acquires, analyzes and compares its reflection and absorption characteristics in order to obtain detailed information on the composition and structure of the material4. This technology combines narrow bandwidth with high spectral resolution and density. Its data can be parametrized in a variety of ways, which makes it valuable in agricultural production. For example, in planting flue-cured tobacco, hyperspectral remote sensing technology has been used to assess key indicators such as growth status, quality and stress of tobacco leaves, which is important for achieving efficient and accurate flue-cured tobacco production5,6.

Spectral reflectance is a parameter that indicates the ability of an object to reflect light at different wavelength bands. By measuring changes in spectral reflectance as a function of wavelength, one can identify crop growth problems in a timely fashion. Rapid scanning rates and high area coverage facilitate timely remediation measures. Daily surveys can give farmers and producers a scientific basis for identifying actions that will improve crop quality and yield, or reduce unnecessary waste of resources. The absorption, reflection and transmission of the optical radiation being monitored are determined by the biochemical constituents that are active within the crop, or are indicative of its state of hydration. Reflectivity parameters that characterize crop growth are perhaps the easiest to monitor when surveying an entire field of crops. Measurements of the changes in spectral reflectance obtained at different angles of incident illumination and at different optical wavelengths have been used to characterize crop growth and agricultural production under varying seasonal and environmental conditions. A band selection method called Genetic Programming Spectral Vegetation Index (GP-SVI) was previously reported7. Its authors proposed that a combination of SVI and GP be used to characterize crop growth. The method derives from GA. In addition, maize canopy nitrogen content was correlated with hyperspectral data acquired by the Compact Airborne Spectrographic Imager (CASI) sensor8. Binary PSO has been used to select the best feature subset, which was then fed to SVR to estimate rice N concentration. Features extracted by PCA were incorporated into an artificial neural network and used to estimate the maize N concentration index9 and rice N concentration10.

Regarding the inversion of tobacco leaf information from UAV spectral data, Qin used PROSAIL to augment hyperspectral images with samples to avoid overfitting. The inversion model combines K-means and XGBoost to form a hybrid model. The results indicate that the hybrid model outperforms the other models on the validation set with R2 = 0.83 and RMSE = 3.911. Zhang estimated tobacco LNC using UAV hyperspectral-image data. The results show that all the ensemble learning methods are superior to PLSR (R2 = 0.680, RMSE = 5.402 mg/g, 19.72%). Specifically, the stack-based model achieves the highest accuracy and relatively high stability (R2 = 0.745, RMSE = 4.825 mg/g, 17.98%)12. Junying et al.13 proposed a method to predict the K2O content of tobacco based on UAV-borne hyperspectral imaging. The results on the test set show that the RMSE of the model is 0.40, and the absolute value of the average relative error is 8.04%. Junying et al.14 proposed a method for predicting total sugar content based on UAV-borne hyperspectral imaging. The model was constructed by combining spectral properties and measured total sugar values. According to the sample test results, the RMSE of the model is 1.84, and the absolute value of the average relative error is 8.82%. Hayes and Reed15 conducted a field study using UAV-borne hyperspectral imaging to detect tobacco pests and diseases. The subspace LDA algorithm was used to test the recognition ability, and the overall accuracy was 85.7%.

With the continuous development of hyperspectral remote sensing technology over recent years, it is now possible to observe weak spectral differences. The technology can be used directly for quantitative analysis, which suggests that remote sensing has considerable potential for field applications that characterize plant physiological and biochemical health16. This study uses hyperspectral information technology to collect the spectral information of cigar leaves as they grow in the field. It establishes corresponding mathematical models and inverts the data to predict the nicotine content, which is related to the quality of cigar leaves at maturity. It aims to broaden the applications of hyperspectral sensing to optimize leaf production and value.

Experimental design

Test materials

The test materials in this experiment were leaves from 15 cigar varieties: Cuba 1, Cuba 4, Cuba 5, Cuba 6, Cuba 7, Cuba 8, Cuba 9, Slovenia 1, Norway 1, Dominica 2, Nicaragua 2, Indonesia 1, Desue 1, Desue 3, and Shiyan 1.

Experimental site profile

This work adopted a two-factor split-plot design, with nitrogen fertilizer application rate as the main factor and tobacco variety as the secondary factor. A nitrogen (pure nitrogen) application gradient was established at three levels: level 1, 0 kg/667 m2; level 2, 6 kg/667 m2; level 3, 12 kg/667 m2. During the field growth stage (May to September), the average temperature was 25℃, the average content of soil organic matter was 29.61 g/kg, and the average contents of total nitrogen, available phosphorus and available potassium were 1.88 g/kg, 18.09 mg/kg and 56.90 mg/kg, respectively. The total precipitation in the field during the growth stage was 260 mm. The soil type was yellow brown soil. The tobacco soil has a strong capacity to retain water and fertilizer, and is convenient for drainage and irrigation. The soil water content is 60–70%, the soil organic matter content is 1.5%, and the pH value is 6.3.

Experimental samples

In this study, the upper, middle and lower leaves of the tobacco plants of each variety were collected at the Modern Agricultural Research Base of Sichuan Agricultural University in Chongzhou, Chengdu, Sichuan Province. Leaves were picked during the mature picking period from June 20 to July 20, 2022. In total, 15 cigar varieties were picked: Cuba 1, Cuba 4, Cuba 5, Cuba 6, Cuba 7, Cuba 8, Cuba 9, Slovenia 1, Norway 1, Dominica 2, Nicaragua 2, Indonesia 1, Desue 1, Desue 3, and Shiyan 1 (Table 1). Three nitrogen fertilizer levels were established, and sampling analysis was carried out at the mature stage. Ripe tobacco leaves were harvested and dried naturally; 2 kg of high-grade cigar tobacco leaves was then collected for each treatment and analyzed for conventional chemical components such as nicotine. A total of 270 cigar tobacco samples were measured for nicotine and other chemical components.

Table 1 Varieties of cigars.

Number    Variety
S52       Cuba 1
S53       Cuba 4
S54       Cuba 5
S55       Cuba 6
S56       Cuba 7
S57       Cuba 8
S58       Cuba 9
S59       Slovenia 1
S60       Norway 1
S61       Dominica 2
S62       Nicaragua 2
S63       Indonesia 1
S64       Shiyan No. 1
S65       Desue 1
S66       Desue 3

 

 

Partition of sample set

A total of 270 tobacco spectral samples were randomly partitioned: 216 were used as the training set and 54 as the testing set. The partition follows the hold-out method. These counts correspond to a training-to-test ratio of 4:1, although the original text states 3:7.
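A hold-out partition with these counts can be sketched in a few lines of Python. This is only an illustration: the paper used Matlab, and the function name `holdout_split` and the fixed random seed are assumptions made here for repeatability.

```python
import random

def holdout_split(n_samples, n_test, seed=0):
    """Randomly partition sample indices into a training and a test set."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed is an assumption, not from the paper
    rng.shuffle(indices)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test

# 270 samples in total, 54 held out for testing, as in the text.
train_idx, test_idx = holdout_split(n_samples=270, n_test=54)
print(len(train_idx), len(test_idx))
```

With 270 samples and 54 held out, the training set has 216 members, which is an 80/20 split.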

Hyperspectral image acquisition and data extraction

Hyperspectral image acquisition

Once the tobacco had entered its growth period, a DJI M600pro six-rotor UAV equipped with a Nano-Hyperspec hyperspectral camera was used to take field photos every 7 days until the whole field entered the mature stage. UAV operations should be conducted in good weather, when the wind does not exceed level 3, and around noon. Before take-off, the datum of the survey area should be checked and a spacious, unobstructed hardened road surface selected as the area for take-off and landing. The flight height is 10 m, and the original image obtained by the UAV should be quasi-static. The hyperspectral sensor has a spectral resolution of 4 nm. The UAV has 6 axes, a wheelbase of 1133 mm, a maximum load of 15.5 kg, a maximum flight speed of 18 m/s, a maximum climb speed of 5 m/s, and a maximum descent speed of 3 m/s.

The original images were then stitched into complete images using Pix4DMapper software (Pix4D S.A., Switzerland). The target region is selected in each tobacco field image, and the spectral information in each region of interest is then extracted to obtain the corresponding original spectral data. Part of the hyperspectral image is shown in Fig. 1. Because the hyperspectral sensor, system platform, atmosphere, and terrain all influence data acquisition, the generated image pixels are squeezed, stretched, distorted, and shifted relative to the actual locations in the cultivation area. The images were therefore processed using the atmospheric correction and radiometric calibration functions of ENVI 5.3 (ENVI/IDL5.3., USA). Radiometric correction and geometric calibration are also essential for reducing noise interference, improving reflectance accuracy, and ensuring the precision of radiometric measurements. These operations convert the raw images into hyperspectral reflectance data for the entire tobacco field.

Fig. 1. Part of the hyperspectral image. (A) Study area; (B) High nitrogen and medium nitrogen treatment areas; (C) Low nitrogen treatment area. ArcGIS (V10.3 Esri).

Spectral enhanced noise reduction and smoothing

Spectral enhancement mainly includes multiplicative scatter correction (MSC) and standard normal variate (SNV) processing. Matlab 2021 (MathWorks, USA) was used to pre-process the original spectral data for signal enhancement. Spectral smoothing and noise elimination incorporate Savitzky-Golay (SG) convolution smoothing and baseline correction. SG convolution smoothing was proposed by Savitzky and Golay17 to reduce noise without distorting the original spectrum. Among the various options, the polynomial order and the window size directly affect data smoothing18. Matlab 2021 was used to perform spectral smoothing and de-noising on the original spectral data files. These signal-processing steps belong to hyperspectral image pre-processing. They are useful for reducing, and often eliminating, irrelevant information in the image. Noise reduction is often critical to providing reliable data for the subsequent inversion models.
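As a rough illustration of the three pre-processing steps, here is a minimal pure-Python sketch of MSC, SNV and SG smoothing applied in that order. It is not the authors' Matlab code. The 5-point quadratic SG window, the untouched edge points, the function names, and the toy spectra are all assumptions, since the paper does not state its window size or polynomial order.

```python
from statistics import mean, stdev

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n_bands = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of s = intercept + slope * ref
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum on its own."""
    out = []
    for s in spectra:
        m, sd = mean(s), stdev(s)
        out.append([(x - m) / sd for x in s])
    return out

# 5-point quadratic Savitzky-Golay weights (to be divided by 35); an assumed setting.
SG_WEIGHTS = (-3, 12, 17, 12, -3)

def sg_smooth(spectrum):
    """Savitzky-Golay smoothing; the two points at each edge are left unchanged."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(w * x for w, x in zip(SG_WEIGHTS, window)) / 35
    return out

# Toy reflectance spectra: one base shape with different offsets and gains.
base = [0.10, 0.12, 0.20, 0.45, 0.50, 0.52, 0.51]
raw = [[a + b * x for x in base] for a, b in [(0.0, 1.0), (0.05, 1.2), (-0.02, 0.8)]]
processed = [sg_smooth(s) for s in snv(msc(raw))]
```

Because the toy spectra differ only by an offset and a gain, MSC maps them all onto the same mean spectrum, which is exactly the scatter effect it is meant to remove. In practice a library routine such as SciPy's `savgol_filter` would normally replace the hand-written smoother.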

Variable selection of hyperspectral features

Uninformative variable elimination (UVE) is an effective noise suppression technique that extracts valuable features from a large number of irrelevant variables, and eliminates useless information by evaluating the stability of each variable. Matlab 2021 algorithms were used to shuffle the original spectral data and add noise by constructing an independent variable matrix. After the noise is added, UVE evaluates the regression coefficients of the target matrix obtained from the independent variable matrix, which comprises the spectral variables plus noise. The statistical distribution of each regression coefficient is represented by the ratio of its mean to its standard deviation. The final feature variables are determined by evaluating the upper and lower signal limits and selecting the variables that lie within the corresponding range19.
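The mean-to-standard-deviation criterion can be sketched as follows. This assumes the commonly described form of UVE, in which a spectral variable is kept when its absolute stability exceeds the largest absolute stability among the appended noise variables. That cutoff rule, the function name `uve_select`, and the coefficient values are assumptions for illustration; in a real run the coefficients would come from repeated PLS fits.

```python
from statistics import mean, stdev

def uve_select(coef_runs, n_real):
    """Keep real variables whose stability beats every appended noise variable.

    coef_runs: one list of regression coefficients per resampling run;
               the first n_real entries are spectral variables, the rest noise.
    """
    n_total = len(coef_runs[0])
    stability = []
    for j in range(n_total):
        column = [run[j] for run in coef_runs]
        stability.append(mean(column) / stdev(column))  # mean / standard deviation
    cutoff = max(abs(s) for s in stability[n_real:])    # set by the noise variables
    return [j for j in range(n_real) if abs(stability[j]) > cutoff]

# Hypothetical coefficients: 3 spectral variables followed by 2 noise variables.
coef_runs = [
    [0.50, 0.02, -0.40, 0.01, -0.02],
    [0.55, -0.03, -0.42, -0.02, 0.01],
    [0.48, 0.01, -0.38, 0.02, 0.03],
    [0.52, 0.00, -0.41, -0.01, -0.01],
]
selected = uve_select(coef_runs, n_real=3)
print(selected)
```

Variables 0 and 2 have large, consistent coefficients and are retained, while variable 1 fluctuates around zero like the noise columns and is dropped.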

The Successive Projections Algorithm (SPA) is a widely used algorithm20 for selecting spectral features by wavelength. It was run in Matlab 2021. The original spectral data are shuffled, and the matrix is then reconstructed so that the data set covers the full band range. A wavelength i is randomly selected, whereupon a successive projection series is generated, starting from i, to achieve the best spectral feature selection. This process is repeated until all the bands are covered, and finally the wavelength subset of the whole band is determined.
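The projection step can be illustrated with a small pure-Python sketch. Starting from a chosen column, the remaining columns are repeatedly projected onto the subspace orthogonal to the last selected one, and the column with the largest remaining norm is picked next. The function name `spa_select` and the toy matrix are assumptions. A full implementation would also mean-centre the columns, repeat over starting wavelengths, and choose the subset size by validation error, all of which is omitted here.

```python
import math

def spa_select(X, start, n_select):
    """Greedy successive projections over the columns of X (rows are samples)."""
    n_rows, n_cols = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n_rows)] for j in range(n_cols)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm2 = sum(v * v for v in ref)
        best, best_norm = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            dot = sum(a * b for a, b in zip(cols[j], ref))
            # Remove the component along the most recently selected column.
            cols[j] = [a - dot / ref_norm2 * b for a, b in zip(cols[j], ref)]
            norm = math.sqrt(sum(v * v for v in cols[j]))
            if norm > best_norm:
                best, best_norm = j, norm
        selected.append(best)
    return selected

# Toy data: column 1 duplicates column 0, column 3 nearly does, column 2 is new.
X = [
    [1.0, 2.0, 1.0, 1.1],
    [2.0, 4.0, 0.0, 2.0],
    [3.0, 6.0, -1.0, 3.2],
    [4.0, 8.0, 0.0, 3.9],
]
chosen = spa_select(X, start=0, n_select=2)
print(chosen)
```

Starting from column 0, the collinear columns 1 and 3 shrink to almost nothing after projection, so column 2 is selected. This is the sense in which SPA reduces collinearity among the chosen bands.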

Competitive adaptive reweighted sampling (CARS) is widely used for uncovering hidden features. It captures characteristic wavelengths by combining the partial least squares method with Monte Carlo sampling, and thereby identifies specific spectral characteristics. In Matlab 2021, the raw spectral data are shuffled, Monte Carlo sampling is applied, and the partial least squares method is used to minimize the variation and pick out features with progressively increasing weights. The extrema of these become the final feature wavelengths21.
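One part of CARS that is easy to show in isolation is how quickly the wavelength set shrinks across Monte Carlo runs. The sketch below assumes the exponentially decreasing function usually given for CARS, r_i = a·exp(−k·i), with a and k chosen so that all variables are kept in the first run and two in the last. The function name, the 200 bands and the 50 runs are hypothetical, and the PLS fitting and adaptive reweighted sampling steps are not shown.

```python
import math

def cars_retention_schedule(n_variables, n_runs):
    """Number of variables kept in each Monte Carlo run (exponential decrease)."""
    a = (n_variables / 2) ** (1 / (n_runs - 1))
    k = math.log(n_variables / 2) / (n_runs - 1)
    return [
        max(2, round(n_variables * a * math.exp(-k * i)))
        for i in range(1, n_runs + 1)
    ]

# Hypothetical setting: 200 spectral bands, 50 Monte Carlo sampling runs.
schedule = cars_retention_schedule(n_variables=200, n_runs=50)
print(schedule[0], schedule[-1])
```

The schedule removes many bands in the early runs and few in the later ones, which gives the fast-then-fine selection that CARS is known for.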

Establishment of inversion model

Machine learning regression methods have been successfully applied in many settings, including monitoring crop physiological and biochemical parameters. Inversion models have mainly used partial least squares regression (PLSR)22, the back propagation neural network (BP)23, support vector regression (SVR)24, and random forest regression (RFR)25. They can all explore complex non-linear relationships between spectral features and biochemical status indicators without explicit knowledge of the precise distributions. BP is an artificial neural network (ANN). ANNs are a common method for developing nonlinear regression and consist of one input layer, one or more hidden layers, and one output layer26. The number of hidden layers is determined, in practice, by parameter tuning. Some researchers have suggested that the number of hidden neurons should not exceed twice the sum of the neurons in the input and output layers27. Other ANN parameters, such as initial weights, activation functions, and learning rates and dynamics, also affect model performance. Two important parameters to tune in RFR algorithms are the number of regression trees to construct and the number of input variables to allow at each node. They are usually determined by growing trees and computing errors against the training set. Optimal use involves a trade-off between computational cost and estimation accuracy.

The selection of model evaluation index

In this study, the coefficient of determination R2 and the root mean square error (RMSE) were selected as the indicators for evaluating the estimation model and the validation model. For the estimation model, a larger R2 and a smaller RMSE indicate better accuracy. RMSEC is the root mean square error of the training set and RMSEP is the root mean square error of the test set. For the validation model, a larger R2 and a smaller RMSE indicate better stability of the estimation model28. R2 and RMSE are calculated using the following formulas (1), (2) and (3).

image.png

Here, n is the number of samples, and Yi and Ŷi are the measured and predicted values of nicotine content, respectively. Ȳ represents the average value of nicotine content of the measured samples.
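The formula image did not come through in this post. Assuming the standard definitions, which match the symbols described here (n samples, measured values Y_i, predicted values Ŷ_i, and measured mean Ȳ), R2 and RMSE can be written as below. The paper's third formula is not reconstructed, since the text does not say what it contains.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2}
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}
```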

5-fold cross-validation is used to verify the optimal model selected from the different models. The data set was divided into five subsets; in each trial, four were taken as training data and one as validation data. The accuracy was obtained for each trial, and the average over the 5 trials was used as an estimate of the accuracy of the algorithm. The same procedure was applied to each model to obtain its average performance on the data set, and the best model was selected from among them.
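The fold bookkeeping can be sketched as follows. The helper names, the fixed seed, and the use of all 270 samples are assumptions, and the scoring callable is a placeholder for fitting and evaluating a real model.

```python
import random
from statistics import mean

def k_fold_splits(n_samples, k=5, seed=0):
    """Return k (train, validation) index pairs covering every sample once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits

def cross_validate(n_samples, score_fold, k=5):
    """Average a per-fold score; score_fold(train, validation) is caller-supplied."""
    return mean(score_fold(tr, va) for tr, va in k_fold_splits(n_samples, k))

splits = k_fold_splits(270)
# Placeholder score (validation fold size) standing in for a real model accuracy.
average = cross_validate(270, lambda train, validation: len(validation))
```

Each model under comparison would be passed through `cross_validate` with the same folds, and the one with the best average score would be kept.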

RPD (ratio of performance to deviation, also called residual predictive deviation) is an indicator used to evaluate the predictive performance of a model, especially in fields such as spectral analysis and chemometrics. It is the ratio between the standard deviation (SD) and the root mean square error of prediction (RMSEP). The RPD formula is expressed as follows:

image.png
 

SD represents the standard deviation, usually that of the actual observations or the reference dataset. A higher RPD value usually indicates better predictive performance, as it means that the difference between the predicted and actual values is small relative to the overall variability of the data. All image and statistical analyses were performed using Matlab 2021a (MathWorks Inc., Natick, MA, USA).
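The three indicators can be computed with a short Python sketch. It assumes the standard definitions of R2 and RMSE and takes SD as the sample standard deviation of the measured values. The nicotine values below are made up for illustration and are not data from the paper.

```python
import math
from statistics import mean, stdev

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(mean((y - p) ** 2 for y, p in zip(measured, predicted)))

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in measured)
    return 1 - ss_res / ss_tot

def rpd(measured, predicted):
    """SD of the measured values divided by the RMSE of prediction."""
    return stdev(measured) / rmse(measured, predicted)

# Hypothetical nicotine contents (%), not values from the paper.
measured = [2.0, 2.5, 3.0, 3.5, 4.0]
predicted = [2.1, 2.4, 3.1, 3.4, 4.1]
print(round(r_squared(measured, predicted), 3),
      round(rmse(measured, predicted), 3),
      round(rpd(measured, predicted), 3))
```

When the test set is not tiny and the predictions are nearly unbiased, RPD is close to 1/sqrt(1 − R2), so the two indicators largely move together.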

Results and analysis

The effects of different nitrogen fertilizer application rates on the chemical composition of cigar leaves

Nicotine content depended strongly on the quantity of nitrogen fertilizer applied to the tobacco plants: the more nitrogen fertilizer applied, the higher the nicotine content (Table 2). The nicotine content of the different cigar varieties increased with increased application of nitrogen. However, both high and low nicotine content affected the aroma and taste of cigars. Low nicotine content reduced the physiological stimulation from the aroma and taste of cigars. High nicotine content produced excessive physiological stimulation from the aroma and adversely affected the taste of cigars.

 

image.png

Pre-treatment of raw spectra

Spectral data acquisition and pre-processing

The hyperspectral images of cigar leaves grown under different nitrogen fertilizer applications were collected by a hyperspectral imager mounted on an unmanned aerial vehicle (UAV). Figure 2A shows the original spectra in the visible-near IR band. Figure 3 shows the original spectra under different nitrogen fertilizer applications.

By jointly applying multiplicative scatter correction, standard normal variate transformation and SG convolution smoothing, we are able to effectively reduce the interference of variations in the external environment, illumination sources, and instrument dark current on the hyperspectral data. Figure 2B shows the tobacco spectra after pre-processing, which achieves accurate baseline correction in spectra that originally exhibited a large amount of baseline drift.

From the reflectance spectra extracted from the hyperspectral images, we found a jump in the spectral reflectance of tobacco leaves with increasing nitrogen fertilizer. In particular, the increase of spectral reflectance was greatest in the wavelength range between 700 and 1000 nm, and the net change was greatest in the range between 400 and 700 nm. The “green peak” is the locus of strong chlorophyll reflectivity, which peaks at 550 nm, and reflectivity is strong again within the near-infrared wavelength band between 700 and 1000 nm. These bands characterize the spectral reflectance of tobacco regardless of the amount of added nitrogen fertilizer, but can be used to identify and quantify nitrogen compounds in samples that have received nitrogen fertilizer. By comparison, under different nitrogen concentration levels, the spectral band reflectances of the tobacco plants showed characteristics and variation patterns similar to those of other green plants. However, the original spectra contained multiple scattering, baseline deviation, and wave effects, which were resolved by multiplicative scatter correction, standard normal variate transformation, and convolution smoothing. With the combined MSC + SNV + SG algorithms, the variability errors in the range between 670 and 1000 nm were significantly reduced, and the number of resolved absorption peaks was significantly increased.

image.png
 
figure 3
 

Conclusion

In this study, a hyperspectral camera was carried over tobacco crop locations by a UAV and used to perform remote hyperspectral sensing. Remote reflectance data files from field tobacco crops were transferred to a central computer facility, where various signal processing methods were used to refine the original data. The computer executed artificial intelligence algorithms that ran various inversion models to evaluate crop nicotine content. Fifteen types of cigar leaves were monitored with the aim of establishing an inversion model with wider universality and greater ability to interpret nicotine content. The results show that the CARS-BP model based on MSC-SNV-SG pre-processing is the most representative of the nonlinear models, and has better modeling accuracy, root mean square error (RMSE) and RPD than any of the other tested models. The MSC-SNV-SG-CARS-BP model has the best prediction accuracy for cigar nicotine content.

The process from data collection to model development requires careful consideration and analysis. It is important to improve the quality of data obtained from UAV-based imagery, ground-based observations, and modeling approaches, which may be subject to errors. Therefore, a standardization procedure for the above process is needed. Future work can investigate different image collection methods, data processing algorithms, and modeling approaches to estimate biomass. The biophysical properties of plants and the sources of error can also be further explored, and the uncertainty and transferability of the developed models and application tests need to be evaluated. While the study achieved remarkable success in modeling cigar nicotine inversion using four different machine learning models, it must be acknowledged that the performance of machine learning-based methods such as BP neural networks can be affected by the specific characteristics of the target plant species and the environmental conditions in which they are grown. Different plant species may have different shapes, sizes, and growth patterns, which may affect the accuracy of object detection. The accuracy and robustness of cigar tobacco nicotine value estimation can be affected by factors such as plant structure, leaf thickness, plant cover, and soil environment, which vary by growth stage. In addition, variations in environmental factors such as light, soil type, and plant density may pose a challenge to the general applicability of the model47. Therefore, further research is needed to assess the adaptability of the method to a wider range of cigar species and diverse environmental settings.

Posted

This is a huge leap in agriculture technology. The ability to monitor a large-scale farm and detect deficiencies, diseases, hydration issues and so forth will ultimately lead to an increase in production and quality. Now, realistically, how many old-time farmers are going to actually utilize this technology? I think we can all bet on how that conversation is gonna go. This management strategy most likely won't go into practice for decades down the road, once new generations start managing crops. That, coupled with the fact that this is a relatively new technology, presents stability issues in the data. Interested in seeing how this pans out.

Posted

Sheesh. I was reading the title for the third time and just gave up

Posted

Personally, I'm looking forward to their next article, "Why Habanos cigars taste different than New World ones with graphs and mathematical proofs!"

Posted

I kept thinking, "pigfish needs to break this down for me!"

