Introduction
Weed image classification is an important task in precision agriculture, which aims to improve farming practices through advanced technologies and data-driven decision-making. Its main objective is to distinguish weeds from crop plants in images taken from agricultural fields. By accurately identifying weed species and their locations, farmers can take targeted action to control weed growth, which improves crop yields and reduces herbicide use. For binary classification, we can use supervised learning methods such as Naive Bayes (NB), support vector machines (SVM), and convolutional neural networks (CNN). For real-world deployment, inference time, computing cost, and accuracy all need to be considered when deciding which method is most suitable. In this project, we therefore compare experimentally which of these methods (NB, SVM, or CNN) performs better in terms of accuracy and computing time for weed detection in soybean crops.
- Distinguish between weed and crop in images taken from fields.
- Control weed growth, improve crop yields and reduce usage of herbicides.
- Compare the computing time and accuracy between NB, SVM, and CNN.
All experiments are conducted in Python and run on a laptop with four 2.4 GHz cores and 16 GB of RAM.
Methods
The project follows these steps:
Data preparation
- Download the dataset from Kaggle (Link)
- Arrange the dataset so that it contains 260 weed images and 5000 non-weed images.
- Import the necessary packages.
- Preprocess the data by resizing images, normalizing pixel values, and formatting the data (for example, reshaping dimensions) so that it can be fed to each model. We use NumPy arrays as input for NB and SVM, and tensors for the CNN.
- Split the dataset into training and test sets in proportions of 0.8 and 0.2, respectively.
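The two input formats mentioned above can be illustrated with a minimal sketch. The random array below is a hypothetical stand-in for a batch of loaded 64x64 grayscale images; it is not the project's data, and only NumPy is used so that the shapes are easy to inspect.

```python
import numpy as np

# Hypothetical stand-in for 10 loaded 64x64 grayscale images (uint8 pixels).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 64, 64), dtype=np.uint8)

# NB and SVM expect one flat feature vector per image: (N, 64 * 64).
flat = images.reshape(images.shape[0], -1)

# A CNN expects (N, channels, height, width) with pixel values scaled to [0, 1].
cnn_input = images[:, np.newaxis, :, :].astype(np.float32) / 255.0

print(flat.shape)       # (10, 4096)
print(cnn_input.shape)  # (10, 1, 64, 64)
```

In the project code, the flattening happens inside each method branch and the channel dimension is added per image in the dataset class, but the resulting shapes are the same.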
# Import the necessary packages.
import numpy as np
import cv2
import os
import glob
import time
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import models
from torch.utils.data import Dataset, DataLoader
# Preprocess the data by resizing images, normalizing pixel values, and formatting the data type
def load_images(path, label):
    """Load every .tif image in `path` as a 64x64 grayscale array with the given label."""
    images = []
    labels = []
    for img_path in glob.glob(os.path.join(path, "*.tif")):
        img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (64, 64))
        images.append(img)
        labels.append(label)
    return images, labels
def run(method, sampling=0, crossvalidation=0, optm="SGD"):
    """Train and evaluate one method ("nb", "svm", or "cnn"); return (accuracy, elapsed seconds)."""
    weed_images, weed_labels = load_images("weed", 1)
    non_weed_images, non_weed_labels = load_images("non_weed", 0)
    all_images = np.array(weed_images + non_weed_images)
    all_labels = np.array(weed_labels + non_weed_labels)
    X_train, X_test, y_train, y_test = train_test_split(all_images, all_labels, test_size=0.2, random_state=42)
    if method == "nb":
        start_time = time.time()
        # Flatten each 64x64 image into a 4096-dimensional feature vector.
        X_train = X_train.reshape(X_train.shape[0], -1)
        X_test = X_test.reshape(X_test.shape[0], -1)
        if sampling != 0:
            # Balance the training set: SMOTE oversampling, then random undersampling.
            resampling = Pipeline([('oversample', SMOTE()), ('undersample', RandomUnderSampler())])
            X_train, y_train = resampling.fit_resample(X_train, y_train)
        mtd = GaussianNB()
        mtd.fit(X_train, y_train)
        y_pred = mtd.predict(X_test)
        end_time = time.time()
        elapsed_time = end_time - start_time
        acc = accuracy_score(y_test, y_pred)
    elif method == "svm":
        start_time = time.time()
        X_train = X_train.reshape(X_train.shape[0], -1)
        X_test = X_test.reshape(X_test.shape[0], -1)
        # Standardize features using statistics from the training set only.
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)
        if crossvalidation != 0:
            # Grid search over C, gamma, and kernel with 2-fold cross-validation.
            param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01], 'kernel': ['linear', 'rbf']}
            mtd = GridSearchCV(SVC(), param_grid, cv=2, verbose=2)
            mtd.fit(X_train_scaled, y_train)
            best_svm = mtd.best_estimator_
            y_pred = best_svm.predict(X_test_scaled)
        else:
            # Note: gamma is ignored by the linear kernel.
            mtd = SVC(kernel='linear', C=1, gamma=0.1)
            mtd.fit(X_train_scaled, y_train)
            y_pred = mtd.predict(X_test_scaled)
        acc = accuracy_score(y_test, y_pred)
        end_time = time.time()
        elapsed_time = end_time - start_time
    elif method == "cnn":
        acc, elapsed_time = cnn(X_train, X_test, y_train, y_test, optm)
    else:
        raise ValueError(f"Unknown method: {method}")
    return acc, elapsed_time
# Build the Convolutional Neural Network
class WeedDataset(Dataset):
    """Wrap NumPy images and labels so that a DataLoader can batch them."""
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        img = self.data[index]
        # Add a channel dimension and scale pixel values to [0, 1].
        img = torch.from_numpy(img).unsqueeze(0).float() / 255.0
        label = self.labels[index]
        return img, label
class BinaryClassifier(nn.Module):
    """Two conv/pool blocks followed by two fully connected layers and a sigmoid output."""
    def __init__(self):
        super(BinaryClassifier, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 16 * 16, 256)
        self.fc2 = nn.Linear(256, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 32 * 16 * 16)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x
def cnn(X_train, X_test, y_train, y_test, optm="SGD"):
    """Train the CNN and return (test accuracy, training time in seconds).

    Note: `optm` is accepted but not used yet; the optimizer is always SGD.
    """
    train_dataset = WeedDataset(X_train, y_train)
    test_dataset = WeedDataset(X_test, y_test)
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = BinaryClassifier().to(device)
    criterion = nn.BCELoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    num_epochs = 30
    # Only the training loop is timed; evaluation below is not included.
    start_time = time.time()
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for images, labels in train_loader:
            images = images.to(device, dtype=torch.float)
            labels = labels.to(device, dtype=torch.float).view(-1, 1)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        # print(f"Epoch {epoch+1}/{num_epochs}, Loss: {running_loss/len(train_loader)}")
    end_time = time.time()
    elapsed_time = end_time - start_time
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device, dtype=torch.float)
            labels = labels.to(device, dtype=torch.float).view(-1, 1)
            outputs = model(images)
            # Threshold the sigmoid output at 0.5 to get a class prediction.
            predicted = (outputs > 0.5).float()
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    acc_cnn = correct / total
    return acc_cnn, elapsed_time
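The input size of the first fully connected layer, 32 * 16 * 16, follows from the layer settings in BinaryClassifier. The sketch below traces the spatial size through the two conv/pool blocks using the standard output-size formula; the helper names `conv2d_out` and `pool2d_out` are hypothetical and exist only for this illustration.

```python
def conv2d_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool2d_out(size, kernel, stride):
    """Spatial output size of a pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 64      # input images are 64x64
channels = 1   # grayscale
for out_channels in (16, 32):
    # 3x3 convolution with padding=1 keeps the spatial size unchanged.
    size = conv2d_out(size, kernel=3, padding=1)
    # 2x2 max pooling with stride 2 halves it.
    size = pool2d_out(size, kernel=2, stride=2)
    channels = out_channels

flattened = channels * size * size
print(size, flattened)  # 16 8192
```

If the input resolution or the number of pooling layers changes, the size passed to `nn.Linear` and `x.view` has to change accordingly.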
Model training
- Fit a Naive Bayes classifier.
acc_nb, elapsed_time_nb = run("nb")
- Fit a Naive Bayes classifier with oversampling and undersampling (SMOTE() followed by RandomUnderSampler(), both with default settings). The minority class (weed images) is expanded to the same size as the majority class, which changes the training set size from 4208 to 8022.
acc_nb_os, elapsed_time_nb_os = run("nb", sampling=1)
- Fit an SVM classifier with a linear kernel, C=1, and gamma=0.1 (gamma has no effect with the linear kernel).
acc_svm, elapsed_time_svm = run("svm")
- Fit an SVM classifier with hyperparameters selected by a cross-validated grid search over gamma in [1, 0.1, 0.01], C in [0.1, 1, 10], and kernel in [linear, rbf].
acc_svm_cv, elapsed_time_svm_cv = run("svm", crossvalidation=1)
- Train a CNN classifier on the preprocessed images, using stochastic gradient descent (the simplest optimizer) with 30 training epochs, a learning rate of 0.01, and a momentum of 0.9.
acc_cnn, elapsed_time_cnn = run("cnn")
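The training set sizes in the resampling step can be checked with a short calculation. This is a sketch under two assumptions: SMOTE with default settings raises the minority class to the majority class count (so the later undersampling step has nothing left to remove), and the unstratified split keeps the classes roughly in proportion. The exact class counts in the project's training split are not stated in the text; they are inferred here from the reported size of 8022.

```python
n_weed, n_non_weed = 260, 5000
n_total = n_weed + n_non_weed      # 5260 images
n_test = n_total // 5              # 20% held out -> 1052
n_train = n_total - n_test         # 4208

# Assumption: a proportional split would put this many non-weed images in training.
proportional_majority = n_train * n_non_weed // n_total   # 4000
proportional_balanced = 2 * proportional_majority         # 8000

# Inferred from the reported resampled size of 8022 (two equally sized classes).
reported_balanced = 8022
implied_majority = reported_balanced // 2                  # 4011
implied_minority = n_train - implied_majority              # 197

print(n_train, proportional_balanced, implied_majority, implied_minority)
# 4208 8000 4011 197
```

The small gap between 8000 and 8022 is consistent with a random, unstratified split that happened to place slightly more non-weed images in the training set.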
Results
# Visualize the results
import matplotlib.pyplot as plt
methods = ['NB', 'NB with \n oversampling', 'SVM', 'SVM with \n cross validation', 'CNN']
accuracies = np.array([acc_nb, acc_nb_os, acc_svm, acc_svm_cv, acc_cnn]) *100
times = np.array([elapsed_time_nb, elapsed_time_nb_os, elapsed_time_svm, elapsed_time_svm_cv, elapsed_time_cnn])
def add_value_labels(bs):
    """Write each bar's height as a text label at the top of the bar."""
    for bar in bs:
        height = bar.get_height()
        plt.text(bar.get_x()+bar.get_width()/2,height,f'{height:.2f}',ha='center',va='top',fontsize=9)
bar_acc = plt.bar(methods, accuracies, color='g')
plt.yscale('linear')
plt.ylim([60,100])
plt.xlabel('Methods')
plt.ylabel('Accuracy (%)')
plt.title('Comparison of Accuracy for Different Methods')
add_value_labels(bar_acc)
plt.tight_layout()
plt.savefig('acc.png', dpi=300)
plt.show()
bar_time = plt.bar(methods, times, color='c')
plt.yscale('log')
plt.xlabel('Methods')
plt.ylabel('Time (s)')
plt.title('Comparison of Computing Time for Different Methods')
add_value_labels(bar_time)
plt.tight_layout()
plt.savefig('time.png', dpi=300)
plt.show()
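One caveat when reading the accuracy plot is that the dataset is imbalanced, so a trivial classifier already scores high on plain accuracy. The sketch below uses the full-dataset class counts (260 weed, 5000 non-weed) as a stand-in for the test split, which is an assumption made for illustration; it shows the accuracy and weed recall of a classifier that always predicts non-weed.

```python
import numpy as np

# Labels with the dataset's class counts: 0 = non-weed, 1 = weed.
y_true = np.array([0] * 5000 + [1] * 260)

# Trivial baseline: always predict the majority class (non-weed).
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
# Recall on the weed class: fraction of weed images that were detected.
recall_weed = (y_pred[y_true == 1] == 1).mean()

print(f"{accuracy:.4f}")     # 0.9506
print(f"{recall_weed:.4f}")  # 0.0000
```

For this reason, it may be worth reporting per-class recall or a confusion matrix alongside accuracy, for example with the `confusion_matrix` and `classification_report` functions that are already imported.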
Discussion
In this project, I implemented various supervised learning techniques that we learned in class, such as Naive Bayes (NB), Support Vector Machine (SVM), oversampling, undersampling, and SVM kernel changing. Furthermore, I employed Convolutional Neural Network (CNN), a popular classification method, and evaluated its performance against the previously mentioned techniques. In this section, I will examine the strengths and weaknesses of each approach in addressing the task at hand, as well as their underlying assumptions.
In terms of computation, NB (which assumes that all features are conditionally independent given the class) relies mainly on probability calculations that involve only simple multiplications and additions. SVM (which assumes that the two classes can be separated by a maximum-margin boundary, possibly after a kernel mapping) consumes memory mainly when evaluating kernel functions, such as linear, polynomial, and Radial Basis Function (RBF) kernels. CNN computations are considerably more complex: they include convolutions (linear combinations), nonlinear activation functions, and pooling operations.
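To make the NB computation concrete, here is a minimal NumPy sketch of Gaussian Naive Bayes on toy data. It is an illustration of the independence assumption, not the scikit-learn implementation used in the experiments: the function names are hypothetical, and the constant `eps` added to the variances is a simplification of scikit-learn's variance smoothing.

```python
import numpy as np


def gaussian_nb_fit(X, y, eps=1e-9):
    """Estimate class priors and per-feature means and variances."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances


def gaussian_nb_predict(X, classes, priors, means, variances):
    """Pick the class maximizing log p(c) + sum_j log N(x_j | mean_cj, var_cj)."""
    diff = X[:, None, :] - means[None, :, :]            # shape (N, C, F)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None])
    # Independence assumption: per-feature log-likelihoods are simply summed.
    scores = np.log(priors)[None] + log_lik.sum(axis=2)  # shape (N, C)
    return classes[np.argmax(scores, axis=1)]


# Toy data: two well-separated classes with two features each.
X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
params = gaussian_nb_fit(X, y)
pred = gaussian_nb_predict(np.array([[0.1, 0.0], [1.0, 1.0]]), *params)
print(pred)  # [0 1]
```

Training only requires one pass to compute means and variances, which is why NB is so fast; the cost is that correlations between neighboring pixels are ignored.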
| | NB | SVM | CNN |
|---|---|---|---|
| Complexity | Simple and fast to train | More complex | Highly complex and deep |
| Interpretability | Relatively interpretable | Moderate interpretability | Low interpretability |
| Scalability | Highly scalable | Scalable for moderate-sized datasets | Scalable but computationally expensive |
Here are some suitable applications for NB, SVM, and CNN in computer vision:
- NB
  - Image classification (simple tasks with a small number of features)
  - Text recognition
- SVM
  - Image classification (high-dimensional, binary or multi-class tasks)
  - Object detection
  - Face recognition
- CNN
  - Image classification (highly effective for complex tasks)
  - Object detection and segmentation
  - Deep facial recognition
  - Image-to-image translation and style transfer
Conclusion
In summary, SVM and CNN can be favorable options for binary image classification due to their ability to handle high-dimensional data. Although NB can train and perform inference rapidly (in less than 0.15 seconds), its accuracy is relatively low, falling below 70%. If a task requires high accuracy, NB may not be the most suitable choice. For instance, you wouldn’t want a self-driving car to have only 70% accuracy in object detection during driving. The results show that resampling methods significantly improve NB’s accuracy while only adding an extra 0.11 seconds, making it a viable candidate for simple image classification tasks.
On the other hand, SVM with cross-validation and CNN require substantial training time, taking 223.08 and 75.13 seconds, respectively. For complex tasks, it might be necessary to train these models in advance. Despite the high computing time and cost associated with CNN, its near 99.9% accuracy makes it indispensable for complicated tasks that demand precision.
The experiment in this project partially adheres to the FAIR principles, as it is somewhat findable and accessible. The dataset is publicly available on Kaggle, so researchers can access it with relative ease. For better findability, however, the data should be organized in a standard directory structure. The dataset used in this project follows a class-based directory structure, as shown below:
dataset/
├── weed/
│ ├── image1.jpg
│ ├── image2.jpg
│ ├── ...
├── non_weed/
│ ├── image1.jpg
│ ├── image2.jpg
│ ├── ...
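A class-based directory structure maps each subfolder name directly to a label. The following sketch shows one way to collect (path, label) pairs from such a layout. The function name `list_labeled_images`, the label mapping, and the `*.jpg` pattern are assumptions taken from the tree above, and the temporary directory with empty files exists only to make the example self-contained.

```python
import tempfile
from pathlib import Path


def list_labeled_images(root, class_to_label, pattern="*.jpg"):
    """Return (path, label) pairs, one per image found in each class folder."""
    samples = []
    for class_name, label in class_to_label.items():
        for path in sorted((Path(root) / class_name).glob(pattern)):
            samples.append((path, label))
    return samples


# Build a tiny stand-in dataset with empty files to demonstrate the layout.
with tempfile.TemporaryDirectory() as tmp:
    for class_name in ("weed", "non_weed"):
        (Path(tmp) / class_name).mkdir()
        for i in (1, 2):
            (Path(tmp) / class_name / f"image{i}.jpg").touch()
    samples = list_labeled_images(tmp, {"weed": 1, "non_weed": 0})
    labels = [label for _, label in samples]
    names = [path.name for path, _ in samples]

print(labels)  # [1, 1, 0, 0]
```

Because the labels come from the folder names alone, no separate annotation file is needed, which keeps the dataset easy to inspect and reuse.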
Furthermore, this project also follows the FAIR principle of reusability. The methods used in the experiment, such as NB, SVM, and CNN, were implemented as functions, making it easy for other researchers to reuse and build upon the code in further steps. This approach facilitates transparency and reproducibility of the experiment, which are important aspects of scientific research. By making the code reusable, other researchers can test and validate the methods and results, and potentially build upon them to generate new insights or applications.
As future work, the results of weed classification can support precision agriculture practices, such as automatically localizing herbicide application targets with agricultural robots. Treating only the areas that need it can help reduce the use of chemicals and contribute to a more sustainable environment.