
Comparative Evaluation of Supervised Learning for Binary Weed Classification

ABE516X Project

Introduction

Weed image classification is an important task in precision agriculture, which aims to improve farming practices through advanced technologies and data-driven decision-making. The main objective of weed image classification is to distinguish between weeds and crop plants in images taken from agricultural fields. By accurately identifying weed species and their locations, farmers can take targeted actions to control weed growth, leading to improved crop yields and reduced herbicide usage. For binary classification, we can use supervised learning methods such as Naive Bayes (NB), the support vector machine (SVM), and the convolutional neural network (CNN). For real-world deployment, inference time, computing cost, and accuracy all matter when deciding which method is most suitable. Thus, in this project, we experimentally evaluate which of these binary classification methods (NB, SVM, or CNN) performs better in terms of accuracy and computing time for weed detection in soybean crops.

  • Distinguish between weeds and crops in images taken from fields.
  • Control weed growth, improve crop yields, and reduce herbicide usage.
  • Compare the computing time and accuracy between NB, SVM, and CNN.

All experiments are conducted in Python and run on a laptop with four 2.4 GHz cores and 16 GB of RAM.

Methods

The project proceeds in the following steps:

Data preparation

# Import the necessary packages.

import numpy as np
import cv2
import os
import glob
import time

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import models
from torch.utils.data import Dataset, DataLoader
# Load and preprocess the data: read images as grayscale and resize to 64x64.
# Pixel values are normalized later, per method (StandardScaler for SVM, /255 for the CNN).

def load_images(path, label):
    images = []
    labels = []
    for img_path in glob.glob(os.path.join(path, "*.tif")):
        img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (64, 64))
        images.append(img)
        labels.append(label)
    return images, labels
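
As a quick usage check, the loader can be exercised on one class folder. This is purely illustrative and assumes the "weed" directory with .tif images is present, as in the experiments:

# Illustrative: load one class and inspect the output shapes.
imgs, labels = load_images("weed", 1)
print(len(imgs), imgs[0].shape)  # number of images and (64, 64) per image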

def run(method, sampling=0, crossvalidation=0, optm="SGD"):

    weed_images, weed_labels = load_images("weed", 1)
    non_weed_images, non_weed_labels = load_images("non_weed", 0)
    all_images = np.array(weed_images + non_weed_images)
    all_labels = np.array(weed_labels + non_weed_labels)

    X_train, X_test, y_train, y_test = train_test_split(all_images, all_labels, test_size=0.2, random_state=42)

    if method == "nb":
        start_time = time.time()
        X_train = X_train.reshape(X_train.shape[0], -1)
        X_test = X_test.reshape(X_test.shape[0], -1)
        if sampling != 0:
            resampling = Pipeline([('oversample', SMOTE()), ('undersample', RandomUnderSampler())])
            X_train, y_train = resampling.fit_resample(X_train, y_train)
        mtd = GaussianNB()
        mtd.fit(X_train, y_train)
        y_pred = mtd.predict(X_test)
        end_time = time.time()
        elapsed_time = end_time - start_time
        acc = accuracy_score(y_test, y_pred)

    elif method == "svm":
        start_time = time.time()
        X_train = X_train.reshape(X_train.shape[0], -1)
        X_test = X_test.reshape(X_test.shape[0], -1)
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)
        if crossvalidation != 0:
            param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01], 'kernel': ['linear', 'rbf']}
            mtd = GridSearchCV(SVC(), param_grid, cv=2, verbose=2)
            mtd.fit(X_train_scaled, y_train)
            best_svm = mtd.best_estimator_
            y_pred = best_svm.predict(X_test_scaled)
        else:
            # gamma is ignored by the linear kernel, so only C matters here.
            mtd = SVC(kernel='linear', C=1)
            mtd.fit(X_train_scaled, y_train)
            y_pred = mtd.predict(X_test_scaled)
            
        # Stop the timer before scoring so timing is measured the same way as in the NB branch.
        end_time = time.time()
        elapsed_time = end_time - start_time
        acc = accuracy_score(y_test, y_pred)
    
    elif method == "cnn":
        acc, elapsed_time = cnn(X_train, X_test, y_train, y_test, optm)
    else:
        raise ValueError(f"Unknown method: {method}")

    return acc, elapsed_time
# Build the Convolutional Neural Network

class WeedDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        img = self.data[index]
        img = torch.from_numpy(img).unsqueeze(0).float() / 255.0
        label = self.labels[index]
        return img, label
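
To see what the dataset wrapper yields, a minimal check with dummy arrays (purely illustrative) confirms the added channel dimension and the scaling to [0, 1]:

# Illustrative: one sample comes back as a (1, 64, 64) float tensor in [0, 1].
ds = WeedDataset(np.zeros((2, 64, 64), dtype=np.uint8), np.array([0, 1]))
img, label = ds[0]
print(img.shape, img.dtype)  # torch.Size([1, 64, 64]) torch.float32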

class BinaryClassifier(nn.Module):
    def __init__(self):
        super(BinaryClassifier, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 16 * 16, 256)
        self.fc2 = nn.Linear(256, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 32 * 16 * 16)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x
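
A quick shape check with a hypothetical dummy batch confirms the flattened size: a 64x64 input is halved by each of the two 2x2 poolings, leaving 32 channels of 16x16, i.e., 32 * 16 * 16 = 8192 features entering fc1:

# Dummy forward pass: 64x64 -> pool -> 32x32 -> pool -> 16x16, with 32 channels.
model = BinaryClassifier()
out = model(torch.randn(4, 1, 64, 64))
print(out.shape)  # torch.Size([4, 1]), one probability per image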

def cnn(X_train, X_test, y_train, y_test, optm="SGD"):
    train_dataset = WeedDataset(X_train, y_train)
    test_dataset = WeedDataset(X_test, y_test)
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = BinaryClassifier().to(device)
    criterion = nn.BCELoss()
    # Select the optimizer by name; SGD with momentum is the default.
    if optm == "Adam":
        optimizer = optim.Adam(model.parameters(), lr=0.001)
    else:
        optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    num_epochs = 30
    start_time = time.time()
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for images, labels in train_loader:
            images = images.to(device, dtype=torch.float)
            labels = labels.to(device, dtype=torch.float).view(-1, 1)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        # print(f"Epoch {epoch+1}/{num_epochs}, Loss: {running_loss/len(train_loader):.4f}")
    end_time = time.time()
    elapsed_time = end_time - start_time
    
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device, dtype=torch.float)
            labels = labels.to(device, dtype=torch.float).view(-1, 1)
            
            outputs = model(images)
            predicted = (outputs > 0.5).float()
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    acc_cnn = correct / total
    return acc_cnn, elapsed_time

Model training
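
The five configurations compared in the Results section can be produced by calling run() as sketched below. This is a minimal sketch of the training driver; the variable names are chosen to match the plotting code that follows:

# Train and evaluate each configuration; outputs feed the Results plots.
acc_nb, elapsed_time_nb = run("nb")                              # plain Gaussian NB
acc_nb_os, elapsed_time_nb_os = run("nb", sampling=1)            # NB with SMOTE + undersampling
acc_svm, elapsed_time_svm = run("svm")                           # linear SVM, fixed hyperparameters
acc_svm_cv, elapsed_time_svm_cv = run("svm", crossvalidation=1)  # SVM with grid-search cross-validation
acc_cnn, elapsed_time_cnn = run("cnn", optm="SGD")               # CNN trained with SGD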

Results

# Visualize the results

import matplotlib.pyplot as plt

methods = ['NB', 'NB with \n oversampling', 'SVM', 'SVM with \n cross validation', 'CNN']
accuracies = np.array([acc_nb, acc_nb_os, acc_svm, acc_svm_cv, acc_cnn]) * 100
times = np.array([elapsed_time_nb, elapsed_time_nb_os, elapsed_time_svm, elapsed_time_svm_cv, elapsed_time_cnn])

def add_value_labels(bars):
    # Annotate each bar with its value, placed just inside the top edge.
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}',
                 ha='center', va='top', fontsize=9)

bar_acc = plt.bar(methods, accuracies, color='g')
plt.yscale('linear')
plt.ylim([60,100])
plt.xlabel('Methods')
plt.ylabel('Accuracy (%)')
plt.title('Comparison of Accuracy for Different Methods')
add_value_labels(bar_acc)
plt.tight_layout()
plt.savefig('acc.png', dpi=300)
plt.show()

plt.figure()
bar_time = plt.bar(methods, times, color='c')
plt.yscale('log')
plt.xlabel('Methods')
plt.ylabel('Time (s)')
plt.title('Comparison of Computing Time for Different Methods')
add_value_labels(bar_time)
plt.tight_layout()
plt.savefig('time.png', dpi=300)
plt.show()

Discussion

In this project, I implemented several supervised learning techniques covered in class: Naive Bayes (NB) and Support Vector Machine (SVM) classifiers, together with oversampling, undersampling, and SVM kernel selection. Furthermore, I employed a Convolutional Neural Network (CNN), a popular classification method, and evaluated its performance against the aforementioned techniques. In this section, I examine the strengths and weaknesses of each approach for the task at hand, as well as their underlying assumptions.

In terms of computation, NB (which assumes all features are conditionally independent) relies primarily on probability calculations involving simple multiplication and addition. SVM (which assumes the data points can be separated into two classes) mainly consumes memory during the mapping process using kernel functions such as linear, polynomial, and Radial Basis Function (RBF) kernels. CNN calculations are considerably more complex, including convolutions (linear combinations), nonlinear activation functions, and pooling operations.
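
To make the complexity contrast concrete, a short parameter count of the CNN defined in the Methods section (an illustrative check, not part of the original experiments) shows that the first fully connected layer dominates the model size:

# Count the trainable parameters in the CNN defined above.
net = BinaryClassifier()
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(f"CNN trainable parameters: {n_params:,}")
# conv1: 1*16*3*3 + 16 = 160; conv2: 16*32*3*3 + 32 = 4,640
# fc1: 8192*256 + 256 = 2,097,408; fc2: 256*1 + 1 = 257  ->  ~2.1M total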

|                  | NB                       | SVM                                  | CNN                                    |
|------------------|--------------------------|--------------------------------------|----------------------------------------|
| Complexity       | Simple and fast to train | More complex                         | Highly complex and deep                |
| Interpretability | Relatively interpretable | Moderate interpretability            | Low interpretability                   |
| Scalability      | Highly scalable          | Scalable for moderate-sized datasets | Scalable but computationally expensive |

Here are some suitable applications for NB, SVM, and CNN in computer vision and related tasks, drawn from the references below:

  • NB: fast, simple baselines for classification with (approximately) independent features, such as text categorization [1, 2].
  • SVM: margin-based classification on engineered features, e.g., HOG descriptors for human detection [3, 4].
  • CNN: large-scale image classification [5], object detection and segmentation [6, 7, 8], and image-to-image translation [9].

Conclusion

In summary, SVM and CNN can be favorable options for binary image classification due to their ability to handle high-dimensional data. Although NB can train and perform inference rapidly (in less than 0.15 seconds), its accuracy is relatively low, falling below 70%. If a task requires high accuracy, NB may not be the most suitable choice. For instance, you wouldn’t want a self-driving car to have only 70% accuracy in object detection during driving. The results show that resampling methods significantly improve NB’s accuracy while only adding an extra 0.11 seconds, making it a viable candidate for simple image classification tasks.

On the other hand, SVM with cross-validation and CNN require substantial training time, taking 223.08 and 75.13 seconds, respectively. For complex tasks, it might be necessary to train these models in advance. Despite the high computing time and cost associated with CNN, its near 99.9% accuracy makes it indispensable for complicated tasks that demand precision.

The experiment in this project partially adheres to the FAIR principles, as it is somewhat findable and accessible. The dataset used in this project is publicly available on Kaggle, which enables researchers to access it with relative ease. However, to improve findability further, the data should be organized according to a standard directory layout. The dataset used in this project follows a "class-based directory structure":

dataset/
├── weed/
│   ├── image1.tif
│   ├── image2.tif
│   ├── ...
├── non_weed/
│   ├── image1.tif
│   ├── image2.tif
│   ├── ...

Furthermore, this project follows the FAIR principle of reusability. The methods used in the experiment, such as NB, SVM, and CNN, were implemented as functions, making it easy for other researchers to reuse and build upon the code. This approach facilitates the transparency and reproducibility of the experiment, which are important aspects of scientific research. By making the code reusable, other researchers can test and validate the methods and results, and potentially build upon them to generate new insights or applications.

Beyond weed classification, this work can support precision agriculture practices such as automatically localizing herbicide application targets with agricultural robots. By applying herbicide only where it is needed, this approach can reduce chemical usage and contribute to a more sustainable environment.

References

  1. Zhang, H. (2004). The optimality of naive Bayes. Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference.
  2. Kumar, P., & Gopal, M. (2009). A hybrid feature selection via mutual information for text categorization. Proceedings of the International Joint Conference on Neural Networks.
  3. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
  4. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 1, 886-893.
  5. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems.
  6. Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  7. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  8. He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision.
  9. Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.