[24]
What do you mean by classification?
Process of arranging data into homogeneous (similar) groups according to their common characteristics. Raw data cannot be easily understood, and it is not fit for further analysis and interpretation. Arrangement of data helps users in comparison and analysis. For example, the population of a town can be grouped according to sex, age, marital status, etc.
“Classification is the process of arranging data into sequences according to their common characteristics or separating them into different related parts.”
– Horace Secrist
Classification is a supervised machine learning method where the model tries to predict the correct label of a given input data. In classification, the model is fully trained using the training data, and then it is evaluated on test data before being used to perform prediction on new unseen data. For instance, an algorithm can learn to predict whether a given email is spam or ham (no spam).
Lazy Learners Vs. Eager Learners
Eager learners are machine learning algorithms that first build a model from the training dataset before making any prediction on future datasets. They spend more time during the training process because of their eagerness to have a better generalization during the training from learning the weights, but they require less time to make predictions. Most machine learning algorithms are eager learners, like, Logistic Regression, Support Vector Machines, Decision Trees and Artificial Neural Networks.
Lazy learners or instance-based learners, on the other hand, do not create any model immediately from the training data, and this is where the lazy aspect comes from. They just memorize the training data, and each time there is a need to make a prediction, they search for the nearest neighbor from the whole training data, which makes them very slow during prediction. Some examples of this kind are K-Nearest Neighbor and Case-based reasoning.
Classification in Real Life
Healthcare: Training a machine learning model on historical patient data can help healthcare specialists accurately analyze their diagnoses For example, during the COVID-19 pandemic, machine learning models were implemented to efficiently predict whether a person had COVID-19 or not.
Education: Education is one of the domains dealing with the most textual, video, and audio data. This unstructured information can be analyzed with the help of Natural Language technologies to perform different tasks such as classification of documents per category.
Sustainable agriculture: Agriculture is one of the most valuable pillars of human survival. Introducing sustainability can help improve farmers' productivity at a different level without damaging the environment. By using classification models to predict which type of land is suitable for a given type of seed.
Different Types of Classification
Binary Classification
The goal is to classify the input data into two mutually exclusive categories. The training data in such a situation is labeled in a binary format: true and false; positive and negative; O and 1; spam and not spam, etc. depending on the problem being tackled. For instance, we might want to detect whether a given image is a truck or a boat. Logistic Regression and Support Vector Machines algorithms are natively designed for binary classifications. However, other algorithms such as K-Nearest Neighbors and Decision Trees can also be used for binary classification.
Multi-Class Classification
The multi-class classification, on the other hand, has at least two mutually exclusive class labels, where the goal is to predict to which class a given input example belongs to. In the following case, the model correctly classified the image to be a plane. Most of the binary classification algorithms can be also used for multi-class classification. These algorithms include but are not limited to Random Forest, Naive Bayes, K-Nearest Neighbors, Gradient Boosting, Support Vector Machines and Logistic Regression.
One-versus-one
This strategy trains as many classifiers as there are pairs of labels. If we have a 3-class classification, we will have three pairs of labels, thus three classifiers. For N labels, we will have N x (N-1)/2
classifiers. Each classifier is trained on a single binary dataset, and the final class is predicted by a majority vote between all the classifiers. One-vs-one approach works best for SVM and other kernel-based algorithms.
One-versus-rest
At this stage, we start by considering each label as an independent label and consider the rest combined as only one label. With 3-classes, we will have three classifiers. In general, for N labels, we will have N
binary classifiers.
Multi-Label Classification
In multi-label classification tasks, we try to predict 0 or more classes for each input example. In this case, there is no mutual exclusion because the input example can have more than one label. Such a scenario can be observed in different domains, such as auto-tagging in Natural Language Processing, where a given text can contain multiple topics.
It is not possible to use multi-class or binary classification models to perform multi-label classification. However, most algorithms used for those standard classification tasks have their specialized versions for multi-label classification. We can cite: Multi-label Decision Trees, Multi-label Gradient Boosting and Multi-label Random Forests.
Imbalanced Classification
For the imbalanced classification, the number of examples is unevenly distributed in each class, meaning that we can have more of one class than the others in the training data. Let’s consider the following 3-class classification scenario where the training data contains: 60% of trucks, 25% of planes, and 15% of boats. The imbalanced classification problem could occur in the following scenario:
Fraudulent transaction detections in financial industries
Rare disease diagnosis
Customer churn analysis
Using conventional predictive models such as Decision Trees, Logistic Regression, etc. could not be effective when dealing with an imbalanced dataset, because they might be biased toward predicting the class with the highest number of observations, and considering those with fewer numbers as noise.