Understanding and Implementation of Multilayer Perceptrons (MLP)

This blog provides an in-depth look at the R&D phase of the Multilayer Perceptron (MLP), along with detailed insights from its training and testing phases.

In the realm of artificial intelligence, neural networks have become a cornerstone for solving complex problems. One such neural network that has garnered significant attention is the Multilayer Perceptron (MLP).

After creating the dataset, we need to have a clear understanding of MLPs. But before we dive into the technical details, let’s first get acquainted with what an MLP truly is.

A Multilayer Perceptron is a type of neural network composed of multiple layers, including an input layer, one or more hidden layers, and an output layer. Each of these layers contains neurons, the fundamental building blocks of the network. MLPs are renowned for their versatility and are widely used in tasks such as forecasting and image pattern recognition [1].

Exploring the Architecture and Functioning of a Multilayer Perceptron (MLP)

A Multilayer Perceptron (MLP) is a type of artificial neural network commonly used for tasks such as regression and classification. It comprises multiple layers of neurons, each playing a crucial role in processing data and making predictions.

Architecture:

  • Input Layer: The neurons in the input layer receive features directly from the dataset, with each neuron corresponding to a specific feature.
  • Hidden Layers: Positioned between the input and output layers, hidden layers consist of neurons whose number is a tunable hyperparameter. These layers are vital for detecting complex patterns within the data.
  • Output Layer: This final layer generates predictions. The number of neurons here depends on the specific task—whether it’s binary classification (one neuron), multi-class classification (one neuron per class), or regression (one neuron for continuous output).
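
To make this architecture concrete, here is a minimal sketch of how such a network can be declared with scikit-learn's MLPClassifier. The hidden-layer sizes and other settings below are illustrative assumptions, not the values used in this project:

from sklearn.neural_network import MLPClassifier

# Hypothetical configuration: two hidden layers of 100 and 50 neurons.
# scikit-learn infers the input size from the feature columns and the
# output size from the number of distinct class labels passed to fit().
mlp = MLPClassifier(
    hidden_layer_sizes=(100, 50),  # tunable hyperparameter: size of each hidden layer
    activation="relu",             # non-linearity applied in the hidden layers
    solver="adam",                 # gradient-based optimiser used during backpropagation
    max_iter=500,                  # upper bound on training epochs
    random_state=42,
)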

How MLP Works:

  • Initialization: The process begins by assigning small random values to the weights and biases.
  • Forward Propagation: As data flows through the network, neurons apply activation functions to process and transmit weighted inputs, introducing non-linearity to the model.
  • Loss Calculation: A loss function (such as Mean Squared Error for regression or Cross-Entropy for classification) compares the model's output with the target values, calculating the error.
  • Backpropagation: The model then adjusts its weights and biases to minimize this error. This is done by computing gradients and updating parameters through optimization techniques like Gradient Descent.
  • Training: This process is repeated over many iterations, or epochs, until the model converges and accurately maps inputs to outputs.
  • Prediction: Once trained, the MLP uses the fine-tuned weights and biases to make predictions on new, unseen data.
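
To tie these steps together, below is a toy end-to-end sketch in NumPy: one hidden layer, sigmoid activations, and a mean-squared-error loss trained on the XOR problem. It is purely illustrative of the mechanics listed above, not the architecture, loss, or code used in this project.

import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # targets (XOR)

# 1. Initialization: small random weights, zero biases
W1 = rng.normal(0.0, 0.5, (2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(0.0, 0.5, (4, 1)); b2 = np.zeros((1, 1))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for epoch in range(20000):                       # 5. Training loop over epochs
    # 2. Forward propagation
    h = sigmoid(X @ W1 + b1)                     # hidden-layer activations
    out = sigmoid(h @ W2 + b2)                   # network output

    # 3. Loss calculation (mean squared error)
    loss = np.mean((out - y) ** 2)

    # 4. Backpropagation: gradients of the loss w.r.t. every parameter
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2 = h.T @ d_out; db2 = d_out.sum(axis=0, keepdims=True)
    d_h = (d_out @ W2.T) * h * (1 - h)
    dW1 = X.T @ d_h;   db1 = d_h.sum(axis=0, keepdims=True)

    # Gradient Descent update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

    if epoch % 5000 == 0:
        print(f"epoch {epoch}: loss {loss:.4f}")

# 6. Prediction with the trained weights
print(np.round(out, 2))  # should approach [0, 1, 1, 0]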

Training is carried out by the code available here: https://github.com/MEGHA-MURALI/MSc_Robotics_Final_Year_Project/blob/65231ad3438f8f1d5fe8d8167f7eafc61f2ba1d8/training.ipynb
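
For readers who do not want to open the notebook, the core of such a training run with scikit-learn typically boils down to a few lines like the sketch below. The file name, column layout, and hyperparameters here are illustrative assumptions rather than details taken from the notebook; the reported "model scores" in the next section are the mean accuracies returned by model.score on the training and test splits.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Assumed dataset layout: one row per sample, a 'label' column holding the
# gesture name, and the remaining columns holding the hand-landmark coordinates.
data = pd.read_csv("hand_landmarks.csv")          # hypothetical file name
X = data.drop(columns=["label"]).values
y = data["label"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42)
model.fit(X_train, y_train)

print("Model Score on Training Data:", model.score(X_train, y_train))
print("Model Score on Test Data:", model.score(X_test, y_test))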

Training Results: Evaluating MLP Performance on Different Datasets

In our exploration of Multilayer Perceptrons (MLPs), we trained the model on three distinct datasets, each varying in complexity. These datasets consisted of hand landmark data from different numbers of individuals—ranging from a single person to four individuals. Below, we present the training and testing results for each case.

Case 1: Single Person's Hand Landmark Data

Model Score on Training Data: 0.9998

Model Score on Test Data: 0.9986

In this scenario, where the dataset contained hand landmark data from just one individual, the MLP model performed exceptionally well. The near-perfect scores on both training and test data suggest that the model was able to effectively learn and generalize from the limited dataset.

Case 2: Two Persons' Hand Landmark Data

Model Score on Training Data: 0.9979

Model Score on Test Data: 0.9941

When the dataset was expanded to include hand landmark data from two individuals, the model still maintained high performance, though there was a slight drop in accuracy. This indicates that as the complexity of the dataset increased, the model's ability to generalize slightly diminished, but it remained robust.

Case 3: Four Individuals' Hand Landmark Data

Model Score on Training Data: 0.9983

Model Score on Test Data: 0.9976

In the most complex scenario, with data from four different individuals, the model continued to perform at a high level. The slight variation in scores between training and test data suggests that the model was able to adapt well to the increased diversity in the dataset.

Training Inference:

Across all three cases, the MLP model demonstrated strong performance, achieving near-perfect scores on both training and testing datasets. This suggests that the model is highly effective at learning from and generalizing across datasets of varying complexity. As the complexity of the data increased, the model's performance showed only minor fluctuations, and the consistently small gap between training and test scores indicates that it is less overfitted than the initial models.

Testing Insights:

Test code available here: https://github.com/MEGHA-MURALI/MSc_Robotics_Final_Year_Project/blob/main/QuickTest.py
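
As a rough idea of what such a quick test involves, the sketch below loads a previously saved model and classifies a single landmark vector. The file name, the use of joblib, and the 63-value feature layout (21 landmarks x 3 coordinates) are assumptions for illustration, not details taken from QuickTest.py.

import joblib
import numpy as np

# Assumed artefact: the trained MLP saved earlier with joblib.dump(model, "mlp_gestures.joblib")
model = joblib.load("mlp_gestures.joblib")

# Hypothetical input: one flattened hand-landmark vector (21 landmarks x 3 coordinates)
landmarks = np.random.rand(1, 63)

predicted_gesture = model.predict(landmarks)[0]
confidence = model.predict_proba(landmarks).max()
print(f"Predicted gesture: {predicted_gesture} (confidence {confidence:.2f})")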

As part of our testing phase, we evaluated the performance of our Multilayer Perceptron (MLP) model on hand gesture data from a single person, two individuals, and four individuals. The results highlight various challenges and improvements in recognizing specific gestures as the complexity of the dataset increases. Below is a detailed breakdown of the testing outcomes:

Case 1: Single Person's Hand Gesture Data

• Alif Gesture: Detecting the 'Alif' gesture proved to be challenging, particularly when the gesture was close to other similar signs. The model frequently confused 'Alif' with several other gestures, including 'Ghayn,' 'Ayn,' 'Kha,' 'Jim,' 'Ta,' and 'Sad.' The high level of confusion indicates that the model struggled to distinguish 'Alif' from other similar gestures in this dataset.

• Waw Gesture: The 'Waw' gesture was often misidentified as 'Jim' in this case, suggesting difficulties in distinguishing these two gestures when trained on data from only one person.

• Ba Gesture: There was also confusion between the 'Ba' sign and the numeric gesture '1', highlighting another area where the model struggled with accuracy.

• Numeric Gestures: The number '2' was often confused with the 'Ta' and 'Taa' gestures, further demonstrating the model's challenges in distinguishing between gestures with similar visual characteristics.

Case 2: Two Persons' Hand Gesture Data

• Alif Gesture: The second case also encountered difficulties with the 'Alif' gesture, though the level of confusion was somewhat reduced compared to the first case.

• Waw Gesture: Interestingly, the confusion between 'Waw' and 'Jim' observed in the first case was not present in this dataset, indicating an improvement in distinguishing between these two gestures.

• Numeric Gestures: The confusion between the number '2' and the 'Ta' and 'Taa' gestures persisted, though to a lesser extent.

Case 3: Four Persons' Hand Gesture Data

• Alif Gesture: The third case showed significant improvement in detecting the 'Alif' gesture, with much less confusion compared to the first two cases. This suggests that increasing the diversity in the training data helped the model better distinguish between similar gestures.

• Ba Gesture: There was a noticeable reduction in confusion between the 'Ba' sign and the numeric gesture '1', though some instances of misclassification still occurred.

• Numeric Gestures: While there was still occasional confusion between the number '2' and the 'Ta'/'Taa' gestures, the frequency of these errors decreased, indicating better overall accuracy.
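
One way to quantify the kinds of confusion described above is a confusion matrix over the held-out test set. Continuing from the hypothetical training sketch earlier (reusing its model, X_test, and y_test), a minimal sketch could look like this:

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Predictions on the held-out test split from the earlier (assumed) training sketch
y_pred = model.predict(X_test)

# Rows are the true gestures, columns the predicted ones; off-diagonal cells show
# exactly which signs (e.g. 'Alif' vs 'Ayn', or '2' vs 'Ta'/'Taa') get confused.
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)
ConfusionMatrixDisplay(cm, display_labels=model.classes_).plot(xticks_rotation=90)
plt.show()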

Test Results:

Conclusion:

Testing the MLP model across these different datasets revealed key insights into its performance. While the model initially struggled with certain gestures, such as 'Alif' and 'Waw,' the introduction of more diverse training data led to marked improvements in accuracy. However, some challenges remain, particularly in distinguishing between similar gestures and numeric signs. These findings suggest that hyperparameter tuning could play an important role in improving accuracy further. I'll be sharing updates on hyperparameter tuning in the upcoming blog.

Reference:

[1] K. Y. Chan et al., “Deep neural networks in the cloud: Review, applications, challenges and research directions,” Neurocomputing, vol. 545, Aug. 2023, doi: 10.1016/j.neucom.2023.126327.

[2] “Classification Using Sklearn Multi-layer Perceptron - GeeksforGeeks,” 2023. https://www.geeksforgeeks.org/classification-using-sklearn-multi-layer-perceptron/ (accessed Aug. 17, 2024).
