I have found all twelve research papers and completed their summaries. The details are given below.
- Kaur, H., & Sharma, A. (2016). Improved Email Spam Classification Method Using Integrated Particle Swarm Optimization and Decision Tree. IEEE Xplore, 516-521.
The proposed hybrid technique integrates particle swarm optimization (PSO) with a Decision Tree algorithm. It is an unsupervised machine learning methodology that classifies spam and non-spam mail from the Spambase email dataset. The J48 algorithm, K-means clustering, and a support vector machine are also used in the proposed system to evaluate the final output.
The study obtained the Spambase dataset from the UCI Machine Learning Repository. It has an adequate number of instances with attributes in discrete format, which improves the accuracy of the study. An advanced particle swarm optimization algorithm and J48 were used with and without supervised learning to calculate the final result on the dataset and identify spam. In addition, unsupervised filtering was used as a pre-processing step when classifying email to achieve higher accuracy. According to the authors, the proposed method classifies email into spam and non-spam with an accuracy of 98.32%.
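The summary does not say how the authors couple PSO to the decision tree, so the following is only a minimal sketch of the generic PSO update loop, written in plain Python. The objective here is a toy function (`sphere`); in a spam setting it would presumably be replaced by something like a classifier's error rate, which is an assumption and not taken from the paper. The function name `pso_minimize` and all parameter values (`w`, `c1`, `c2`, swarm size) are illustrative choices.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Generic particle swarm optimization (illustrative parameters)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random starting positions, zero starting velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position so far.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # The swarm shares one global best.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso_minimize(sphere, dim=2)
print(best_pos, best_val)
```

To use PSO for feature selection or for tuning a tree, one would swap `sphere` for a function that trains and scores the classifier on a candidate particle position; that coupling is not described in the summary and is left out here.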
- Singh, M., Pamula, R., & Shekhar, S. K. (2018). Email Spam Classification by Support Vector Machine. IEEE Xplore, 878-882.
This paper proposes a system to evaluate the performance of non-linear SVM-based machine learning spam classifiers using two separate kernel functions: the Gaussian kernel and the linear kernel. The ‘SpamAssassin Public Corpus’ dataset is used for training. The dataset is analyzed with both kernels to determine which achieves higher accuracy in the training and testing steps. In the final stage, the researchers used a Gmail inbox and its spam collector to classify spam with the proposed system.
In this study, two separate datasets, spamTrain.mat and spamTest.mat, are used in processing to increase accuracy. Processing is carried out using predefined processing functions. After training, ‘Real Time Spam Prediction’ performs spam identification using binary classification. The proposed system stores each email sample in ‘.txt’ format, which can be easily deployed to an email service provider's website. This makes it possible to classify incoming spam addressed to a specific email ID. According to the final findings, training takes longer with the Gaussian kernel than with the linear kernel, but both kernels reach the same accuracy. Nevertheless, the authors consider the Gaussian kernel the more advanced and better-fitting choice for the proposed project because the dataset used is large.
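The two kernels compared in this paper can be written down in a few lines. The sketch below uses the standard textbook definitions, not code from the paper: the linear kernel is a dot product, and the Gaussian kernel is exp(-||x - z||^2 / (2 sigma^2)). The bandwidth parameter `sigma` and the sample vectors are assumptions made for illustration.

```python
import math


def linear_kernel(x, z):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: similarity decays with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Hypothetical feature vectors (e.g. word-presence counts for two emails).
x = [1.0, 2.0, 1.0]
z = [0.0, 4.0, -1.0]

print(linear_kernel(x, z))
print(gaussian_kernel(x, z, sigma=3.0))
print(gaussian_kernel(x, x))  # identical vectors are maximally similar
```

The Gaussian kernel adds a tunable bandwidth and an exponential per pair of samples, whereas the linear kernel needs only a dot product.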
- Swetha, S. M., & Sarraf, G. (2019). Spam Email and Malware Elimination employing various Classification Techniques. IEEE Xplore, 140-145.
The research focuses on eliminating spam using a supervised machine learning classification method based on binary signature analysis. The authors compare ten different machine learning classification algorithms, including Support Vector Machine, Decision Tree, k-Nearest Neighbors, and Naïve Bayes. The algorithms are trained on pre-labeled data, and the accuracy of each classifier is computed on a set of novel data.
In this research, all the algorithms are trained on pre-labeled datasets. The first is a partially processed text dataset containing 19,000 ham and 26,000 spam messages. The second is a malicious file dataset of about 11,000 MB, consisting of 16,000 malicious and 9,000 legitimate files. The novel dataset is examined by all ten algorithms separately using 32 pre-identified parameters. Multiple executions were carried out with every classifier to obtain higher accuracy, and the success level varied across classifiers. SVM achieved higher accuracy than the other classifiers for both text and file classification, about 99% according to the authors.
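The evaluation step described here, scoring several trained classifiers on the same unseen data and comparing them, can be sketched as follows. The classifiers and the held-out messages are hypothetical toy stand-ins, not the paper's ten algorithms or its datasets; only the accuracy calculation (correct predictions divided by total samples) is the standard definition.

```python
def accuracy(classifier, samples, labels):
    """Fraction of held-out samples the classifier labels correctly."""
    correct = sum(1 for s, y in zip(samples, labels) if classifier(s) == y)
    return correct / len(labels)


def rank_classifiers(classifiers, samples, labels):
    """Score every classifier on the same data, best first."""
    scores = {name: accuracy(clf, samples, labels)
              for name, clf in classifiers.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical classifiers: any callable mapping a message to a label.
toy_classifiers = {
    "keyword_rule": lambda text: "spam" if "free" in text.lower() else "ham",
    "always_ham": lambda text: "ham",
}

# Hypothetical held-out ("novel") messages with known labels.
held_out_texts = ["Free prize inside", "Meeting at noon",
                  "Claim your FREE gift", "Lunch tomorrow?"]
held_out_labels = ["spam", "ham", "spam", "ham"]

ranking = rank_classifiers(toy_classifiers, held_out_texts, held_out_labels)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Scoring every classifier on the same held-out set keeps the comparison fair, since none of them has seen those samples during training.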