Assessment item 2 – Project Proposal and Plan

Technology: Machine Learning

Technique: Machine Learning Algorithm

Domain: Performance Analysis of Supervised and Unsupervised Algorithms

Project Title: Performance Analysis of Supervised and Unsupervised Algorithms in Anemia Dataset

Student Name	Santosh Dotel
Student Id	11710823

Performance Analysis of Supervised and Unsupervised Algorithms in Anemia Dataset

Abstract

Anemia is a hematological disease, and in children it has become a worldwide problem. This paper takes a dataset from a reliable secondary source and analyzes how supervised and unsupervised algorithms perform on it in terms of accuracy, precision and recall. To draw insight from the data, we need efficient analysis techniques: we must identify which algorithms are best suited to our problem domain and which have weaknesses on this kind of data that could reduce effectiveness and produce misleading results. The collected dataset is filtered to remove undesirable variables and is then run through selected algorithms using data mining tools or Python.

Keywords: Anemia, Supervised Algorithm, Unsupervised Algorithm, Dataset

Problem Domain

Anemia is considered one of the global public health problems, mainly affecting young children and pregnant women (Meena et al., 2019). However, little research has addressed anemia at an early stage. Furthermore, little research has examined how the algorithms we use affect the analysis of the data, and an unsuitable choice can produce misleading results.

Purpose and Justification: The main aim of this project is to test machine learning algorithms on a dataset of anemia patients to determine which gives the better analysis in terms of precision, accuracy and recall. This will be a research-based project that looks into these difficulties and examines case studies to see how their effects might be reduced.

Machine Learning Algorithms (MLA) are widely used for early detection of disease, which helps to extend the life span of elderly people and improve their quality of life. They have made it easier to diagnose and identify many diseases correctly, and predictive analysis using machine learning has helped to anticipate disease better and treat patients (Sasikala et al., 2021).

Background Information:

According to a World Health Organization (WHO) report, 20% of people have anemia (Kilicarslan et al., 2021). Machine learning is an emerging field mainly used for predictive analysis. Machine learning and data mining techniques have been used many times for anemia prediction. For example, Sanap et al. (2011) predicted anemia and compared the performance of two algorithms, Support Vector Machine and the C4.5 decision tree.

Three classifiers, Naïve Bayes, C4.5 decision tree and random forest, have also been used to predict anemia; among them, the Naïve Bayes classifier gave the best performance, with an accuracy of 96% (Jaiswal et al., 2018).

Research Questions:

Some of the research questions that have been put together for the current research are:

How do supervised and unsupervised algorithms perform on a dataset of anemia patients?
Which factors are responsible for early-stage anemia?

Conceptual or Theoretical Framework

This project is mainly based on a conceptual framework.

Fig: Conceptual Framework

The steps performed in this framework are listed below:

Step 1: Gap analysis and defining objectives
Step 2: Data collection and pre-processing
Step 3: Model Preparation
Step 4: Evaluation and analysis of results

Methodology:

Analysis of source of information: The first step is to gather all relevant resources, journals and articles from academic sources. These can include international journal papers, books and other academic publications. The present document uses APA referencing style, applied consistently throughout the project.
Research Method: The current paper’s research method will be based on a qualitative approach, in which data from other reputable sources will be analyzed and interpreted to provide general answers to the research questions.
Data Collection: The data will be collected from a reliable and secure source such as Kaggle. Kaggle is one of the largest communities of data scientists and machine learning experts in the world, and it provides free datasets that can be used for academic purposes.
Ethical Issues: Because of the project’s nature, ethical issues should not be a concern: all the information used is freely and readily available, and the project will not deal with any primary sources. The most important consideration will be ensuring that all sources are referenced and cited correctly.
Compliance Requirement: The current project must comply with the project standards of the Australian Computer Society (ACS) and with government guidelines and expectations.

Some of the techniques and environments that can be used in this research are:

Python Program:

Python is a programming language that is becoming increasingly popular for data analysis. With libraries such as pandas, NumPy and Matplotlib, we can clean and mine the data easily and obtain good results in data analysis.
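The filtering step described in the abstract (removing undesirable variables before analysis) could look roughly like the sketch below. It uses only the Python standard library so that it runs without extra installs; in practice pandas offers the same operations through DataFrame.drop and DataFrame.dropna. The column names and values are invented for illustration and are not taken from the project dataset.

```python
import csv
import io

# Hypothetical sample; column names and values are invented for illustration.
RAW_CSV = """patient_id,age,hemoglobin,mcv,anemic
1,4,9.8,70.1,1
2,6,,82.0,0
3,5,12.9,85.3,0
4,3,8.7,68.4,1
"""

def clean_rows(text, drop_columns=("patient_id",), label="anemic"):
    """Drop unwanted columns and incomplete rows; convert values to numbers."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        if any(v.strip() == "" for v in kept.values()):
            continue  # skip rows with a missing measurement
        features = {k: float(v) for k, v in kept.items() if k != label}
        cleaned.append((features, int(kept[label])))
    return cleaned

rows = clean_rows(RAW_CSV)
for features, label in rows:
    print(features, label)
```

Each cleaned row is a pair of a feature dictionary and an integer label. The second patient is dropped because the hemoglobin value is missing, and the identifier column is removed because it carries no diagnostic information.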

Supervised Machine Learning:

A supervised algorithm is a machine learning approach that enables machines to classify or predict objects or outcomes based on labelled data provided to the machine. For this project, classification algorithms are used to analyse the dataset.
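As an illustration of classification on labelled data, the following is a minimal sketch of a Gaussian Naïve Bayes classifier, one of the classifier families mentioned in the background section. It is written from scratch for clarity and is not the implementation used in the cited studies. The feature choice (hemoglobin and MCV) and all values are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means and feature variances for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small term avoids division by zero
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, features):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for x, m, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical training data: [hemoglobin g/dL, MCV fL]; 1 = anemic, 0 = not anemic.
X_train = [[9.8, 70.1], [8.7, 68.4], [10.1, 72.0],
           [12.9, 85.3], [13.4, 88.0], [12.5, 84.1]]
y_train = [1, 1, 1, 0, 0, 0]

model = fit_gaussian_nb(X_train, y_train)
predictions = [predict(model, x) for x in [[9.0, 69.0], [13.0, 86.0]]]
print(predictions)
```

The labels in y_train are what make this supervised: the model learns the statistics of each class from examples whose diagnosis is already known, then assigns new samples to the more probable class.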

Unsupervised Machine Learning:

An unsupervised algorithm is another machine learning approach, used to analyze and cluster unlabelled datasets. Clustering algorithms can be used in this research as the unsupervised algorithms.
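Clustering of unlabelled data can be sketched with k-means, one common clustering algorithm; the proposal does not fix a specific one, so this choice is an assumption. The version below uses the first k points as initial centroids, a simplification that keeps the result deterministic, and the measurements are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm with the first k points as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Hypothetical unlabelled measurements: [hemoglobin g/dL, MCV fL].
points = [[9.8, 70.1], [12.9, 85.3], [8.7, 68.4],
          [13.4, 88.0], [10.1, 72.0], [12.5, 84.1]]
labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```

No diagnosis is supplied here; the algorithm groups patients only by the similarity of their measurements. To score such clusters with accuracy, precision and recall, each cluster would first have to be mapped to a class, for example by majority vote against known labels. Because the features are not scaled in this sketch, the one with the larger numeric range (MCV) dominates the distance.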

Parameters

Along with the other techniques, evaluation parameters such as accuracy, precision and recall are used in this research.
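These three parameters follow from counts of true positives (TP), false positives (FP) and false negatives (FN): accuracy is the share of all predictions that are correct, precision is TP / (TP + FP), and recall is TP / (TP + FN). A minimal sketch, treating "anemic" (label 1) as the positive class and using made-up labels, is:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels for illustration: 1 = anemic, 0 = not anemic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example the model is right on 5 of 8 patients (accuracy 0.625), 2 of its 3 positive predictions are correct (precision about 0.667), and it finds 2 of the 4 anemic patients (recall 0.5). In a screening context recall is often the more important figure, because a missed anemia case is usually costlier than a false alarm.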

Project Plan

Deliverables: The present project’s deliverables include an annotated bibliography containing concise information from 12 journal articles related to anemia prediction using machine learning algorithms. Other deliverables include a journal paper, a report and, finally, a seminar presented on completion of the project.
Work Breakdown Structure (WBS): For the current project, the WBS is constructed as follows:

Gantt Chart

Risk Analysis: The following risk matrix shows the current risks for the project.

Risk ID	Description	Likelihood	Consequence	Treatment
1	Inadequate time management results in failure to submit deliverables on time	High	High	Stick to the project plan and schedule. Create and update a weekly progress report.
2	Lack of knowledge and experience in the domain	Medium	High	Read thoroughly and seek help or guidance from the lecturer or a senior.
3	No connection between research results and research questions	Medium	High	Study the methodology in depth so that it matches the purpose of the research questions.

References

Jaiswal, M., Srivastava, A., & Siddiqui, T. (2018). Machine learning algorithms for anemia disease prediction. Lecture Notes in Electrical Engineering. https://doi.org/10.1007/978-981-13-2685-1_44

Kilicarslan, S., Celik, M., & Sahin, Ş. (2021). Hybrid models based on genetic algorithm and deep learning algorithms for nutritional anemia disease classification. Biomedical Signal Processing and Control, 63, 102231. https://doi.org/10.1016/j.bspc.2020.102231

Meena, K., Tayal, D., Gupta, V., & Fatima, A. (2019). Using classification techniques for statistical analysis of anemia. Artificial Intelligence in Medicine, 94. https://doi.org/10.1016/j.artmed.2019.02.005

Sanap, S., Nagori, M., & Kshirsagar, V. (2011). Classification of anemia using data mining techniques. Swarm, Evolutionary, and Memetic Computing. https://doi.org/10.1007/978-3-642-27242-4_14

Sasikala, N., Banu, G., Babiker, T., & Rajpoot, P. (2021). A role of data mining techniques to predict anemia disease. International Journal of Computer Applications, 174(20). https://doi.org/10.5120/ijca2021921090