Assessment item 2 – Project Proposal and Plan
Technology: Machine Learning
Technique: Machine Learning Algorithm
Domain: Performance Analysis of Supervised and Unsupervised Algorithms
Project Title: Performance Analysis of Supervised and Unsupervised Algorithms in an Anemia Dataset
Student Name | Santosh Dotel |
Student Id | 11710823 |
Performance Analysis of Supervised and Unsupervised Algorithms in an Anemia Dataset
Abstract
Anemia is a hematological disorder that has become a worldwide problem, particularly in children. This paper uses a dataset taken from a reliable secondary source and analyzes the performance of both supervised and unsupervised algorithms in terms of accuracy, precision and recall. Gaining insight from the data requires efficient analysis techniques: we need to identify which algorithms are best suited to our problem domain, and which have weaknesses with this kind of data that could reduce their effectiveness and produce misleading results. The collected dataset is filtered to remove undesirable variables and is then analyzed with selected algorithms using data mining tools in Python.
Keywords: Anemia, Supervised Algorithm, Unsupervised Algorithm, Dataset
Problem Domain
Anemia is considered one of the major global public health problems, mainly affecting young children and pregnant women (Meena et al., 2019). However, relatively little research has addressed early-stage anemia. Furthermore, little research has examined how the choice of algorithm affects the analysis of the data, and an unsuitable choice can lead to misleading results.
Purpose and Justification: The main purpose of this project is to test a dataset of anemia patients with machine learning algorithms to determine which ones perform better in terms of accuracy, precision and recall. This will be a research-based project that investigates these difficulties and examines case studies to see how their effects might be reduced.
Machine Learning Algorithms (MLAs) are widely used for the early detection of disease, which helps to extend the lifespan of elderly people and improve their quality of life. They have made it easier to correctly diagnose and identify many diseases, and predictive analysis using machine learning has made it easier to anticipate disease and treat patients (Sasikala et al., 2021).
Background Information:
According to a World Health Organization (WHO) report, 20% of people have anemia (Kilicarslan et al., 2021). Machine learning is an emerging field used mainly for predictive analysis, and machine learning and data mining techniques have been applied to anemia prediction many times before. For example, Sanap et al. (2011) predicted anemia with a Support Vector Machine and a C4.5 decision tree and compared the performance of the two algorithms.
Jaiswal et al. (2018) used three classifiers, Naïve Bayes, the C4.5 decision tree and random forest, to predict anemia; among them, Naïve Bayes gave the best performance, with an accuracy of 96%.
Research Questions:
The research questions for the current study are:
- How do supervised and unsupervised algorithms perform when applied to a dataset of anemia patients?
- Which factors are responsible for early-stage anemia?
Conceptual or Theoretical Framework
This project is mainly based on a conceptual framework.
Fig: Conceptual Framework
The steps of this framework are listed below:
- Step 1: Gap analysis and defining objectives
- Step 2: Data collection and pre-processing
- Step 3: Model Preparation
- Step 4: Evaluation and analysis of results
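Steps 3 and 4 rely on holding back part of the data so that each model is evaluated on records it was not trained on. The following is a minimal plain-Python sketch of such a hold-out split; the helper name `train_test_split`, the 25% test fraction and the fixed seed are illustrative assumptions rather than settings fixed by this proposal (scikit-learn provides a tested function with the same name).

```python
import random


def train_test_split(rows, labels, test_fraction=0.25, seed=42):
    """Shuffle row indices once and split rows and labels into train/test parts.

    Hypothetical helper for illustration; not the scikit-learn function.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


# Hypothetical example: eight rows with one feature each and binary labels.
rows = [[float(i)] for i in range(8)]
labels = [i % 2 for i in range(8)]
X_train, X_test, y_train, y_test = train_test_split(rows, labels)
print(len(X_train), len(X_test))  # → 6 2
```

Because the seed is fixed, every algorithm prepared in Step 3 can be compared on exactly the same test records in Step 4.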
Methodology:
- Analysis of information sources: The first step is to gather all relevant resources, journals and articles from academic sources such as international journal papers, books and publishers. This document uses the APA referencing style consistently throughout the project.
- Research Method: The research method of the current paper is based on a qualitative approach, in which data from other reputable sources will be analyzed and interpreted to provide general answers to the questions posed by the current research.
- Data Collection: The data will be collected from reliable and secure sources such as Kaggle, one of the largest communities of data scientists and machine learning experts in the world, which provides free datasets for academic use.
- Ethical Issues: Given the nature of the project, ethical issues should not be a concern, as all the information used is freely and readily available and no primary sources will be involved. The most important considerations will be referencing sources correctly and ensuring that all information is properly cited.
- Compliance Requirement: The project must comply with the project standards of the Australian Computer Society (ACS) as well as with government guidelines and expectations.
Some of the techniques and environments that can be used in this research are:
- Python Program:
Python is a programming language that is increasingly popular for data analysis. With libraries such as pandas, NumPy and Matplotlib, we can easily clean and mine the data and obtain good analytical results.
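The cleaning described in the abstract (filtering the dataset and removing undesirable variables) might look like the minimal sketch below. It uses only Python's standard `csv` module so that it runs without extra packages; the inline CSV text, its column names (`patient_id`, `gender`, `hemoglobin`, `mcv`, `result`) and the choice to drop incomplete rows are hypothetical stand-ins for the real Kaggle file.

```python
import csv
import io

# Hypothetical excerpt standing in for a downloaded Kaggle CSV file;
# column names and values are illustrative only.
RAW_CSV = """patient_id,gender,hemoglobin,mcv,result
1,0,14.9,83.2,0
2,1,9.1,,1
3,1,11.3,76.0,1
4,0,15.4,88.1,0
"""

# Variables assumed to be undesirable for modelling (identifiers carry no signal).
DROP_COLUMNS = {"patient_id"}


def load_and_clean(text, drop_columns=DROP_COLUMNS):
    """Parse CSV text, drop unwanted columns and incomplete rows, convert to floats."""
    cleaned = []
    for record in csv.DictReader(io.StringIO(text)):
        record = {k: v for k, v in record.items() if k not in drop_columns}
        if any(v is None or v.strip() == "" for v in record.values()):
            continue  # skip rows with missing values rather than guessing them
        cleaned.append({k: float(v) for k, v in record.items()})
    return cleaned


rows = load_and_clean(RAW_CSV)
print(len(rows))  # → 3 (the row with a missing mcv value is removed)
```

With pandas, the equivalent steps would typically be `pd.read_csv`, `DataFrame.drop(columns=...)` and `DataFrame.dropna()`.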
- Supervised Machine Learning:
A supervised algorithm is a machine learning approach that enables machines to identify, classify or predict objects or outcomes based on labelled data provided to them. In this project, classification algorithms are used to analyse the dataset.
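To illustrate classification from labelled data, the sketch below implements Gaussian Naive Bayes, one of the classifiers compared by Jaiswal et al. (2018), in plain Python. Modelling each feature as an independent Gaussian is an assumption of this particular variant, and the two features and toy values are hypothetical; in practice a tested implementation such as scikit-learn's `GaussianNB` would normally be preferred.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature means/variances from labelled rows."""
    grouped = defaultdict(list)
    for features, label in zip(X, y):
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        # A tiny constant keeps the variance positive if a feature is constant.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, features):
    """Return the class with the highest log posterior for a single row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, var in zip(features, means, variances):
            # Log of the Gaussian density, assuming features are independent.
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical toy data: [hemoglobin, mcv]; label 1 = anemic, 0 = not anemic.
X_train = [[14.9, 83.2], [15.4, 88.1], [13.8, 85.0],
           [9.1, 70.5], [11.3, 76.0], [10.2, 72.4]]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X_train, y_train)
print(predict_gaussian_nb(model, [10.0, 73.0]))  # → 1
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.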
- Unsupervised Machine Learning:
An unsupervised algorithm is another machine learning approach, used to analyse and cluster unlabelled datasets. Clustering algorithms can be used as the unsupervised algorithms in this research.
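k-means is one widely used clustering algorithm; whether it is the one finally chosen for this research is an assumption. The sketch below clusters unlabelled points and then, so that clustering can be compared with classification on accuracy, precision and recall, names each cluster after the majority true label among its members. The helper names and toy points are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k random rows as starting centroids
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def map_clusters_to_labels(assignment, labels):
    """Name each cluster after the most common true label among its members.

    Labels are used only here, after clustering, never by kmeans itself.
    """
    votes = {}
    for cluster, label in zip(assignment, labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


# Hypothetical, well-separated two-feature toy points with known labels for checking.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
true_labels = [0, 0, 0, 1, 1, 1]
assignment, _ = kmeans(points, k=2)
mapping = map_clusters_to_labels(assignment, true_labels)
predicted = [mapping[c] for c in assignment]
```

The labels are consulted only after clustering has finished, so the algorithm itself stays unsupervised; the majority-label mapping is just one common way to score clusters against known classes.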
- Evaluation Metrics
Along with the techniques above, this research uses evaluation metrics such as accuracy, precision and recall.
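The three metrics can be computed from counts of true positives (TP), false positives (FP) and false negatives (FN): accuracy is the fraction of all predictions that are correct, precision is TP / (TP + FP), and recall is TP / (TP + FN). The minimal sketch below assumes a binary task in which the anemic class is labelled 1 and treated as positive; the example labels and predictions are hypothetical.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary task (positive = anemic, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when nothing is predicted/labelled positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels and predictions for eight test patients.
metrics = evaluate([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0, 1, 0])
print(metrics)  # → {'accuracy': 0.625, 'precision': 0.6666666666666666, 'recall': 0.5}
```

Recall shows how many of the anemic patients a model actually detects, so it is worth reading alongside accuracy, especially when the classes are imbalanced.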
Project Plan
- Deliverables: The project's deliverables include an annotated bibliography summarising 12 journal articles related to anemia prediction using machine learning algorithms. Other deliverables include a journal paper and a report, and a seminar will be held to complete the project.
- Work Breakdown Structure (WBS): For the current project, the WBS is constructed as follows:
Gantt Chart
Risk Analysis: The following risk matrix shows the current risks for the project.
Risk ID | Description | Likelihood | Consequence | Treatment |
1 | Inadequate time management results in failure to submit deliverables on time | High | High | Stick to the project plan and schedule. Create and update a weekly progress report. |
2 | Lack of knowledge and experience in the domain | Medium | High | Read thoroughly and seek help or guidance from the lecturer or seniors. |
3 | No connection between research results and research questions | Medium | High | Study the methodology in depth so that it matches the purpose of the research questions. |
References
Jaiswal, M., Srivastava, A., & Siddiqui, T. (2018). Machine learning algorithms for anemia disease prediction. Lecture Notes in Electrical Engineering. https://doi.org/10.1007/978-981-13-2685-1_44
Kilicarslan, S., Celik, M., & Sahin, Ş. (2021). Hybrid models based on genetic algorithm and deep learning algorithms for nutritional anemia disease classification. Biomedical Signal Processing and Control, 63, 102231. https://doi.org/10.1016/j.bspc.2020.102231
Meena, K., Tayal, D., Gupta, V., & Fatima, A. (2019). Using classification techniques for statistical analysis of anemia. Artificial Intelligence in Medicine, 94. https://doi.org/10.1016/j.artmed.2019.02.005
Sanap, S., Nagori, M., & Kshirsagar, V. (2011). Classification of anemia using data mining techniques. Swarm, Evolutionary, and Memetic Computing. https://doi.org/10.1007/978-3-642-27242-4_14
Sasikala, N., Banu, G., Babiker, T., & Rajpoot, P. (2021). A role of data mining techniques to predict anemia disease. International Journal of Computer Applications, 174(20). https://doi.org/10.5120/ijca2021921090