Workflow to Build a Machine Learning Algorithm
Earlier we discussed Machine Learning and its types. In this post we discuss the workflow for building a model and for working with historical data (data preprocessing). Before building a model, we need to transform the data from an unstructured form (incomplete, inconsistent, or lacking clear trends) into a structured form. Most of the time we gather data from different sources, so it arrives in different formats, which makes it unsuitable for analysis and prediction as it stands.
Data goes through a series of steps during preprocessing, including:
· Data Cleaning
· Data Integration
· Data Transformation
· Data Reduction
Data Cleaning:
Data can be cleaned by filling in missing values (there are several imputation techniques for this), smoothing noisy data, and removing unused columns such as ID columns.
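
The three cleaning operations can be sketched in a few lines. This is a minimal illustration in plain Python so that it runs without extra libraries; in practice a library such as pandas would usually do this work. The records, the column names, mean imputation, and the moving-average smoother are all assumptions chosen for the example, not the only options.

```python
from statistics import mean

# Hypothetical raw records: "id" carries no signal, and None marks a missing value.
rows = [
    {"id": 1, "age": 25, "income": 40000},
    {"id": 2, "age": None, "income": 52000},
    {"id": 3, "age": 35, "income": None},
    {"id": 4, "age": 30, "income": 61000},
]

def drop_columns(rows, columns):
    """Return copies of the rows without the listed columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

def impute_mean(rows, column):
    """Fill missing values (None) in one column with the mean of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]

def moving_average(values, window=3):
    """Smooth a noisy sequence with a trailing moving average."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

cleaned = drop_columns(rows, {"id"})
cleaned = impute_mean(cleaned, "age")
cleaned = impute_mean(cleaned, "income")
```

Mean imputation is only one choice; median or model-based imputation may suit skewed data better.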
Data Integration:
Data from different sources is combined in one place.
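
As a rough sketch of integration, the snippet below joins records from two sources on a shared key. The two sources (a CRM export and a billing export), the `customer_id` key, and the `integrate` helper are hypothetical names introduced only for illustration.

```python
# Hypothetical sources: a CRM export and a billing export that share "customer_id".
crm = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Chen"},
]
billing = [
    {"customer_id": 1, "total_spend": 120.0},
    {"customer_id": 3, "total_spend": 75.5},
]

def integrate(left, right, key):
    """Left-join two lists of records on a shared key.

    Fields missing from the right-hand source are filled with None so that
    every combined record has the same set of fields.
    """
    right_by_key = {row[key]: row for row in right}
    right_fields = {f for row in right for f in row if f != key}
    combined = []
    for row in left:
        match = right_by_key.get(row[key], {})
        extra = {f: match.get(f) for f in right_fields}
        combined.append({**row, **extra})
    return combined

customers = integrate(crm, billing, "customer_id")
```

Customers with no billing record end up with a missing `total_spend`, which is exactly the kind of gap the cleaning step then has to handle.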
Data Transformation:
In this step the data is normalized, aggregated, and generalized.
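
A small sketch of the three transformations follows. Min-max scaling is assumed as the normalization method, a per-group sum as the aggregation, and age bands as the generalization; the sample data and the band edges are illustrative only.

```python
from collections import defaultdict

def min_max_normalize(values):
    """Normalize: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal, nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]

def aggregate_sum(rows, group_key, value_key):
    """Aggregate: total of value_key for each distinct group_key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)

def generalize_age(age):
    """Generalize: map an exact age to a coarser band (band edges are illustrative)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"

# Hypothetical sales records.
sales = [
    {"city": "Pune", "amount": 100.0},
    {"city": "Delhi", "amount": 250.0},
    {"city": "Pune", "amount": 50.0},
]
normalized = min_max_normalize([row["amount"] for row in sales])
totals = aggregate_sum(sales, "city", "amount")
```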
Data Reduction:
This step aims to produce a reduced representation of the data in a data warehouse, for example by applying slicing and dicing operations.
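
Slicing and dicing can be illustrated on a tiny in-memory "cube". In this sketch a slice fixes one dimension to a single value, and a dice keeps only the chosen values on several dimensions. The fact records and the helper names `slice_cube` and `dice_cube` are assumptions made for the example; a real data warehouse would express these as queries.

```python
# Hypothetical fact records: three dimensions (year, region, product) and one measure (sales).
facts = [
    {"year": 2022, "region": "East", "product": "A", "sales": 10},
    {"year": 2022, "region": "West", "product": "B", "sales": 20},
    {"year": 2023, "region": "East", "product": "A", "sales": 30},
    {"year": 2023, "region": "West", "product": "A", "sales": 40},
    {"year": 2023, "region": "East", "product": "B", "sales": 50},
]

def slice_cube(rows, dimension, value):
    """Slice: fix one dimension to a single value and drop that dimension."""
    return [{k: v for k, v in row.items() if k != dimension}
            for row in rows if row[dimension] == value]

def dice_cube(rows, **allowed):
    """Dice: keep rows whose value on each named dimension is in the allowed set."""
    return [row for row in rows
            if all(row[dim] in values for dim, values in allowed.items())]

sales_2023 = slice_cube(facts, "year", 2023)
east_a = dice_cube(facts, year={2022, 2023}, region={"East"}, product={"A"})
```

Both operations return fewer rows (and, for a slice, fewer columns) than the full data set, which is the sense in which they reduce the data.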
Model Workflow:
Once the preprocessing steps are complete, the data is ready for building the model.
Step 1:
Split the final historical data into two parts: a training set (70%) and a testing set (30%). First train the model on the training set according to the requirement, then test the model on the testing set.
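
A 70/30 split can be sketched as below. The `train_test_split` function here is a hand-written helper for illustration (libraries such as scikit-learn ship their own function of the same name); the fixed seed is an assumption that makes the shuffle repeatable.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for 100 historical records.
data = list(range(100))
train_set, test_set = train_test_split(data)
```

Shuffling before splitting matters when the historical data is ordered, for example by date or by class; for time-dependent data a chronological split may be more appropriate.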
Depending on the accuracy and performance of the model on both sets, we check whether it is overfit or underfit. A model that scores well on the training set but noticeably worse on the testing set is overfit; a model that scores poorly on both is underfit.
To resolve either problem, we need to take certain measures while building the model. If the problem is low accuracy, the measures to take depend on the algorithm being used.
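
The overfit/underfit check can be written as a rough rule of thumb that compares training and testing accuracy. The thresholds `min_acc` and `max_gap` below are illustrative assumptions, not standard values; sensible cut-offs depend on the problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def diagnose(train_acc, test_acc, min_acc=0.8, max_gap=0.1):
    """Rough rule of thumb; min_acc and max_gap are illustrative thresholds."""
    if train_acc < min_acc:
        return "underfit"  # poor even on the data the model has seen
    if train_acc - test_acc > max_gap:
        return "overfit"   # good on training data, much worse on unseen data
    return "ok"
```

For example, `diagnose(0.99, 0.70)` reports an overfit model, while `diagnose(0.60, 0.58)` reports an underfit one.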
Step 2:
After confirming that the model performs well and has good accuracy, we pass new data to it for prediction and build reports from the results.
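
As a sketch of this last step, the snippet below applies a model to new records and summarizes the predictions as a simple count per class. The `toy_model` threshold rule is a hypothetical stand-in for a trained model, and the count-based report is just one possible report format.

```python
from collections import Counter

def predict_and_report(model, new_rows):
    """Apply a trained model to unseen rows and summarize the predictions."""
    predictions = [model(row) for row in new_rows]
    return predictions, Counter(predictions)

# Stand-in for a trained model: a hypothetical threshold rule on one feature.
def toy_model(row):
    return "high" if row["income"] >= 50000 else "low"

# Hypothetical new records that were not part of the historical data.
new_data = [{"income": 42000}, {"income": 58000}, {"income": 75000}]
predictions, report = predict_and_report(toy_model, new_data)
```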