ML - Decision Tree and Random Forest

Introduction:

A decision tree, in general, represents a hierarchical series of binary decisions. A decision tree in machine learning works in exactly the same way, except that we let the computer figure out the optimal structure and hierarchy of decisions instead of coming up with the criteria manually. In this project, I used the Australian weather dataset for forecasting (predicting whether it will rain tomorrow).

Data Preprocessing:

We'll perform the following steps to prepare the dataset for training (a code sketch follows the list):
  1. Create a train/test/validation split.
  2. Identify input and target columns.
  3. Identify numeric and categorical columns.
  4. Impute missing values.
  5. Scale the numeric columns.
  6. Encode categorical columns as one-hot vectors.
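A minimal sketch of these six steps is shown below, assuming the Kaggle "weatherAUS.csv" file and standard scikit-learn utilities. The split ratios, column positions, and imputation/scaling choices here are illustrative assumptions, not the only valid setup.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

raw_df = pd.read_csv('weatherAUS.csv')
raw_df = raw_df.dropna(subset=['RainToday', 'RainTomorrow'])

# 1. Train/validation/test split (60/20/20 here; the exact ratio is an assumption).
train_val_df, test_df = train_test_split(raw_df, test_size=0.2, random_state=42)
train_df, val_df = train_test_split(train_val_df, test_size=0.25, random_state=42)

# 2. Identify input and target columns (drop 'Date', predict 'RainTomorrow').
input_cols = list(train_df.columns)[1:-1]
target_col = 'RainTomorrow'
train_inputs, train_targets = train_df[input_cols].copy(), train_df[target_col].copy()
val_inputs, val_targets = val_df[input_cols].copy(), val_df[target_col].copy()

# 3. Identify numeric and categorical columns.
numeric_cols = train_inputs.select_dtypes(include='number').columns.tolist()
categorical_cols = train_inputs.select_dtypes(include='object').columns.tolist()

# 4. Impute missing numeric values with the column mean; fill missing categories.
imputer = SimpleImputer(strategy='mean').fit(train_inputs[numeric_cols])
train_inputs[numeric_cols] = imputer.transform(train_inputs[numeric_cols])
val_inputs[numeric_cols] = imputer.transform(val_inputs[numeric_cols])
train_inputs[categorical_cols] = train_inputs[categorical_cols].fillna('Unknown')
val_inputs[categorical_cols] = val_inputs[categorical_cols].fillna('Unknown')

# 5. Scale numeric columns to the 0-1 range.
scaler = MinMaxScaler().fit(train_inputs[numeric_cols])
train_inputs[numeric_cols] = scaler.transform(train_inputs[numeric_cols])
val_inputs[numeric_cols] = scaler.transform(val_inputs[numeric_cols])

# 6. One-hot encode categorical columns (sparse_output requires scikit-learn >= 1.2).
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore').fit(train_inputs[categorical_cols])
encoded_cols = list(encoder.get_feature_names_out(categorical_cols))
train_inputs[encoded_cols] = encoder.transform(train_inputs[categorical_cols])
val_inputs[encoded_cols] = encoder.transform(val_inputs[categorical_cols])

X_train = train_inputs[numeric_cols + encoded_cols]
X_val = val_inputs[numeric_cols + encoded_cols]
```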

Data Visualization:



The tree is split on the basis of the Gini index.
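As a brief sketch (reusing the variables from the preprocessing snippet above), here is how the tree can be fitted and its top splits visualized; scikit-learn's DecisionTreeClassifier uses the Gini index as its default split criterion.

```python
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# criterion='gini' is the default split criterion in scikit-learn.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, train_targets)

# Show only the first two levels of splits for readability.
plt.figure(figsize=(20, 10))
plot_tree(model, feature_names=list(X_train.columns), max_depth=2, filled=True)
plt.show()
```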

 

The plot shows the most important features for weather prediction.
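A plot like this can be reproduced from the tree's feature_importances_ attribute; the snippet below is a sketch assuming the fitted model and X_train from the previous snippets.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rank features by the importance the fitted tree assigns to them.
importance_df = pd.DataFrame({
    'feature': X_train.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

# Plot the top 10 features.
importance_df.head(10).plot(x='feature', y='importance', kind='barh', figsize=(8, 6))
plt.title('Top 10 features for rain prediction')
plt.show()
```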

Hyperparameter tuning:

What we observe is that the model reaches about 99% accuracy on the training set while the validation accuracy is only just above average, which means the model is memorizing the training data rather than learning general patterns. To reduce this overfitting (without tipping into underfitting), we will tune the hyperparameters.
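One simple way to do this is to limit the tree's growth with parameters such as max_depth and max_leaf_nodes and compare training and validation accuracy for each setting. The values tried below are illustrative assumptions, not the original grid.

```python
from sklearn.tree import DecisionTreeClassifier

def try_params(**params):
    # Fit a tree with the given hyperparameters and report train/validation accuracy.
    model = DecisionTreeClassifier(random_state=42, **params)
    model.fit(X_train, train_targets)
    return model.score(X_train, train_targets), model.score(X_val, val_targets)

for max_depth in [3, 5, 7, 10, None]:
    train_acc, val_acc = try_params(max_depth=max_depth)
    print(f'max_depth={max_depth}: train={train_acc:.3f}, val={val_acc:.3f}')

for max_leaf_nodes in [32, 64, 128, 256]:
    train_acc, val_acc = try_params(max_leaf_nodes=max_leaf_nodes)
    print(f'max_leaf_nodes={max_leaf_nodes}: train={train_acc:.3f}, val={val_acc:.3f}')
```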

Random Forest classifiers:

While tuning the hyperparameters of a single decision tree may lead to some improvement, a much more effective strategy is to combine the results of several decision trees trained with slightly different parameters. This is called a random forest model.
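A sketch of training a random forest on the same prepared data is shown below; the n_estimators and n_jobs values are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier

# Train 100 trees in parallel, each on a bootstrapped sample with random feature subsets.
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X_train, train_targets)

print('Train accuracy:', rf.score(X_train, train_targets))
print('Validation accuracy:', rf.score(X_val, val_targets))
```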

Conclusion:

The key idea here is that each decision tree in the forest will make different kinds of errors, and upon averaging, many of their errors will cancel out. This idea is also commonly known as the "wisdom of the crowd".

