
Showing posts from October, 2021

Unsupervised learning - Clustering

Unsupervised learning: Unsupervised machine learning refers to the category of machine learning techniques in which models are trained on datasets without labels. It is generally used to discover patterns in data and to reduce high-dimensional data to fewer dimensions. Here, I worked on some clustering algorithms using scikit-learn, namely KMeans, DBSCAN and hierarchical clustering, as well as on dimensionality reduction and manifold learning.

Learning the algorithms: I personally feel that data cleaning and preprocessing are more challenging than training the model. Once you have finished those, about 80% of your work is done, and you can experiment with different types of machine learning algorithms, each of which is effective in its own way. I learned the clustering algorithms using the "iris" dataset that ships with seaborn.

Let's look at some of my learning phase of unsupervised learning through visualizations: KMeans, DBSCAN, hierarchical clustering, dimensionality reduction and manifold learning.
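A minimal sketch of the clustering, dimensionality-reduction and manifold-learning steps described above, assuming scikit-learn. It loads iris with sklearn's bundled `load_iris` (the same 150 flower measurements as seaborn's copy, but no download needed) and ignores the species labels. The scaling step and the parameter values (`eps=0.5`, `min_samples=5`, `perplexity=30`, Ward linkage) are illustrative assumptions, not the settings used in the post.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# 150 x 4 feature matrix; species labels are deliberately not used (unsupervised).
X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)  # put all features on a comparable scale

# KMeans: partition into k clusters by minimizing within-cluster squared distances.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# DBSCAN: density-based clusters; points in sparse regions get the noise label -1.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled)

# Hierarchical (agglomerative) clustering: repeatedly merge the closest clusters.
agglo_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_scaled)

# Dimensionality reduction (linear): project 4 features onto 2 principal components.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Manifold learning (non-linear): t-SNE embedding into 2 dimensions.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_scaled)

for name, labels in [("KMeans", kmeans_labels),
                     ("DBSCAN", dbscan_labels),
                     ("Agglomerative", agglo_labels)]:
    n_found = len(set(labels)) - (1 if -1 in labels else 0)  # don't count DBSCAN noise
    print(f"{name}: {n_found} clusters")
```

For the visualizations, `X_pca` or `X_tsne` can be drawn as a 2-D scatter plot coloured by any of the three label arrays, which makes it easy to compare how each algorithm groups the same points.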

Time Series

Fb-prophet: We can forecast a time series using Facebook's Prophet. Prophet follows the sklearn model API: we create an instance of the Prophet class and then call its fit and predict methods. We can even include holidays in the analysis. In this post, monthly milk production data is used.

Source code

ML - Decision tree and Random forest

Introduction: A decision tree, in general, represents a hierarchical series of binary decisions. A decision tree in machine learning works in exactly the same way, except that we let the computer figure out the optimal structure and hierarchy of decisions instead of coming up with the criteria manually. For this model, I used the Australian weather dataset for forecasting.

Data Preprocessing: We'll perform the following steps to prepare the dataset for training: create a train/validation/test split; identify the input and target columns; identify the numeric and categorical columns; impute missing values; scale the numeric values; and encode the categorical columns as one-hot vectors.

Data Visualization: The tree is split on the basis of the Gini index. The plot shows the most important features for weather prediction.

Hyperparameter tuning: We observe that the model reaches 99% accuracy on the training set, while accuracy on the validation set is only just above average. This means the model is memorizing the training data in order to increase the accuracy...