Posts

SQL vs Pandas

Understanding when to use which. As a data scientist, I've always found pandas to be an excellent tool for data manipulation. Its seamless integration with NumPy allows for fast mathematical operations, making it perfect for smaller datasets. However, as I started working with larger datasets and had to perform data-cleaning tasks, my initial approach was to use regex in pandas. My work involves datasets with millions of rows that need to be processed daily. Our data orchestration tool is Airflow, but letting a job run on Airflow for an hour wasn't an option. This is where my senior pointed me to Snowflake, a cloud data warehouse that can run SQL queries to clean millions of rows in much less time. I used regex in Snowflake queries to clean the data, and the processing time was significantly faster than running it in pandas. While pandas is an excellent tool for smaller datasets, it's not suitable for all data manipulation tasks. I realized the importance…
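To make the comparison concrete, here is a minimal sketch of the same cleaning step done both ways. The phone column and the raw_events table are only illustrative, not my real schema: the pandas version runs the regex inside the Python process, while the Snowflake version pushes the same regex into the warehouse where the data lives.

```python
import pandas as pd

# Hypothetical data: phone numbers with stray characters to strip.
df = pd.DataFrame({"phone": ["+91 98765-43210", "(044) 2345 6789"]})

# pandas approach: vectorised regex replace, fine for smaller frames.
df["phone_clean"] = df["phone"].str.replace(r"[^0-9]", "", regex=True)

# Snowflake approach: run the equivalent regex in the warehouse instead
# of inside the Airflow task (table and column names are placeholders).
snowflake_sql = """
SELECT REGEXP_REPLACE(phone, '[^0-9]', '') AS phone_clean
FROM raw_events;
"""
```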

Best practices to follow on Git and GitHub

Some basics: Linux. Note: <> is just used to set off the file or folder name, for understanding only.
ls - list the files
mkdir <folder name> - create a folder
touch <file name> - create a file
code . - open the current directory in VS Code
vi <file name> - open the file in the Vim editor. Don't dive deep into Vim. There are two modes, edit mode and command mode; press Esc to enter command mode to give commands like insert, delete, update and so on. i - insert text into the file. O, o - open a new line above or below the cursor. :wq - write (save) the file and quit. These are the basics of the Vim editor, but don't spend too much time on it; use another editor for faster progress. We can also open the file directly and make the changes. Vim is fun to learn.
cat <file name> - show what is in that file
rm -rf <file/folder name> - remove the file or folder from the current directory
cd - change the directory (where we are working and our file path)
cd .. - go back to the previous folder in the directory
Git config…

Paddy Doctor: Paddy Disease Classification

Introduction: This time, I don't want to write about the hurdles in fastbook and how to overcome them, because once you move to notebook 6 or 7 it shows that you have the strong mindset to overcome them yourself and finish the book. From here on, we have to practise what we learn by participating in Kaggle competitions. Problem statement: identify the type of disease present in paddy leaf images. Understanding the dataset: labels are given in a separate CSV file, the train dataset contains 10 folders, each holding images of the paddy disease the folder is named after, and the test data contains the images we have to predict. Model implementation: once we explore the dataset, the next step is to build a data block for the problem. We can create it in two ways: we can map the disease label to each image via the CSV file, or we can label the data according to the folder name. What I chose is labelling the images by their folder names. Once the data loaders are ready we can…
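As a rough sketch of the folder-name approach with fastai, assuming the training images sit under a train_images directory (the path, image size and number of epochs are just placeholders, not the exact setup I used):

```python
from fastai.vision.all import *

# Hypothetical path to the competition data; adjust to your own download.
path = Path("paddy-disease-classification")

# Label each image by the name of the folder it sits in (parent_label),
# instead of joining against the CSV file.
paddy = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(224),
)

dls = paddy.dataloaders(path / "train_images")
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(3)
```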

Fastbook notebook_5

Notebook 5 dives deep into an image classification problem and explains the mathematical concepts through easy-to-understand code. I don't want to explain the concepts in the notebook, because Jeremy and Sylvain explain them in an amazing way. What I am going to discuss are the areas where I faced difficulties, since you might come across the same issues. Once the dataset is downloaded, you may wonder why we set Path.BASE_PATH = path: if we don't, every path is printed with the entire directory prefix, whereas setting BASE_PATH makes fastai display paths relative to that base. The data block is like a blueprint for how the data is assembled for the model we are going to build. While building the data block you will come across regular expressions (regex); don't get intimidated and try to learn everything about regex at once, you will pick it up along the deep learning journey. Go with the flow and learn just what you need, for example what a token like $ does in the labelling pattern…
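For reference, a minimal sketch along the lines of the pet-breed data block in notebook 5; the regex captures everything before the trailing _<number>.jpg, and the $ anchors the pattern to the end of the filename (treat the transforms and sizes here as illustrative):

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)
Path.BASE_PATH = path  # display paths relative to the dataset root

# Label each image by capturing the breed name from its filename.
pets = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(seed=42),
    get_y=using_attr(RegexLabeller(r"(.+)_\d+.jpg$"), "name"),
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(size=224, min_scale=0.75),
)
dls = pets.dataloaders(path / "images")
```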

Fastbook notebook (3 & 4)

Basics of deep learning. Lecture 3 is about data ethics; it's self-explanatory and Rachel's lecture is more than sufficient. Lecture 4 is more about breaking the myth behind neural networks, the basics of PyTorch, and how the fastai API makes deep learning cool. You may come across jargon like sigmoid, ReLU, stochastic gradient descent, learning rate and so on. Their explanation is great: they explain the maths with a piece of code, and I completely enjoyed it. What I would suggest is to play with notebook 4 and learn more things. In case you don't understand something, go through it again and again and you will catch up. If you want some help, ask a question in the forum. You can practise lecture 4 by downloading the repository and going to the clean folder, where there is only code without the prose. It's the best place to practise what you have learned so far. The motto is: if you are in doubt, run the code.
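If you want a tiny taste of what notebook 4 builds up to, here is a minimal sketch of stochastic gradient descent in plain PyTorch. It is a toy linear fit, not the notebook's exact code, but it shows the same loop: predict, compute the loss, backpropagate, and step the parameters by the learning rate.

```python
import torch

# Toy data: fit y = 3x + 2 with one weight and one bias.
x = torch.linspace(-1, 1, 50).unsqueeze(1)
y = 3 * x + 2 + 0.1 * torch.randn_like(x)

params = torch.randn(2, requires_grad=True)  # [weight, bias]
lr = 0.1                                     # learning rate

for epoch in range(100):
    preds = x * params[0] + params[1]
    loss = ((preds - y) ** 2).mean()   # mean squared error
    loss.backward()                    # compute gradients
    with torch.no_grad():
        params -= lr * params.grad     # gradient descent step
        params.grad.zero_()            # reset gradients for the next pass

print(params)  # should end up close to (3, 2)
```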

Image classification (Fastbook notebook_2)

Hi friends, what I am so excited about in notebook 2 is that we are going to deploy a model, which helps you understand the flow of deep learning. I am going to discuss the problems I faced; you may face the same ones in your journey. First and foremost, you may not be able to get the Bing API key. If you get it, that's fine, but if you face any difficulties, use the DuckDuckGo API, which comes in very handy. Once you download the dataset, everything works fine until the deployment part. What I would suggest is that if you are using Colab like me, you may face difficulties using Voila. I suggest going for Gradio, which is much easier: with a few lines of code you can build an amazing application. My project is about classification of great apes (chimpanzee, bonobo and gorilla). In case you face any difficulties, go through my notebook. Feel free to ask any doubts.
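As a rough sketch of how little Gradio needs, assuming you exported the trained learner to export.pkl (the file name and the classify helper are just placeholders, not my exact app):

```python
from fastai.vision.all import *
import gradio as gr

# Load the learner exported at the end of training (export.pkl is assumed).
learn = load_learner("export.pkl")
labels = learn.dls.vocab

def classify(img):
    # Return a {label: probability} dict that Gradio's Label output understands.
    pred, idx, probs = learn.predict(img)
    return {labels[i]: float(probs[i]) for i in range(len(labels))}

demo = gr.Interface(fn=classify, inputs=gr.Image(), outputs=gr.Label(num_top_classes=3))
demo.launch()
```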

Fastbook notebook_1

If you are new to deep learning, or stuck like me in the middle of your journey to artificial intelligence, there is no better source than fastai. I like the practical approach of fastai and Jeremy's top-down teaching. I don't want to write a blog about the concepts in lecture 1, because there are so many people in the fastai forum who are wonderful at explaining them. If you have any doubt, ask in the forum; there is no such thing as a good question or a bad question, and everyone in the community helps each other out. No one is going to judge you. I am writing this because you may face the same problems I did, and it may help you on the journey. For each notebook I am going to share my experience, and this one is about notebook/lecture 1 (Intro). I strongly suggest you watch Jeremy's lectures before going through the notebook. The first question that oscillated in my mind was which platform I should use for the lectures, whether Google Colab…