Starting to Learn Data Science?!

rajat aggarwal
3 min read · Aug 17, 2020
Photo by Campaign Creators on Unsplash

Being a data scientist is exciting, and it has been called the hottest profession of the decade, but it isn’t just training and running a machine learning model. It involves much more than that: data cleaning, data wrangling, data engineering (though some people count that as a separate profession), and, of course, training an ML model.

Many steps have to happen before we actually apply a machine learning algorithm to the data. These preprocessing techniques are used most often when the data is text. Numbers can usually be fed to an algorithm more or less directly, because every digit in a value carries information.

Text is different. It can contain many unwanted things, such as punctuation, conjunctions, and words that carry little meaning on their own (like a, an, the). These occur repeatedly and can shift the output in the wrong direction.

We also make sure that all the text uses the same case, either lowercase or uppercase, because the algorithm would otherwise read 'yes', 'Yes', and 'YES' as totally different words. In Python, the string methods lower() and upper() do this.
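As a minimal sketch of this case normalization, the snippet below lowercases a small list of answers. The `responses` list is made-up sample data, not something from a real dataset.

```python
# Hypothetical sample answers; any list of strings works the same way.
responses = ["yes", "Yes", "YES", "No"]

# str.lower() returns a lowercase copy of each string.
normalized = [text.lower() for text in responses]

# Count distinct values before and after normalization.
distinct_before = len(set(responses))
distinct_after = len(set(normalized))

print(normalized)
print(distinct_before, "distinct values before,", distinct_after, "after")
```

If the text lives in a pandas column instead of a list, `Series.str.lower()` does the same job for the whole column.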

Next, we can count how many times each word occurs in a sample of the data. Words like a and the usually occur the most, which can shift the focus of the machine learning algorithm toward these words rather than the useful ones.
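One way to count word occurrences, sketched here with `collections.Counter` from the Python standard library. The `sample` sentence is invented for illustration, and splitting on whitespace is a simplifying assumption (real text would need punctuation handling).

```python
from collections import Counter

# Hypothetical sample text standing in for one record of the data.
sample = "the cat sat on the mat and the dog sat on a rug"

# Lowercase first, then split on whitespace to get a list of words.
words = sample.lower().split()

# Counter maps each word to the number of times it appears.
word_counts = Counter(words)

# most_common(n) returns the n most frequent (word, count) pairs.
top_three = word_counts.most_common(3)
print(top_three)
```

Here the most frequent word is "the", which says nothing about the topic of the sentence. That is the problem the filtering step is meant to address.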

Before any of this, the first step is to load the data from the data file. Often the data is in JSON format and is only available through an API.
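A rough sketch of loading JSON data with the standard library follows. The `load_records` helper, the shape of the payload, and the `text` field are all assumptions made for illustration. The example parses a hard-coded string so that it runs without network access.

```python
import json
from urllib.request import urlopen


def load_records(url):
    """Download a JSON document from `url` and return the parsed object.

    Hypothetical helper: the URL and response shape depend on the API.
    """
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Stand-in for an API response, so the example works offline.
raw_payload = '[{"id": 1, "text": "Yes, it works!"}, {"id": 2, "text": "No luck."}]'

# json.loads turns the JSON text into Python lists and dicts.
records = json.loads(raw_payload)

# Keep only the text field, which is what the later steps operate on.
texts = [record["text"] for record in records]
print(texts)
```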

Words like conjunctions can then be removed by tokenizing the text and filtering out these common "stop words", while lemmatization reduces the remaining words to their base forms.
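The sketch below shows tokenization followed by removal of common words. The `STOP_WORDS` set is a tiny hand-picked list for illustration only, and the regular expression is just one simple way to split text. Lemmatization (for example, reducing "barked" to "bark") usually relies on a library such as NLTK or spaCy and is left out here.

```python
import re

# Tiny illustrative stop-word list; real projects use a much longer one.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "is", "on"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


tokens = tokenize("The cat sat on the mat, and the dog barked!")
filtered = remove_stop_words(tokens)
print(filtered)  # ['cat', 'sat', 'mat', 'dog', 'barked']
```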

Well, the fun part of this profession is creating new things that we could never have imagined being done. The main part of a machine learning process is learning new things, and neural networks are among the best algorithms for such a feat.

A neural network can be very good at finding similarities, and it can be trained on almost any type of data. We just have to make some changes here and there, and voila: we have ourselves a custom-built model just for our data.

The main problem is that it can fit the training data too well, so there is a risk of overfitting, but that can be reduced by adding a dropout layer.
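To give a feel for what a dropout layer does, here is a plain-Python sketch of "inverted" dropout. During training each activation is zeroed with probability `rate`, and the survivors are scaled up so the expected value stays the same. The `dropout` function and its parameters are illustrative names. In practice you would use the layer your framework provides, such as the `Dropout` layer in Keras.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Randomly zero a fraction `rate` of the activations during training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    # At inference time, dropout is switched off and values pass through.
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    # Survivors are divided by keep_prob to preserve the expected value.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so repeated runs give the same result
hidden = [1.0] * 10

print(dropout(hidden, rate=0.5, rng=rng))         # mix of 0.0 and 2.0
print(dropout(hidden, rate=0.5, training=False))  # unchanged
```

Because a different random subset of units is dropped on every training step, the network cannot lean too heavily on any single unit, which is why dropout tends to reduce overfitting.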

In conclusion, data science is a profession that is not driven by assigned daily tasks in the way other, similar software professions are. Here we have to find something new: a new insight, a discovery, or just some improvement over before. And each day is more exciting than the previous one.

So, congratulations to all the data science aspirants.

rajat aggarwal

Software Engineer. I hope you’ll appreciate the content. You can also follow me on Vocal at https://vocal.media/authors/rajat-ir2e1e0bfm