Wikipedia defines data science as an interdisciplinary field about scientific processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured. Whew! That is a tad too long, isn't it? Well, the gist is that it is the study of turning raw data into plain English, savvy? And the people who do this are called data scientists. So what exactly do they do, and why? Read on to find out.
We live in a time when a significant portion of trade and related activity happens online. Being online has its pros and cons, and the possibility that your activity can be tracked is both. It is an advantage because, say you were shopping for a car and looking up different models online: if this were tracked, you would soon see ads for car insurance, which you will surely need, and which may help you sign up for an attractive policy. This is convenient for you, as you needn't search specifically for insurance or related products, but it is significantly more advantageous to the insurance company: before the ad was displayed, a series of predictions was made based on the car models you were looking up, so that the company could push the product you are most likely to buy once you see it. The disadvantage is that anything and everything you do online can be linked to you, and with this information out there, you lose your privacy. But we are not here to discuss the ethics of it; we are here to learn about the facilitators of this process. It is done by algorithms developed by data scientists. They study the user data available to them, discern patterns in this huge data set and use those patterns to predict future user activity. This process is not limited to advertising (although that is the use with the most immediate monetary payoff); it is also how friend or connection suggestions on Facebook and LinkedIn work. Because Facebook and LinkedIn know your personal info, they can work out which people you are most likely to know and suggest them to you.
Coming back to advertising, Facebook knows exactly what you like and don't, but there are millions of users and a multitude of possible combinations of likes and dislikes, so how can one make sure the ads are specific or targeted? And what if there is insufficient data? This is also something data scientists work on. Not only is it their job to make predictions, they are also supposed to devise methods to bridge gaps by creating new means of collecting data. After all, what good is the analysis when the data is scant?
So, what skills does a data scientist need?
1. First, and most basic, is the ability to write code and to communicate their discoveries verbally, visually or, ideally, a combination of both.
2. Next would be an intense curiosity to determine the underlying cause of everything. It doesn't matter if the data goes one level deep or a thousand; the curiosity to get to the last level is a must.
3. And probably the most indispensable of all these would be an understanding of math, statistics, probability and computer science.
To be a bit more specific, a data scientist combines analytical, machine learning, data mining and statistical skills with enough coding experience to put those concepts to use.
What are their core responsibilities?
1. Extract and structure huge volumes of data from multiple sources.
2. Analyse the data with relevant tools and discard irrelevant data.
3. Devise data-driven solutions to the problems at hand.
In short, collect data, clean it, make sense of it and solve the problem.
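To make those four steps a little more concrete, here is a minimal Python sketch, loosely based on the car-shopping example from earlier. Everything in it is made up for illustration: the two "sources" (`web_logs`, `app_logs`), the record fields and the function names are assumptions, and real data science work involves far larger data sets and far richer models than a simple count.

```python
from collections import Counter

# Hypothetical raw records from two made-up sources (web logs and app logs).
web_logs = [
    {"user": "u1", "viewed": "Sedan "},
    {"user": "u1", "viewed": "suv"},
    {"user": "u2", "viewed": None},      # missing value
]
app_logs = [
    {"user": "u1", "viewed": "SUV"},
    {"user": "u2", "viewed": "hatchback"},
    {"user": "", "viewed": "sedan"},     # no user attached
]


def collect(*sources):
    """Step 1: pull records from several sources into one list."""
    return [record for source in sources for record in source]


def clean(records):
    """Step 2: drop incomplete records and normalise the text fields."""
    cleaned = []
    for record in records:
        if not record.get("user") or not record.get("viewed"):
            continue  # discard records we cannot use
        cleaned.append(
            {"user": record["user"], "viewed": record["viewed"].strip().lower()}
        )
    return cleaned


def analyse(records):
    """Step 3: count how often each user viewed each car type."""
    counts = {}
    for record in records:
        counts.setdefault(record["user"], Counter())[record["viewed"]] += 1
    return counts


def solve(counts):
    """Step 4: pick each user's most-viewed car type as the thing to act on."""
    return {user: views.most_common(1)[0][0] for user, views in counts.items()}


records = clean(collect(web_logs, app_logs))
top_interest = solve(analyse(records))
print(top_interest)  # {'u1': 'suv', 'u2': 'hatchback'}
```

In this toy version, "solving the problem" simply means knowing which car type each user looked at most, which is the kind of signal an advertiser could then act on.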
As a bottom line, here is a tweet that sums up what data scientists are:
Data scientists are better statisticians than most programmers and better programmers than most statisticians (by Michael E. Driscoll).