Importance of Math in Data Science

Do I really need to learn so much math?!?!

This blog post outlines my view on the level of math knowledge needed for data science.

To be honest, I never had exceptional math skills, even in school. Still, even then I tried to learn and understand math concepts in a way I could remember. I also think many students in the later grades tend to get lazy and learn less than they could. Unfortunately, that happened to me as well, so I lost some valuable time and missed important concepts…

This gap in my math first became obvious when I was applying to a university and had to take the GRE test to get into an MBA program. That is when I understood that I was falling behind and needed to catch up. So I started to review and learn various topics in linear algebra, probability, geometry, calculus, etc. After a few months I took the test and passed it with a score that was acceptable for my university. However, it was clear to me that I needed to improve my math substantially. This became even clearer when I learned about Data Science, Machine Learning, and all their various algorithms. Everything was based on math, so to understand them I needed to know math!

I started taking various online courses on Coursera with great instructors. Sometimes I took the same course several times, and each time I found new things that I had missed or simply forgotten. With this knowledge I was able to follow the lectures and understand the professor when he or she explained the formulas behind an algorithm.

An important point I learned along the way is that the idea behind Machine Learning or Deep Learning is simply that we try to find a function that maps a given set of features X to a target variable y. X is an array of numbers representing, for example, features of a person, pixel values of an image, the intensity of an audio signal, or encoded words. The target variable y is also a number: either a label for the class of the observation (ham vs. spam, etc.) or a float or integer value for the record. The model tries to find the relationship and represent it with a mathematical function. In practice this function will rarely fit the data perfectly, i.e. always be right, but a decent model should be a good proxy for the real function that maps X to y.
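
To make the "find a function from X to y" idea concrete, here is a minimal sketch in Python with NumPy. Everything in it is made up for illustration: the toy data, the "true" weights, and the variable names are my assumptions, and a plain linear model fitted by least squares stands in for whatever model you would really use.

```python
import numpy as np

# Hypothetical toy data: 200 observations, each with two numeric features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))

# The "real" function that maps X to y. In real life we never see this;
# we only see noisy observations of it.
true_w = np.array([3.0, -2.0])
true_b = 5.0
y = X @ true_w + true_b + rng.normal(0, 1.0, size=200)

# Fit a linear function f(X) = X @ w + b by least squares.
# The column of ones lets the solver learn the intercept b.
X1 = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
w, b = coef[:2], coef[2]

# The fitted function is a proxy for the real one: close, but not exact.
predictions = X1 @ coef
mean_abs_error = np.mean(np.abs(predictions - y))
print("learned w:", w, "learned b:", b)
print("mean absolute error:", mean_abs_error)
```

The learned weights land close to the made-up true ones, yet the error never reaches zero because of the noise, which is exactly the "good proxy, not a perfect fit" situation.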

All of these algorithms have some measure of performance or error, and the algorithm tries to minimize that error. In the case of Deep Learning, you need to know concepts from Linear Algebra and Calculus to understand how this works.
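
Here is a small sketch of what "minimizing an error" can look like. It uses gradient descent on the mean squared error of a one-feature linear model. The data, the learning rate, and the number of steps are assumptions I picked for the example, not a recipe; real deep learning libraries do the same kind of thing with many more parameters.

```python
import numpy as np

# Hypothetical data generated from y = 2x + 1, so we know the right answer.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Start from a bad guess and improve it step by step.
w, b = 0.0, 0.0
learning_rate = 0.05  # assumed value, small enough to be stable here
losses = []

for step in range(2000):
    error = (w * x + b) - y             # prediction minus target
    losses.append(np.mean(error ** 2))  # mean squared error

    # Calculus: partial derivatives of the mean squared error w.r.t. w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)

    # Move the parameters a little in the direction that lowers the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("learned w:", w, "learned b:", b)
print("first loss:", losses[0], "last loss:", losses[-1])
```

The derivatives are the calculus part, and once you have many features the same update is written with vectors and matrices, which is the linear algebra part.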

To summarize, you cannot be a good data scientist with only basic knowledge of math, but you don’t have to earn a PhD in math to work as one. So the answer is to find a middle ground, and remember that additional math knowledge never hurts: I firmly believe that every data scientist will, at some point in their career, get a very cool and important project that requires knowing math well.

So the conclusion is: it is definitely worthwhile to invest time in math!