In this webinar, we're talking about why you need math skills to be a Data Scientist and exactly which types of math you need to know. Kim Fessel, a Senior Data Scientist and Instructor at Metis, joins us to share her expertise! From Linear Algebra to Calculus to Probability and Statistics, Kim walks you through each math subject and solves four sample problems to get you ready for a data science bootcamp.
Kim Fessel is a Senior Data Scientist and Instructor at the Metis New York City campus.
Kim’s background is primarily in applied mathematics: she has a Ph.D. in Applied Mathematics from Rensselaer Polytechnic Institute and did a postdoctoral fellowship in Math Biology at The Ohio State University.
Before teaching at Metis, Kim was working on natural language processing at an ad agency. The agency started an internal training program and Kim immediately signed up to be an instructor. Now, Metis gives her “the opportunity to think about data science and spend time in the classroom where I can watch others progress on their journeys, which is really rewarding.”
For a complete beginner, what is data science?
Data Science blends math skills, coding skills, and business intelligence. Humans are producing way more data than other humans could ever have a chance to look at. Data Scientists are trying to understand data.
The data portion is developing algorithms that will help us learn about our world in a more structured way. We're essentially looking for patterns. “Data” doesn’t just mean numbers. Data could be a lot of unstructured things like text, images, or audio files.
The science part is how we make sense of data. We found out pretty quickly that you can't just put data into black-box algorithms. We need to take a scientific and rigorous approach to design the setup, ask the right questions, and challenge our hypotheses. All of that falls under the scientific approach.
What role does math actually play in data science?
Blending coding skills with math skills is the core of data science. The algorithms that we use in data science are all expressed in mathematics. Whether it's an optimization problem, a probability problem, or a scoring metric, all of those things require math skills to understand what's going on. Here’s an analogy I like to use to explain this concept: in order to drive your car, you don't necessarily need to know how it all works. But if you're going to be a professional mechanic, you have to know all of those component pieces. For data science, the components are math concepts.
Who is the ideal Data Scientist?
A lot of people in data science right now either come from the math world or the coding world. You definitely don’t need a math PhD like me, but it's fine if you have one! Either way, you're probably going to have to scale up in one area or the other or maybe even both.
Are there any non-technical skills that you need in order to be a data scientist?
Besides those technical abilities, which are important, a Data Scientist must be creative. We're not playing around with data only for the fun of it! We’re solving real problems that a user or customer is facing. We often have to be creative with how we solve those problems to meet an overarching goal. That creativity can be in data sourcing, feature engineering, or how we're going to combine those results. Any of those things require a high level of creativity.
A big thing we look for in Metis applicants is grit. Dealing with problem-solving every day, you'll definitely have issues as you progress through your project. Do you have the determination to creatively solve whatever problem you’re working on? Can you stick with it? That's what grit is.
Finally, communication skills are huge. I can't say this enough! As a Data Scientist, you take on the role of both executing your work well and being a champion for your work. In order to be a champion of your work, you need to be able to tell stakeholders what you found in the data and what recommendations you're going to make based on those findings. You have to be able to communicate both verbally and in writing.
Have you taught students at Metis who didn't have that formal training in programming or math who were still successful?
Absolutely. I've had students that had training in solely one or the other (programming or math) as well. I have had students who didn't come from any of those fields. That's totally fine too, it's just a matter of putting in a bit of extra work.
So what specific math do you need to know to become a Data Scientist?
Linear Algebra, Calculus, Probability, and Statistics are the four core math subjects for Data Science. The math that you need to know might depend on which area of data science you specialize in, but these four areas will be important for any data scientist.
What level of these maths should someone know? College courses? High school?
It is possible that a high school level understanding is enough. Some high schools do teach Calculus and Statistics. On the other hand, if you have a collegiate level understanding of these topics you're going to be better off. You don't necessarily have to have a Ph.D. or have majored in math, but having exposure to these topics in college will certainly help you.
What is Linear Algebra?
Linear Algebra is perfectly named. Algebra simply means that we have some unknown quantities and we are trying to solve for them. Linear refers to how the equations look: the unknowns can only be multiplied by constants and added together. That's Linear Algebra’s limit. The unknowns can't be raised to a higher power or transformed with a log or a sine. It's simply variables multiplied by constants and summed. For example, 2x + 3y = 8 is a linear equation, but x² + y = 8 is not.
Why is Linear Algebra important for Data Science?
Linear Algebra might be the most important of the four because so many of our algorithms are based on it. You should be comfortable with matrices and vectors. Eventually, we start working in higher-dimensional spaces, so you need to understand how vector and matrix computations work there.
I would also recommend that someone who wants to go further with data science should understand not only how to do the numerical computations but also how to get a geometric intuition of what's going on behind the scenes in Linear Algebra. If you want to do something like dimensionality reduction you're going to need to understand vector manipulations.
Example of Linear Algebra for Data Science
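As an illustration (a minimal sketch, not the problem worked in the webinar), the Python below solves a small system of two linear equations and checks the answer with a matrix-vector product. The function names and the numbers are hypothetical, and only the standard library is used.

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e and c*x + d*y = f using Cramer's rule."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("no unique solution: the determinant is zero")
    x = (e * d - b * f) / det
    y = (a * f - e * c) / det
    return x, y


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, vector) for row in matrix]


# Hypothetical system:  2x + 3y = 8  and  x - y = -1
x, y = solve_2x2(2, 3, 1, -1, 8, -1)

# Plugging the solution back in should reproduce the right-hand side [8, -1].
check = matvec([[2, 3], [1, -1]], [x, y])
print(x, y, check)
```

Each row of the matrix holds the constants of one equation, which is why multiplying the matrix by the solution vector recovers the right-hand side.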
What is Calculus?
Calculus is the study of instantaneous rates of change. Any time you're taking derivatives and trying to understand how a function is changing, that's the first part of Calculus. The other part of Calculus is integrals, which describe how a quantity accumulates. Integrals and derivatives make up the bulk of Calculus. Those two happen to be intimately linked through the fundamental theorem of Calculus.
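To make the link between derivatives and integrals concrete, here is a minimal numerical sketch in Python. It approximates a derivative with a central difference and an integral with the midpoint rule. The helper names, step sizes, and the choice of f(x) = x² are assumptions made for illustration.

```python
def derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def integral(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def square(x):
    return x * x


# Rate of change of x^2 at x = 3 (the exact derivative 2x gives 6).
slope_at_3 = derivative(square, 3.0)

# Accumulated area under x^2 from 0 to 3 (the exact value is 9).
area_0_to_3 = integral(square, 0.0, 3.0)

# Fundamental theorem: integrating the derivative recovers the net change,
# here square(3) - square(0) = 9.
net_change = integral(lambda t: derivative(square, t), 0.0, 3.0)
print(slope_at_3, area_0_to_3, net_change)
```

All three printed values are approximations, so expect small floating-point differences from 6, 9, and 9.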
What concepts in data science would Calculus show up in?
Some of the algorithms we use are solved with optimization routines and that's going to involve derivatives and rates of change.
A good data science bootcamp student will know how derivatives relate to other things like gradients or finding maxima and minima.
Stochastic gradient descent (SGD) is definitely something data science students should know. That's one of my favorite lectures to teach! SGD is basically a numerical method for finding a minimum of a function. We end up minimizing objective functions often in data science. Typically, the way we do that is SGD.
Example of Calculus for Data Science
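As an illustration (a minimal sketch, not the problem worked in the webinar), the Python below uses stochastic gradient descent to fit the slope of a line through the origin. The data, learning rate, step count, and function name are all hypothetical. At each step it picks one random data point, takes the derivative of that point's squared error with respect to the slope, and moves the slope a small step in the opposite direction.

```python
import random


def sgd_fit_slope(points, learning_rate=0.01, steps=1000, seed=0):
    """Fit y = w * x by stochastic gradient descent on squared error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    w = 0.0                    # initial guess for the slope
    for _ in range(steps):
        x, y = rng.choice(points)        # one random sample per step
        gradient = 2 * x * (w * x - y)   # d/dw of (w*x - y)**2
        w -= learning_rate * gradient    # step against the gradient
    return w


# Hypothetical noise-free data generated from y = 2x.
points = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
slope = sgd_fit_slope(points)
print(slope)  # should be very close to 2
```

The learning rate controls the step size. If it is too large the updates can overshoot and diverge, and if it is too small the fit converges slowly.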
What is Probability?
Probability is the study of how likely some outcome is. There are some algorithms that are strictly probabilistic in nature. Having a good, strong grasp of what these things mean is important to any Data Scientist.
What kind of Probability concepts should someone know for data science?
The first Probability concept that data science students should brush up on is the random variable. It can get tricky, but ultimately all you need to understand is that a random variable is a variable whose value isn't known in advance because it depends on some random phenomenon, even though some of its properties are known. We might know its mean or variance, but we're still recognizing that there's some kind of randomness going on.
Example of Probability for Data Science
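As an illustration (a minimal sketch, not the problem worked in the webinar), the Python below treats the roll of a fair six-sided die as a random variable. Its mean and variance can be computed exactly, even though any single roll is unpredictable. The seed and the number of simulated rolls are arbitrary choices.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]

# Known properties of the random variable: each face has probability 1/6.
exact_mean = sum(FACES) / len(FACES)
exact_variance = sum((face - exact_mean) ** 2 for face in FACES) / len(FACES)

# Individual outcomes are random, but their averages settle near the
# known values as the number of rolls grows.
rng = random.Random(42)  # fixed seed so the run is repeatable
rolls = [rng.choice(FACES) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)
sample_variance = sum((r - sample_mean) ** 2 for r in rolls) / len(rolls)

print(exact_mean, exact_variance)
print(sample_mean, sample_variance)
```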
What is Statistics?
Statistics is essentially what we did with data before there was big data or data science. Statistics concepts developed before we had machine learning and algorithms. We were talking about smaller amounts of data at that time. The mean and variance and those statistical summaries are still important here.
Which Statistics concepts should someone know before applying to a data science bootcamp?
Data distribution is important. This means thinking about what distribution your data comes from and understanding probability mass/density functions and cumulative distribution functions. Once you know a bit about the distribution of your data, you can use that to form a hypothesis test, understand p-values, or run an A/B test. One cool thing that Statistics gives us is a way to understand and quantify uncertainty. You can know how confident you are in your results!
Example of Statistics for Data Science
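As an illustration (a minimal sketch, not the problem worked in the webinar), the Python below runs a two-proportion z-test, one common way to analyze an A/B test. The conversion counts are made up, the function name is hypothetical, and the test assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both groups share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100 of 1000 users convert on version A,
# 130 of 1000 convert on version B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(z, p_value)
```

A small p-value says a difference this large would be unlikely if both versions truly converted at the same rate. It does not, on its own, say how large or how important the difference is.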
I'm a complete beginner – how should I start studying math to get ready to apply to Metis or another data science bootcamp?
If I had a limited amount of time to study for the Metis technical assessment to get admitted to bootcamp, the first thing I would do is try a practice assessment - like the Metis Admissions Prep assessment. I would think about these math concepts and try to recognize which ones were difficult for me versus which ones I had some awareness of. Getting an understanding of where your comfort zones are is helpful. That way, you can spend more time on any areas where you know you aren't feeling super comfortable.
That's true not only for math but also for coding! If you're a strong programmer, focus on math. If you're strong in math, focus on coding. Also, consider how you personally learn best. Some people love to crack open a textbook and read while other people learn better from videos. Figure out if you like practice problems, hands-on examples, or reading.
Which math classes or learning resources do you recommend?
There are four online resources I’d recommend:
3Blue1Brown makes great YouTube videos. They have lots of different math topics. Their method of explaining the different concepts is approachable.
Gilbert Strang provides a deeper dive into Linear Algebra. He has a great book on Linear Algebra and some cool lectures from his MIT course online as well.
StatQuest videos are an excellent resource for Statistics concepts.
Lastly, Metis offers a Beginner Python and Math Course. If the unstructured, independent approach isn't working for you, you can find more structure about what to learn and when with Metis! The course is meant to scale you up in math and coding so that you are ready for our bootcamp.
How long is the Metis Beginner Python and Math course and what do students learn?
It's typically six weeks long. We have a nine-week version as well. It's in the evenings, live online. It's going to walk you through all of the math that we talked about today. You'll get a deep dive into Linear Algebra, a brush up on Calculus and Probability, and you'll even run your own hypothesis test for Statistics. You’ll learn Python loops and functions as well.
Do you have any advice for a student who is thinking about applying to the Metis Data Science bootcamp?
First – Congratulations! Data science is an exciting field. One of the things I love about data science is that there's something for everyone. There's a little bit of math, a bit of coding, visualization, and communication. There are a lot of ways to be a Data Scientist. If you are feeling overwhelmed at all, start with one thing. Pick one type of math or one Python concept and then keep moving up. Little steps every day.
As you are working through different resources, write down questions that pop up in your mind. Maybe later, after a week or two of learning, you'll know the answer! Or maybe you'll realize that question still hasn't been answered for you, so you'll know where to spend your time. You might also find that some of those questions are obsolete after a week. This can help with self-reflection and also guide your journey.