In 2018, 66% of data scientists reported using Python every day, which makes Python the number one language for data science! But how much Python do you need to know for a data science bootcamp? Kim Fessel, Instructor at Metis’ online Immersive Data Science Bootcamp, breaks down what Python is actually used for in data science, exactly how much Python you need to know before you start a data science bootcamp, and her favorite resources to learn and practice Python for free. Watch the video or read on to learn more!
Python is one of the world's most popular programming languages, and there are a few reasons why Python is so popular:
Python’s syntax, or the words and symbols used to make a computer program work, is simple and intuitive. Many of its keywords are basically English words!
Python supports various paradigms, but most people would describe Python as an object-oriented programming language. In an object-oriented language, everything you create is an object, different objects have different properties, and you can operate on different objects in different ways.
Python integrates well with other software components, making it a general-purpose language that can be used to build a full end-to-end pipeline: starting with raw data, cleaning it, building a model, and putting that model straight into production.
What can Python be used for besides data science?
The better question is what can't it be used for? Here are some key places where you may see Python:
Web Development – Developers, engineers, and data scientists use Python for web scraping or for creating a mock-up of an app.
Automating Reports – Analysts or product managers who need to make the same Excel report every single week can use Python to help create reports and save time.
Finance and Business – Used for reporting, predictive models, and academic research.
Simulations – When I was a postdoctoral fellow at Ohio State University, my colleagues used Python to create computer simulations to study a variety of behaviors.
Why do you think Python has recently overtaken R in popularity among data scientists?
There are a couple of reasons I think Python has taken off. First, Python is a general-purpose language used by both data scientists and developers, and its simple syntax makes it easy to collaborate across your organization. People choose Python so that they can communicate with other people. The second reason is rooted in academic research and statistical models. I would say that R has better statistical packages than Python, but Python has deep learning libraries, structured ways to do machine learning, and the ability to deal with larger amounts of data. As people shift more to deep learning, the bias has been shifting toward Python.
Python is an excellent first programming language for beginners because its simple syntax lets you hit the ground running. Python is flexible, in that you can use it to do just about anything. It's also forgiving! Python will try to interpret what you mean. Let's say we wanted to add together two words like "school" and "house." In our minds, we would link these two words with the plus symbol ("school" + "house"), which is exactly how you would do it in Python! Python is also one of those languages that leaves plenty of room for growth and ways to improve your code.
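Here is a quick sketch of the "school" + "house" example that you could type into any Python 3 interpreter. The variable names are just illustrative.

```python
# Adding two strings with + joins them together (string concatenation)
first_word = "school"
second_word = "house"
compound = first_word + second_word
print(compound)  # schoolhouse

# The same + symbol adds numbers; Python picks the behavior from the data types
print(2 + 3)  # 5
```

The same symbol does different things depending on whether it sees text or numbers, which is part of why the syntax feels so natural.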
Within any field, you have to get the fundamentals of Python down before you can move on to more interesting things. Here’s a list of fundamentals to start with, in order:
Understand what data types are (integers, strings, floating point numbers) and how all of those data types are different.
Learn loops and conditionals – Loops execute a block of code several times, and conditionals decide whether a block of code runs at all (for example, telling a loop when to stop).
Learn how to manipulate data – Practice this by reading data into your Python program and then doing some kind of computations on it, cleaning it up, and maybe even writing it out to a CSV file. You'll want to understand exactly how you can manipulate data because that is a huge part of a data scientist’s job.
Algorithms – Use algorithms to build models, and maybe even create your own models.
Data Visualizations – This is my favorite part of data science! There are multiple Python libraries, or packages, to help you do this.
Communication – Begin communicating what you’ve learned in a way that other people can understand, to solidify that learning.
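As a minimal sketch of the data-manipulation step, the snippet below reads a small CSV, cleans it, computes a new column, and writes the result out to a new CSV file using only Python's built-in csv module. The city/temperature data and the output filename cleaned_temps.csv are hypothetical stand-ins for your own files.

```python
import csv
import io

# Hypothetical raw data; in practice you would open() a real CSV file instead
raw = io.StringIO(
    "city,temp_f\n"
    "Columbus, 68\n"
    "Chicago,\n"
    "Denver,75\n"
)

rows = []
for row in csv.DictReader(raw):
    temp = row["temp_f"].strip()
    if temp == "":  # clean up: skip rows with a missing temperature
        continue
    temp_f = float(temp)
    temp_c = round((temp_f - 32) * 5 / 9, 1)  # compute: convert to Celsius
    rows.append({"city": row["city"].strip(), "temp_f": temp_f, "temp_c": temp_c})

# Write the cleaned rows out to a new CSV file (hypothetical filename)
with open("cleaned_temps.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["city", "temp_f", "temp_c"])
    writer.writeheader()
    writer.writerows(rows)

print(rows)
```

Libraries like pandas make these steps shorter, but doing them by hand first shows what is actually happening to the data.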
What level of Python would someone need to know before they apply to Metis or any data science bootcamp?
There are a couple of fundamentals that you need to get down before you move on to something more complicated. Those are the basic parts of Python: definitely data types and data structures, like lists and dictionaries, and those kinds of constructs.
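A short illustration of these basic data types and data structures, assuming Python 3. All the values here are made up for the example.

```python
# Core data types
count = 42            # int
price = 19.99         # float
name = "Metis"        # str
is_enrolled = True    # bool
print(type(count), type(price), type(name), type(is_enrolled))
# <class 'int'> <class 'float'> <class 'str'> <class 'bool'>

# The types behave differently
print(7 / 2)      # 3.5  (true division always returns a float)
print(7 // 2)     # 3    (floor division of two ints returns an int)
print("7" + "2")  # 72   (strings concatenate instead of adding)

# A list holds an ordered sequence of items
scores = [88, 92, 79]
scores.append(95)

# A dictionary maps keys to values
student = {"name": "Ada", "scores": scores}
print(student["name"], len(student["scores"]))  # Ada 4
```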
You'll also want to know at least these three basics:
Conditionals – true and false tests. You'll have some kind of input, you'll test it against a condition, and if that test is true, you'll execute one block of code. If it's false, you might execute a totally different block of code. It's kind of a gatekeeper.
Loops – repeatable pieces of code. Anytime you need to repeat the same actions on many different items in a group, you might write a loop. The loop executes over every element in your group of inputs to produce some kind of standard output.
Functions – reusable code, not to be confused with repeatable code. If you want to perform the same type of calculation at various points in your code, you’ll write a function. You can reuse that bit of code any time you need the same calculation.
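Here's one way the three basics can fit together, as a sketch: a function wraps a conditional, and a loop applies that function to every item in a list. The letter_grade name and its grade cutoffs are hypothetical, chosen only for illustration.

```python
def letter_grade(score):
    """Reusable function: turn a numeric score into a letter grade."""
    # Conditional: test the input and run exactly one branch
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C or below"


scores = [95, 83, 71]

# Loop: repeat the same action for every item in the group
grades = []
for s in scores:
    grades.append(letter_grade(s))

print(grades)  # ['A', 'B', 'C or below']
```

Because the grading logic lives in a function, you can call letter_grade anywhere else in your program without rewriting the if/elif/else.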
To apply to Metis, you’ll at least need to be able to write a conditional statement: check inputs against some true-or-false test and then perform different actions depending on whether that test was true or false.
Installing new technology on your computer can be tough when you’re first beginning to learn how to code, but if you are using a Mac, Python may already be installed! Open your Terminal application, type python3 (just python on older versions of macOS) in lowercase letters, and hit Enter. You're ready to write Python code! You can try the example mentioned in the video to test it out.
If you're not on a Mac (or even if you are), Metis often recommends that data science students install Anaconda. Anaconda is an all-purpose Python distribution available for both Mac and Windows, so it doesn't matter which you're using. When you install Anaconda, you also install common libraries that working data scientists use. Anaconda also comes with Jupyter Notebook, which is a great tool for beginners.
It's hard to talk about Python without talking about libraries. A library is a collection of saved code that someone else has written for you. You can import various bits of code so that you don't have to do everything on your own!
A few libraries that are perfect for beginners:
Random – This is used to generate random numbers, which can be interesting. For example, you could build your own game with it.
Math – This one gives you access to all kinds of math functions like square root, sine, cosine, and more.
Collections – This gives you access to additional data structure types within Python, such as Counter and deque, beyond the built-in lists and dictionaries.
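A small taste of these three standard-library modules, assuming Python 3. The dice-rolling setup is a hypothetical mini-game for illustration, not something taken from the course.

```python
import math
import random
from collections import Counter

# math: common mathematical functions
print(math.sqrt(16))          # 4.0
print(round(math.cos(0), 2))  # 1.0

# random: roll a six-sided die ten times; seeding makes the run repeatable
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(10)]
print(rolls)

# collections: Counter is an extra data structure for tallying items
tally = Counter(rolls)
print(tally.most_common(1))  # the most frequent roll and how often it came up
```

Nothing needs to be installed here: all three modules ship with Python itself, so an import statement is all it takes.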
Once you have a handle on the fundamentals, our Metis bootcamp students learn:
Pandas – For data wrangling and data manipulation, because it allows a user to read data in, change it, look for missing values, and write data out.
NumPy – For fast computation because it speeds up all of the different calculations that you're doing. Pandas actually uses NumPy under the hood for some of its calculations!
Scikit-Learn – For machine learning because it has all of the algorithms you'll want to use for regression, classification, and unsupervised learning. When you’re deep in the Immersive Data Science Bootcamp, you’ll be leveraging Scikit-Learn pretty heavily.
Matplotlib and Seaborn – For data visualizations. These are the most common visualization libraries, and both can help you produce some nice visuals.
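To give a feel for how Pandas, NumPy, and Scikit-Learn work together, here is a minimal sketch, assuming all three are installed (they come with Anaconda). The hours-versus-score data is invented so the fitted line is easy to check by hand.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: hours studied vs. exam score, with one missing value
df = pd.DataFrame({
    "hours": [1.0, 2.0, np.nan, 4.0, 5.0],
    "score": [52.0, 54.0, 56.0, 58.0, 60.0],
})

# Pandas: look for missing values, then fill them with the column mean
print(df.isna().sum())
df["hours"] = df["hours"].fillna(df["hours"].mean())

# NumPy: fast, vectorized math on the whole array at once (no explicit loop)
hours = df["hours"].to_numpy()
minutes = hours * 60
print(minutes)

# Scikit-Learn: fit a linear regression model, then predict a new value
model = LinearRegression()
model.fit(df[["hours"]], df["score"])
prediction = model.predict(pd.DataFrame({"hours": [6.0]}))
print(prediction)
```

Once the gap is filled with the mean (3.0), the toy data follows score = 50 + 2 × hours exactly, so the model should predict about 62 for 6 hours.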
Jupyter Notebook is an interactive development environment, and it’s critical in the learning space for two reasons:
It helps you understand what your code is doing instantaneously. You'll be writing small blocks of code in cells and then executing that code immediately. This gives you instant feedback and shows you errors in your code, shows which functions you might need to change, and more. It allows you to learn more quickly and experiment more conveniently.
You can also write text in Jupyter Notebooks. You can include a message to yourself, and you can even add images! This feature is helpful for organizing your thoughts, remembering what you need to fix or change later, making a note about what a certain code block is doing, and recording the steps you’re trying to follow. As an instructor, I can include an image of a code block for my students.
Jupyter Notebook is great for building projects, structuring homework, and collaborating on projects. The annotation feature is amazing because students can record their thought process, and you can use this in a real-world work environment too!
Metis offers a Python for Beginners course, and it’s written for people who have never seen Python before. The course starts with, “What is Python?” and then goes through the fundamentals in more depth. Metis also offers a free Intro to Python video from their Demystifying Data Science 2019 conference.
Once you've got the Python fundamentals down, try Metis’ Beginner Python & Math for Data Science course. This is a great course for people who are serious about a data science career but not quite ready to take the bootcamp. This course will help you brush up on both Python and math as they pertain to data science. From there, you can enroll in Metis’ Immersive Data Science Bootcamp, which covers machine learning and data visualization.
To understand Python, you have to practice. The more you practice, the better you're going to get! Two resources for practicing Python are:
CheckiO, which is a gamified way to learn Python. You'll complete challenges and progress across a game board.
CodingBat, which has a ton of different practice problems. If you are looking to practice, practice, practice, that's another great spot.
I’ve also launched a YouTube series! My new series is an intro to Seaborn, which is a visualization package. If you are at the level where you know a little bit of Python and are ready to start visualizing data, that could be a helpful resource as well.
If you complete Metis’ Python for Beginners course would you be ready to apply to Metis’ Data Science Bootcamp?
The Python for Beginners course will really launch your journey because it gets you comfortable with programming in general. Then, we cover data types, which you have to have down before you can move on. Next, we go through each of those three core foundations: the conditionals, the loops, and the functions.
What is your advice for a complete beginner who is beginning to learn Python?
Sometimes beginners get frustrated because they want to be excellent at Python right away. It's going to take practice. It's about getting a little bit better every single day. Maybe you're not solving everything from the beginning, but know that you are getting incrementally better. As long as you're willing to put in the work to get a little bit better every day, you're going to be off to a great start with Python.