As an instructor at General Assembly’s Data Science Immersive, Phillippa Thomson not only teaches the basics like Python and SQL; she’s also digging deep into neural networks and ensuring that her students become ethical data scientists. Phillippa describes the data science curriculum at General Assembly, why PhDs are no longer a requirement for working in data, and even walks us through two of her favorite student projects!
What was your background and how did that lead you to General Assembly?
I studied mechanical engineering in undergrad, but I spent 8 years in a career in finance. My job evolved to focus more on data using Excel and Visual Basic code. I wanted to continue working with data, but I wanted a change of industry, so I got a Master's Degree in Statistics for Social Sciences at Columbia University, where I learned how to manipulate data in R. When I finished my master’s degree, I wanted to immerse myself in Python, so I enrolled in General Assembly’s Data Science Immersive.
I had a great experience at General Assembly. I loved the environment, and the focus on real-world skills instead of theoretical concepts for their own sake. There's an emphasis on giving people practical skills that allow them to hit the ground running in a job after bootcamp. Once I graduated, I started teaching the Data Science Immersive, and am now on my fourth full-time class.
Was it hard to learn Python after already knowing R?
The transition was pretty easy because the workflow and thought process for data analysis are language-agnostic. There are some pretty big differences between R and Python beyond just the syntax and the way they are set up, but it's easier to learn Python when you’ve already done data analysis. I've heard other people say that going from Python to R is even easier than R to Python.
There are a few data science bootcamps in NYC. What made you want to teach at General Assembly in particular?
I did think about working at different schools, but one thing I love about General Assembly is that they have a global footprint. I'm not American, so it’s important for me to be in an environment where I am coordinating across the world with colleagues. That was a big selling point for me. Plus, in addition to learning Python, the idea of being in a school with web developers, UX people, and product managers was super exciting to me.
I had also seen (and liked) the culture at GA. A lot of innovation and creative solutions come out of their team. That was really compelling for me.
How does General Assembly keep you growing and learning as an instructor?
I had never taught in a classroom setting before General Assembly, although I've had experience tutoring students one-on-one and onboarding interns/newcomers to the company. General Assembly instructors go through training before they start teaching, and we’re given constant feedback on our teaching. Our managers are always making sure that we are professionally growing.
Through the process of teaching data science, you gain a really deep understanding of the material. Data science and the latest technology are always changing, so we do our best to keep up with the changes. I'm always learning new things to keep up with trends.
What’s your personal teaching style?
The students are in class five days a week for 12 weeks, so one of the things I focus on a lot is varying the cadence of the course as much as I can, and making sure that it's not just me droning on at the front of the class for hours. I’ll give them a little bit of theory, then show them the real-world implementation of a concept. We call it the “I do, we do, you do” model. That solidifies the theory, and gets students invested in learning the concepts quickly.
What are students actually learning in the Data Science Immersive?
We teach Python and SQL, because that’s where we see the demand for jobs. We teach everything in the data science workflow: how to collect data, how to find data, and how to clean data. Then we look at the predictive models data scientists use, including traditional models like linear and logistic regression, and newer, up-and-coming models like neural networks and the latest flavor of boosting model.
We focus pretty deeply on what makes a model good. You can have the fanciest and most advanced models in the world, but if the data you're putting in is wrong or bad, or the model doesn’t suit the data, then it's futile to attempt to find the answer. We try to instill that approach in students as early as possible.
How do you keep the curriculum caught up with data science trends?
We try to keep an eye on students who are graduating and applying for jobs, to see what skills employers are looking for. About six months ago, we noticed that data science jobs were emphasizing SQL. We were already teaching SQL, but we invested a little more class time into SQL in New York, to make sure students were in the best possible position to get jobs when they graduate.
Throughout the course, I’m trying to instill the values that make ethical and honest data scientists. Statistics and numbers hold a lot of weight, and as a data scientist, you’ll be interacting with people who aren't in the data world. You can be intentionally or unintentionally misleading just by the way you present your data. We try to remind students about that as often as possible to make sure they present their findings in a way that’s consistent with the essence of the data. We encourage students to read other sources, look at the statistics critically and skeptically, and apply that kind of filter to the data they're presenting to the world.
Does General Assembly use projects to teach those data models?
Yes. Unit projects build on each other and are evenly spaced throughout the course. The first project takes place during the Intro to Data and covers descriptive data; the fourth and final project is less structured, and students source their own data. We always use real data, because we want students to encounter real issues with real data as soon as possible and learn how to deal with them.
The course culminates with an individual capstone project. Students pick their topic, source their own data, and decide how they're going to answer a business question with our support. By the end of the project, students have gone through the process of self-learning, and also have a portfolio-ready project. Students can then demonstrate the skills they’ve developed to employers.
Tell us about a cool student capstone project that caught your eye!
There have been so many. One recent student had a geophysics background, so he sourced data from the US Geological Survey about earthquakes, then married that with data about the thickness of the earth’s crust. He used self-reported data from people on the ground to try to figure out earthquake magnitudes and predict where people were going to feel an earthquake, and how strongly they were going to feel it across the US. It was one of those cool projects where we as instructors were along for the ride and learned a ton at the same time.
Another project I thought was particularly cool was by a student who played Mahjong online. He built a bot that could learn from other players’ behavior and play the game for him; he managed to make a bot that could play intelligently.
What’s the next big trend in data science you’re seeing, Phillippa?
In the last six months, neural networks have been huge. They’ve had a lot of airtime and we’ve seen some really clever implementations. One popular current example of neural networks is building Twitter bots that can mimic speech patterns.
So General Assembly has now added a few neural network lessons into the course. They're incredibly complex to develop, and historically have been incredibly complex to use, but they're becoming more accessible.
Do you need a PhD to be a good data scientist?
No, absolutely not. Some people are deterred from going into data science because they look at job postings and see a lot of them require PhDs or advanced degrees. But there are huge numbers of jobs in the data world, and many of them don't require advanced degrees at all. We've had some students come through the program with PhDs, but more students come through without PhDs, with similar success rates.
So then what are the prerequisites to get into the Data Science Immersive?
A coding background is not a prerequisite. It certainly helps, but we have had successful students with no programming experience. We also give pre-work to make sure that everyone arrives on campus able to code the basics.
An interest in data – that natural curiosity about what data can tell you – is a good prerequisite. But the biggest prerequisite is that students like learning and are intellectually curious. Twelve weeks is a lot of hours of study packed into a really short time, so the learning certainly doesn't end when students leave General Assembly. If you enjoy learning new things, that sets you up for enormous success. Going forward as data scientists, grads are undoubtedly going to hit problems and uncover tools they haven't seen before, so they need to be able to constantly self-teach and interact with the online data science community.
Who is the ideal student for the data science immersive?
It's the whole spectrum. Some of our students have worked as data analysts and want to level up, even if they are already employed. Others were working in totally different industries and wanted to make a career transition. Other students just graduated from undergrad and are looking to build out their skill set to make themselves immediately impactful in the workplace. We also have students with 20-30 years of professional experience in unrelated areas who are looking to make the transition into data.
We've observed that people who succeed are comfortable being a little bit uncomfortable. Students who are open to seeing multiple approaches to the same problem have a much easier, and also more successful, path through the course than people who look for a blueprint on how to approach every single problem.
What’s the difference between data analytics and data science?
In the wild (i.e., in job postings), people often use the terms data analyst and data scientist interchangeably. But data analytics is very focused on descriptive statistics and describing the current state of data very deeply. That includes how to make visualizations, and how to run statistical analyses on data, but it tends to stop short of making any predictions based on the data.
In data science, we cover how to describe the current state of data, but also make inferences and predictions about unknown data and future data.
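As a rough illustration of that distinction, here is a minimal Python sketch. The numbers are made up for illustration, and the function names (`describe`, `fit_line`, `predict`) are hypothetical rather than anything from the General Assembly curriculum. The first step only describes data that has already been observed; the second fits a simple least-squares line and uses it to predict a value that has not been observed.

```python
from statistics import mean, stdev

# Made-up example data (an assumption for illustration only):
# hours studied and the quiz score each student received.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 65.0, 70.0, 78.0]


def describe(values):
    """Descriptive step: summarize data we have already observed."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(xs, ys):
    """Predictive step: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to an input that was not in the data."""
    return slope * x + intercept


# Data analytics tends to stop here: what does the observed data look like?
summary = describe(scores)
print(summary)

# Data science goes one step further: what score would we expect after 6 hours?
slope, intercept = fit_line(hours, scores)
print(predict(6.0, slope, intercept))
```

A straight-line fit is just the simplest possible stand-in for a predictive model; the same describe-then-predict split applies to the other models mentioned in this interview.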
What sort of jobs are you seeing Data Science Immersive students land?
We've seen a huge range of jobs out of the program – it varies student to student. We've had a lot of people return to their same industry, but shift jobs and work as junior data scientists. It helps that they have domain expertise already. Other students get jobs as data quality experts. They're the gatekeepers for data to ensure it’s the correct data and is being stored properly. Having a background in data science makes GA grads incredibly effective in that role.
What we call a data scientist today has existed for a really long time in many industries. Now, the term “data scientist” has become this catch-all for people who can work with data, learn from data, and help make business decisions from data, but there isn't uniform vocabulary used across industries. It can be super confusing, particularly for students coming out of bootcamps and applying for jobs, to really understand what a job is looking for.
What are your favorite data science meetups or resources for beginners?
Going to meetups and meeting other data scientists is a good way to introduce yourself to data science. Meetup.com hosts everything from big data meetups to women in data. Go to these before you invest in a bootcamp to learn more about the industry.
Data science is an open community; we're totally spoiled compared to other industries. There’s a wealth of information online and people are super forthcoming about sharing their work. In our Data Science Immersive, we encourage our students to participate in that community as contributors.
Kaggle competitions are a great resource to see the data projects people are doing, and maybe even learn a little code along the way. There are also some great online coding resources where you can learn a little bit of Python just to see how it works.
When students are ready to prepare for the course, they can do the pre-work. Even if a student has never looked at Python online, or never looked at a Kaggle competition on data analysis, the pre-work will get them where they need to be for day one on campus.