With 30 years of experience at the University of Illinois Computer Science department and a stint at Google, Sam Kamin is making the transition into bootcamps. He’s currently designing the curriculum for NYC Data Science Academy’s Data Engineering course. We chat with Sam about the differences between traditional education and coding bootcamps, the world of Data Engineering, and how the NYC Data Science team is preparing for the first day of class on August 24th.
Do you have experience with education or in data engineering?
I was a professor for 30 years at the University of Illinois; I was in charge of the undergraduate Computer Science program for a long time, so my main experience is in education. I also did research and publishing, mostly on programming languages and some on parallelism. I then went to work at Google in New York, which is all about big data – everything runs on gigantic clusters.
My sister took a class at NYC Data Science Academy and now works here; when they needed someone to teach this new Data Engineering program, it seemed like a great opportunity.
As a professor at a pretty huge research university, did you have to be convinced of this bootcamp model at all?
I was mostly convinced. In the CS program at Illinois, we do have some balance between practical learning and theoretical or more fundamental education. Within the faculty – I’m sure this is true in every department – there’s a range. At one extreme, some people think that programming is just the details you learn once you know the theory. Other professors think students need practical skills to get jobs and that it’s really hard to understand the theory without practice.
I tended to side with a more practical education. So for me, the tension wasn’t that great. I talk to a lot of people who work on Hadoop, for example, who have been in computer science for years; in 6 weeks, you’re not going to train someone to that level. But on the other hand, I meet people who programmed for 6 weeks and have great jobs. The industry is big enough that it can support a wide variety and depth of knowledge.
What is the difference between Data Engineering and Data Science?
Data Engineering focuses on handling big data whereas Data Science focuses on analysis of that data, machine learning, and statistics. Data Science consists of a lot more visualization, whereas Data Engineering is about handling large amounts of data using Hadoop and clusters.
Which programming languages and frameworks will you be teaching in the Data Engineering class?
Python is used everywhere and we’re going to be relying on it heavily, so the students will become expert Python programmers. Python is a general purpose language that we use to grab and massage data, and Hadoop is the framework that stores the data and allows us to process it. So we will teach Python and Hadoop components like MapReduce, Hive, Pig, Sqoop, and others.
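To give a flavor of the MapReduce model mentioned above, here is a minimal sketch in plain Python. It is not Hadoop code and not taken from the course; it only imitates the map, shuffle, and reduce steps in memory on a tiny input. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, roughly what a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all the counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample input, standing in for files stored on a cluster.
sample = ["big data big clusters", "data engineering"]
counts = word_count(sample)
print(counts)
```

On a real cluster the map and reduce functions would run in parallel on many machines and the framework would handle the grouping step; the sketch only shows the shape of the computation.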
Spark is a tool that allows for straightforward analysis on large amounts of data. Its calling card is that it’s considered to be more efficient than MapReduce. In Data Engineering, there are a lot of different ways of getting at the same thing and analyzing big data. Different companies use different tools; Spark is the latest, so we will cover Spark.
In contrast to the Data Science bootcamp, we won’t be teaching R. R is used by statisticians, but not much by Data Engineers.
Should applicants for the data engineering class have some experience in programming with Python already?
Yes and no; we’re not requiring it but the applicants we’ve seen so far do have some experience. We’re not getting students who are complete newbies – but there’s a big difference between having some experience and being an expert, which is what we intend them to be when they graduate.
We’re going to be teaching people who have some programming experience but probably don’t have PhDs; they are more likely to have bachelor’s degrees.
My understanding is that in Data Science, companies are looking for people with higher degrees. But here that’s not necessarily the case.
What is the application like for the Data Engineering class? Is there a coding challenge?
The application process is fairly straightforward. Here is the link to the application. We are mostly interested in people’s background and reasons for wanting to become a data engineer. I don’t require any samples of code on the application, though sometimes I will ask applicants to send me a sample. Based on their background and the interview we can assess if we think they will do well in the program.
How long is the Data Engineering class?
The Data Engineering class is 6 weeks long and the Data Science program is 12 weeks. We’re offering Data Engineering for the first time so we may tweak that in the future.
What is your teaching style? Will the course be project-based or will it be a lot of lecture?
Fundamentally, it’s going to be a combination of lecture and hands-on, interspersed. There will be homework every night and projects where students will be asked to find their own data sources that they’re interested in and do something with that data.
There’s the overriding imperative to produce a resume or a portfolio. We also want to get these students to where they really understand not only the technology we’re teaching but the general lay of the land in the field, so that they will interview well and have something to show.
We’ll also do things like pair programming and code reviews. Students will be expected to find some new technologies on the web and give lectures on them, so a pretty broad range of things, but the core of it will be a lecture/lab kind of environment.
I’ve only rarely taught a small number of students. Even in graduate-level classes at Illinois, I had 30 or 40 students. So that will be a different experience and I want to try a bunch of things.
How much emphasis is there on job placement?
We do mock interviews and we have hiring partners. We also host a lot of meetups and have speakers from real companies give talks to the students about what it’s like in the real world. We do a lot to make sure that students will be able to interview well or have something to show. I’ve been spending all my time developing the curriculum, but Janet and Vivian are always working on job prep and hiring partnerships.
Will you give pre-work for students to do before they actually get to the bootcamp?
I think any professor will tell you this: you can give people pre-work but you can’t depend on their having done it. We like Learn Python the Hard Way. But in the case of data engineering, I’m not really assuming any prior knowledge so there’s not really any preparation. I give students suggestions of things they should do but in my experience, it’s not something you should rely upon if you’re teaching.
What is the ideal cohort size for the Data Engineering class?
We will keep this class under 20 students – it is our first time offering this class and we’ll have an instructor and a TA. That’s a nice ratio that we need to maintain if we’re going to support every student and make sure they can all do the homework and projects.
One thing I’ve noticed from observing the Data Science course is that instructors are meeting every day to talk about what happened in the class, which students are having problems, what could be improved in the curriculum, and which students need extra help and on what.
Did you see a similar feedback loop at the undergraduate university level?
No. University is different – every professor teaches what they want, the way they want, more or less. There’s no effective oversight.
Furthermore, the classes are almost never lecture/lab, so you don’t get an idea of how well the students are understanding the material. I think every school has student evaluations at the very end of a semester, but those serve no purpose in helping the professor improve that semester, and they certainly have nothing to do with a hands-on approach to helping students. I worked hard to teach what I considered the important material for students to learn at that time – and I think almost every professor does – but there’s no real oversight to speak of.
Do you expect every student to make it through the class or do you expect some attrition?
We don’t have traditional tests and our expectation is that everyone will make it through. We’re certainly not planning to weed anyone out. We’re all about supporting every student. Everyone here is really focused on making sure that every single student does well.