Mike McKerns is a data science veteran and an instructor at the Los Angeles data science bootcamp Logit Academy, which starts its first cohort on June 13. In his 37 years as a data scientist, Mike has pioneered new techniques and founded a nonprofit to advance the mathematics behind data science. Mike tells us about his extensive experience, why he’s excited to teach at Logit, and why Python is an ideal language for data scientists to learn.
Tell me about your background and experience in data science.
Data science is a relatively new term. I have been a researcher in data science for longer than the field has been called “data science.” For the last 37 years I’ve been using simulations to fit models to experimental data. In machine learning terms, model fitting is generally called regression and/or classification.
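To make the link between model fitting and regression concrete, here is an illustrative sketch (not code from the interview) that fits a straight line to a few data points by ordinary least squares. The function name `fit_line` and the data are hypothetical; real experimental fitting usually involves nonlinear models and simulations rather than a single line.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical measurements that lie exactly on y = 1 + 2x
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # prints (1.0, 2.0)
```

The fitted intercept and slope are the "model"; predicting a new point is then just `intercept + slope * x`.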
I’m an optimization specialist. I write Python programs to solve optimization problems, and I build models as well as tools that other scientists and data scientists can use to solve problems. I’m the author of Mystic, which is used for large, complex nonlinear optimization problems in both academia and industry. I’m also the author of a few Python packages for parallel computing. Some of my core packages are used by popular Python libraries for both data science and parallel computing.
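For readers new to optimization, the sketch below shows what solving a small optimization problem in Python can look like. It uses golden-section search, a classic textbook method for minimizing a one-dimensional unimodal function; it is not how Mystic works, and the name `golden_section_minimize` is hypothetical. Mystic is aimed at much larger and more complex nonlinear problems.

```python
import math


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Return an approximate minimizer of a unimodal function f on [lo, hi]."""
    if not lo < hi:
        raise ValueError("require lo < hi")
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    # Two interior probe points placed at golden-ratio positions
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


# Hypothetical objective with its minimum at x = 2
print(golden_section_minimize(lambda x: (x - 2) ** 2 + 1, 0, 5))  # approximately 2.0
```

Reusing one probe point per iteration means each step costs only one new function evaluation, which matters when the objective is an expensive simulation.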
I work in computing and mathematical science at Caltech. I’m not really a data scientist but somebody who does research that aids data science. I’m a coauthor of Optimal Uncertainty Quantification (OUQ), a theory that can rigorously determine the optimal model and optimal bounds given the data and other information you have. I started the UQ Foundation with a few of my colleagues from Caltech to support, promote, and advance the mathematics and software tools for data science.
I’ve also been teaching data science courses as contract work for a few years.
What did you study at college?
I studied applied physics, and I have a PhD in physics. Back then, you couldn’t get a degree in “data science.” You either had a theoretical or applied degree, and you stuck to your particular domain. There was no real institutional recognition of computational science or data science. So those of us doing it were pioneers, of sorts.
You mentioned that when you were studying, there was no data science major. Do you know if there’s something like that now?
Yes, they exist. I don’t know whether those degrees are always called data science, but they’re often labeled applied and computational math, which is basically statistics and data science. Degrees in finance or economics can also involve a lot of data science courses. For theory or a general background, you could get a degree in computational or applied mathematics.
What companies have you worked for over the years?
I’ve worked at J.P. Morgan, and more recently I’ve had several contract jobs with companies like Shell, several hedge funds, and the US government. I have also worked for Enthought, and of course for Caltech for many years.
Can you tell me about your experience as a teacher?
I had 7 or 8 years of teaching experience as a graduate student, teaching astronomy and general physics classes. I got my first job at Caltech as a postdoc, but I didn’t teach there because I was focusing on research.
I started missing teaching a lot. I was friends with the president and CEO of Enthought, a global company that teaches Python to scientists, engineers, and data scientists, so I started teaching again at Enthought. Over the past five years I’ve taught 10 to 15 one-week courses per year, at Enthought and at Python Academy, all over the world.
What do you think of the idea of this intensive, career-driven data science bootcamp that Logit Data Science offers?
I think it’s a good idea, and I’m interested to see how it will work. I know that by the end of a weeklong class, my students feel they’ve crammed a lot of knowledge into their brains, and that’s tiring. The Logit program runs for 12 weeks, so it’ll be interesting to see how the amount of information balances out with people’s ability to absorb it.
I think it’s a fantastic idea to bring in people like me who are working in the field and have experience that can be conveyed not just through a textbook, but through presentations and hands-on knowledge. I think that’s invaluable. The key is to have interactivity around subjects, then go out and do hands-on learning and problem solving. That’s a critical experience you don’t get elsewhere, and I’m really excited to do that. When I teach weeklong classes, you can’t go into that depth. You can do one or two example problems, but you can’t go to the depth of the sort of problem you might tackle if you were working in the field. That lab component should be pretty exciting.
Are you going to be teaching full-time or will there be other instructors as well?
The plan is that the first bootcamp will be split among three instructors. I’m teaching one week at the beginning, one in the middle, and one near the end.
What will you be teaching in your three weeks?
In the first week I will give an introduction to data science: what data science is and the different types of data science. Then in the middle I’ll do a section on time series problems: time evolution, things relevant to stock market information, and making predictions about things that are dynamically evolving.
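One of the simplest ways to forecast a dynamically evolving series is simple exponential smoothing, where each new observation nudges a running "level" toward itself. The sketch below is only an illustration under stated assumptions: the function name, the `alpha` smoothing weight, and the prices are all hypothetical, and it is not the course material.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """Return the one-step-ahead forecast from simple exponential smoothing."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the previous smoothed level
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical daily closing prices
print(exponential_smoothing_forecast([10, 12, 11, 13], alpha=0.5))  # prints 12.0
```

A larger `alpha` reacts faster to recent changes; a smaller one smooths out more noise. Real stock-market models are far more involved.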
Then near the end I’ll cover applications to text or corpus problems: basically text-based processing, or scraping. A lot of marketing companies take email responses or posts, scrape them for customer reactions, and then do market analytics based on those reactions.
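A toy version of the customer-reaction analysis described above might tally positive and negative keywords in each response. The keyword lists and function names here are hypothetical; production systems typically rely on trained sentiment models or curated lexicons rather than a handful of words.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration
POSITIVE = {"love", "great", "excellent", "happy", "recommend"}
NEGATIVE = {"hate", "broken", "refund", "terrible", "slow"}


def classify_reaction(text):
    """Label one customer response as positive, negative, or neutral."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize_reactions(responses):
    """Count reaction labels across a batch of scraped responses."""
    return Counter(classify_reaction(r) for r in responses)


print(summarize_reactions([
    "I love it, great product!",
    "Shipping was slow and the box arrived broken.",
    "It arrived on Tuesday.",
]))
```

The resulting counts are the kind of aggregate a marketing team could track over time.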
Why is Logit teaching Python rather than R or another data science language?
There are a number of reasons:
- Python is probably more pervasive than R. In industry, big banks and hedge funds use Python more than R.
- R is a more narrowly focused language, so if you’re doing statistics, you might use R. But its scope is tight: once you step outside it and apply it to other problems, you have to start working in another language.
- Python is very good at binding and interplaying with other languages, technologies, and tools. By binding, I mean that Python can communicate back and forth with these other languages. We have Python bindings to C, to R, and to all sorts of other languages that people use.
- Python is used a lot in high-performance computing, which data science often needs.
- Python is more general-purpose, so there are more optimization programs in Python than there might be in R. If you’re solving a predictive problem, you need optimizers.
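The binding idea can be sketched with Python’s standard-library `ctypes` module, which lets Python call functions in a compiled C library and lets C call back into Python. This example assumes a POSIX system (Linux or macOS) where the C library’s `qsort` is available; `c_sort` and `py_compare` are hypothetical names. Other binding routes exist, such as Cython for C and rpy2 for R.

```python
import ctypes
import ctypes.util

# Load the C standard library. find_library("c") resolves it on most Linux
# and macOS systems; CDLL(None) falls back to symbols already loaded into
# the running process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# C type of qsort's comparator: int (*)(const void *, const void *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)

# C prototype: void qsort(void *base, size_t n, size_t size, comparator)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def py_compare(a, b):
    """Python comparator that C's qsort calls back into for each comparison."""
    return (a[0] > b[0]) - (a[0] < b[0])


def c_sort(values):
    """Sort a list of ints in C, using a Python callback to compare elements."""
    n = len(values)
    arr = (ctypes.c_int * n)(*values)  # copy Python ints into a C int array
    callback = CMPFUNC(py_compare)  # keep a reference while C uses it
    libc.qsort(arr, n, ctypes.sizeof(ctypes.c_int), callback)
    return list(arr)


print(c_sort([5, 1, 7, 33, 99]))  # prints [1, 5, 7, 33, 99]
```

Control passes from Python into C (`qsort`) and back into Python (`py_compare`) on every comparison, which is the "back and forth" communication described above.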
Overall, Python is a more general toolset, and it’s used more in large-scale industry. In certain cases, people will use smaller-scale tools like R and Excel. But if you’re looking at an integrated technology for bigger problems at bigger companies, you’ll find that they almost always use a language like Python.
What will the students’ schedule be like?
In the mornings, we’ll cover general theory in a lecture style, with slides and walkthroughs in which the instructor works through problems and discusses user stories. Afternoons are basically labs where students get hands-on with real data sets and real problems.
Logit Data Science classes are 9:30am to 6:30pm, Monday to Friday.
What’s your personal teaching style?
I’m an extremely informal instructor. If it’s just me talking all the time, it’s a failure. Basically, I want interactivity and I want people to ask questions. The more questions people ask, the better the class is, and it’s my responsibility to keep the class moving through the entire curriculum. When a student asks a question, another student probably has the same question, and it helps enhance the learning process.
In my own experience, most of my instructors weren’t able to communicate subjects to me very well. I learned best by hearing examples rather than seeing derivations, and in a traditional setting you don’t get a lot of that. So I teach the way I learn: I introduce one or two guiding mathematical expressions, break them down, and build a picture of what they look like so you can start to translate them to real life. It’s not about memorizing; it’s about learning how the equation fits into a picture of real life and how you can use that in code to solve a problem.
How will Logit help students find jobs?
Logit is partnered with a recruiting firm and resume experts who will help ensure that students are prepared for interviews and exposed to job opportunities. We’ll have a few hiring managers and recruiters coming to speak to students, and there will also be networking events. A senior data scientist from Google is one of the first speakers.
What sort of jobs might students be prepared for when they graduate?
If you want to be general about it, the students are being prepared to be data scientists. However, in a lot of industries the role may go by a different name. The type of education and problems we’ll get into in this class will put you on the right learning trajectory for tech companies like Google and Facebook. It should also make you very attractive to hedge funds and banks, which use a lot of data science modeling.
Data science is pretty pervasive in any field that does predictive analytics. In the different sections, we’ll try to cover problems specific to each of those fields, so people will get a taste of how data science applies to them.
What sort of students do you expect to see studying with Logit?
I would expect two types of students. First, people primarily looking for a job in data science. Second, people who already have jobs that entail some data science and use the bootcamp as data science training for their particular company. You’d be surprised how many big companies don’t offer data science in their own training programs.
What data science resources or meetups are there in Los Angeles where people can get a taste of what it would be like to study data science?
There are a number of data science meetups in LA, including the Hollywood Data Science Meetup which is hosted by Logit Academy.
How are you feeling about starting as an instructor there?
I think the people running Logit are very nice, genuine people. They’ve been working in data science for a while. They aren’t necessarily data scientists themselves; they work more in data science management. There are a lot of data science positions in Los Angeles, and they know it’s important to build the community of data scientists here. They saw the opportunity, and not only are they in the right place at the right time, but they really care about getting students work in the field.