When you hear the term data science, do you think spreadsheets, numbers, and mad scientists? Data science helps us understand all of the data that we’re collecting, and helps turn that data into action. The field is in high demand: 2.7 million jobs will need to be filled by 2020! In this data-driven culture, companies now more than ever require data analysis skills to enhance business growth. And to fill the market gap, data science bootcamps are steadily growing, training thousands of data scientists each year. Read on for a crash course in data science and our picks for the best data science bootcamps.
What is Data Science?
Data science is a multidisciplinary field that combines computer science and statistics. The objective of data science is to pull insightful and useful knowledge out of datasets which, at times, can be too large for traditional statistics to analyze. This can include anything from analyzing complex genomic structures, to interpreting handwriting, to optimizing a marketing strategy. NYC Data Science Academy. Director, Josh Wills, says that a data scientist is a “person who is better at statistics than any software engineer and better at software engineering than any statistician.”
The Data Science Job Market
According to IBM, by 2020, the data analysis workforce will grow by 28% and the number of roles will increase by 364,000 to 2.7 million. For data science and other advanced data roles, the demand will reach 61,800. The democratization of data has governments, businesses, and organizations measuring anything and everything to make better business decisions.
What’s the average salary for a Data Scientist?
The average salary for Data Science Analysts is $80,265, while the average salary for advanced Data Scientists and Data Engineers is $105,909.
What are the best data science bootcamps?
Length: 10 weeks, full-time; 5-10 weeks, part-time
Curriculum: Students learn how to build categorical and numerical models that can predict results before they happen using tools like Anaconda, Jupyter Notebooks, Python, and R.
Career Prep: Springboard students have mentor-guided courses with a job guarantee. The career team works with you to refine your portfolio, optimize your resume, and build industry connections. If you don’t land a job within six months of graduating, your tuition will be refunded.
Cost: $2,900 and $12,000
Location: NYC, Toronto, Vancouver, and online
Length: 5 months, full-time; 10 months, part-time and self-paced
Curriculum: Students will receive a solid foundation in cleaning and gathering data with Python, Pandas, and SQL, while also gaining an understanding of how to go from problem requirements to actionable steps with issue trees and experimental design.
Career Prep: Students work one-on-one with a dedicated career coach to build the collateral and skills they’ll need to succeed in landing a job in data science.
Cost: $12,000, $8,200, and $6,000
Length: 13 weeks, full-time
Curriculum: Students gain experience across the data science stack: data munging, exploration, modeling, validation, visualization, and communication.
Career Prep: With career prep workshops and events, Galvanize helps students level up in their career.
Location: San Francisco, NYC, Austin, Denver, Boulder, Phoenix, and Seattle
Length: Varies; 10-12 weeks full-time, 9 weeks part-time, and more
Curriculum: Their data analytics bootcamp uses SQL, Excel, and Tableau to extract, analyze, and illustrate real‐world data. Their data science bootcamp uses Python, SQL, UNIX and Git to mine datasets and predict patterns, build statistical models, and master the basics of machine learning.
Career Prep: GA has in-house career coaches to guide you on your path to a data role.
Cost: $15,950 and $3,950
Location: Various cities globally and online
Length: full-time, 7 weeks
Curriculum: Insight Data Science is an intensive fellowship intended as a post-doctoral bridge between academia and professional data science. You must have a PhD for this course.
Career Prep: Students will learn from top industry leaders and be positioned to interview with leading companies.
Location: Boston, Toronto, Seattle, Silicon Valley, NYC, and online
Length: self-paced, 15 weeks
Curriculum: Students learn data analytics using SQL, R and more. Students gain foundational skills in analytics and work on integrated projects from real companies.
Career Prep: Level prepares students with skills to land a job and master data analytics within the workforce.
Location: Boston, Charlotte, San Francisco, online, and more
Length: 12 weeks, full-time
Curriculum: Students will learn cutting-edge technologies and techniques like Jupyter, machine learning, interactive data visualization, and other big data tools and architecture.
Career Prep: Students receive hands-on career support from dedicated Career Advisors until they're hired.
Location: NYC, Chicago, San Francisco, and Seattle
Length: 12 weeks, full-time
Curriculum: NYCDA teaches beginner and intermediate data science with Python and Hadoop, as well as the most popular R packages like Shiny, Knitr, rCharts and more. Their Hadoop & Spark Bootcamp uses Python, Scala and Java, and emphasizes the use of Hadoop tools to analyze large volumes of data. You need a PhD for this bootcamp.
Career Prep: Enjoy 1-on-1 career support and access to job assistance resources.
Length: 2-6 months, self-paced
Curriculum: The tailored curriculum covers Python, data wrangling, data story, inferential statistics, and machine learning. Students receive a one-on-one mentor to help reinforce learning.
Career Prep: Students learn how to prep for interviews, write resumes, and receive advice on strengthening their data science portfolio.
Cost: $499/month or $7,500
Data Science vs. Data Engineering vs. Data Analytics
You’ve probably heard the terms Data Science, Data Engineering, and Data Analytics. Here are the differences:
Data Science is a cross-disciplinary field requiring skills in Computer Science (Machine learning), Statistics and Mathematics. Typically, it requires candidates to have an advanced degree in a STEM field (e.g., Science, Technology, Engineering, Mathematics, Statistics) and a good understanding of the sophisticated concepts underlying modeling. Most Data Scientists use R and/or Python as their primary tools.
Data Engineering leans more towards software engineering and computer science, with just some knowledge of data science. It mainly covers Hadoop, Spark, Python, Java and Scala. It entails writing scripts and being familiar with tools to input and extract data from big data warehouses.
Data Analytics is considered more entry-level and focuses on BI (business intelligence). Its focus is to draw business insights from commonly seen data types. It includes data cleaning, data visualization and simple modeling including linear regression. Common Data Analytics tools are SQL and Excel.
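To make the "simple modeling" mentioned above a little more concrete, here is a minimal sketch of a linear regression fitted by ordinary least squares in plain Python. The function name `fit_line` and the ad-spend figures are hypothetical and exist only for illustration; a working analyst would more likely reach for a statistics package.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: monthly ad spend (in $1,000s) and units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, units_sold)
predicted = slope * 6 + intercept  # forecast for a spend of $6,000
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

The slope estimates how many extra units are sold per additional $1,000 of spend, which is the kind of business insight a Data Analyst would report.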
Data Science Bootcamps vs. Data Science Fellowships
There are significant differences between data science bootcamps and data science fellowships. Data science bootcamps are geared towards students with a bachelor's degree and an aptitude for math and statistics (no PhD required, but it helps to know a programming language like R or Python). Some schools, such as NYC Data Science Academy and Science to Data Science, require students to have a PhD or master's degree. Data science bootcamps are intensive 3-6 month programs and prepare graduates for entry-level and junior data science jobs.
Unlike Data Science bootcamps, Data Science fellowships are generally free to the student (revenue is generated through hiring partnerships). Data Science fellowships generally require more experience than bootcamps. For example, the Data Incubator requires candidates to have a master's degree or Ph.D. in a social science or engineering field and relevant work experience. Data Science fellowships help academic data scientists prepare for work in a corporation or startup. According to a white paper by Insight Data Science, fellowships are a great bridge between academia and a career. The program enables data professionals to learn the industry-specific skills needed to succeed in the growing field.
What is “big data”?
Many data science courses use the term “big data” to describe their curriculum content. But what exactly is “big data”?
According to NYC Data Science Academy, “big data” is a term coined to describe data sets that are too large to be analyzed on one computer. With the advent of the internet, streaming data, wearables, etc., the amount of data being produced each day equals all the data ever created up to the year 2003. This data holds insights that can be useful for decision makers, but its sheer volume, together with the usual problems of corruption, incompatibility, and complex structure (often including natural language), makes it challenging to use. Sophisticated tools (e.g., Hadoop, Spark) that can employ multiple computers simultaneously are required to extract actionable knowledge from this data.
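The idea of splitting work across several computers can be illustrated with a small map-and-reduce sketch. This is not Hadoop or Spark code: the two "partitions" below are hypothetical stand-ins for data stored on separate machines, and they are processed one after another on a single computer purely to show the pattern. The function names are also made up for this example.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count words within one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine the partial counts from two partitions."""
    return left + right


# Hypothetical partitions; on a real cluster each would live on a different machine.
partitions = [
    ["big data is big", "data holds insights"],
    ["insights drive decisions", "big decisions need data"],
]

# Each partition is counted independently, so this step could run in parallel.
partial_counts = [map_partition(p) for p in partitions]

# The partial results are then merged into one overall answer.
word_counts = reduce(merge_counts, partial_counts, Counter())
print(word_counts.most_common(3))
```

Because no partition needs to see another partition's lines, the map step scales by adding machines, and only the small partial counts have to travel over the network.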
Most Common Data Science Technologies
The technologies learned at a data science bootcamp often differ from what is taught at a traditional coding bootcamp. Let’s break down the common technologies used in the field and what they’re used for:
- SQL - SQL stands for Structured Query Language. In traditional database environments, industries rely on SQL to extract data for data analytics and reporting purposes. It is designed for managing data in relational database management systems.
- Hadoop - Hadoop is a suite of technologies for managing data and executing programs in a cluster (a collection of networked computers running in a data center). This includes a file system designed for the needs of large data, the MapReduce system for running programs in parallel, the SQL-like Hive database for querying data in a cluster, and many other components.
- Spark - Spark is a system for writing parallel programs to run in clusters. As a competitor to MapReduce, it has gained popularity for its higher efficiency on many problems. It also has a powerful machine learning library, MLlib, and can be used with R, which makes it especially popular among data scientists.
- Python and R - Python and R are both standard languages that are used by data scientists. The Python vs R conversation reflects the fact that data science is a marriage between computer science, where Python is used, and statistics, where R is used. A complete data scientist will know both languages and leverage their different strengths.
- Machine Learning - Machine learning refers to a growing set of algorithms that are able to analyze large sets of data. Its popularity is due to the fact that these algorithms can make predictions about future events that exceed what traditional statistics is designed to do. It is called “machine learning” because many of these algorithms are built to use the results of their initial findings to feed better data into subsequent models. Thus the machine “learns” how to improve its predictive powers.
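As a small illustration of the SQL entry above, the sketch below uses Python's built-in `sqlite3` module to run a typical reporting query. The `orders` table, its columns, and the figures are hypothetical and exist only for this example; a production database would usually be MySQL, Postgres, or similar, but the query itself would look much the same.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 45.5), ("West", 30.0)],
)

# A typical reporting query: order count and revenue per region.
rows = conn.execute(
    "SELECT region, COUNT(*) AS n_orders, SUM(amount) AS revenue "
    "FROM orders GROUP BY region ORDER BY revenue DESC"
).fetchall()

for region, n_orders, revenue in rows:
    print(f"{region}: {n_orders} orders, {revenue:.2f} revenue")

conn.close()
```

The query describes the result that is wanted (totals per region) and leaves it to the database to work out how to compute it, which is a large part of why SQL remains the default tool for analytics and reporting.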
What kind of background should data scientists have?
While having math aptitude is important, Data Scientists come from a variety of educational and professional backgrounds. Common background skills include: problem-solving, logical reasoning, communication, and being detail-oriented. For a better idea of the types of students who are successful, check out these Q&As:
- Sumanth Reddy (Professional Poker Player) NYC Data Science Academy
- Emily (Art Major) & Itelina (Econ Major) Metis
- Jason Liu (Physics Ph.D.) NYC Data Science Academy
- Adam Hill (Astrophysics Ph.D.) Science to Data Science Fellowship
Data Science Jobs
As with most fields, Data Science job titles don’t always give you the nuts and bolts of what the job entails. Below are some common job titles you’ll come across when looking for jobs in data science and their average salaries.
Data Analysts are responsible for analyzing large datasets whether for customer research, business intelligence or internal studies. Data Analysts start with a large data set and are tasked with drawing actionable conclusions from this data. Data Analysts may work with engineers, UX Researchers and Sales staff to develop growth solutions. In addition to data science tools like SQL, Data Analysts should also have knowledge of statistics and concepts like A/B testing.
Average Salary - $64,425
Data Scientists are responsible for determining the data necessary to answer a question, from designing a method for capturing data to gathering data, analyzing data and finally presenting the solution. Although similar to the Data Analyst's role, the Data Scientist’s role is much larger in scope and requires careful planning and design of research from beginning to end. Data Scientists will use the full gamut of data science tools including Python, MongoDB, Hadoop and more.
Average Salary - $131,836
Database Administrators work with technologies such as MySQL, MongoDB, and Postgres to manage large datasets. Depending on the company and role, their duties may include investigating and solving database problems, repairing glitches and designing elements that improve the storage and maintenance of data.
Average Salary - $84,641
Data Engineers are half software developer, half data scientist. Data Engineers use programming languages to write scripts that capture data. Data Engineers then analyze the data and make program or product recommendations based on their analyses.
Average Salary - $127,624