Artificial intelligence (AI) fuels significant technological advancement (hello, self-driving cars!), but how exactly do machine learning and deep learning play into that development? Andrew Berry, a Lighthouse Labs Data Science Mentor, breaks down the difference between machine learning and deep learning, how data analysts and data scientists are using machine learning to find solutions to complex data problems, and the exact programming languages and technologies you should learn in order to break into each data field. Plus, Andrew shares his favorite learning resources, including how the Data Science Bootcamp at Lighthouse Labs helps career-changers kick-start their data career!
Machine learning (ML) is a subcomponent of artificial intelligence (AI) that uses specific statistical algorithms to process massive amounts of data in order to produce insights, predictions, and unique outputs.
Essentially, machine learning is applied statistics taken to a new level: tools like Python, running on modern computers, help a data scientist work on complex problems. People could work through machine learning algorithms by hand, but it would take a long time! The computational power of computers enables humans to extract interesting information from massive amounts of data at a much faster rate.
Unstructured data is data with no predefined organization or labels. In order to work with unstructured data, a data scientist has to organize the data set themselves.
Example: There is so much text on the internet - words, sentences, and paragraphs. In machine learning, this is unstructured data.
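To make "organizing the data set themselves" concrete, here is a minimal sketch of one common way to turn raw text into a structured table: counting how often each word appears in each document. The two sample sentences and the names `word_counts`, `vocabulary`, and `rows` are made up for illustration; real projects usually lean on a library rather than hand-rolled code like this.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, pull out the words, and count each one."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical "unstructured" input: two free-form sentences.
documents = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]

# One Counter of word frequencies per document.
table = [word_counts(doc) for doc in documents]

# Column headers: every distinct word seen, in alphabetical order.
vocabulary = sorted(set().union(*table))

# One row per document, one column per word (0 if the word is absent).
rows = [[counts[word] for word in vocabulary] for counts in table]

print(vocabulary)
for row in rows:
    print(row)
```

Each sentence is now a row of numbers with named columns, which is the kind of structured input most machine learning algorithms expect.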
Supervised learning is when labeled data is analyzed in order to predict a known kind of output. In other words, we already know that we are trying to predict x or y, not something entirely new. Most machine learning that happens these days is supervised learning. Data scientists make predictions using structured and labeled data, drawing on an understanding of the data itself, what the data set means, and the actual output.
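As a rough illustration of supervised learning, the sketch below uses a nearest-neighbor rule: to predict the label for a new example, it finds the most similar labeled example and copies its label. The tiny data set (hours studied and hours slept, labeled "pass" or "fail") and the function name `predict` are invented for this example, and nearest-neighbor is just one of many supervised algorithms.

```python
import math

# Hypothetical labeled data: (hours_studied, hours_slept) -> known outcome.
training_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]


def predict(features):
    """Return the label of the training example closest to `features`."""
    nearest = min(training_data, key=lambda example: math.dist(example[0], features))
    return nearest[1]


print(predict((7.0, 6.0)))
print(predict((1.5, 4.5)))
```

The important part is that every training example comes with a known answer, and the only possible predictions are labels that already appear in the data.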
In unsupervised machine learning, uncharted territory is being explored and the output is not known in advance. Unsupervised machine learning analyzes huge datasets to identify trends that humans may not spot as easily. It’s then up to the data scientist to assess the outputs and determine their validity.
Example: If a data scientist wanted to gain insights about a company’s customers, they may use unsupervised machine learning. The data scientist can feed structured data, like the demographics of customers, into an unsupervised machine learning model, and the unsupervised machine learning model would go through the data to find relationships (features/dimensions/variables) in the data set.
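The customer example could look something like the sketch below, which uses k-means clustering, one common unsupervised algorithm (the interview does not say which algorithm would be used). The customer figures, the function name `k_means`, and the naive choice of starting centroids are all assumptions made for illustration.

```python
import math


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns (labels, centroids)."""
    # Naive, deterministic start (an assumption): use the first k points.
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iterations):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Move each centroid to the average of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical customers: (age, yearly spend in thousands of dollars).
customers = [(22, 1.0), (25, 1.5), (24, 1.2), (58, 6.0), (61, 6.5), (60, 5.8)]

labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

Nobody told the model which customers belong together. It found a younger, lower-spending group and an older, higher-spending group on its own, and it is still up to the data scientist to decide whether that grouping means anything. In real work the features would usually be rescaled first so that one column (here, age) does not dominate the distance calculation.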
Deep learning (DL) is a subset of machine learning that works with artificial neural networks to try to mimic the human mind. Human brain cells constantly process patterns from huge amounts of data from our senses. Deep learning tries to mimic this by using large neural networks to process huge amounts of data at a larger scale than traditional machine learning. Neural networks are a series of algorithms that attempt to recognize commonalities in a dataset through a process that mimics the operations of the human brain. Keep in mind that not all machine learning is deep learning, but all deep learning is machine learning!
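To give a feel for what "layers of neurons" means, here is a toy sketch of a two-layer network built from three artificial neurons. Each neuron multiplies its inputs by weights, adds a bias, and "fires" if the total is positive. The weights here are picked by hand (an assumption for illustration) so that the network computes "one or the other, but not both"; in real deep learning the weights are learned from data, and the networks have many more neurons.

```python
def step(value):
    """Fire (1) if the weighted input is positive, otherwise stay off (0)."""
    return 1 if value > 0 else 0


def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum plus bias, then activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def tiny_network(x1, x2):
    """Two hidden neurons feeding one output neuron (hand-picked weights)."""
    # Hidden layer: one neuron detects "x1 OR x2", the other "x1 AND x2".
    h_or = neuron([x1, x2], [1, 1], -0.5)
    h_and = neuron([x1, x2], [1, 1], -1.5)
    # Output layer: fires when OR is on but AND is off.
    return neuron([h_or, h_and], [1, -1], -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, tiny_network(a, b))
```

No single neuron of this kind can compute that pattern by itself, but two layers working together can, which hints at why stacking many layers lets deep learning pick up on more complex patterns.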
Both data analysts and data scientists use machine learning. In many companies, data analysts and data scientists work closely together to derive insights from machine learning and deep learning models.
Data analysts query, manipulate, and clean data, often looking at historical data to derive interesting insights.
In most use cases, data analysts will analyze the data and create dashboards, but this is only true if a company has a data science team. When there is a data science team, the data analyst may help the data scientist identify the right data, business problems, etc. Deploying the models is the data scientist's responsibility.
Data scientists review data sets and run machine learning or deep learning models to generate insights. Most machine learning and deep learning responsibilities fall under the data scientist role. In contrast to general data analytics work, data scientists use applied statistics to make predictions. Companies employ data scientists with the intention that they use data to solve ML problems; a data scientist must have the skills and tools to deal with those problems.
A key responsibility of a data scientist is determining which machine learning algorithm or deep learning tool is the best solution for a specific problem. This is an important skill because some algorithms, tools, and solutions take hours, days, or even months to compute, depending on the dataset. In practice, if a data scientist chooses to use a simple algorithm, such as linear regression, then they will typically work with Python and the appropriate plugins, based on their workflow. It’s also possible to use Excel to gain insight into simple algorithms, but not necessary. For more complex problems, data scientists may employ deep learning to produce outputs with higher accuracy.
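For a sense of how simple an algorithm like linear regression is under the hood, the sketch below fits a straight line to a handful of points using the standard least-squares formulas, in plain Python. The sample numbers and the function name `fit_line` are made up for illustration; in practice a data scientist would normally call a library routine instead of writing this by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies by itself.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that happens to follow y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Use the fitted line to predict y for a new x.
print(slope * 6 + intercept)
```

This runs instantly on small data, which is part of the trade-off described above: simple algorithms are cheap to compute, while deep learning models can take far longer in exchange for handling more complex problems.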
Python is the underlying language that interfaces with different packages and plugins. A data scientist codes in Python while calling in the functions of these tools to manipulate data:
The most popular packages and plugins that enable deep learning are:
To share their data findings, data analysts frequently use tools like Tableau and Microsoft Power BI to deliver business intel reports. Looker is a BI tool that was recently acquired by Google and is gaining popularity in the startup world.
The biggest benefit of using machine learning and deep learning is the ability to quickly process and gain insights from massive amounts of data. With machine learning and deep learning, humankind has the potential for significant technological advancement, such as facial recognition and self-driving cars. Tech companies like Tesla, Apple, and Google are currently using deep learning.
That said, ethical considerations need to be included when employing machine learning and deep learning models. When working on a project that impacts human life, data professionals must be careful with the output, which means ensuring that it is accurate and that it’s not causing harm to anyone. Data professionals should always ask themselves if the models they are building are right for society. For example, machine learning algorithms have become so good at engaging an audience that users of a platform are likely to keep consuming the news outlets that confirm what they already believe, deepening divisions in media around the world.
The biggest limitation of machine learning and deep learning is the data itself. Right now machine learning is still relatively basic in that a data scientist feeds a model the data and it outputs something. Even though computational power has increased over the years, human brains are still much more advanced. A computer is still limited by what a data scientist tells it to do and by what it is capable of doing.
For someone who is totally new to data science, a data science bootcamp with a hands-on learning approach may be the best way to learn. Lighthouse Labs offers an intensive, three-month data science bootcamp where students receive a solid foundation of data science skills and an introduction to a full toolset that a data scientist would use in the workforce. Within the first four weeks of the bootcamp, students are introduced to rudimentary machine learning algorithms and begin working on machine learning projects.
The 12-week intensive bootcamp also includes an additional 40-80 hour prep course that gives students the tools and knowledge to hit the ground running on day one of the program. This prep module gets students thinking like a data scientist and familiarizes them with Python, SQL, basic statistics, and fundamental math theory. The prep work also prepares students for the intensity of the bootcamp.
For those who are more interested in the data analyst side of machine learning and deep learning, Lighthouse Labs offers a part-time Introduction to Data Analyst Course.
Lighthouse Labs offers a solid data science foundation for those looking to become a data professional, but there are free tools to help folks new to tech begin building their knowledge-base:
Data is everywhere and there's more data than ever before. You just have to know where to look! These are excellent resources for those needing data sets to build a machine learning model:
Artificial intelligence (AI) has been around since the 1950s, but in the last 20 years, machine learning and deep learning have significantly advanced our technological capabilities. Data science was once limited to the academic world, but now working in the data science field without a college degree is attainable. Anyone who wants to pick up data science skills can. Data science tools are now very accessible, and people only need an understanding of Python to begin working with data. There is a high demand for data scientists right now, and I don’t see that abating. Now is a great time to get into data science!