Written By Cole Ingraham
Edited By Jess Feldman
Course Report strives to create the most trustworthy content about coding bootcamps. Read more about Course Report’s Editorial Policy and How We Make Money.
In today's world, it is nearly impossible to avoid hearing about artificial intelligence (AI). Given the recent hype and popularity of ChatGPT, you may feel like AI is completely novel, but it is an idea that has been around since the 1950s and has been an active area of research and development ever since. At the core of modern AI developments is the artificial neural network. Let’s dive into neural networks (one of the fundamental concepts you should understand if you want to break into AI as a career!) with Dr. Cole Ingraham, a Lead AI Instructor at NYC Data Science Academy, a leading AI education provider with ten years of experience teaching Machine Learning/AI to working professionals.
🧑🏽💻 Ready to dive deeper into all things data science, AI, and machine learning? Enroll in an immersive bootcamp or short course at NYC Data Science Academy!
📌 Neural networks come in various forms but at their core they aim to represent the relationship between inputs and outputs as a weighted sum (matrix multiplication), followed by a nonlinear activation function, which determines if the output is "on" or "off."
Stacking multiple of these layers on top of one another allows them to represent very complex functions, which has proven very useful for tasks that have been difficult or impossible to solve with handwritten code, such as identifying objects in images or translating one language into another. This stacking of layers is where the term deep learning comes from.
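To make that concrete, here is a minimal NumPy sketch (illustrative only, with made-up layer sizes and random weights) of a single layer as a weighted sum followed by a nonlinear activation, and of two such layers stacked:

```python
import numpy as np

def relu(x):
    # Nonlinear activation: pass the value through if positive, otherwise output 0 ("off")
    return np.maximum(0.0, x)

def layer(x, W, b):
    # A single layer: weighted sum (matrix multiplication) plus bias, then activation
    return relu(x @ W + b)

# A tiny two-layer ("deep") network with arbitrary sizes and random weights
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))               # one input example with 4 features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

hidden = layer(x, W1, b1)                 # first layer's output feeds the next layer
output = hidden @ W2 + b2                 # final weighted sum produces the prediction
print(output.shape)                       # (1, 2)
```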
In order to perform various tasks, various types of neural network "layers" have been developed. For vision related tasks, it is useful to understand that an object is an object no matter where in the frame it appears. If we wanted to learn that fact from data, we would need examples of every object we are interested in detecting appearing in every possible location, which is wasteful and generally impractical.
To address this, the idea of convolution was used to encode the translational invariance property; a network built this way is called a convolutional neural network (CNN). As a result, CNNs perform particularly well on vision tasks. It should be noted that translational invariance does not only apply to two-dimensional data like images: it can also be used to detect patterns regardless of their location in one dimension (such as sound) or three dimensions (such as 3D scenes, or video where the third dimension represents time).
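As a rough illustration (not from the article, and simplified to one dimension with a hand-picked kernel), convolution slides the same set of weights across every position of the input, so a pattern produces the same response wherever it occurs:

```python
import numpy as np

def conv1d(signal, kernel):
    # Slide the same kernel (shared weights) across every position of the input
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel for i in range(len(signal) - k + 1)])

kernel = np.array([1.0, -1.0, 1.0])                     # the "feature" this kernel detects
early = np.array([1.0, -1.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # pattern at the start
late  = np.array([0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0])  # same pattern at the end

print(conv1d(early, kernel))  # strong response near the start
print(conv1d(late, kernel))   # the same strong response, now near the end
```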
Working with language requires a different approach where we care less about identifying things in isolation and more about the relationship between things across the whole input. Most neural networks require all of their inputs to be the same size, which is not typically the case with language (most sentences are different lengths from each other).
To address these requirements, the recurrent neural network (RNN) was developed. RNNs are built around the idea of having a "hidden state" which acts as a memory. First, the input is broken up into tokens and the hidden state is initialized (as in, it’s set to a starting value). Then the first token is fed into the RNN, along with the hidden state. The RNN then outputs the updated hidden state, which can be used to predict some output. The updated hidden state is then fed back into the RNN along with the next token, and the process repeats until all input tokens are processed. At each step, the RNN is responsible for determining what information from the input is stored in the hidden state for later and what is "forgotten" (erased). This allows the network both to handle arbitrary-length sequences and to perform reasoning that requires identifying patterns across a sequence.
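A minimal sketch of that loop (assuming a simple Elman-style recurrence, with random weights and a toy three-word vocabulary purely for illustration):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    # One step of a simple RNN: combine the current token's vector with the
    # previous hidden state, then squash with tanh to get the new hidden state
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b)

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}        # toy tokenizer: word -> index
embed = rng.normal(size=(len(vocab), 5))      # token embeddings (5-dimensional)
W_xh = rng.normal(size=(5, 8))                # input-to-hidden weights
W_hh = rng.normal(size=(8, 8))                # hidden-to-hidden weights
b = np.zeros(8)

h = np.zeros(8)                               # initialize the hidden state
for word in ["the", "cat", "sat"]:            # feed tokens one at a time
    h = rnn_step(embed[vocab[word]], h, W_xh, W_hh, b)

print(h.shape)                                # (8,) — a summary of the whole sequence
```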
Since their invention in 2017, state-of-the-art performance on many tasks has been achieved by transformers, which are the basis for today's Large Language Models (LLMs). These generally work by letting the model learn to "pay attention" to different parts of its input depending on the situation. Transformers initially gained popularity in language work because they offered new benefits over RNNs: they are faster to train, and they can look at all of their input when making decisions rather than needing to learn what to remember. However, they also suffer from drawbacks that RNNs do not: they require more and more memory as their input length increases, and if something cannot fit in the input, it effectively does not exist (whereas RNNs can in theory remember things forever, although in practice this is not usually the case). Beyond language, transformers have also been successfully applied to vision, replacing the sliding-window approach of the CNN with breaking the image into patches and using attention to decide what is important.
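The attention mechanism at the heart of a transformer can be sketched in a few lines (a simplified, single-head scaled dot-product attention over random vectors, not any particular library's implementation):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: each position scores every other position,
    # turns the scores into weights with softmax, and takes a weighted sum of values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 16))                  # 4 token vectors, 16 dimensions each
out, weights = attention(tokens, tokens, tokens)   # self-attention: Q = K = V

print(weights.shape)  # (4, 4): how much each token "pays attention" to every other token
print(out.shape)      # (4, 16): each token updated using the tokens it attended to
```

Note that the weights matrix is input-length by input-length, which is one way to see why a transformer's memory use grows quickly as its input gets longer.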
Today neural networks are commonplace in both academia and industry, and there are many skill sets applicable to working on or with them:
At NYC Data Science Academy, many of our students use neural networks in various ways to solve real world problems:
However, it is worth noting that neural networks are not always the right tool for the job. Many of our students' projects involve comparing the performance of neural networks to other models and, depending on the task and the data that is available, sometimes more traditional models are a better fit. Knowing when not to use something is just as important as knowing when to!
If you are interested in learning about or working with neural networks, there are many paths you can take. Many universities now offer master's degrees in data science that cover deep learning, and there are various Ph.D. programs as well; however, these programs often require a degree in math or computer science.
If you are coming from a different background, there are many options for either self-directed learning, such as Coursera or DeepLearning.AI, or bootcamps such as NYC Data Science Academy.
There are also plenty of great resources online for those who are more self-directed. The YouTube channels of Yannic Kilcher and Andrej Karpathy are fantastic for gaining an understanding of many of the concepts important to neural networks.
Finding supportive communities such as the Learn Machine Learning subreddit can also help you get your feet wet. Looking at the code associated with papers, or at other example implementations such as Phil Wang's GitHub, can also be invaluable, provided that you are already comfortable with programming (typically in Python).
Regardless of how you prefer to learn or how deep you wish to go, I always encourage everyone to gain at least some familiarity with neural networks, if for no other reason than to demystify them and understand when and where they are helpful.
Dr. Cole Ingraham is a musician and composer turned software engineer and data scientist who currently teaches at NYC Data Science Academy as Lead AI Instructor.