Neural Networks: A Guide for Aspiring Engineers

Written By Cole Ingraham

Edited By Jess Feldman

Last updated on January 8, 2024

Course Report strives to create the most trustworthy content about coding bootcamps. Read more about Course Report’s Editorial Policy and How We Make Money.

In today's world, it is nearly impossible to avoid hearing about artificial intelligence (AI). Given the recent hype and popularity of ChatGPT, you may feel like AI is completely novel, but it is an idea that has been around since the 1950s and has been an active area of research and development ever since. At the core of modern AI developments is the artificial neural network. Let’s dive into neural networks (one of the fundamental concepts you should understand if you want to break into AI as a career!) with Dr. Cole Ingraham, a Lead AI Instructor at NYC Data Science Academy, a leading entity in AI education with ten years of experience teaching Machine Learning/AI to working professionals.

🧑🏽‍💻 Ready to dive deeper into all things data science, AI, and machine learning? Enroll in an immersive bootcamp or short course at NYC Data Science Academy!

What are Neural Networks?

📌 Neural networks come in various forms, but at their core they aim to represent the relationship between inputs and outputs as a weighted sum (a matrix multiplication), followed by a nonlinear activation function, which loosely determines how "on" or "off" each output is.

Stacking multiple of these layers on top of one another allows networks to represent very complex functions, which has proven very useful for tasks that have been difficult or impossible to solve with handwritten code, such as identifying objects in images or translating one language to another. This stacking of layers is where the term deep learning comes from.
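To make this concrete, here is a minimal sketch in Python with NumPy of two stacked layers, each a weighted sum followed by a nonlinearity. All weights, shapes, and the choice of ReLU activation here are purely illustrative, not taken from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU activation: passes positive values through, zeroes out negatives
    return np.maximum(0, x)

def layer(x, W, b):
    # Weighted sum of inputs (matrix multiply) followed by the nonlinearity
    return relu(x @ W + b)

# Stacking two layers lets the network represent more complex functions
x = rng.normal(size=(1, 4))                    # one input with 4 features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # first layer: 4 -> 8
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)  # second layer: 8 -> 2

hidden = layer(x, W1, b1)
output = layer(hidden, W2, b2)
print(output.shape)  # (1, 2)
```

In a real network the weights would be learned from data rather than drawn at random; the point here is only the shape of the computation.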

What is a Convolutional Neural Network (CNN)?

In order to perform various tasks, various types of neural network "layers" have been developed. For vision-related tasks, it is useful to understand that an object is an object no matter where in the frame it appears. If we wanted to learn that fact from data, we would need examples of every object we are interested in detecting appearing in every possible location, which is wasteful and generally impractical.

To address this, the idea of convolution is used to encode the translational-invariance property, giving us the convolutional neural network (CNN). As a result, CNNs perform particularly well on vision tasks. Note that translational invariance does not only apply to two-dimensional inputs like images: the same idea can detect patterns regardless of their location in one dimension (such as sound) or three dimensions (such as 3D scenes, or video with the third dimension representing time).
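A toy example can show what translational invariance buys us. The sketch below (illustrative values, using NumPy) slides the same small filter over a 1D signal, as deep learning libraries do in their convolution layers, so a pattern produces the same peak response whether it appears early or late:

```python
import numpy as np

def conv1d(signal, kernel):
    # Slide the kernel over the signal, taking a dot product at each position
    # (technically cross-correlation, as deep learning libraries implement it)
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

pattern = np.array([1.0, 2.0, 1.0])

early = np.zeros(10); early[1:4] = pattern  # pattern near the start
late = np.zeros(10);  late[6:9] = pattern   # same pattern near the end

k = pattern  # a filter tuned to detect this exact pattern
print(conv1d(early, k).max(), conv1d(late, k).max())  # identical peaks: 6.0 6.0
```

Because one small filter is reused at every position, the network does not need separate training examples of the pattern at each location.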

What is a Recurrent Neural Network (RNN)?

Working with language requires a different approach where we care less about identifying things in isolation and more about the relationship between things across the whole input. Most neural networks require all of their inputs to be the same size, which is not typically the case with language (most sentences are different lengths from each other). 

To address these requirements, the recurrent neural network (RNN) was developed. RNNs are built around the idea of having a "hidden state" which acts as a memory. First the input is broken up into tokens and the hidden state is initialized (as in, it’s set to a starting value). Then the first token is fed into the RNN, along with the hidden state. The RNN then outputs the updated hidden state, which can be used to predict some output. The updated hidden state is then fed back into the RNN along with the next token, and the process repeats until all input tokens are processed. At each step, the RNN is responsible for determining what information from the input is stored in the hidden state for later and what is "forgotten" (erased). This allows the network to both handle arbitrary length sequences as well as perform reasoning which requires identifying patterns across a sequence.
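The loop described above can be sketched in a few lines. This is a bare-bones illustration in NumPy; the weight shapes, scaling, and tanh nonlinearity are illustrative choices, not a production RNN:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size = 5, 3

W_xh = rng.normal(size=(input_size, hidden_size)) * 0.1  # input -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden -> hidden

def rnn_step(token, hidden):
    # New hidden state mixes the current token with the previous "memory"
    return np.tanh(token @ W_xh + hidden @ W_hh)

tokens = rng.normal(size=(7, input_size))  # a sequence of 7 token vectors
hidden = np.zeros(hidden_size)             # initialize the hidden state

for token in tokens:                       # works for any sequence length
    hidden = rnn_step(token, hidden)

print(hidden.shape)  # (5,)
```

Notice that the loop runs for however many tokens there are, which is exactly how the RNN handles arbitrary-length input: the hidden state, not the input size, is fixed.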

Transformers vs RNNs & CNNs

Since their invention in 2017, state-of-the-art performance on many tasks has been achieved with transformers, which are the basis for today's Large Language Models (LLMs). These generally work by letting the model learn to "pay attention" to different parts of its input depending on the situation. Transformers initially gained popularity in language work because they offered new benefits over RNNs: they are faster to train, and they can look at all of their input at once in order to make decisions rather than needing to learn what to remember. However, they also suffer from drawbacks that RNNs do not: they require more and more memory as their input length increases, and if something cannot fit in the input it effectively does not exist (whereas RNNs can in theory remember things forever, although in practice this is not usually the case). Beyond language, transformers have also been successfully applied to vision, replacing the sliding-window approach of the CNN with breaking the image into patches and using attention to decide what is important.
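Here is a rough sketch of scaled dot-product attention, the mechanism at the heart of the transformer. The shapes and random values are illustrative; real models learn the projections that produce the queries, keys, and values:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Every query scores every key, so each position can "pay attention"
    # to any other position in the input
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = softmax(scores)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 4, 8
Q = rng.normal(size=(seq_len, d))  # queries
K = rng.normal(size=(seq_len, d))  # keys
V = rng.normal(size=(seq_len, d))  # values

out, weights = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Note that the weights matrix is seq_len × seq_len: every position attends to every other, which is why a transformer's memory use grows as the input gets longer.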

Which tech roles use neural networks on the job?

Today neural networks are commonplace in both academia and industry, and there are many skill sets applicable to working on or with them:

  • Data scientists typically work on developing the models themselves, as well as getting the required data into a useful form. Gathering training data can be done in a wide variety of ways: from having domain experts annotate examples, to using a labeling service such as Amazon Mechanical Turk when more general knowledge is acceptable, to scraping the internet, which requires some programming experience.
  • Training and deploying models is typically the realm of machine learning engineers or MLOps engineers, work that is generally closer to traditional software engineering but requires an understanding of the technical requirements of dealing with models.
  • Even people in non-technical roles, such as product managers or executives, can benefit from understanding the use cases and limitations of neural networks.

Practical Use Cases of Neural Networks

At NYC Data Science Academy, many of our students use neural networks in various ways to solve real world problems:

  • One student applied CNNs to the realm of microchip manufacturing in order to help identify patterns in defects, with the aim of determining the root cause of the issues.
  • Others have combined their domain expertise with generative AI in order to extract useful insights from financial data. 

However, it is worth noting that neural networks are not always the right tool for the job. Many of our students' projects involve comparing the performance of neural networks to other models and, depending on the task and the data that is available, sometimes more traditional models are a better fit. Knowing when not to use something is just as important as knowing when to!

How to Learn Neural Networks

If you are interested in learning about or working with neural networks, there are many paths you can take. Many universities now offer master's degrees in data science, which cover deep learning, and there are various Ph.D. programs as well; however, these programs often require a degree in math or computer science.

If you are coming from a different background, there are many options for either self-directed learning, such as Coursera or DeepLearning.AI, or bootcamps such as NYC Data Science Academy.

There are also plenty of great resources online for those who are more self-directed. The YouTube channels of Yannic Kilcher and Andrej Karpathy are fantastic for gaining an understanding of many of the concepts important to neural networks.

Finding supportive communities such as the Learn Machine Learning subreddit can also help you get your feet wet. Looking at the code associated with papers, or other example implementations such as Phil Wang's GitHub, can be invaluable, provided that you are already comfortable with programming (typically in Python).

Regardless of how you prefer to learn or how deep you wish to go, I always encourage everyone to gain at least some familiarity with neural networks, if for no other reason than to demystify them and understand when and where they are helpful.

About The Author

Cole Ingraham

Dr. Cole Ingraham is a musician and composer turned software engineer and data scientist who currently teaches at NYC Data Science Academy as Lead AI Instructor.
