The Software Guild teaches data engineering courses for business analysts at companies that want to analyze a wider variety of data. Could data engineering be the next skill you need in your skillset? Software Developer Jonathan Frazier and Data Scientist Haythem Balti, both instructors at The Software Guild, explain the difference between data science and data engineering, and what outcomes they see from employees who have taken their data engineering course.
Haythem: Usually, when we start working with a company, we try to identify the skill gaps and where we can actually add value to the employees by reskilling or upskilling.
Jonathan: One important part of the process is gathering requirements to understand the individual needs of that company. We try to customize the material towards the needs of the specific client, and towards the level of the students.
Haythem: We noticed that insurance companies and banks rely heavily on business analysts and business intelligence to conduct all sorts of operations, risk assessments, cost reductions, etc. Traditional business analysts are fluent in SQL, which is great. But nowadays, these large companies have data that is not just stored in a SQL format, but instead stored in files, videos, audio, texts, and formats that we call unstructured data. Our program is targeted towards business analysts who know a little bit of SQL, but now need new skills to manage unstructured data.
Data engineering is very important because good data engineering leads to good data science (that’s not necessarily true the other way around). If you prepare your company with good data engineers who can organize data, then you will actually improve the outcome of all of these efforts by reducing costs, improving business intelligence, or by improving revenue.
Jonathan: Big data is a problem that forward-looking companies are increasingly putting resources behind. And because this is a new field, there isn’t typically a large pool of employees who already have the skills for solving big data problems. If you want to get the most information from data to make the best decisions for your business, you need to be able to understand it. And that requires a certain skill set that a lot of people don't have. So we have developed a program aimed at employees who don't yet have those skills, like business analysts or other people who need to upgrade their data skills.
Haythem: When you're looking at a company’s data, the data is usually spread out across many servers and architectures. So in order to actually access that data, combine it, and analyze it, there are two different processes – data engineering and data science.
A data engineer also has different skills than a data scientist.
Usually, good data science starts with good data engineering. For example, let's say we want to estimate the number of items sold on Amazon in the next month. We would gather the sales data from the last five years – a data engineer would use their skills to pull all the sales data into one single data source. Then a data scientist can access that data and analyze it.
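As a rough illustration of the hand-off Haythem describes, the sketch below merges sales records from two sources into one shared format (the data engineering step) and then totals units per month (a starting point for the data scientist). The source names, field names, and figures are all hypothetical and are not taken from the interview.

```python
from collections import defaultdict

# Hypothetical records from two separate systems with different field names.
web_orders = [
    {"order_date": "2023-01-15", "qty": 3},
    {"order_date": "2023-02-02", "qty": 5},
]
store_sales = [
    {"date": "2023-01-20", "units_sold": 2},
    {"date": "2023-02-11", "units_sold": 4},
]


def consolidate(web, store):
    """Data engineering step: map both sources onto one shared schema."""
    rows = [
        {"date": r["order_date"], "units": r["qty"], "source": "web"}
        for r in web
    ]
    rows += [
        {"date": r["date"], "units": r["units_sold"], "source": "store"}
        for r in store
    ]
    # ISO dates sort correctly as plain strings.
    return sorted(rows, key=lambda r: r["date"])


def units_per_month(rows):
    """Analysis step: total units sold per YYYY-MM month."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["date"][:7]] += r["units"]
    return dict(totals)


sales = consolidate(web_orders, store_sales)
monthly = units_per_month(sales)
print(monthly)
```

In practice the two sources would be separate databases or files rather than in-memory lists, but the shape of the work is the same: agree on one schema first, then analyze.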
Jonathan: To give another analogy for this, imagine we're trying to bake a cake. The data engineer’s job is to get all of the right ingredients that you need to bake your cake, prepare them for baking, and get them in the right shape to do the work.
The data scientist actually bakes the cake. There are all sorts of different cooking techniques that you can use with your data to put it together in interesting ways and end up with a final product that has value.
Haythem: The first time we taught the data engineering course we taught it face to face at the client’s office. For another iteration, we taught it online because our clients had employees located at different offices. There are a lot of tools that we leverage online that allow us to teach essentially as if we were in a traditional classroom.
Jonathan: One of the features of the Software Guild is the flexibility. We've taught in-person classes on location at company offices. We also have our own campuses in a few different locations around the country, where teams can come and host their classes. Ten students is recommended, but we handle classes of up to around 20.
Haythem: The length of the courses is determined on a case-by-case basis. The data engineering program is two six-week tracks, but we try to work around what is best for the employees, to create a balance between work time and training time.
Haythem: The data engineering curriculum and all the projects revolve around three main steps: Extract, Transform, Load (ETL). Every time you analyze data, you need to extract it, then transform it into a format that you can use for analysis. Finally, you take those results and load them into a database or a file.
We teach students how to use Python and other tools to access data, and read data from SQL databases such as MySQL and from NoSQL databases such as MongoDB. We also cover distributed computing frameworks such as Hadoop or Spark, and make sure students know how to read data, save data, analyze and transform data. At the end of the six weeks, students showcase a project they worked on as a team.
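To make the Extract, Transform, Load steps described above concrete, here is a minimal sketch in Python using only the standard library. It is not taken from the course material: the `claims` table, its columns, and the sample values are hypothetical, and an in-memory SQLite database stands in for a production system such as MySQL.

```python
import csv
import io
import sqlite3


def extract(conn):
    """Extract: read raw claim rows from a SQL source."""
    return conn.execute("SELECT region, amount FROM claims").fetchall()


def transform(rows):
    """Transform: normalize region names and total the amounts per region."""
    totals = {}
    for region, amount in rows:
        key = region.strip().lower()
        totals[key] = totals.get(key, 0.0) + amount
    return sorted(totals.items())


def load(records, out):
    """Load: write the results as CSV to any file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total_amount"])
    writer.writerows(records)


# In-memory database standing in for a real source system (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [("North", 100.0), (" north ", 50.0), ("South", 75.5)],
)

buffer = io.StringIO()
load(transform(extract(conn)), buffer)
conn.close()
print(buffer.getvalue())
```

Keeping the three steps as separate functions means the source (another database) or the destination (a real file opened with `open(...)`) can be swapped without touching the transformation logic.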
Jonathan: Something really important is the fact that instruction is focused around individual work and group work because both are a very important part of software development and data engineering. Many of the complex problems that we work on require effort from more than one person. So it's really important to teach skills for working effectively in teams.
Not all employees are already strong in software engineering concepts. So one of the benefits of working with the Software Guild is that we have a lot of experience training people who essentially don't have a technical background at all. We can guide employees from low technical skills through the basics, through more advanced concepts like data engineering into data science, where you get to really leverage some of the powerful data analytic tools that exist right now.
Haythem: One of our clients, a large insurance company, actually brings data from their own work, and we help them solve a real problem. Not only do we teach the students how to use Python, we actually teach them to solve their business problems.
Jonathan: For me, a lot of effective instruction is built around practice – giving students exposure to new concepts and programming, then making sure they get enough practice to internalize the concepts. A lot of my lessons are example based.
I also like to focus on the thought process that students can use when they're going through the development of their code. So a lot of times during instruction, I'll ask students questions to help them think about the next step in their process.
Haythem: As a data scientist, I truly believe in learning by experience. And if you expose students to enough examples, they will learn the ins and outs of things. My teaching style revolves around teaching by examples which get incrementally more difficult.
I never use PowerPoint slides. My typical teaching session involves plugging my computer into the main screen and writing code, so the students can follow along, write it themselves, test it, and make sure it works.
Haythem: We do that through learning assessments and assignments. We use modular curriculum content delivered through a learning management system (LMS) called Canvas. Each module or unit has its own assessments, quizzes, and some of them have group projects. We measure student performance every step of the way throughout the program, and report that to our clients.
Jonathan: In addition to the module-based assessments, the final projects are really important in assessing how well the students were able to absorb the material from the course overall. The final projects incorporate the concepts that they've learned during the entire class and demonstrate their level of ability to solve data problems using what we've taught them.
Haythem: We recently ran the data engineering course at a large insurance company. This company found their business analysts were not able to do certain types of analysis. Like any insurance company, they had a lot of data spread across many different technologies, but the business analysts didn't have the skills to gather that data. We first identified exactly what those skill gaps were, then figured out what kind of tools to teach them so they could do their jobs without any supervision or help from someone else.
After the employee training, the company was able to essentially improve its efficiency. A group of employees had been doing a process that took a couple of hours to run. After going through our curriculum, they were able to automate it using Python, and get that process down to a few minutes, making their job much easier.
Jonathan: In the recent class, there was a product that was being developed at the company. It was a process that had value for the company because it allowed users to see different views of the data. But it was slow in some ways. So over the course of the 10 to 12-week class, we taught the students how to pull the existing data from one of the databases into their own process, and how to design a new database schema that better fit that data and how it was going to be used.
One of the cool things about this particular project is it gave the company a different view of the data. Before, it had been kind of segmented by department, but now they were able to show global views of what was going on in the data, which was something they couldn't do before.
Haythem: These employees or students are very smart, they just don't have the right skills to make something more efficient or better. So we focus on what tools we need to teach them to be able to do their job more efficiently.
Haythem: The first time I taught the data engineering program, one of the students was super motivated, very smart and she excelled in the class. After the training, she went back to her team, upskilled all her teammates with some of the same technologies, changed the way they think and do their work so that everyone was really contributing to the team. It was pretty awesome to see that.
Another thing that we've seen is employees getting more autonomy. The students come to this training with a lot of SQL experience, but no programming experience. So once we teach them all of these skills and languages, they're able to work autonomously without having to consult anyone from IT. That boosts productivity, builds self-confidence, and makes the work more enjoyable.
Jonathan: And that's definitely one of the goals of this training from the client perspective – they want to create a workforce that is better prepared for solving 21st-century problems. So they want to give their employees a new set of skills to prepare them for new roles and new opportunities through this technical education.
Haythem: My advice is to focus on identifying the skill gaps. You have to take the time to make sure you are addressing the right skill gaps and implementing the right training programs. That means establishing the objectives of the training and what tools, concepts, and curriculum should be covered in that training. So spend more time talking to your employees about what would make their jobs more efficient. Once you have that information, writing and delivering the content will be a relatively easy task.
Jonathan: Building on that, my advice is to find the pain point that your company is dealing with when it comes to data, and to understand how far employees can go, even with just basic software development skills. You can actually do a lot once you've learned the basics. So getting started and learning that foundation is something that's really important, and can never be wasted effort.
Haythem: Companies can get started with one-off seminars, mini conferences, mini training sessions, or lunch and learn sessions for employees where someone knowledgeable can teach some data engineering concepts.
Another option is to encourage self-paced learning through an online learning platform. I would recommend starting with foundational courses in Python and in data, covering the main types of data: structured, unstructured, and semi-structured. That would be enough for employees to get started and become more autonomous. There are many websites for learning these skills, which are great resources, but unfortunately, they're not really customized for specific industries. At The Software Guild, if we work with an insurance company, then all the examples and the content will be insurance oriented. If we work with a finance company then we're going to have examples in finance.
Jonathan: Like Haythem said, there are a lot of free data engineering resources online, but one of the downsides is that it can be kind of overwhelming navigating through all those options. If there is a specific skills gap, I definitely recommend reaching out to an expert and asking questions to get your bearings on the landscape. That's also something that we can help out with at the Software Guild – helping people understand what the market is like and what their options are.