You may have heard of Git and GitHub as essential tools for software developers, but do you know what they are and why they exist? We asked Turing School Front-End Instructor David Whitaker why Git and GitHub are important for developers to know and use, why Turing teaches them, and which big companies use them every day. David also explains version control, which jobs require knowledge of Git and GitHub, and how you can start learning Git and GitHub today!
Have you ever had a document named something like report_final_draft_final(3).doc? If so, you've felt the pain of managing and sharing files. Keeping track of the changes to a file over time is difficult but important. Git is a Version Control System (VCS): a tool that helps us keep track of differences in a file or collection of files over time.
You've likely experienced less sophisticated version control, such as Microsoft Word's “undo” or “go back” features, which let you return to an earlier version of a file. Maybe you've even created your own version control by saving drafts of a file under different numbered filenames. Git takes this to the next level by letting you decide exactly when to save a version of a project, and by letting you annotate each version as you save it.
You create versions of a project by making it a repository (or 'repo' for short). A repo is a folder with Git tracking turned on. It may contain files and subfolders (i.e., a codebase), and Git records how they change through a series of intentional snapshots called commits. One thing to keep in mind is that Git is most often used “locally,” which just means on your computer. That's in contrast to GitHub, which is a website that hosts Git repositories.
Git was originally developed in 2005 by Linus Torvalds to help maintain the Linux kernel (the core of the operating systems that run the majority of smartphones, tablets, servers, and supercomputers across the globe).
Git is used for managing the changes to a project over time. A project might be just a single file, a handful of files, or thousands of files. Those files can be anything from plain text to images or videos.
Because Git is focused on managing changes, it is often used as a collaboration tool that allows several people to work on the same project at the same time. By tracking each person's changes, Git can bring everything together into a final version.
Imagine that you're writing a blog post that has multiple files associated with it. You may have one main text file containing the post itself, another file for references, and a few more files for diagrams and other images.
Without Git, you might keep these files in a folder on your computer, but there is no easy way to tell what state all of the files were in at a given point in time. Imagine you send your post to two friends to copy edit. How do you merge their changes back together? Which file is the original, and which is the edited version? With Git, you can record the whole repo at various points in time using commits, along with annotations explaining why you decided to save the project at each point. Later, you can browse this history of commits to get a clear picture of how your project evolved, and travel back in that history if necessary.
A software codebase works just like that blog post. At its most basic level, it is a collection of files that are linked to one another. When a developer is working on a feature, Git provides a way to save a snapshot of the entire repo via a commit. This is usually done whenever incremental progress has been made and the code is in a working state. When making the commit, the developer writes a message explaining what was changed and why. These messages add to the history of the project and make it easier to determine when a certain feature, or even a bug, was introduced.
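To make the idea of commits a little more concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not how Git actually works internally: each commit keeps a full copy of the tracked files, a message, and a link to the previous commit. The names `Commit`, `ToyRepo`, `log`, and `restore` are hypothetical and invented for illustration; in real life you would use commands such as `git commit` and `git log`, and Git stores snapshots far more efficiently and identifies commits by hashes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    """One saved snapshot of every tracked file, plus a note on why it was saved."""
    snapshot: Dict[str, str]   # filename -> file contents at commit time
    message: str               # the annotation explaining what changed and why
    parent: Optional[Commit]   # the previous commit; None for the very first one


class ToyRepo:
    """Hypothetical, heavily simplified repo model; not Git's real storage format."""

    def __init__(self) -> None:
        self.working_files: Dict[str, str] = {}  # files as you are editing them now
        self.head: Optional[Commit] = None       # the most recent commit

    def commit(self, message: str) -> Commit:
        # Copy the working files so later edits can't change this snapshot.
        self.head = Commit(dict(self.working_files), message, self.head)
        return self.head

    def log(self) -> List[str]:
        # Walk the parent links from newest to oldest, like browsing history.
        messages: List[str] = []
        commit = self.head
        while commit is not None:
            messages.append(commit.message)
            commit = commit.parent
        return messages

    def restore(self, steps_back: int) -> None:
        # "Travel back" by replacing the working files with an older snapshot.
        commit = self.head
        for _ in range(steps_back):
            if commit is None:
                break
            commit = commit.parent
        if commit is None:
            raise ValueError("history does not go back that far")
        self.working_files = dict(commit.snapshot)


# Usage, following the blog post example:
repo = ToyRepo()
repo.working_files["post.txt"] = "Draft intro"
repo.commit("Start blog post")
repo.working_files["post.txt"] = "Draft intro, plus a conclusion"
repo.working_files["references.txt"] = "Sources for the post"
repo.commit("Add conclusion and a references file")

print(repo.log())          # ['Add conclusion and a references file', 'Start blog post']
repo.restore(1)
print(repo.working_files)  # {'post.txt': 'Draft intro'}
```

Because each commit points to the one before it, the whole history can be recovered by following those links backward, which is the same idea behind browsing history with `git log`.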
GitHub, developed in 2008, is a web application that hosts Git repositories. The team that started GitHub saw that Git could solve important problems for many teams – but Git itself is often difficult to use. GitHub adds a bunch of collaboration and exploration tools on top of Git to help you (and your team) be more productive.
For instance, GitHub makes it easy to share code between multiple computers and developers. It's become the centralized organization tool of the open source community and, in turn, is used by thousands of companies and teams. Some GitHub users have one repo they work with every day, some have hundreds.
Some of the most important tools GitHub layers on top of Git include:
Imagine I'm working on a project on my laptop at work. Once I'm happy with my progress, I can make a commit and then push my commits up to GitHub. When I get home, if I want to keep working on the project on my desktop computer, I can pull it down and continue where I left off. I can keep making commits and pushing them to GitHub so that my project is readily available on any computer I choose to work on. Here's a diagram of what that workflow might look like:
I can repeat the above process with few issues if I'm working on my own. However, if I bring in a teammate to work with me on the project, we'll need to incorporate branching into our workflow. Branching lets us update the same project simultaneously with fewer headaches. If you want to go down that rabbit hole, the GitHub Flow guide is a good explanation of what a basic branching workflow looks like.
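The laptop-and-desktop workflow described above can also be sketched as a toy Python model, assuming a drastically simplified view of Git: `Remote` stands in for a repo hosted on GitHub, each `Clone` is the copy on one computer, and pushing or pulling just moves a pointer to the newest commit. All class and method names here are hypothetical; the real commands are `git push` and `git pull`, which also transfer data over the network and can merge histories that have diverged.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare commits by identity, not by field values
class Commit:
    message: str
    parent: Optional[Commit]


def history(commit: Optional[Commit]) -> List[Commit]:
    """Return a commit and all of its ancestors, newest first."""
    result: List[Commit] = []
    while commit is not None:
        result.append(commit)
        commit = commit.parent
    return result


class Remote:
    """Hypothetical stand-in for a repo hosted on a service such as GitHub."""

    def __init__(self) -> None:
        self.head: Optional[Commit] = None


class Clone:
    """Hypothetical local copy of the remote repo on one computer."""

    def __init__(self, remote: Remote) -> None:
        self.remote = remote
        self.head = remote.head  # start from whatever the remote already has

    def commit(self, message: str) -> None:
        self.head = Commit(message, self.head)

    def push(self) -> None:
        # Upload local commits. Like Git's default behavior, refuse if this
        # would discard commits on the remote that this clone has never seen.
        if self.remote.head is not None and self.remote.head not in history(self.head):
            raise RuntimeError("push rejected: pull the remote's commits first")
        self.remote.head = self.head

    def pull(self) -> None:
        # Download new commits. This toy version only handles the simple case
        # where the local clone has no commits that the remote lacks.
        if self.head is not None and self.head not in history(self.remote.head):
            raise RuntimeError("histories diverged: this toy model can't merge")
        self.head = self.remote.head


github = Remote()
laptop = Clone(github)
laptop.commit("Work done at the office")
laptop.push()

desktop = Clone(github)  # first time working on the home computer
desktop.commit("Work done at home")
desktop.push()

laptop.pull()            # back at work the next day
print([c.message for c in history(laptop.head)])
# ['Work done at home', 'Work done at the office']
```

If two people commit on their own copies without pulling first, their histories diverge and this toy model simply refuses to continue. Handling that situation gracefully is exactly what branching and merging are for.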
Both major corporations and small open source organizations use GitHub for their development process. GitHub earns its revenue from paid subscriptions for private repositories, which usually hold proprietary products, while public repositories are free for everyone.
Other popular companies that use Git and GitHub for source code hosting:
GitHub is by far the most popular Git repository hosting service. As of October 2018, there were 31 million users across the globe with over 96 million repositories. As of October 2017, GitHub had almost 5x as many users as the next most popular source code hosting site, Bitbucket.
Some alternatives to Git include Mercurial and Subversion (also referred to as SVN). Mercurial was also created by a member of the Linux kernel team when the team lost access to the proprietary VCS it was using at the time. SVN used to be super popular, but has lost momentum in recent years because it's a centralized VCS, meaning that users must connect to a central server to use it. Git and Mercurial, by contrast, are distributed VCSs: every user has a full copy of the project history, so most work, such as committing and browsing history, can be done locally without an internet connection.
The main disadvantage of both Git and GitHub is their fairly steep learning curve.
I know some developers who work on teams that don't use version control, and it sounds like a nightmare. Bugs that version control could help catch sneak into production, and there's no place to conduct code reviews or see the history of a project. I personally can't imagine working on a codebase without version control, and I would argue that some VCS competency is a must for any developer.
Most software development jobs require some knowledge of Git and GitHub. Other roles that might call for Git and GitHub knowledge include data science, technical documentation, design, project management, and product management.
Learning Git gives beginners another tool they'll likely use on the job, so I think it's a great idea.
GitHub is also a place where many prospective employers look at applicants:
Git and GitHub are incorporated into the Turing curriculum from day one. Students are given a lesson during their first module to go over basic commands and workflow. In later modules they're graded on their ability to conduct proper pull requests and code reviews during feature development. We also introduce more complex rebase workflows they may use in their future jobs.
We've taught GitHub from the beginning because GitHub was one of the first Git repo hosting services and is also the most commonly used. It has also been fundamental in making Git more approachable for folks learning it. On a more personal level, GitHub was built in Ruby (the first language that we taught at Turing) and has been really supportive of the Ruby community. The organization has also been super supportive of us as a non-profit by providing unlimited free public and private repositories, and it just so happens that we have 10 or so alumni and former instructors who work there.
There are a ton of great resources out there for learning Git and GitHub.
Here are some of my favorites below: