DenyConformity.com

The Basics of Git

I want to try to explain how Git works, using a metaphor I just came up with. I'm posting it here just so I can share it with anybody who might be confused about Git, or version control in general. If you have no interest in those things, you don't have to continue reading, because this will either be boring or make no sense to you. If you're on the other end of the spectrum and you are some sort of Git expert, you will probably also find this pretty boring (hopefully not completely inaccurate, though).

This exploration does assume that you have a passing familiarity with Git. You've done some cursory reading or a tutorial or two on how Git works. My intention here is not to tell you the absolute basics, but to help you wrap your head around a really complicated subject.

Anyway, let's dive in (pun intended - you'll see in a second).

Git done

Git is the current standard system for doing what's called Version Control. Put simply, a Version Control system is used by programmers to keep track of changes to files. It allows an entire team to work on the same set of code without stepping on each other's toes (at least not much), and it allows them to stay in sync with the latest code updates at any given point. It also tracks everything, so you can easily see who broke what, if something were to break. I don't want to get into the specifics about how Git is unique and different from other Version Control systems, because that would take all day probably, and I'm already going to take all day just to explain the basics. I want to explain in simple terms how Version Control systems work, from the perspective of Git, since that's what most people use.

Version control works with a thing called a Repository. A Repository is essentially a folder, usually stored somewhere in the cloud, full of other folders and files. It is the entire project, be it an app for tracking your garden, the Reddit website, or, I don't know, a website for posting pictures of dogs barking at watermelons or something. Every project generally has its own Repository. There are plenty of resources to explain Repositories and whatnot, so I won't waste time re-explaining the wheel. I'll just get right to putting my own spin on it in the form of complicated metaphor.

The Complicated Metaphor

Think of a Repository like a river. It is a collection of stuff, flowing in one direction. Think of that stuff, the water and rocks and fish and stuff, as the files that are stored in the Repository. When that stuff gets changed, those changes are present in the river from that point onward. If you want the rocks to be a different shape, or you want to add another fish, you can do that. The river will only have those new things from then on. Everything is constant downstream, and nothing flows up-stream.

It's a special river, though - a river of the Imagination. Like any good thought experiment you kind of have to meet me half-way. Suppose that this special river is so densely populated with stuff that we can take a bucket (of the Imagination) and take a big scoop of the river at any point such that our bucket will contain everything in the river at that specific point.

To put it another way, we can scoop out an exact duplicate of how the river exists at that point in its life. Any changes made to the stuff in the river before that point - any species of fish that we added or changes we made to the shape of the rocks or whatever - will be represented in our bucket. Any changes made after the place we scooped our bucketful will not be present in said bucketful. Are you with me so far?

Oh, right, I keep forgetting that I exist as text and you can't really answer my questions. I'll just assume I haven't lost you yet.

This process (scooping a bucketful of the river) is called either "cloning" or "checking out" or doing a "pull" of the repository. Whenever I copy my repository to my computer, I'm scooping up a bucketful of water from the latest point that the river has reached. When I make a "commit" (followed by a "push") to the repository, I am filling my bucket with all of the changes that I want to introduce into the river, further down-stream. If we're being pedantic (and since we're talking about computer programmers, that's a given), I will also plant a flag in the riverbank at the point where I dumped my commit. The flag explains exactly what I added at that point, so I can always keep track of what went where.

What makes Version Control powerful is that the river always exists, and any time I dump a bucket into it I'm just adding it further down-stream. The water and all that stuff still flows exactly as it always has at every point up to the new commit. Any time I want, I can stroll up-stream and revisit the river as it existed before my meddling hands turned all the fish purple or something. I can even get a scoop of how it existed at any point and go from there.
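
Here's a rough sketch of what that stroll up-stream can look like. It assumes a throwaway repository in a temporary folder, and the file name, commit messages, and the "River Keeper" identity are all made up for illustration. It dumps two buckets into the river, reads the flags on the riverbank, and then peeks at the file as it was before the second bucket.

```shell
# Throwaway sandbox: a local repository in a temp folder (all names are made up).
sandbox="$(mktemp -d)"
cd "$sandbox"
git -c init.defaultBranch=master init --quiet river
cd river
git config user.name "River Keeper"
git config user.email "keeper@example.com"

# First bucket: a plain fish.
echo "fish" > fish.txt
git add fish.txt
git commit --quiet -m "Add a fish"

# Second bucket, further down-stream: the fish turns purple.
echo "purple fish" > fish.txt
git add fish.txt
git commit --quiet -m "Turn the fish purple"

# Read the flags planted along the riverbank, newest first.
git --no-pager log --oneline

# Look at the file as it existed one commit up-stream (HEAD~1),
# without changing anything in the working folder.
before="$(git show HEAD~1:fish.txt)"
echo "Up-stream the file said: $before"
```

The working folder still has the purple fish afterwards. "git show" only looks, it doesn't move anything.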

To relate the river to Git, we can name the river "origin" and call this flow of stuff the "master" branch.

Do the thing

I'm going to create a repository on Github and call it "coolwebsite" - this is my river. It will live at https://github.com/shauvonm/coolwebsite (because ShauvonM is my github user account). To interact with my river, I'll use the Git command line interface, because that's the most efficient way to do that.

Just to be clear, I have not actually created the coolwebsite repository - that's just a made up thing for this thought experiment.

> git clone https://github.com/shauvonm/coolwebsite
This command scoops up a bucketful of my river and dumps it out into a folder on my computer. There are many other ways to create and / or check out a repository, but in my opinion this is the easiest, and it's just an example anyway. Do whatever works for you.

Now I can add a file to my cool website. The bare minimum that you need for a website is an index.html page, so I'll just throw that out there so I can have something to work with.

> git add index.html
Assuming I'm inside the cool website directory in my command line (and assuming I have created that file somehow), this will add specifically that file to my bucket. Normally I would use the command "git add -A" (note the capital A) to add every change made and be done with it, but I felt like being specific here. When in doubt, just use "git add -A".
> git commit -m "Adding index.html file"
This sets up my bucket to be ready to dump into the river. I also add a message here which is what shows on the flag I'll stick into the riverbank where I dump my bucket. You have to add that message whenever you do a commit. You don't have to be like me and make all of your commit messages Iambic Pentameter (I'm not kidding), but please don't be like the dude I used to work with who always just made his messages "". That helps nobody.
> git push
This is how I dump my bucket into the river, thus adding my "index.html" fish to the ecosystem. The river will contain that file from now on, unless I remove it later.
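
If you're curious what is actually sitting in the bucket before you commit, "git status" will tell you. The sketch below assumes a throwaway local folder with two made-up files (the "style.css" file is only there for illustration), and shows the difference between adding one specific file and adding everything with "git add -A".

```shell
# Throwaway sandbox: a fresh local repository in a temp folder.
sandbox="$(mktemp -d)"
cd "$sandbox"
git -c init.defaultBranch=master init --quiet coolwebsite
cd coolwebsite

# Two new files, neither of them in the bucket yet.
echo "<h1>Cool Website</h1>" > index.html
echo "h1 { color: purple; }" > style.css

# Add one specific file to the bucket.
git add index.html
only_one="$(git status --short)"
echo "$only_one"      # index.html is marked A (in the bucket), style.css is ?? (not yet)

# Add every change in the folder to the bucket.
git add -A
everything="$(git status --short)"
echo "$everything"    # both files are now marked A
```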

If someone else makes changes to the repository and I want to work with those changes, I can get another bucket of the river to update my local copy of the river.

> git pull
In other words, grab a bucketful of any new stuff from the river. If nothing has changed from the bucketful you already have, pull won't do anything.

In the future, if I want to add more files or change the contents of my index.html file, I can do that through the same process - add those changes to my bucket, commit to those changes with a message, and then push those changes to the repository. Eventually, I'll have all the stuff I need for a super cool website.
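
If you want to try the whole cycle without touching a real repository, here's a minimal sketch. It assumes a bare repository in a temporary folder standing in for the river on Github, and every path, name, and the "teammate" folder is made up for illustration. It isn't the only way to set up a practice river, just a convenient one.

```shell
# Throwaway sandbox: a bare repository in a temp folder plays the part of
# the river on Github. Every path and name here is made up.
sandbox="$(mktemp -d)"
cd "$sandbox"
git -c init.defaultBranch=master init --quiet --bare river.git

# Scoop a bucketful (the river is still empty, so Git prints a warning).
git clone --quiet river.git bucket
cd bucket
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called "master"
git config user.name "River Keeper"
git config user.email "keeper@example.com"

# Add a fish, plant a flag, dump the bucket.
echo "<h1>Cool Website</h1>" > index.html
git add index.html
git commit --quiet -m "Adding index.html file"
git push --quiet -u origin master   # first push into an empty river names the branch

# A teammate scoops a second bucket and finds the file in it.
cd "$sandbox"
git clone --quiet river.git teammate
ls teammate

# Another change goes in from the first bucket...
cd "$sandbox/bucket"
echo "<p>Now with more fish.</p>" >> index.html
git add -A
git commit --quiet -m "Adding more fish"
git push --quiet

# ...and the teammate pulls it down.
cd "$sandbox/teammate"
git pull --quiet
cat index.html
```

The one difference from the commands above is the first push: because this practice river starts out completely empty, the sketch spells out "origin master" once. After that, a plain "git push" works.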

If I want to make big changes, I can do it in a separate branch.

Rivers Can Have Branches, Okay?

Usually a big project will reach a point where it has reached a certain point.

. . . let me take another crack at that . . .

Eventually a project hits a major landmark, and everything so far works great. Now it's time to begin a new big feature or change which could really impact the rest of the project. For example, let's say I really like how my river is shaping up, and all of the rocks, fish, plants, and other stuff are working well together. However, I want to experiment with changing one of the fish into a frog. It might take some time to do that, and in the meantime the river will be all weird with some sort of half-fish half-frog thing getting in the way. What I want to do is create a duplicate of my river so I can make these changes without upsetting the "master" flow. I can simply split the river so that now it flows in two streams, "master" and a new stream which I will call "frog."

In Git, this is called a branch. The river now flows down both paths, which each start with exactly the same stuff. I can make whatever changes I want to the "frog" branch, and it won't affect "master" (and vice-versa). They will continue flowing independently of each other until I decide to merge them.

> git checkout -b frog
This command both creates a new branch called "frog" and switches to that branch. Just like with cloning, there are other ways to create a branch but in my opinion this is easiest. In order to actually create this branch on the River Origin, though, we have to make sure we push it to the repository.
> git push --set-upstream origin frog
This tells the repository that we have a new branch called "frog" on river "origin" and it makes sure that our bucket is set up to dump into that branch. Now if we check the repository on Github, we would see that there's another branch called "frog."

We can make updates to the frog branch just like we did to master earlier, using "git add," "git commit -m," and "git push." This will only update the frog branch, because it is now flowing in a different direction than the master branch.
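
To see that the two streams really are separate, here's a small sketch. It assumes a bare repository in a temporary folder standing in for the river on Github, and the file names and identity are made up. It creates the frog branch, pushes a change to it, and then compares what each stream holds.

```shell
# Throwaway sandbox: a bare repository in a temp folder stands in for Github.
sandbox="$(mktemp -d)"
cd "$sandbox"
git -c init.defaultBranch=master init --quiet --bare river.git
git clone --quiet river.git bucket
cd bucket
git symbolic-ref HEAD refs/heads/master
git config user.name "River Keeper"
git config user.email "keeper@example.com"
echo "<h1>Cool Website</h1>" > index.html
git add index.html
git commit --quiet -m "Adding index.html file"
git push --quiet -u origin master

# Split the river: create the frog branch and tell origin about it.
git checkout --quiet -b frog
git push --quiet --set-upstream origin frog

# Make a change that only exists on frog.
echo "ribbit" > frog.txt
git add frog.txt
git commit --quiet -m "Turning a fish into a frog"
git push --quiet

# List every branch, local and on origin.
git --no-pager branch -a

# Compare what each stream holds on origin.
on_master="$(git ls-tree --name-only origin/master)"
on_frog="$(git ls-tree --name-only origin/frog)"
echo "origin/master holds:" $on_master
echo "origin/frog holds:" $on_frog
```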

They are still flowing nearby, though, so I can easily update one with the other's stuff. At any point I can update the "frog" branch with any updates I make to "master" by taking a bucket of water out of master and dumping it into frog.

> git checkout master
Now we're back on the master branch.
> git pull
Grab a bucket of water from master, to get any changes that have been made to it.
> git checkout frog
Return to the frog branch.
> git merge master -m "Updating with the latest code on Master"
Dump anything in the master bucket into the frog bucket, updating frog with whatever changes were made to master since we made the new branch. If no changes were made to master, nothing will be changed on frog. EDIT: I originally didn't mention that you should include the "-m" flag when you make merges, just like you do with commits (because when you make a merge, it technically also creates a commit). You should include a message each time you do a merge or a commit.
> git push
Don't forget this command (I often do), because without this you won't actually dump your bucket into the river.
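
Here's that sequence as a sketch you can run end to end. It assumes the same kind of practice setup as before: a bare repository in a temporary folder standing in for Github, with made-up file names ("frog.txt", "rock.txt") and a made-up identity. Master gets a new rock after frog has branched off, and then frog is updated with it.

```shell
# Throwaway sandbox: a bare repository in a temp folder stands in for Github.
sandbox="$(mktemp -d)"
cd "$sandbox"
git -c init.defaultBranch=master init --quiet --bare river.git
git clone --quiet river.git bucket
cd bucket
git symbolic-ref HEAD refs/heads/master
git config user.name "River Keeper"
git config user.email "keeper@example.com"
echo "<h1>Cool Website</h1>" > index.html
git add index.html
git commit --quiet -m "Adding index.html file"
git push --quiet -u origin master

# The frog branch splits off and gets its own change.
git checkout --quiet -b frog
echo "ribbit" > frog.txt
git add frog.txt
git commit --quiet -m "Turning a fish into a frog"
git push --quiet --set-upstream origin frog

# Meanwhile, master keeps flowing and picks up a rock.
git checkout --quiet master
echo "a smooth rock" > rock.txt
git add rock.txt
git commit --quiet -m "Adding a rock"
git push --quiet

# Now the steps from above: refresh master, hop to frog, merge, push.
git pull --quiet
git checkout --quiet frog
git merge --quiet master -m "Updating with the latest code on Master"
git push --quiet

# frog now has the rock as well as the frog; master still has no frog.
ls
```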

Once I finish my grand experimentation and I'm ready to introduce this frog-that-was-a-fish into the master branch of River Origin, I can easily merge the two branches back together.

> git checkout master
Return to the master branch.
> git merge frog -m "Adding Frog to the river."
This will take the two streams and merge them back together. EDIT: Don't forget your commit message.
> git push
Remember, your changes aren't actually dumped into the river unless you use this command.

Now River Origin is whole again. From here on, it will flow from this point with all of the changes made to the master branch and the frog branch. If I turned around and looked at the river, it would split off for a while into two streams, before coming back together again. If I wanted to go back to how master was before the frog was added into it, I could do that by just returning to that point in the river.
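
If you want to actually turn around and look at the river, "git log --graph" will draw it. The sketch below is only an illustration under the usual assumptions: a bare repository in a temporary folder standing in for Github, and made-up file names and identity. Both streams get a change of their own, so the merge really does join two separate streams. If master had not changed at all since the split, Git would simply slide master forward to frog instead, and there would be no separate merge commit.

```shell
# Throwaway sandbox: a bare repository in a temp folder stands in for Github.
sandbox="$(mktemp -d)"
cd "$sandbox"
git -c init.defaultBranch=master init --quiet --bare river.git
git clone --quiet river.git bucket
cd bucket
git symbolic-ref HEAD refs/heads/master
git config user.name "River Keeper"
git config user.email "keeper@example.com"
echo "<h1>Cool Website</h1>" > index.html
git add index.html
git commit --quiet -m "Adding index.html file"
git push --quiet -u origin master

# One change on frog...
git checkout --quiet -b frog
echo "ribbit" > frog.txt
git add frog.txt
git commit --quiet -m "Turning a fish into a frog"
git push --quiet --set-upstream origin frog

# ...and one change on master, so the two streams have drifted apart.
git checkout --quiet master
echo "a smooth rock" > rock.txt
git add rock.txt
git commit --quiet -m "Adding a rock"
git push --quiet

# Merge the streams back together and dump the result into the river.
git merge --quiet frog -m "Adding Frog to the river."
git push --quiet

# Draw the river: it splits into two streams and joins again.
git --no-pager log --graph --oneline

# One step up-stream along master (HEAD~1) is the river before the frog.
before_frog="$(git ls-tree --name-only HEAD~1)"
echo "Before the frog, master held:" $before_frog
```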

Branching is a tool that allows me to make changes to a repository in a safe, separate space. How I use branches is up to me (or my team). Some people create a new branch for every new feature, task, or change they want to make. Some people use branches to keep track of different versions of their code. Often the "master" branch is reserved for only the latest released version of the code, and all changes made until the next release are done on a separate branch. Some people don't use branches at all. Those people are probably okay with their peas touching their mashed potatoes on their dinner plates, and they are therefore monsters.

Anyway, there's one more concept in Git that I want to explain using this river concept: forks.

The Fork that isn't

Let's suppose that you, completely on your own and not affiliated with me at all, see my "cool website" river and think it would work perfectly for your own cool website. My repository is on Github and it's public, so it's totally within reason to make a copy of it for your own purposes. The legal, ethical, and technical specifics of all this are beyond the scope of what I'm talking about, so let's just ignore them for now.

You can take a bucketful of my river (check out or clone the repository) to your own computer and go to town, but any changes you push back (assuming I've even given you permission to push) would be breaking my precious river that I have spent so much time on. You don't want your unique changes affecting my river, where they aren't applicable.

You could make a branch, then, where you could change whatever you want without getting in my way. The problem is that this will still clutter up my river with branches that you have no intention of ever merging back into the master stream. This is still not ideal, because there's only one master branch, which won't end up being the "master" branch of the crazy stuff you're doing. Someone else who comes along to look at the river will be super confused about what's going on.

So you don't want to mess with my river or even make a branch from it. What you want is your own river. On Github, this is called a Fork.

The best way to fork a repository is to just click the "fork" button in Github. I'm sure there's a command for this, but it won't be easier than just clicking that button. When you do, it will create an entirely new repository in your Github account which is an exact duplicate of the one you're forking. You take a bucketful of the first river and you create an entirely new river from the contents of that bucket. The fork does carry over the history of the first river (and, depending on the options you pick, its other branches too), but from that point on you've got a whole new river. Now you can go nuts doing whatever you want with your river and the one you forked from will not be affected in any way.

Okay, okay, I realize that what we call a "branch" in Git is more like a "fork" in a river, what we're calling a "fork" in Git is more like a "clone," and a "clone" in Git is more of a "fetch" or something. In my opinion, the terms for things in Git are kind of, well, stupid, but that's what they're called so that's what we call them. Welcome to computering, where forks aren't forks, your laptop has a desktop, and the cloud is definitely not a cloud.

Anyway, by making a fork, you can change whatever you want and I don't have to worry about it doing anything to my river. What if you discover a bug in my river (which I guess in this analogy could be an actual bug) and you have a good solution for fixing that bug? You can put a bucket together which includes a fix or change and you can submit it to me to have me dump it into my river. I can take a look at the contents of your bucket and choose to add it into my master branch. On Github, this is a "pull request." You are making a request to me, the repository owner, to pull your changes into the original repository.

Again, I'm sure there's a command for this, but it's easiest to just do it through Github, on the page for your fork. This is a pretty advanced scenario that you may never even encounter, but still it's good to know about.

Pull requests happen all the time on some of the bigger repositories. Take a look at the Reddit repository and check out the "pull requests" tab. Those are all buckets that people have submitted to the Reddit team to change something about how Reddit works. They can discuss each item and decide to either dump it in or reject it. Yes, this means that literally anyone in the world could potentially make changes to Reddit (or make a fork and start their own Reddit, theoretically). That is the power of open source and version control.

Let's look at one last scenario.

One Last Scenario: Updating a Fork

Say you have a fork - a duplicate - of my river that you are making changes to. I push some updates to my master branch which you would really like to have. Your fork was made from my river before I made those changes, though, so as far as your river is concerned, they don't exist. How do you get those updates into your river?

You don't have to go through and make all the same changes yourself. It's actually just a matter of a few commands.

> git remote -v
This will list all of the URLs linked to your clone of the repository, under the "origin" label (remember, that's what we called our river). Basically, this is where your buckets go when you push, and where the buckets come from when you pull. When you clone your fork, the only URL listed here will be your fork's (https://github.com/youruser/coolwebsite for example), not the original repository. We need to change that.
> git remote add upstream https://github.com/ShauvonM/coolwebsite
This will tell your computer to add the original repository as the "upstream" source for your fork. Your river is technically down-stream from mine, and by using this command you can make it official. Note that this URL is made up; you would replace it with the URL of whatever you have forked.
> git remote -v
Now you'll see my URL listed with yours, under the "upstream" label. Your river is always "origin" - the river you forked from is "upstream."
> git fetch upstream
This will pull down the latest state of upstream, without touching your own files yet. In other words, it fills up a bucket with the latest stuff in my river, but doesn't dump it anywhere.
> git checkout master
Switch to the master branch of your repository. You can replace "master" with whatever branch you want to merge my code into.
> git merge upstream/master -m "Updating my fork from upstream."
This pulls in the changes from the master branch of the upstream river and dumps them into your river. This is the same thing I did earlier to update a branch in my river. This time you just specify that you want my (upstream) master branch instead of your (origin) master branch. EDIT: Again, include a commit message to future-you here.

Hopefully at this point you know what comes next.

> git push
Never forget to push.
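
Here's the whole fork update as a sketch you can run without a Github account. It leans on some assumptions: two bare repositories in a temporary folder stand in for my river ("upstream.git") and your fork ("fork.git"), a bare clone stands in for clicking the "fork" button, and all the folder names, file names, and identities are made up.

```shell
# Throwaway sandbox: upstream.git plays my river, fork.git plays your fork.
sandbox="$(mktemp -d)"
cd "$sandbox"
git -c init.defaultBranch=master init --quiet --bare upstream.git

# I put the first file into my river.
git clone --quiet upstream.git mine
cd mine
git symbolic-ref HEAD refs/heads/master
git config user.name "River Keeper"
git config user.email "keeper@example.com"
echo "<h1>Cool Website</h1>" > index.html
git add index.html
git commit --quiet -m "Adding index.html file"
git push --quiet -u origin master

# Stand-in for the "fork" button: a complete copy of my river, history included.
cd "$sandbox"
git clone --quiet --bare upstream.git fork.git

# You clone your fork and make a change of your own.
git clone --quiet fork.git yours
cd yours
git config user.name "Fork Owner"
git config user.email "forker@example.com"
echo "your own fish" > yours.txt
git add yours.txt
git commit --quiet -m "Adding my own fish"
git push --quiet

# Meanwhile, I push a fix to my river that you would like to have.
cd "$sandbox/mine"
echo "<p>Bug fixed.</p>" >> index.html
git add -A
git commit --quiet -m "Fixing a bug"
git push --quiet

# The commands from above, run inside your clone of your fork.
cd "$sandbox/yours"
git remote -v                                   # only origin is listed
git remote add upstream "$sandbox/upstream.git"
git remote -v                                   # origin and upstream
git fetch --quiet upstream
git checkout --quiet master
git merge --quiet upstream/master -m "Updating my fork from upstream."
git push --quiet

# Your fork now has my fix, and still has your own fish.
cat index.html
ls
```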

Now You Are An Expert

Hopefully now you have a working mental image for how Git repositories work, and how you can use branches and forks. This just scratches the surface, of course, and it doesn't get into what might happen if, for example, you have conflicts when you try to merge branches, or how you plug in your git credentials. However, it's a start, and on a day-to-day basis this encompasses about 90% of my interaction with Git on a major project out in the wild (and I have been on quite a few).

I know this stuff is really complicated, especially when using the command line to interact with it. However, once you get that mental image in your head that helps you visualize what's going on in a repository, the command line tools will be second nature. Hopefully this whole river metaphor will help you develop that mental image.
