Notes and best-practices
My evolving notes and on how to effectively use Git and GitHub.
Evolution of version control systems (and their shortcomings): from having differently named files and folders (error prone and inefficient) to local version control systems (LVCs) where files are checked out from a locally hosted database that contains the entire history (single point of failure and impractical for collaboration) to centralized version control systems (CVCSs) where files are checked out from a server-hosted database (single point of failure) to distributed version control systems (DVCs) where files and a database containing the entire history are checked out from a served-hosted database, so that each local node contains all information stored on the server (content on server can easily be restored in case of failure).
Git stores data as snapshots: at each new commit, modified files are replaced with a snapshot of their new state, while unmodified files are replaced with a link to the previous snapshot.
In a basic workflow I edit a file in the working tree (the locally checked out version of the project), add them to the staging area (also called index), commit them to the local database, and finally push them to the remote database on GitHub.
Neither main (or, formerly, master), nor origin have any special significance in Git. The reason they are widely used is that main is the default name for the starting branch when running
git initand origin is the default name for the remote repository when running
Git stores a single file as a blob, which is a backronym for binary large object and is a collection of binary data. It can store any type of data including multimedia files and images. Blobs in the git object database are stored named with a SHA-1 hash key of their content and containing the exact same content as the file would on my filesystem.
Interacting with remotes
A remote repository is a version of the project that’s hosted on a server, in my case always on GitHub. The default name Git assigns to the remote when I clone it is origin. (Technically, the remote could also be hosted on my machine but in a separate location from my working copy.)
git fetch <remote>downloads all new objects from the remote repository (e.g. including references to new branches) but does not automatically merge these objects into my local work.
git pullfetches and automatically merges the remote version of the local branch I’m currently on and merges it if the current branch is set up to track the remote. By default,
git clonesets up my local
mainbranch to track the remote version, named
To share work with the remote, I can use
git push <remote> <branch>(e.g. to share work from my local
origin/main, I can do
git push origin main).
Cardinal rule: don’t push stuff before you’re fully happy with it. Changing your local history is easy, changing the history on the server isn’t.
Changing message of last commit: (with an empty index) run
git commit --amend.
Changing the content of the last commit: stage changes you want to add, then run
git commit --amend. If you don’t want to change the commit message, append
Unstaging a staged file:
git restore --staged <pathspec>.
Undoing changes in the working directory and reverting a file to its state after the last commit:
git restore <pathspec>.
Changing multiple commits (edit, reorder, squash, delete, split, etc.):
git rebase -i HEAD~#, where
#is the parent of the last commit you want to edit (e.g. if you want to edit the last 3 commits,
HEAD~3will select commits
HEAD~2). More in docs
I’ve accidentally overwritten a file with content I meant to place in a new file, and now want to 1) save the new content under a different name and 2) restore the file I’ve overwritten. Solution: Just save the new content under a new name (temporarily deleting the file I’ve accidentally overwritten, and then use
git restore <name_of_overwritten_file>to restore the old version of the overwritten file.
I’ve deleted one or more commits (e.g. by using a hard reset) that I need to recover. First thing to try: run
git reflogto get a log of commits HEAD pointed to. Once I’ve identified the commit I need (ab1af) I can create a new branch that points to it using
git branch recover-branch ab1af. If there is no reflog, I can run
git fsck --fullto check the database for integrity and get a list of objects that aren’t reacheable. The commit I’m looking for will be labelled with
dangling commit, and I can create a branch pointing to it.
Removing a file from every commit (e.g. accidentally committed large data file):
git filter-branch --index-filter 'rm --ignore-unmatch --cached data.csv' HEAD. This can take a long time. One way to speed things up is to find the commit that added the file to the history and only filter downstream from there.
git log --oneline --branches -- data.csvwill list all commits that contain the file from latest to earliest. If the file was added in commit
a34s5, then I only want to rewrite commits
a34s5^..HEAD, which I can substitute for
filter-branchcommand. (In case I don’t know the name of the large file I want to remove, see here for how to find it. Finally, as advised in relevant section here, it’s best to first do this on a separate branch to test the behaviour before running it on
Undoing a (pushed) commit:
git revert <commit_hash>. Creates a new commit undoing the specified commit. To undo a range of commits, use
git revert <oldest_commit_hash>..<latest_commit_hash>. (More here).
Frequently used stuff and best practices
I’ve started to work on an issue I decide I don’t want to work on yet but I want to save that work. Just stash the work.
I’m working on a topic branch and discover another issue I need to fix first. What to do?
Create local version of remote branch that automatically tracks remote branch (if branch has unique remote counterpart):
git switch <branch>.
Reset vs restore vs revert vs rebase (docs here):
resetis about updating your branch by adding or removing commits from the branch,
restoreis about unstading files from the index or undoing changes in the working directory,
resetis about making new commits to undo changes made by other commits,
merge, is a way to integrate work from two different branches. But unlike
merge, which takes the endpoints of two branches and merges them together,
rebaseapplies changes from the branch you merge onto the branch you merge to in the order they happened and thus creates a linear history.
Adding a local repository to GitHub:
Initialise the local repo as a Git repository:
Add and commit all content of the local repo:
git add --all; git commit -m 'Intial commit'.
Create a GitHub repo and follow prompts:
gh repo create.
Think of HEAD as the last commit on the current branch (it’s a pointer to the current branch which is a pointer to the last commit on that branch), the index as the proposed next commit, and the working directory as a sandbox. Think of all of them as collections of files, or file trees.
When I switch to a branch, Git makes HEAD point to the new branch ref, populates the index with the snapshot of the last commit on that branch, and copies the contents of the index into the working directory.
Changing a file updates it in the working directory, staging it updates the version in the index with that of the working directory, and committing it updates the version HEAD points to with that of the index.
git reset --soft HEAD~moves the branch that HEAD points to to the parent of the last commit. The version of the file that HEAD points to now differs from the versions in the index and the working directory, which still contain the version of the last commit on the branch. Effectively, we’ve undone the last commit. You could now make changes to the index and then commit them, accomplishing the same as
git commit --amend. So,
git reset [--mixed] HEAD~moves the branch HEAD points to (just as
--softabove) but then also updates index with the content of the snapshot HEAD now points to. So,
git reset --hard HEAD~, does what the above does, but then continues and also updates the working directory with the content of the index. This forcibly overwrites files, which, if they haven’t been committed (in which case they can be recovered using the reflog), is unrecoverable. So,
git add, and all changes made in the working directory since the last commit.
We can undo multiple commits: to undo all commits since commit 9e5bf, simply run
git reset <option> 9e5bf.
resetis also handy to squash commits together. To squash the last three unpushed commits, use
git reset --soft HEAD~3. This moves the branch ref to the great-grandparent of the latest commit. Because the index remains unchanged, all changes committed after
HEAD~3now appear as staged and can be committed in a single new commit.
Writing good commits
Writing good commits (more here):
No whitespace errors (
git diff --check, probs integrated in
Each commit is a logically separate changeset.
Useful commit message using capitalisation and written in imperative style (“Fix bug” instead of “Fixed bug” or “Fixes bug”, to be compatible with Git’s auto generated messages) comprising a short summary (no longer than 50 characters so it fits on one line in log) followed by a blank line followed by a motivation for the changes and a description of the contrast between old and new behaviour.
Short SHA-1 hashes:
git show d921shows the commit with abbreviated SHA-1 hash
git show iss3shows the commit on the tip of the branch
Ancestry references: There are two different ways of ancestry selection which I think of as “horizontal” and “vertical” selection.
^selects different parents for merge commits:
git show d921^shows the first parent of the merge commit. By default, this parent is from the branch from which the merge was performed (frequently
git show d921^2shows the second parent, which is from the branch that was merged. I think of this as horizontal ancestor selection. Conversely,
~performs vertical ancestor selection in that it iteratively selects the first parent a specified number of times.
git show d921~shows the parent of
git show d921~2the parent of the parent,
git show d921~3the parent of the parent of the parent, and so on (the previous could also be written as
git show d921~~~). The two syntaxes can be combined:
git show d921~2^2shows the second parent of the grandparent of
d921, assuming the grandparent is a merge commit.
git log main..iss3lists all commits that are reachable from the
iss3but not the
mainbranch. Git substitues
HEADif one side of
..is empty, so to list local commits that aren’t yet on the remote, you could do
git log origin/main... Similarly, to select the last three commits, use
HEAD~3..HEAD, which selects all commits reachable from
HEAD~3, which are commits
Triple dot: to list commits that occur either on
iss3but not both, I can do
git log master...iss3, and to get arrows to indicate whether a commit is reachable from the right or left branch,
git log --left-right master...iss3.
- To search for commit message in log history:
git log --grep='regex'
- The section on ignoring files in this ProGit book chapter is excellently clear and provides very useful examples.