Git
Notes and best-practices
My evolving notes and on how to effectively use Git and GitHub.
Creating a new project/repo
My usual workflow is the following:
I create a local directory for my project
Initialise Git:
git init
Create a GitHub repo:
gh repo create
(you need the GitHub CLI for this)
Workflow
I code in Neovim in the terminal, within tmux sessions.
I use
gitgutter
to show me git diff markers in the sign column and to access hunks of changes easily.I use gitgutter to preview (
,hp
), stage (,hs
), and undo (,hu
) hunks.
Git
Evolution of version control systems (and their shortcomings): from having differently named files and folders (error prone and inefficient) to local version control systems (LVCs) where files are checked out from a locally hosted database that contains the entire history (single point of failure and impractical for collaboration) to centralized version control systems (CVCSs) where files are checked out from a server-hosted database (single point of failure) to distributed version control systems (DVCs) where files and a database containing the entire history are checked out from a served-hosted database, so that each local node contains all information stored on the server (content on server can easily be restored in case of failure).
Git stores data as snapshots: at each new commit, modified files are replaced with a snapshot of their new state, while unmodified files are replaced with a link to the previous snapshot.
In a basic workflow I edit a file in the working tree (the locally checked out version of the project), add them to the staging area (also called index), commit them to the local database, and finally push them to the remote database on GitHub.
Neither main (or, formerly, master), nor origin have any special significance in Git. The reason they are widely used is that main is the default name for the starting branch when running
git init
and origin is the default name for the remote repository when runninggit clone
.Git stores a single file as a blob, which is a backronym for binary large object and is a collection of binary data. It can store any type of data including multimedia files and images. Blobs in the git object database are stored named with a SHA-1 hash key of their content and containing the exact same content as the file would on my filesystem.
Interacting with remotes
A remote repository is a version of the project that’s hosted on a server, in my case always on GitHub. The default name Git assigns to the remote when I clone it is origin. (Technically, the remote could also be hosted on my machine but in a separate location from my working copy.)
git fetch <remote>
downloads all new objects from the remote repository (e.g. including references to new branches) but does not automatically merge these objects into my local work.git pull
fetches and automatically merges the remote version of the local branch I’m currently on and merges it if the current branch is set up to track the remote. By default,git clone
sets up my localmain
branch to track the remote version, namedorigin/main
.To share work with the remote, I can use
git push <remote> <branch>
(e.g. to share work from my localmain
branch withorigin/main
, I can dogit push origin main
).
Undoing things
Cardinal rule: don’t push stuff before you’re fully happy with it. Changing your local history is easy, changing the history on the server isn’t.
Changing message of last commit: (with an empty index) run
git commit --amend
.Changing the content of the last commit: stage changes you want to add, then run
git commit --amend
. If you don’t want to change the commit message, append--no-edit
.Unstaging a staged file:
git restore --staged <pathspec>
.Undoing changes in the working directory and reverting a file to its state after the last commit:
git restore <pathspec>
.Changing multiple commits (edit, reorder, squash, delete, split, etc.):
git rebase -i HEAD~#
, where#
is the parent of the last commit you want to edit (e.g. if you want to edit the last 3 commits,HEAD~3
will select commitsHEAD
,HEAD~1
, andHEAD~2
). More in docsI’ve accidentally overwritten a file with content I meant to place in a new file, and now want to 1) save the new content under a different name and 2) restore the file I’ve overwritten. Solution: Just save the new content under a new name (temporarily deleting the file I’ve accidentally overwritten, and then use
git restore <name_of_overwritten_file>
to restore the old version of the overwritten file.I’ve deleted one or more commits (e.g. by using a hard reset) that I need to recover. First thing to try: run
git reflog
to get a log of commits HEAD pointed to. Once I’ve identified the commit I need (ab1af) I can create a new branch that points to it usinggit branch recover-branch ab1af
. If there is no reflog, I can rungit fsck --full
to check the database for integrity and get a list of objects that aren’t reacheable. The commit I’m looking for will be labelled withdangling commit
, and I can create a branch pointing to it.Removing a file from every commit (e.g. accidentally committed large data file):
git filter-branch --index-filter 'rm --ignore-unmatch --cached data.csv' HEAD
. This can take a long time. One way to speed things up is to find the commit that added the file to the history and only filter downstream from there.git log --oneline --branches -- data.csv
will list all commits that contain the file from latest to earliest. If the file was added in commita34s5
, then I only want to rewrite commitsa34s5^..HEAD
, which I can substitute forHEAD
in thefilter-branch
command. (In case I don’t know the name of the large file I want to remove, see here for how to find it. Finally, as advised in relevant section here, it’s best to first do this on a separate branch to test the behaviour before running it onmain
.Undoing a (pushed) commit:
git revert <commit_hash>
. Creates a new commit undoing the specified commit. To undo a range of commits, usegit revert <oldest_commit_hash>..<latest_commit_hash>
. (More here).
Frequently used stuff and best practices
I’ve started to work on an issue I decide I don’t want to work on yet but I want to save that work. Just stash the work.
I’m working on a topic branch and discover another issue I need to fix first. What to do?
Create local version of remote branch that automatically tracks remote branch (if branch has unique remote counterpart):
git switch <branch>
.Reset vs restore vs revert vs rebase (docs here):
reset
is about updating your branch by adding or removing commits from the branch,restore
is about unstading files from the index or undoing changes in the working directory,reset
is about making new commits to undo changes made by other commits,rebase
, likemerge
, is a way to integrate work from two different branches. But unlikemerge
, which takes the endpoints of two branches and merges them together,rebase
applies changes from the branch you merge onto the branch you merge to in the order they happened and thus creates a linear history.Adding a local repository to GitHub:
Initialise the local repo as a Git repository:
git init
.Add and commit all content of the local repo:
git add --all; git commit -m 'Intial commit'
.Create a GitHub repo and follow prompts:
gh repo create
.Done.
Understanding reset
Think of HEAD as the last commit on the current branch (it’s a pointer to the current branch which is a pointer to the last commit on that branch), the index as the proposed next commit, and the working directory as a sandbox. Think of all of them as collections of files, or file trees.
When I switch to a branch, Git makes HEAD point to the new branch ref, populates the index with the snapshot of the last commit on that branch, and copies the contents of the index into the working directory.
Changing a file updates it in the working directory, staging it updates the version in the index with that of the working directory, and committing it updates the version HEAD points to with that of the index.
git reset --soft HEAD~
moves the branch that HEAD points to to the parent of the last commit. The version of the file that HEAD points to now differs from the versions in the index and the working directory, which still contain the version of the last commit on the branch. Effectively, we’ve undone the last commit. You could now make changes to the index and then commit them, accomplishing the same asgit commit --amend
. So,--soft
undoesgit commit
.git reset [--mixed] HEAD~
moves the branch HEAD points to (just as--soft
above) but then also updates index with the content of the snapshot HEAD now points to. So,--mixed
undoesgit commit
andgit add
.git reset --hard HEAD~
, does what the above does, but then continues and also updates the working directory with the content of the index. This forcibly overwrites files, which, if they haven’t been committed (in which case they can be recovered using the reflog), is unrecoverable. So,--hard
undoesgit commit
,git add
, and all changes made in the working directory since the last commit.We can undo multiple commits: to undo all commits since commit 9e5bf, simply run
git reset <option> 9e5bf
.reset
is also handy to squash commits together. To squash the last three unpushed commits, usegit reset --soft HEAD~3
. This moves the branch ref to the great-grandparent of the latest commit. Because the index remains unchanged, all changes committed afterHEAD~3
now appear as staged and can be committed in a single new commit.
Writing good commits
Writing good commits (more here):
No whitespace errors (
git diff --check
, probs integrated infugitive
somehow).Each commit is a logically separate changeset.
Useful commit message using capitalisation and written in imperative style (“Fix bug” instead of “Fixed bug” or “Fixes bug”, to be compatible with Git’s auto generated messages) comprising a short summary (no longer than 50 characters so it fits on one line in log) followed by a blank line followed by a motivation for the changes and a description of the contrast between old and new behaviour.
Selecting commits
Short SHA-1 hashes:
git show d921
shows the commit with abbreviated SHA-1 hashd921
.Branch references:
git show iss3
shows the commit on the tip of the branchiss3
.Ancestry references: There are two different ways of ancestry selection which I think of as “horizontal” and “vertical” selection.
^
selects different parents for merge commits:git show d921^
shows the first parent of the merge commit. By default, this parent is from the branch from which the merge was performed (frequentlymain
).git show d921^2
shows the second parent, which is from the branch that was merged. I think of this as horizontal ancestor selection. Conversely,~
performs vertical ancestor selection in that it iteratively selects the first parent a specified number of times.git show d921~
shows the parent ofd921
,git show d921~2
the parent of the parent,git show d921~3
the parent of the parent of the parent, and so on (the previous could also be written asgit show d921~~~
). The two syntaxes can be combined:git show d921~2^2
shows the second parent of the grandparent ofd921
, assuming the grandparent is a merge commit.Double dot:
git log main..iss3
lists all commits that are reachable from theiss3
but not themain
branch. Git substituesHEAD
if one side of..
is empty, so to list local commits that aren’t yet on the remote, you could dogit log origin/main..
. Similarly, to select the last three commits, useHEAD~3..HEAD
, which selects all commits reachable fromHEAD
but notHEAD~3
, which are commitsHEAD~2
,HEAD~
, andHEAD
.Triple dot: to list commits that occur either on
master
oriss3
but not both, I can dogit log master...iss3
, and to get arrows to indicate whether a commit is reachable from the right or left branch,git log --left-right master...iss3
.
Misc.
- To search for commit message in log history:
git log --grep='regex'
.gitignore
- The section on ignoring files in this ProGit book chapter is excellently clear and provides very useful examples.