Large data files and codes with filenames like "module_12-24-2017_2018_updated_v2_working" can be quite a pitfall in scientific computing. Version control can simplify scientific computing projects by saving the history of their development and providing a simple backup. The most popular combination is Git and GitHub: Git is the version control software, and GitHub is an online service that hosts a copy of the files and their version history. Git and GitHub can be quite intimidating, as many guides introduce Git with opaque terminal commands and obtuse explanations. The following steps show how I set up GitHub to manage versions of my research project.
My initial project folder looked a lot like this.
Backups were occasionally taken when there was important data to be saved and looked at later, or when a large amount of code was changed. If there were small changes, the particular lines that were replaced were commented out until they were forgotten about or deleted later. This was not a good way to manage a project that was growing larger and larger the longer I worked on it.
So I signed up with GitHub using my .edu email account (this gets students some free services) and created a repository.
Repositories are what version control software calls its managed folders. Although my code and data are hosted on GitHub, there is a private repository option, so only I and the people I choose can view them.
To manage Git on my computer, I downloaded the GitHub Desktop software and signed in with my GitHub account. Then I set up my new repository by "cloning" it into a folder of my choosing, which creates a local copy linked to the repository on GitHub. To make sure that nothing got deleted, I made a backup of my original files and then copied them into my new folder/repository.
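For anyone who would rather script the backup-then-copy step than do it by hand, here is a minimal Python sketch. It assumes Python 3.8 or newer (for `dirs_exist_ok`) and that the clone already exists; the `backup_and_copy` helper and the folder names are hypothetical, and the cloning itself is still done in GitHub Desktop.

```python
from datetime import datetime
from pathlib import Path
import shutil


def backup_and_copy(original: Path, repo: Path, backup_root: Path) -> Path:
    """Back up `original`, then copy its contents into the cloned `repo` folder.

    Returns the path of the new backup folder.
    """
    original, repo = Path(original), Path(repo)
    # Timestamped backup folder, e.g. backups/my_project_20240101-093000
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = Path(backup_root) / f"{original.name}_{stamp}"
    # copytree creates missing parent folders and refuses to overwrite an existing backup
    shutil.copytree(original, backup)
    # dirs_exist_ok=True lets the copy merge into the existing clone;
    # any .git folder in the original is skipped so it cannot clobber the clone's history
    shutil.copytree(original, repo, dirs_exist_ok=True,
                    ignore=shutil.ignore_patterns(".git"))
    return backup
```

Calling `backup_and_copy(Path("my_project"), Path("my_project_repo"), Path("backups"))` would leave a dated copy under `backups/` before anything is placed in the repository folder.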
The GitHub Desktop app sees all the changes made in this folder and keeps track of them. Let's say that I have finished writing and running code for the day and want to save all these changes. I have to "commit" them to save a version of them in Git. Before committing, the changes show up on the left side of the desktop app, where a short commit description is required (just to jog your memory).
Pressing "Commit to main" records these changes in Git's version history on my computer. Pushing the changes then uploads them to GitHub, so the files and version history there stay in sync with my local copy.
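For reference, the commit and push buttons correspond roughly to Git's `add`, `commit` and `push` commands. The Python sketch below runs them through `subprocess`. It assumes the `git` command-line tool is installed and that the clone tracks a `main` branch on a remote named `origin` (the default name after cloning); the helper names are hypothetical, and this is only an illustration, separate from the script mentioned at the end of this post.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def sync_commands(message: str, branch: str = "main") -> list[list[str]]:
    """Command-line equivalents of "Commit to main" followed by a push."""
    return [
        ["git", "add", "--all"],             # stage every new, changed, or deleted file
        ["git", "commit", "-m", message],    # record a version with a short description
        ["git", "push", "origin", branch],   # upload the new commits to GitHub
    ]


def has_changes(repo: Path) -> bool:
    # `git status --porcelain` prints nothing when the working tree is clean
    out = subprocess.run(["git", "status", "--porcelain"], cwd=repo,
                         capture_output=True, text=True, check=True)
    return bool(out.stdout.strip())


def commit_and_push(repo: Path, message: str, dry_run: bool = False) -> list[list[str]]:
    """Run the commands in `repo` (or just return them when dry_run is True)."""
    cmds = sync_commands(message)
    if dry_run:
        return cmds
    if not has_changes(repo):
        return []  # `git commit` exits with an error when there is nothing to commit
    for cmd in cmds:
        subprocess.run(cmd, cwd=repo, check=True)  # raise if any step fails
    return cmds
```

Passing `dry_run=True` only returns the commands, which is a safe way to check what would run before touching the repository.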
One small snag I ran into was that I had many files over 100 MB in size, which GitHub will not accept in a regular push. Syncing them requires another piece of software called Git LFS (Large File Storage), which needs to be installed and configured before GitHub Desktop recognizes it and lets you sync these large files.
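Before setting up Git LFS, it helps to know exactly which files are affected. This Python sketch walks a project folder, skips the `.git` folder, and lists files above GitHub's limit (100 MiB, i.e. 100 × 1024 × 1024 bytes), then summarizes their extensions as patterns such as `*.h5`. The function names and the example extension are hypothetical.

```python
from __future__ import annotations

import os
from collections import Counter
from pathlib import Path

# GitHub blocks files larger than 100 MiB
LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root: Path, limit: int = LIMIT_BYTES) -> list[Path]:
    """Return paths (relative to root) of files larger than `limit` bytes."""
    root = Path(root)
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk never descends into Git's own storage
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            # is_file() also skips broken symlinks, which stat() would fail on
            if path.is_file() and path.stat().st_size > limit:
                large.append(path.relative_to(root))
    return sorted(large)


def extensions_to_track(files: list[Path]) -> list[str]:
    """Turn oversized files into LFS patterns, most common extension first."""
    counts = Counter(p.suffix for p in files if p.suffix)
    return [f"*{ext}" for ext, _ in counts.most_common()]
```

Running `extensions_to_track(find_large_files(Path(".")))` from the repository folder gives a short list of file types that are candidates for Git LFS.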
Installation and configuration of Git LFS
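On the command line, Git LFS is enabled once per user account with `git lfs install`, and a file type is put under LFS with `git lfs track "*.h5"`, which adds the line `*.h5 filter=lfs diff=lfs merge=lfs -text` to the repository's `.gitattributes` file. The Python sketch below writes those same lines; it does not install Git LFS, the helper names and example patterns are hypothetical, and it assumes patterns without spaces. Tracking only applies to files added afterwards, not to files already committed.

```python
from __future__ import annotations

from pathlib import Path


def lfs_attribute_line(pattern: str) -> str:
    # The same line `git lfs track "<pattern>"` writes to .gitattributes
    return f"{pattern} filter=lfs diff=lfs merge=lfs -text"


def track_with_lfs(repo: Path, patterns: list[str]) -> list[str]:
    """Add LFS rules for `patterns` to repo/.gitattributes; return the lines added."""
    attributes = Path(repo) / ".gitattributes"
    existing = attributes.read_text().splitlines() if attributes.exists() else []
    added: list[str] = []
    for pattern in patterns:
        line = lfs_attribute_line(pattern)
        # Skip rules that are already present so repeated runs change nothing
        if line not in existing and line not in added:
            added.append(line)
    if added:
        attributes.write_text("\n".join(existing + added) + "\n")
    return added
```

Commit the updated `.gitattributes` together with the large files so that GitHub Desktop pushes them through Git LFS.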
In the next post, I will detail a shell script I wrote that handles all of this automatically when I run it at the end of the day.