Git Cherry Picking Across Forked Repos and Empty Commits

Recently I wanted to bring a specific upstream commit into a forked repository. Although the two repos share a common history, they had diverged enough that it wasn’t a straightforward cherry-pick between branches. Instead, with local clones of the two repositories, I managed to cherry-pick as follows:

git --git-dir=..//.git format-patch -k -1 --stdout  | git am -3 -k

To complicate things further, a few days later I wanted to do the same thing again. This time, however, a submodule and another file had diverged enough that the patch no longer applied correctly. To get around this I had to run:

git --git-dir=..//.git format-patch -k -1 --stdout  | patch -p1 --merge

then manually fix any changes that were still broken and create a new commit with the changes.

These two Stack Overflow answers helped me work out both of these issues: https://stackoverflow.com/a/9507417 and https://stackoverflow.com/a/49537226
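As a rough sketch of the first approach, the throwaway script below builds two sibling clones in a temporary directory and carries one commit across with format-patch and am. The directory names (upstream, fork), the file, the identity and the commit messages are all made up for illustration; in a real run the path to the other clone and the commit hash come from your own repositories.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch area holding two sibling clones; every name here is made up.
work=$(mktemp -d)
cd "$work"

# "upstream" stands in for the original project.
git init -q upstream
git -C upstream config user.name "Demo User"
git -C upstream config user.email "demo@example.com"
echo "one" > upstream/file.txt
git -C upstream add file.txt
git -C upstream commit -q -m "Initial commit"

# "fork" stands in for the forked repository, cloned next to upstream.
git clone -q upstream fork
git -C fork config user.name "Demo User"
git -C fork config user.email "demo@example.com"

# Upstream gains a fix that the fork does not have yet.
echo "two" >> upstream/file.txt
git -C upstream commit -q -am "Fix bug in file.txt"
commit=$(git -C upstream rev-parse HEAD)

# From inside the fork: export that single commit from the sibling clone
# as a mailbox-style patch and apply it, falling back to a three-way merge.
cd fork
git --git-dir=../upstream/.git format-patch -k -1 --stdout "$commit" | git am -q -3 -k

applied_subject=$(git log -1 --format=%s)
echo "Applied: $applied_subject"
```

The -k on both sides keeps the original subject line intact, so the commit arrives without a [PATCH] prefix.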

Finally, in recent months I’ve also found myself wanting to create a completely empty commit to kick off a downstream build process, much like you might touch a file to change its timestamp. To do this you can simply run:

git commit --allow-empty -m "Redeploy"
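If you want to convince yourself that such a commit really is empty, one way is to compare its tree with its parent's. The sketch below does this in a scratch repository; the repository name, file and identity are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch repository with a single ordinary commit.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Record a commit that changes nothing.
git commit -q --allow-empty -m "Redeploy"

# An empty commit points at exactly the same tree as its parent,
# so the diff between the two lists no files.
parent_tree=$(git rev-parse 'HEAD~1^{tree}')
head_tree=$(git rev-parse 'HEAD^{tree}')
changed_files=$(git diff --name-only HEAD~1 HEAD)
echo "parent tree: $parent_tree"
echo "new tree:    $head_tree"
```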

Docker CFLAGS march=native and illegal hardware instruction

I’ve recently hit an interesting issue involving Docker containers and compiled C code crashing with illegal hardware instruction errors.

In a nutshell, the Docker image was built on a server using Jenkins. It had been running fine until the underlying host orchestration software and physical hardware changed.

My solution has been to remove the build of the C code from the Dockerfile and instead put it inside a script which runs when the container starts.

This solution works for my use case, but it requires the C code to be compiled every time the Docker container starts.

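Digging a little deeper, it appears that the root cause may be the -march=native flag in the CFLAGS of the C code’s Makefile. This flag tells the compiler to use every instruction set extension supported by the CPU of the machine doing the build, so the resulting binary can contain instructions that the CPU of a different host does not implement.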

A few posts on Stack Overflow (in particular, https://stackoverflow.com/questions/54039176/mtune-and-march-when-compiling-in-a-docker-image) suggest that removing this flag may make little difference to the run time of the compiled code. Depending on the use case, that may be another good option to try to resolve these problems.

Site is now on AWS

I’m in the process of migrating the hosting of this site across to AWS.

So far everything is going smoothly, except for one of the domain aliases that redirects here.

Merging a git repository from upstream when rebase won’t work

I use a lot of open source software in my research and work.

In recent months I’ve been modifying the source code of some open source repositories to better suit my needs and I’ve contributed a few small changes back to the DeepLearning4J and the Snacktory projects.

This morning I started work on a further patch for the DeepLearning4J repository and needed to bring my local repository up to date before committing the change. However, at some point over the past few months the DeepLearning4J repository has been rebased and my fork of it will no longer merge.
The usual approach for fixing this is to use the command:

git rebase upstream/master

However, for me this produces an error:

git encountered an error while preparing the patches to replay
these revisions:

As a result, git cannot rebase them.

I tried this on two different computers and got similar errors on both.

As I didn’t want to delete my entire repository and create a whole new fork of the upstream master, this is the approach I took to fix the problem:

Back up the current master into a new branch:

git checkout -b oldMasterBackupBranch
git push origin oldMasterBackupBranch

Switch back to the master branch and replace it with the upstream master:

git checkout master
git remote add upstream url/to/upstream/repo
git fetch upstream
git reset --hard upstream/master

Push the updated master to my GitHub fork:

git push origin master --force

This Stack Overflow question helped a lot in working out this problem: “Clean up a fork and restart it from the upstream”.
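Because the last step overwrites history on the fork, it can be worth rehearsing the whole sequence somewhere harmless first. The sketch below does that with local stand-ins: upstream is the original project, origin.git plays the part of the GitHub fork, and local is my working clone. All of these names, the file and the commit messages are invented, and I have swapped in --force-with-lease, which refuses to overwrite the remote branch if it has moved since the last fetch, as a more cautious alternative to plain --force.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# "upstream" is the original project; "origin.git" plays the GitHub fork.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream config user.name "Demo User"
git -C upstream config user.email "demo@example.com"
echo "v1" > upstream/file.txt
git -C upstream add file.txt
git -C upstream commit -q -m "Upstream work"
git clone -q --bare upstream origin.git

# My local clone of the fork, with one commit of my own pushed to it.
git clone -q origin.git local
cd local
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "mine" >> file.txt
git commit -q -am "My change"
git push -q origin master

# Upstream rewrites its history, so my master no longer descends from it.
echo "v2" > ../upstream/file.txt
git -C ../upstream commit -q -a --amend -m "Upstream work (rewritten)"

# 1. Back up the current master.
git checkout -q -b oldMasterBackupBranch
git push -q origin oldMasterBackupBranch

# 2. Replace master with upstream's master.
git checkout -q master
git remote add upstream ../upstream
git fetch -q upstream
git reset -q --hard upstream/master

# 3. Overwrite the fork's master (cautious alternative to plain --force).
git push -q --force-with-lease origin master

local_master=$(git rev-parse master)
upstream_master=$(git rev-parse upstream/master)
fork_master=$(git ls-remote origin refs/heads/master | cut -f1)
backup_subject=$(git log -1 --format=%s oldMasterBackupBranch)
echo "master is now $local_master; backup still holds: $backup_subject"
```

After the run, master in the local clone and on the stand-in fork both match upstream, while the old work is still reachable from oldMasterBackupBranch.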

Reverting VirtualBox Guest Additions 5.0.22 on Ubuntu 16.04

Earlier today I upgraded my installation of VirtualBox on Windows 10 from 5.0.20 to 5.0.22. The upgrade also came with an update to the guest additions, which I installed into my Ubuntu 16.04 guest machine.

After a reboot Ubuntu would no longer boot and instead would flicker as it tried to change the screen resolution inside the virtual machine. This is a known problem in VirtualBox (see: https://www.virtualbox.org/ticket/15526).

I’ve managed to revert VirtualBox and the guest additions back to 5.0.20 using the following steps:

  1. On the host OS (in this case Windows 10) reinstall VirtualBox 5.0.20 over 5.0.22. This was straightforward, as I found the old installer in my downloads directory.
  2. Open VirtualBox, start the Ubuntu guest machine and immediately hold down the left shift key. This must be done before the purple splash screen appears.
  3. At the grub menu that loads select “Advanced Options for Ubuntu” and then select the second menu entry. This entry should contain the word “upstart”.
  4. Once Ubuntu has booted to a console, login.
  5. Switch to the currently installed VirtualBox guest additions install directory
    cd /opt/VBoxGuestAdditions-5.0.22/

    and try to run

    sudo ./uninstall

    I didn’t have much success with this.

  6. Insert the guest additions CD from VirtualBox’s top menu bar.
  7. Mount the CD-ROM:
    sudo mkdir -p /mnt/cdrom
    sudo mount /dev/cdrom /mnt/cdrom
  8. Switch to the directory where the CD-ROM is mounted
    cd /mnt/cdrom
  9. Reinstall the old version of the guest additions
    sudo ./VBoxLinuxAdditions.run
  10. Reboot Ubuntu and everything “should” be fine.

Git logs and commits across multiple branches

Like any good computer scientist I use git for many research and personal projects. My primary use of git is for code backups rather than collaborating with others. However, in some of my recent work I’ve been sharing repositories with colleagues and students which has caused me to improve my git skills.

The following is some of the functionality I’ve only recently discovered that has been extremely helpful:

git cherry-pick commit-id-number

This command proved very useful when I recently forked a GitHub repo and made some changes to the source code for the specific project I’m working on. I soon discovered a bug in the original repository that a number of users had reported. I was able to fix the bug in my fork, but my fork also had changes that I didn’t want to contribute back to the original repository. The cherry-pick command let me bring across only the specific commit related to the bug fix.

git checkout --theirs conflicted_file.php

Merge conflicts suck. Despite pulling as often as possible they still occur, and they can fill your code with ugly messes to clean up. I recently wanted to throw away my changes to a conflicted file and simply use the version that had been committed on the other side of the merge. Running git checkout --theirs on the file did exactly that. Conversely, you can use --ours to resolve the conflict in favour of the local changes. Either way, the file still needs a git add afterwards to mark the conflict as resolved. (During a rebase the two names swap meaning: --ours refers to the branch being rebased onto.)

git shortlog

During the past few weeks the students in the course I’ve been teaching this semester have used git to collaborate on group projects. The git shortlog command produces a list of commits grouped by each author allowing you to quickly see the relative rate at which people are contributing commits to a repository.

git branch -a

When you clone a remote repository it pulls in all branches from the remote repository. However, if you just type git branch you will only see your local branches. The -a flag also lists the remote-tracking branches, so you can see everything.

git log --all

The same issue applies when you want to see the log across all branches: the standard git log command only shows the history reachable from the current branch. The --all flag shows the log across all branches. Combining this with the cherry-pick command is very useful when you want to bring across just one set of changes rather than merging a whole branch.

git log --all --stat --author="Tom"

Bringing this all together I’ve begun to regularly use the above command to see all commits by a single user across all branches. This has been a good way to measure students’ contributions to a group project (note: the author option is case sensitive).
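To see the effect of these flags on something small, the sketch below builds a scratch repository with two branches and two authors. The author names, the shared e-mail address, the commit_as helper and the commit messages are all hypothetical. It counts commits with git rev-list --count, which accepts the same --all and --author filters as git log, and uses -i to make the author match case insensitive.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: commit_as <author name> <message>
commit_as() {
  echo "$2" >> notes.txt
  git add notes.txt
  git -c user.name="$1" -c user.email="student@example.com" commit -q -m "$2"
}

commit_as Tom "Add notes"
git checkout -q -b feature
commit_as Tom "Start feature"
commit_as Alice "Review feature"
git checkout -q master

# rev-list --count takes the same filters as git log and counts the matches.
on_master=$(git rev-list --count HEAD)
on_all=$(git rev-list --count --all)
by_tom=$(git rev-list --count --all --author="Tom")
by_tom_wrong_case=$(git rev-list --count --all --author="tom")
by_tom_any_case=$(git rev-list --count --all -i --author="tom")

# One line per author with a commit count, across every branch.
summary=$(git shortlog -s -n --all)
echo "$summary"

# The log itself, restricted to one author on all branches.
git log --all --author="Tom" --format='%an: %s'
```

Passing --all (or some other revision) to git shortlog matters in scripts: with no revision and no terminal on standard input, it reads a log from standard input instead.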

Code Structure, Random Number Generation and Conditional Probability

I had an interesting discussion with a group of students this afternoon which highlighted how important it is to understand conditional probability and how code structure can produce unexpected results.

The students’ code contained three lists and a random number generator. The random number generator was meant to randomly choose which list to take something from with equal probability (that is, a 1/3 chance of selecting from each of the lists). However, somehow one of their lists was being selected far more than the others.

Their code looked similar to the following:

if (random.nextInt(3) == 0) {
  // select from list 1
} else if (random.nextInt(3) == 1) {
  // select from list 2
} else {
  // select from list 3
}

Can you spot the bug? Why would this not generate equal probabilities of selecting from each list?

I wrote some code of my own to run this structure almost 100 million times:

Random r = new Random(); // java.util.Random
int count0 = 0;
int count1 = 0;
int count2 = 0;

// loop almost 100 million times
for (int i = 0; i < 99999999; i++) {
  if (r.nextInt(3) == 0) {
    count0++;
  } else if (r.nextInt(3) == 1) {
    count1++;
  } else {
    count2++;
  }
}

This generated the results:
Count0 = 33333271
Count1 = 22221788
Count2 = 44444940

This demonstrates that the code doesn’t generate equal probabilities.

The problem is the second random number generation:

if (r.nextInt(3) == 0) {
  count0++; // runs with probability 1/3 (0.333...)
} else if (r.nextInt(3) == 1) { // reached 2/3 of the time; the fresh draw
                                // then equals 1 in 1/3 of those cases
  count1++; // 2/3 * 1/3 = 2/9 (0.222...)
} else {
  count2++; // 2/3 * 2/3 = 4/9 (0.444...)
}

Restructuring the code to generate the random number only once and store it in a temporary variable makes a huge difference:

int rInt = r.nextInt(3); // only generate a random number once.
if (rInt == 0) {
  count0++; // this will run 1/3 of the time.
} else if (rInt == 1) {
  count1++; // this will run 1/3 of the time.
} else {
  count2++; // this will run 1/3 of the time.
}

And the results of this are:
Count0 = 33333907
Count1 = 33332586
Count2 = 33333506

Which looks much better.

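(Note: alternatively you could make the second random number generation be out of 2 rather than 3, so that the second branch is taken with probability 2/3 * 1/2 = 1/3 and the final branch gets the remaining 1/3. However, this would be slower, as the second call to the random number generator takes a few extra CPU cycles.)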

Completely destroying all data on a Hard Drive

A few days ago while clearing through some old boxes of computer equipment I discovered an old hard drive. This drive had been removed from an old computer that had been disposed of. At the time of disposal I copied all information from the old computer to its replacement and kept the old hard drive as a backup in case something went wrong.

Now, more than five years later, I no longer need the backup and want to dispose of the physical hard drive. But first, I want to ensure that the drive is completely clear of the old data. When I connect the drive to my current computer it still mounts, and I can see all the old files on it. It’s good that the backup has lasted this long, but completely wiping the drive of all this old personal data is a little more complex than selecting all the contents and pressing the delete button, or doing a reformat under Windows.

Completely destroying all data on the drive is important. If the drive is not completely wiped (that is every single physical sector on the drive is written over) it is possible that someone using a few pieces of software could bring the old data back from the dead.

To completely nuke the drive I could pay for commercial software or take a hammer to the physical drive. But there is a free way to nuke the drive by using Ubuntu Linux and it’s quite simple to do:

  1. Boot Ubuntu (running from a live CD/USB should work too)
  2. Find the full device path of the drive you want to destroy by running at a terminal:
    sudo gparted

    If you don’t have gparted installed, then install it using

    sudo apt-get install gparted

    Then select each drive in turn from the drop down list on the right hand side of the GUI window and check its partitions to confirm the path of the device that you want to nuke. For instance:

    /dev/sdN
  3. Shred the drive using the following command, replacing the path with the path you found in step 2.
    sudo shred -vzn 1 /dev/sdN

    This command will take a while to run. It goes over the entire drive once writing random data into every sector, and then a second time writing zeros into every sector.

Once the command has finished your drive will be completely wiped. It can now be reformatted and reused without any worry about someone being able to resurrect the previously deleted data.
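Since shred is unforgiving if you point it at the wrong device, it can be reassuring to try the same options on a scratch file first. The sketch below assumes the GNU coreutils version of shred; the temporary file and its 1 MiB size are arbitrary stand-ins for the drive.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch file standing in for the drive, filled with random bytes.
scratch=$(mktemp)
head -c 1048576 /dev/urandom > "$scratch"

# Same options as for the real drive, minus -v to keep the output short:
# one pass of random data (-n 1), then a final pass of zeros (-z).
shred -z -n 1 "$scratch"

# Count the bytes that are not zero; after the final pass none should remain.
nonzero_bytes=$(tr -d '\0' < "$scratch" | wc -c)
echo "non-zero bytes left: $nonzero_bytes"
rm -f "$scratch"
```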

Previous and Next Month Links in Archives

It has just taken me around an hour to add some previous and next month navigation links to the blog’s monthly archives. It seems that WordPress doesn’t have any real native support for this type of navigation, which is odd.

Anyway, the code below is my very hacky version of how to do it. I placed this in my theme’s archive.php file:


<?php //twentyeleven_content_nav( 'nav-above' ); ?>
<nav id="nav-above" style="display:inline">
<?php
$archive_year = get_the_time('Y');
$archive_month = get_the_time('m');
// Step back one month, wrapping January to December of the previous year.
$archive_month = $archive_month - 1;
if ($archive_month == 0) {
  $archive_month = 12;
  $archive_year--;
}
// The next month is two months after the previous month.
$archive_month_next = $archive_month + 2;
$archive_year_next = $archive_year;
if ($archive_month_next > 12) {
  $archive_month_next %= 12;
  $archive_year_next++;
}
echo '<a href="' . get_month_link( $archive_year, $archive_month) . '">Previous Month</a>';
// Only show the next link when that month is not in the future (compared as YYYYMM).
if ($archive_year_next.sprintf('%02d', $archive_month_next) <= date('Y', strtotime("now")).date('m', strtotime("now"))) {
  echo '<a href="' . get_month_link( $archive_year_next, $archive_month_next) . '" style="float:right">Next Month</a>';
}
?>
</nav>

Site Meter Rewriting Links on WordPress Sites

This afternoon I discovered that a number of relative links on this site were being redirected through x.vindicosuite.com.

At first I thought that this site may have been compromised. It hasn’t. After a quick Google search it turns out that the Site Meter tracking code has been changed and now silently rewrites links.

Fortunately for me I haven’t noticed any of these links inserting or redirecting people to ad sites, but there are stories online that suggest these actions may have happened.

The way in which this script has changed is really inappropriate and annoying. The offending script has now been removed from this site.