My favorite programming interview happened by accident. I rode a chairlift next to a computer vision researcher who worked in the private sector. I recognized the company. Its campus was close to my university, almost 300 miles away. I told him this and he said his team was hiring interns, which was great because I needed something to pay rent that summer. I had my first interview at the ski lodge cafeteria and my second interview onsite a week later.

The second interviewer asked me a series of questions about medians that built on each other with progressive difficulty. I liked how the interview used a consistent theme to test your ability to do big-O analysis and solve programming puzzles.

Question 1

Suppose you have an ordered list of N numbers. How quickly can you find their median?

The median of N numbers is defined to be the value that separates the lower half from the upper half. If those numbers are ordered it’s easy to compute. We’ll return the one located in the middle of that list when N is odd. We’ll return the average of both middle values when N is even.

Here’s a C++ example. Because the median could be an average of two integers, the return type of this function is double instead of int.

double median(vector<int>& nums) 
{
    int N = nums.size();

    return (N % 2 == 0) ? (nums[N / 2 - 1] + nums[N / 2]) / 2.0 
                        : nums[N / 2];
}

In short, we can find a median in Θ(1)\Theta(1) time by querying the list at N/2\lfloor N/2 \rfloor.

Question 2

Now suppose you can’t assume the list is ordered. How quickly can you find their median?

As an upper bound, we could first sort the list and then query it like we did in Question 1. This has a time complexity of O(NlogN+1)=O(NlogN)O(N\text{log}N + 1)=O(N\text{log}N). Can we do better than that?

Yes! The Quickselect algorithm applies the logic of Quicksort to the task of finding the kth smallest element in a list. Like Quicksort, it has O(N2)O(N^{2}) performance in the worst case (when the list is already sorted and you choose pivots poorly) and great performance on average.

The partition() function moves all values less than the pivot index value before that value in the list. It returns the index where that pivot value now lies.

int partition(vector<int>& nums, int left, int right, int pivotIndex)
{
    int pivotValue = nums[pivotIndex];

    swap(nums[pivotIndex], nums[right]);

    int swapIndex = left;
    for (int i=left; i<right; i++)
    {
        if (nums[i] < pivotValue) 
        {
            swap(nums[i], nums[swapIndex]);
            swapIndex++;
        }
    }
    swap(nums[swapIndex], nums[right]);
    return swapIndex;
}
int Quickselect(vector<int>& nums, int left, int right, int k)
{
    while (left <= right)
    {
        int pivotIndex = left + (right - left) / 2;
        pivotIndex = partition(nums, left, right, pivotIndex);

         /* Cases:
         * 1. kth smallest element
         * 2. too far to the right -> iterate left
         * 3. too far to the left  -> iterate right */

        if (pivotIndex == k)
            return nums[k];
        else if (pivotIndex > k)
            right = pivotIndex - 1;
        else  
            left = pivotIndex + 1;
    }
    return -1;
}
double median(vector<int>& nums)
{
    int N = nums.size();
    int k = N / 2; 

    if (N % 2 == 0)
    {
        return Quickselect(nums, 0, N - 1, k);
    } 
    else 
    {
        int first  = Quickselect(nums, 0, N - 1, k - 1);
        int second = Quickselect(nums, 0, N - 1, k);
        return (first + second) / 2.0;
    }
}

This approach runs in O(N)O(N) time on average.

Question 3

Now suppose you have two ordered lists. Assume they’re of equal size. How quickly can you find the median of all numbers in both lists?

For starters we can use the merge function from Mergesort to combine the two sorted lists into one larger list, then determine the median the way we did to answer Question 1. This has O(2N+1)=O(N)O(2N + 1)=O(N) time complexity and O(2N)=O(N)O(2N)=O(N) space complexity.

vector<int> merge(vector<int>& nums1, vector<int>& nums2)
{
    vector<int> result;
    int i = 0, j = 0;
    while (i < nums1.size() && j < nums2.size())
    {
        if (nums1[i] <= nums2[j]) 
        {
            result.push_back(nums1[i]);
            i++;
        } 
        else 
        {
            result.push_back(nums2[j]);
            j++;
        }
    }
    while (i < nums1.size())
        result.push_back(nums1[i++]);

    while (j < nums2.size())
        result.push_back(nums2[j++]);
}
double median(vector<int>& nums1, vector<int>& nums2)
{
    vector<int> allNums = merge(nums1, nums2);

    int s = allNums.size();

    return (s % 2 == 0) ? (nums[s / 2 - 1] + nums[s / 2]) / 2.0 
                        : nums[s / 2];
}

To improve on this, notice that we’re doing more work here than we need to do. The code above iterates through both arrays fully. merge() will continue adding elements to the final vector long after it passes the median. This time let’s define two new variables, leftMiddle and rightMiddle. We’ll slide them towards the position of the two middle elements in the combined list and stop once we reach it.

double medianOfSortedLists(vector<int>& nums1, vector<int>& nums2)
{
    int halfWay = nums1.size();
    int index1 = 0, index2 = 0;
    /* we know the combined array
    *  will have an even number of elements */ 
    int leftMiddle = INT_MIN, rightMiddle = INT_MIN;

    for (int i = 0; i <= halfWay; i++)
    {
        if (index1 == halfWay)
        {
            leftMiddle = rightMiddle;
            rightMiddle = nums2[index2];
            break;
        }
        if (index2 == halfWay)
        {
            leftMiddle = rightMiddle;
            rightMiddle = nums1[index1];
            break;
        }

        if (nums1[index1] <= nums2[index2]) 
        {
            leftMiddle = rightMiddle;
            rightMiddle = nums1[index1];
            index1++;
        }
        else 
        {
            leftMiddle = rightMiddle;
            rightMiddle = nums2[index2];
            index2++;
        }
    }
    return (leftMiddle + rightMiddle) / 2.0;
}

This solution has equal time complexity to the previous one but with better constants. In practice it will be faster. It’s space complexity is O(1)O(1) because a third vector is no longer needed.

Question 4

How does your previous answer change if you can’t assume the lists have equal size?

This part stumped me. It took several hints from the interviewer for me to see there’s a log-time solution. If you’re familiar with LeetCode you might recognize this as problem #4.

Even though list sizes may differ now – let’s call these sizes N and M – our goal is the same: We want to find a pair of indices that would partition the combined list into halves. These halves will be size N+M2\lfloor \frac{N+M}{2} \rfloor no matter where the median lies, but the portions of each list that belong to each half will vary. If the index in list one is higher, the index in list two must be proportionally lower.

We can use this detail to our advantage. Each index divides its list into two list segments, so let’s name the indices cutIndex1 and cutIndex2. left1 and right1 will hold the values that lie where the left and right segments of the first list meet. left2 and right2 will hold the values where the second list’s segments meet.

We know both lists are ordered. If left1 <= right2 then all values in the right segment of list two are greater than all values in the left segment of list one. If left2 <= right1 then all values in the right segment of list one are greater than all values in the left segment of list two. If both statements are true then we’ve correctly divided the lists.

We’ll use Binary Search to repeatedly shrink the range within each list where the correct index could be. The search will end once both conditions above are met. If N+M is odd, the median value is max(left1, left2). If the lists’ combined size is even, the median value is the average of max(left1, left2) and min(right1, right2).

int medianOfSortedLists(vector<int> nums1, vector<int> nums2) 
{
    if (nums2.size() < nums1.size())
        return medianOfSortedArrays(nums2, nums1);  

    int s1 = nums1.size();
    int s2 = nums2.size();
    // we know s1 <= s2
    int low = 0;
    int high = s1;

    /* goal: find a pair of indices (one in each array) that
    * split their arrays such that
    * all leftward values <= all rightward values */

    /* Binary Search the shorter array and 
    * pick top-side middles; we'll take the 
    * values there and before it */

    while (low <= high) 
    {
        int cutIndex1 = floor( (low + high) / 2.0);
        int cutIndex2 = floor( (s1 + s2 + 1) / 2.0) - cutIndex1;

        int left1 = (cutIndex1 == 0) ? INT_MIN : nums1[cutIndex1 - 1];
        int left2 = (cutIndex2 == 0) ? INT_MIN : nums2[cutIndex2 - 1];

        int right1 = (cutIndex1 == s1) ? INT_MAX : nums1[cutIndex1];
        int right2 = (cutIndex2 == s1) ? INT_MAX : nums2[cutIndex2];

        /* Cases:
         * 1. l's <= r's
         * 2. too far right
         * 3. too far left */

         if (left1 <= right2 && left2 <= right1)
         {
            if ((s1 + s2) % 2 == 0)
                return (max(left1, left2) + min(right1, right2)) / 2.0;
            else
                return max(left1, left2);
         } 
         else if (left1 > right2)
            high = cutIndex1 - 1;
         else
            low = cutIndex1 + 1;
    }
}

Because this souped-up version of Binary Search is run on the shorter list, the full algorithm’s time complexity is O(log(min{N,M}))O(\text{log}(\text{min}\{N, M\})). It has O(1)O(1) space complexity. I recommend this video for a longer explanation with examples.