Algorithms Weekly by Petr Mitrichev

Thursday, March 3, 2016

A strategic week

The last week started with the middle-of-the-night SRM 682 on Tuesday (problems, results, top 5 on the left). Continuing the recent trend, the hardest problem did not crack, and the winner was decided during the challenge phase, where subscriber has squeezed the victory - congratulations!

The week ended with two contests on Sunday: first, the Open Cup Grand Prix of Bashkortostan catered to the team contest lovers (results, top 5 on the left). Past Glory team was unstoppable once again, making just one incorrect attempt over 12 solved problems!

Problem F's statement is so short that I'll quote it verbatim: "For each vertice of given undirected weighted graph calculate the length of shortest simple cycle, which contains this vertice". The graph had 300 vertices. Is this a textbook problem?

And finally, Codeforces hosted the Final Round of 8VC Venture Cup 2016 for those who prefer to compete on their own (problems, results, top 5 on the left). The round had a $2500 prize for the first place, which tourist has won convincingly thanks to some super fast problem solving - great job!

My story in this round is all about strategy and tactics. Having solved problem A, I've read the statement for problem B first, which looked quite tedious, so I've decided to check other problems. Having read problem C, I've realized that it's a well-known problem which I don't remember the solution to, so solving it at this point would put me at a disadvantage. Finally, problem D also looked tedious but at least brought 2x more points than problem B.

I've glanced at the scoreboard at this point, and it turned out that tourist has already submitted B. It was pretty obvious at this point that following his strategy would not give me a win, since he was quite a bit ahead in time, so in order to win I needed a different strategy, so I've decided to implement D, then E, and then attempted F. My hope when starting F with almost 1 hour to go was to solve it, and then come back to either B or C, while tourist would not be able to finish F in time because he'd spend a lot of time on A-E.

My prediction about tourist's result turned out to be correct, but I couldn't execute on my strategy - problem F did not crack in 1 hour :( After the contest, I required 20-30 more minutes to get my solution to work.

Here's the tricky problem: you are given a tree with n nodes, one of the nodes being empty and other nodes containing all numbers between 1 and n-1 exactly once. One is allowed to add at most one arbitrary edge to the tree, and then move the numbers along the resulting graph, where at each move we can move a number to an adjacent empty node, and the goal is to reach the given ending state from the given starting state in the least number of moves. Implementation details aside, can you find the algorithmic solution in 1 hour?

I hope I won't need to rely on unusual strategies in today's Hacker Cup final round, which starts in less than 2 hours :) There should be at least a scoreboard, and maybe more live coverage at the competition page.

A solution week

The Feb 15 - Feb 21 week, somewhat surprisingly, didn't have any algorithmic contests among the ones that I usually cover. However, let's use this opportunity to dive into the solutions of the problems I mentioned before.

The first one was: you are given a set with 200000 numbers, and need to find the subset of that set where the difference between the mean and the median is the highest. The most tricky part of this problem is the median: unlike the mean, it's hard to update incrementally, so the natural idea is to start by iterating over which number will be the median (let's assume we'll pick an odd-sized subset for now). Having fixed the median, we need to add k numbers smaller than it and k numbers larger than it, maximizing the mean. Now it's easy to see that those numbers should be as large as possible: k larger numbers, plus k largest numbers right before out median in the sorted order. Now, if we iterate over all possible values of k, we obtain a O(n²) solution.

In order to speed up this solution, let's look more closely what happens when we increase k. We add two numbers to our set - the kth largest one, and the kth largest one among the ones before the median. Let's call those two numbers a and b. The mean before adding those numbers is c/d (where d is 2k+1 or something like that), and now it becomes (c+a+b)/(d+2). Is it smaller or larger than c/d? Well, the mediant inequality tells that this number lies between c/d and (a+b)/2. So if (a+b)/2 is larger than the current mean, the mean will increase, otherwise it will not. The final step is to notice that (a+b)/2 is always going down since we pick a smaller number for both a and b at each subsequent step. Because of this, we can see that the mean is a bitonic function of k: first increases, then decreases.

This allows us to use binary search on k, comparing c/d with (a+b)/2 inside to see if we should go to the left or to the right (of course, we could just compare two adjacent means, but that might overflow the 64-bit integer type), and obtaining a O(nlogn) solution for the problem.

The second one was: given a bipartite graph with 45 vertices, find how many subsets of its vertices are vertex-dominating. A subset of vertices is vertex-dominating if any vertex of the graph either lies on this subset, or has a neighbor in this subset, or both.

Given that we have 45 vertices, the logical first step is to pick the smaller of its two parts, which will contain at most 22 vertices. Now we need to find something that works in O(2ⁿ) or similar complexity.

We can now afford to iterate over which subset of this part will be included in the dominating set. Having fixed that, we still have to cover:

The vertices in the first part that are not included in the subset.
The vertices in the second part that are not adjacent to any vertex from the subset.

Part 2 is easy: we must just include all those vertices into our dominating set. Those vertices, in turn, will cover some vertices from part 1, but there might be some more left over. Now our state is: we have already included some subset A of the first part and some subset B of the second part in the dominating set, the entire second part is covered, and in the first part there's some uncovered subset C which we're still to cover using the some subset of the set D of the vertices of the second part, where D is the complement of B.

This could again be solved via dynamic programming in O(|D|*2^|C|) time, yielding overall O(n*3^n/2) complexity for the entire problem. However, for n=45 that is too slow.

Here comes the most magical moment of this solution: actually, the set D doesn't matter that much, so we can do this dynamic programming just once instead of separately for every A. One might say that of course D does matter, as we can otherwise pick a vertex from B when covering C, and overcount. However, from the construction above we can notice that vertices from B and C don't have any edges between them, and thus don't affect the cover - we can take an arbitrary subset of B in addition to some subset of D. Because of this, we'll actually count each solution exactly 2^|B| times, and we can just divide the result of the inner dynamic programming by 2^|B| to get the required count! Now we have a O(n*2^n/2) solution.

Thanks for reading, and please cheer for me in the upcoming Hacker Cup onsite final round, which is scheduled to begin at 13:30 London time today!

Sunday, February 21, 2016

A vertex-dominating week

The Feb 8 - Feb 14 week featured the 8VC Venture Cup 2016 Elimination Round by Codeforces (problems, results, top 5 on the left). With 7 problems there was plenty to choose from, and I think that problem E was the most difficult algorithmically, despite not being valued the highest. You are given a set with 200000 numbers, and need to find the subset of that set where the difference between the mean and the median is the highest. At the first glance, it seems that either some greedy could work, or the problem would require at least O(n²) to solve. However, it turns out neither is true!

Open Cup 2015-2016 Grand Prix of China (results, top 5 on the left). Problem B had a very simple statement and yet an interesting solution: given a bipartite graph, find how many subsets of its vertices are vertex-dominating. A subset of vertices is vertex-dominating if any vertex of the graph either lies on this subset, or has a neighbor in this subset, or both. It is of course the constraints that make this problem interesting. The Open Cup problem told that the graph has at most 30 vertices, but I encourage you to solve a bit tougher problem: let's say the number of vertices is at most 45.

In my previous post, I've mentioned two interesting problems. The first one was about processing a sequence of 10 million numbers using just 1 MB of memory. To be more precise, you're given a way to generate the sequence: start with a given number X₀, then repeatedly compute X_i = ((X_i-1 xor A) + B) mod 2⁵⁰, where A and B are also given. You're looking to find the radius of each element in the sequence, defined as the largest number k such that the previous k elements and the next k elements are strictly less than the given element, and return the sum of the radii for all elements.

The were two key observations required to solve this problem. First, one needed to pay close attention to the way the sequence is generated. In most cases such random generators are given in problem statements simply to avoid handling large input data. However, in this case the generator had one additional important property: it was reversible. Not only can we generate the number that follows the given number, we can also find which number precedes the given number!

The second observation is the fact that the answer can't be very big. More precisely, for a sequence of length N it's O(N*logN): it's not hard to see that the elements that cover the given element with their radii lie exponentially further away. Together with the first observation, we can now write an extremely simple solution: just iterate over the sequence, and find the radius for each element in the most straightforward way, using the ability to generate the next and previous elements! This solution runs in O(N*logN) and needs only O(1) memory.

The other interesting problem was about speeding up exponential dynamic programming. You're given a 10x10 chessboard with some cells removed. You need to find any connected set of cells on it that has exactly W white cells and exactly B black cells, or report that there isn't any.

One could readily imagine a dynamic programming solution that goes row by row, remembering how many white cells we've taken so far, how many black cells we've taken so far, which cells are taken amount the last jagged row, and which connected components they form. However, such DP has too many states (roughly 10000 states of the jagged row with connected components, times 100 steps, times 50 white cells, times 50 black cells equals 2.5 billion, and although not all those states are reachable, a big fraction of them are) and will never run in time.

In order to speed it up, instead of remembering the number of white and black cells taken, we will find the maximum number of black cells for each number of white cells. This reduces the number of states by the factor of 50, and the solution fits into the time limit now. However, this doesn't give us the answer we need: in the end we need a concrete number of black cells, and we'll only know the upper bound.

Here comes the main idea, which in my opinion is amazing. If we have a connected set with a given number of white cells, but with more black cells than we need, we can start removing black cells while keeping the set connected. We can't do this forever, though - at some point removing any black cell will cause the set to become disconnected. However, we can notice that in such state the number of white cells is always strictly greater than the number of black cells! In other words, this simple solution will work just fine if the number B of black cells we need is >= the number W of white cells we need. And what to do if it's less? Well, we can swap the colors then and apply the same solution!

Thanks for reading, and check back soon for the last week's summary!

Monday, February 8, 2016

A Saratov week

Codeforces has hosted yet another company-sponsored round last week, AIM Tech Round (problems, results, top 5 on the left). scott_wu has demonstrated an amazing bug-finding performance by challenging six other solutions, but in the end just his problem-solving speed alone would be enough for the victory. Congratulations!

TopCoder SRM 681 on Saturday (problems, results, top 5 on the left) has provided the competitors with an unorthodox medium problem. I had no clue how to solve it, so hats off to kcm1700, the only competitor to solve all three problems and claim a well-deserved victory!

Here's what that problem was about. You are given a sequence of 10 million numbers. To be more precise, you're given a way to generate it: start with a given number X₀, then repeatedly compute X_i= ((X_i-1 xor A) + B) mod 2⁵⁰, where A and B are also given. You're looking to find the radius of each element in the sequence, defined as the largest number k such that the previous k elements and the next k elements are strictly less than the given element, and return the sum of the radii for all elements. The catch is, you have to do it while using at most 1 MB of memory - so you can only store about 1% of the sequence in memory at the same time!

Open Cup 2015-16 Grand Prix of Saratov (problems, results, top 5 on the left) has capped the week with another very nice problemset. Problem C, which we couldn't solve during the contest, turned out to have a very beautiful solution - so kudos to Past Glory team on solving it on the way to the first place!

The said problem looked somewhat standard on the surface. You're given a 10x10 chessboard with some cells removed. You need to find any connected set of cells on it that has exactly W white cells and exactly B black cells, or report that there isn't any. One could readily imagine a dynamic programming solution that goes row by row, remembering how many white cells we've taken so far, how many black cells we've taken so far, which cells are taken amount the last jagged row, and which connected components they form. However, such DP has too many states (roughly 10000 states of the jagged row with connected components, times 100 steps, times 50 white cells, times 50 black cells equals 2.5 billion, and although not all those states are reachable, a big fraction of them are) and will never run in time. So how does one solve this problem?

As requested in comments to my previous post, let's talk a bit more about one of the boomerang problems. You were given a 100x100 grid with each cell containing a boomerang pointing to two adjacent sides of this cell. When two boomerangs point towards one another, we call it a conflict. We can rotate a boomerang by 90 degrees in either direction at most K times (K <= 1000, we can pick K different boomerangs for rotation, or maybe rotate some boomerang several times). What is the smallest number of conflicts we can have after at most K rotations by 90 degrees?

The key idea in solving this problem is that resolving horizontal conflicts and resolving vertical conflicts can be done independently. It's not clear from the first sight, since a boomerang is "connected", and thus when rotating it we're affecting both its horizontal and its vertical conflicts. However, we can notice that rotating a boomerang by 90 degrees causes one of the two things: its horizontal hand starts pointing in the other direction, or its vertical hand starts pointing in the other direction. Moreover, no matter which position the boomerang is in, we can always achieve both those outcomes by a 90 degree rotation! In other words, one rotation can change horizontal or vertical situation, but not both, and that's why horizontal and vertical conflicts are resolved independently.

Having noticed this, the problem breaks into even smaller pieces: it's somewhat obvious now that each row's and each column's conflicts can also be resolved independently. So instead of one monolithic 100x100 2D problem, now we have 200 separate 1x100 1D problems. Each 1D problem is solved with a somewhat classical O(N²) dynamic programming approach which finds what's the largest number of conflicts we can resolve using A moves in the first B cells, leaving the last cell pointing in a given direction. Each subproblem's output is the largest number of conflicts that can be resolved using 1, 2, ..., 100 moves. Now we need to combine those outputs into the overall solution, which is once again a standard dynamic programming question: what's the largest number of conflicts we can resolve using A moves in the first B subproblems. The overall running time of this solution is O(N³).

Thanks for reading, and check back next week!

Saturday, February 6, 2016

A week with boomerangs

The end of Jan 25 - Jan 31 week was tightly packed with contests.

First off, TopCoder SRM 680 took place on Thursday evening (problems, results, top 5 on the left). tourist was once again the only contestant to solve the hard problem, and thus enjoyed a 400+-point victory margin - great job! It was especially impressive given that at the end of the coding phase there were 23 submissions for this problem, and one could not suspect all but one would fail.

Here's the tricky problem: you have a string with 2500 characters, each character being one of the 6 letters from 'a' to 'f', and a number k up to 6 of the same parity as n. You need to select k of the string's characters, and split the remaining (n-k) characters into (n-k)/2 pairs, in such a way that each pair has distinct characters. Pairing character i and character j together costs c[i] + c[j] + 100 * abs(i - j), where c is a given array. What is the minimum total cost to build all those pairs? The statement is a bit convoluted, but once you wrap your head around it, the problem itself is quite interesting.

Wunder Fund Round 2016 happened on Codeforces about one day later (problems, results, top 5 on the left). Egor has managed to pull out an amazing last-minute victory, submitting problem F on the last minute - and getting it pass the system test.

That problem is similar to the ones from mathematical competitions. You are given two multisets of n integers each, each integer between 1 and n inclusive. Find a non-empty subset of the first multiset and a non-empty subset of the second multiset, such that they have the same sum.

Come the next evening - and there's one more competition, this time the final elimination round of the Facebook Hacker Cup, with 25 spots in the onsite competition in London up for grabs (problems with Facebook login, results with Facebook login, top 5 on the left). The difficulty of the problemset was chosen very appropriately, and the contest was mostly about solving difficult problems instead of speed coding, which is awesome. Bruce has solved 4 problems with the smallest penalty time, and will be one of the favorites to win the final round!

I think the second problem was the most beautiful one in this round (picture on the left from the official website). You were given a 100x100 grid with each cell containing a boomerang pointing to two adjacent sides of this cell. When two boomerangs point towards one another, we call it a conflict. We can rotate a boomerang by 90 degrees in either direction at most K times (K <= 1000, we can pick K different boomerangs for rotation, or maybe rotate some boomerang several times). What is the smallest number of conflicts we can have after at most K rotations by 90 degrees?

Finally, the Open Cup returned to action for the first time in 2016 on Sunday with the Grand Prix of Asia (problems, results, top 5 on the left). In line with their awesome performance at the ongoing Petrozavodsk training camp, the Warsaw Eagles team won this contest as well - congratulations!

This round had so many interesting and beautiful problems, that I will highlight two. First, problem D asked you to count the number of ways to interleave two given permutations of size N (two ways are considered different if the resulting sequences of 2N numbers are different). N is up to 2000, and you need to find the answer modulo 10⁹+7. For example, if the given permutations are "1, 2" and "2, 1", then there are four such ways: "1, 2, 2, 1", "1, 2, 1, 2", "2, 1, 1, 2" and "2, 1, 2, 1".

Problem H was concerned with random walks on a 2D grid. You start from cell (0, 0), and make N random independent steps, moving in one of the four cardinal directions at each step. What is the expected number of cells you will visit? N is up to 5000.

That was one hell of a problem-packed post :) I hope you enjoy solving them!

A week without FFT

The Jan 18 - Jan 24 week started with TopCoder SRM 679 on Wednesday (problems, results, top 5 on the left). The hard problem in this round is somewhat weird: the solution is quite straightforward, but somehow a few people, including myself, missed it and tried to squeeze in an incorrect FFT-based solution that obviously times out. It didn't stop tourist, who won the round with an impressive 350+-point margin.

Here's how the problem went: you were given 500 bags, each containing a lot of cards, with each card containing an integer between 1 and 500 (more precisely, you have a 500x500 matrix, describing how many cards contain each integer in each bag). You're also given a set of "good" integers, each between 1 and 1000. For each pair of bags, you need to answer the following question: how many ways are there to pick a card from the first bag and a card from the second bag in such a way that the sum of their numbers is a good number?

The correct solution runs in O(n^3), while the somewhat obvious FFT-based approach is only O(n^3*logn), and thus too slow. Can you see the O(n^3) one?

Facebook Hacker Cup selection continued with Round 2 on Saturday (problems with Facebook login, results with Facebook login, top 5 on the left). This was the first round where the time of submitting one's solutions was taken into account, and thus the first round where the contestants were competing against each other, not just against the problems. Eryx has claimed the first place with a solid margin - congratulations!

The problems were quite technical again, but the hardest problem at least allowed one to simplify the solution using a creative idea. You were given a tree with 1000 nodes, and needed to paint it with 30 colors, with the cost of painting each node with each colors given as a matrix. So far, the solution is very straightforward - just pick the cheapest color for each node. However, there was a small twist: you also paid a given penalty P for each node that has neighbors of the same color. You pay P only once per such node, no matter how many neighbors of the same color does it have. Can you solve this problem in O(N*K⁴) (N=1000, K=30)? How about O(N*K³)?

Let me also explain the solution to a problem from the previous post: you were given a tree with 100000 vertices, and at most 100000 queries. Each query highlighted some subset of vertices, and to answer a query you needed to compute the minimum number of non-highlighted vertices to remove that would leave all highlighted vertices disconnected, or to report there isn’t a way (in case two highlighted vertices are directly connected with an edge). The total number of highlighted vertices in all queries also doesn’t exceed 100000.

First, let's learn to solve the problem for just one query. It's not hard to see that this can be done greedily: we go from leaves to the root, returning a boolean value from the subtree: does this subtree contain a highlighted vertex? When a vertex has more than one subtree containing a highlighted vertex, or when its parent is highlighted and it has at least one highlighted child, we need to remove it.

Given that our solution returns just one boolean value for each subtree, we can handle 64 queries at once by returning a bitmask from it, using bitwise operations to implement the logic described above. This is (arguably?) not an asymptotic optimization, but 100000²/64 for this to pass under the time limit.

Thanks for reading, and check back soon for the summary of the last week of January!

An Ewok week

The Jan 11 – Jan 17 week saw the return of the main algorithmic competition anchors in 2016: TopCoder and Codeforces.

A Star Wars-themed TopCoder SRM 678 happened very early on Wednesday (problems, results, top 5 on the left). Congratulations to freak93 on the victory and on being the only contestant to solve the hard problem correctly! Apparently the Ewoks are really good at algorithms, as only one human could solve their problem :)

Codeforces Round 339 took place at the usual Codeforces time on Thursday evening (problems, results, top 5 on the left). All problems were somewhat technical, but at the same time allowed one to demonstrate algorithmic competition experience. Problem D, for example, tested one’s skills in batch processing of requests on a tree. You were given a tree with 100000 vertices, and at most 100000 queries. Each query highlighted some subset of vertices, and to answer a query you needed to compute the minimum number of non-highlighted vertices to remove that would leave all highlighted vertices disconnected, or to report there isn’t a way (in case two highlighted vertices are directly connected with an edge). The total number of highlighted vertices in all queries also doesn’t exceed 100000. Such problems usually allow O(n^2), O(N*sqrt(n) and O(n*polylog(n)) solutions, with the former supposed not to run in time. However, in this particular case a O(n^2) solution could pass if optimized using bitmasks – can you see how?

Facebook Hacker Cup Round 1 on the weekend gave one a chance to get closer to the first onsite competition of the year, to happen in London in March (problems with Facebook login, results with Facebook login, top 5 on the left). The round also featured somewhat standard technical problems. The hardest problem was perhaps the least technical: you were given 16 players, a 16x16 matrix of who wins whom, and needed to find the lowest and the highest possible elimination round for each player in an Olympic elimination tournament with 4 rounds, over all possible seedings.

I’ve described a New Year problem last week: you are given a positive integer A with at most 1000 digits. Find any two positive integers X and Y such that X+Y=A, and both X and Y are palindromes when written in decimal notation without leading zeroes.

In order to solve this problem, let’s first assume there’s no carry – each digit of A is formed directly as the sum of corresponding digits of X and Y, which is always less than 10. In case X and Y have different lengths, there’s at most one solution with given lengths of X and Y and it’s easy to find. Without loss of generality, assume X has more digits than Y. The first digit of X is then equal to the first digit of A (remember, there’s no carry yet!). Since X is a palindrome, we know its last digit as well. But then we can find the last digit of Y by subtraction. And then we can find the second digit of X, and so on. In case they have equal lengths, there might be a whole lot of solutions, but all of them are equivalent in some sense – in effect, we know the sum of corresponding digits of X and Y, and have several ways to decompose each sum into two summands independently. For example, 4=1+3=2+2, so 44=11+33=22+22=12+32=21+23.

Now how does one handle the carry? First of all, notice that for the last digits, we can determine that carry as we go from right to left. For the first digits, however, we go from left to right, and thus need to know the carry from the next digit to determine a given digit by subtraction. When we don’t know it, we will do backtracking – try both a carry of 0 and a carry of 1 recursively. At the first glance, this would seem exponential, but after taking some time to contemplate, one can see that most branches would be truncated very early because some digit would get negative or more than 9, and in fact this backtracking runs very fast.

Thanks for reading, and check back soon for the next week’s problems!