Wednesday, December 28, 2016

A black radius week

Last week was very calm, as most algorithmic competition websites have called it a year or are busy preparing special New Year-themed competitions.

Nevertheless, AtCoder held its Grand Contest 008 right on Christmas Day (problems, results, top 5 on the left, analysis). In breaking with AGC tradition, the problems were quite technical and tedious this time. Big congratulations to apiad who has managed to get all of them accepted and won!

I have spent all 110 minutes on the attractive-looking problem F, severely underestimating the amount of things that can go wrong in its solution (I got it accepted after about 5 more minutes of debugging after the end of the contest). The problem simply asks: given a tree with some subset of its vertices colored black, consider all subsets of its vertices defined as "all vertices at distance at most d from some black vertex a" (for all combinations of d and a), and return the number of different such subsets. When the same subset is defined through several combinations of d and a, it must still be counted only once.

(Moscow Christmas on the left - compare to last year :)) In my previous summary, I have mentioned an interactive Open Cup problem: you needed to give a checkmate to a lone king on a 8x8 chessboard using just one piece - the amazon, that can move both as a queen and as a knight. To makes things harder, you are playing blindfolded and do not know the king's position or moves. More precisely, you are given the initial position of the amazon, and need to output such sequence of at most 50 moves that the king that starts in any position that is not checked by the amazon and makes any legal moves will eventually be checkmated at some point in this sequence, and will not be stalemated before that.

There are multiple approaches described in the post-match discussion, highlighting the fact that the problem allowed one's creativity to shine. However, the solution described by Endagorion also allows to answer the extra question I posted last week: how short can such sequence of moves be if the amazon starts at a1?

It turns out this shortest sequence is "a1 a2 b3 b4 c5 c6 c5 d4 d3 d4 e5 e6 e5 f4 f3 e5 f6" (16 moves). And to prove this, Endagorion simply runs a breadth-first search over all reachable states of the game. The state of the game, in this case, is the position of the amazon plus the set of positions where the opposite king can be. At first glance it seems like we might get 64*264 states, but in practice the number of reachable states is much, much lower - most likely because the set of reachable squares for the king fills any unattacked area reasonably quickly, and thus we get much fewer possibilities (just 312400 states are reachable, and only 54520 of those are traversed before we find a way to checkmate definitely).

Thanks for reading, and check back in 2017! Happy New Year!

Sunday, December 18, 2016

An amazon week

On Wednesday, TopCoder SRM 703 - the first TopCoder round after the TCO - took place (problems, results, top 5 on the left). Endagorion has claimed the victory thanks for an amazingly fast solution for the 1000 - he submitted it in just over 7 minutes! It turns out that a similar problem has appeared before, and he could reuse most of the code. Nevertheless, congratulations on the flawless execution!

Codeforces Round 385 on Saturday has gathered 8 nutella-rated participants, including the top 3 (problems, results, top 5 on the left, analysis). Tourist has guaranteed himself the first place in just over an hour, but couldn't solve everything because of extremely tricky geometric problem D. Congratulations on the win!

Here's problem E which has propelled him to the top, in a slightly simplified form: you are given n words with total length up to m, and need to check whether there exists a cyclic string that has at least two different decompositions into those words, in O(nm) time. For example, for a set {"aa", "aba", "ba", "bab"} the cyclic string "ababa"="babaa" has two decompositions: "aba" + "ba", and "bab" + "aa".

Finally, Open Cup 2016-17 Grand Prix of Peterhof has (probably?) wrapped up 2016 for many ACM ICPC teams (results, top 5 on the left). Problem E was the most unusual and thus the most creative: you needed to give a checkmate to a lone king on a 8x8 chessboard using just one piece: the amazon, that can move both as a queen and as a knight. To makes things harder, you are playing blindfolded and do not know the king's position or moves. More precisely, you are given the initial position of the amazon, and need to output such sequence of at most 50 moves that the king that starts in any position that is not checked by the amazon and makes any legal moves will eventually be checkmated at some point in this sequence, and will not be stalemated before that. Can you figure out how to do it? How short can such sequence of moves be if the amazon starts at a1?

Thanks for reading, and check back next week!

Saturday, December 17, 2016

A dfs week

Codeforces Round 383 was the main event of the last week (problems, results, top 5 on the left, analysis).  TooDifficult has regained the second place in the overall rankings thanks to his victory; his 2017 ACM ICPC World Finals rival mnbvmar is now ranked sixth thanks to his second place in this round. Well done!

In my previous summary, I have mentioned an unsolved NEERC 2016 problem where your program explores a maze with at most 20 rooms which look exactly the same. Each room of the maze has the same amount m (also at most 20) of outgoing one-way passages which also look exactly like one another, arranged in a circle. The only way not to get completely lost in this maze is to use the fact that each room also has a movable rock. Initially there's a rock is in the center of each room. When you leave a room, you can put the rock next to any outgoing passage, and moreover, you can choose whether to put it to the left or to the right of this passage (that means that there are 2m ways to put it). If you ever arrive in this room again, you will see whether the rock is placed to the left or to the right of a passage - but since all passages look the same, you won't have any other information! So now you will be able express the number of the passage to take, and the number of the passage to move the rock to by the number in clockwise order starting from the one marked with the rock when you arrive. Your goal is to visit all rooms in at most 20000 steps.

The solution is explained with pictures in the published analysis PDF (problem I), but let me repeat the outline here. We will implement a depth-first traversal of our graph. However, depth-first traversal sometimes needs to go back, and we can't do that since the passages are one-way. However, since the graph is strongly connected (it looks like I forgot to mention this property in the problem statement summary), there's always some way to get back. The main idea is: when leaving the vertex for the last time, mark the passage that leads as high up as possible in the depth first search tree. This way when we find ourselves in an already processed vertex, we can just follow the marked passages to get back to the current depth first search path. And inside that path, we will mark the passages that lead down this path, so that when we return to this path, we can then get down to the vertex that's being currently processed by the dfs. We can use left/right positioning of the rock to distinguish the vertices on the path ("grey") and the already processed vertices ("black"). There are some more technical details to the solution, but all ideas are above.

I find this problem very appealing for multiple reasons, one of them being that the solution is "tight": the amount of information we can store in the visited rooms turns out to be barely enough to perform the traversal. How can one come up with such tight problems?

Monday, December 5, 2016

A maze week

Last week was quite a bit calmer than its predecessors. On Monday, Code Festival 2016 wrapped up the festivities with its Grand Final (problems, results, top 5 on the left, online mirror results, analysis). It was W4yneb0t's day, as he managed to deny tourist a somewhat expected victory by solving the same amount of problems, but a more expensive set of them. Big congratulatons!

I did not cover a lot of ACM ICPC regionals this season, but now is a good time to start :) ACM ICPC 2016-17 NEERC took place on Sunday in St Petersburg, Barnaul, Tbilisi and Almaty (problems, results, top 5 on the left). One of the main events of the year for most of ex-USSR algorithmic competition community, and the main event of their entire algorithmic competition career for many teams who practice for multiple years for this one chance to advance to the World Finals. One can experience a very wide spectrum of emotions just by watching the NEERC award ceremony where some teams are full of joy and are not at all shy to share that moment with everybody, while others ruminate over the far-reaching consequences of just one bad day. Nevertheless, the community stays very close and friendly, and kudos to everybody for keeping the spirit going! And last but not the least, congratulations to the team SPb SU Base on the victory!

Problem I was left unsolved in the onsite competition, and it's a pity since I find it really beautiful. It's an interactive problem where your program explores a maze with at most 20 rooms which look exactly the same. Each room of the maze has the same amount m (also at most 20) of outgoing one-way passages which also look exactly like one another, arranged in a circle. The only way not to get completely lost in this maze is to use the fact that each room also has a movable rock. Initially there's a rock is in the center of each room. When you leave a room, you can put the rock next to any outgoing passage, and moreover, you can choose whether to put it to the left or to the right of this passage (that means that there are 2m ways to put it). If you ever arrive in this room again, you will see whether the rock is placed to the left or to the right of a passage - but since all passages look the same, you won't have any other information! So now you will be able express the number of the passage to take, and the number of the passage to move the rock to by the number in clockwise order starting from the one marked with the rock when you arrive. Your goal is to visit all rooms in at most 20000 steps.

The problem statement is quite abstract, so I encourage you to read in full in the PDF (problem I), especially the sample input/output. After you understand what's going on, however, I find it really exciting to solve!

In my previous summary, I have mentioned a CERC problem that had to do with bipartite matchings. You were given a bipartite graph with at most 20 vertices in each part (40 together). A set s of its vertices was called nice if there existed a matching that covers it - in other words, such that for every vertex from s there was an adjacent edge that belongs to the matching. Note that it's not necessary for each end of the matching edges to be in s. Each vertex also had an integer weight. Your goal was to count the number of nice sets (out of at most 240 total sets) with total weight exceeding the given threshold t.

I won't describe its detailed solution, but I will mention the main idea that makes this problem tractable. At first sight, we have 240 sets which is way too much to handle one-by-one. However, it turns out that a set s consisting of some set x of vertices of the first part and some set y of vertices of the second part can be covered by a matching if and only if both x can be covered by a matching and y can be covered by a matching (but those don't have to be the same matching)! This idea allows to reduce the number of sets to consider to 220, which is tractable.

Thanks for reading, and check back for this week's summary! 

A Makoto week

TopCoder Open 2016 Final fell on the Nov 21 - Nov 27 week (problems, results on the left, analysis by Bruce). The final results have all finalists sorted by rating, with one notable exception: rng_58 has completely dominated the field, already winning on the first two problems but also adding the 1000-pointer to make it clear that others don't stand a chance. Extremely well done! With this result, Makoto is 3 out of 3 for TCO finals.

Here's what the 1000-pointer was about. You are given an undirected graph with at most 14 vertices. You make at most 50000 copies of this graph, without any edges between them, to obtain a bigger graph (with at most 14*50000 vertices). Then, you replace the resulting graph with its complement: for each pair of vertices connected by an edge, you remove that edge, and for each pair of vertices that don't have a connecting edge, you add one. How many Hamiltonian paths does the complementary graph have, modulo 998244353?

Codeforces resumed its contests after a month-long break with Round 381 on Wednesday (problems, results, top 5 on the left, analysis). TooDifficult continued his streak of victories, adding this round to the last two TopCoder SRMs. Amazing consistency!

On Saturday, a brand new international onsite competition called Code Festival and ran by AtCoder took off in Tokyo. It has featured a diverse set of competitions, with the Code Festival 2016 Final being the closest to traditional rounds (problems, results, top 5 on the left, online mirror resultsanalysis). Only current and recent university students were allowed to join, nevertheless the field was top-notch. Facing very tough competition, tourist rose to the challenge and won in style by being the only one to solve all tasks. Congratulations!

Open Cup 2016-17 Grand Prix of Europe on Sunday has completed the run of 5 consecutive Open Cup weekends (problems, results, top 5 on the left, CERC results on the same problems, analysis). This time ITMO 1 was head and shoulders above everybody, winning 11-9 (11-10 if you take CERC into account) - really unbelievable result! My team in particular got stuck after solving 8 problems in 3 hours, and couldn't solve any of the remaining 4 tasks in the last 2 hours. Also notable is Makoto solving 9 problems single-handedly. He was really on a roll that week :)

Problem B in this round relied on a sound yet not widely known theoretical fact. You were given a bipartite graph with at most 20 vertices in each part (40 together). A set s of its vertices was called nice if there existed a matching that covers it - in other words, such that for every vertex from s there was an adjacent edge that belongs to the matching. Note that it's not necessary for each end of the matching edges to be in s. Each vertex also had an integer weight. Your goal was to count the number of nice sets (out of at most 240 total sets) with total weight exceeding the given threshold t.

Finally, Codeforces Round 382 wrapped up the week on Sunday night (problems, results, top 5 on the left, analysis). The coders in places from 3rd to 5th could've grabbed the first place if they could simply finish solving all problems in time, and that seems to have been quite doable, with more than 30 minutes left for Haghani and overtroll, and more than an hour left for MainDullMoeHand for just one problem - but they couldn't, and jcvb submitted his last problem with just 5 minutes to go to claim the victory. Well done!

In my previous summary, I have mentioned the problem that threw TCO 2016 Semifinal 2 into turmoil. You are given 1000 positive integers up to 1018. You need to find any subset of those integers with the following two properties:

  1. Bitwise and of all integers in the subset is zero.
  2. For any two integers in the subset, their bitwise and is not zero.

The problem statement sounds so simple, and yet the solution is very hard to spot, so let me describe it. Since the bitwise and of all integers in the sought subset is zero, for every bit (position in the binary representation) at least one of the numbers in the subset doesn't have it set. Here comes the key idea: if there exists any such subset, then there exists such subset and a bit such that exactly one of the numbers in the subset doesn't have this bit set. Why? Because when for each bit at least two numbers don't have it, we can remove any number from the subset without violating properties 1 or 2.

Now the problem becomes polynomial instead of exponential. Let's iterate over which bit will be present in all numbers except one, and which number will not have it. Which other numbers could we take into the subset? The numbers that have the chosen bit set and also have non-zero bitwise and with the chosen number. Property 2 will then always hold since all of those numbers have the chosen bit and thus non-zero bitwise ands, And for property 1, more numbers is always better, so we can afford to just take all numbers described above! We just need to check if their overall bitwise and (together with the first chosen number) is zero, and if yes, we have found our answer.

One reason this solution seems quite hard to spot is that it operates in a counter-intuitive manner. On the first step, we're reducing our subset to achieve the desired property. But on the second step, we're actually expanding it back.

Thanks for reading, and check back soon for the last week's summary!

Sunday, December 4, 2016

An and-clique week

The Nov 14 - Nov 20 week was busier. TopCoder SRM 702 on Tuesday was the last chance to practice before the TopCoder Open weekend (problems, results, top 5 on the left, analysis). xudyh won the second SRM in a row, and was the only one to solve all three problems. Congratulations!

The onsite event of TopCoder Open 2016, one of the main tournaments of the year, started with Semifinal 1 on Saturday (problems, results on the left, analysis by Bruce). Top 3 advanced to the final round next Monday, and everything was essentially decided by the speed on the 500-pointer. The problem statement was extremely simple: one needs to construct any weighted directed graph with at most 20 vertices that has exactly the given amount k of different minimum cuts. k is up to 1000.

Open Cup 2016-17 Grand Prix of Dolgoprudny took place early on Sunday (problems, results, top 5 on the left). Team Past Glory retook the ground they lost to ITMO 1 in the overall standings a week ago - well done!

Finally, TopCoder Open 2016 Semifinal 2 determined 3 more finalists (problems, results on the left, analysis by Bruce). Unlike the first semifinal, which went more or less normal, this round was very unusual. For the first 10-15 minutes none of the participants, and almost none of the observers, could figure out how to solve the easiest 250-pointer problem. Because of that, the advancement in this round hinged on the ability to give up on a problem and clear one's head before the next one. It's worth mentioning that the 500-pointer was also in no way obvious. Congratulations to Enot on successfully navigating the tricky round and coming out on top!

Here's the problem that puzzled everybody. You are given 1000 positive integers up to 1018. You need to find any subset of those integers with the following two properties:

  1. Bitwise and of all integers in the subset is zero.
  2. For any two integers in the subset, their bitwise and is not zero.
Can you see the key solution idea?

In my previous summary, I have mentioned an easy problem: you are given an n times n grid of uppercase English characters, where n is at least 3. This grid was built in the following way: first, we fill it with n different characters in such a way that each row and each column has exactly one occurrence of each character. Then, we replace exactly one character in the entire grid with another character. Your goal is to figure out which character has been replaced, and what was there originally.

Here's a solution that's very easy to implement. First, we count the number of times each letter appears in the grid, and find letters that appear at least twice. Those are the original letters. Then, for each row and column of the grid we check if they contain at least one appearance of those letters. We will find exactly one row and exactly one column that will be missing one of those letters - and those are precisely the row, column and letter that we need to output.

Thanks for reading, and check back later for more!

Saturday, December 3, 2016

A cosplay week

The Nov 7 - Nov 13 week woke up late with AtCoder Grand Contest 007 on Saturday (problems, results, top 5 on the left, analysis). The problemset seems quite well-rounded, offering competitors a choice of different strategies. cospleermusora has figured out the winning strategy this time - start with the three hardest problems since they're worth a lot more points, and then solve whichever easy problems there's time for. Great idea and execution!

Open Cup 2016-17 Czech Grand Prix took the usual Sunday spot (results, top 5 on the left). The ITMO 1 team won again and started to create quite a gap at the top of the overall standings. Amazing performance!

Problem J in this round was very easy, and yet it required some thinking to avoid too many special cases in code. You are given an n times n grid of uppercase English characters, where n is at least 3. This grid was built in the following way: first, we fill it with n different characters in such a way that each row and each column has exactly one occurrence of each character. Then, we replace exactly one character in the entire grid with another character. Your goal is to figure out which character has been replaced, and what was there originally.

In my previous summary, I've mentioned an interactive Open Cup problem: the judging program has a hidden non-degenerate triangle. The coordinates of its vertices are integers not exceeding 1000 by absolute value. You're allowed to make at most 1000 queries, and your goal is to determine the coordinates of the vertices of the triangle. In each query, you choose a line (more precisely, a half-plane), and you're told the areas of the two parts this line splits the triangle into (one of them can be 0 if it doesn't intersect the triangle).

Here is the solution of our team, which we've created together with Michael Levin: first, we figure out the bounding box of the triangle, using binary search with horizontal and vertical half-planes. At least one of the corners of the bounding box is a vertex of the triangle, which we can find using a 45-degree half-plane which cuts off just one small corner from the bounding box.

When we know a vertex and a 90 degree angle containing the rest, we can use binary search by angle using lines passing through this vertex to find the lines containing the two sides. Finally, we can find the remaining two vertices by binary searching along one of the lines with half-planes parallel to the other line.

This solution has 4 steps, with the first and last step sharing the same binary search problem (given a direction, find the first half-plane with this direction that has non-zero intersection area), so one needed to implement 3 different functions for the interaction. On the good side, the solution doesn't have any special casing at all.

Do you know a simpler approach?

Sunday, November 20, 2016

Yet another triangle week

The first November week included the fifth stage of the 2016-17 Open Cup, the Grand Prix of Siberia (problemsresults, top 5 on the left). Team Havka-papstvo dominated the proceedings, and topped the cake with the only passing solution for the very tedious problem 10 - congratulations!

I'd like to highlight the interactive problem 3, which turned out to be the second most difficult. The judging program has a hidden non-degenerate triangle. The coordinates of its vertices are integers not exceeding 1000 by absolute value. You're allowed to make at most 1000 queries, and your goal is to determine the coordinates of the vertices of the triangle. In each query, you choose a line (more precisely, a half-plane), and you're told the areas of the two parts this line splits the triangle into (one of them can be 0 if it doesn't intersect the triangle). There are two steps to solving this one: figuring out how to do it, and then figuring out how to do it in such a way that the implementation isn't a nightmare :)

In my previous summary, I've described another triangle problem: you are given 200000 points on the plane which are guaranteed to be randomly placed within a bounding box, and need to find three points which form a triangle with the largest area.

The solution is surprisingly straightforward: we find the convex hull of the given points, and then try all triangles with vertices from the convex hull to determine the largest one. It is correct because it's not hard to see that when we fix 2 out of 3 vertices of the triangle, the third one needs to be as far from that line as possible, which means it's a farthest point in some direction, which means it's a vertex of the convex hull. And it is fast enough because it turns out that a set of random points has only O(logn) points on average on its convex hull (see this paper).

Thanks for reading, and check back soon for the next week's summary!

A triangle week

The last week of October was relatively busy. First off, TopCoder SRM 701 took place on Wednesday (problems, results, top 5 on the left, analysis). The top 3 could be a foreshadowing for the upcoming TCO 2016 Semifinal 2, but unfortunately xudyh could not make it to Washington (does anybody know why?). Nevertheless, congratulations on the SRM victory!

Then, AtCoder held its Grand Contest 006 on Saturday (problems, results, top 5 on the left, analysis). apiad has dominated the field by being the only one to solve all problems, and he has also demonstrated an unorthodox strategy by starting from the hardest problems - so he would've been first even if he didn't finish the 3 easier ones!

And finally, Open Cup 2016-17 Eastern Grand Prix took place on Sunday (problemsresults, top 5 on the left). SPb ITMO 1 team has jumped into a big lead in the overall standings after this round - congratulations!

Problem B was on the boundary of "oh, beautiful observation" and "oh, nasty trick" :) Which feeling does it leave you with? You are given 200000 points on the plane which are guaranteed to be randomly placed within a bounding box, and need to find three points which form a triangle with the largest area.

Thanks for reading, and check back soon for the November news!

A Canada week

The Oct 17 - Oct 23 week featured just the Canada Cup by Codeforces (problems, results, top 5 on the left, analysis). If not for challenges, just 5 points would've separated eatmore and tzupengwang - congratulations to both! I could not make this round, so I don't have anything to share about its problems.

Check back soon for the hopefully longer summary of the following week :)

Saturday, November 19, 2016

An unshuffle week

During the Oct 10 - Oct 16 week, TopCoder has hit its next milestone: SRM 700 (problems, results, top 5 on the left, analysis). Despite the starting time of 4am, the first three spots were occupied by contestants from St Petersburg and Moscow. The first place went to _aid, who has recently joined the ACM ICPC World Champion team SPbSU Base - so all other teams will not have an easy time at 2017 World Finals :) Congratulations!

In my previous summary, I have mentioned a problem in the trademark style of Ivan Kazmenko: You're given the following random number generator, an implementation of the Fischer-Yates shuffle, and a program using them:

Function random (range):
  state := (state * 1664525 + 1013904223) mod 232;
  return (state * range) div 232;

Function shuffle (array):
  n := length of array;
  for i := 0, 1, 2, ..., n - 1:
    j := random (i + 1);
    swap(array[i], array[j]);
  return array;

state := seed;
n := 10000;
array := [1, 2, ..., n];
array := shuffle (array);
array := shuffle (array);

In other words, we shuffle the numbers from 1 to n twice. The particular generator used here is well-known, so no particular weaknesses could be expected except of course the fact that it's a linear congruential generator. The task is: given the final state of the array, find what the seed was.

There are two main ideas necessary to solve this problem. The first idea is: if we know the numbers that the generator returned on two concrete calls (i.e., on p-th step and on q-th step), there are only so many (on the order of 100) possibilities for the seed that we can try them all, and moreover, we can enumerate all of them efficiently.

In order to enumerate them, we can notice that the value of the state variable on p-th step is a linear function of the initial seed. Because of this, having two return values fixed leads to two inequalities that look like l <= seed * a <= r (mod 232). We can solve this system of inequalities using an algorithm similar to the Euclidean one.

The second idea is that it's not actually that hard to find two concrete return values of the generator. During each shuffle, each number participates in at least one swap: when the variable i passes through the cell containing it. It's also not hard to see that on average quite a few numbers will participate in exactly one swap. And because this set of numbers is fairly random for each of the two shuffles, there will be a significant amount of numbers that have participated in exactly two swaps overall. And since we know the starting and ending position for each number, we just need to guess which was its "middle" position after the first swap.

So here's the overall solution structure: we iterate over all numbers from n to 1 that have moved to the left after two shuffles. For each of those numbers, we iterate over the "middle" position between the final positions and the starting position. We find all seeds that would make this number do the corresponding jumps, and then check if each of those seeds leads to the overall result we see. We expect to find the seed after trying just a few initial numbers.

Thanks for reading, and wish me luck in today's TopCoder Open semifinal (starting time, broadcast link)!

Monday, November 7, 2016

A Fourier combination week

The Oct 3 - Oct 9 week was calmer. Intel Code Challenge Final Round has gathered the top competitors for 3 hours instead of the usual 2 we see on Codeforces (problems, results, top 5 on the left, analysis). Maybe TooDifficult did not notice this, as he finished all problems within the first 2 hours anyway, and thus won easily :) Congratulations!

On Sunday, Open Cup 2016-17 Grand Prix of SPb has completed the 3-week run (results, top 5 on the left). Reminding us how things usually go in Open Cup contests, Past Glory team have won convincingly, and were also the only team to solve problem I - great job!

Here's what the problem was about. You're given the following random number generator, an implementation of the Fischer-Yates shuffle, and a program using them:

Function random (range):
  state := (state * 1664525 + 1013904223) mod 232;
  return (state * range) div 232;

Function shuffle (array):
  n := length of array;
  for i := 0, 1, 2, ..., n - 1:
    j := random (i + 1);
    swap(array[i], array[j]);
  return array;

state := seed;
n := 10000;
array := [1, 2, ..., n];
array := shuffle (array);
array := shuffle (array);

In other words, we shuffle the numbers from 1 to n twice. The particular generator used here is well-known, so no particular weaknesses could be expected except of course the fact that it's a linear congruential generator. The task is: given the final state of the array, find what the seed was.

In my previous summary, I've mentioned a tricky AtCoder combinatorics problem: You are given a tree with n<=200000 vertices. Now we consider all C(n,k) ways to pick k vertices of the tree, and for each of them we consider the "convex hull" of the k vertices: the smallest part of the tree that connects all of them together. Your goal is to find the sum of the sizes of those convex hulls over all C(n,k) ways. What's more, you need to find n sums: for each value of k between 1 and n. Each sum must be computed modulo 924844033.

Even though the problem statement doesn't mention probabilities and expected values, the solution starts with an observation that is very similar to the linearity of expectation: in order to find the sum of the sizes of the convex hulls, we will find for each vertex the number of convex hulls containing it, and then add up those numbers.

In order to find the latter, let's find the number of convex hulls not containing it, and subtract it from C(n,k). In order for the convex hull to not contain a given vertex, all k chosen vertices must lie completely within one of the subtrees formed by removing this vertex from the tree. So if the sizes of all 2(n-1) subtrees of the original tree are a1, a2, ..., a2(n-1), then we need to find ΣiC(ai, k). Finding all ai is a standard task solved by a traversal of the tree, so we can solve our problem for one value of k in O(n). However, since we have n possible values of k to process, the total running time will be O(n2) which is too slow.

The idea to speed up such computation from O(n2) to O(nlogn) using Fast Fourier Transform is not new, but I don't think I've described it in this blog before. Let's reformulate the problem slightly: let bi be the number of subtrees of size i. We need to find ΣiC(ik)*bi for each k. Now let's transform our sum:

ΣiC(ik)*bi = Σibi*i!/k!/(i-k)! = 1/k!*Σi(bi*i!)*(1/(i-k)!)

Now we can notice that the remaining sum is nothing else but multiplication of polynomials, which can be done fast using FFT. To make it completely clear, let's denote ui=bi*i!, and vj=1/(n-j)!, and multiply two polynomials with those coefficients. The (n+k)-th coefficient of the product will be Σiuivn+k-i = Σi(bi*i!)*(1/(n-(n+k-i)!) = Σi(bi*i!)*(1/(i-k)!), just as we need.

To conclude, it also seems that this technique can be applied to a wide variety of sums, not just those involving C(n,k). The only property that we need from the function being summed is that it must decompose as a product of functions of n, k, and n-k. However, C(n,k) seems to be the only practically relevant function with this property. Do you have an example of a different such function in mind, or maybe you can even point me to other problems solved using this trick?

Thanks for reading, and check back for more!

Sunday, November 6, 2016

A week after a month

The last week of September was rather busy. The competitions started right on Monday with TopCoder SRM 699 (problems, results, top 5 on the left, analysis, my screencast). At the end of the coding phase, it seemed unusually easy for the recent TopCoder standard, with 8 submissions for the Hard problem. However, the system testing phase showed that even though the general idea of a solution is not hard to see, figuring out all details correctly within the 75-minute round is nearly impossible. matthew99a has thus earned his third SRM victory thanks to super fast solution for the 500, and a solid 100 challenge points to boot. Congratulations!

AtCoder Grand Contest 005 has followed on Saturday (problems, results, top 5 on the left, analysis - scroll down for English). tourist seemed to have started 10 minutes late, so there were some doubts as to who would win initially, but he quashed all of those by finishing all problems within an hour of starting. Amazing job!

The last problem was a great demonstration of the power of combinatorics. You are given a tree with n<=200000 vertices. Now we consider all C(n,k) ways to pick k vertices of the tree, and for each of them we consider the "convex hull" of the k vertices: the smallest part of the tree that connects all of them together. Your goal is to find the sum of the sizes of those convex hulls over all C(n,k) ways. What's more, you need to find n sums: for each value of k between 1 and n. Each sum must be computed modulo 924844033.

Intel Code Challenge Elimination Round started just 15 minutes after the end of AGC (problems, results, top 5 on the left, analysis), so it's super impressive that anta shows in the top 5 of the both events. However, he could not come close to FatalEagle, who has earned his first Codeforces victory thanks to solving all problems and finding 9 successful challenges while doing so. Congratulations!

The second stage of Open Cup 2016-17, Grand Prix of Eurasia, has wrapped up the week (results, top 5 on the left). Problem 1 looked very daunting initially, but as one unraveled it the actual solution turned out pretty and short. You are given an array with n<=100000 numbers, and want to find the segment of that array with the largest sum. That would be too easy, of course, so there's an added twist: you're allowed to do at most k<=10 swaps of two numbers in the array before picking a segment.

In my previous summary, I've mentioned another tricky Open Cup problem: you are given a connected undirected graph with at most 2000 vertices and 2000 edges, where each edge needs to be colored either red or blue. You need to do the coloring using a single alternating walk: you start in some vertex, then keep moving along the edges, coloring the first edge you pass red, the second blue, the third red again, and so on. You are allowed to pass the same vertex and even the same edge twice, and when you pass the same edge multiple times, the last color used stays. Is it possible to obtain the required coloring?

It seems hard to approach this problem because it seems that we can do anything: if we color an edge with the wrong color, we can come back later and use the correct color again, so it's not clear how to see if we're making good progress. The main observation is to look at this process from the end. The last edge we traverse must be colored with its intended color. The edge we traverse before it must also be colored with its intended color, unless it's the same edge. More generally, at each point we have some set of edges we've already traversed, which can now be traversed with any color, and the remaining edges which must be first traversed with their intended color, and our goal is to traverse each edge at least once.

Also, since can always go back and return to any vertex we've already visited with the same parity, the whole concept of a walk is a bit misleading now. We just have a set of (vertex, parity) pairs we can reach, and a set of edges that we've traversed at least once, and are gradually expanding both sets. If two vertices a and b are connected using an untraversed edge with color c, and we have reached (a, c), then we can traverse the edge, and reach (b, 1-c) pair. If the edge is already traversed, then we can also use it to reach (b, c) from (a, 1-c).

To put the solution together, we will maintain a queue of possible moves. When processing a move in the queue, we will add new moves that were made possible by this move to the queue. It's important here to remember that the new moves are of two types: those starting in the new vertex we've reached, and those that use the edge we just traversed with the other color, both in the reverse and in the same direction - many teams have forgotten to account for the latter. And of course we must start from some vertex and some parity - but since the graph is just 2000 in size, we can just iterate over all starting - more precisely, finishing :) - (vertex, parity) pairs.

Thanks for reading, and check back soon for the next week's summary! I promise, "soon" does not mean a month this time :)

Tuesday, October 11, 2016

An open week

The September 19 - September 25 week contained the first stage of Open Cup 2016-17: Grand Prix of Japan (results, top 5 on the left). Some teams have competed on this problemset earlier during the Petrozavodsk camp, and two of those teams remained in first two places after everybody else joined - congratulations to Moscow SU Trinity and KTH Omogen Heap!

Problem I in this round possessed the right mix of simple statement and interesting solution, while not being very hard: you are given a connected undirected graph with at most 2000 vertices and 2000 edges, where each edge needs to be colored either red or blue. You need to do the coloring using a single alternating walk: you start in some vertex, then keep moving along the edges, coloring the first edge you pass red, the second blue, the third red again, and so on. You are allowed to pass the same vertex and even the same edge twice, and when you pass the same edge multiple times, the last color used stays. Is it possible to obtain the required coloring?

In my previous summary, I've mentioned a super hard SRM problem: you're given a 50x50 grid, with each cell either free or blocked. You're also given at most 100 (potentially overlapping) 3x3 subgrids of this grid, and are looking to draw an L-shape or a C-shape in each of those subgrids in such a way that those shapes do not overlap, and use only free cells. An L-shape or a C-shape consists of 5 consecutive border cells of a 3x3 subgrid. There are 4 possible different L-shapes in each 3x3 subgrid, and 4 possible different C-shapes. You need to check if such drawing is possible, and if yes, find for each 3x3 subgrid whether it can have a C-shape as part of a valid drawing, and whether it can have an L-shape as part of a valid drawing.

The last part, which asks to see which shape each particular subgrid can have, does not really change the problem a lot - if we can solve the "is it possible" version fast enough, then we can just try placing each shape in each subgrid, and solve the same problem for the rest. So in the rest of this explanation I will concentrate on the "is it possible" part.

The key that opens this problem is: let's replace the 5-cell shape into four dominoes: a C-shape or an L-shape consisting of cells A,B,C,D,E in this order, can be viewed as a union of four dominoes: AB, BC, CD and DE. The eight possible dominoes along the border of a 3x3 subgrid can be split into four pairs of opposite dominoes (see an example on the left). Each 5-cell shape then corresponds to picking one domino in each pair of opposite dominoes.

One might say that this transformation is not accurate: not every way of picking one domino in each pair of opposites yields a valid 5-cell shape. However, it turns out that every way of picking one domino in each pair of opposites yields a set of cells that contains a valid 5-cell shape inside (see an example on the right).

Because of this, the following reformulation always has the same answer as the original problem: can we pick four border dominoes in each 3x3 subgrid, one from each pair of opposites, in such a way that dominoes from different subgrids do not overlap?

Finally, now it's not that hard that we have an instance of the 2-SAT problem: we have a few boolean variables (which domino to pick for each pair of opposites), and a few restrictions on pairs of values (stemming from the intersections). We can solve this problem in linear time.

Thanks for reading, and check back soon for the next week's summary!

Monday, September 19, 2016

A Russian Code week

Codeforces held two rounds this week, starting with Round 371 on Tuesday (problems, results, top 5 on the left, analysis). Despite quite a few attempts at solving the hardest problem E, only LHiC's solution was correct, and since he didn't solve D, the speed of solving the first four was the deciding factor for the top places in the end. Congratulations to geniucos on the victory!

Codeforces Round 372 on Saturday started the competitive weekend (problems, results, top 5 on the left, analysis). This time TooDifficult left nothing to chance, and claimed the first place by more than a thousand points thanks to being the only contestant to solve four problems. Congratulations!

After a tiny 15-minute pause, TopCoder SRM 698 picked up the flag (problems, results, top 5 on the left, my screencast). With almost an hour left to solve the 1000-pointer, I still could not come up with a single idea that would at least transform the problem into something potentially tractable, and not even with randomized backtracking or "subset" dynamic programming that could be close to passing. I was not the only one feeling this way, and so the contest was decided during the challenge phase. I've managed to find +100 that would've earned me the first place, but also picked up -100 on unsuccessful challenges along the way - I'm lucky that the screencast doesn't capture the sound of cursing - and ended up with the coding phase score. So did rng_58, but his coding phase score was significantly higher and thus he kept the victory. Congratulations!

Here's the stumbling 1000-pointer: you're given a 50x50 grid, with each cell either free or blocked. You're also given at most 100 (potentially overlapping) 3x3 subgrids of this grid, and are looking to draw an L-shape or a C-shape in each of those subgrids in such a way that those shapes do not overlap, and use only free cells. An L-shape or a C-shape consists of 5 consecutive border cells of a 3x3 subgrid (see the example on the left). There are 4 possible different L-shapes in each 3x3 subgrid, and 4 possible different C-shapes. You need to check if such drawing is possible, and if yes, find for each 3x3 subgrid whether it can have a C-shape as part of a valid drawing, and whether it can have an L-shape as part of a valid drawing.

Russian Code Cup 2016 Finals on Sunday has wrapped up the sixth installment of this tournament (problems, results, top 5 on the left, online mirror results, analysis). Extremely technical problems led to a lot of wrong submissions and very slow progress, but tourist still didn't leave any doubt with regard to the winner, solving the required three problems in just over an hour and almost getting the fourth. Congratulations on the second Russian Code Cup victory!

In the last week's summary, I have mentioned an algebraic problem from the TCO Wildcard Round: given three integers a, k and p, where p is a prime (more precisely, 109+7), find any integer b such that bk=a (mod p), or report that there isn't any. You also need to do it relatively fast, as one needs to solve this equation 300 times for different values of a (but k and p stay the same).

Ilya Mironov has pointed out in comments that there's a specialized algorithm for solving just this problem. However, I believe that it's more educational to learn to solve this problem using the understanding of the underlying math structure and standard algorithms that come with it. The object we're studying is remainders modulo p with the operation of multiplication. All remainders except 0 form the multiplicative group modulo p, and there's a well-known theorem that says that this group is always cyclic - in other words, isomorphic to the additive group of remainders modulo p-1. This isomorphism maps taking an element to the power of k to multiplication by k modulo p-1, and we know how to inverse that - it's just division by k modulo p-1.

That gives us a full solution to our problem: first, find out to which remainder modulo p-1 does the isomorphism map a. Then divide that remainder by k. Finally, apply the inverse of the isomorphism to get the answer. We also have to handle the case of a=0 separately since 0 doesn't belong to the multiplicative group, but there the solution is easy: b=0.

Of course, there are still implementation details to figure out. How does the isomorphism work, and how to evaluate it quickly? It's not hard to see that the isomorphism is completely defined by the multiplicative remainder g that maps to 1 in the additive group, as all other elements then have to be some powers of g, since gd must map to d for each d from 0 to p-2. Such remainder g is called a generator or a primitive root.

Moreover, all powers gd for d that are relatively prime with p-1 are also generators. Because we need to find any generator, and there are plenty, the most practical strategy for finding one is just trying a few remainders and checking if they are generators. In order to check if a remainder g is a generator, we must see if all its powers from 0 to p-2 are different, but since its order must divide the order of the multiplicative group, it suffices to check that gd is not equal to 1 for all values of d that are proper divisors of p-1.

Finally, having found a generator g, evaluating the isomorphism is called the discrete logarithm which can be solved using a meet-in-the-middle approach, also known as baby-step-giant-step here. Let's pick a number q around sqrt(p-1), and pre-compute baby steps d0, d1, ..., dq-1, and giant steps dq, d2q, ... (up to the first giant step with power that exceeds p-1). Now it's not hard to see that if we try multiplying our number a by all giant steps we will at some point get a result equal to some baby step: a*duq=dv, and then a=dv-uq, so a maps to v-uq mod p-1 under our isomorphism. The giant and baby steps can be precomputed since p is fixed.

The final missing trick is dividing a remainder w by k modulo p-1. This also comes naturally with the understanding of remainders modulo p-1. First, let's represent k as m*n where m is gcd(k, p-1), and n is relatively prime with p-1. If m does not divide w, there's no solution. If m does divide w, then we now need to divide w/m by n modulo (p-1)/m. Since n is relatively prime with (p-1)/m, it has an inverse o such that n*o=1 modulo (p-1)/m, and thus our division result can be found as o*(w/m) modulo (p-1)/m. The numbers o and m can be precomputed since k and p are fixed.

Phew! This looks like one hell of a solution, but actually it's much easier to code it than to explain. And since all steps come naturally from understanding of how remainders work, you don't need to memorize any of them - you can construct them on the fly using just a few clues that you remember.

Thanks for reading, and see you next week!