Algorithms Weekly by Petr Mitrichev

Thursday, September 24, 2015

A week with another 6

TopCoder SRM 668 was the first contest on the week in the early hours of Wednesday (problems, results, top 5 on the left). ACRush was the highest-rated competitor, most probably practicing before the upcoming final TCO elimination round. He has placed 6th, which is exactly the last qualifying spot in the TCO - but would he be able to repeat this performance in the TCO round? Read until the end of this post to find out :)

Enot was the winner of this SRM and the only contestant to solve all 3 problems - very impressive! This is his first SRM victory, maybe the streams have helped him improve?

Codeforces Round 320 (Bayan Thanks-Round) took place a few hours later (problems, results, top 5 on the left). Quite a few contestants have managed to solve 5 problems out of 6, so in order to win one also needed to find a few bugs in solutions of competitors. Um_nik and Egor have excelled on this front with +500 points each, but Um_nik had slightly more points from his solutions and has claimed the victory - congratulations!

Russian Code Cup 2015 Finals was one of the "major" tournaments of the year, with $4500 up for grabs for the first place (problems in Russian, results, top 5 on the left, broadcast with commentary in Russian). The problemsetters seem to have been worried about contestants finishing early, and made the last two problems just a tiny bit too hard - quite a few contestants were close on E, and I've found out after the analysis has been published that my solution on F was not that far from the correct one, but nobody could get either problem accepted.

Problem D was a nice logical puzzle: you are given 200000 points on the coordinate plane, and want to find two points such that their bounding box has a positive area, but does not contain any other points inside or on the border, or report that there's no such pair. After solving the "yes/no" part, don't forget to also think about finding the specific pair :)

And finally, TopCoder Open 2015 Round 3B determined the last 6 participants of one more major tournament (problems, results, top 6 on the left, my screencast, Egor's screencast). Scott_wu has won with comfortable margin, but the most exciting aspect for me as a viewer was Egor's climb into the qualifying zone during the challenge phase - here's the direct link into the critical moment of his screencast.

The medium problem was the nicest in my opinion. You were given 200000 cookies, with two players taking those cookies in turn one by one. Each cookie has some value for the first player, and some possibly different value for the second player. It is known that the second player always takes the cookie with highest value for him, and when there are several, with the highest value for the first player among those. The first player, on the other hand, is more flexible. He can take any cookie, and his goal is to maximize the total value of the cookies he gets in the end to him minus the total value of the cookies the second player gets in the end to the second player. What is the optimal strategy for the first player?

Now we know the 12 finalists of TopCoder Open 2015. There's a very heavy skew towards the ex-USSR this time:
Russian Federation - 7: Petr, UdH-WiNGeR, qwerty787788, kuniavski, Egor, Kankuro, Endagorion
Belarus - 2: tourist, subscriber
United States - 2: scott_wu, ecnerwal
South Africa - 1: bmerry

Thanks for reading, and check back next week!

Tuesday, September 15, 2015

A week with Gambler's Ruin

Codeforces Round 319 took place on Thursday (problems, results, top 5 on the left, solution analysis). I did not take part, but I heard that the problems were somewhat different in style from the last round - and yet the people in the first two places are the same. Huge congratulations to Marcin_smu and mnbvmar!

TopCoder SRM 667 happened one day later (problems, results, top 5 on the left, my screencast). The round was very tricky, with 250 inviting a lot of incorrect greedy solutions, the 500 requiring a lot of careful mathematical reasoning, and 1000 being a "constructive" problem which was hard to even approach.

W4yneb0t would be a clear winner thanks to his correct solution for the 1000 - if not for the challenge phase, where K.A.D.R found 10 incorrect solutions for the 250 and managed to climb to the first place despite solving only the easiest problem - great job!

Coming up with a good challenge case wasn't easy at all. Here's what the problem was about: you have 20 skills to learn and 50 exercises, each exercise will give you a certain subset of skills (the subset for each exercise is fixed, but can be different for different exercises). If there are k skills you don't have yet among the skills that the exercise gives you, then you spend k² effort on that exercise. What is the minimum total effort you must spend to learn all skills? Naturally, the greedy solutions just picked an exercise that adds the smallest number of skills at each step. Can you see how to make them fail? I'm also wondering how did people come up with such testcases during the round - was stress-testing the way to go?

The careful mathematical reasoning in the 500 was basically about finding a closed form solution to gambler's ruin. Of course, since this is a well-known task, one could replace the mathematical reasoning with Googling, which I duly did. I didn't know the term "gambler's ruin", though, so you can watch how I gradually converged to the right search terms starting from this moment of the screencast :)

Open Cup 2015-16 Grand Prix of Ukraine has ignited the new season of the Open Cup on Sunday (results, top 5 on the left). Quite a few veteran teams participated, but the first place still went to a team that will participate in ACM ICPC this year - great job, NNSU!

Problem K, while being quite standard, allowed one to practice various dynamic programming techniques from this blog post. It went like this: you are given N numbers, and need to split them into M consecutive groups. The first number in each group is multiplied by 1, the second number in each group is multiplied by 2, and so on, and you need to minimize the sum of those products. And of course, you need a solution that's faster than O(N³).

During the contest, my teammate Ilya got the divide and conquer approach accepted, after the contest I've solved it using something similar to Knuth's optimization in upsolving, and the post-match discussion mentions that convex hull optimization also worked well :)

Finally, let me come back to the 4x4 problem from last week, counting the number of AxB rectangles of 0s and 1s such that each CxD subrectangle contains the same number of 1s. C and D are up to 4, so the subrectangle is very small, but A and B are up to a billion.

The solution is a classical case of gradually simplifying the problem until it becomes tractable. The first step is: let's try to gather some more knowledge from the fact that each CxD subrectangle has the same sum. More specifically, let's shift a CxD subrectangle to the right by 1. Now we can see that two 1xD columns that are C apart horizontally must also have the same sum.

Now let's shift those two 1xD columns down by 1. We can see that the differences between the corners are the same: a[x+C][y+D]-a[x+C][y]=a[x][y+D]-a[x][y]. Note that if this difference is not zero, then we can uniquely determine the values in the corners: for example, if the difference is 1, then a[x+C][y+D]=a[x][y+D]=1, and a[x+C][y]=a[x][y]=0. This, in turn, simply means that: if we have a[x][y] not equal to a[x][y+D], then all values in corresponding rows with step C are equal: a[x][y]=a[x+C][y]=a[x+2*C][y]=...=a[x-C][y]=a[x-2*C][y]=..., and the same for row y+D; a similar statement is true for columns.

At this point we can see that we've learned a lot about relations between cells that differ by a multiple of C horizontally and by a multiple of D vertically. In order to dissect the problem further, let's look at each such subgrid independently: take some offsets 0 <= c < C and 0 <= d < D, and consider all cells with coordinates (c+p*C,d+q*D). The above observation has a very concise meaning for this subgrid: whenever two horizontally adjacent cells in the subgrid have different values, the columns containing them are constant; and whenever two vertically adjacent cells in the subgrid have different values, the rows containing them are constant.

Now it's quite easy to make the next step: we can notice that the two situations (two constant rows and two constant colums) are mutually exclusive: if there's at least one constant row, then there can be no two constant columns with different values, and vice versa. This, in turn, simply means that there are just two possibilities for the entire subgrid: either all rows are constant (we will call such subgrids "horizontal"), or all columns are constant (we will call such subgrids "vertical"). Of course, a subgrid can be both horizonal and vertical at the same time, in case all values are equal.

Having learned the exact structure of each subgrid, it's time to go back to the full grid. It consists of C*D subgrids, each in the form described above. Let's look at the fact that two 1xD subrectangles that are C apart horizontally: they must have the same sum. They include cells from D different subgrids. The corresponding cells in horizontal subgrids are equal, so we can simply disregard them. The corresponding cells in vertical subgrids might not be equal, so we have some constraints on the vertical subgrids now. In a similar manner, looking at Cx1 subrectangles that are D apart vertically helps us get some constraints on the horizontal subgrids. Moreover, it's not hard to see that those constraints are not just necessary, but sufficient: if every two 1xD subrectangles that are C apart horizontally have the same sum, and every two Cx1 subrectangles that are D apart vertically have the same sum, then it's not hard to see that all CxD subrectangles have the same sum, too.

So we have reduced the original problem to making sense of the constraints induced on the horizontal/vertical subgrids. We can note that horizontal and vertical subgrids have independent constraints, so we can analyze them separately. Moreover, the vertical subgrids, for example, have constraints in groups of at most D, and each such group of at most D vertical subgrids is constrained independently, so it suffices just to analyze one of them.

And finally, we can see that the remaining problem is not hard anymore. We have a group of n <= D vertical subgrids, and we need the sum in each column to be the same. If this sum is m, then there are C(n,m) ways to pick one column, and thus C(n,m)^t ways to pick all subgrids, where t is the number of columns (which is equal either to A/D or A/D+1, depending on the offset of the subgrid). Summing this over all values of m, we get the total number of ways to fill these n vertical subgrids.

All that's left is to multiply the answers for the independent problems we've solved to get the answer for the problem. Well, almost all :) In the above solution we relied on knowing which subgrids are vertical and which are horizontal. In reality, we don't know that, but we have just C*D <= 16 subgrids, so we can iterate over all 2^C*D ways, compute the answer for each case, and add those up.

This solution, will, however, count some grids twice: when a certain subgrid is constant, it is both vertical and horizontal, so we will count a grid containing such subgrid several times. In order to avoid that, we should remove constant subgrids from one of the sides, let's say from horizontal subgrids, and that will allow us to count each grid exactly once. In order to count the number of ways to fill a group of horizontal-but-not-constant subgrids, we have to use the inclusion-exclusion principle: start with the above computation that allows constant subgrids, subtract the number of ways to fill the grids such that one subgrid is constant, add the number of ways to fill the grids such that two subgrids are constant, and so on.

Finally, all loose ends are tied and we have a solution. Despite the length of the explanation, the code itself ends up being quite short. And the key idea, again, is to continue to simplify the problem in many small steps, and not give up early.

Thanks for reading (has anybody read this far?), and check back next week!

Monday, September 7, 2015

A week with 4x4

TopCoder Open 2015 Round 3A took place on Saturday (problems, results, top 6 on the left). First 6 places in the onsite competition in Indianapolis were handed out to those who could solve either the medium or the hard problem, with another 6 selected in Round 3B that will take place in two weeks. According to the official website, we will have a semifinal round and a final round onsite, but the number of advancers from the semifinal isn't known yet. A similar structure was used in 2009, with 18 onsite participants competing in a single semifinal and 8 qualifying to the final. Scaling that down to 12 onsite participants means we'd have 5 contestants in the final, but of course only TopCoder knows for sure.

The hard problem has left me stunned: how come such simple question has never been studied before, and yet requires a complex but doable solution? The problem simply asked about the number of AxB rectangles of 0s and 1s such that each CxD subrectangle contains the same number of 1s. C and D are up to 4, so the subrectangle is very small, but A and B are up to a billion.

Coming back to a problem I mentioned last week: given a tree and a vertex selected in it, what's the maximum number of different vertices a walk of a given length L starting from the given vertex can visit?

The main idea of the solution is that walks in a tree do not have a lot of freedom. More specifically, if a walk in a tree starts and ends in the same vertex, then it traverses each edge an even number of times - for each step "down" there's a corresponding step "up", which in turn means we traverse each edge either 0 times or at least twice. Because of this, a cyclic walk of length L visits at most L/2 different vertices in addition to the starting/ending vertex. Suppose now that the walk is not cyclic, going from some vertex A to another vertex B. Again, since walks in a tree do not have a lot of freedom, it's not hard to see that this walk can be represented as the direct walk from A to B along the tree, plus some cyclic walks inserted into it, so we get 1 new vertex per edge when walking from A to B, and 0.5 new vertices per edge when walking along the cycles, and the maximum number of different visited vertices is 1+d+(L-d)/2, where d is the distance between A and B. Now it's clear that we should simply pick B to be as far from A as possible, but at most L edges far.

Thanks for reading, and check back next week!

Monday, August 31, 2015

A week with SSE

TopCoder SRM 666 happened very early on Wednesday (problems, results, top 5 on the left). Zxqfl won this round despite tough competition that included two coders with "target" rating yeputons and Endagorion - congratulations!

The easy problem in this round was a nice graph theory exercise. You're given a tree and one vertex of the tree is selected. What's the maximum number of different vertices a walk of length L starting from the given vertex can visit?

Codeforces Round 318 ("RussianCodeCup Thanks-Round") took place on Saturday (problems, results, top 5 on the left). Marcin_smu has claimed the first place thanks to a solution to problem E that managed to pass the system tests - congratulations! In the post-match discussion it turned out that both passing solutions for problem E were not intended to pass: Marcin's solution gave a wrong answer on some inputs, and the only other passing solution from logicmachine relied on SIMD CPU instructions to speed up a O(n^2) solution, while the intended complexity was O(n*sqrt(n)).

While both issues could probably be avoided by investing more effort into the round preparation, I think they actually highlight one of the main good aspects of competitive algorithmic programming: it's extremely objective. Every solution is judged against the same testcases, under the same time and memory limits, and completely automatically.

Thanks for reading, and check back next week!

Sunday, August 30, 2015

A week with 42

On Saturday morning, TopCoder Open 2015 Round 2D gave everybody the last chance to get into the top 40 and advance to Round 3 (problems, results, top 5 on the left, parallel round results). Of course, getting into top 1 is still better than getting into top 40, so congratulations to tomerun on winning this round!

Here are the statistics on Round 3 advancers by country:

Russian Federation - 42: Kankuro, kuniavski, pashka, Endagorion, qwerty787788, 2rf, niyaznigmatul, VArtem, darnley, yeputons, eatmore, SergeiFedorov, Egor, Baz93, Petr, hmich, meshanya,LoRd_TaPaKaH, andrewzta, Belonogov, Jedi_Knight, Burunduk1, ilyakor, mikhailOK, RiaDWaW, homo_sapiens, knightL, Michael_Levin, Um_nik, KalininN, Dlougach, Merkurev, Akhi_Anton,HellKitsune, Zlobober, Enot, UdH-WiNGeR, malcolm734, Shapo, _aid, Nikita.Podguzov, Gassa
Japan - 24: uwi, ir5, hogloid, chokudai, Komaki, anta, snuke, stqn, __math, zerokugi, sky58, kawatea, semiexp, asi1024, Lepton_s, ynasu, j_gui0121, tomerun, tozangezan, LayCurse, sugim48, kusano,DEGwer, wo[]
China - 22: Blue.Mary, xhm1994, Sevenkplus, jiry_2, xyz111, ACRush, cgy4ever, zhj1997, zyxwvu164, nhzp339, liympanda, sky_love_high, OIerRobbin, wenhanhuang, jiangyouno1, Meriones,UESTC_elfness, panjf1987, matthew99a, justever86, Shizhouxing, Hillboy
Ukraine - 10: RAVEman, Vasyl[alphacom], dzhulgakov, K.A.D.R, maksay, Gluk, vlad89, LeBron, sdya, Deretin
United States - 10: pieguy, theycallhimtom, lg5293, iridescent, scott_wu, aceofdiamonds, xiaowuc1, SourSpinach, Gaytrodex, ecnerwal
South Korea - 6: Kriii, ainu7, RRx, kcm1700, sukh1222, xhae
Belarus - 5: tourist, ivan_metelsky, forest, Arterm, subscriber
Poland - 5: Eryx, Psyho, pparys, Marcin_smu, mnbvmar
Croatia - 4: lovro, IvL, Zuza, ikatanic
Taiwan - 3: dreamoon, ShikXD, tmt514
South Africa - 2: bmerry, HiltonLange
United Kingdom - 2: al13n, PaulJefferys
Netherlands - 2: krijgertje, davidv1992
Czech Republic - 2: fhlasek, cibulka
Bulgaria - 3: espr1t, danch0, anrieff
Slovakia - 2: baklazan4247, Xellos0
Latvia - 2: eduardische, Alex_2oo8
Brazil - 2: ffao, rafaelhirama
Hungary - 1: ecsirke
Australia - 1: John Dethridge
Georgia - 1: nika
Finland - 1: Sisu
Switzerland - 1: W4yneb0t
Romania - 1: klamathix
Sweden - 1: Gullesnuffs
Bangladesh - 1: shakilaust
Indonesia - 1: handojo1
Mexico - 1: macs
Norway - 1: Im2Good
Iran - 1: EbTech
Canada - 1: zxqfl
Singapore - 1: eugenesh

Codeforces Round 317 ("AimFund Thanks-Round") took place on Saturday evening (problems, results, top 5 on the left). Subscriber has continued his excellent recent form with another victory - great job!

Problem C was a very nice exercise that involved careful reduction to a graph problem, and the observation that the graph problem in question is actually not difficult at all :) The problem statement was: you're given a boolean expression as an AND of several OR-clauses, where each OR-clause is an OR of several terms, and each term is either a variable, or a negation of a variable, and you need to check if the expression evaluates to true for some variable values. Sounds familiar? Well, the familiar problem is NP-complete, whereas this one had one additional restriction: each variable appears at most twice in the formula (including appearances in negated form).

In order for the formula to evaluate to true, each OR-clause has to evaluate to true, meaning that at least one term in each OR-clause has to evaluate to true. If a variable has just one appearance, or if both its appearances are the same (both negated or both not negated), then we can just set its value to make all OR-clauses containing this variable evaluate to true, and forget about those OR-clauses. In case one of the variables has only one appearance in the remaining OR-clauses, we can also set its value, and so on. After this process ends, we're left with some OR-clauses and variables that appear exactly twice in them, once in normal form, and once in negated form.

Here's where graphs come into play: let's make the OR-clauses the vertices of an undirected graph, and connect two OR-clauses with an edge if they share a variable. The problem now asks: is there a way to pick a direction for each edge in this graph in such a way that every vertex has at least one incoming edge?

Had the original problem been formulated in graph terms, the number of solutions would probably be much higher. The remaining graph problem is not difficult, can you see its solution?

In last week's summary, I've promised to explain the solution to Merlin QA, so let me recall the problem statement first. You are given 100 8-dimensional vectors, and you're adding them in one of 100! possible orders, but with a small twist: every time one of the coordinates of the sum is negative, it becomes zero. Here's a 2-dimensional 2-vector example: if you have vectors (5, -2) and (-3, 1), then adding them in the given order yields (0, 0) -> (5, -2) -> (5, 0) -> (2, 1), while adding them in the reverse order yields (0, 0) -> (-3, 1) -> (0, 1) -> (5, -1) -> (5, 0). Your goal is to pick the order of addition that maximizes the sum of coordinates of the resulting vector.

The key idea is: let's consider a slightly different problem! Instead of making each coordinate equal to zero whenever it becomes negative, let's say that we can change any coordinate to zero at any time.

But is this problem really that much different? It's not hard to see that the answers to those problems will always be the same. On one side, setting coordinates to zero whenever they become negative is included in setting them to zero at any desired time, so the answer to the first problem is less than or equal to the answer to the second problem. On the other hand, if we ever reset a positive coordinate to zero or skip resetting a negative coordinate to zero, the answer will not increase, and thus the answer to the second problem is exactly equal to the answer to the first problem!

The newly formulated equivalent problem is much easier to solve, because there's nothing "conditional" happening. For each vector, its contribution to the answer is equal to the sum of its coordinates that will not be reset to 0 after this vector is added. If we consider the last time each coordinate is reset to 0, and reorder the coordinates in decreasing order of that time, then each vector will always contribute a sum of its first k coordinates for some k. Of course, we should now pick k greedily to maximize the contribution. Since we don't know the last time each coordinate is reset to 0, we should consider all 8! possible permutations of coordinates, and apply the above greedy solution to each of them.

Thanks for reading, and check back soon for this week's summary!