Wednesday, September 28, 2011

TCO 2011 Algorithm Finals live coverage

16:29 - and it’s over, and I guess so is this coverage. Thanks for following! Thanks a lot to TopCoder, to all organizers and problem preparers, and to all participants and sponsors. It was a great event!

16:28 - Studio champion is abedavera.

16:25 - Mod Dash winner is Yeung.

16:23 - no changes in Marathon standings compared to the provisional results. Final sequence: Psyho, wata, ainu7, jdmetz, nhzp339, ploh, ACRush, chokudai, wleite, Rizvanov_de_xXx, RAVEman, komiya.

16:21 - full results: rng_58, bmerry, nika, ACRush, mikhailOK, PaulJefferys, [[iwi]], lyrically.

16:20 - now algorithms. lyrically’s 250 fails. PaulJefferys’ 250 fails. Everything else stands. rng_58, bmerry, nika!

16:11 - the awards ceremony has started. The first category up is design.

16:07 - and one more system test preview: lyrically says that he discovered a timeout for his 250 during the intermission - his solution had an additional log-factor.

16:04 - the reasons for resubmissions were also revealed: nika resubmitted because of special handling for 1xn, where the sum is exactly 0 or 9*n. mikhailOK resubmitted because of no handling of “n solutions” at all. And ACRush resubmitted because one of the arrays in his solution was up to 600, and numbers that arise might be up to 1000. He says that his original version might also still work, since those big numbers are never useful for a correct solution.

16:01 - and a shocker: this is actually rng_58’s last match as a participant for some time! He will be joining mystic_tc to help run SRMs from October.

15:57 - some news from the floor: first, the three Japanese guys did not agree on the strategy beforehand - it was a coincidence. Second, PaulJefferys said he was just 3 minutes away from finishing the 1000. Third, lyrically had the “no -1” case working very quickly, and was debugging the -1 case for the last half hour - and it is actually the easier case!

15:35 - no more announcements for now, the awards ceremony is scheduled for 16:00. I would expect algorithm finals results at 16:30 or so.

15:32 - they have also announced winners of a raffle, which I won’t post here.

15:19 - they started to announce the winners... of various side competitions. The winner of the MemSQL poker tournament is 7ania7 (prize: MacBook Air), the winner of the treasure hunt is mikhailOK (prize: trip to TCO2012), the winner of FIFA tournament is nika (prize: trip to TCO2012), the winners of the amazing race are theycallhimtom, pieguy, mikhailOK (prize: trip to TCO2012), the winner of the holiday photo contest is mikhailOK (prize: trip to TCO2012).

15:18 - rng_58 says he didn’t test his 1000, but the last two challenges have really increased the confidence it’s correct.

15:15 - ACRush says that his solution before resubmission might be correct as well: it had a DP array that went up to 600, but the input might contain numbers up to 999. But when the input contains a number more than 450, the answer is no solution, and his solution most probably would still return no solution. So 1xn was not the reason for his resubmission.

15:11 - mikhailOK says that his resubmitted solution still has a bug: when the number of solutions is 0 modulo 10^9+7, but not 0 proper, it will return wrong answer. But I doubt that will fail the systests.

15:09 - ACRush and lyrically both challenge rng_58’s solution, most probably on a random testcase. It survives!

15:08 - the rumor has it that the reason for mikhailOK’s resubmit was that he actually didn’t cover the “n solutions” case at all.

15:07 - lyrically still seemingly testing his own 250 solution on a large testcase. Maybe he wants to make sure it’s non-trivial.

15:06 - 3 and a half minutes to go in the challenge phase. I still expect challenges, at least at the last minute.

15:05 - rng_58 still very calm and doesn’t bother to read 500s.

15:05 - no, [[iwi]] actually started to read the 500s again.

15:04 - [[iwi]] is not challenging as well. Just moves a window around the screen repeatedly.

15:03 - rng_58 is not challenging. He just looks at the scoreboard.

15:02 - everybody reads 500s. No action there yet.

15:00 - nika brings down [[iwi]]’s 1000-pointer. This looks to be a blind challenge.

15:00 - challenge has begun! 10 minutes that may bring a lot of news.

14:55 - ACRush is preparing testcases for 500. bmerry is preparing the 1xn testcase for 500, which makes us think his solution handles it correctly. lyrically is preparing a random large testcase for 250.

14:54 - [[iwi]] submits the 1000-pointer with less than 1 minute to go! Unfortunately, that’s only enough for 7th place. The order before the challenge phase is rng_58, bmerry (3 challenges below), PaulJefferys, ACRush, nika, mikhailOK, [[iwi]], lyrically.

14:53 - rng_58 opens the 500-pointer - the only way to reach him is through challenging, so I guess he needs to try to steal those challenges away from others, too.

14:52 - Free ice cream has shown up outside the arena

14:51 - nika has been coding a stress-test to compare his solution against a dumb one, a pretty good move I guess. He could also generate some challenge cases that way. ACRush is also testing the 500, then 250 - he abandoned the 1000.

14:48 - rng_58 submits the easy to move into first place. He didn’t test anything except the examples, but for 250 that should be enough.

14:46 - we’re sorry for no updates recently - the internet connection is very flaky here, and goes off for several minutes from time to time.

14:45 - very exciting last 10 minutes. rng_58 must be really nervous now. lyrically still has a good chance at submitting the 1000, too.

14:36 - rng_58 submits the hard, and lyrically submits the easy! rng_58 will be in the lead if he solves the easy in the remaining 17 minutes. I don’t anticipate any of the 5 “usual strategy” guys to even come close on solving the 1000.

14:35 - the Japanese contestants’ strategic move makes the contest really exciting. Kudos!

14:33 - he resubmits, and opens the 1000. Everybody is working on 1000 except lyrically who’s solving 250. Top 5 are: bmerry, PaulJefferys, ACRush, nika, mikhailOK.

14:32 - mikhailOK still debugging his solution. I guess he wants to make sure he resubmits it just once.

14:30 - all 3 Japanese coders seem to be implementing the solution we described below. [[iwi]] has implemented everything except “-1” case, and is passing some of the examples. lyrically has got all examples except the last one passed, but decided to open the 250 - maybe to switch subjects for a short time. rng_58 has written lots of code that looks correct, too - but I’m not sure how close he is to submitting.

14:25 - it looks like mikhailOK has found the same bug, and is fixing his code now, one more resubmit coming I guess. This might be actually quite important at the challenge phase as two of the guys didn’t resubmit :)

14:22 - mikhailOK also submits and moves into third place. Now 5 coders have two problems submitted and 3 have none. But those 3 might actually still be on a winning strategy, since almost any score on 1000 will be higher than those scores on 500, and 250 is really quick to code.

14:20 - and nika also resubmits and opens the 1000. Now the top 4 are bmerry, PaulJefferys, ACRush, nika. We think that the reason for resubmits might be 1xn testcases since those should be handled separately.

14:19 - PaulJefferys also submits 500 to get into third place, and opens the 1000.

14:18 - ACRush has been testing the 500 for a long time, and finally resubmitted! Meanwhile, bmerry has submitted the 500 as well. The top 3 are: nika, bmerry, ACRush. bmerry and ACRush are reading the hard, nika still hasn’t opened it.

14:12 - the solution for the hard is still quite tedious since one has to deal with “-1” as well.

14:10 - nika is the first to submit the 500! He’s now in the lead. ACRush submits the 500 as well, but score is lower and he’s still in 2nd place. Both guys didn’t open the 1000 yet.

14:07 - so here’s the deal (mostly by Vitaliy): every of the 2^18 ways gives you the following information: we can go from position X to position X+delta for all X in some segment [L, R] (which is derived from the fact we must not go outside the borders). Now, we put all ends of those segments for both halves into one set of events, sort them, and go from left to right. We maintain how many segments are there open now for each delta in each half, and that allows us to quickly find the number of combinations for the current X, for the total running time of 18*2^18 or something similar.

14:03 - I’m surprised that nobody has submitted the medium yet. We must be missing something. The hard seems to be some kind of meet-in-the-middle (we just iterate over all 2^18 ways for each half), but we can’t figure out how to combine the halves yet.

13:55 - one hour to go. In the hard problem, we’re not asked about “permutations” exactly: several cards may go into one slot. The slots are numbered.

13:54 - meanwhile, 5 coders have submitted the easy, in this order: ACRush, nika, bmerry, PaulJefferys, mikhailOK. The 3 Japanese coders are still solving the hard, the rest are solving the medium.

13:46 - the medium problem: you start with a 50x50 grid of digits, and calculate sums of “crosses”: c_ij = sum(a_lm where l equals i or m equals j). You now need to reconstruct the original grid, and return it, or return the number of solutions if there are multiple solutions (or no solution if there are none). This one looks to be obvious as well: if we sum all “crosses” corresponding to one row, we will sum all numbers in that row n times, and all other numbers once. By subtracting two of those, we can find the difference between every two rows, and similarly for columns. Then, we iterate over the sum for the first row and for the sum of the first column. After that, we know the sum in every row and the sum in every column, and then we know all numbers (row + column - cross = number). Since the sum of first row plus sum of first column is the first cross plus a small number, there are only 450*10 cases to check for those two sums, and each case can be checked in 50*50 operations, which should be fast enough.

13:41 - the Japanese coders will now know that 250 is very easy, so their strategy seems to be quite solid.

13:39 - nika submits the easy for 230.88 points.

13:36 - ACRush submits the easy for 240.80

13:34 - the size of the board in the easy is 20x20, there are 20 throws. So I guess one has to calculate the expected probability when there are n throws left and the current difference in scores is m, and then iterate over all possibilities for the throw and choose the one that gives the best winning probability, resulting in a simple DP.

13:34 - in the hard Number of slots is not greater than 10^6, Di are up to 10000. Also there is information “card 1 can be Dn positions right or left to the card n”.

13:32 - the hard problem statement: count the number of card permutations (modulo “some large number”), where you are given the information like “card i+1 can be exactly Di positions right or left to the card i”. There are also special cases: Di = 0 - two cards must be at the same position, Di = -1 - card positions may be arbitrary. Number of cards is not greater than 36.

13:31 - the easy problem statement: you are given a rectangular board of digits, and you throw darts at it. At each throw, you select a 3x3 square and your dart hits each of its cells with probability 1/9. There are two players that throw darts one after another, and the one who scores more wins. In case of a tie, a fair coin is tossed. What is the probability that the second player will win after the given number of throws?

13:30 - all 3 Japanese contestants have opened the hard problem!!! All others start with easy.

13:28 - the introduction is over, 10 seconds to go!

13:25 - contestants are lining up for an introduction, in alphabetical order :)

13:23 - 6 minutes to go, still no loud introduction. Are they going to skip it?

13:20 - here are the results of a prediction contest that a Russian programming contest news website ran (before the semifinals took place):

13:17 - 12 minutes before the start.

13:14 - today, the coverage is provided by Petr, KOTEHOK, marek.cygan and Vitaliy. If you have any questions for us, I guess the best way to ask them would be in the Competition Arena, in the room where all contestants are (“2011 TCO Algorithm - Championship Round”). Contestants won’t see them (and our clever answers), don’t worry!

13:12 - lyrically seems to have just coded an efficient max-flow implementation. PaulJefferys is solving the easy problem from the first semifinal (which is the only practice round available, I guess).

13:08 - 7 out of 8 contestants use C++, and they’re busy writing long lists of #include’s. MikhailOK, being the only Java coder, has nothing to do and spends his time just thinking about nothing. Go Java! Go Russia!

13:02 - waiting for the main event to start. Contestants were just allowed on stage. See for a small overview of who competes today!

TopCoder Open 2011

The most prestigious event for individual algorithmic programmers is taking place right now in Hollywood, Florida - it's TopCoder Open 2011 time. I have already been eliminated (twice :)), but that means I will provide live coverage for the finals, where 8 best coders will compete. Here are the participants (in alphabetical order):

[[iwi]] from Japan, currently ranked 43rd in the world. He's already advanced to the TCO semifinals twice, in 2008 and 2009, but this is the first time he made it to the finals.
ACRush from China, currently ranked 3rd in the world. He has already been in the TCO finals twice, in 2008 and 2010, but is still to score his first win.
bmerry from South Africa, currently ranked 4th in the world. He has also been in the TCO finals twice, in 2007 and 2010, but haven't won yet.
lyrically from Japan, currently ranked 9th in the world. I think it's his first trip to the TCO - and he got through to the finals (the semifinal also happens onsite).
mikhailOK from Russia, currently ranked 26th in the world. It is also his first trip to the TCO.
nika from Georgia, currently ranked 7th in the world. He's already been to the TCO semifinals back in 2008, but it's his first time in the finals.
PaulJefferys from the United Kingdom, currently ranked 19th in the world. He's also been in the TCO finals last year, in 2010.
rng_58 from Japan, currently ranked 5th in the world, and the current TopCoder Open champion - he won in 2010, and is the only finalist to have won previously. He has never been to the TCO before last year.

Tune in to this blog today at 13:30 TopCoder time (click the link for other timezones) for live coverage, in less than 4 hours!

Tuesday, September 27, 2011

TCO 2011 Algorithm Semifinal 2 commentary

14:51 - The results have been announced. ainu7’s 600, Kankuro’s 900 and ACRush’s 250 and 900 failed. [[iwi]], rng_58 and ACRush advance to the finals, unbing, nika, mikhailOK and Kankuro to the wildcard. That's it for this commentary!

14:47 - Kankuro’s 600 has integer overflow. We know nothing about other solutions that might fail.

14:44 - scoreboard after challenge phase:

14:43 - scoreboard before challenge phase:

14:40 - he also tried to challenge Kankuro’s 900, also unsuccessfully. That’s it for the challenge phase!

14:39 - RAD. tried to challenge [[iwi]]’s 900, unsuccessfully.

14:38 - unbing brought down Kankuro’s 600. That moves him into 5th place on preliminary standings!

14:38 - rng_58 and ainu7 stopped reading others’ solutions, other ten contestants are still trying to find some bugs.

14:37 - Let me remind that this blog is actually a team effort :) Petr, KOTEHOK, bmerry and Vitaliy trying to bring you the latest information.

14:36 - R.A.D.’s 250 is still popular, but may be it’s too large to be safely challenged.

14:35 - most of the challenge phase is over, just 4 minutes remaining. It’s just 10 minutes at the onsite rounds. I still expect a few last-second challenges, though.

14:34 - Two contestants are reading Kankuro’s 600, but seems that they do not see a bug there.

14:32 - Four contestants are reading solutions of ACRush. What are they trying to find there?

14:32 - ACRush tries to break ainu7, but unsuccessfully.

14:31 - mikhailOK’s 600 is successfully challenged by ACRush.

14:30 - Contestants like to read 250, 600 and 900 of the others nearly equally. The most popular viewed solutions are Kankuro’s 900 and R.A.D.’s 250.

14:29 - 40 seconds before Challenge phase.

14:28 - ACRush is doing something with a large random case of many numbers. Kankuro has tested his 600 and got a negative number, his solution seems to be wrong. Some contestants are watching history and write test generators.

14:25 - Coding phase has ended. Some people think that 600 must be the most challengeable problem though there are seven sample tests. We’ll see in the few minutes.

14:23 - Kankuro has submitted 900 and now he is second in the scoreboard with all three problems complete. Some other contestants are still testing their solutions, but they seem not to pass the samples.

14:19 - the scoreboard is: ACRush, [[iwi]], Kankuro, mikhailOK, rng_58, nika, ainu7, unbing, LayCurse, pieguy, _Romka_, RAD.

14:18 - ACRush is the first to submit 900 for 551.77, [[iwi]] follows closely after him, but just for 383.96. It is enough to put him into second place, though.

14:13 - nika was one of the first to open 900, but now he has some five-dimensional dynamic programming uniformly filling his screen, seems that it is very hard to debug

14:13 - ainu7 has resubmitted his 600.

14:12 - rng_58 submits the medium. He’s in 4th place after ACRush, Kankuro and mikhailOK.

14:11 - ACRush finally compiled his solution for 900, but it did not pass the first sample

14:09 - RAD. submits the easy. Is there time still left for him to solve another problem? He opens the 600 with 15 minutes to go.

14:07 - ainu7 submits the medium for 305.75 - he still doesn’t have the easy. He opens the hard.

14:06 - finally milkhailOK was the third to submit his 600 scoring 276.88 points

14:04 - mikhailOK has passed large sample cases 5 and 6 in 600. He also has a function “getStupid” in his code which may be used in something like stress-tester.

14:03 - there seems to be not much difference caused by 600/900 instead of 500/1000, as all scores on 600 are low and it seems most likely that all scores on 900, if any, will be higher.

14:02 - mikhailOK is testing 600 and it seems to pass at least some non-trivial cases, may be we can expect submit from him after some time; most of others are still coding something.

13:57 - most of contestants working on 900 have just non-moving flashing cursor in their code, they might be thinking of something; though [[iwi]] is writing something like a testing-code.

13:57 - at this point, I think it should be pretty obvious to the contestants that many 600s and 900s may fail, so accuracy will be the deciding factor. I guess there are two main strategies - make sure you get one absolutely correct and forget about the second, or submit two hoping that at least one is correct :) Not many have the second strategy available, though.

13:54 - Kankuro was the second to submit 600 (337.64 points). He still continued to test it for some time, then finally opened 900.

13:54 - 30 minutes left.

13:53 - ACRush is writing his 900 pretty fast.

13:52 - [[iwi]]’s 900 works on some cases, but on sample case 2 it gives a negative answer.

13:51 - RAD. and ainu7 are the only contestants still to submit anything.

13:50 - it seems like at least three contestants are testing their 600, but it does not work even on small cases.

13:48 - the feeling is that the 600/900 in this round are completely about implementing an intuitively easy solution that is really hard to implement.

13:45 - ACRush submits the 600 after testing it for a very long time!

13:42 - Vitaly has suggested a nice implementation of 900 solution: you can compress coordinates, enumerating them from 1 to 300, make 300x300 matrix, then go from top to bottom, monitoring the current segment on the corresponding y-coordinate, expanding it when there is a point outside its current border of the segment, and shrinking the segment, if there is no point in the whole rectangle lying to the left from the left border of the segment (or to the right from the right border) and completely downside to the current y.

13:36 - but this seems to be fixable if handled carefully. Let’s build the “left border” and “right border”. If they don’t intersect, we’re done. Whenever they intersect, we may choose either path. Let’s try to formalize this :)

13:34 - however, the below solution for 900 has a problem: suppose we have points (0,9),(1,10),(9,0),(10,1). Then “minimal” paths from (1,10) to (10,1) and from (0,9) to (9,0) interfere. The correct solution in this case is to take cells (1,9) and (9,1) and an arbitrary path connecting them.

13:31 - it looks like nika abandoned the 600 and is working on the 900.

13:28 - the last problem doesn’t seem very difficult as well. Let’s consider the topmost point, the leftmost point, the bottommost point and the rightmost point. Then we just need to connect those in the following way: consider going from topmost point to leftmost point. It looks like this:
and so on. So we just need to find all points (x,y) such that there’re no other point (x’,y’) such that (x’>=x) and (y’>=y).

13:23 - [[iwi]] went straight from the 250 to the 900. It looks like a variant of convex hull, where the points are discretised. A set is convex if one of the shortest paths between A and B (for any A, B in the set) through the lattice grid is contained within the set. The return value is the area. The constraints on the given coordinates are quite large (~10^9). There are up to 300 input points.

13:20 - there’s lots of corner cases there, though.

13:18 - half of the contestants submitted 250, four of them are opened 600 but did not start coding

13:18 - and chains of odd lengths are basically counted by “layers”. Cells with x>=maxx-dx and y>=maxy-dy have chain of length 1, then chains with (x>=maxx-3*dx, y>=maxy-3*dy, and at least one of x
13:15 - the 600 seems easier than 250. All cells can be split into chains (by connecting (x, y) with (x+dx, y+dy) and (x-dx, y-dy)). For every chain that doesn’t have any coin, the answer is the size of the chain divided by 2 - either 0 or 1 cells in each chain will be unused. For chains that have coins, they are effectively split into smaller chains. So the only thing we need to find out is how many chains of odd length are there.

13:15 - ACRush submits the 250 for 223.21.

13:09 - nearly half of the contestants started coding, nika submitted his solution and gained 239.26 points

13:12 - nika has opened the 600. There is a chessboard (very big - up to 10^9 by 10^9) with some coins on it. You need to find out how many times you can place a coin on (x, y) and another on (x + dx, y + dy) such that you never place a coin on a square which already has a coin. dx and dy are given (also up to 10^9). The initial set of coins has up to 50 coins.

13:10 - column player can win in that situation because at her move, there are more columns than rows, so it’s always possible to choose one that doesn’t spoil the invariant.

13:09 - and question marks don’t add much - since the condition is independent for every row, it’s a pretty simple one to check even in the presence of question marks.

13:09 - nika gets the first submission on the 250.

13:07 - collectively, we’ve invented a solution: the column player wins if in each row, there is at least one cell of her color. The row player wins, if there is at least one row that is all of her color (or, in other words, if it is not true that each row there is either a cell of column player’s color or of the draw color).

13:05 - nika is starting to write solution, the others are still reading the statement

13:04 - matrix is 50x50. People don’t seem to know how to solve this yet.

13:02 - Everybody is starting with the easy problem.

13:01 - the easy problem is: given a NxN matrix with each cell colored with one of 3 colors, two players play the following game: 1st player crosses out a row, then 2nd player crosses out a column, and so on N-1 times until just one cell is left. If that cell is one color, the first player wins, the second color - the second player wins, the third color - a draw. Some of cells are ‘?’, meaning they are of each color with equal probability. What is the probability that each player wins?

12:59 - feel free to ask questions in Arena’s Chat Room 1.

12:58 - Contestants are finally at their positions. Best of luck to them!

12:57 - minus 3 minutes.

12:55 - Looks like there are 250, 600 and 900 point problems.

12:54 - Competitors are standing in the line preparing to be called to the stage. Intro is beginning.

12:49 - During setting up, people write often some well-known algorithms. One of most popular algorithms during this time is min-cost flow. Some people just write templates or solve practice

12:44 - Petr, Vitaliy and KOTEHOK writing this commentary today. bmerry might also offer some insight.

12:38 - People are setting up their workstation and writing solution templates.

Saturday, July 9, 2011

Integral bounded knapsack problem solvable in O(knapsack_size*num_of_items)

The bounded knapsack problem is: you are given n types of items, you have ui items of i-th type, and each item of i-th type weighs wi and costs ci. What is the maximal cost you can get by picking some items weighing at most W in total?

This problem is apparently reasonably popular, and there are many articles about it. However, most of those seem to be interested in approximate solutions or performance on random inputs. But what if we want to treat it as a usual algorithm problem - what is the best possible worst-case running time for solving it?

The best algorithm I could find on the Internet has complexity O(W*n*log(max(ui)). It goes like this: instead of having ui items of type i, we create several new types that are multiples of type i, for example items with weight 2*wi and cost 2*ci, then the same for 4 and so on, and declare that we have just one item of each type. We create the new types in such a way that the number of new types is logarithmic, and anything that was possible to represent using the old items is also representable using the new items and vice versa. We get a 0-1 knapsack problem with n*log(max(ui) types which leads to a dynamic programming solution with the above complexity.

However, when this problem was given at a Codeforces contest yesterday, several people came up with a O(W*n) solution for this problem. First, we start with the standard dynamic programming solution: let dpk,w be the best cost that we can get by packing the total weight of w using the first k item types. Each new type is then handled as follows: dpk,w=min(dpk-1,w, dpk-1,w-wk+ck, ..., dpk-1,w-uk*wk+uk*ck). This dynamic programming has O(W*n) states, and each state is processed in O(max(ui)).

But we can process each state in O(1) amortized time! Let's take a look at the above recurrence. First, we notice that we can separate all values of w into wk groups, based on the remainder of division on wk, and those groups can be handled separately. Then, for each group, the problem we need to solve is to find min(ai, ai-1+c, ai-2+2*c, ..., ai-k+k*c). By setting bi=ai-i*c, this expression is transformed into min(bi+i*c,bi-1+(i-1)*c+c, ...), which is just i*c+min(bi, bi-1, ..., bi-k). Thus our problem is reduced to finding minimums of groups of k+1 consecutive numbers in a given array.

And this, in turn, is a well-known problem that is solvable in O(size of array) using one of the two methods: we can either maintain a sequence of incremental (from the end) minima for a segment of size k+1 and update it quickly when we shift one position to the right, or we can just separate the entire array into blocks of size k+1, and calculate the prefix and suffix minima for each block - this allows to find the minimum for any block of size k+1 by splitting it into two blocks with precomputed answers.

Is it well-known that bounded knapsack is solvable in O(W*n) worst-case?

Sunday, June 5, 2011

ACM ICPC 2011 World Finals - prizes for first solution for each problem

Did you know that they gave $1500 to the team with the first accepted solution, and $1050 for every other team that solved one of the problems before others in the ACM ICPC 2011 World Finals? You can find all teams that got these prizes at the bottom of the Official Results.

People who went to the World Finals awards ceremony obviously do, but I think this hasn't been mentioned online. What do you think - does this make sense?

Thursday, June 2, 2011

Binary search and eps in comparisons

The usual wisdom about floating-point comparisons inside binary search is to make comparisons exact, to avoid losing precision (as opposed to comparison using some error tolerance denoted as eps). For example this:

for (int step = 0; step < 100; ++step) {
  double middle = (left + right) / 2;
  if (f(middle) > 0) right = middle; else left = middle;

is preferable to

for (int step = 0; step < 100; ++step) {
  double middle = (left + right) / 2;
  if (f(middle) > eps) right = middle; else left = middle;

when f is a monotonically increasing function, because even with small eps, there's a danger that the corresponding error in the binary search parameter will be much bigger. On the other hand, even if our comparison is incorrect for equal values due to rounding errors, the binary search will still converge correctly since equal values may only appear in one point and everything will be correct in points very close to it.

But all above is true only if f is monotonically increasing! If it is just non-decreasing, and has a plateau, going without eps gets you hugely incorrect results. This bit me today in the hard problem of SRM 508: my failed solution that passes after adding error tolerance to the last comparison in enough() (TopCoder login required).

I guess the lesson is that when trying to be too clever (instead if adding eps to all comparisons), one needs to be extra careful :-)

Tuesday, May 31, 2011

Reflections on ACM ICPC 2011 World Finals Problems

I've taken some time to reflect on the World Finals problems - hopefully that will be interesting for you to read. If you were not following the World Finals closely, you should probably skip this post :)

The problems are at, the unofficial solutions from Per at

I've really liked the problemset, so please don't take my words below as criticism - I've just tried to look at the problems from different angles and classify them in different ways, and maybe suggest some improvements for the future.

Problem A - To Add or to Multiply - ad-hoc (number theory?) (by ad-hoc I mean "need creative thinking to solve but no necessary prior algorithm/mathematics knowledge or experience solving similar problems"), most difficulty in corner cases and finding the lexicographically smallest answer.
Problem B - Affine Mess - exhaustive search (do what's described in the problem statement) + solving a system of 2 linear equations, most difficulty in corner cases.
Problem C - Ancient Messages - ad-hoc, out-of-format problem, as there's no precise definition of valid input.
Problem D - Chips Challenge - maximum flow (reduction to flow is the most difficult part).
Problem E - Coffee Central - exhaustive search (do what's described in the problem statement) + the набившая оскомину idea about using inclusion-exclusion for rectangle sums. Most difficulty in finding the lexicographically smallest answer.
Problem F - Machine Works - DP plus incremental convex hull within one quadrant. The key to producing NlogN solution is the somewhat well-known idea about incremental convex hull to find the maximum of a set of linear functions. Most difficulty is inventing this speedup (if not known in advance) and implementing incremental convex hull.
Problem G - Magic Sticks - ad-hoc, geometry. The solution has two main parts - inventing and then proving (or believing) the key mathematical fact about throwing away the longest edge, and solving a sub-problem where you have to maximize the area of the polygon with the given sides. The sub-problem is again quite well-known and was a significant advantage to the teams who saw it before. However, I believe the most difficult part was the longest edge fact. There was also a catch about one edge taking more than Pi angle which many teams forgot to account for.
Problem H - Mining Your Own Business - graph theory, ad-hoc. The most mathematically beautiful problem of the contest in my opinion. The main difficulty is constructing the "if and only if" condition for the set of escape shafts.
Problem I - Mummy Madness - interval trees (or priority queues), ad-hoc. The main difficulty is realizing (a.k.a proving or believing) the problem can be reduced to checking if a square is completely covered by a set of squares. The NlogN solution to that sub-problem is very well-known and should be very easy for top teams.
Problem J - Pyramids - quite straightforward DP (backtracking should also work). I don't see any significant difficulty at all here, but if I were to choose the main difficulty, it would be building the lexicographically largest answer in the DP solution.
Problem K - Trash Removal - a well-known problem. Solvable in NlogN using rotating calipers, but since the constraints were so low, that solution is an overkill. The only difficulty with such small constraints was to realize that the line should be parallel to the line connecting some pair of vertices. The main difficulty is the geometric formulas, but those are very easy as well.

Problem H was really standing out from the problemset, in my view, because it required graph theory thinking but at the same time involved no heavyweight algorithms or implementation difficulties. I would love to see more problems like this, but they are very hard to come up with. Problem C is unusual (so one might argue it gets close to the boundary of teams might reasonably expect) but still good. Problems E, F, I, J, K are "professional" problems - the most difficult part of each lies in a "standard set of tools" that most competitive programmers hone from contest to contest. This is not a negative thing - those ideas form the standard set of tools exactly because they're most useful "building blocks" of various algorithms. Problem K may be too well-known though. Problem G is borderline as it combines a beautiful ad-hoc part with quite difficult algorithm that might be well-known to some teams but not others. Problem D is borderline (but still very good in my opinion) because the "reduce to maximum flow" trickery can also be viewed as a part of "standard set of tools", and thus the problem is quite "professional" too. Problems A, E, J require relatively little thought, and have main difficulty in figuring out the corner cases and careful implementation, which is not exciting, and I'd say 3 such problems is too much (but maybe I'm too subjective after several teams I've supported got buried in those issues).

To spice up the long text, here's my picture with the World Champions:

Please share your thoughts :)

Monday, May 30, 2011

ACM ICPC 2011 World Finals

Here's my (hopefully) live commentary from the ACM ICPC 2011 World Finals. This post will be updated as the contest goes on.

8:31 - there's a big countdown screen that shows 28 minutes before the start of the contest. The rumor has it that the Peabody ducks will walk on the contest floor before the contestants :)

8:38 - 21 minutes before the contest start, the teams are coming in, right after 5 ducks.

8:43 - 16 minutes before, everyone is seated. I guess it's now up to Bill to entertain everybody.

8:50 - 9 minutes before, Bill is trying hard.

8:53 - to Adrian: I would expect after about 1 hour of the contest. We might try to publish the photos if we get them earlier :)

8:53 - 1 minute before! It looks like they decided to skip some minutes.

8:55 - the contest has started! It looks like there's 11 colored balloons.

8:56 - unlike yesterday, the scoreboard on the big screen is working. Teams are sorted alphabetically for now.

8:58 - no problem statements for spectators yet.

8:59 - no submits yet.

9:02 - live scoreboard from Kattis appears at

9:05 - we got the problem statements from Tatyana Churina!

9:13 - SPbSU and SPbSU ITMO solved K! No other submissions were visible before the scoreboard went down.

9:14 - according to zibada's chat, Perm solved K as well.

9:19 - photos of problem statements are being uploaded to Picasa. Sorry for the poor quality, hopefully it's still readable.

9:22 - 13 teams have solved K, no other submits. Time to look at the problem statements…

9:24 - problem K is classical: find the two parallel lines closest to each other that will fit the given polygon (maybe rotated) between them. It's not hard to see that the line must be parallel to the line connecting some pair of vertices, so you can just try them all. For even faster solution, we should just check adjacent vertices of the convex hull.

9:28 - problem C is quite straightforward as well. You are given 6 pictures of hieroglyphs, and need to write an OCR that will recognize them. The constraints say that the hieroglyphs in input may be distorted by the shape will be "topologically equivalent" to one of the 6 given pictures. The pictures are chosen such that one can just check the number of connected components of the white pixels (kudos to andrewzta for noticing that).

9:30 - meanwhile, there are lots of teams that solved 1 problem. All solved K except Tsinghua, who solved C, and Shanghai, who solved E.

9:33 - problem E is: given at most 500K points on a 1000x1000 grid, and at most 20 queries X, find the X by X square in (x+y, x-y) coordinates that has the most points inside.

9:34 - the constraints are so low that I believe the straightforward solution should pass: using inclusion-exclusion we can find the number of points in each square in O(1), then we can just check 20 million squares.

9:36 - it looks like official problem statements are up at I hope you've enjoyed my photos in the meantime :)

9:39 - problem J, which several teams have attempted but failed, is: given a number C, represent it as a sum of numbers of form a_i=i*(i+1)*(2i+1)/6, b_i=4*i*(i+1)*(2i+1)/6, c_i=a_(2i+1)-b_i (sums of squares, sums of even squares, sums of odd squares). Each a_i, b_i, c_i may be used at most once. If there are several representations, choose the one with the smallest number of addends. If there are still several ones, pick the lexicographically largest one.

9:43 - meanwhile, three teams have 2 problems: SPbSU and SaratovSU have J and K, Tsinghua has C and K.

9:44 - 6 teams with 2 problems now: NNSU and Hangzhou have E+K, Taurida has J+K.

9:45 - I would expect that simple backtracking (trying all possible choices for the next number in decreasing order, and truncating the search when current best answer is less than remaining sum divided by the maximum possible value for the rest) works for J, since the numbers are quite dense and thus the be split will always be found very quickly.

9:48 - 3 problems solved for Tsinghua, C+J+K.

9:50 - problem A, which is the only remaining problem that has attempts on it, is the following: given two numbers a and m, and two segments [p, q], [r, s], one must create such sequence of operations +a and *m that will transform every number in [p, q] into some number in [r, s].

9:51 - moreover, one should output the program of minimum length, and smallest lexicographically among those with minimum length.

9:59 - here's one idea: even if we add before we multiply, we still add a multiple of a. So in order to check what is the minimal number of steps required to transform a given segment [q, w] to become a segment inside [r, s], we can just iterate over how many times k we multiply, check that there's a multiple k*a of a that, after being added, will move the result inside [r, s], and the number of steps will be the number of multiplications plus the sum of digits in k when written in m-ary system (roughly). So in order to solve this we must solve a sub=problem: given a segment [l, r], what is the number with the smallest sum of digits (in m-ary system) inside that segment? That is a standard problem that is solved by considering the number of digits of that number that coincides with corresponding digits of l (or r).

10:04 - The above solution yields the lexicographically smallest representation as well if implemented carefully. This looks quite tricky to implement, though. KAIST is the only team to solve the problem.

10:06 - The top of the scoreboard is: Saratov, Zheijang, NNSU, Taurida, Tsinghua - all with 3 problems.

10:10 - Problem G is: you are given N segments connected in one row via flexible joints, and you need to assemble one or more polygons in such a way that every segment is a side of at most one polygon and the segments are not allowed to intersect except at endpoints. Your goal is to maximize the total area of the polygons.

10:14 - Tsinghua got 4! A+C+J+K.

10:16 - There are at most 500 segments in problem G. I guess the first question is does it ever make sense to form more than one polygon?

10:19 - Yes, obviously it does. Suppose one middle segment is much, much longer than the others so no polygon can be formed using it. In this case, the segments to the left and to the right of this segment should be treated independently. We can then build similar examples in each part to have the best answer include even more polygons.

10:21 - What doesn't make sense is to have one polygon have non-consecutive set of segments as its sides - we obviously can increase the total area by joining this polygon with the adjacent one then. So it seems that the solution should be: for every [l, r] part of the given sequence of segments, find the best polygon (this is a standard problem). Then, run a standard DP to find the best total area.

10:24 - The only trick I guess is to solve the "standard problem" for so many cases. The standard solution, as Google tells us, is that the optimal polygon with the given sides is a polygon inscribed in a circle, and we can find the radius of the circle by binary search. But that gives a running time of O(n^3*log(precision)) which may be too big.

10:30 - I would expect that it is fast enough, since n^3 is divided by 6 (the number of segments is n^2/2, and the average length of one is n/3). With the super-fast computers, that should be fast enough.

10:32 - Three teams have 4 problems: Zhejiang, Taurida, Tsinghua.

10:48 - problem H is: given a graph, you need to mark the smallest number of its vertices in such a way that after removal of any vertex, there's a path from any vertex to some marked vertex. Unless I'm missing some corner case, this is reasonably straightforward (suggested by Burunduk1): first, we find the tree of biconnected components of the graph, then we choose one vertex in each of the leaves of this tree (in each biconnected component that is a leaf, we must choose a vertex that is not the articulation point that connects this component to the rest of the graph). In case the entire graph is biconnected, we must mark any two different vertices. This is obviously less than the optimal answer since all those marks are required (what happens if we disconnect a leave of the biconnected component tree?), and it's not hard to see they are sufficient.

10:48 - SPbSU, NNSU, Warsaw and Ural also have 4 problems solved.

10:56 - as discussed offline, I expect the winner to solve 8 or 9 problems.

11:02 - problem D: you are given a 40x40 grid of ".", "/" and "C" letters. Your goal is to replace as many "."s by "W"s as possible in such a way that: the total number of "C"s and "W"s in each row and column is at most the given fraction of total number of "C"s and "W"s, and the total number of "C"s and "W"s in the first row is equal to the total number of "C"s and "W"s in the first column, the same for second row and second column, and so on.

11:14 - NNSU has 5.

11:18 - problem F: you are given 100000 machines, each described by the day it appears on (D_i), the price to buy it (P_i), the money you get when you get rid of it (R_i), and the profit it brings on each day (G_i). You may own at most one machine at a time, you can get rid of an old machine and buy a new one in one day, a machine brings profit only on days between the one when you bought it and the one when you got rid of it (exclusive). You can only buy a machine exactly on the day it appears on, and you start with the given amount of money C. What is the maximum possible profit?

11:21 - this actually looks solvable: first, let's start with straightforward N^2 DP: we solve the problem "what is the most money we can earn from the previous machines if we buy k-th machine on the day it appears" by iterating over which machine will we buy before k-th. Then, we will optimize the solution to work in O(N*polylog(N)). More specifically, suppose the previous machine is m-th. Then the maximum profit to get before buying k-th machine is ans[m]-P_m+R_m+G_m*(D_k-D_m-1), and we can only do this if ans[m]+C>=P_m. The second condition doesn't depend on k at all, and the first only depends on D_k. So essentially when choosing the best m for the given k we're choosing the highest among several linear functions of D_k. We can find that quickly, for example, by keeping track of the convex hull of the existing functions.

11:34 - meanwhile, Zhejiang, Tsinghua and Saratov also got 5.

11:36 - and Tsinghua got 6.

11:49 - problem B is: you are given integer coordinates of 3 points before and after the following transformation: first, you rotate the plane such that Ox axis passes through an integer point on the border of [-10,10]x[-10,10] square. Then, you round the coordinates (up in absolute value if the fractional part is 0.5). Then, you multiply all coordinates in each axis by a non-zero integer (different for different axes). And then, you add an integer constant to all coordinates in each axis (different for different axes). You're asked whether there exists such transformation for the 3 given points, and if yes, whether those transformations transform each integer point in the same way. It looks like one can just iterate over all possible rotations, then find scaling and translation numbers based on linear equations, and then check the resulting transformation. The biggest difficulty I guess would be handling corner cases where all points have the same coordinates and suchlike.

11:50 - meanwhile, Zhejiang also got 6 and is in second place.

11:56 - SPbSU solves 6 and moves into second place.

11:58 - problem I is: you are standing in a cell of an infinite plane, and you have 100000 mummies going after you. Each mummy stands in some cell. You make moves in turn: first, you can move to any 8-adjacent cell. Then, each mummy moves to a 8-adjacent cell that is closest to you. You lose if at any point a mummy is in the same cell as you. What is the maximal number of moves you can survive?

12:02 - Moscow solves 5th problem but it's G! Now go for the easy ones! Saratov got 6.

12:32 - had a lunch break. Lots of teams with 6, teams with "standard" set of problems have better penalty time.

12:34 - and Tsinghua solves F. I didn't expect them fixing the problem so quickly.

12:47 - problem I solution together with FedorTsarev: first, let's assume that once the mummy catches us, it follows all our moves. It means that we need to find the largest time X such that we can be in a point distinct from all mummies at time X, and that can be done by binary search. To check a specific X, we believe the following proposition works: we take a 2X+1 times 2X+1 square with center in our position, and check if it's completely covered by 2X+1 times 2X+1 squares centered at mummies'. This relies on assumption that the way the mummies move in the problem statement is "optimal", but this is kind of obvious, since their move always provide strictly optimal distance to us both by x and by y. And checking if a square is covered by other squares can be done in O(NlogN) using the sweeping line method and keeping an interval tree that stores how many times each cell is covered.

12:47 - meanwhile, 1 team has 7 problems and 9 teams have 6 problems.

12:56 - the standings are frozen. The only problem for me to solve is D. I'm going to the contest room to watch the balloons brought to the teams.

13:13 - three teams in the 7/6 group have submitted problems, but none of them seems to have the corresponding balloon, so it looks those were wrong submissions.

13:20 - still no new top teams with non-standard balloon colors. Maybe they've stopped delivering them?..

13:30 - it looks like new balloons are indeed not delivered to top teams at all. Warsaw team were congratulating each other twice, so I assume they have 7.

13:32 - it looks like Moscow got A!

13:43 - no further updates. The only way to figure something out is to look at team's exctited/disappointed reaction.

13:45 - or maybe Warsaw actually has 6?

13:55 - the contest is over! Still no news.

13:59 - Saratov has 7.

14:08 - Taurida has 6.

14:15 - SPbSU, Saratov, Moscow, Nizhny Novgorod, FAU, Donetsk, Ural, Krakow all have 7.

14:17 - Zhejiang has 8! Warsaw and ITMO have 6.

14:20 - Waterloo has 7!

14:21 - Michigan has at least 7!

14:23 - most likely results: Zhejiang, Tsinghua, SpbSU, Nizhny, Saratov, FAU, Donetsk, Krakow, Moscow, Michigan, Ural, Waterloo.

14:29 - no, it's not! Zhejiang - 8, Michigan - 8, Tsinghua, SpbSU, Nizhny, Saratov, FAU, Donetsk, Krakow, Moscow, Ural, Waterloo - all 7.

14:40 - that's it. I'll probably not update this post anymore, unless some interesting comments surface. Thanks for following my blog today!

ACM ICPC 2011 World Finals - pre-contest

ACM ICPC 2011 World Finals are happening these days in Orlando. Tomorrow, as usual, I will be doing live commentary on the tasks and on the contest itself. Today, about 9 hours before the start of the contest, I write about what happened before the contest - using style borrowed from Mike's notes.

On Friday, we went to see the ocean and the Kennedy Space Center, where most American space launches happen. The space center was very disappointing - it looks like the only reason to go there would be to view an actual launch (and the next launch is in July), that should be very impressive and maybe even educational. What we found there was:

Real-size models of most American rockets, Russian ships and the Shuttle:

Space-themed and not-so-space themed attractions and fountains (really nice in 30C heat):

The ocean is nice and warm, but comes with some jellyfish bites :)

On Saturday, the organizers took us to SeaWorld resort. The resort welcomes you with a traffic jam on the parking entrance:

They started with an IBM history recap given on a stadium where the killer whale show (Shamu) takes place. Not everyone was excited:

Then we were given lots of time to explore SeaWorld, which was quite exciting. The most exciting part was the dolphin show, but I'm yet to download it from the camera, so here goes the still part of SeaWorld:

On Saturday, the Opening Ceremony and two Practice Sessions took place. Here's how the contest area feels:

And here's how the teams look:

Good luck to all teams, and especially to the team in the last picture - and stay tuned for the live commentary tomorrow right in this blog! The contest starts around 9am local time (click the link for other timezones).

Saturday, January 15, 2011

Solution to the alternating jumps problem

Let me remind about the problem: we alternate jumps to the left and to the right, possible jumps to the left are given as set A, possible jumps to the right are given to set B. Is any integer point reachable?

When we first approach this problem, the alternating jumps condition restricts our thinking, so the natural move is to get rid of it. How? Let's pair up the jumps to the left and to the right. Then we have a new set of possible jumps C = {b - a, for b from B and a from A} that consists of all differences between B and A. We can then jump by numbers from C arbitrarily, so the sub-problem that we need to solve is:

Given a set C of integers, which integer points are reachable if we jump by numbers from C starting from 0?

This is where you have to apply the previous knowledge. More specifically, the fact we're interested in is: given a set of integers C = {c1, c2, ...}, their greatest common divisor g (the largest positive integer that divides every number in C) is representable as g = q1c1 + q2c2 + ... where q1, q2 and so on are integers.

This fact is about what we need to solve our sub-problem. First, we note that every integer point that is reachable by jumping by numbers from C is divisible by g (as a sum of several numbers divisible by g). What we hope for is that the reverse is true as well: every point divisible by g is reachable.

The above representation of g = q1c1 + q2c2 + ... almost gives us the strategy to reach point g: each qi means the number of times we jump by the corresponding amount ci. The only problem that some qi may be negative, and we can't do a jump minus one times.

Now the question is: take one number c from C. How do we jump by -c? If we can do that, we can use the above formula to reach point g. Let's consider the case where c is positive. Here's where we need another non-trivial idea: let's check if there's any negative number if C. If not (all numbers in C are non-negative), it's obvious that we can't reach -c (or any other negative number). However, if there's a negative number -d in C, we can reach -c by doing the -d jump c times, then doing the c jump d-1 times: -d*c+c*(d-1)=-c.

In case c is negative, a similar argument shows that if there's a positive number in C, then we can reach -c, otherwise we can't reach any positive number.

Let's summarize the result that we've arrived at. If C has at least one positive number and at least one negative number, then we can reach point g. We can reach point -g in the same manner. But then we can reach any point divisible by g by repeating the sequence!

Having (almost) solved the sub-problem, we return to the main problem. The solutions for the sub-problem correspond to the points that we can reach after an even number of steps. What about points reachable after an odd number of steps?

First, consider the case where C is "bad" - has only non-negative or only non-positive numbers. We know that not all points are reachable via even number of steps. Moreover, we know that a lot of points are not reachable via even number of steps: all positive points or all negative points. But then it's obvious that not all points are reachable via odd number of steps as well: if we'll always be at a non-negative point after an even number of steps, one more step can only bring us this far into the negative half of the line - so all numbers below a certain value (minus maximal element of A) are not reachable, and the answer to the main problem is "NO".

Then, consider the case where C is "good". We know that every number divisible by g is reachable via an even number of steps. What does one more step (subtracting a number from A) change? We can reach every point that has the same remainder of division by g as minus number from A. But what remainders of division by g can numbers from A have?

Since g = gcd(a1-b1, a2-b1, ...), then a1-b1 and a2-b1 are divisible by g. But then their difference, which is a1-b1-a2+b1=a1-a2, is divisible by g. Similarly, the difference between any two numbers in A is divisible by g. So all numbers from A have the same remainder x of division by g!

We've finally summarized all numbers that are reachable: those divisible by g and those that have the remainder of g-x of division by g. When does that cover all numbers? When g equals 1, or when g equals 2 and x equals 1. With larger g, some remainders are not covered. With g equal to 2 and x equals 0, odd numbers are not covered.

We're almost there - here's what we have now. When all number from C have the same sign, the answer is "NO". Otherwise, if g is the greatest common divisor of C, then the answer is "YES" if and only if g equals 1 or g equals 2 and a1 is odd.

The only problem is that C may have too many elements: when A and B have 10000 elements each, C may have as many as 100 million! The last step is to notice that we don't need to find C. All we need to know is:
- does C have a positive number?
- does C have a negative number?
- what is the greatest common divisor of all numbers in C?

The first question (and the second one similarly) is answered easily. C has a positive number if and only if the largest element of B is larger than the smallest element of A.

And the last question can be answered using the trick mentioned above. Consider an arbitrary element of  C: ai-bj. Since we already know that differences between numbers in A (or numbers in B) are divisible by g, we can represent this number as the sum of three numbers divisible by g: ai-bj=(ai-a1)+(a1-b1)+(b1-bj). That allows us to consider the greatest common divisor g1 of the following set C1 of numbers: a1-b1, ai-a1 for all i, and b1-bj for all j, and prove that g1 equals g. This is easy: g divides g1 since g1 is the greatest common divisor of numbers that are differences of numbers from C. But g1 divides g for the same reason: every number in C is a sum of three numbers from C1! So g1 equals g, and we can find g as the greatest common divisor of C1 which has at most 20000 elements.

Phew! This was a long solution, not because it's complicated, but because it has many steps that have to be carefully executed. Each particular step is either obvious or represents a relatively easy mathematical fact. However, in order to successfully walk the entire path and not make any mistake along the way (take a look at the contest results: the problem in question is problem A) you have to be really careful and have a good understanding of where you are and what is left to prove. That's why I said this problem is really about formal reasoning skills.

To spice up the long textual post, here's a picture of our Moscow State University team at the ACM ICPC 2003 World Finals in Beverly Hills (by David Hill, from the official World Finals photo archive):