A lot of sites offer programmatic access to their content via API’s. The main advantage for the producer of this content is to be able to control finely what they export and how they export it (and also avoid being scraped) while clients receive the data they need in a structured and documented way. These API’s are usually strictly monitored to make sure that clients can’t abuse them. For example, the producer will typically limit how often clients can call a certain API, how much data they can transfer every minute, etc…
Today’s coding challenge is based on these requirements.
We want to make multiple calls to a site that only allows to use a key 500 times every 10mn. You are given a collection of keys (strings), a max count (e.g. 500) and a time period (e.g. 10mn) and your task is to implement getNextKey() as follows:
public class SlidingWindowMap { public SlidingWindowMap(Set<String> keys, int maxCount, long periodMs) { // ... } /** * @return a key that has been used less than `maxCount` times during the * past `periodMs` milliseconds or null if no such key exists. */ public String getNextKey() { // Your brilliant solution goes here } }
I’m keeping the statement of the problem as open as possible to encourage exploration, but feel free to ask for clarifications if something is not clear. Please post your solution on a pastebin/gist like site (or wherever your code will be easy to read) and link to it in your comment. All languages are welcome.
#1 by Nwanda on September 2, 2012 - 6:34 pm
http://pastebin.com/9WP4cdch
#2 by Rogach on September 2, 2012 - 6:56 pm
Quick and not-very-efficient solution in Scala:
https://gist.github.com/3606621
#3 by Jim White on September 2, 2012 - 8:54 pm
Untested Groovy version:
https://gist.github.com/3607071
#4 by Rogach on September 2, 2012 - 9:38 pm
@Jim White:
Consider the following scenario: you have only one key (“a”), and the limit is 10. It seems that after the system would request 10 keys, your implementation will stop returning the key, despite the time limit could be already exceeded. Am I correct?
#5 by Jim White on September 2, 2012 - 9:42 pm
Yeah, I realized I totally misunderstood the task. What I think is a correct version is there now (it uses both a PriorityQueue and an ArrayQueue).
#6 by Dimitar Dimitrov on September 3, 2012 - 1:39 am
Not quite sure I got the requirement (i.e. in what order shall we return the key less than N times?, when is a key released?)
Anyway, here is an attempt in Groovy – no idioms have been used, so Java translation should e straightforward https://gist.github.com/3608443
#7 by Weeble on September 3, 2012 - 5:20 am
Hmmm. Two observations:
1. It doesn’t seem like it matters which order we dole the keys out. Suppose there are four keys, [A, B, C, D]. You could give out A 500 times, then B 500 times, etc. Or you could give out A, B, C, D, A, B, C, D… cycling through them all 500 times. Either way you do it, you can dole out 2000 keys, and from then on you repeat from the start, and can only dole out a key 2000*(k+1)+n if 10 minutes have passed since you doled out key 2000*k+n. I think the pool of keys is just a distraction – solving the problem for one key with maxCount=2000 is equivalent to solving for four keys with maxCount=500 or 2000 keys with maxCount=1.
2. If you want 100% precision, it does not seem like you can get around storing the timestamp of when you fulfilled all of the last 2000 (=keys * limit) requests. You could reconstruct this information just by polling getNextKey fast enough, so it must be held in there. However, it seems possible there may be a trade off, allowing occasional false negatives (returning null when technically there is a free key) in return for reduced storage.
#8 by pseudon mousie on September 3, 2012 - 7:37 am
http://pastebin.com/iiAjfC3V
ruby. ignores such petty concerns as “input validation” and “efficiency”!
#9 by pseudon mousie on September 3, 2012 - 7:52 am
oh hey, discussion showed up when I submitted my comment! Cool.
@weeble : ” I think the pool of keys is just a distraction solving the problem for one key with maxCount=2000 is equivalent to solving for four keys with maxCount=500 or 2000 keys with maxCount=1.”
That’s how I treated it; didn’t worry about preserving order on the keys when I retrieved them for checking.
#10 by Jim White on September 3, 2012 - 11:42 am
Thanks to Dimitar’s solution I realized that this is actually super simple. Just duplicate each key maxCount times and use the oldest one if its earliest usage is out of the window.
https://gist.github.com/3613264
#11 by Jim White on September 3, 2012 - 11:52 am
That also makes randomizing the order of the keys trivial. Just use a random number for the initial timestamp value.
#12 by Sony Mathew on September 4, 2012 - 6:04 pm
Please delete the previous comment (and this comment) as it has corporate information.
#13 by Sony Mathew on September 4, 2012 - 6:05 pm
http://pastebin.com/n75TcNJX
#14 by Martin on September 4, 2012 - 6:38 pm
Why all the randomness? I must be missing something. Here’s a python attempt
https://gist.github.com/3629939
#15 by Jérôme on September 4, 2012 - 7:23 pm
A attempt with old style java using a round robin array as a window. Simple and should be efficient.
https://gist.github.com/3630300
#16 by Matthew on September 5, 2012 - 1:55 am
Throttled and non-throttled simple approaches in Java – http://pastebin.com/YHnGFymP
#17 by Sony Mathew on September 6, 2012 - 5:48 pm
Fixed an un-optimized loop (with an early break).
http://pastebin.com/328R0sq9
#18 by Hermann on September 7, 2012 - 8:02 am
My previous post was not saved. Here it is again
http://pastebin.com/W66PMzYz
http://pastebin.com/fxZ72TGb
#19 by Weeble on September 9, 2012 - 2:05 pm
I’ve looked at the space/precision trade-off I posited above, and written up my solution here:
http://clockworkcodex.blogspot.co.uk/2012/09/key-usage-coding-challenge.html
#20 by Rafael Naufal on September 16, 2012 - 10:51 am
Cedric, my latest comment was wrong. Can you delete it, pls? This is the new one:
New Java version: https://gist.github.com/3733384
-
http://rafaelnaufal.com
#21 by Adrian Smith on September 23, 2012 - 1:47 am
http://pastebin.com/D4hdHZcy
#22 by Thierry Kormann on October 4, 2012 - 4:57 am
@cedric: please delete #24. It sucks. I resign and think my attempt in #23 is the right one. I have added two comments to explain the idea. Sorry for the poor naming and flood…
http://pastebin.com/Aifx4nzJ
#23 by Stanislav Kurilin on November 7, 2012 - 1:21 am
Like this https://gist.github.com/4030723 ?