Hacker News | possiblywrong's comments

Following is the closed form solution linked in the article (from a since-deleted Reddit comment):

    from functools import reduce

    def recurrence(ops, a0, n):
        # Apply one textual operation (e.g. '*3' or '+2') to x.
        def transform(x, op):
            return eval(repr(x) + op)
        ops = ops.split()
        # m: product of the multiplicative factors; b: net additive offset
        # of one full pass through the operations.
        m = reduce(transform, [op for op in ops if op[0] in ('*', '/')], 1)
        b = reduce(transform, ops, 0)
        for k in range(n + 1):
            # Closed form a_k = a0*m^k + b*(m^k - 1)/(m - 1), degenerating
            # to a0 + b*k when m == 1.
            a_k = a0 * m ** k + (b * (m ** k - 1) / (m - 1) if m != 1 else b * k)
            print('Term', k, ':', a_k)

> This is really only interesting if a particular (potentially really large) term of the sequence is desired, as opposed to all terms up to a point. The key observation is that any sequence of the given set of operations reduces to a linear recurrence which has the given solution.
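A quick sanity check of the closed form against direct iteration (the op string and starting value below are just an arbitrary example):

```python
from functools import reduce

def step(x, ops):
    # One full pass of the textual operations applied to x.
    return reduce(lambda v, op: eval(repr(v) + op), ops.split(), x)

# Example: each step is a_{n+1} = 2*a_n + 3, so m = 2 and b = 3.
ops, a0, m, b = '*2 +3', 5, 2, 3
a, k = a0, 4
for _ in range(k):
    a = step(a, ops)           # 5 -> 13 -> 29 -> 61 -> 125
assert a == 125
assert a == a0 * m**k + b * (m**k - 1) / (m - 1)
```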


It's interesting that the two forward versions of the algorithm were added to Wikipedia just a few months ago. (The OP article is from 2020.)


> Second, the least significant bits that come from this generator are biased.

I don't think this is true; I'm guessing that the author borrowed this observation from one of the various other articles linked in this thread, which address the situation where we draw a 64-bit random unsigned integer and divide by `1<<64`, instead of drawing 53 bits and dividing by `1<<53`. The former does introduce a bias, but the latter doesn't (though it does restrict the output to a small fraction of the representable values in [0,1)).
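To make the distinction concrete (Python's own `random.random()` uses the 53-bit construction): drawing 53 bits yields exactly representable multiples of 2^-53 with no rounding, while dividing a 64-bit draw by 2^64 must round the quotient, and can even round all the way up to 1.0:

```python
import random

def u53(rng):
    # Draw 53 bits and scale: every output is exactly k / 2**53 for an
    # integer k, no rounding, each value equally likely -- no bias.
    return rng.getrandbits(53) * 2.0**-53

def u64(rng):
    # Draw 64 bits and divide by 2**64: the quotient must be rounded to
    # the nearest double, which is where the bias creeps in.
    return rng.getrandbits(64) / 2**64

rng = random.Random(12345)
xs = [u53(rng) for _ in range(1000)]
assert all(x * 2**53 == int(x * 2**53) and 0.0 <= x < 1.0 for x in xs)

# The 64-bit version can even round up to 1.0:
assert (2**64 - 1) / 2**64 == 1.0
```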


I'm not sure which PRNG is used here, but some PRNGs have regularities in the lower bits.

This is e.g. the case for classical LCGs and the xoroshiro*+ family of PRNGs:

> If, however, one has to generate only 64-bit floating-point numbers (by extracting the upper 53 bits) xoshiro256+ is a slightly (≈15%) faster generator with analogous statistical properties. For general usage, one has to consider that its lowest bits have low linear complexity and will fail linearity tests (https://prng.di.unimi.it/)

> When m = 2^k there is a further problem. The period of the b-th bit of an LCG (where bits are numbered from the right, starting at one) is 2^b; thus although the period is 2^k, only the high bits are good and the lower-order bits exhibit a clear repeating pattern. (https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf)
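A small illustration of the LCG case, using the classic textbook multiplier/increment with a power-of-two modulus (any m = 2^k LCG shows the same low-bit flaw):

```python
def lcg_bits(bit, n, seed=1):
    # LCG with power-of-two modulus m = 2**31 and the classic textbook
    # constants; returns the given bit of n successive outputs.
    x = seed
    out = []
    for _ in range(n):
        x = (1103515245 * x + 12345) % 2**31
        out.append((x >> bit) & 1)
    return out

# Bit b (counting from zero here) has period at most 2**(b+1):
assert lcg_bits(0, 16) == [0, 1] * 8          # lowest bit just alternates
assert lcg_bits(1, 16) == lcg_bits(1, 4) * 4  # next bit has period 4
```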


You are correct. The first example time in the article, "2024-12-25 at 18:54:53 UTC", corresponds to POSIX timestamp 1735152893, not 1735152686. And there have been 27 leap seconds since the 1970 epoch, not 29.
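The claimed timestamp is easy to check; POSIX time excludes leap seconds, so it's plain calendar arithmetic:

```python
from datetime import datetime, timezone

# POSIX timestamps ignore leap seconds entirely:
dt = datetime(2024, 12, 25, 18, 54, 53, tzinfo=timezone.utc)
assert int(dt.timestamp()) == 1735152893
```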


Author of the linked article here; you're right that the expected number of guesses by itself is roughly monotonic. However, that expected number of guesses (along with Ballmer's expected payout) increases, not decreases, as N increases, with a sharp step downward, not upward, each time N reaches a power of two.

The intent here is to reasonably compare hypothetical job interview questions for different N. If N=100, then Ballmer offers 6 dollars in exchange for one dollar per guess, and the linked Gukov article shows that Ballmer's expected return is negative.

What if he instead challenged an interview candidate with N=127? What about N=255, or N=511, etc.? He presumably wouldn't continue to offer only 6 dollars to play in these cases, but also presumably a similarly nice round number of dollars... and now the question/conjecture is interesting: from the updated graph, it looks like Ballmer's expected return decreases as N=2^k-1 increases, but does it always remain positive? Or is there some sufficiently large key space beyond which even an optimal guesser can't ever win on average?

To me, this question seems easier to ask when the expected return is presented in this way. That is, does each sawtooth interval always span zero expected return?
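For the simpler model where the secret is uniformly random (not the adversarially randomized setting analyzed in the linked articles), the optimal expected number of guesses can be computed with a short dynamic program; it already shows why N = 2^k - 1 is special, since those are exactly the sizes admitting a perfectly balanced search tree:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def total_guesses(n):
    # Minimum total number of guesses, summed over all n equally likely
    # secrets, in guess-a-number with higher/lower feedback: guessing g
    # costs one guess for every secret, then recurse on the two
    # remaining intervals.
    if n == 0:
        return 0
    return n + min(total_guesses(g - 1) + total_guesses(n - g)
                   for g in range(1, n + 1))

def expected_guesses(n):
    return total_guesses(n) / n

# N = 7 = 2**3 - 1: a perfectly balanced tree puts the secrets at
# depths 1, 2, 2, 3, 3, 3, 3, for 17 total guesses:
assert total_guesses(7) == 17
assert expected_guesses(7) == 17 / 7
```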


> Note that if F(x,y)=0, then the point (x,y) is exactly on the circle. If F(x,y)>0, then the point is outside of the circle, and if F(x,y)<0 then the point is inside of it. In other words, given any point (x,y), F(x,y) is the distance from the true circle line [my emphasis].

This last is not quite true. The exact distance from the circle, call it G(x,y), is the corresponding difference of square roots, i.e.,

  import math

  def G(x, y, r):
    return math.sqrt(x * x + y * y) - math.sqrt(r * r)

and G(x,y) isn't just the square root of F(x,y); indeed, the absolute error |G(x,y)| doesn't behave monotonically with respect to |F(x,y)|.

It's an interesting property of Bresenham's algorithm, one that I've never seen even stated, let alone proved, in the literature, that this doesn't matter: the algorithm is indeed exact in the sense that it always chooses the next point that is truly closest to the circle... despite using an error function that is only an approximation.
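For arbitrary points (not the adjacent candidate pixels the algorithm actually compares, where the property above says reversals happen not to matter), the non-monotonicity is easy to exhibit; with r = 5, the point (1,0) has smaller |F| than (5,5) but is truly farther from the circle:

```python
import math

def F(x, y, r):
    # Bresenham's algebraic error function.
    return x * x + y * y - r * r

def G(x, y, r):
    # True signed distance from the circle of radius r.
    return math.sqrt(x * x + y * y) - r

r = 5
# |F| ranks (1,0) as closer to the circle than (5,5)...
assert abs(F(1, 0, r)) < abs(F(5, 5, r))   # 24 < 25
# ...but the true distances say otherwise:
assert abs(G(1, 0, r)) > abs(G(5, 5, r))   # 4 > ~2.07
```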


There was some amount of "standardization" of checksum methods to help with debugging typos. For example, Nibble magazine used a couple of programs-- the one I remember using was Key Perfect-- that would scan your program in memory after you typed it in, and report a series of short hex checksums, one for each reasonably small chunk of program (bytes for machine language, or lines of BASIC code). The magazine article would include the correct Key Perfect checksums for the program; compare against the generated report, and a difference meant you "only" had to scan a dozen lines of source instead of hundreds.
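Key Perfect's actual algorithm isn't reproduced here, but the workflow is easy to sketch with a made-up per-line checksum: a single typo changes exactly one short hex value, pointing you at the offending line:

```python
def line_checksums(source):
    # Made-up per-line 16-bit checksums, only in the spirit of Key
    # Perfect (whose real algorithm is not reproduced here).
    sums = []
    for line in source.splitlines():
        s = 0
        for ch in line:
            s = (s * 31 + ord(ch)) & 0xFFFF
        sums.append(f'{s:04X}')
    return sums

good = line_checksums('10 PRINT "HELLO"\n20 GOTO 10')
typo = line_checksums('10 PRINT "HELL0"\n20 GOTO 10')
# Only the mistyped line's checksum changes:
assert [a == b for a, b in zip(good, typo)] == [False, True]
```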


Author of the linked article here; re: marking the movement of shadows, I wrote a follow-up that discusses some alternatives, mainly that one:

https://possiblywrong.wordpress.com/2014/07/05/using-a-watch...


Author of the article here; I suppose I should confess that I did consider automating the counting, which did seem like an interesting problem... but I rejected the idea, for what I think is good reason: the sorting was the time-consuming part. If we skip the sorting, and depend on the code to get the counting right, and we assume that we are at worst off by one-- or more realistically, off by two, to account for the questionably-identifiable chunks of paste and such-- then how often do we need to go back for manual verification?

With hindsight (although up-front simulation bore this out as well), we see that we would have to go back and manually verify anywhere from 8% (off-by-one) to 25% (off-by-two) of packs; that's every fourth pack. Now consider how much more unpleasant that manual counting is, since the Skittles are haphazardly un-arranged in the image. In short, I'm unconvinced that an automated-- while similarly accurate-- accounting would be that much more efficient.


> the sorting was the time-consuming part

I'm not sure I understand why sorting is necessary.

> Now consider how much more unpleasant that manual counting is, since the Skittles are haphazardly un-arranged in the image.

It's actually very easy: my program outputs images annotated with each recognized Skittle circled with the guessed color (https://imgur.com/a/jlPWXRf). You don't need to manually count, just make sure all the circles are the correct colors.

I'm also pretty confident you could improve the accuracy quite a bit by improving the lighting when taking the photos, and/or by incorporating something more sophisticated like a neural network (probably trained with Skittles from several bags, separated by color+rejects, with many photos in different random arrangements).


> You don't need to manually count, just make sure all the circles are the correct colors.

I guess this takes me a while, and seems significantly more error-prone to me than the corresponding re-count when they are sorted. Granted, it's pretty easy when the Skittles are sorted as they are in these images. But how quickly can you scan this image: https://imgur.com/a/KpddGdH and check whether there are any errors? And how confident are you in that visual check?

You are right that this approach may be good enough to find a duplicate, which was the primary objective of the experiment. But I had hoped that this might also serve as a useful dataset for future student exercises, in probability, or even in just this sort of computer vision project... but I wanted to have accurate ground truth, so to speak. Inspecting your spreadsheet, it looks like this algorithm is still less than 95% accurate, even if we only evaluate the "clean" images with Uncounted=0.


Author of the article here; you have a point that there is no explicit discussion of validating this assumption, beyond the variability shown in the colored curves in the "count per pack" plot.

Having said that, this small sample is indeed reasonably consistent (or at least not inconsistent) with that iid assumption for the color of each individual Skittle. We would not expect to see any 80+% red packs even assuming that color was perfectly uniformly iid, because the probability of observing such a pack is so small (less than 10^(-19)).

However, still assuming this model, we should expect to see packs with very small proportion of reds... and we do, with one pack having just 3 red Skittles, for example. The entire distribution of proportion of red follows the assumed binomial distribution very closely.
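For concreteness, that tail bound can be computed exactly; the pack size below (60 Skittles, 5 equally likely colors) is a rough assumption, but the conclusion is insensitive to it:

```python
from fractions import Fraction
from math import comb

def binom_tail(n, p, k):
    # P(X >= k) for X ~ Binomial(n, p), computed exactly with rationals.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability that at least 80% of a hypothetical 60-Skittle pack is
# red, with each Skittle independently red with probability 1/5:
tail = binom_tail(60, Fraction(1, 5), 48)
assert 0 < float(tail) < 1e-19
```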

