Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is not the obvious solution to the problem? I used to use this method to compute test averages in my head when I was in elementary/middle school. Teachers were often willing to read off the (anonymized) scores of a test, but weren't willing to do the math and figure out the average. Also used it in college to figure out what grades I needed to stay above a 3.0 GPA, and on the SATs and AHSME to figure out my margin of safety.

IMHO, a more interesting interview question is:

"How would you efficiently compute a sliding-window average, where each number has an associated time and each time a number comes in, you want to quickly compute the average of all numbers whose timestamps fall within the last 5 minutes?"

This type of problem comes up all the time in the financial industry - I had to implement it for my last employer. The best algorithm I can think of takes amortized constant time and linear (proportional to window size) space. Anyone able to think of a better one? Wall Street wants you. ;-)



The best algorithm I can think of takes amortized constant time and linear (proportional to window size) space. Anyone able to think of a better one? Wall Street wants you. ;-)

It's easy to prove that linear space is optimal -- you can't store N bits of information using less than N bits of space.


Well, here's a naive one written in Lua. it keeps a fifo list of {timestamp, number} pairs. The fifo is implemented as a double-ended queue in a closure. The averaging function takes in the current time and a new number. Pop off the pairs that are outside of our sliding window and subtract their value from a tally. Push on our new value and add its value to the tally. Return the tally divided by the size of the queue. This algorithm takes linear time with respect to the rate of transactions (i.e. how many do we expire each call) and linear space with respect to window size (how many entries fit in our sliding window).

  function create_sliding_averager (window_span, debug) 
      local window_span = window_span -- seconds
      local tally = 0
      local fifo = { tail = 1, head = 0 }
      local debug = debug
  
      return function (amount, current_time)
          local head = fifo.head + 1
          local tail = fifo.tail
  
          fifo[head] = { timestamp = current_time, amount = amount }
          tally = tally + amount
  
          local cutoff, old_time = current_time - window_span, nil

          if fifo[tail] then
            old_time = fifo[tail].timestamp
          end
  
          if debug then
              print ('cutoff: ', cutoff, 'old time:',
                    old_time, 'current:', current_time)
          end

          while old_time ~= nil and old_time < cutoff do
              tally = tally - fifo[tail].amount
              if debug then
                print ('expiring: ', fifo[tail].timestamp, fifo[tail].amount)
              end
              fifo[tail] = nil
              tail = tail + 1
              old_time = fifo[tail].timestamp
          end
  
          fifo.tail = tail
          fifo.head = head

          if debug then print(tail, head) end
          return 'avg: '..(tally / (head - tail + 1)), 'size: '..(head - tail + 1)
      end
  end
  
  averager = create_sliding_averager(10) -- 10 second average
  
  while true do
      local rand = math.random()
      local time = os.time()
      print (averager(rand, time))
  end
I would usually use a small library to implement (push(), pop(), peek()) on the fifo but this example will run in a stock interpreter. On my system the average for rand() hovers around .5 so it passes my initial smell test. It's hovering around 80k entries in a busyloop with a 10 second window.

This is a pretty brute-force solution. I'll see if I can think of something more elegant.


That was the algorithm I used, though of course it was in Java. It's actually amortized constant time (assuming a sensible dequeue implementation), because each number you push on can be popped off only once. If you end up popping lots of entries at once, it means that you went a long time without popping anything as the dequeue was populated.


> The best algorithm I can think of takes amortized constant time and linear (proportional to window size) space.

Sounds about right for what we do to compute Geometric Brownian Motion.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: