As others have pointed out, modern processors are three orders of magnitude faster in terms of clock speed than processors from back then. Modern processors also do a lot more per clock than the 6502: they use 64-bit-wide datapaths, execute multiple instructions every clock cycle rather than taking multiple clock cycles per instruction, and have multiple independent cores. One rule of thumb is that performance tends to increase with the square root of the number of transistors you use, so you'd expect another three orders of magnitude of performance from the six orders of magnitude more transistors, making a modern chip roughly six orders of magnitude faster at executing a given algorithm overall (if there's a normal amount of parallelism to extract).
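As a sanity check on that rule of thumb, here's the arithmetic spelled out (both ratios are rough, order-of-magnitude assumptions, not measurements):

```python
import math

clock_ratio = 1_000            # ~1-2 MHz (6502) -> ~2-4 GHz: three orders of magnitude
transistor_ratio = 1_000_000   # ~3,500 transistors -> ~2.5 billion: six orders of magnitude

# Rule of thumb: performance grows roughly with the square root of transistor count
perf_from_transistors = math.sqrt(transistor_ratio)   # another ~1,000x per clock
total_speedup = clock_ratio * perf_from_transistors   # ~1,000,000x overall
print(f"{total_speedup:,.0f}x")  # 1,000,000x
```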
Now, normally you have to worry about increasing clock speeds hitting diminishing returns, since memory latency stays roughly constant while the CPU clock gets faster. But anything that could run in the amount of RAM the 6502 could address would fit in a modern processor's L1 cache, and the out-of-order scheduler is perfectly able to hide L1 latency, so I think ignoring this factor is fair in this case.
The x1000 is a huge understatement. For one thing, today's CPUs need far fewer cycles per instruction (or, inversely, retire far more instructions per cycle). Back then, multiplying two word-sized values (8 bits at the time) took 24 cycles; these days a 64-bit multiply takes only a handful of cycles. Because of superscalar execution and the instruction-level parallelism it exploits, we can typically run 2-4 ALU operations in parallel (given no data dependencies), increasing instruction throughput 2-4 fold. Then, thanks to SIMD and data-level parallelism, we can apply the same operation to multiple data elements at once (say, operate on a 4-element vector in a single cycle), eliminating the need for repeated instructions.
This all gets a bit complicated nowadays because of memory access costs (and the caches that try to hide them), but the idea is that a modern CPU is likely to be around 10 times as fast per clock as a 6502, and with multiple cores and threads that figure rises to something like 40-60x. Add the huge increase in clock speed and you end up a bit south of x100,000 in the optimal case.
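Multiplying those rough factors together shows where the estimate comes from (all three numbers are ballpark assumptions from the paragraph above):

```python
per_clock = 10        # modern core vs. 6502, per clock
cores_threads = 5     # multiple cores/threads take ~10x into the 40-60x range
clock_ratio = 2_000   # ~1-2 MHz -> ~2-4 GHz
best_case = per_clock * cores_threads * clock_ratio
print(best_case)  # 100000, i.e. around x100,000 in the optimal case
```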
I would hope every programmer would write some code on a C64 to really learn how much RAM 64 KB really is. You can actually waste some of it, and in some cases it really is "enough so that I don't have to optimize". :) Real hard-core people would go with the VIC-20, which has only 5120 bytes of RAM, or the Atari 2600 with 128 bytes of RAM. One could imagine there's nothing you can do with them, but oh boy, how wrong one would be! Heck, a single tweet is 140 characters, and you can fit that in 128 bytes. You really can... :)
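The tweet claim works out if you restrict yourself to a 64-symbol alphabet: 6 bits per character puts 140 characters in 105 bytes, under the 128-byte budget. A quick sketch (the particular symbol set here is just an illustrative assumption; any 64 symbols would do):

```python
import string

# 64-symbol alphabet -> 6 bits per character
ALPHABET = string.ascii_letters + string.digits + " ."   # 52 + 10 + 2 = 64 symbols

def pack(text: str) -> bytes:
    """Pack text at 6 bits per character, big-endian, zero-padded to a whole byte."""
    bits, n = 0, 0
    for ch in text:
        bits = (bits << 6) | ALPHABET.index(ch)
        n += 6
    pad = (-n) % 8
    return (bits << pad).to_bytes((n + pad) // 8, "big")

def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(), given the original character count."""
    bits = int.from_bytes(data, "big") >> (len(data) * 8 - length * 6)
    return "".join(ALPHABET[(bits >> (length - 1 - i) * 6) & 0x3F]
                   for i in range(length))

# 140 characters * 6 bits = 840 bits = 105 bytes, comfortably under 128
print(len(pack("x" * 140)))  # 105
```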
Moore's law is about the number of transistors. The 6502 had ~3500. Modern desktop CPUs have ~2,500,000,000 (see also http://en.wikipedia.org/wiki/File:Transistor_Count_and_Moore... ).
So that's more like 1,000,000x (thus six orders of magnitude).
Then again, the grandparent was talking about speed, and speed doesn't scale linearly with the number of transistors.
Not to mention further performance advancements in processor design since then (pipelining, SIMD, etc.), pushing throughput well above the 1000x mark. One should also consider the increase in word length, which lets the processor handle more data in less time.
If you want to add two 32-bit integers, on the 6502 you'll need something like the following (zero-page operands, illustrative labels), assuming this is a 32-bit integer you're actively working with and are probably about to use again fairly soon:

    CLC
    LDA NUM1
    ADC NUM2
    STA RES
    LDA NUM1+1
    ADC NUM2+1
    STA RES+1
    LDA NUM1+2
    ADC NUM2+2
    STA RES+2
    LDA NUM1+3
    ADC NUM2+3
    STA RES+3
That's for a total of 38 cycles. So on the computer I started programming on, you could do ~52,000 32-bit adds per second.
By comparison, for a modern Pentium, according to Intel's docs, a 32-bit add (again, on data you're using) takes 1 cycle, end to end.
ADD ESI,EDX
So on the laptop that's in front of me, which is a crap one, you could do 2,530,000,000 32-bit adds per second. A 48,000-fold performance increase. Maybe 96,000 times, if you have no dependency chain (ADD throughput is 2 per cycle).
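Those figures can be checked with a couple of divisions (clock rates taken from the comment above: a ~2 MHz 6502 vs. a 2.53 GHz laptop):

```python
# 38 cycles per 32-bit add on the 6502; 1 cycle per dependent ADD on the laptop
adds_per_sec_6502 = 2_000_000 / 38          # ~52,600 adds per second
adds_per_sec_laptop = 2_530_000_000 / 1     # one 32-bit ADD per cycle
speedup = adds_per_sec_laptop / adds_per_sec_6502
print(round(adds_per_sec_6502), round(speedup))  # ~52,632 and ~48,070
```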
This ignores the fact my modern computer has 2 cores.
And that's loading/storing to/from the zero page (the first 256 bytes of memory). Loading/storing from higher addresses requires 4 cycles.
But "ADD ESI,EDX" is adding two registers, isn't it? So I think you need to include the loading/storing of those registers from/to memory for a fairer comparison.
I haven't touched 6502 assembly in over 20 years. Brings back memories. :-)
If you're adding constants, you might as well load each byte of the result directly, when you need it. (I can't tell where the LSB is coming from in this code - perhaps it isn't a constant? - this example doesn't resemble any code I've ever had to write.)
Perhaps the code is intended to be modified at runtime, but then you'd still want one of the operands loaded from memory, I think (otherwise, why not just precalculate the results?), and I've generally found the fairly substantial fixed expense not to be worth it anyway.
Anyway, overall I think you're being a bit unfair to the x86 with this comparison.
I recall waiting a couple of minutes for my computer to just boot in the 80s. When I want to use my phone, it becomes usable in well under a second.
Waiting a few seconds every time I hit save was fun. Didn't stop me from developing a ferocious ^S reflex. Fortunately, save is fast enough not to be noticeable these days, to the extent that it usually happens automatically now.
Watching a WYSIWYG font menu draw each individual entry was fun. We certainly don't get that pleasure now.
But yes, things certainly felt faster in the 80s.... /s
My Apple IIGS could be "operational" pretty fast if all you were after was a BASIC prompt with no disk access. If you wanted a BASIC prompt with disk access, that took some seconds. If you want to actually load useful software, it took quite a while.
In terms of clock speed, the 6502 ran at 1 to 2 MHz. Today's processors run at most around 2 to 4 GHz, so as an order of magnitude, 1000x is spot on. Of course, on a clock-for-clock basis modern architectures are a lot wider too, which also accounts for better performance. But clock speed is the simplest comparison.
Wikipedia lists the 6502 as running at 1 MHz to 2 MHz and the Samsung Galaxy SII at around 1.2 GHz.[0] I'm guessing that's where the 1000x comes from... Or, you know, it's just a nice big number. ;P
[0] Of course, that's ignoring multiple cores, better microarchitecture, caches, etc.