As I understand the presentation, they ran into trouble because of code size. Basically, stack code is much denser because instructions don't need operand fields to name registers. And since going from 16 KB to 6 KB is a bigger drop than the halving you'd get from moving from 32-bit to 16-bit instructions alone, it was apparently also a bit more efficient for implementing their program, I guess.
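To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes fixed-width instructions on both sides (32-bit before, 16-bit after), which is my simplification and not something the presentation states; the variable names are just for illustration.

```python
# Back-of-the-envelope check of the 16 KB -> 6 KB figure.
# Assumption: every instruction is fixed-width, 32 bits before and 16 bits after.
original_bytes = 16 * 1024   # code size on the 32-bit register machine
observed_bytes = 6 * 1024    # code size reported for the 16-bit stack machine

# If the only change were the instruction width, the same instruction count
# would take exactly half the space.
width_only_bytes = original_bytes * 16 // 32

# Whatever is left over must come from needing fewer instructions
# (or from other effects this simple model ignores).
extra_factor = observed_bytes / width_only_bytes

print(f"width alone: {width_only_bytes} bytes")
print(f"observed:    {observed_bytes} bytes")
print(f"remaining factor: {extra_factor:.2f}")
```

Under that assumption, width alone would get you to 8 KB, so the remaining factor of 0.75 would have to come from the stack version needing fewer instructions for the same program.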

Code size was a concern because of the limited size (16 KB) of the BRAM modules on an FPGA.

I can't say why it performs so much better, though. Maybe it's just because it's simpler and 16-bit, so it's smaller and can be clocked faster on the FPGA? That's just a guess.

I'm not sure it's fair to compare the J1 to the BOOM; the latter does a lot more, I'd think. That said, it shows that for such use cases a stack machine can be better suited and much less complex to design, maintain and work with.