The primary concern with this isn't necessarily the extra allocation and GC (though that plays a part too); it's the fact that you're chasing a pointer for every object. That's murder for the cache. It's MUCH faster to just have the data in-line in hot loops like this.
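To make the pointer-chasing point concrete, here's a rough sketch of the two layouts in plain Java. The `Point` and `PointSums` names are made up for illustration, and this is not a benchmark; it only shows where the extra indirection comes from.

```java
// Hypothetical example types; not taken from any particular codebase.
final class Point {
    final double x;
    final double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

final class PointSums {
    // Array of references: every element is a pointer to a separately
    // allocated Point, so each iteration may touch a different cache line.
    static double sumX(Point[] points) {
        double sum = 0.0;
        for (Point p : points) {
            sum += p.x; // one dereference per object
        }
        return sum;
    }

    // In-line data: the x values sit contiguously in one primitive array,
    // so the loop walks memory sequentially.
    static double sumX(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum;
    }
}
```

Both methods compute the same result; the only difference is whether the loop reads the values directly or has to follow a reference to reach each one.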
If Valhalla ever lands, you will be able to do the same as in C#, D,...
Until then, the best option is to use Panama to create a C-struct-like memory layout, with accessor methods wrapping the low-level details.
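Something along these lines, as a minimal sketch. It assumes Java 22 or later, where `java.lang.foreign` is a final API (on Java 21 it is a preview API). `PointBuffer` and its methods are hypothetical names of mine, and the `x`/`y` struct is just an example layout.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;

// Hypothetical wrapper: a flat buffer of `struct { double x; double y; }`.
final class PointBuffer {
    static final StructLayout POINT = MemoryLayout.structLayout(
            ValueLayout.JAVA_DOUBLE.withName("x"),
            ValueLayout.JAVA_DOUBLE.withName("y"));

    // Byte offsets of each field inside one struct, and the struct size.
    static final long X_OFFSET = POINT.byteOffset(PathElement.groupElement("x"));
    static final long Y_OFFSET = POINT.byteOffset(PathElement.groupElement("y"));
    static final long STRIDE = POINT.byteSize();

    private final MemorySegment segment;
    private final long count;

    PointBuffer(Arena arena, long count) {
        // One contiguous allocation holding `count` structs back to back.
        this.segment = arena.allocate(MemoryLayout.sequenceLayout(count, POINT));
        this.count = count;
    }

    // Accessors hide the offset arithmetic from callers.
    double x(long i) {
        return segment.get(ValueLayout.JAVA_DOUBLE, i * STRIDE + X_OFFSET);
    }

    double y(long i) {
        return segment.get(ValueLayout.JAVA_DOUBLE, i * STRIDE + Y_OFFSET);
    }

    void set(long i, double x, double y) {
        segment.set(ValueLayout.JAVA_DOUBLE, i * STRIDE + X_OFFSET, x);
        segment.set(ValueLayout.JAVA_DOUBLE, i * STRIDE + Y_OFFSET, y);
    }

    // Hot loop over in-line data: no per-element object to dereference.
    double sumX() {
        double sum = 0.0;
        for (long i = 0; i < count; i++) {
            sum += x(i);
        }
        return sum;
    }
}
```

Usage would look roughly like `try (Arena arena = Arena.ofConfined()) { var pts = new PointBuffer(arena, 1_000); ... }`, with the memory released when the arena closes. The trade-off is that you manage lifetimes and offsets yourself instead of letting the language do it.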
Trivially solved by C# - you can write an algorithm implementation with the same data structures that would perform the same as in C, C++ or Rust but often with more convenience.