Good to know. I went to recheck because I could have sworn I didn't see anything about that when I looked earlier, but they now say 3.2 Tbps InfiniBand... not sure if they changed it or I was just blind.
Hetzner has excellent connectivity: https://www.hetzner.com/unternehmen/rechenzentrum/
They are always working to increase their connectivity. I'd even go so far as to claim that in many parts of the world they outperform certain hyperscalers.
I used to have a dedicated server there, and what happened to me is that my uploads were fast but my downloads were slow. Looking at an MTR trace, it was clear that the return route to me was different (perhaps a cheaper transit path?). With Google Drive, for example, I could always max out my gigabit connection. Same with rsync.net.
I also know that some cheaper home ISPs cheap out on peering.
Now, this was some time ago, so things might have changed, just as you suggested.
Less than $50 will be really hard to find, at least in any kind of professional setup (so not hosted in a random basement ;) ).
Our lowest at Genesis Cloud at this time are instances with an RTX 3060 Ti for $0.20/hour, which adds up to about $146/month ( https://www.genesiscloud.com/pricing#nvidia3060ti )
That said, this includes free storage and no egress fees, and it has a lot more power than a Jetson.
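For anyone comparing hourly cloud pricing against a monthly budget, the conversion is just rate × hours. A quick sketch (the 730-hour figure assumes the instance runs 24/7 for an average month):

```python
# Convert an hourly cloud GPU rate to an always-on monthly cost
hourly_rate = 0.20        # $/hour (RTX 3060 Ti example above)
hours_per_month = 730     # 8760 hours/year / 12 months
monthly_cost = hourly_rate * hours_per_month
print(f"${monthly_cost:.0f}/month")  # -> $146/month
```

With per-minute billing you only pay for hours actually used, so a part-time workload comes in well under that figure.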
If you need to optimize for low-cost hosting, have you already checked whether you actually need a GPU for your use case? Modern CPUs have some impressive capabilities.
Here we have another instance of self-promotion that does not align with what GGP mentioned. Thank you, Genesis and Lambda, for promoting yourselves in a startup thread. Given your long-standing presence in this industry, I would expect better engagement from you.
You were doing so well, self promoting and engaging with the community on your post earlier. I didn't expect to see you stoop to this level of commenting.
Maybe it's time to step away from the keyboard for a while?
I appreciate and respect every user who contributes by asking questions, providing feedback, or sharing suggestions. However, it is disappointing and unreasonable to witness self-promotion from companies that have been established in this industry for a considerable period of time under a startup thread.
Moreover, the fact that their self-promotion does not align with the intent of the original discussion and GGP reveals their purpose: their primary goal is not genuinely assisting or finding a solution.
In such cases, as you can imagine, it's challenging for me to maintain respect.
Many years ago at university I got to play with a system built by a former student that used wooden blocks (children's toys) to build structures on top of a table monitored by Kinect cameras. It would then identify features and generate a floor plan.
Now imagine combining this! It would allow for a whole new level of exploration of ideas.
Can you share a bit what setup you use to generate the images? Do you run your own GPUs?
Competitive prices (billing by the minute, only pay when you actually run an instance).
High reliability (professional DCs, customized hardware to suit requirements).
Good connectivity (traffic is also free, no in-/egress fees).
High security level (full VMs with dedicated GPUs and proper separation of customers, instead of shared hosts with Docker).
Free storage.
A great support team.
Green energy (no greenwashing by carbon offsetting, we use energy sources that are renewable and carbon free at the source (geothermal/hydro)).
I could go on...
Would love it if you just tried our services; after sign-up there are free credits available for risk-free testing.
While I do not have an A100 handy right now, I have an instance running on Genesis Cloud with 4x RTX 3090.
A quick, very unscientific test using oobabooga/text-generation-webui with some models I tried earlier gives me:
* oasst-sft-7-llama-30b (spread over 4x GPU): Output generated in 28.26 seconds (5.77 tokens/s, 163 tokens, context 55, seed 1589698825)
* llama-30b-4bit-128g (only using 1 GPU as it is so small): Output generated in 12.88 seconds (6.29 tokens/s, 81 tokens, context 308, seed 1374806153)
* llama-65b-4bit-128g (only using 2 GPU): Output generated in 33.36 seconds (3.81 tokens/s, 127 tokens, context 94, seed 512503086)
* llama (vanilla, using 4x GPU): Output generated in 5.75 seconds (4.69 tokens/s, 27 tokens, context 160, seed 1561420693)
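The tokens/s figures above are simply tokens generated divided by elapsed wall-clock time; a quick sanity check over the first three runs:

```python
# Recompute tokens/s from the (tokens, seconds) pairs reported above
runs = [
    ("oasst-sft-7-llama-30b", 163, 28.26),
    ("llama-30b-4bit-128g",    81, 12.88),
    ("llama-65b-4bit-128g",   127, 33.36),
]
for name, tokens, secs in runs:
    # Matches the reported 5.77 / 6.29 / 3.81 tokens/s
    print(f"{name}: {tokens / secs:.2f} tokens/s")
```

Note these single-shot numbers also fold in prompt-processing time and depend on context length, so treat them as rough indicators rather than a benchmark.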
They all feel fast enough for interactive use. If you do not have an interface that streams the output (so you can see it progressing), it might feel a bit odd to regularly wait ~30 s for the whole output chunk.
If you want to try one, reach out to me (email in profile). We rent those out in the cloud, which would allow you to confirm performance before buying one for local use.
Source: Was personally involved in design of that deployment.