Hacker News | phdelightful's comments

I just put Anubis in front of my self-hosted forge this morning because AmazonBot had helped itself to 750 GiB (!) of traffic to my public repos this month!

At least, it claimed to be AmazonBot…


Are they in this space? [1] One could map the ranges into a web daemon and rate limit them or just 'ip route add blackhole ${cidr}' each cidr block.

[1] - https://ip-ranges.amazonaws.com/ip-ranges.json
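For anyone wanting to try the blackhole approach, here is a rough Python sketch of turning that JSON into route commands. The `prefixes`, `ip_prefix` and `service` keys follow the published layout of ip-ranges.json; the function name and the sample entries (including their regions) are made up for illustration, and in practice you would load the real file instead of `SAMPLE`.

```python
import ipaddress
import json


def blackhole_commands(ip_ranges, service=None):
    """Return 'ip route add blackhole' commands for the IPv4 prefixes in an
    ip-ranges.json style document, optionally filtered by service name."""
    networks = set()
    for entry in ip_ranges.get("prefixes", []):
        if service is None or entry.get("service") == service:
            networks.add(ipaddress.ip_network(entry["ip_prefix"]))
    # collapse_addresses merges overlapping and adjacent blocks,
    # which keeps the routing table shorter.
    collapsed = ipaddress.collapse_addresses(sorted(networks))
    return [f"ip route add blackhole {net}" for net in collapsed]


# Hand-written sample in the shape of ip-ranges.json (not real data).
SAMPLE = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "44.192.0.0/10", "region": "us-east-1", "service": "AMAZON"},
    {"ip_prefix": "44.192.0.0/11", "region": "us-east-1", "service": "EC2"},
    {"ip_prefix": "192.0.2.0/24", "region": "eu-west-1", "service": "AMAZON"}
  ]
}
""")

for command in blackhole_commands(SAMPLE):
    print(command)
# ip route add blackhole 44.192.0.0/10
# ip route add blackhole 192.0.2.0/24
```

The commands are only printed, not executed, so you can review the list before piping it into a shell.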


I just do this for the IP ranges of Amazon, OpenAI, Huawei and other companies that run these insane crawlers: it's 100% effective and it doesn't annoy real users with a captcha or some PoW thing. There's simply no reason for them to reach my homeserver other than to scrape the hell out of it.


I didn't check thoroughly, but the first one I happened to grep out was not on that list:

"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36"

"x-forwarded-for":"44.210.204.255" "x-real-ip":"44.210.204.255"

This is a bit outside my area of expertise, so I don't know how reliable these x-forwarded-for and x-real-ip are.


One place to look it up is bgp.tools [1]. The IP is purported to belong to Amazon, and the ASN has some interesting tags. [2] Any form of forwarded-for header can be spoofed and should only be trusted when it comes from an expected upstream proxy such as a CDN; a CDN should have its own CDN-specific IP header, listed in its documentation. Typically the first column in access logs is the REMOTE_ADDR, the address of the actual network connection, but behind a CDN that will be the CDN's IP.
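A minimal Python sketch of the "only trust forwarded-for from expected proxies" rule, in case it helps. The function name and the trusted ranges are hypothetical; a real setup would use the proxy or CDN ranges from its documentation, and possibly the CDN's own header rather than X-Forwarded-For.

```python
import ipaddress


def client_ip(remote_addr, forwarded_for, trusted_proxies):
    """Pick the address to log or rate-limit.

    remote_addr:     peer address of the actual network connection
    forwarded_for:   raw X-Forwarded-For header value, or None
    trusted_proxies: CIDR strings for the proxies we expect (assumed values)
    """
    peer = ipaddress.ip_address(remote_addr)
    trusted = [ipaddress.ip_network(cidr) for cidr in trusted_proxies]
    if not forwarded_for or not any(peer in net for net in trusted):
        # Header absent, or sent by a peer we do not trust: ignore it.
        return remote_addr
    # Walk the chain right to left, skipping hops that are our own proxies.
    for hop in reversed([h.strip() for h in forwarded_for.split(",")]):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            return remote_addr  # malformed entry, fall back to the peer
        if not any(addr in net for net in trusted):
            return hop
    return remote_addr


# A direct connection claiming a forwarded address is not believed.
print(client_ip("203.0.113.9", "1.2.3.4", ["10.0.0.0/8"]))        # 203.0.113.9
# The same header arriving through a trusted proxy is used.
print(client_ip("10.0.0.5", "44.210.204.255", ["10.0.0.0/8"]))    # 44.210.204.255
```

Reading the chain from the right matters because anything to the left of the first untrusted hop could have been supplied by the client.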

If a CDN does not have an option to block cloud and Tor CIDR blocks then that should be a feature request.

44.210.204.255 is included in 44.192.0.0/10 which is listed in the AWS CIDR ranges. Use one of the online subnet calculators to find IP ranges of CIDR blocks. This is likely a Tor exit node.

Blocking the CIDR blocks I listed in the thread would have included this node as well. Here [3] are a few shell functions for getting some of the cloud CIDR blocks. I must have been inebriated when I wrote those. This site may not be reachable during blood moons or when the nanosecond is divisible by zero.

Here [4a][4b] are a couple decent subnet calculators. There are some command line tools for playing with CIDR blocks and IP addresses to see if an IP is included in a CIDR block but this varies by Linux distribution so perhaps look for a generic python script.
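On the generic Python script: the standard library's `ipaddress` module covers both the subnet-calculator part and the membership check. A small sketch, where `containing_blocks` is just an illustrative helper name and 52.0.0.0/10 is an arbitrary second block for contrast:

```python
import ipaddress


def containing_blocks(ip, cidrs):
    """Return the CIDR strings from `cidrs` that contain `ip`."""
    addr = ipaddress.ip_address(ip)
    return [cidr for cidr in cidrs if addr in ipaddress.ip_network(cidr)]


# Subnet-calculator style summary of one block.
block = ipaddress.ip_network("44.192.0.0/10")
print(block[0], block[-1], block.num_addresses)
# 44.192.0.0 44.255.255.255 4194304

# Membership check for the address from the log line above.
print(containing_blocks("44.210.204.255", ["44.192.0.0/10", "52.0.0.0/10"]))
# ['44.192.0.0/10']
```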

To get a list of Tor exit nodes to blackhole route, look at [5]. This updates often. Just clone the entire repo. Unless your site is related to government dissent or anonymous porn then most traffic from Tor exit nodes will likely just be bots and thus riff-raff.

Seconds after I linked realhackers bots showed up and got a zero byte response. Poor lil HN servers must get a lot of trash non stop. I hope I get some delicious bots today.

[1] - https://bgp.tools/

[2] - https://bgp.tools/as/14618

[3] - https://ai.realhackers.org/_get_cloud_cidr.txt

[4a] - https://mxtoolbox.com/subnetcalculator.aspx

[4b] - https://www.vultr.com/resources/subnet-calculator/

[5] - https://github.com/firehol/blocklist-ipsets/blob/master/clea...


Thank you, I learned a few things!


That's all of Amazon AWS, not just Amazon's AI system.


Yup, mostly. There are more ranges for the Amazon store too.

It would be rather nifty if Amazon and other companies would confine AI to specific CIDR or a dedicated ASN but I would not hold my breath on that one. AI crawlers will likely muddy the waters for everyone else.


That list is a tad bit too long. Why don't they enforce a rule on these big corps to publicly state which range does what?


That would indeed be handy but I think the answer is that people would block specific ranges. By not segmenting into specific groups people are forced to either:

- play the game of whack-a-mole

- use difficult implementations of user validation checks that potentially cause pain for real humans

- block all Amazon CIDR blocks which they know most corporations will not do.

This forces the majority to just tolerate whatever comes out of their networks.


That's why you need to regulate them behemoths.


In my logs it appears like this:

BOT","cluster_name":"EU","cluster_region":"EU","connection_type":"corporate","country":"US","device_type":"ROBOT","duration_ms":0.391,"duration_us":391,"filter":"","ip":"52.1.106.130","isp":"Amazon.com, Inc.","level":"info","msg":"Request evaluated","org":"Amazon.com, Inc.","os":"","ref":"","region":"Virginia","result":false,"time":"2026-05-15T13:33:20Z","ua":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36","why":"bot"}

3.227.180.70

23.21.175.228

23.23.137.202

from all these IPs.


> At least, it claimed to be AmazonBot

It's good that you mentioned this; smear campaigns are definitely not a new thing, and I suspect a lot of this DDoS'ing that's going on is a plot to accelerate towards Big Tech's authoritarian dystopia. Basically extortion.


I see the bots with user agent claude bot, using AWS IPs.

I've also seen Google bots with AWS IP ranges. You gotta look at their ASN/ISP/ORG


Do you have a robots.txt?


> We are writing to inform you that starting Monday, June 15, 2026, crawl preferences for Amazonbot will be managed solely through the industry-standard directives.

They will in the future, but not today.
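For reference, a sketch of what a standard directive for that crawler could look like, checked with Python's built-in `urllib.robotparser`. The rules and paths are an example only, and this tests what the file says, not whether any crawler actually obeys it.

```python
from urllib import robotparser

# Example robots.txt: shut out Amazonbot, allow everyone else.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robotparser matches on the product token, so pass "Amazonbot"
# rather than the full Mozilla-style user agent string.
print(parser.can_fetch("Amazonbot", "/some/repo"))    # False
print(parser.can_fetch("SomeBrowser", "/some/repo"))  # True
```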


Parent's article says

> Starting from version 6.6 of the Linux kernel, [CFS] was replaced by the EEVDF scheduler.[citation needed]


It’s <= a Radeon 7600 GPU (28 CUs RDNA3 vs 32), so I’m not sure I’d have advertised it as a 4k60 machine. Then again I’m not a marketer so what do I know. 4k60 is a flexible target with FSR I suppose.


What coordinate in the space is furthest from any named color? It looks like there are some relatively large voids in the blue/purple boundary area but it’s hard to say.


> perceptual distance is quite different from Euclidean distance in this RGB space. Like if you put two swaths of color side by side and said “how similar are these?” to samples of people, the groupings would not much resemble this cube.

They’ve done this! It’s shown on a “chromaticity diagram”, and is useful for comparing what colors different screens/printers/etc can reproduce. (It’s 2D not 3D cause it’s normalized for luminance or brightness.) Color science is weirdly fascinating:

https://en.wikipedia.org/wiki/Color_space?wprov=sfti1#


Here's the list of colors it works off of: https://github.com/meodai/color-names/blob/main/src/colornam...

I'm trying to figure it out.


For Euclidean distance it seems to be in the neighborhood of (59, 250, 60) which is a bright green, although of course Euclidean distance is not perceptual distance. The blue at (57, 42, 214) also is up there.


Oh, I'd love to add this to the tooling of the color names list. How did you figure out what the largest gap was?


Pick points at random, then use a general-purpose optimization method (the optim function in R) to find local maxima. I don’t claim this is a good way to do it.
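Roughly the same idea as a standard-library Python sketch. Note the swap: instead of R's `optim`, this uses random starts plus a simple greedy hill climb on the integer RGB grid, so it only finds local maxima and is not the exact procedure described above. The color list in the demo is a tiny made-up stand-in for the real names file.

```python
import random


def nearest_sq(point, colors):
    """Squared Euclidean distance from `point` to the closest color."""
    return min(sum((p - c) ** 2 for p, c in zip(point, color)) for color in colors)


def hill_climb(point, colors, step=16):
    """Greedy local search: move along one axis while the gap grows."""
    best = nearest_sq(point, colors)
    while step >= 1:
        improved = False
        for axis in range(3):
            for delta in (-step, step):
                candidate = list(point)
                candidate[axis] = min(255, max(0, candidate[axis] + delta))
                score = nearest_sq(candidate, colors)
                if score > best:
                    point, best, improved = tuple(candidate), score, True
        if not improved:
            step //= 2  # refine the search once no coarse move helps
    return point, best


def largest_gap(colors, starts=50, seed=0):
    """Best local maximum over several random starting points."""
    rng = random.Random(seed)
    results = (hill_climb(tuple(rng.randrange(256) for _ in range(3)), colors)
               for _ in range(starts))
    point, dist_sq = max(results, key=lambda r: r[1])
    return point, dist_sq ** 0.5


# Made-up demo list; swap in the parsed color names file for real use.
demo_colors = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255), (0, 128, 0)]
print(largest_gap(demo_colors))
```

Running the same search in a perceptual space would mean converting the list first and replacing `nearest_sq` with a matching distance.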


You can choose other color spaces here which is neat and helps visualize this a bit more accurately.


I think I’d buy something with Strix Halo or Strix Point if there was official ROCm support. As of 6.4.1 from earlier this month there’s still not, as I understand it. I’d be delighted to be corrected on this matter.


There is (unofficial) ROCm support for Strix Halo with ROCm 6.4.1. However, Llama.cpp and similar were segfaulting, while ROCR-based OpenCL and other workloads were working.

ROCm GPU Compute Performance With AMD Ryzen AI MAX+ "Strix Halo": https://www.phoronix.com/review/amd-strix-halo-rocm-benchmar...


By the way thanks for working on this! I read all of your reviews on this device and it's been very informative.


Does it work with RustiCL?


I haven't gotten around to trying it, but it's on my TODO list if I have the time before needing to send the review unit back (likely next week or so, I'd expect).


What’s AMD's strategy in not having consumer chip support for ROCm? It's puzzling. No way to get critical mass of development interest if the bar to entry is high.


Their plan appears to be TheRock [0]. Further open sourcing and engaging the community and leveraging that to expand support faster.

There are some recent discussions on YouTube about it [1], including one with a senior VP [2].

[0]: https://github.com/ROCm/TheRock [1]: https://www.youtube.com/watch?v=6tASUo7UqNw&t=4551 [2]: https://www.youtube.com/watch?v=0B8JOtS2Tew


I think that part of the issue is the split between CDNA for data centers [1] and RDNA for consumer products [2] with AMD only having the money to focus on the bigger data center market. There are rumors that both architectures will be merged into UDNA in the future, which will hopefully improve ROCm support, but for now it's lacking

[1] https://www.amd.com/en/technologies/cdna.html [2] https://www.amd.com/en/technologies/rdna.html


> There are rumors that both architectures will be merged into UDNA in the future

It's not rumor. It came straight from an executive: https://www.tomshardware.com/pc-components/cpus/amd-announce...


The strategy (seems to be) targeting data centres and focusing support efforts on the cards most likely to be used in one. There is an expectation that ROCm will work on pretty much everything but their drivers aren't good so in practice it is dicey whether it actually does.


You can theoretically run ollama on it, as with the earlier APUs (I did it on an M780 by allocating 16 of the machine's 32GB to the iGPU). I am _very_ interested in getting my hands on one because I see it as a decent compromise between power, RAM capacity (with soldered-on RAM, it's got pretty good latency) and performance.


The article goes on to say what the author thinks is bad about this:

> We’re not raising emotionally intelligent kids. We’re raising kids to navigate human unpredictability as if it’s a design flaw. Because when you grow up with a machine that always gets you, messy human behavior feels broken. We’re not preparing kids to handle people.

I don’t think there’s anything wrong with escaping into fantasy in the right time and place, but young kids (and even well-adjusted adults) can have problems self-moderating and letting fantasy substitute for engaging with reality.


I compiled it for Ampere and counted 6834 actual F32 operations in the SASS after optimizations. I only counted FFMA, FADD, FMUL, FMNMX, and MUFU.RSQ after eyeballing the SASS code, so there might even be more. It's possible the FMNMX doesn't actually take a FLOP since you can do f32 max as an integer operation, and perhaps MUFU.RSQ doesn't either, but even if you only count FFMA, FADD, and FMUL there are still 3685 ops.

  nvcc -arch=sm_86 prospero.cu -o prospero
  cuobjdump -sass prospero | grep -E 'FFMA|FADD|FMUL|FMNMX|MUFU\.RSQ' | wc -l
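If you want the per-opcode breakdown rather than a single total, a small Python tally over the saved `cuobjdump -sass` output would do it. This is a sketch: it counts matching lines the same way the grep does, and the sample text below is hand-written in the rough shape of a SASS listing, not real compiler output.

```python
import re
from collections import Counter

# Same opcode list as the grep above.
FP32_OPS = ("FFMA", "FADD", "FMUL", "FMNMX", "MUFU.RSQ")
PATTERN = re.compile("|".join(re.escape(op) for op in FP32_OPS))


def tally(sass_text):
    """Count lines per opcode, one hit per line like `grep | wc -l`."""
    counts = Counter()
    for line in sass_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


# Illustrative, hand-written sample (assumed layout).
SAMPLE_SASS = """\
        /*0000*/    FFMA R0, R1, R2, R3 ;
        /*0010*/    FADD R4, R0, R1 ;
        /*0020*/    MUFU.RSQ R5, R4 ;
        /*0030*/    FMNMX R6, R5, R0, !PT ;
        /*0040*/    FFMA R7, R6, R6, R0 ;
        /*0050*/    IMAD.MOV.U32 R8, RZ, RZ, R7 ;
"""

counts = tally(SAMPLE_SASS)
print(dict(counts), sum(counts.values()))
```

For the real binary, feed it the text from `cuobjdump -sass prospero` (for example via a file or `subprocess`).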


I basically have an even simpler version of something like this for my own personal use too. I found it pretty easy to write in Go and my area of expertise is decidedly not web frontend/backend. I’d recommend it as a fun little project if you’re looking for something to do.

For mine, I paste in a video or playlist URL and it downloads the video and creates a lower resolution transcoded version suitable for streaming to my phone. It also extracts an audio-only version in case that’s more appropriate.
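A rough sketch of that pipeline, written in Python here rather than Go, and assuming `yt-dlp` and `ffmpeg` are on the PATH. The function names, target height, codecs and bitrates are illustrative choices, not the settings of the actual tool.

```python
import subprocess


def download_cmd(url, out_dir):
    # -o sets yt-dlp's output filename template.
    return ["yt-dlp", "-o", f"{out_dir}/%(title)s.%(ext)s", url]


def low_res_cmd(src, dst, height=480):
    # scale=-2:HEIGHT keeps the aspect ratio with an even width.
    return ["ffmpeg", "-i", src,
            "-vf", f"scale=-2:{height}",
            "-c:v", "libx264", "-crf", "28",
            "-c:a", "aac", "-b:a", "96k",
            dst]


def audio_only_cmd(src, dst):
    # -vn drops the video stream, leaving an audio-only file.
    return ["ffmpeg", "-i", src, "-vn", "-c:a", "aac", "-b:a", "128k", dst]


def run(cmd):
    """Execute one of the commands above, raising if it fails."""
    subprocess.run(cmd, check=True)


print(" ".join(low_res_cmd("talk.mkv", "talk_480p.mp4")))
```

Building the argument lists separately from running them makes it easy to log or dry-run each step before touching any files.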


I have one too, it's honestly a very fun area to program around, and I'm not going to be surprised if this thread is full of me-toos.

Mine is specifically meant to help get videos onto plex in exactly the way we want - with particular emphasis on playlists, taking the numbering and putting it in plex format, and transcoding any codecs (detected via ffprobe) i know certain shitty players (smart TVs) will have issues with. Along with putting it in the right spot on the filesystem with the right permissions and user+group set so it serves correctly over samba too (for management from windows / via GUI).
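The ffprobe check could look something like this Python sketch. It relies on ffprobe's JSON output (`-show_streams -of json`), and the lists of acceptable codecs are placeholders; which codecs a given TV chokes on is something you would have to fill in yourself.

```python
import json
import subprocess


def codecs_from_json(ffprobe_json):
    """Map codec_type to codec_name for the first stream of each type."""
    codecs = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        kind, name = stream.get("codec_type"), stream.get("codec_name")
        if kind and name:
            codecs.setdefault(kind, name)
    return codecs


def probe_codecs(path):
    """Run ffprobe on a file and return its codecs (needs ffprobe installed)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return codecs_from_json(out)


def needs_transcode(codecs, ok_video=("h264",), ok_audio=("aac", "mp3")):
    """True if a present video or audio codec is outside the allowed lists."""
    video, audio = codecs.get("video"), codecs.get("audio")
    return ((video is not None and video not in ok_video)
            or (audio is not None and audio not in ok_audio))


sample = '{"streams": [{"codec_type": "video", "codec_name": "hevc"}, {"codec_type": "audio", "codec_name": "aac"}]}'
print(codecs_from_json(sample), needs_transcode(codecs_from_json(sample)))
# {'video': 'hevc', 'audio': 'aac'} True
```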


Plex is the destination for my setup, too. I have a bookmarklet I can click when I'm on any Youtube (or other video) page that sends the URL to a local Flask app that's just a wrapper for calling yt-dlp with the right args and post-processing.


I have something similar as a simple PHP script on a shared hosting service. I can't PHP well anymore so it's probably the worst and most insecure code I've produced by a big margin. Does it do the job? Yes.


Have a repo you can share?


No, unfortunately. Not only is it too tangled (not irredeemably, but I've never made an effort to try to make it cleanly ploppable) with the rest of my home-rails-server monolith, but the code is all also ridiculously bad, written in 2000 separate 5 minute scraps of time, all while standing up and holding at least one baby.

I call it "dadware".


Interesting! How do you stream it to your phone? I imagine it's on the local network?


The Ars Technica article:

https://arstechnica.com/tech-policy/2025/02/youtube-briefly-...

(PS: Ars Technica is a bit sluggish for me this evening. Not sure why.)


Since you asked for “all the feedback,” there’s a typo on your landing page:

“The Meha API utilizes it's home-grown” -> “its”

Also, I got a relay access denied error when I tried to email you at info@meha.ai


Awesome, we just fixed these issues. Thanks for letting us know.

