Hacker News | ceencee's comments

Frankly, this feels uninformed. Many Iceberg shops rely heavily on Spark as a primary engine. And Databricks has a history of being a hostile OSS force: the culture of the Spark project was toxic from the start, and Delta's commitment to being a community project is questionable.


Kafka can sustain sub-20ms latency at a scale of millions or even billions of messages per second. Processing-time delays are a smell of bad consumer code and partition design. That is, your consumer shouldn't depend on a slower resource within an ordering domain. This can also be mitigated with an async consumer.
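To sketch the "async consumer" idea: one way to keep the poll loop fast is to hand records off to a worker thread so the slow downstream call never blocks consumption. This is a toy simulation with in-memory queues standing in for a partition and a real Kafka client; names like `slow_worker` are made up for illustration.

```python
import queue
import threading

partition = queue.Queue()   # stands in for one Kafka partition
work = queue.Queue()        # hand-off buffer to the async worker
processed = []

def slow_worker():
    # Drains the hand-off buffer; upper() stands in for the slow resource.
    while True:
        record = work.get()
        if record is None:          # shutdown sentinel
            break
        processed.append(record.upper())

def consumer_loop():
    # The "poll loop": it only enqueues, so it never waits on the slow call.
    while True:
        record = partition.get()
        if record is None:
            work.put(None)
            break
        work.put(record)            # returns immediately

t = threading.Thread(target=slow_worker)
t.start()
for msg in ["a", "b", "c", None]:
    partition.put(msg)
consumer_loop()
t.join()
print(processed)  # ['A', 'B', 'C']
```

Note the trade-off: offsets would now have to be committed only after the worker finishes, or you risk losing in-flight records on a crash.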


This is a ridiculous statement if you've actually built an EDA. Kafka is what enables the decoupling.


Pretty much every bank uses Kafka as the central messaging layer. What people are missing in almost every post here: write-once-read-many, without data duplication and with independent offsets, is the killer app for Kafka, beyond the near-infinite scale, which is also super appealing. The failure modes are way, way better than Rabbit's as well. Note: I owned the streaming platform for a top-5 bank in the US.
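The write-once-read-many point can be shown with a toy in-memory log: one append-only sequence, many readers, each with its own offset, no per-consumer copy of the data. The group names and records are invented for illustration.

```python
# One immutable, append-only log, written once.
log = ["deposit:100", "withdraw:30", "deposit:55"]

# Each consumer group tracks its own position; the log is never copied.
offsets = {"fraud-detection": 0, "ledger": 0}

def poll(group, max_records=10):
    """Return the next records for a group and advance only its offset."""
    start = offsets[group]
    records = log[start:start + max_records]
    offsets[group] = start + len(records)
    return records

assert poll("fraud-detection") == log                     # reads everything
assert poll("ledger", max_records=1) == ["deposit:100"]   # lags independently
assert poll("fraud-detection") == []                      # caught up
```

This is the shape of Kafka's read model: adding a tenth consumer costs an integer of offset state, not a tenth copy of the data, which is what queue-per-consumer systems like Rabbit can't give you.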


Yeah, I'm sorry to others, but if you require the guarantees and compliance that Kafka provides, Kafka wins, especially at this kind of scale. I'd love to see RabbitMQ scaled out to handle hundreds of trillions of events per day and able to retain years worth of highly durable, immutable, and replayable event storage.

Ultimately, this comparison is apples vs oranges...


Who is running single-AZ deployments while also caring about data loss and availability? Seriously? I've personally supported thousands of Kafka deploys, and this isn't a thing, in the cloud at least. There is no call for fsync-per-message; it's an anti-pattern and isn't done because it isn't necessary. Data loss in Kafka isn't a real problem that hurts real-world users at all.
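For context, the usual posture is to get durability from replication across brokers (and AZs) rather than from per-message fsync. A hedged sketch of that configuration; the parameter names are standard Kafka settings, but the values are illustrative, not a recommendation:

```properties
# Broker (server.properties): survive a broker/AZ loss via replicas
default.replication.factor=3
min.insync.replicas=2
unclean.leader.election.enable=false

# Producer: don't ack until the in-sync replicas have the write
acks=all
enable.idempotence=true
```

With acks=all and min.insync.replicas=2, an acked write exists on at least two brokers' page caches before the producer hears back, which is the durability story Kafka leans on instead of fsync-per-message.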


I was grabbing beer with a buddy who has run some large (petabytes per month) Kafka deployments, and his experience was very much that Kafka will lose acked writes if not very carefully configured. He had direct experience with data loss from JVM GC pauses creating terrible, flaky cluster conditions and, separately, from running out of disk on one cluster machine.


> There is no call for wanting fsync per message, it is an anti pattern and isn’t done because it isn’t necessary

1. You don't have to do it per message.

2. It's used by many distributed DB engines; Kafka and (I think) ZK are the outliers here, not the other way around.
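Point 1 above is usually called group commit: fsync once per batch of appends, not once per message, amortizing the sync cost. A minimal sketch, assuming a made-up log file and batch size:

```python
import os
import tempfile

# Hypothetical append-only log; path and BATCH are invented for illustration.
path = os.path.join(tempfile.mkdtemp(), "log.bin")
BATCH = 100

fsyncs = 0
with open(path, "ab") as f:
    pending = 0
    for i in range(250):
        f.write(f"msg-{i}\n".encode())
        pending += 1
        if pending >= BATCH:        # durability point for the whole batch
            f.flush()
            os.fsync(f.fileno())
            fsyncs += 1
            pending = 0
    if pending:                     # sync the tail before closing
        f.flush()
        os.fsync(f.fileno())
        fsyncs += 1

print(fsyncs)  # 3 syncs for 250 messages, not 250
```

Real systems trigger the sync on a byte/time threshold rather than a fixed message count, but the amortization argument is the same.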


Kafka is not a "db engine". zk is a "db engine" in the same way 'DNS' is a "db engine".


Oh, DNS is definitely a database engine [1] ;)

[1]: https://dyna53.io


Ah yes, the semantic argument. FYI: Pulsar and etcd do use fsync.


No one is arguing with you. You made an argument based on a misinformed software-category assertion, and the error was pointed out. So s/fyi/til maybe?


I can't name the "unserious" people who aren't running multi-AZ, but this is the approach to durability that MongoDB took ~15 years ago, and they have never lived it down.

It may just be that data reliability isn't a huge concern for messaging queues, so it's less of an issue, but pretending the risk isn't there doesn't help anyone.


I see posts like this a lot, and it makes me wonder what the heck you were using Kafka for that Postgres could handle, yet you had dozens of clusters. I question whether you actually used Kafka or just operated it. Sure, anyone can follow the "build a queue on a database" pattern, but it falls over at the throughputs that justify Kafka. If all you have is a bunch of trivial 10-TPS workloads, of course a distributed system is overkill.
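For readers unfamiliar with the "build a queue on a database" pattern being referenced: it's a jobs table plus an atomic claim-and-mark query. A toy sketch using sqlite3 so it's self-contained; a real Postgres version would typically use `SELECT ... FOR UPDATE SKIP LOCKED` so concurrent workers don't block each other, and the table/column names here are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, body TEXT, done INTEGER DEFAULT 0)"
)
db.executemany("INSERT INTO jobs (body) VALUES (?)", [("a",), ("b",)])

def claim_one():
    """Claim the oldest unclaimed job, or return None if the queue is empty."""
    row = db.execute(
        "SELECT id, body FROM jobs WHERE done = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (row[0],))
    db.commit()
    return row[1]

assert claim_one() == "a"
assert claim_one() == "b"
assert claim_one() is None
```

The falls-over-at-scale argument: every claim is a row update plus index churn and eventual vacuum pressure, which is fine at tens of TPS and painful at hundreds of thousands, where an append-only log wins.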


They didn’t say that the Kafka clusters they personally ran could have been handled with Postgres instead.

They first gave their credentials by mentioning their experience.

Then they basically said, "given what I know about Kafka, with my experience, I require other people who ask for it to show me that they really need it before I accommodate them; often a beefy Postgres is enough."


A major ecommerce site. We had hundreds of thousands of messages/s, but for most use cases, YAGNI.

