Hacker News | ceencee's comments

Frankly, this feels uninformed. Many Iceberg shops rely heavily on Spark as a primary engine. And Databricks has a history of being a hostile OSS force: the culture of the Spark project was toxic from the start, and Delta's commitment to being a community project is questionable.


Kafka can sustain sub-20ms latency at a scale of millions or even billions of messages per second. Processing-time delays are a smell of bad consumer code and partition design. That is, your consumer shouldn't depend on a slower resource within an ordering domain. This can also be mitigated with an async consumer.
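To sketch the "async consumer" idea: one way to keep the poll loop fast is to hand records off to a worker thread so the slow downstream call never blocks consumption. This is a toy simulation with in-memory queues standing in for a partition and a real Kafka client; names like `slow_worker` are made up for illustration.

```python
import queue
import threading

partition = queue.Queue()   # stands in for one Kafka partition
work = queue.Queue()        # hand-off buffer to the async worker
processed = []

def slow_worker():
    # Drains the hand-off buffer; upper() stands in for the slow resource.
    while True:
        record = work.get()
        if record is None:          # shutdown sentinel
            break
        processed.append(record.upper())

def consumer_loop():
    # The "poll loop": it only enqueues, so it never waits on the slow call.
    while True:
        record = partition.get()
        if record is None:
            work.put(None)
            break
        work.put(record)            # returns immediately

t = threading.Thread(target=slow_worker)
t.start()
for msg in ["a", "b", "c", None]:
    partition.put(msg)
consumer_loop()
t.join()
print(processed)  # ['A', 'B', 'C']
```

Note the trade-off: offsets would now have to be committed only after the worker finishes, or you risk losing in-flight records on a crash.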


This is a ridiculous statement if you've actually built an EDA. Kafka is what enables the decoupling.


Pretty much every bank uses Kafka as the central messaging layer. What people are missing in almost every post here: write-once-read-many, without data duplication and with independent offsets, is the killer app for Kafka, beyond the near-infinite scale, which is also super appealing. The failure modes are way, way better than Rabbit's as well. Note: I owned the streaming platform for a top-5 bank in the US.
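The write-once-read-many point can be shown with a toy in-memory log: one append-only sequence, many readers, each with its own offset, no per-consumer copy of the data. The group names and records are invented for illustration.

```python
# One immutable, append-only log, written once.
log = ["deposit:100", "withdraw:30", "deposit:55"]

# Each consumer group tracks its own position; the log is never copied.
offsets = {"fraud-detection": 0, "ledger": 0}

def poll(group, max_records=10):
    """Return the next records for a group and advance only its offset."""
    start = offsets[group]
    records = log[start:start + max_records]
    offsets[group] = start + len(records)
    return records

assert poll("fraud-detection") == log                     # reads everything
assert poll("ledger", max_records=1) == ["deposit:100"]   # lags independently
assert poll("fraud-detection") == []                      # caught up
```

This is the shape of Kafka's read model: adding a tenth consumer costs an integer of offset state, not a tenth copy of the data, which is what queue-per-consumer systems like Rabbit can't give you.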


Yeah, I'm sorry to others, but if you require the guarantees and compliance that Kafka provides, Kafka wins, especially at this kind of scale. I'd love to see RabbitMQ scaled out to handle hundreds of trillions of events per day and able to retain years worth of highly durable, immutable, and replayable event storage.

Ultimately, this comparison is apples vs oranges...


Who is running single-AZ deployments while also caring about data loss and availability? Seriously? I've personally supported thousands of Kafka deploys, and this isn't a thing, in the cloud at least. There is no call for fsync-per-message; it's an anti-pattern and isn't done because it isn't necessary. Data loss in Kafka isn't a real problem that hurts real-world users at all.
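For context, the usual posture is to get durability from replication across brokers (and AZs) rather than from per-message fsync. A hedged sketch of that configuration; the parameter names are standard Kafka settings, but the values are illustrative, not a recommendation:

```properties
# Broker (server.properties): survive a broker/AZ loss via replicas
default.replication.factor=3
min.insync.replicas=2
unclean.leader.election.enable=false

# Producer: don't ack until the in-sync replicas have the write
acks=all
enable.idempotence=true
```

With acks=all and min.insync.replicas=2, an acked write exists on at least two brokers' page caches before the producer hears back, which is the durability story Kafka leans on instead of fsync-per-message.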


I was grabbing beer with a buddy who has run some large (petabytes per month) Kafka deployments, and his experience was very much that Kafka will lose acked writes if not very carefully configured. He had direct experience with data loss from JVM GC pauses creating terrible, flaky cluster conditions and, separately, from running out of disk on one cluster machine.


> There is no call for wanting fsync per message, it is an anti pattern and isn’t done because it isn’t necessary

1. You don't have to do it per message.

2. It's used by many distributed DB engines; Kafka and (I think) ZK are the outliers here, not the other way around.
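Point 1 above is usually called group commit: fsync once per batch of appends, not once per message, amortizing the sync cost. A minimal sketch, assuming a made-up log file and batch size:

```python
import os
import tempfile

# Hypothetical append-only log; path and BATCH are invented for illustration.
path = os.path.join(tempfile.mkdtemp(), "log.bin")
BATCH = 100

fsyncs = 0
with open(path, "ab") as f:
    pending = 0
    for i in range(250):
        f.write(f"msg-{i}\n".encode())
        pending += 1
        if pending >= BATCH:        # durability point for the whole batch
            f.flush()
            os.fsync(f.fileno())
            fsyncs += 1
            pending = 0
    if pending:                     # sync the tail before closing
        f.flush()
        os.fsync(f.fileno())
        fsyncs += 1

print(fsyncs)  # 3 syncs for 250 messages, not 250
```

Real systems trigger the sync on a byte/time threshold rather than a fixed message count, but the amortization argument is the same.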


Kafka is not a "db engine". zk is a "db engine" in the same way 'DNS' is a "db engine".


Oh, DNS is definitely a database engine [1] ;)

[1]: https://dyna53.io


Ah yes, the semantic argument. FYI: Pulsar and etcd do use fsync.


No one is arguing with you. You made an argument based on a misinformed software-category assertion, and the error was pointed out. So s/fyi/til maybe?


I can't name the "unserious" people who aren't running multi-AZ, but this is the approach to durability that MongoDB took ~15 years ago, and they have never lived it down.

It may just be that data reliability isn't a huge concern for messaging queues, so it's less of an issue, but pretending the risk isn't there doesn't help anyone.


I see posts like this a lot, and it makes me wonder what the heck you were using Kafka for that Postgres could handle, yet you had dozens of clusters. I question whether you actually used Kafka or just operated it. Sure, anyone can follow the "build a queue on a database" pattern, but it falls over at the throughputs that justify Kafka. If all you have is a bunch of trivial 10-TPS workloads, of course a distributed system is overkill.
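For readers unfamiliar with the "build a queue on a database" pattern being referenced: it's a jobs table plus an atomic claim-and-mark query. A toy sketch using sqlite3 so it's self-contained; a real Postgres version would typically use `SELECT ... FOR UPDATE SKIP LOCKED` so concurrent workers don't block each other, and the table/column names here are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, body TEXT, done INTEGER DEFAULT 0)"
)
db.executemany("INSERT INTO jobs (body) VALUES (?)", [("a",), ("b",)])

def claim_one():
    """Claim the oldest unclaimed job, or return None if the queue is empty."""
    row = db.execute(
        "SELECT id, body FROM jobs WHERE done = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (row[0],))
    db.commit()
    return row[1]

assert claim_one() == "a"
assert claim_one() == "b"
assert claim_one() is None
```

The falls-over-at-scale argument: every claim is a row update plus index churn and eventual vacuum pressure, which is fine at tens of TPS and painful at hundreds of thousands, where an append-only log wins.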


They didn’t say that the Kafka clusters they personally ran could have been handled with Postgres instead.

They first gave their credentials by mentioning their experience.

Then they basically said, "given what I know about Kafka, with my experience, I require other people who ask for it to show me that they really need it before I accommodate them; often a beefy Postgres is enough."


A major ecommerce site. We had hundreds of thousands of messages/s, but for most use cases, YAGNI.

