Never use MongoDB (2013) (sarahmei.com)
131 points by mikecarlton on Feb 1, 2021 | 141 comments


This article keeps coming up every once in a while and reminds me of all those “Why JS sucks”, “Never use PHP”, “Java is enterprise only”, “Ruby only works on hobby projects” etc...

But then in real life people built great software with all the above, so I’ll just say a great classic: pick something you know, use it well, build something good, end of story!

No tool will fix wrong assumptions or bad design, we can dive into philosophy here but I’m more of a practical person so... :)


This advice is fine for programming languages, but not for data stores. Using them incorrectly (or in MongoDB's case and sometimes MySQL's, correctly) could lead to things like data loss or crashed servers. Simply building your product on one of them is not proof they are good enough.


No, there are programming languages that are just plain bad to use. That philosophy is wrong not only for data stores but for reality in general.

All tools, including programming languages can be bad no matter the skill of the user.


Glad to see that you actually know what is right ;)

All tools can be bad, but if a tool is bad, that probably means you cannot use it well or build something good with it. Every good enough tool can be used to build something good enough, or else it wouldn't be good enough. And it's also realistic to assume that every tool popular enough is good enough for something.

But you see, it’s starting to get philosophical over here


You can create the Grand Canyon with an infinite supply of plastic spatulas. Does that mean a plastic spatula is a good tool for creating grand canyons? Of course not.


Found the person that doesn't like BrainF*ck. /s


With Excel as a data store!


> “Why JS sucks”, “Never use PHP”, “Java is enterprise only”, “Ruby only works on hobby projects”

All of these have at least a little bit of truth to them, and you should know about the downsides of technologies, even if you decide to use them anyways.


I would argue that a lot of weaker elements in the stack (e.g. PHP) work well because of a stronger database.

A good database system does a lot to catch errors (including consistency problems), isolate them, and roll them back. Moreover, it will allow you to move performance problems into the database, where it's likely to be handled more efficiently with less application code (e.g. joining in the database is likely to be more efficient than a naive join algorithm implemented in the application).
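
To make the join point concrete, here is a toy sketch (invented table and column names, SQLite for self-containedness) contrasting a database-side join with the naive application-side equivalent; the performance gap only matters at scale, but the shape of the two approaches is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Sarah'), (2, 'Mike');
    INSERT INTO posts VALUES (10, 1, 'Never use MongoDB'), (11, 2, 'On Postgres');
""")

# Letting the database join: one query, and the planner can use indexes.
db_join = conn.execute(
    "SELECT a.name, p.title FROM authors a JOIN posts p ON p.author_id = a.id "
    "ORDER BY p.id"
).fetchall()

# Naive application-side join: pull both tables, then O(n*m) nested loops.
authors = conn.execute("SELECT id, name FROM authors").fetchall()
posts = conn.execute("SELECT id, author_id, title FROM posts").fetchall()
app_join = [(name, title)
            for (pid, author_id, title) in sorted(posts)
            for (aid, name) in authors if aid == author_id]

print(db_join == app_join)  # True
```

Same answer either way, but the database version is one statement and leaves the algorithm choice to the query planner.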

Some will argue that these are misfeatures and should be handled in the application. In some cases, that is true; but you are probably going to need some other aspects of the stack to be very robust and performant to get reasonable results.

In other words (please excuse my examples as they are intended for illustration and not flamebait), PHP over Postgres might be fine; Haskell over MongoDB might be fine; but PHP over MongoDB is playing with fire.

I'd still say that, in most cases, the database layer is the first place to start to work toward a robust system. Even a proven-correct Haskell program can fail miserably if there was a minor bug three versions ago that wrote some bogus data that wasn't caught by a good database layer.


The only practical application of MongoDB I can justify to this day is if you have a form builder that allows users to build completely custom forms. Forms.io gives you the form schema in such a scenario as JSON out of the box. That gets matched with answers as JSON.

Saving this to MongoDB directly, rather than SQL seems to simplify things.

With anything else, at the end of the day, you are enforcing FK constraints anyway, so might as well use SQL.

I never had issues with MongoDB performance.

One caveat to this is that I have yet to see a project that needs database sharding in real life, and I have worked on projects with millions of entries in a table and hundreds of writes a minute.


Most relational databases have some level of json support nowadays. So I'm not sure how much it simplifies in that case.
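
As a minimal sketch of what that JSON support looks like (illustrated with SQLite's built-in JSON1 functions since they ship with Python; Postgres offers the same idea with jsonb, and the table and field names here are invented):

```python
import sqlite3

# In-memory DB; the JSON1 functions are available in the SQLite bundled
# with modern Python builds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE form_answers (id INTEGER PRIMARY KEY, answers TEXT)")
conn.execute(
    "INSERT INTO form_answers (answers) VALUES (?)",
    ('{"name": "Ada", "age": 36}',),
)

# Filter and project on fields inside the JSON document -- no separate
# document store needed.
row = conn.execute(
    "SELECT json_extract(answers, '$.name') FROM form_answers "
    "WHERE json_extract(answers, '$.age') > 30"
).fetchone()
print(row[0])  # Ada
```

That covers the "store free-form form answers as JSON" case from the parent comment while everything else stays relational.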


There is appeal to the json being directly searchable in the database, ala MongoDB.

Specifically for analytics/reporting.

The problem is that most analytics/reporting/BI tools SAY they support MongoDB and then they tell you to write a "connector" for each entity, at which point it's easier to just move it to SQL.


It really depends. Things can be built despite the technology. I saw this especially at an employer who was heavily invested in golang. Countless times I thought to myself that they wouldn't be having the issues they were having, the sunk costs, the reinvented wheels, etc., if they used a proven technology like the JVM instead of drinking the kool aid and using the latest fad of the day. Tons of money was sunk into it, and it was made to work by force, but not everyone would be able to bear that cost, and it still causes massive inefficiencies due to poor tooling.


It would be more useful if you addressed the specific issues brought up in the article, rather than generically tried to dismiss all articles critical of any programming language...


In each of those language examples, there are examples of great software that migrated away from those languages for different reasons.


Your post implies that there is no tool on earth that "sucks" and that it's not the tool, it's the person.

It's impossible for EVERY tool to be good. This isn't reality. There has to be tools that are patently bad to use and people have used these bad tools to build great things. But it doesn't change the fact that a tool can be horrible to use.

I would argue that at the time the article was written, Mongo was definitively a bad tool. Things have changed, but not all things.


Precisely. It is crucial to discuss shortcomings and criticize tools, otherwise how are we going to improve if we all roll along singing each other's tunes and never questioning anything?

This kind of attitude and softness towards criticism - "all tools are great" - is not how we need to operate. Professional, well-articulated and constructive criticism needs to be on the table.

I downvoted the GP for this reason.

I advise everyone here to listen to criticisms and write them as well. Don't be afraid of some kind of a backlash, express freely.


Well I guess we are discussing different angles of this matter. I actually didn't think that my comment would go to the top, since it was supposed to be a random naive statement.

I'm not about making things that simple by default, but I don't like those absolutist titles like "Never use MongoDB", also because the years since 2013 actually proved the article to be kind of wrong.

"Never do X" is what I tell to children about matters that they wouldn't be able to understand, and making a point like "we used an immature tool that wasn't the best choice for what we were building, and on top of that we used it wrong, so you random guy should never use it for anything, ever" sounds like fearmongering to me and it's not something suitable to my taste.

But I get your downvote :+1:


Criticism must have context so your statement was indeed a bit naive and too generic.

In the case of MongoDB, many people thought that with it you'll have all of the benefits of, say, PostgreSQL, without having to think much about your data schema. History has shown that this is not the case -- as usual, it's about tradeoffs. There are no absolute wins: if you want an RDBMS, you have to put more work in X, if you want a document store then you have to try really hard with Y.

"MongoDB vs. RDBMS" is a very old and tired argument by now but it basically boils down to: people started using MongoDB wide-eyed, optimistically and with more enthusiasm than engineering skill and of course, there were harsh reality checks.


> people started using MongoDB wide-eyed, optimistically and with more enthusiasm than engineering skill and of course, there were harsh reality checks

It should be noted that VCs also burnt a lot of marketing money to deliberately create that blind enthusiasm and optimism.


Each one of us has his or her own way of expressing opinions. From my point of view there is more than just those three lines of text in what I wrote; some people got it, other people didn't, and that's what I find great about HN comments.

I really appreciate that people took the time to express their opinions starting from a simple statement like the one I wrote, and I don't think that each and every comment needs to be a complete analysis of every possible nuance of the subject we are commenting about.

I'm not a monolithic person, I prefer a conversational approach when discussing opinions, but that's just a matter of personal taste.

Anyway, to make my point I'll use as an example two projects I was personally involved in and that still bring food to the table: NodeJS + MongoDB (started 7 years ago, still on it); MariaDB + PHP + FORTRAN (started 4 years ago, after the first two years only consulting from time to time). Fun fact: right now I could swap one DB for the other and I wouldn't mind the difference, maybe some minor tricks here and there.

You can get burned with MongoDB (totally inconsistent data model? Well, that sucks bad), and you can get burned with PostgreSQL or other SQL databases (thousand-line business-critical stored procedures start getting inefficient? Self-join time explodes after some time in production? Ouch...)

The main point I wanted to make is what jeff-davis (your sibling) summarized for me with words way better than my own: "If you are building something and excited, then keep going, don't stop because a blog told you 'never'"

Too often did I read blog posts like the one we are commenting on when I was starting with software development as a job, and they scared the shit out of me! I didn't have any experience with the tools, so reading that some much more experienced guy working at some Silicon Valley billion-dollar corporation was saying "This is shit: you touch it, you die" didn't bring anything useful to me, just stress...

Over the years I've heard everything and its contrary about... anything! There are normal people here on HN, there are young devs starting right now who read those comments, and not everyone is building mission-critical rocket firmware, billion-user social networks or AI/ML/Blockchain fanta-finance trading bots (not that you are implying anything like that, far from it). But my advice after some years in the business is what I wrote: know your tools, get to like them, get proficient, use them well and try to build something good... your personal experience, be it success or failure, is worth a thousand times more than the success or failure narrated on a random guy/girl blog from N years ago.


> The main point I wanted to make is what jeff-davis (your sibling) summarized for me with words way better than my own: "If you are building something and excited, then keep going, don't stop because a blog told you 'never'"

This is a weird perspective. Most of us work for money, and other people are calling the shots on what's a priority to develop right now. Your statement reads like you're commenting on hobby projects?

> Too often I read blog posts like the one we are commenting when I was starting with software development as a job, they scared the shit out of me!

Not to be reductionist, just trying to understand: your point of view boils down to "I skipped a good tool because I bought a fear-mongering tech blog article", is that correct?

If so, obviously one has to form their own opinions as they gain experience. But I wouldn't feel that strongly about those articles. My takeaways are: (a) critical thinking is important and (b) you can't ban people from posting. ¯\_(ツ)_/¯

> I didn't have any experience on the tools so reading that some much more experienced guy working on some Silicon Valley billion dollar corporation was saying "This is shit: you touch it, you die" didn't bring anything useful to me, just stress...

Not sure what point you're making here. Our job is riddled with stress in general (even if you only take this as an entry point: the brain hates learning and wants us to stay still by default). Confronting that reality means exiting software development for many.

> your personal experience, be it success or failure, is worth a thousand times more than the success or failure narrated on a random guy/girl blog from N years ago

That is absolutely true. But we should recognize filter bubbles regardless. I for example only heard one successful MongoDB story and the team switched to another DB 3 years later.

---

If your general takeaway is: "use whatever you want" then sure, nobody can dispute that since we live in a free world. However:

(a) When comparing tools for certain jobs, there are objectively worse tools compared to others. This reality should not be avoided, or denied, or brushed aside with generic statements.

(b) "Just use whatever you like" is not sound career advice. Some tools will give you better career opportunities, others will make people frown at you for being a bleeding-edge adopter but can give you a competitive advantage, and some will just burn you.

It's not about whether you can achieve X with any technology. For the most part, you can do pretty much anything with everything. That still doesn't make it viable, or the solution with the least friction and the most reward.

I felt your comments were too generic, too optimistic, and rather uninformative. Hence my responses.


I could go point by point on the first part of your reply, but there is something underlying your view that makes me feel we come from places that work a little different. I don't know where you're from, I'm from Italy so let me put down some points:

- here almost every single company is incredibly small (average is 4 employees)

- we have lots of companies (66 for every 1,000 adult citizens)

- the IT panorama here is like 15 years behind the US

- after university or technical school you know almost nothing, just some basic abstract notions

So you have to learn every single thing by yourself, and if you want to work with today's technology you will not learn it at $WORKPLACE, because your boss is probably a 60-year-old guy who wrote COBOL programs but wants you to build "something like Facebook in the next two weeks".

The only window we have on today's technology is the internet; many tools that come cheap somewhere else are quite expensive for us, and you must be lucky enough to know that one guy who can give you some hints.

In my case I studied at a technical school and then did IT engineering at the local university, which I dropped with a couple of exams to go after I had to argue with a professor who solved his own exercise in a broken way and couldn't understand why my solution worked. In the 15 years I've worked since, I haven't known anyone here in my hometown who can work with anything outside some legacy ASP stuff, basic PHP on CMSs like Wordpress or some basic Java applications (mostly Android); the DBs are always MySQL. Just a friend of mine who uses Qt, which he learnt by himself, and after changing jobs three times he is now finally in a dev team, but at the previous three companies he was the only dev. (I know other people who work with "today's" tech, but they are not from my province.)

I don't know if I can communicate my point, but reading that $PERSON who works at $COMPANY says that $PRODUCT is shit because he had problems scaling to 100 million users is a totally useless data point for all the people outside the few tech bubbles around the world.

It's not about "everything is good", I never said that everything is good, but in the real world™ people built great things using almost any tech, maybe even "uncool" tech: it worked, they made money, they were successful, their company served customers for years and years, and maybe the chosen tech had a positive role in the process. My point is about people who say "Never", "Always", "This is broken", "You'll get burned", "It doesn't work"... that's the part that is not constructive: it didn't work for you, for your case, maybe you made some mistakes, but $PERSON must stick to his/her own reality, not try to extend their experience to everyone else.


> I don't know if I can communicate my point, but reading that $PERSON who works at $COMPANY says that $PRODUCT is shit because he had problems scaling to 100mln users is a totally useless point for all the people outside the few tech bubbles around the world.

Can relate to this a lot and I agree. US / Silicon Valley tech blog articles aren't always informative and one has to apply a lot of critical thinking if they want to extract any value from them. I am with you here.

> It's not about "everything is good", I never said that everything is good, but in the real world™ people built great things using almost any tech, maybe even "uncool" tech: it worked, they made money, they were successful, their company served customers for years and years

I would immediately agree with you if the result-oriented people actually made good money [almost] always regardless of technology. But it isn't universally the case, so very often you have programmers who know modern (and very good) tools but are required to use ancient ones because "they just work".

Truth is that no, very often they don't just work; they "just work" due to the suffering of countless underpaid programmers who can't say "no" because they are afraid for their family's livelihoods. Let's not conflate concepts. ;)

I've seen people replace legacy PHP + MySQL + ssh/rsync scripts for deployment with a managed service like Heroku (or Docker/k8s, or bare metal on two instances) and a more modern programming language -- hosting bills dropped to anywhere from a half to an eighth of what they were, and something like 3/4 of the IT personnel became superfluous (and were fired within a few months). There is a lot of efficiency that's routinely left on the table and never reached for.

It's a nuanced discussion. Generalizations like yours and like mine can't nail it. I am 100% with you that not everything has to be "new" and "cool": absolutely! But this same argument is also used to drag down or even stop measurable progress, productivity gains and expense savings.

There's a balance to be struck, and both programmers and businessmen have points of view that are too extreme -- on both sides of the spectrum -- and don't achieve it.

> "Never", "Always", "This is broken", "You'll get burned", "It doesn't work"... that's the part that is not constructive: it didn't work for you, for your case, maybe you did some mistakes, but $PERSON must stick to his/her own reality, not try to extend their experience to everyone else.

Agreed in principle, but not in your particular example. MongoDB is supposed to actually store data. For several years in a row it was failing even in that. You can't claim you are a database and lose people's data. So I definitely would side with that author's viewpoint that "MongoDB sucks" because yes, it does, or at least it did for years in the past and nowadays I am too burned by it to try it out again.

> I don't know where you're from, I'm from Italy so let me put down some points:

RE: the tech landscape of the local market, I am from (and in) Bulgaria and it's almost the same here. I was getting Ruby on Rails offers in 2020 and it was touted as a "bleeding edge modern technology". Those recruiters gave me good chuckles.

In the meantime I am working with Elixir for 4.5 years and Rust for 2.5 years now...


I agree that absolute statements like "never" obscure useful discussion.

I would reword your comment as: "If you are building something and excited, then keep going, don't stop because a blog told you 'never'". I think that's what your main point was, and it's a good one.


Indeed, "never do X" is pretty strong and needs equally strong arguments to back it up. But it can be true. There are tools that are superseded by better ones.

Equally, it’s important to also tone down “X is nothing but the best”; praise should invite equal and opposite constructive criticism.


I hate the logic that because people have successfully used something, it is good. Having used MongoDB extensively, it is absolutely true that MongoDB is powerful and useful. However, it is also true that I would never choose it over something else for a new project. It has likely improved since 2013, but so has everything else.

Certainly I could endeavor to build amazing modern software in C, but unless for some reason C is an absolute must, I would rather try any other language first. I doubt this is a controversial statement, and yet someone will always be there to defend the opposite stance.


Now you’re just repeating the same fallacy by assuming all these tools have improved at the same rate.


> It's impossible for EVERY tool to be good.

What does 'good' mean? Gcc is a good C compiler and a bad Java compiler. It's even worse at being a document database.

I don't think 01acheru was saying that all tools are equally able to do all tasks. I read them as saying that people have used tools with recognized flaws to make good stuff and that being snobby about whether tools are "good" or "bad" in a general way isn't super useful for anyone. Instead, we should say specific things about specific flaws and let others decide if those flaws matter to them.

In particular, this post isn't really saying that MongoDB doesn't work, it's saying that the MongoDB data model isn't useful for what the author was using it for. Even if you are sure that your app was "the perfect use case for MongoDB" all you can really speak about is your use case. The real headline for this article is "we couldn't make MongoDB work and we're skeptical anyone can," which is totally fair, but shies away from the grand claims that 01acheru (and I) are critiquing.


Here are some quotes from Mongo's start page

> No database makes you more productive

> The most popular database for modern apps

What we have here is not really something intended for a specialist usecase, as implied by your comment. Mongo is clearly pushed as being the best database, used by most "modern" apps.


Could you explain how either snippet you quoted translates, to you, into MongoDB calling themselves "the best database"?

For me, they appear to claim to be popular and focused on developer productivity. Those quotes don't seem to show them claiming to be "the best". Like...it's pretty standard "put your best foot forward" sort of high level description and I think it's a bit much to say that they're making some universal claim?


I'm responding to the defense given in the parent comment, starting with

> Gcc is a good C compiler and a bad Java compiler

Meaning, some tools are meant for a specific purpose and it's not fair to judge them if they do poorly at things they were not designed to do. GCC does not claim to be a java compiler (any longer [1]).

I don't think this apology applies to MongoDB. It is described as a "general purpose" database, good for most modern apps. This is asserting exactly the opposite of what was argued in the parent comment. It is then fair to expect that it should do a good job of representing basic data like graph relationships.

[1] https://en.m.wikipedia.org/wiki/GNU_Compiler_for_Java


I didn't think saying GCC is a C compiler was saying it was a 'specialized' tool. It's a tool that compiles C code. It'll compile any valid C program on any supported architecture, but it's not trying to be better at any particular kind of C program. A specialist C compiler might target embedded systems[1], or it might compile C to some unusual intermediate representation, etc.

Once we start talking about C compilers (or general databases), I wanted to express the idea that terms like 'good' or 'bad' are too broad to be useful. The only way MongoDB would be a 'bad' database is if it didn't do what it said it did (which seemed to be true for a long time[2]). 'Bad' only makes sense, to me, as a synonym for 'broken.'

Instead, tools that perform the same general function tend to focus on different aspects of that function. MongoDB focuses on 'productivity' (whatever that means). My impression was, for a time, GCC focused on overall performance while Clang focused on IR introspection through LLVM[3].

I think the article OP links is valid criticism of MongoDB. They're very clear that Mongo didn't work for their use case and I'm sure they're correct. I just think they go too far in saying "Never use MongoDB" and that what they're really saying is something along the lines of, "we couldn't make MongoDB work and we're skeptical anyone can."

P.s. I don't think GCC ever claimed to be a Java compiler - your link is to GCJ, a different program also made by GNU.

[1] Such as SDCC: https://en.wikipedia.org/wiki/Small_Device_C_Compiler

[2] https://stackoverflow.com/questions/10560834/to-what-extent-...

[3] I don't think this is true anymore? I don't write a lot of C.


> Your post implies that there is no tool on earth that "sucks" and that it's not the tool, it's the person.

I got more of a "stop complaining about tools, pick one you like, and build something with it instead" vibe. I feel the same way. Developers are a fickle bunch. One tool works, but then in order to be "cool" you have to bag on it and then propose some other obscure tool you think is better that nobody has ever heard of.

It seriously reminds me of people arguing over music. It's totally uncool to like a mainstream band because everybody else likes that band. So then you have all the "cool" people who listen to all the obscure "awesome" bands Rolling Stone magazine tells you to listen to, so you go around telling people you listen to the Shithouse Rats. "Oh, you've never heard of the Shithouse Rats? Well, they're kind of obscure." And now you're one of the cool kids.


I feel there can be a fair reason behind this, which is basically that using a tool no one else uses, even if it’s great for the use case, is likely to be a losing battle.

Hard to get using it approved at work, less usage means fewer bugs are caught and features developed, etc.

So people evangelize their favorite tools because it benefits them directly to have them adopted more widely.


This is usually true.

But every once in a while you have a case like WhatsApp, which sold to Facebook at a price of $500 million per engineer, which never could have happened without Erlang.


Yeah that was my point, 100%!

And by the way I really like your music analogy, and it’s emblematic of something even larger: they read about the Shithouse Rats in Rolling Stone, named after a song by Bob Dylan or Muddy Waters or the group itself (don’t know which one), and all of them are quite famous and mainstream.

You’ve got to love mankind, we are awesome!


Music implies a sort of "everything is just an opinion" framing, so your analogy does not fit.

Think of it like a horse-drawn wagon vs. a car. There might exist a guy who in his humble opinion thinks the wagon is better, so he uses it to get places instead of a car, but is that guy a reasonable guy? No.

The analogy I mentioned above is more apt because Mongo was indeed at one point in time more of a wagon rather than an unpopular piece of music.


It's a bit culturally insensitive to suggest that all Amish people are unreasonable.


https://www.typeinvestigations.org/investigation/2020/01/14/...

And it's a bit insensitive to suggest that you support incest and child rape. Or am I reading too much into your post?

Let's be honest here. You aren't suggesting that you support raping children any more than I am suggesting the Amish are unreasonable.

This cancel culture attitude of constantly calling out and classifying everything as some sort of infraction against a culture or a race has got to stop.


> This cancel culture attitude of constantly calling out and classifying everything as some sort of infraction against a culture or a race has got to stop.

I agree. Unfortunately, sarcasm doesn't convey well in text.


> at the time the article was written, Mongo was definitively a bad tool

I concur. Version 3.0, dated March 3, 2015, introduced the WiredTiger engine, which fixes much of the brokenness.

I did some workaround work on a MongoDB v2.x app. It did suck and was inconvenient operationally, but it also did scale so had its uses.

However, the discussion now should be about whether to use it today, not back in 2013. So it's fair to say it did suck, or that you shouldn't have used it then, but that doesn't have much relevance now.


They never said that every tool is good. They didn't even imply it.


"There has to be tools that are patently bad to use and people have used these bad tools to build great things."

Don't those tools die out and get forgotten?


No. Sticking to what is known beats innovation most of the time.

There is no correlation between something being widely used and it being good at its job.


>No tool will fix wrong assumptions or bad design, we can dive into philosophy here but I’m more of a practical person so... :)

And no design can fix a bad tool, we can dive into the practicalities here but I'm more of a philosophical person so...:)


Despite having used document-oriented databases for many years (largely because they were shoved down my throat and I inherited someone else's architecture), I never really managed to figure out why people find them so compelling. There has been a shift in the last two years and people have started running away from them. The web-dev crowd in particular adored them, and I guess it's easy to fetch a document in the exact structure you need it in, but sooner or later you inevitably reach the point where you have to analyze data. And here mongo (and all the similar alternatives) becomes the biggest pain in the a...neck you can think of. Couchbase tried to tackle this issue with N1QL to a certain degree, but at large scale it is still not particularly useful. To my mind, a relational database with a good architecture can't be matched by any document-oriented database. But getting a large system/database right does take more effort. There are numerous ways to make relational databases incredibly scalable, but again, it takes a lot more effort.


There was a time where adding a column to a database was a really big deal. You had to get it past the DBA, and there were real resource constraints on the database system. With a document store the schema is entirely in the hands of the developer.

Also, JSON became the standard way to ship data around, and RDBMSes of the time couldn't really handle JSON. So you either wrote a bunch of code to map complex nested JSON to relational tables, or just dumped it into an un-indexable text column.

There was vendor hype, just like there was around Object databases in the pre-internet days.

If you were starting a new project you needed to decide if you were going to use a document store and an RDBMS, or just one or the other. If it was just one, you would choose a document store if you anticipated you would need to handle a lot of unstructured data.

Today the situation is reversed. A document store only does documents well. A good hybrid database like postgres gives you the best of both worlds. Throw in hosted database services, and resource constraints are much less of an issue. So people aren't running back to an old-school RDBMS. They are moving to a much superior and evolved data store.
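
As a sketch of that hybrid shape (SQLite here, since it's self-contained; Postgres jsonb supports the same pattern, and the table and field names are invented for the example): stable relational columns for the core, a JSON column for the flexible part, and an expression index so the JSON stays queryable rather than being a dead text dump.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Relational core columns plus one free-form JSON column.
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE,"
    " profile TEXT)"  # JSON text; its schema is owned by the application
)
# An expression index on an extracted field: the JSON part is
# indexable, unlike a plain un-indexed text dump.
conn.execute("CREATE INDEX users_city ON users (json_extract(profile, '$.city'))")
conn.execute(
    "INSERT INTO users (email, profile) VALUES (?, ?)",
    ("ada@example.com", '{"city": "Milan", "theme": "dark"}'),
)
count = conn.execute(
    "SELECT count(*) FROM users WHERE json_extract(profile, '$.city') = ?",
    ("Milan",),
).fetchone()[0]
print(count)  # 1
```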


I fairly recently _really_ started to understand how important historical reasoning and understanding is in the context of software, technology and science. Your comment is a great example. Tech developments, choices, trends and so on only really make sense in the context of history. And often we forget about history, start to reinvent things or even steer into a completely useless direction because we don't apply temporal reasoning or simply don't learn from the past.

Another benefit of this kind of approach is starting to learn about a challenging subject. Say you want to deepen your knowledge in a branch of mathematics that you find interesting and useful. The history of that branch will tell you so much more than a typical lecture-style conglomerate of concepts. It provides a great overview of important actors, their relationships, cause and effect of discoveries, the culture, the problems and so on. On top of that it is easier to remember and internalize concepts if you know the story behind them.


That's only partially true. With document schemas, you simply eliminate the DBA since whatever you put in there is entirely up to you. In all fairness I've never dealt with DBAs - I've always managed to get technological freedom and be able to design and organize my databases in whichever way I see fit. I'd generally hate to have to ask someone to clone a table for me or whatever.

JSON is the standard way to ship data around the internet, yes. Though grpc is catching up and more and more often I see people relying on grpc in their architecture. And grpc is conceptually a lot closer to an RDBMS, given that you have a code generation step and everything in your data needs to be defined (aka statically typed).

Recently I started several personal projects, and though I struggle to find time and motivation to work on them on my own, document oriented databases are completely out of the question. Postgres, and potentially redis as a proxy for heavy loads, and that's that. I wouldn't call postgres a hybrid database. It does support JSON datatypes natively, but at its core it is the definition of what an RDBMS is. The best example of a hybrid database (from a developer's perspective, since it isn't open source and I do not work for google in any shape or form) is spanner.


The DBA problem you're describing is not in the database system, it's in the DBA. You can have the same freedom with a relational database. Or, for some reason, you can also put a DBA between you and Mongo, who won't let you change the schema of your JSONs (you do have that schema somewhere, just not managed by Mongo).

I've worked on plenty of both SQL and Mongo projects, and honestly the process around schema migration is pretty much the same. Just for Mongo you write it in the code instead of SQL.


> There was a time where adding a column to a database was a really big deal. You had to get it past the DBA, and there were real resource constraints on the database system. With a document store the schema is entirely in the hands of the developer.

That time is still here if you're running enough read nodes and QPS.


> why people find them so compelling

My theory is that it's easy to add a field by adding logic into the app instead of munging tables relationships. Moves the logic to where developers are more comfortable. Scalability/etc is irrelevant for most use cases anyway.


> Scalability/etc is irrelevant for most use cases anyway.

I literally can't parse what you mean by this


Most apps don’t get a lot of traffic


For, I'd guess, 95% of what we use databases for, the actual performance of the database is irrelevant.


It's amazing for three things: search, logging and draft records.

Search: with mongodb you can do an $all query, which is hard to replicate at the sql level without aggregation. However, I'm still waiting for an aggregate-level $elemAt.

Logging: you can attach anything as a property, then it'll be queryable.

Draft records: it's easy to just keep inserting records because it's schema-less. Validate during creation and validate again during publishing or approval. It's queryable and you can use a generic collection for that.

For logging and draft records, an SQL JSON field may be able to handle them, though I don't know how good it is at querying.
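On the $all point above: the SQL replication really does need aggregation. A minimal sketch using SQLite in-memory as a stand-in (the `episode_tags` table and its values are invented for illustration) — `$all` in Mongo would be `{"tags": {"$all": ["red", "blue"]}}`, while SQL needs a `GROUP BY`/`HAVING` over a junction table:

```python
import sqlite3

# Hypothetical tag junction table, one row per (episode, tag) pair.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE episode_tags (episode_id INTEGER, tag TEXT);
    INSERT INTO episode_tags VALUES (1,'red'),(1,'blue'),(1,'green'),(2,'red');
""")

wanted = ["red", "blue"]
# "Has every one of these tags" = match any wanted tag, then require
# the distinct-match count to equal the number of wanted tags.
rows = con.execute(
    "SELECT episode_id FROM episode_tags WHERE tag IN (?, ?) "
    "GROUP BY episode_id HAVING COUNT(DISTINCT tag) = ?",
    (*wanted, len(wanted)),
).fetchall()
# Only episode 1 carries both tags.
```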


4 things, you missed _crazy_ fast analytics.


I have found myself enjoying using a document database as the online store, and then using a 'big data solution' (we use Presto) for any analytics queries later.

Traditional migrations for relational databases are really painful. Document databases make this much easier, and if you've faced the operational pain of needing to migrate a large database (for example, it's so easy to accidentally lock an entire table in Postgres), you might be pretty compelled.

(That said, I think the pendulum is swinging back away from document databases. So you're in luck ;))


How do you normalize your Json structure so it can be queried? Do you enforce a schema on your JSON or do you morph it on export into a common structure.

If you enforce a schema on the Json structure how do you handle the changes on the live system?


I think the parent is arguing that it's hard to do useful big-data analysis on highly nested structures of data. When your database is storing some immense JSON blob, it's hard to write SQL against it.


I have found 2 use cases, one of which I've never actually seen in the wild.

The most common use case is, "I need to store data where the schema is unknown or can change without notice, and have my shit not break." This is what we used Mongo for.

The other use case I could see (and this is pretty much only with Dynamo) is, "I want to build an application that's cross-region native. Most of my data is relatively static, so I accept eventual consistency on changes. I will have a separate data store for transactional data and data that cannot be eventually consistent." I want to build this project, but it will never happen because it's too easy to use an RDBMS in a single region to start.


> Despite having used document oriented databases for many years(largely because they were shoved down my throat and I inherited someone else's architecture), I never really managed to figure out why people find them so compelling.

Well, filesystems are pretty good. It's the only document store I use (and mostly enjoy).

But then you look at the trade-off with something like just Maildir, and you really start to wonder whether this schemaless document store thing is so great.

I suppose the real shame is that proper object dbs like zodb or gemstone get much less attention - they too have big trade-offs - but I feel they at least give back in terms of consistency and simplicity.


> I never really managed to figure out why people find them so compelling.

This might sound jaded but my feeling is that a lot of developers just looked at JSON objects that they were already working with and thought to themselves "actually, it would be cool to just store this directly".

Which, in itself, isn't a bad idea but writing a completely new solution from scratch to a problem that's been solved for decades seems a bit like hubris.

AFAIK many relational databases support JSON today, so I'm not sure what the argument would be to choose something like MongoDB today from scratch if you had the choice of anything.


A half table oriented, half document oriented solution is just odd. I’ve worked with JSONB in Postgres. There are many reasons to dislike mongo but “you can do that with PostgreSQL” isn’t one of them.

Querying nested values is nothing like Rethink or Mongo.

Keys could be rows or keys.

You’re still having to make one to many relationships for something that should just be an array.


We are using mongo specifically because it makes it easy to do analytics on large datasets quickly.


> On my laptop, PostgreSQL takes about a minute to get denormalized data for 12,000 episodes, while retrieval of the equivalent document by ID in MongoDB takes a fraction of a second.

What? Her database can't possibly be indexed properly.


Yeah, this sounds like a design defect. But since the author doesn't really describe what they did, it is hard to really figure it out. I'm guessing this is some sort of query with a self-join going on, where the mongo request is a basic fetch by id.


The use of the term “denormalized” suggests to me that it was a query that involved a lot of joins. Which is certainly something that could have been otherwise addressed with a different design.

Comparing fetching from a normalized design to a denormalized one isn’t really a fair comparison.


I fear the problem is that the person doesn't do joins but separate sub-queries which then get recombined in RAM in the client. Given the software stack described there is a realistic chance of this happening implicitly due to ORM mappers.

But then, tables don't map well to 1-to-many relationships, and joins still return data as flat tables, so this can also be a problem. Especially if a large field gets duplicated a lot. RDBMS really should go from 2d-tables to proper nested types for results IMHO.
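To illustrate the duplication point: in a flat 2-D result set, a one-to-many join repeats every parent column once per child row. A sketch with SQLite (the shows/episodes schema is made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, synopsis TEXT);
    CREATE TABLE episodes (show_id INTEGER, name TEXT);
    INSERT INTO shows VALUES (1, 'a very long synopsis you only want once');
    INSERT INTO episodes VALUES (1, 'e1'), (1, 'e2'), (1, 'e3');
""")

rows = con.execute(
    "SELECT s.synopsis, e.name FROM shows s "
    "JOIN episodes e ON e.show_id = s.id"
).fetchall()
# The big synopsis field comes back once per episode row - three copies.
```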


That would be nice, though I suspect that it is really much more complicated than it seems. You can emulate this to some extent in Postgres with the various JSON functions and essentially return a tree from a single query. But my experience was that I quickly got to a point where the query plan got really complex and planning time started to dominate.


Author is using Rails, so my guess would be the bottleneck is ActiveRecord. I've never used ActiveRecord, so I can't speak to it directly, but in my experience when dealing with large numbers of records in an ORM (and the author easily is working with hundreds of thousands), things grind to a halt, even with eager loading. There's a lot more overhead in creating thousands of ORM objects than in serializing an equivalent chunk of BSON.


Bulk operations are where a lot of rails programmers struggle.

Active record is awesome in many ways, but it can shoehorn you into n+1 solutions.
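The N+1 shape looks something like this, with plain SQLite standing in for what a naive ORM loop does under the hood (schema invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE episodes (show_id INTEGER, name TEXT);
    INSERT INTO shows VALUES (1, 'B5'), (2, 'GH');
    INSERT INTO episodes VALUES (1, 'Midnight'), (1, 'Soul Hunter'), (2, 'Pilot');
""")

# N+1: one query for the shows, then one more query per show.
n_plus_1 = {}
for show_id, title in con.execute("SELECT id, title FROM shows"):
    n_plus_1[title] = [r[0] for r in con.execute(
        "SELECT name FROM episodes WHERE show_id = ?", (show_id,))]

# Eager alternative: a single JOIN, grouped client-side.
eager = {}
for title, name in con.execute(
        "SELECT s.title, e.name FROM shows s "
        "JOIN episodes e ON e.show_id = s.id"):
    eager.setdefault(title, []).append(name)
# Both produce the same show -> episodes mapping; the first did 1 + N
# round trips, the second did one.
```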


12k is not a large number of records though


12k episodes, each with cast list and reviews. That's easily hundreds of thousands of records, possibly millions.


Even without an index that sounds too long (though obviously hardware and Postgres itself both have come a long way since 2013). At 12,000 rows even a brute force query should be quick.

I would suspect something like bad statistics or some other reason that caused a pathological query plan. In any case this is not a good representation of any potential performance difference between Postgres and MongoDB.


I could see it taking that long if she was using a full movie database with millions of rows in it.


Her


Thanks, corrected


in most cases it could also be 1.) how the user writes the code, and 2.) how the db api / library was coded


I bet they're using some stupid ORM that makes 12000 separate queries.


I use MongoDB in my application. The approach I took is to store data in flat documents (relational style) and only de-normalize when necessary for performance. The relational model was invented for a reason -- it is flexible and it is easy to update data in one place and so on. The downside of relational is that joins will kill you when you have very large tables. To avoid joins, I use lookups when possible, and de-normalize only to the extent needed. I get the best of both worlds.


You can do all that with SQL too. And do you have any very large tables?

> joins will kill you when you have very large tables

nitpick, but table size doesn't directly matter (much). if your queries are very specific and only return a couple rows, then you can have huge tables and join across them without issue. Joins only get particularly painful if you're doing aggregation/reporting queries across large parts of it


> if your queries are very specific and only return a couple rows, then you can have huge tables and join across them without issue.

I agree, if you index your tables. Relational databases are very capable; when there's a performance problem, it's often due to simple things like failing to index what should have been indexed.

No tool is perfect for all use cases. There are cases where relational databases won't work. But when I try to store data, I first consider storing it in files, and if that is unpleasant, I consider relational databases. These are both relatively simple time-tested solutions, and it's usually good to start with simple & time-tested unless there's a reason it won't work well.


I can also use Lists, Sets, Maps in my data structure.

Doing the same in SQL requires a lot of intermediate tables.


SQL databases don't support easy schema evolution, sharding and so on.


PostgreSQL directly supports table sharding. https://pgdash.io/blog/postgres-11-sharding.html

Schema evolution is supported by pretty much every ORM you'd care to use; it's not the job of the SQL database to handle the migration. I'm using Prisma and you literally change the software spec for the schema and say "migrate", and it creates the migration SQL and applies it to the Postgres DB programmatically. That gets you deterministic schema evolution and not the "my schema isn't actually reliable" that NoSQL/no-schema databases rely on.

And then you have CockroachDB/YugabyteDB that give you extreme horizontal scalability, with full PostgreSQL compatibility...

And bang, the last reason to use MongoDB vanishes.


I think a missing piece is making materialized views fast since there will always be cases where someone wants to denormalize to get around performance issues, even with well designed indexes.


And Mongo is even worse. You can easily change document structures, but now you have inconsistent data over time.


I first read this article while working on a project where the company had basically written a RDBMS using MongoDB. It was so many different kinds of bad I lost count.


That's the problem with using document databases when you really need an RDBMS. You end up reimplementing an RDBMS, badly.


It's so painful to come onboard projects that should have been designed originally on an RDBMS, but that was never a consideration because, "they are slow." People fight the migration tooth and nail until it becomes nearly impossible to move forward.


This article from 8 years ago highlights how far MongoDB has come: transactions, left outer joins ($lookup), etc.


How many years before it reaches feature parity with a traditional RDBMS though? And when will it get a query language as good as SQL?

That said, I will admit the change streams feature is amazing. That completely changed the way I thought about building reactive applications


The issue is that MongoDB isn't chasing a fixed target. RDBMS have gotten better in the mean time.


This article appears here pretty frequently: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


As a sysadmin that often gets the privilege of pretending to design the smallest of complex systems, one of my regular components has been CouchDB because of its built-in HTTP API.

I've never been anti-Mongo, but this one little piece has made CouchDB an affordable choice for people like me who are not equipped to otherwise defend the choice.

Is there a missing piece that could deal Mongo back in the next time I try to convince someone that there's as straight a path to the sysadmin solutions I generally compose?


mongo-express maybe?

https://github.com/mongo-express/mongo-express

But if couchDB works, it works. Personally I'd love to shoehorn LISP into everything I do, but most of the time I just use python and bash because things tend to get done faster when I do.


We've used MongoDB at our SaaS and have grown to well over $1m ARR and never had an issue.

Maybe if you're trying to build a massive ($B) company, starting with PostgreSQL makes more sense for you. For everything else, MongoDB works just fine.

Just use the technologies that you know and can move fastest with. Startups rarely succeed/fail because of which technologies you choose to use.


If your data store is nothing but a persistence layer for an application, perhaps that makes sense.

In many companies the need to regularly access the database for analytics and BI is a thing; this isn't limited to $B companies. Most of the tooling available works best with SQL databases. (Though the BI connector at https://docs.mongodb.com/bi-connector/current/ looks interesting for this purpose)


Mongo and other nosql databases are still the absolute fastest way to get started, particularly when you don't know what your data is eventually going to look like.


I disagree -- an RDBMS is pretty much safe/sane for any design, though it may not be the fastest. NoSQL databases (specifically, eventually consistent DBs) are only safe/sane in specific scenarios.

If you don't know what your needs are, you should always start with an RDBMS -- it's not that difficult to go "up" to a NoSQL db from there (you're only losing information, and if you can't safely move because of loss of ACID... you'd probably have been really fucked if you started off without it), but you can't easily migrate back "down" to the RDBMS -- a NoSQL database stores almost no information about your data or its constraints.

And your application will almost always want transactional guarantees and to model relationships properly -- generally only small chunks of the design (design-wise; data-wise it might be 90% of the app) can be treated with eventual consistency and have real scaling needs, which you can shift over to your nosql system.

Apps are generally just metadata tracking with a dash of real work.


as a UX designer / product engineer, it's easier for me to jump into a nosql than to wire up a postgres instance for a small proof of concept project that I know won't really go anywhere


Sorry, this might come off as obtuse, I don't mean it that way. Perhaps I'm misreading your title, but I find it odd that a UX designer or product person would be the one writing proofs of concept that require a database.

Unless you mean for solo projects? In that case, it may work for you, but if you're not a database expert, it makes sense you would work around your limitations. That doesn't necessarily mean it's a best practice.


sqlite?

All my proof of concept (and some production) stuff just uses that until I need features or concurrency that it can’t provide.


Changing SQL schema is honestly just not that hard...

but even still, you can always use json columns in postgres if you don't want the db to enforce a schema.


Agreed, I've never understood this. There are lots of good migrations tools around for SQL databases. If people mean "you don't need to run migrations with Mongo, you can just start adding fields to documents as they're accessed, and clean up as-needed" then... don't do that. That's how the Mongo DB I am responsible for was managed for most of the last 10 years. It's a nightmare now, we can barely change it at all.


yeah json columns are a game changer and will definitely "disrupt" nosql usage, at least for my own projects


If your just prototyping you don't need a schema.

Complaining mongo doesn't scale is like saying your Miata can't off-road


>If your just prototyping you don't need a schema.

Oh you always have a schema. It's just about whether you want to enforce it or not.


It’s enforced regardless in most situations. The question is whether you want it enforced on write with the database throwing an error, or enforced on every read by your app crashing or behaving incorrectly when it gets wildly unexpected values back.
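A toy illustration of "enforced on every read": with no write-time schema, every consumer has to defend against every document shape ever written (the `price_cents` field is invented for the example):

```python
# Three generations of the "same" document, all live in the collection at once.
docs = [
    {"price_cents": 1999},    # current shape
    {"price_cents": "1999"},  # an older writer stored it as a string
    {},                       # the oldest docs never had the field at all
]

def display_price(doc):
    # The check the database never did, re-done defensively on every read.
    raw = doc.get("price_cents", 0)
    return int(raw) / 100

prices = [display_price(d) for d in docs]
# Without the .get default and the int() coercion, the last two documents
# would crash the app at read time - that's schema-on-read in practice.
```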


If you are prototyping a local file will do, or store json in a text field in sqlite, or in a json column in postgres. You will even have more flexibility than mongo since you will not have the constraint of their document data model.


Echoing this.

My prototyping always starts with

    DB = {}
at the top of my file. Sometimes it grows to serializing to disk / loading the dumped object from disk. Often times it's all I need to know that my idea was crap and needs revised. And it always keeps me from faffing about with infrastructure.
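The "grows to serializing to disk" step can stay equally tiny — a sketch with `json` and a temp file (the `DB` contents are just an example):

```python
import json
import tempfile

DB = {"users": {"ada": {"visits": 3}}}

# Dump the whole in-memory "database" to disk and load it back.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(DB, f)
    path = f.name

with open(path) as f:
    restored = json.load(f)
# restored is an equal copy of DB - the entire persistence layer.
```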


Only your last option will allow me to send a webpage link to a friend for him to try it.

The idea is if you have just a couple of days to crack out an MVP you don't want to waste time with postgres or whatever. The problem ends up being then your boss is like, all right, this works, keep going with it


I don't understand why postgres w/ json enables a website, where sqlite and text files don't. You just ship the sqlite/text file with the rest of the webserver code?

It's less work than setting up mongo, postgres, or whatever long-running data store. Just deploy the web application... and you're done.


So that would make sense, if it's purely for show. For a recent project we actually used firebase but I wanted to enable myself to update the database without redeploying the website.

I generally dislike when people try to write off a technology just because they don't know how to use it. Don't hammer in screws, but it doesn't mean you should throw out all of your hammers


Careful here: while this may be true of some NoSQL stores that I've not used, this is a tarpit for the wide column datastores I'm familiar with (Dynamo, Cassandra, and others).

It's very easy to get going quickly and find out that you've tripped over antipatterns, and now your database is "a database, a Ruby app, and 30ms of latency on every call". It's easy to think "I'll model this later" and end up hitting the database five times in a row to answer a question.

With these systems, up front modeling of your data model and access patterns is essential if the trade off you are trying to make (far less functionality for smooth performance at ridiculous scale) will ever make sense.


This is what the company I work for claimed. Now we are 10 years old and have a monster of a relational database stored in MongoDB.

There's never a good moment to switch database, but there is always a new problem caused by storing relational data in a document store.


Before PostgreSQL added JSONB support this may have been true, but now it's pretty easy to build an SQL database that you can extend arbitrarily just like with Mongo.

You probably shouldn't use that too much, not for anything remotely production, but it's there if you need that flexibility or an easy place to dump javascript objects.

When you are just getting started, it's super easy to manipulate SQL data structures regardless.


I'll never ever use MongoDB again, because every single time I've ended up running a cluster it has always been the most troubled part of my stack. I've been burned way too many times to ever considering touching that stove again.


What do you mean by "troubled," and what docs did you follow to set up and run the cluster?

Caring and feeding for a database (of any type) with any type of HA has a learning curve. Hence the growing number of PaaS services that handle the setup and maintenance for you (AWS's RDS, Mongo's Atlas, etc)


MongoDB's own management software (Cloud Manager, Ops Manager) is littered with bugs to the point where it's nearly unusable for certain actions. (One of which being restoring backups)

It was really, really bad 3-4 years ago. Regularly entering irrecoverable error states while performing basic management operations via MongoDB's management GUI. I've noticed a significant improvement in the past ~1 year.


By troubled I mean it would regularly go into what I called "three stooges" mode where each node swore that another was the master. Of course this would mean that writes would stop dead in their tracks.

Given that this happened to me across companies, teams, and a half decade of time, I've decided that this is a case where the problem is Mongo and not me.


The only constant there is you which means it is perhaps you.

Setting up and caring for a DB cluster is a complicated thing, to the point of there being non-BS certification courses for nearly every major HA database including Mongo. It very well could be that there was a well-documented flag that you never learned.

This doesn't mean you're being unreasonable. I'd be cranky with any DB getting itself in a split-brain scenario, but my conclusion wouldn't be a bug in the software but rather it's a bug in my understanding. It's worth noting that getting in this state should either be impossible, or it should be obvious about how it arrived at such a state with links to relevant documentation.

(There's also little incentive to make it super easy to run in production on your own. They sell that as a service after all. It wouldn't surprise me if the product had invariants that assumed production-level configurations and that nobody's tested with whatever configs ended up making it go nuts.)


I share the experience of parent.

My startup ages ago wasn't big enough to have perf issues with mongo, but I certainly experienced data loss (pre-WiredTiger).

A company I worked for (with the most capable team of DevOps I've seen in my career, incidentally) eventually moved from self-hosting mongo to atlas, hoping to ease the pain. This was already a few years after wiredtiger came out and fixed a lot of issues.

Performance problems kept happening and we just started switching to postgres.

At my new job, DevOps people keep complaining about our mongo clusters on atlas.

It's again, terrible perf and occasional weird issues.

We did a disk upgrade and the cluster took 3 days to move all the replicas from starting up (and replicating data) to running. During those three days we were running with one less replica and everything was too slow. A complete nightmare.

Adding another replica wouldn't have solved anything because it would have taken a day to finish booting.

They've been trying to migrate off mongo for years but there is never time, as usual.


I think there are two separate aspects that get conflated into one.

1) Document database - rather than a strict rigid schema, you can store nested json documents in tables/collections. Or the idea of soft schema where the whole database doesn't need to be blocked for a schema change and you have some leeway in integrity.

2) Relational database - Ability to make complex sql queries that join data from multiple tables.

Mongodb has some support for joining but it doesn't have a sql variant. If your data is mostly key:val store then it's great. You can shard it, and have replicas. It's easy to make a fast reliable backend with mongodb. Many popular sites run on mongodb backend.

However with the new json types in MySQL and Postgres, they too support inserting documents and querying subkeys. They can be sharded and replicated (albeit with a bit more configuration).

Couchbase, which is like mongo in its document store capabilities, has N1QL, which offers the agility of SQL and the flexibility of JSON.

So like any tool, it has its tradeoffs.

Then again, kudos to the author for evoking our reptilian brains: "Never use MongoDB" incites emotions and gets you to the top of HN. If it were called "When to use MongoDB", it wouldn't get the same reaction.


> On a social network, however, nothing is that self-contained. Any time you see something that looks like a name or a picture, you expect to be able to click on it and go see that user, their profile, and their posts. A TV show application doesn’t work that way. If you’re on season 1 episode 1 of Babylon 5, you don’t expect to be able to click through to season 1 episode 1 of General Hospital.

That is exactly what I'd expect, and that is how small websites like IMDB work. I am on the page for a General Hospital episode, and via the actors in the episode or whatever other part I can click through to Babylon 5, or the other way around, or anywhere else.


> But there are actually very few concepts in the world that are naturally modeled as normalized tables. We use that structure because it’s efficient, because it avoids duplication, and because when it does get slow, we know how to fix it.

Urmmm...How?


MongoDB + Ruby sparks joy for me. It's come a long way since 2013 and latest features like transactions (though nowhere near SQL level of robustness) are enough for my use cases. To each his or her own.


In the early tens we ran an engineering project to improve critical sections in applications using transactional memory. We had a PhD level intern applying our techniques to various open source projects. One target was MongoDB. After a few days of investigation of the Mongo source code, he had to give up because he couldn't even find the critical sections in the source. They had locking, but it was extremely convoluted.

So yes I would agree with that. Never use MongoDB.


Jepsen review 2020:

https://jepsen.io/analyses/mongodb-4.2.6

... not good.



I'm not a fan of MongoDb (although I have used it for many years) but it has its place in the market and there is need for it. "Never use MongoDb" is a clickbait title. I really wish I could downvote this post but I can't. I can only upvote, unfortunately.


Ok, so besides the technical faults of Mongo within the context of its category, what is the ideal use-case of a document oriented store?

If you use metadata documents to model your relations you might get away with the most dangerous foot guns, but then why not jump straight into graph databases?


I'm by no means a fan of Mongo, but the product has improved quite a lot over the years.

Mongo now supports multi-document (and multi-node) transactions, joins, and has a decent storage engine.

So you might even have a chance of keeping your data actually consistent.


Unless you want to.


SQL is a solution to a problem that, for many use cases where mongodb is called for, amounts to premature optimization.



For the record we use mdb in production, and it's been fine.


TLDR never use software architects who don't realize how MongoDB is a really bad fit for their needs. Mistakes are possible, but not thinking about requirements is negligence.


plot twist .. use MongoDB api /drivers on Document Layer on FoundationDB ;-)



