
Why do you consider tweaking _id a hack, and what is the greener pasture for you? If you use random string primary keys in any other database you'd run into the same problem. My app is much more write-intensive than read-intensive. My stuff wouldn't work on PostgreSQL at all, I have so much data that I must use sharding.


You're basically implementing an optimized version of ObjectId for your use case and having to take a couple stabs at it as well. No big deal, but there are just lots of little tricks like that to learn with MongoDB.

"My stuff wouldn't work on PostgreSQL at all, I have so much data that I must use sharding."

We had to implement sharding as well and chose to do it manually in PostgreSQL. Luckily our schema made that relatively easy and natural. YMMV


As mathias_10gen points out below, this is the default behavior of ObjectId. You just have to be careful if you decide to override the _id field.


It isn't just the _id field, it's any indexed field.
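To illustrate the point about key ordering: a standard ObjectId is 12 bytes and begins with a 4-byte timestamp, so new values land near the right-hand edge of the index instead of being scattered across it the way random strings are. A rough sketch of a key with the same shape follows. The function name and the random 8-byte suffix are assumptions for illustration, not MongoDB's actual generator.

```python
import os
import struct
import time

def time_prefixed_id(now=None):
    """Return a 12-byte key: 4-byte big-endian timestamp + 8 random bytes.

    Hypothetical helper; it only mimics the timestamp-first layout of ObjectId.
    """
    ts = int(time.time() if now is None else now)
    # Big-endian packing makes byte-wise ordering follow time ordering.
    return struct.pack(">I", ts) + os.urandom(8)

# Keys created later sort after keys created earlier, so index inserts
# cluster at one end instead of touching random pages.
older = time_prefixed_id(1_000)
newer = time_prefixed_id(2_000)
```

The same reasoning applies to any indexed field, not only _id: values that arrive in roughly increasing order keep the working set of the index small.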


I have a 1.5 TB Postgres database, sharded by schema, that runs wonderfully on a single box (12 core, 36GB RAM, raid 10 of 15k SAS drives). Why couldn't you shard with Postgres?


Am I reading that correctly that you're sharding a database on the same box?


Good point, my comment does make it sound like I intend to keep it that way for this particular database.

It's set up for multiple boxes (each schema is mapped to a hostname in code), but I simply haven't had a reason to move to more boxes yet. The schema feature is a nice, convenient way to pre-shard like this so that growing to more boxes doesn't require rehashing for a very long time, if ever (depending on how much sharding you do up front). You just move schemas/shards as needed using the standard dump and restore tools and update the schema->hostname mapping in the code.
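A minimal sketch of what that mapping might look like, assuming a fixed shard count chosen up front and a CRC32 hash of the user id. The shard count, schema naming, hostnames, and function names are all made up for illustration; they are not taken from the setup described above.

```python
import zlib

NUM_SHARDS = 64  # hypothetical; fixed up front so keys never need rehashing

# Hypothetical schema -> hostname mapping. Every shard starts on one box.
SHARD_HOSTS = {f"shard_{i:02d}": "db1.example.internal" for i in range(NUM_SHARDS)}

def shard_for(user_id):
    """Map a user id to a schema name using a stable hash."""
    index = zlib.crc32(str(user_id).encode("utf-8")) % NUM_SHARDS
    return f"shard_{index:02d}"

def locate(user_id):
    """Return (hostname, schema) for the shard that holds this user."""
    schema = shard_for(user_id)
    return SHARD_HOSTS[schema], schema

def move_shard(schema, new_host):
    """After dumping and restoring a schema on another box, repoint it here."""
    SHARD_HOSTS[schema] = new_host
```

Moving a shard only changes the hostname half of the lookup; the user-to-schema half stays the same, which is why no rehashing is needed.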


When sharding, do you do all joins in code, or just the ones that span several shards?


If I need to go cross-shard, then I am doing it in code. If you knew both shards were on the same box, you could do cross-shard joins when using schemas like this, but you would need some potentially tricky logic to determine whether all the shards involved are on the same machine.

Thankfully most of the joins happen within a shard (hashing and sharding on something like a user_id) with the exception being various analysis and aggregation queries.
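For the aggregation case, doing it "in code" usually amounts to running the same grouped query on every shard and merging the partial results. A sketch of the merge step, assuming each shard has already returned rows of (day, count); the row shape and function name are invented for the example.

```python
from collections import Counter

def merge_daily_counts(per_shard_rows):
    """Combine per-shard GROUP BY results into one set of totals.

    per_shard_rows: iterable of result sets, one per shard, where each
    result set is a list of (day, count) tuples. Hypothetical row shape.
    """
    totals = Counter()
    for rows in per_shard_rows:
        for day, count in rows:
            totals[day] += count
    return dict(totals)

# Example with two shards' partial results:
shard_a = [("2011-05-01", 10), ("2011-05-02", 4)]
shard_b = [("2011-05-01", 7), ("2011-05-03", 1)]
merged = merge_daily_counts([shard_a, shard_b])
```

This works for sums and counts because partial results add up; averages or distinct counts need more care, since they cannot be merged by simple addition.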

Using PostgreSQL's schemas is admittedly not too different from just using many DBs in MySQL or something else, but in practice I've found that the extra layer of organization helps keep things neater. I can back up, move, or delete a specific schema/shard, or I can back up, move, etc. all shards on a machine by operating on the containing database.


I would have to do that manually. Unless you know of an automatic solution that doesn't involve paying tons of money on commercial licenses?



