
Good point. My comment makes it sound like I intend to keep it that way for this particular database.

It's set up for multiple boxes (each schema is mapped to a hostname in code), but I simply haven't had a reason to move to more boxes yet. The schema feature is a nice, convenient way to pre-shard like this, so that growing to more boxes doesn't require rehashing for a very long time, if ever (depending on how many shards you create up front). You just move schemas/shards as needed using the standard dump and restore tools and update the schema->hostname mapping in the code.
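A minimal sketch of that pre-sharding idea in Python, for illustration only. The shard count, the shard_NN schema names, the hostnames, and the helpers (shard_for_user, host_for_user, move_shard) are all hypothetical, not the poster's actual code. What it tries to show: users hash to a fixed set of schemas, and moving a schema to a new box only changes one entry in the schema->hostname table.

```python
import zlib

# Hypothetical setup: 16 logical shards created up front, each a PostgreSQL
# schema named shard_00 .. shard_15. The count and names are assumptions.
NUM_SHARDS = 16

# Hypothetical schema -> hostname mapping kept in application code.
# For now every shard lives on the same box.
SHARD_HOSTS = {f"shard_{i:02d}": "db1.example.internal" for i in range(NUM_SHARDS)}


def shard_for_user(user_id: int) -> str:
    """Pick the schema for a user with a stable hash.

    crc32 gives the same answer across processes and restarts, unlike
    Python's built-in hash() for strings.
    """
    index = zlib.crc32(str(user_id).encode("utf-8")) % NUM_SHARDS
    return f"shard_{index:02d}"


def host_for_user(user_id: int) -> str:
    """Resolve which box currently holds this user's schema."""
    return SHARD_HOSTS[shard_for_user(user_id)]


def move_shard(schema: str, new_host: str) -> None:
    """Repoint one schema after it has been dumped and restored on new_host.

    The user -> schema hash is untouched, so no rows are rehashed; only the
    schema -> hostname entry changes.
    """
    if schema not in SHARD_HOSTS:
        raise KeyError(f"unknown shard: {schema}")
    SHARD_HOSTS[schema] = new_host
```

Rehashing only becomes necessary if NUM_SHARDS itself changes, which is why creating more shards than you currently need pushes that day further out: with 16 schemas you can spread across up to 16 boxes before touching the hash.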



When sharding, do you do all joins in code, or just the ones that span several shards?


If I need to go cross-shard, then I do it in code. If you knew both shards were on the same box, you could do cross-shard joins with schemas like this, but you would need some potentially tricky logic to determine whether all the shards involved are on the same machine.
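A rough Python sketch of both paths, under stated assumptions: can_join_in_sql and hash_join are hypothetical helpers, rows are plain dicts, and all schemas on a host are assumed to live in one PostgreSQL database (a single query can only reference schema-qualified tables like shard_03.users within one database). The check is the "tricky logic" part; the hash join is the in-code fallback.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def can_join_in_sql(schemas: Iterable[str], shard_hosts: Dict[str, str]) -> bool:
    """True if every schema involved currently maps to the same host.

    Only safe if the mapping is still current when the query actually runs
    (e.g. no shard is mid-move).
    """
    return len({shard_hosts[s] for s in schemas}) == 1


def hash_join(left: List[dict], right: List[dict], key: str) -> List[dict]:
    """Inner-join two result sets (fetched from different shards) in code."""
    by_key = defaultdict(list)
    for row in right:
        by_key[row[key]].append(row)
    joined = []
    for lrow in left:
        # .get avoids inserting empty lists into the defaultdict for misses
        for rrow in by_key.get(lrow[key], []):
            joined.append({**lrow, **rrow})  # right-hand columns win on name clashes
    return joined
```

The hash join builds an index on one side and probes it with the other, so the cost is roughly linear in the two result sizes; the real expense is pulling both result sets over the network first.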

Thankfully, most of the joins happen within a shard (hashing and sharding on something like a user_id), with the exception of various analysis and aggregation queries.
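For the aggregation exception, one common approach (not necessarily what the poster does) is scatter-gather: run the same aggregate inside each shard and merge the partial results in code. A minimal Python sketch; DEMO_ROWS and count_by_country are hypothetical stand-ins for a per-shard query such as `SELECT country, count(*) FROM shard_NN.users GROUP BY country`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

# Hypothetical stand-in for per-shard data; a real version would query
# each schema on whichever host it currently maps to.
DEMO_ROWS = {
    "shard_00": ["US", "US", "DE"],
    "shard_01": ["US", "FR", "FR"],
}


def count_by_country(schema: str) -> Counter:
    """Partial aggregate computed entirely inside one shard."""
    return Counter(DEMO_ROWS[schema])


def scatter_gather(schemas: Iterable[str], per_shard: Callable[[str], Counter]) -> Counter:
    """Run per_shard on every shard concurrently and merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=8) as pool:
        for partial in pool.map(per_shard, schemas):
            total.update(partial)  # Counter.update adds counts rather than replacing
    return total
```

This merges cleanly only for aggregates that decompose: counts, sums, min, and max. An average needs each shard to return its sum and count separately, and a distinct count cannot be merged this way at all without shipping the distinct values themselves.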

Using PostgreSQL's schemas is admittedly not too different from just using many databases in MySQL or something else, but in practice I've found that the extra layer of organization helps keep things neater. I can back up, move, or delete a specific schema/shard, or I can back up, move, etc., all shards on a machine by operating on the containing database.
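A sketch of those per-schema and per-database operations with the stock PostgreSQL tools, wrapped in Python that only builds the argument lists. The hosts, database name, dump paths, and helper names are hypothetical; the flags used (pg_dump's -h, -d, -n, -Fc, -f; pg_restore's -h, -d; psql's -c) are standard. Schema names are assumed to be trusted internal identifiers like shard_03.

```python
import subprocess
from typing import List


def move_shard_commands(schema: str, src_host: str, dst_host: str,
                        dbname: str, dump_path: str) -> List[List[str]]:
    """Commands to copy one schema/shard to another box."""
    return [
        # Dump only this schema, in pg_dump's custom archive format.
        ["pg_dump", "-h", src_host, "-d", dbname, "-n", schema, "-Fc", "-f", dump_path],
        # Restore into the same-named (already created) database on the new box.
        ["pg_restore", "-h", dst_host, "-d", dbname, dump_path],
    ]


def drop_shard_command(schema: str, host: str, dbname: str) -> List[str]:
    """Remove a shard from its old box once the move has been verified."""
    return ["psql", "-h", host, "-d", dbname, "-c", f'DROP SCHEMA "{schema}" CASCADE']


def backup_all_shards_command(host: str, dbname: str, dump_path: str) -> List[str]:
    """Back up every shard on a box at once by dumping the containing database."""
    return ["pg_dump", "-h", host, "-d", dbname, "-Fc", "-f", dump_path]


def run(commands: List[List[str]]) -> None:
    """Execute commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A plausible ordering for a move: stop writes to that shard (anything written after the dump would be lost), run the dump and restore, update the schema->hostname mapping, and only then drop the old copy. Roles that own objects in the schema also need to exist on the destination box before the restore.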



