* This was our experience, YMMV. I just wanted to share our story in hopes it would help others avoid some pitfalls.
I'll try to give honest responses to questions asked. I'm not trying to be unnecessarily cynical, overly negative, or anti-10gen in my slides, talk, or comments here. I'm sure many people are using MongoDB happily, and I've heard 10gen's support is outstanding.
A good part of your presentation should have covered what your requirements were.
For example: MongoDB's schemaless design has a downside in that it stores more data, since every key name must be stored individually in each document. That's great if you constantly need to fix/change the schema, but a structured-schema solution obviously saves space.
Kind of a side topic, but lazy, JIT migrations work well for this. You keep a "migration chain" in your fetch layer that goes v1 -> v2, v2 -> v3, and every persisted doc is tagged with the schema version at the time of persistence.
Then, your binding code just runs it through whatever subset of the chain brings the document to v-latest. If the document is re-written, it will (naturally) be re-written in updated form, and you'll only pay the upgrade penalty once (or, not at all for cold data).
It sounds kinda creepy to old-school schema theorists, but in practice it actually works pretty well (provided you always use this library to retrieve your objects).
Using this method, I've never done a stop-the-world schema migration on a schema-free system, and I don't think I ever would... otherwise, you're tossing out one of the major benefits of schema-free: no migrations! If you're going to be rigorous about full-collection schema consistency, you maybe should have just used a proper RDBMS to begin with...
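The migration chain described above can be sketched roughly like this. This is a minimal illustration with hypothetical field names and migrations, not a real library; it assumes each persisted document is a dict tagged with a `_schema_version` field:

```python
LATEST_VERSION = 3

# Each link in the chain upgrades a document by exactly one version.
def v1_to_v2(doc):
    # hypothetical change: v2 split a single "name" field into first/last
    first, _, last = doc.pop("name").partition(" ")
    doc["first_name"], doc["last_name"] = first, last
    return doc

def v2_to_v3(doc):
    # hypothetical change: v3 added a "tags" field with a default
    doc.setdefault("tags", [])
    return doc

MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}

def upgrade(doc):
    """Run whatever subset of the chain brings doc to LATEST_VERSION."""
    version = doc.get("_schema_version", 1)
    while version < LATEST_VERSION:
        doc = MIGRATIONS[version](doc)
        version += 1
        doc["_schema_version"] = version
    return doc
```

A fetch layer would call `upgrade()` on every document it reads, so a v1 document like `{"_schema_version": 1, "name": "Ada Lovelace"}` comes back at v-latest; if the caller then re-writes it, the upgraded form is what gets persisted, and cold data is never touched.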
I am trying to come up with a simple way of automatically handling data and schema migrations across a database that spans client and server treating both as masters. It just seems incredibly difficult to get right. Your idea of handling migration at read time sounds promising.
Yep--and just to reassure you, it's more than an idea on my end... I designed and maintained a critical system a few years ago on a large K/V store that used this method, and it worked without a hitch. We never even needed to think about the fact that the documents within the store comprised various schemas from several generations of the application (except when we wrote and tested the next incremental link in the upgrade chain).
* Slides w/o notes available here: http://opensourcebridge.org/2011/wiki/Scaling_with_MongoDB
* Most slides referred to MongoDB 1.4