Reflections on MongoDB
We have used MongoDB on several production and play projects over the last year. We love it. MongoDB is now the default database for new projects at Collective Idea.
There will be blood
But before I get too far, let me be clear: MongoDB is not all leprechauns and unicorns. It’s the bleeding edge, and you will bleed.
The libraries and frameworks are still immature. There are some awesome libraries in development, but if you’re coming to them from a mature library like Active Record, then you’ll find them lacking (nothing a little sweat and blood can’t fix).
As a result of immaturity, there is a lack of curated documentation. The wiki on mongodb.org has some great information that will get you started, but there isn’t much beyond that. As Mongo gains traction, that will change.
There is no spoon
Our industry has accumulated 40 years of relational database theory. If something can be modeled in a relational database, then somebody has done it, and there is general consensus on the best way to do it.
But everything we know about modeling is tied to relational databases. We split out our data structures into third or fourth normal form (there are normal forms that are too complex for mere mortals), and then create models that directly map to those structures. We rarely consider how the data will be used when designing the database.
With document databases, the approach is exactly the opposite: figure out how the data will be used, then figure out how to structure it. There is a distinct moment of epiphany when you realize that you don't have to create a separate collection just to store repeating phone numbers, or add a join table to associate an article with multiple categories.
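For instance, the phone numbers can live inside the contact document itself, and an article can carry its category names directly. Here is a minimal sketch of that kind of model with MongoMapper; the class and field names are made up for illustration, and the connection setup assumes the 1.x-era driver API:

    require 'mongo_mapper'

    # Hypothetical connection setup for the sketch (1.x-era driver API).
    MongoMapper.connection = Mongo::Connection.new('localhost')
    MongoMapper.database   = 'demo'

    class PhoneNumber
      include MongoMapper::EmbeddedDocument
      key :label,  String   # "home", "mobile", ...
      key :number, String
    end

    class Contact
      include MongoMapper::Document
      key :name, String
      many :phone_numbers    # embedded right inside the contact document
    end

    class Article
      include MongoMapper::Document
      key :title,      String
      key :categories, Array # e.g. ["ruby", "databases"]; no join table needed
    end

There is no phone_numbers table and no articles_categories join table; the repeating data simply rides along with its parent document.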
It’s not the Wild West
While RDBMSs are great for storing data, it turns out they are actually horrible application development platforms (are there any Oracle developers in the house?). And for too long, vendors tried to push application logic into the database. Many Rails developers realized long ago that specifying application constraints in the database is actually a bad idea. Among other reasons, it leads to errors that are extremely difficult to present gracefully to the user.
MongoDB is a “schema-less” database, meaning that it doesn’t care how you structure the data that you put into it. Some worry that schema-less databases are too out-of-control, like you’ll wake up one morning to find your user records have become mutants.
Having a schema-less database does not mean your application is schema-less. It just means we are choosing to specify our entire schema in our application, where we were already defining constraints anyway.
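In MongoMapper terms, that just means the keys, their types, and the validations all live in the model. A hedged sketch (the field names are illustrative, not a prescription, and a connection like the one above is assumed):

    require 'mongo_mapper'

    # The "schema" lives in the application: MongoDB itself won't enforce
    # any of this, but the model declares and validates it in one place.
    class User
      include MongoMapper::Document

      key :email,      String,  :required => true
      key :name,       String
      key :admin,      Boolean, :default => false
      key :created_at, Time

      validates_format_of :email, :with => /@/   # illustrative only
    end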
When should I use MongoDB?
Always.
No, seriously!?
OK, I think MongoDB makes sense with most web applications. In the end, most apps are just doing glorified CRUD, and don’t need ACID or many of the features of a relational database.
There are times when you definitely should not use MongoDB, like when you need transactions. Document databases are shiny and new (to most of us, anyway; others have been using them for 20 years), so we're still figuring out how to model things. Conventions are emerging, but de-normalization and application-driven design make it difficult to apply universal patterns.
If you haven’t checked it out yet, I highly recommend it! If you have, then share your thoughts and experiences in the comments.
Comments
Great post. I've been using MongoDB and MongoMapper since Nunemaker introduced it at a SBRB meeting a while back, and it's changed my development process.
I will agree there are times it doesn't work. I used it in a distributed jukebox hardware/software solution and was one of the early ones to push GridFS and MongoMapper integration. I found out a few months down the road that MongoDB master/slave replication is very delicate and can become corrupted if the perfect storm happens, which seems to happen a lot around me. I was storing all the MP3s for this jukebox in GridFS, and after a power outage and the battery backup dying, the indexes got corrupted. That requires a complete resync from the master to the slave (which I could have partly avoided with local daily backups), and the resync took hours, meaning the client's store was without music during that time. I moved MP3 storage to the filesystem and haven't had any problems since, and if I do get another corrupt db it will take less than a minute to autoresync.
I know this is an extreme example, but just thought I’d share it as a warning. I love mongodb and use it on virtually every project.
Ignoring the issues with the bleeding edge nature of something like MongoDB…
How has it helped you build a better application when compared to a similar solution using an RDBMS? Is it much faster to develop with? Is it easier to test?
I am curious about specific examples where using a NoSQL solution as an RDBMS replacement ended up being a much better solution. The vibe I keep getting is that a NoSQL solution is a really good idea simply because it's not an RDBMS solution.
The only concern I have with MongoDB regards single-server durability. How have you handled this issue in your projects?
There is a powerful management studio (Database Master) for MongoDB; you can download it here:
http://www.nucleonsoftware.com/
Awesome article. We've made the jump to MongoDB for a few apps, though nothing is in production yet. Reading your article gives me the confidence to push further with MongoDB and not worry about its immaturity as much.
Thanks!
Thanks for the Database Master link, but the software doesn't seem to be able to connect to Mongo 1.5.2 :D I hope it's fixed soon.
We have 2 Rails production sites now using MongoDB, first with MongoMapper but now with Mongoid, and I too have fallen in love with it. We have 2 more production sites launching in the next few weeks using it, and when you combine MongoDB with MongoHQ and heroku.com, developing is really fun. I completely agree with you, though, it is a bit bleeding edge sometimes. I myself had to fork several projects on GitHub.
@clayton, apart from all of the usual arguments about MongoDB being faster (no joins) and more scalable (autosharding), etc., one thing I found really nice was that when I converted an ActiveRecord/PostgreSQL project to it, my test suite went from 20 minutes to around 2 minutes, which has really increased my development speed!
Jonathan Hoyt: Thanks for sharing. I've heard a few other stories like that. I'm hopeful that it is a result of MongoDB's immaturity, and that over time it will become more stable.
Clayton: That's a great question, and it may require a separate blog post to answer. I've experienced times where porting an app to MongoDB made the code cleaner and development faster, and I've also had to port an app from MongoDB to MySQL for the same reasons.
Toby Hede: I actually haven't had to deal with that yet. All of the MongoDB apps I've worked on are either hosted by Rails Machine or MongoHQ, or were prototypes that didn't experience serious traffic.
Ryan: I’m glad. Keep us posted on how it’s working out for you.
I've just recently started using MongoDB for some fun apps. I chose it over Redis because it speaks JavaScript, which means I can extend the language itself with myriad tools. I also liked that MongoDB has an actual programming language, not a DSL, backing it; furthermore, the language is modern and functionally powerful. Also, since Ruby can speak JSON, it's a natural fit.
Though it’s very interesting to work with, I must confess it’s kind of scary. With relational DBs the schema is at the level of the data, so it’s like a security net – you’re guaranteed your data will always remain in this form until you choose to change it, which requires a lot of incantation and sometimes some black magic. But with MongoDB and other Document-based Databases, a schema change is a few keypresses away. MongoMapper (which is fantastically powerful) has a Versioning module, which may bring some of the ActiveRecord::Migration structure to the table, but the notion of a transient schema is a big hurdle to acceptance and application development.
Sherrod: I would argue that static schemas are a lot like statically typed languages. Yes, the static schema provides a security blanket, but it’s really a poor way to enforce data integrity. It only enforces one aspect of integrity (data type) and leaves it to your application to worry about the rest (validations).
@Brandon: That's an interesting parallel! I would argue there's another aspect, though: it enforces data structure. With a schema, there's absolutely no way additional columns can be added without being explicitly defined in the schema. With Document Stores, any data of any kind can be added ad-hoc (see the sketch after this comment). In fact, this idea is crucial to some mechanics of Document Stores, such as arbitrarily nesting hierarchies.
I think the challenge here is the proverbial happy medium – an explicit yet easily mutable schema, and the enforcement of it, which is where the ORM comes in. But I wonder if, with the Document Stores modernizing the idea of a Relational Database, our current ORMs are short-sighted; I wonder if there’s a more modern, more appropriate component to communicate with the application and data layers?
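To make the ad-hoc point above concrete, here is a tiny sketch using the plain Ruby driver (the 1.x-era API; database and field names are made up): nothing stops one document from carrying fields, or nested structures, that no other document in the collection has.

    require 'mongo'

    db    = Mongo::Connection.new('localhost').db('demo')
    users = db.collection('users')

    users.insert('name' => 'Ada')                 # a minimal document
    users.insert('name'   => 'Grace',             # same collection, entirely different shape
                 'phones' => [{ 'label' => 'work', 'number' => '555-0100' }],
                 'prefs'  => { 'theme' => { 'color' => 'blue' } })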
@Sherrod Good points. I think it is an indication of a new set of patterns and practices. I believe that Brandon's parallel hits the nail on the head. I would also like to point out that it emphasizes the importance of testing. If you test your code well, a dynamic schema may provide advantages in many areas, with the safety of tests to ensure that you haven't broken anything.
Good discussion. I have used MongoDB with Mongoid on a couple of projects, and we are currently using MongoDB for a large commercial application in development.
I originally looked at Mongo as a fast key/value store with the added benefit of secondary indexes. While I still like those features, I really love the schema-less design.
Writing Oracle database migration scripts for production sites has taken its toll on me, so being able to change the schema on the fly has been an absolute joy. It's also amazing how much more natural embedded collections in documents are than all the normalization I have done in the past.
MongoDB is one of those things, like switching from a Windows PC to a Mac or from Java to Ruby: you know how much easier your life has become, but trying to explain that to someone else is challenging.
Sherrod: Don't quote me on this, but I'm pretty sure that MongoMapper has an option that will restrict which keys can be used. It's always annoyed me that Active Record automatically gets the schema from the database, but then you have to define validations and associations in the model. With Mongo, I've been loving having everything defined in the model.
MongoDB and its document-oriented brethren are very exciting, and even fun to use. However, these technologies are relatively new (in the grand scheme of things), and the sysadmin in me wonders how easy they are to administer and maintain. In your experience, do they require any hand-holding or tweaking to achieve optimal performance, as do their relational counterparts?
One pattern I’ve had success with:
I establish a nice normalized db schema for basic entities: users, widgets, …
From the RDBMS, I build application-specific data structures in Mongo (or another NoSQL engine). I also store things that are a pain to keep in an RDBMS in the NoSQL engine, but in a separate collection namespace (or a separate db).
I get to store everything I want, and I can restructure the NoSQL cache at any time and regenerate it; it's usually pretty fast (a rough sketch of that step follows below). The only downside is remembering what goes where, but I was already managing multiple RDBMS connections (read vs. write nodes), so I don't consider it to be a problem.
YMMV
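A rough sketch of the regeneration step described above, assuming a hypothetical ActiveRecord User model (with associated widgets) as the system of record and a throwaway user_profiles collection in Mongo (1.x-era driver API):

    require 'mongo'

    profiles = Mongo::Connection.new('localhost').db('demo').collection('user_profiles')

    # The RDBMS stays authoritative; the Mongo collection is a denormalized,
    # application-specific view that can be wiped and rebuilt at any time.
    # User is assumed to be an ActiveRecord model defined elsewhere.
    def rebuild_profiles!(profiles)
      profiles.remove                    # throw the cache away
      User.find_each do |user|           # stream users out of the RDBMS
        profiles.insert(
          'user_id' => user.id,
          'name'    => user.name,
          'widgets' => user.widgets.map { |w| { 'name' => w.name } }
        )
      end
    end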
I'm still thinking about the possibility of giving up MySQL, but the lack of many-to-many relationships really bothers me. Does anybody know a smart solution for dealing with this?
FWIW: single-server durability is coming.
memo: many-to-many is actually really easy in Mongo. You can use the :in option, which stores an array of ids in the record, giving you a many-to-many.
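A minimal sketch of how that typically looks with MongoMapper's many ... :in association (the model names are illustrative):

    require 'mongo_mapper'

    class Category
      include MongoMapper::Document
      key :name, String
    end

    class Article
      include MongoMapper::Document
      key :title,        String
      key :category_ids, Array                 # the array of ids lives on the article
      many :categories, :in => :category_ids   # categories are looked up by those ids
    end

    # article.categories << some_category adds the category's id to
    # category_ids, so an article can belong to many categories and a
    # category can be shared by many articles.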
I quote http://gist.github.com/425349:
Mongo is a poor man’s couch.
Great post. I really think that Mongo, or something like it, will become the next generation of databases. However, I'm very surprised that it has become the default database choice at any development company. It seems very risky to take a chance on your datastore for mission-critical applications, especially when there isn't a wide body of knowledge for backups, recovery, etc.
I am having a lot of trouble with a MongoDB one-to-many relationship in Spring. Can anybody offer a suggestion, please?