We’ve been using MongoDB for a year and a half at ThoughtLeadr. During that time we’ve gone from elation to depression using this trendy NoSQL datastore. Based on the documentation, it’s not hard to see why you’d get pulled in: a schemaless, performant database that can use both sharding and replica sets to maintain high availability at nearly limitless scale. Well, I guess I should have known that when something seems too good to be true, it probably is. Here’s the lowdown on MongoDB’s web-scale breakdown.

Global lock

I’ll admit that I didn’t realize the global lock (now a database-level lock) was such a major issue when I first started using MongoDB. I’ve never written database internals myself. While that doesn’t excuse me from doing my homework, I bought into 10gen’s benchmarks page. Oh wait, they don’t have benchmarks? Strange, I remember reading all these great articles about MongoDB’s performance when I picked it up. Wayback Machine to the rescue. This is one of the most frustrating aspects of working with MongoDB: the global lock has far greater repercussions in production than what you see in the benchmarks. I’ll get into specifics below, but the global lock is a recurring theme throughout and a serious flaw in any database design, one that should have been represented more honestly.

MapReduce is useless at web scale

When you run a MapReduce job against a database, the global write lock stops any other process from manipulating that data. That means if you run even a moderate number of MapReduce jobs per hour, you massively degrade application performance against those collections. Plus, you can block the rest of the replica set from syncing in a timely fashion, which can cause “primary” database switches and accidental loss of data.

One of the core principles of MapReduce is the ability to get concurrent data processing out of a system to analyze large datasets quickly. With MongoDB, MapReduce jobs run inside a single-threaded JavaScript VM, eliminating concurrent processing and slowing web-scale data processing to a crawl.
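To make the concurrency point concrete, here is a minimal Python sketch of the general MapReduce shape: the input is split into chunks, each chunk is mapped by its own worker, and the partial results are folded together. It is not MongoDB code. The `campaign` field, the function names, and the chunk size are all made up for illustration, and a thread pool is used only to keep the example short (in CPython a process pool or a cluster would be needed for a real CPU-bound speed-up).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(docs):
    """Map phase for one chunk: count documents per hypothetical 'campaign' field."""
    return Counter(doc["campaign"] for doc in docs)


def merge_counts(left, right):
    """Reduce phase: combine two partial counts into one."""
    return left + right


def map_reduce(docs, workers=4, chunk_size=2):
    # Split the input so each worker can map its own chunk independently.
    chunks = [docs[i:i + chunk_size] for i in range(0, len(docs), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Fold the partial results together into a single total.
    return reduce(merge_counts, partials, Counter())


docs = [{"campaign": "a"}, {"campaign": "b"}, {"campaign": "a"},
        {"campaign": "c"}, {"campaign": "a"}]
totals = map_reduce(docs)
```

With a single-threaded VM, the map calls above would effectively run one after another, which is the bottleneck described here.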

Sysadmin tooling is painful

Every major sysadmin tool is a blocking process. Most of this falls back on the global write lock or immature tooling, but in practical terms, if you need to modify your database structure in production, you’ll be forced to have downtime.

Here’s a great example: database compaction only works if you are under 50% disk utilization. Let me repeat that. Database compaction only works if you’re not using more than half your disk. Have you ever created a collection only to realize later that you don’t really need it? If you have over half your disk in use, you’ll need to take one of the replica set members offline, delete the entire database manually, then bring it back up to sync all the data from the primary. Only after it has finished syncing, a process that can take days, can you promote that system to primary and do the same with the other members of the replica set.
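A check like the following can warn you before you cross that line. This is only a sketch: the 0.5 threshold is the 50% figure from our experience above rather than a value read from MongoDB, and `can_compact`, `check_volume`, and the default path are hypothetical names chosen for the example.

```python
import shutil


def can_compact(used_bytes, total_bytes, threshold=0.5):
    """Return True when disk utilization is strictly below the threshold.

    The 0.5 default mirrors the 50% figure from the post; it is an
    assumption, not something queried from the database.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes < threshold


def check_volume(path="/"):
    """Apply the check to the volume holding `path` (a hypothetical data directory)."""
    usage = shutil.disk_usage(path)
    return can_compact(usage.used, usage.total)
```

Running `check_volume` against the data directory from a cron job or monitoring hook would flag the problem while there is still room to compact in place.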

A woeful lack of production grade tooling

We never really felt the need to use an ORM tool with MongoDB since its JSON data structures map nicely to their analogs in both Python and Haskell (our core languages). However, after using MongoDB in production for over a year, migrations became a real pain. There are no mature tools to simplify this process, which forces any adopter to eventually roll their own migration system. Plus, there wasn’t a clear best practice for migrating objects in storage. Do you use a framework to lazily migrate objects as you need them? Or run a migration script to update all the data at one time (updating some objects even if you never use them again)? The answer is both, heavily dependent on the situation. When you spend most of your time in the NoSQL world, it’s easy to forget migration support is built into SQL with the ALTER TABLE command.
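The two strategies can be sketched in a few lines of Python. This is an illustration rather than our actual migration system: a plain dict stands in for a collection, and the `schema_version` field, the upgrade functions, and the example schema changes are all hypothetical.

```python
# Each document carries a hypothetical "schema_version" field.
CURRENT_VERSION = 3


def v1_to_v2(doc):
    # Example change: split "name" into first and last name.
    first, _, last = doc.pop("name", "").partition(" ")
    doc["first_name"], doc["last_name"] = first, last
    return doc


def v2_to_v3(doc):
    # Example change: add a new field with a default value.
    doc.setdefault("tags", [])
    return doc


UPGRADES = {1: v1_to_v2, 2: v2_to_v3}


def migrate(doc):
    """Upgrade one document step by step until it reaches CURRENT_VERSION."""
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version += 1
        doc["schema_version"] = version
    return doc


def load(store, key):
    """Lazy strategy: migrate on read and write the result back."""
    doc = migrate(store[key])
    store[key] = doc
    return doc


def migrate_all(store):
    """Batch strategy: touch every document once, whether it is used again or not."""
    for key in list(store):
        store[key] = migrate(store[key])


store = {
    "u1": {"schema_version": 1, "name": "Ada Lovelace"},
    "u2": {"schema_version": 2, "first_name": "Alan", "last_name": "Turing"},
}
user = load(store, "u1")
```

After `load`, only `u1` has been upgraded; `u2` stays at version 2 until it is read or until `migrate_all(store)` runs. The lazy path spreads the cost over normal traffic, while the batch path keeps old-format documents from lingering.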

What’s next?

Honestly, we started using MongoDB because of its great documentation and blazing developer speed (amazingly fast to get up and developing features). The problems only crop up when your product has real traction, real data, and real scale. Then it becomes apparent that MongoDB isn’t ready for prime time. We’ve already switched over to Percona for our production metadata database but we’re not done with NoSQL. Our full database stack still includes Redis and Riak since we have a need for both fast IO and big data respectively.

40 Comments

  1. Wow.

    > 10Gen’s benchmark’s page.

    No apostrophe on “benchmark.”

    > Way Backmachine to the rescue.

    Wayback Machine

    > This is one of the most frustrating aspects of working with MongoDB, the global lock has far greater repercussions in production than what you see in the benchmarks.

    Comma should be a semicolon.

    > reoccurring

    Recurring. Reoccurring is not a word.

    > Meaning if you run a moderate number of MapReduces per hour you massively degrade application performance against those collections.

    Comma needed after “per hour.”

    > sync’ing

    Syncing is an accepted shorthand for synchronizing; it doesn’t need the apostrophe.

    > Most of this falls back on the global write lock issues or immature tooling but in practical terms if you need to modify your database structure in production you’ll be forced to have downtime.

    Could use a comma after “tooling.”

    > There’s no mature tools …

    Should be “There are no mature tools …”

    Interesting read but annoying read, due to inadequate communication skills. Consider using a teammate to proofread future blog posts. B+

  2. Andrew Pennebaker

    Haha, nice. I just finished a distributed and concurrent systems programming course at GMU, where my professor demonstrated the simplicity and ineffectiveness of a single lock distributed system. Alternative solutions include transactional memory, something to look into with Haskell/STM.

    Have you tried Redis? This makes two thumbs-down articles I’ve read on MongoDB, and I’m wondering if Mongo in particular is suboptimal, or if it’s a problem in other NoSQLs as well.

    I know Google uses MapReduce to great effect. What kind of non-global-locking database do they use?

  3. Taariq Lewis

    If you people used PostgreSQL like the rest of the NORMAL folks, all this drama wouldn’t matter! If the fig leaf can’t cover ya, then find yourself an elephant! LOL!

  4. Xorlev

    We’ve had similar experiences with Mongo. Back when everything fit on a single server and we were much less seasoned in the ways of Mongo, we attempted a data migration with MapReduce. I wish I was joking.

    Another somewhat painful realization is that sharding is really the only way to scale out reads. I’d hoped that adding more slaves would do the job, but 5-10s behind on replication (on beefy boxes, only taking in 500 writes/sec — durably) made it a little too eventually consistent for our uses.

    Riak is a fantastic datastore, but we moved back to MySQL for anything we really cared about. The real eye-opener for me was the realization that NoSQL started as a way to have great lookup characteristics of key -> value pairs but has started adding on indexes and all the complexity and slowness that comes with them. At that point, RDBMSes have a leg up with years of mature, tuned index implementations.

    • Amit Kumar

      >5-10s behind on replication (on beefy boxes, only taking in 500 writes/sec — durably) made it a little too eventually consistent for our uses.

      Then it was not the choice for you. No need to bash MongoDB. After reading the basic Mongo guide this should be a big red flag already not to use it for your app.

      • Xorlev

        I figured there’d be a response like this.

        At the time, it -was- the correct choice. We had extremely fluid schemas and were trying to prove out a product without putting too much into it. MongoDB was great to us for that while.

        MongoDB promises a lot but has a lot of caveats when you go to make it a functional datastore for a real business.

        I’m not sure what part of the Mongo manual says, “hey, you can only scale if your writes aren’t durable.” I agree though, the manual required a close look once we had time to make sure our data story was in line. It certainly does scale, but not effortlessly like it tries to claim. They’ve improved things significantly in Mongo 2.2, but I still expected more out of Mongo given the hardware we threw at it.

        http://www.mongodb.org/display/DOCS/Production+Notes are what I followed to keep us up and running.

        • Mahesh Paolini-Subramanya

          Funny – virtually all posts involving MongoDB not quite working for someone (immediately, eventually, whenever), end up w/ the “You should have read the documentation more carefully” argument.
          I suspect there is some sort of Godwin’s Law equivalent for MongoDB:
          “All discussions about MongoDB will eventually claim vindication via documentation” 🙂

    • Shakakai

      The delay on replication was less of an issue for us but it is definitely something you need to account for when selecting Mongo. We love Riak. All our MapReduce functionality runs in Erlang across our Riak cluster, fast and efficient.

  5. Nikita Ivanov

    We are helping at least two customers a month migrate away from MongoDB to something that really scales, i.e. GridGain. With production customers running on 1000s of nodes in fully ACID transactional mode on document data in real time – GridGain can handle a lot. Take a look: http://www.gridgain.com

  6. hasenj

    Out of curiosity, did you consider Cloudant/BigCouch?

    Though I think someone in your situation would just be sick of all the nosql crap

    • Shakakai

      I’m definitely bullish on NoSQL (or more accurately databases built for more specific use-cases). We checked out CouchDB, MySQL (a couple different flavors), and Postgres.

  7. Esteban Feldman

    I like this kind of post because it will help MongoDB improve.

    • Shakakai

      Agree. I hope MongoDB will figure out a way to rework their internal architecture for v3 or v4.

  8. Anon

    Mongo’s architecture is a joke. This sort of story is no surprise to folks who’ve been working with database internals. This has nothing to do with SQL vs NoSQL, and everything to do with the fact that their system is basically a pile of rookie mistakes… Memory-mapped files, seriously?

    • Brady Sullivan

      I don’t understand your problem. What’s your beef with memory mapped files?

      • Shakakai

        Memory-mapped files are the reason 32-bit builds are capped at roughly 2GB of data (a 32-bit process only has 4GB of address space to map into). Also, under certain circumstances IO on memory-mapped files can actually be significantly slower than standard IO (due to page faults).

        To be honest, this wasn’t one of our issues with Mongo.

        • Augusto Delatorre

          Then don’t run 32bit. Mongo is not, nor has it ever claimed to be, universally appropriate.

  9. Brady Sullivan

    Have you read CJ Date’s Database Systems book? It’s killer and perhaps would have shown you the drawbacks of a NoSQL solution in your case.

    Also, great write-up! It’s always interesting to read someone’s debrief on a difficult problem.

  10. taxilian

    It is certainly interesting to read articles like this; we use mongodb in our production app (with ~6K hits per week so far and growing) and so far everything is working great. Could we have done the same thing with sql? Probably, but much of it would have been more of a pain. Totally agree with the frustration about insufficient tooling; of course, mysql was the same way when it first came out. I think most of the issues you mention will be fixed with time.

    The Map/Reduce issues you mention are definitely important for anyone to understand; map/reduce is not something that works well for doing anything complex in real-time. If used differently, however, it can be insanely powerful. http://hamstudy.org (personal site, not the production one I mentioned earlier) uses map/reduce to keep track of user statistics by simply updating them as it goes with a reduce into collection map/reduce; it’s blazing fast. MongoDB 2.2 has introduced the aggregation framework which hopefully fixes some of the other cases where map/reduce doesn’t cut it due to the lock, though.

    The only thing I really disagree with in this article is the statement that “MongoDB isn’t ready for prime time”. That’s not true; it is absolutely ready for prime time, but it is unfortunately not always easy to determine ahead of time what performance issues you will have. If you understand mongodb well enough and are using it for the things that it is good for then it is absolutely ready for prime time; if you misunderstand some of it (which unfortunately many of us do, and the docs are still a little young so it’s not always easy to understand all of it correctly without experience) then you may be trying to do things with it that it simply isn’t designed to do.

    #1 thing that people should realize before signing up to use mongodb is that mongodb is *not* a drop-in replacement for sql; it’s a totally different system, different paradigm. Many things are far, far easier to deal with and incredibly slick. Unfortunately, some things simply aren’t possible or are not reasonable to do in a performant manner.

    The good news is that the mongodb team seems to be (from what I can tell) working hard to try to continue to improve and provide solutions to the problems. Until then, understand what they are and choose wisely =]

    • Noone

      6,000 hits/visitors a week is literally nothing! When you’re seeing 6,000 users per hour then maybe it’s slightly busy.

      • taxilian

        I’m sorry, I didn’t realize that this was a contest. The point is that even that is enough load to start doing some measurements and see what things do and don’t work — and some things do and some things don’t work. The important thing is to understand the tools and the performance implications of various pieces.

    • Shakakai

      We have 110M users hitting our system every month and that’s growing quickly so you haven’t seen our scaling issues yet.

      • taxilian

        Absolutely agreed. That tends to be the case with any database, though, and there are other websites that are using mongodb successfully with far more users than that. The argument that because the things you expected to work at that scale weren’t performant then the database “isn’t ready for prime time” is logically flawed, though; it simply means that Map/Reduce in the way you’re using it isn’t feasible at that scale, or other particular things you were doing aren’t feasible at that scale.

        One of the biggest challenges with MongoDB (and other nosql databases, but mongodb more than most) is that it is so similar in so many ways to sql that the natural first instinct is to do things in the same way, just substituting things straight across. That doesn’t work; it’s a totally different database architecture and requires entirely different architecture of your classes. For extensive aggregation and reporting map/reduce isn’t the equal of sql joins and groupings. That may mean in your case that mongodb isn’t a good option, but that doesn’t make it a bad option for anything else, just for your use.

  11. Pierce Wetter

    Global lock is gone in 2.2

    http://blog.serverdensity.com/goodbye-global-lock-mongodb-2-0-vs-2-2/

    • Shakakai

      That’s true but they replaced it with a database level lock. The best practice now is to have only one collection per database to minimize the lock. Makes you wonder why they even have databases at all if you have to organize everything at the collection level.

  12. mjasay

    I find these sorts of articles frustrating, because for every “MongoDB doesn’t work” I know of scads more “MongoDB is manna from heaven” cases. The problem is that no one feels compelled to write up the latter, which probably contributes to more of the former. We need better knowledge-sharing between those who are successful with MongoDB (or any technology, really) and those just coming up to speed. I know companies (big brand names that you use every day) that are running MongoDB at massive scale. But are they going to blog about it? Almost certainly not.

    None of which is to downplay the particular problems encountered here. Whether user error/“you should have read the manual” or due to real problems with the technology, the result is the same: unhappy user.

    Todd, any chance you’d be willing to debrief with some friends at 10gen? Not to get you back on MongoDB, but rather to just try to better understand the complete experience you had, so that it can be improved for others (and hopefully for you on your next application). Ping me at mjasay @ that Google mail thing.

    • Shakakai

      Sure, I’d be happy to chat with the 10gen folks. I’ll shoot you an email.

    • mjasay

      I should note that I’m not disinterested in this, having very recently joined 10gen. Still, the reason I joined is because while MongoDB isn’t perfect, I do believe it’s moving (fast) in the right direction and is already good for a great many use cases. It may not have been ideal for yours, Todd, which is why I’m glad we’re going to be talking about it, to see if we could have done something better and helped you be successful.

  13. BuggyFunBunny

    Eventually, Kiddie Koders have to become adults. The relational model and RDBMS are the answer, if you care about the data and aren’t mostly interested in generating oodles of LoC.

  14. Clifford Farrugia

    I have recently posted a blog article about problems we faced during the last 18 months with MongoDB if you would like to do some further reading. Here: http://blog.trackerbird.com/content/mongodb-performance-pitfalls-behind-the-scenes/

  15. Moving on… to Percona’s XtraDB Cluster | ThoughtLeadr

    […] on from our MongoDB tour of duty, this post describes our recent experience in setting up Master/Slave and transitioning to […]
