What’s the best way to keep track of unique tags for a collection of documents millions of items large? The normal way of doing tagging seems to be indexing multikeys. I will frequently need to get all the unique keys, though. I don’t have access to MongoDB’s new “distinct” command, either, since my driver, erlmongo, doesn’t seem to implement it yet.
Even if your driver doesn’t implement distinct, you can implement it yourself. In JavaScript (sorry, I don’t know Erlang, but it should translate pretty directly) you can say:
result = db.$cmd.findOne({"distinct" : "collection_name", "key" : "tags"})
So, that is: you do a findOne on the “$cmd” collection of whatever database you’re using. Pass it the collection name and the key you want to run distinct on. If you ever need a command your driver doesn’t provide a helper for, you can look at HTTP://WWW.MONGODB.ORG/DISPLAY/DOCS/LIST+OF+DATABASE+COMMANDS for a somewhat complete list of database commands.
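To see what the command computes without a running database, here is a minimal sketch in plain JavaScript. The helper names buildDistinctCommand and distinctValues are made up for illustration and are not part of any driver. The second helper only emulates, in memory, how distinct treats an array-valued (multikey) field: each array element is reported once.

```javascript
// Hypothetical helper: builds the command document that would be passed to
// findOne on the "$cmd" collection. The command name is the first key.
function buildDistinctCommand(collectionName, key) {
  return { distinct: collectionName, key: key };
}

// In-memory emulation of distinct on a field that may hold arrays.
function distinctValues(docs, key) {
  const seen = new Set();
  for (const doc of docs) {
    const value = doc[key];
    if (value === undefined) continue; // documents without the field add nothing
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) seen.add(item);
  }
  return Array.from(seen);
}

// Sample documents (invented for this example).
const docs = [
  { title: "a", tags: ["mongodb", "erlang"] },
  { title: "b", tags: ["mongodb", "tagging"] },
  { title: "c" },
];

const command = buildDistinctCommand("collection_name", "tags");
const uniqueTags = distinctValues(docs, "tags");
console.log(JSON.stringify(command));
console.log(uniqueTags);
```

In a driver without a distinct helper, the object returned by buildDistinctCommand is what you would hand to findOne on “$cmd”.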
I have an embedded document that tracks group memberships. Each embedded document has an ID pointing to the group in another collection, a start date, and an optional expire date. I want to query for current members of a group. “Current” means the start time is less than the current time, and the expire time is greater than the current time OR null. This conditional query is totally blocking me up. I could do it by running two queries and merging the results, but that seems ugly and requires loading in all results at once. Or I could default the expire time to some arbitrary date in the far future, but that seems even uglier and potentially brittle. In SQL I’d just express it with “(expires >= Now()) OR (expires IS NULL)”, but I don’t know how to do that in MongoDB.
Just thought I’d update in case anyone stumbles across this page in the future. As of 1.5.3, MongoDB supports a real $or operator: https://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24or. Your query of “(expires >= Now()) OR (expires IS NULL)” can now be rendered as:
{$or: [{expires: {$gte: new Date()}}, {expires: null}]}
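The $or clause above covers only the expiry half of “current”. As a rough sketch, the start-date condition can be added as a sibling key in the same query document. The field name start is an assumption (the question only names expires), and the sketch ignores how the memberships are embedded in the parent document. The isCurrent function repeats the same logic in plain JavaScript so it can be checked without a database. Note that {expires: null} matches documents where the field is missing as well as those where it is null.

```javascript
// Builds the filter for "current" memberships.
// Assumption: the date fields are named `start` and `expires`.
function currentMembershipQuery(now) {
  return {
    start: { $lt: now },
    $or: [{ expires: { $gte: now } }, { expires: null }],
  };
}

// The same condition in plain JavaScript, for checking the logic in memory.
// `== null` is true for both null and a missing (undefined) field.
function isCurrent(membership, now) {
  const started = membership.start < now;
  const notExpired = membership.expires == null || membership.expires >= now;
  return started && notExpired;
}

// Sample data (invented for this example).
const now = new Date("2011-06-01T00:00:00Z");
const memberships = [
  { group: "g1", start: new Date("2011-01-01T00:00:00Z"), expires: null },
  { group: "g2", start: new Date("2011-01-01T00:00:00Z"), expires: new Date("2011-03-01T00:00:00Z") },
  { group: "g3", start: new Date("2011-01-01T00:00:00Z") },
  { group: "g4", start: new Date("2011-09-01T00:00:00Z"), expires: null },
];

const query = currentMembershipQuery(now);
const currentGroups = memberships.filter((m) => isCurrent(m, now)).map((m) => m.group);
console.log(JSON.stringify(query));
console.log(currentGroups);
```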
In case anyone finds it useful, www.querymongo.com translates between SQL and MongoDB, including OR clauses. It can be really helpful for figuring out syntax when you know the SQL equivalent. In the case of OR statements, it looks like this:
SQL: SELECT * FROM collection WHERE columnA = 3 OR columnB = 'string';
MongoDB: db.collection.find({"$or": [{"columnA": 3}, {"columnB": "string"}]});
Q. MongoDB: get names of all keys in collection
I’d like to get the names of all the keys in a MongoDB collection. For example, from this:
db.things.insert( { type : ['dog', 'cat'] } );
db.things.insert( { egg : ['cat'] } );
db.things.insert( { type : [] } );
db.things.insert( { hello : [] } );
I’d like to get the unique keys: type, egg, hello.
You could do this with MapReduce:
mr = db.runCommand({
  "mapreduce" : "my_collection",
  "map" : function() {
    for (var key in this) { emit(key, null); }
  },
  "reduce" : function(key, stuff) { return null; },
  "out" : "my_collection" + "_keys"
})
Then run distinct on the resulting collection so as to find all the keys:
db[mr.result].distinct("_id")
["foo", "bar", "baz", "_id", …]
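To make the data flow concrete, here is a small in-memory sketch of the same idea in plain JavaScript. The runMapReduce helper is hypothetical and much simpler than MongoDB’s implementation: emit is passed to the map function as a parameter instead of being a global, and reduce is called for every key. The sample documents are the ones from the question, with _id values added by hand, since every stored document has one.

```javascript
// Minimal in-memory map/reduce: collect emitted values per key, then reduce.
// Simplification: reduce is called for every key, even with a single value.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the document
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: reduce(key, values) });
  }
  return results;
}

// Documents from the question, with hand-written _id values.
const things = [
  { _id: 1, type: ["dog", "cat"] },
  { _id: 2, egg: ["cat"] },
  { _id: 3, type: [] },
  { _id: 4, hello: [] },
];

const keyDocs = runMapReduce(
  things,
  function (emit) { for (const key in this) emit(key, null); },
  function (key, values) { return null; }
);

// Each result document's _id is one key name, so listing the _id values
// is the in-memory counterpart of running distinct("_id") on the output.
const keyNames = keyDocs.map((d) => d._id);
console.log(keyNames);
```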
Q. Is MongoDB fit for sites like Stack Overflow?
Put simply: yes, it could be. Let’s break down the various pages/features and see how they could be stored/reproduced in MongoDB.
All the information on a question page could be stored in a single document in a questions collection. This could include “sub-documents” for each answer to keep the retrieval of the page fast. You could hit the document size limit of 4MB quite quickly this way, so it would be better to store answers in separate documents and link them to the question by storing the ObjectIDs in an array.
The votes could be stored in a separate collection, with simple links to the question and to the user who voted. A db.eval() call could be executed to increment/decrement the vote count directly in the document when a vote is added (though it blocks, so it wouldn’t be very performant), or a MapReduce call could be made regularly to offset that work. It could work the same way for favourites.
Things like the “viewed” numbers, logging users’ access times, etc. would generally be handled using a modifier operation to increment a counter. Since v1.3 there is a new “Find and Modify” command which can issue an update command when retrieving the document, saving you an extra call.
Any sort of statistical data (such as reputation, badges, unique tags) could be collected using MapReduce and pushed to specific collections. Things like notifications could be pushed to another collection acting as a job queue, with a number of workers listening for new items in the queue (think badge notifications, new answers since the user’s last access time, etc.). The Questions page and its filters could all be handled with capped collections rather than querying for that data immediately.
Ultimately, YMMV. As with all tools, there are advantages and costs. There are some SO features which would take a lot of work in an RDBMS but could be handled quite simply in Mongo, and vice versa.
I think the main advantage of Mongo over RDBMSs is the schema-less approach and replication. Changing the schema regularly in a “live” RDBMS-based app can be painful, even impossible if it’s heavily used with large amounts of data, since those types of operations can lock the tables for far too long. In Mongo, adding new fields is trivial since you may not need to add them to every document. If you do, it’s a relatively quick operation to run a map/reduce to update documents. As for replication, Mongo has the advantage that the DB doesn’t need to be paused to take a snapshot for slaves. Many RDBMSs can’t set up replication without this approach, which on large DBs can take the master down for a long time (I’m looking at you, MySQL!). This can be a blessing for StackOverflow-type sites, where you need to scale over time: no taking the master down every time you need to add a node.
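As a rough illustration of the layout described in this answer, the sketch below shows a question document that links to its answers by ID, plus an update document that uses the $inc modifier for the view counter. All field names (answer_ids, views, votes) are invented for the example, and plain strings stand in for ObjectIDs. The applyInc function only emulates $inc in memory for top-level fields so the effect can be checked without a database.

```javascript
// Question document; answers live in their own documents and are linked by ID.
// Field names are assumptions; strings stand in for ObjectIDs.
const question = {
  _id: "q1",
  title: "Is MongoDB fit for sites like Stack Overflow?",
  answer_ids: ["a1", "a2"],
  views: 41,
  votes: 7,
};

// Update document using the $inc modifier to bump the view counter.
const viewUpdate = { $inc: { views: 1 } };

// In-memory emulation of $inc for top-level fields only.
// A field that does not exist yet starts from 0.
function applyInc(doc, update) {
  const result = { ...doc };
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

const afterView = applyInc(question, viewUpdate);
const afterDownvote = applyInc(afterView, { $inc: { votes: -1 } });
console.log(afterView.views, afterDownvote.votes);
```

Against a real database, the viewUpdate document would be the second argument of an update call on the questions collection, with a filter on _id as the first.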
Q. How can I browse or query live MongoDB data?
An ideal (for my needs) tool would be a web-based viewer with dead simple features (browsing and doing queries).
MongoHub has moved to a native Mac version; please check https://github.com/bububa/MongoHub-Mac.
https://github.com/Imaginea/mViewer - I have tried this one, and as a viewer it’s awesome, with tree and document views.
genghisapp is what you want. It is a web-based GUI that is clean, lightweight, straightforward, offers keyboard shortcuts, and works awesomely. It also supports GridFS. Best of all, it’s a single script!
To install it:
$ gem install genghisapp bson_ext
(bson_ext is optional but will greatly improve the performance of the GUI.)
To run it (this will automatically open your web browser and navigate to the app as well):
$ genghisapp
To stop it:
$ genghisapp --kill
I am using MongoDB v1.4 and the mongodb-csharp driver, and I am trying to group on a data store that has more than 10000 keys, so I get this error: “assertion: group() can’t handle more than 10000 unique keys”. I am using C# code like this:
Document query = new Document().Append("group",
    new Document()
        .Append("key", new Document().Append("myfieldname", true).Append("length", true))
        .Append("$reduce", new CodeWScope("function(obj,prev) { prev.count++; }"))
        .Append("initial", new Document().Append("count", 0))
        .Append("ns", "myitems"));
I read that I should use map/reduce, but I can’t figure out how. Can somebody please shed some light on how to use map/reduce? Or is there any other way to get around this limitation? Thanks. EDIT: I forgot that I have two columns in my key collection; added that.
Thanks to Darin Dimitrov. In addition, I will post my solution that groups by two fields, if anybody is interested in that:
string mapFunction = @"function(){emit({fieldname:this.fieldname,length:this.length}, 1)}";
string reduceFunction = @"function(k,vals){var sum = 0;for(var i in vals) {sum += vals[i];}return sum;}";
IMongoCollection mrCol = db["table"];
using (MapReduceBuilder mrb = mrCol.MapReduceBuilder().Map(mapFunction).Reduce(reduceFunction))
{
    using (MapReduce mr = mrb.Execute())
    {
        foreach (Document doc in mr.Documents)
        {
            // do something
            int groupCount = Convert.ToInt32(doc["value"]);
            string fieldName = ((Document)doc["_id"])["fieldname"].ToString();
        }
    }
}
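The map and reduce functions in this solution are JavaScript passed to the server as strings, so their logic can be tried out on their own. The sketch below is an in-memory approximation, not the driver’s behaviour: emit is passed to the map function as a parameter, the groupCounts helper is hypothetical, and compound keys are matched by their JSON form. The sample items are invented.

```javascript
// Mirrors the map string above, except emit is passed in as a parameter
// so the function can run outside MongoDB.
function mapFunction(emit) {
  emit({ fieldname: this.fieldname, length: this.length }, 1);
}

// Mirrors the reduce string above: sums the emitted 1s per key.
function reduceFunction(key, vals) {
  let sum = 0;
  for (const v of vals) sum += v;
  return sum;
}

// Hypothetical in-memory grouping; two-field keys are compared by JSON form.
function groupCounts(docs) {
  const groups = new Map();
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key: key, values: [] });
    groups.get(id).values.push(value);
  };
  for (const doc of docs) mapFunction.call(doc, emit);
  return Array.from(groups.values()).map((g) => ({
    _id: g.key,
    value: reduceFunction(g.key, g.values),
  }));
}

// Sample items (invented for this example).
const items = [
  { fieldname: "a", length: 3 },
  { fieldname: "a", length: 3 },
  { fieldname: "a", length: 5 },
  { fieldname: "b", length: 3 },
];

const counts = groupCounts(items);
console.log(JSON.stringify(counts));
```

Each result has the same shape the C# loop reads: the two-field key under _id and the group count under value.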