Wednesday, September 21, 2016

Integrate R with MongoDB

MongoDB is a free and open source database that uses a document-oriented data model. It was created by Dwight Merriman and Eliot Horowitz, who had run into development and scalability issues with traditional relational databases while building web applications at DoubleClick.
MongoDB is written in C++. Classified as a NoSQL database, it supports dynamic schema design and avoids the traditional table-based relational structure in favor of JSON-like documents.
R is a wonderful tool for statistical analysis, visualization and reporting. It is also free and open source, and it is mostly used by statisticians, analysts and data scientists.
Here I am going to show how to extract data from MongoDB with R, so that the data can then be analyzed in R.
Before starting an R session, we need to install MongoDB on the local machine and then load the data into the database.

MongoDB Installation

The first step is to check whether your operating system is 64-bit or 32-bit, using the following command in the Command Prompt:


wmic os get osarchitecture

Now that you know your system architecture, download the latest release of MongoDB for Windows from http://www.mongodb.org/downloads. Download the 64-bit build only if your Windows is 64-bit, as 64-bit versions of MongoDB will not run on 32-bit Windows.


After downloading the MongoDB setup file, double-click it and click Next. Then accept the license agreement, click Next, choose Complete as the setup type, and finally click Install.

Once the installation is complete, you can check where MongoDB is installed on your Local Disk (C:).



Note: The bin directory contains a number of mongo executables, of which mongo and mongod are the important ones: mongo is the shell and mongod starts the server. We will run both from the command line.

To run the server and the shell from any command prompt, we need to add the bin directory to the Path environment variable.

Go to Control Panel, then System, and click Advanced system settings. Next, click Environment Variables and select the Path variable under System variables.

Then go to the directory where MongoDB is installed, copy the path up to the bin directory that holds the MongoDB utilities, append it to the Path variable, and click OK.

Before starting the MongoDB server, we need to create the \data\db directory, where mongod will store its data:


md \data
md \data\db

MongoDB is now installed, and you can start the MongoDB server with the following command:
C:\Users\Admin> mongod 

Once the MongoDB server has started, you can start the mongo shell in another Command Prompt window:

mongo

Store Data in MongoDB

// Create (or switch to) the database
use MyDatabase

// Create the Student collection inside MyDatabase
db.createCollection("Student")

// Insert data into the Student collection
db.Student.insert([
{Name: 'Vihaan', RollNo: 101, Marks: 89, City: 'Gurgaon'},
{Name: 'Rohit', RollNo: 102, Marks: 74, City: 'Delhi'},
{Name: 'Sangeeta', RollNo: 103, Marks: 68, City: 'Noida'}
])

// View the documents in the collection
db.Student.find().pretty()


Access MongoDB Data from R

# Install the rmongodb package from GitHub (requires the devtools package)
library(devtools)
install_github(repo = "mongosoup/rmongodb")

# Load the rmongodb library
library(rmongodb)
# create a connection to mongodb localhost
mongo_data = mongo.create(host = "127.0.0.1:27017")

# check whether mongodb is connected
mongo.is.connected(mongo_data)
## [1] TRUE

# List the databases present in MongoDB
mongo.get.databases(mongo_data)
## [1] "db1"        "MyDatabase" "mydb"       "RDatabase"  "test"

# List the collections present in MyDatabase
mongo.get.database.collections(mongo_data,db = "MyDatabase")
## [1] "MyDatabase.Student"

# Fetch every document in the Student collection and convert the result into an R data frame
data1 = mongo.find.all(mongo_data, ns = "MyDatabase.Student",data.frame=TRUE)
head(data1)
##                        _id     Name RollNo Marks    City
## 1 57cab63481d052e22ef3247c   Vihaan    101    89 Gurgaon
## 2 57cab63481d052e22ef3247d    Rohit    102    74   Delhi
## 3 57cab63481d052e22ef3247e Sangeeta    103    68   Noida
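
As a small example of analysis on this data, here is a minimal sketch of a few summary steps. So that it runs without a MongoDB server, it rebuilds the three documents printed above by hand as a data frame called students (a name chosen only for illustration). With a live connection, data1 could presumably be used the same way, though this assumes the Marks column comes back as numeric, and it may need converting first.

```r
# Hand-built copy of the three Student documents shown above
# (assumption: this stands in for data1 so no database is needed).
students <- data.frame(
  Name   = c("Vihaan", "Rohit", "Sangeeta"),
  RollNo = c(101, 102, 103),
  Marks  = c(89, 74, 68),
  City   = c("Gurgaon", "Delhi", "Noida"),
  stringsAsFactors = FALSE
)

# Average marks across all students
mean_marks <- mean(students$Marks)

# Name of the student with the highest marks
top_student <- students$Name[which.max(students$Marks)]

# Students who scored more than 70, keeping only two columns
above_70 <- students[students$Marks > 70, c("Name", "Marks")]

print(mean_marks)
print(top_student)
print(above_70)
```

For these three records the average is 77, the top scorer is Vihaan, and two students score above 70.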

# List all the functions in the rmongodb package
mongofunction<-ls("package:rmongodb")
mongofunction<-data.frame(mongofunction)
head(mongofunction)
##            mongofunction
## 1 as.character.mongo.oid
## 2         mongo.add.user
## 3      mongo.aggregation
## 4     mongo.authenticate
## 5    mongo.binary.binary
## 6  mongo.binary.function


You can do a lot with the rmongodb package. Once the data is in a data frame, you can use the usual R functions for data manipulation and visualization on it.
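
As one illustration of the visualization step, the sketch below draws a bar chart of marks with base R graphics. It is only a sketch under assumptions: the three Student records from this post are rebuilt by hand into a data frame called students (an illustrative name, not something rmongodb creates), and the chart is written to a temporary PDF file so that the code also runs in a non-interactive session.

```r
# Hand-built copy of the Student records used in this post
# (assumption: stands in for the data frame fetched from MongoDB).
students <- data.frame(
  Name  = c("Vihaan", "Rohit", "Sangeeta"),
  Marks = c(89, 74, 68),
  stringsAsFactors = FALSE
)

# Write the chart to a temporary PDF so no display is required
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# barplot() returns the x positions of the bars, one per student
bar_positions <- barplot(
  students$Marks,
  names.arg = students$Name,
  ylim = c(0, 100),
  ylab = "Marks",
  main = "Marks by student"
)

# Close the graphics device so the file is written to disk
dev.off()
```

In an interactive session you can drop the pdf() and dev.off() calls to see the chart in the plot window instead.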
