Thursday, March 30, 2017

Extract Google Analytics Data using R

Google Analytics is arguably one of the most powerful web analytics services offered by Google; it tracks and reports website traffic. Google Analytics is a free web analytics service that provides insights and fundamental analytical tools for search engine optimization (SEO) and marketing purposes. The service is available to anyone with a Google account. Google purchased Urchin Software Corporation in April 2005 and used that company's Urchin on Demand product as the basis for its current service.


Features of Google Analytics you should know

1. Custom reports and dashboards
2. Share a new filter across multiple views
3. Set up your goals at the view level
4. Segments for analyzing subsets of data, such as conversions
5. Email-based sharing and communication
6. Integration with other Google products, such as AdWords, Public Data Explorer, etc.

Through the Google Analytics dashboard, users can also collect information on visitors whose sites link to social networking sites such as Facebook and Twitter.
Are you tired of spending a large part of your day copying data out of the Google Analytics interface to refresh the same old report? Fortunately, there are plenty of tools out there to help with this: Google Spreadsheets, QlikView, ShufflePoint, and Tableau, to name a few. One of my favorite free tools is R.
R is an effective environment for statistical analysis, visualization, and reporting. Using R, we can access the Google Analytics API with just a few lines of code.

Enable the Google Analytics API
First, you have to make sure the Google Analytics API settings are configured correctly.

1. Navigate to the Google Developer Console and create a new project.
2. After creating the project, enable the Google Analytics API.
3. Go to the Credentials tab and create a new client ID and secret key. (You will be given four options, such as API key and OAuth client ID; select the one that fits your need.)
4. Select the application type (in this example I select the "Other" option) and give the application a name.
5. A pop-up box appears showing your client ID and client secret.
Once the project is configured and the credentials are ready, we have to authenticate your Google Analytics account with your application. This ensures your application (the R script) can access your Google Analytics data.

Using RStudio
In R (I will be using RStudio), load the necessary packages.

# Load the RGoogleAnalytics package and authenticate with the
# client ID and client secret created in the Developer Console
library(RGoogleAnalytics)
client.id <- "xxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com"
client.secret <- "xxxxxxxxxxxx89A7Fv"
token <- Auth(client.id, client.secret)

Now you need to validate the token, which basically tells Google Analytics that you have the right to access this data.

ValidateToken(token)

When you run this command, a web browser should pop up asking you to confirm that your application may access your data. Click "Allow".


Once validated, you get a pair of tokens (an access token and a refresh token). The access token is attached to every API request so that Google's servers know the request came from your application and is legitimate.
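As a convenience, you can persist the token object so later sessions reuse it instead of repeating the browser consent flow. A minimal sketch in base R (the file name here is my own choice):

# Save the token for reuse across R sessions
save(token, file = "./ga_token.RData")

# In a later session: reload the token and refresh it if it has expired
load("./ga_token.RData")
ValidateToken(token)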
The very next step is to get the profile ID of the Google Analytics view from which the data will be extracted. It can be found in the Admin panel of the Google Analytics UI.

# Use the Init function to build the list of query parameters
query_list <- Init(start.date = "2016-07-01",
                   end.date = "2017-03-28",
                   metrics = "ga:bounces",
                   dimensions = "ga:month, ga:year",
                   sort = "ga:year",
                   max.results = 1000,
                   table.id = "ga:xxxxxx")

Before querying for a set of dimensions and metrics, you may want to check whether they are compatible. This can be done using the Dimensions & Metrics Explorer.

# Build the query from the parameter list
ga.query <- QueryBuilder(query_list)

# Extract the data and store it in a data frame
Bounces_report <- GetReportData(ga.query, token)
View(Bounces_report)
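Since the report is now an ordinary data frame, you can chart it right away. A small sketch, assuming ggplot2 is installed and that GetReportData() returns columns named month, year, and bounces after the query's dimensions and metrics:

# Plot monthly bounces as a line chart
library(ggplot2)
Bounces_report$period <- paste(Bounces_report$year, Bounces_report$month, sep = "-")
ggplot(Bounces_report, aes(x = period, y = bounces, group = 1)) +
  geom_line() +
  labs(title = "Monthly Bounces", x = "Year-Month", y = "Bounces")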


Let's take another example. Suppose we want to fetch the new-user and returning-user counts.

query_list1 <- Init(start.date = "2016-07-01",
                    end.date = "2017-03-28",
                    dimensions = "ga:userType",
                    metrics = "ga:users, ga:newUsers",
                    max.results = 1000,
                    table.id = "ga:xxxxxx")

# Build the query from the parameter list
ga.query1 <- QueryBuilder(query_list1)

# Extract the data and store it in a data frame
UserData_report <- GetReportData(ga.query1, token)
View(UserData_report)
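A quick follow-up calculation in base R, assuming the data frame has the userType and users columns produced by the query above: the share of traffic contributed by each user type.

# Percentage of total users contributed by each user type
UserData_report$share <- round(100 * UserData_report$users / sum(UserData_report$users), 1)
UserData_report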


In this way, you can fetch data from Google Analytics using various dimensions and metrics in R. In case the API returns an error, here’s a guide to understanding cryptic error responses.
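In the meantime, a defensive pattern worth adopting is to wrap the API call in tryCatch() so a failed request produces a readable message instead of stopping the script. A minimal sketch:

# Catch API errors (expired token, wrong table.id, incompatible
# dimension/metric combinations) and report them gracefully
report <- tryCatch(
  GetReportData(ga.query, token),
  error = function(e) {
    message("Google Analytics API request failed: ", conditionMessage(e))
    NULL  # caller can test is.null(report) and, e.g., re-run ValidateToken(token)
  }
)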

Monday, March 20, 2017

How Can R and Hadoop Be Used Together



The amount of Big Data is increasing, especially the unstructured data collected by large organizations, and traditional IT infrastructure is unable to meet the demands of this new "BI analytics" picture. For these reasons, many organizations are turning to R and Hadoop. R is a programming language and software environment for statistical computing and graphics, while Hadoop is a Java-based programming framework that supports the processing of large datasets in a distributed computing environment. Both technologies are open source, which is why many data scientists and analysts prefer to use them.


As you know very well, R has the ability to analyze data using a rich library of packages but falls short when it comes to working with large datasets. Hadoop, on the other hand, has the capability to store and process large amounts of data in the TB and PB range.

Some of the reasons why R is such a good fit for data analytics are as follows:
• An effective programming language
• It can be used for real-world applications
• Statistical programming features
• Advanced data representation using sophisticated graphs
• The R data structures
• Extensibility through the vast library of R packages

Organizations that hold Big Data

Some of the popular organizations that hold Big Data are as follows:
• Facebook: It has 40 PB of data and captures 100 TB/day
• Yahoo!: It has 60 PB of data
• Twitter: It captures 8 TB/day
• eBay: It has 40 PB of data and captures 50 TB/day

How much information is considered Big Data varies from organization to organization. For some organizations, 10 TB of data would be viewed as Big Data; for others, 1 PB would be Big Data. So only you can determine whether your data is really Big Data; it is sufficient to say that it would start in the low-terabyte range. A question well worth asking is: if you are not capturing and retaining enough of your data, are you sure you don't have a Big Data problem now? In some situations, organizations actually discard data because there wasn't a cost-effective way to store and process it. With platforms such as Hadoop, it is possible to start capturing and storing all of that data.

Understanding the reason for using R and Hadoop together
Sometimes data lives on HDFS in different formats. Since a considerable number of data analysts are very productive in R, it is natural to use R to process data stored through Hadoop-related tools. As mentioned before, R's strength lies in its ability to analyze data using a rich library of packages, but it falls short when it comes to working with very large datasets. Hadoop's strength, on the other hand, is storing and processing huge amounts of data in the TB and even PB range. Such vast datasets cannot be processed in memory, because the RAM of a single machine cannot hold them. Similar solutions can also be achieved on cloud platforms such as Amazon EMR.

4 ways to Integrate R with Hadoop

There are essentially four ways to use R and Hadoop together.

1. RHadoop -- RHadoop is an open source collection of R packages for performing data analytics on the Hadoop platform via R functions. RHadoop has been developed by Revolution Analytics, the leading commercial provider of software and services based on the open source R project for statistical computing. The RHadoop project consists of three different R packages: rhdfs, rmr, and rhbase.
All of these packages are implemented and tested on the Cloudera Hadoop distributions CDH3 and CDH4 with R 2.15.0, and they are also tested against the Revolution Analytics R distributions 4.3, 5.0, and 6.0.
 

These three R packages are designed around Hadoop's two main features, HDFS storage and MapReduce processing:

rhdfs: This is an R package that provides access to Hadoop HDFS from R. All distributed files can be managed with R functions.

rmr: This is an R package that provides Hadoop MapReduce interfaces to R. With the help of this package, the Mapper and Reducer can easily be developed (a short sketch follows this list).

rhbase: This is an R package for handling data in the HBase distributed database from R.
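To make the rmr interface concrete, here is a minimal sketch using rmr2 (the current generation of the rmr package). Everything below is illustrative; the "local" backend is assumed so the snippet can be tried without a running Hadoop cluster.

# A minimal MapReduce job with rmr2: count how many integers
# in 1..100 share each last digit
library(rmr2)
rmr.options(backend = "local")  # switch to "hadoop" on a real cluster

ints <- to.dfs(1:100)           # put a small vector on the (local) DFS

out <- mapreduce(input  = ints,
                 map    = function(k, v) keyval(v %% 10, 1),
                 reduce = function(k, counts) keyval(k, sum(counts)))

from.dfs(out)                   # retrieve the key-value pairs as an R list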

2. RHIPE -- R and Hadoop Integrated Programming Environment (RHIPE) is a free and open source project, widely used for performing Big Data analysis via D&R (Divide and Recombine) analysis. It allows running a MapReduce job from within R. RHIPE is an integrated programming environment developed around Divide and Recombine for analyzing large amounts of data: D&R analysis divides big data, processes it in parallel on a distributed system to produce intermediate output, and finally recombines all the intermediate outputs into a result set.

3. ORCH -- This is the Oracle R Connector for Hadoop, which can be used to work with Big Data on Oracle appliances or on non-Oracle frameworks like Hadoop.


4. Hadoop Streaming: This is an R script available as part of an R package on CRAN, which aims to make R more accessible to Hadoop streaming applications. Using it, you can write MapReduce programs in a language other than Java. In other words, to integrate an R function with Hadoop and see it run in MapReduce mode, Hadoop supports Streaming APIs for R. These Streaming APIs simply run any script that can read from and write to standard I/O in a map-reduce fashion, so in the case of R there is no explicit client-side integration. An example follows below.
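As an illustration, here is a word-count sketch: an R mapper and reducer that only use standard input and output, which is all Hadoop Streaming requires. The HDFS paths and the streaming jar location are placeholders that depend on your installation.

#! /usr/bin/env Rscript
# mapper.R -- emit "word<TAB>1" for every word read from stdin
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  for (w in unlist(strsplit(tolower(line), "[^a-z]+")))
    if (nchar(w) > 0) cat(w, "\t1\n", sep = "")
}
close(con)

#! /usr/bin/env Rscript
# reducer.R -- input arrives sorted by key; sum the counts per word
con <- file("stdin", open = "r")
current <- NULL; total <- 0
while (length(line <- readLines(con, n = 1)) > 0) {
  parts <- strsplit(line, "\t")[[1]]
  if (!is.null(current) && parts[1] != current) {
    cat(current, "\t", total, "\n", sep = ""); total <- 0
  }
  current <- parts[1]; total <- total + as.numeric(parts[2])
}
if (!is.null(current)) cat(current, "\t", total, "\n", sep = "")
close(con)

# Submit the job (jar path and HDFS paths are placeholders):
# hadoop jar /path/to/hadoop-streaming.jar \
#   -input /user/demo/input -output /user/demo/output \
#   -mapper mapper.R -reducer reducer.R -file mapper.R -file reducer.R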

