Wednesday, June 15, 2016

Introduction to Qlikview

What is Qlikview?

QlikView is a business intelligence (BI) reporting tool that is relatively new in the market. It is simple software used to visually analyze the relationships in data. It was developed by QlikTech, a software company based in Radnor, Pennsylvania, that was founded in 1993. The first version of QlikView was released in 2005 as QlikView 7.0.

QlikView has more than 15,000 customers in 100 countries. The company has more than 650 employees across 22 offices in 24 countries.


Why Qlikview?
  • Create a flexible end-user interface to an information warehouse
  • Get snapshots of data relations
  • Make presentations based on your data
  • Create dynamic graphical charts and tables
  • Perform statistical analysis
  • Link descriptions and multimedia to your data
  • Build your own expert systems
  • Create new tables, merging information from several sources
  • Build your own business intelligence system

What are the salient features of Qlikview?
  • Data associations are maintained automatically
  • Data is held in memory
  • Direct and indirect searches can be done
  • Data can be compressed to about 10% of its original size
  • Relationships are shown visually using colors

Architecture of Qlikview
QlikView is built on patented technology, so it is worth understanding how it works. The QlikView architecture consists of two parts:
  • Front End
  • Back End

Front End:

The Front End is where end users interact with QlikView documents and data. It contains the QlikView Server, which developers and business users use to access the BI reports. The user documents seen on the Front End are .qvw, .meta and .shared files, which can also be stored on a Windows machine as standalone documents. All communication between the client and the server is handled either over HTTP or through the QlikView Server. On the Front End, the QlikView Server is responsible for client security.

Back End (Including Infrastructure Resources)
In the Back End, all QlikView source documents are created by QlikView developers. These source files contain scripts (inside .qvw files) that load data from different sources such as data warehouses, Excel, SAP, Salesforce.com, etc. The QlikView Back End consists of two parts:
  • QlikView Desktop
  • QlikView Publisher
Qlikview Desktop
QlikView Desktop is a Windows-based desktop tool that analysts and developers use to extract and transform data from different sources. The files created by QlikView Desktop are stored with the extension .qvw and are passed on to the QlikView Server. These files are also used to create the graphical user interface.

Qlikview Publisher
QlikView Publisher is used to distribute QlikView files (.qvw) to QlikView Servers and users. It can also load data directly from different data sources (OLE DB/ODBC, XML, XLS), reduce QlikView applications, and distribute them to a QlikView Server (QVS).

QlikView uses associative in-memory technology that lets users analyze and process data very quickly. It stores each unique value only once in memory, and everything else is a pointer to that stored value.
When a QlikView document is loaded, QlikView reads it from the hard disk and places the entire dataset in RAM. That is why main memory is the primary storage location for all data analyzed by QlikView.
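To make the idea of storing each unique value once concrete, here is a minimal R sketch of dictionary encoding. It only illustrates the general technique; it is not QlikView's actual (proprietary) implementation, and the customer names are made-up example values:

```r
# Hypothetical column of raw values (made-up example data)
customer <- c("Alice", "Bob", "Alice", "Carol", "Bob", "Alice")

# Symbol table: every distinct value is stored exactly once
symbols <- unique(customer)

# Each row keeps only an integer index ("pointer") into the symbol table
pointers <- match(customer, symbols)

print(symbols)   # "Alice" "Bob" "Carol"
print(pointers)  # 1 2 1 3 2 1

# The original column can be rebuilt from the pointers alone
restored <- symbols[pointers]
print(identical(restored, customer))
```

R's own factor type is built on the same idea: levels() holds the distinct values and as.integer() returns the integer codes, although factor levels are sorted rather than kept in order of first appearance.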

How to Get Qlikview ?

QlikView can be downloaded from here: Download Qlikview

Fill in the form and submit the application. After that, accept the agreement and click the Download button. QlikView Personal Edition is downloaded automatically, and you then need to perform a few simple steps to install it on Windows.







Friday, June 10, 2016

How to install RHBASE Package in R

The RHBase package allows R developers to connect to Hadoop HBase from R through the Thrift server, so they can read, write and modify tables stored in HBase directly from R. Installing the RHBase package requires that you first build and install Thrift.

So let's see how to install the HBase Thrift server on CentOS:

  • Before installing the HBase Thrift server, install all the required tools and libraries for building the Apache Thrift compiler on a Linux system:
          sudo yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel
  • Then download the Thrift 0.9.0 tar file (thrift-0.9.0.tar.gz) from the Apache Thrift website.
  • Navigate to the directory where the tar file was downloaded.
          cd Downloads
  • Extract the tar file
          sudo tar -xvf thrift-0.9.0.tar.gz
  • Change permissions
          sudo chmod 777 -R thrift-0.9.0
  • Move the thrift directory to another location
          sudo mv thrift-0.9.0 /home/cloudera
  • Enter the thrift directory, then configure, build and install Thrift using the following commands
          cd /home/cloudera/thrift-0.9.0
          sudo ./configure
          sudo make
          sudo make install

  • Now your Thrift installation is complete! To verify that Thrift was installed successfully, type:
          thrift -version
  • Copy the Thrift libraries from /usr/local/lib to /usr/lib
          sudo cp /usr/local/lib/libthrift.so /usr/lib
          sudo cp /usr/local/lib/libthrift-0.9.0.so /usr/lib
  • Update PKG_CONFIG_PATH in your .bashrc file. Open it with the following command
          sudo gedit $HOME/.bashrc
    add a line such as the following, then reload the file with source ~/.bashrc
          export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig
  • Next, verify that the pkg-config path is correct
          pkg-config --cflags thrift
  • Now start HBase: go to the HBase directory and type the following commands
          cd /home/cloudera/hbase
          bin/start-hbase.sh
  • Start the Thrift server
          bin/hbase thrift start
  • Check whether the Thrift server is running: open a browser and go to
          http://localhost:9095/thrift.jsp



         Install rhbase Package in R

           To install the rhbase package, first download the package file (rhbase_1.2.0.tar.gz).


         Then set the system environment path in the R console and install the package from source (run this from the directory that contains the downloaded file):

                 Sys.setenv("PKG_CONFIG_PATH" = "/usr/local/lib/pkgconfig")
                 install.packages("rhbase_1.2.0.tar.gz", repos = NULL, type = "source")
                 library(rhbase)
   
        This way you can install the rhbase package in R.


  
  


















Thursday, June 9, 2016

Install R and Rstudio on Centos Linux

R is an analytical tool, and Hadoop is an open-source framework that allows us to store and process large amounts of data in a distributed environment.

To integrate R with Hadoop, we need to install R in a Linux environment, because Hadoop runs on Linux. Below are the steps to install R and RStudio on CentOS.

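Step 1: Before installing R on CentOS, we need to install some libraries

            sudo yum install gfortran
            sudo yum groupinstall "Development Tools"
            sudo yum install gcc-gfortran.x86_64
            sudo yum install libX11-devel.x86_64
            sudo yum install libXt-devel
            sudo yum install readline-devel

The "Development Tools" group provides the compilers and build tools; it is the CentOS counterpart of Debian's build-essential package, which is not available through yum.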

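Step 2: Now install R. On CentOS the R package is provided by the EPEL repository, so enable EPEL first if yum cannot find it.

          sudo yum install epel-release
          sudo yum install R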

Step 3: Check whether R is installed. To start R, type the following command in the terminal

 sudo R

This command opens the R interface in the terminal.

Step 4: Next, download RStudio Server by typing the following command in the terminal

sudo wget https://download2.rstudio.org/rstudio-server-rhel-0.99.902-x86_64.rpm

Step 5: Go to the directory where the file was downloaded and copy it to /home/cloudera
             cd Downloads
             sudo cp rstudio-server-rhel-0.99.902-x86_64.rpm /home/cloudera

Step 6: Give permissions and install RStudio Server

            sudo chmod 777 /home/cloudera/rstudio-server-rhel-0.99.902-x86_64.rpm
            sudo yum install --nogpgcheck /home/cloudera/rstudio-server-rhel-0.99.902-x86_64.rpm

Step 7: Start RStudio Server: open a terminal and type

            sudo rstudio-server start

Step 8: Access RStudio Server in the browser by going to

            http://<your IP address>:8787
            or http://localhost:8787

Step 9: Stop RStudio Server

            sudo rstudio-server stop

Now you can easily install R packages. 

Wednesday, June 8, 2016

Simple Linear Regression in R

In analytics, one of the most common techniques is regression. In its simplest form, regression is used to determine the relationship between two variables: it tells us what value of one variable we can expect given the value of the other.

Linear regression is one of the regression techniques, and it is usually fitted by Ordinary Least Squares (OLS). Before we go any further, let us clarify some terminology.

Response Variable: It is the outcome variable which we are trying to predict.

Predictor Variable: It is the input variable which we are using to predict.

A simple linear regression model describes the relationship between two variables x and y, where y is the dependent (response) variable and x is the independent (predictor) variable. In probability theory, however, dependence is symmetric: if y depends on x, then x cannot be independent of y. To avoid this confusion, we stick with the terms response and predictor exclusively.
The linear regression model can be expressed as:

y = ax + b

Where,
y is the response variable
x is the predictor variable
a is the slope
b is the intercept
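For a single predictor, the least-squares estimates have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope × mean(x). The sketch below computes them by hand and compares them with lm(). It assumes R's built-in cars dataset (stopping distance against speed) as a stand-in for father.son, so it runs without the UsingR package:

```r
# Built-in dataset used as a stand-in: speed is the predictor, dist the response
x <- cars$speed
y <- cars$dist

# Closed-form ordinary least squares estimates for one predictor
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same model fitted with lm()
fit <- lm(dist ~ speed, data = cars)

print(c(intercept = intercept, slope = slope))
print(coef(fit))
```

With the heights data, the same two lines using father.son$fheight as x and father.son$sheight as y reproduce the coefficients that lm() prints.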

Steps to Implement Linear Regression in R

Let us take an example: we have a dataset named father.son, and we want to use the fathers' heights to predict the sons' heights with a linear regression model.

Here, the fathers' heights are the predictors and the sons' heights are the responses.

require(UsingR)
require(ggplot2)
head(father.son)

##    fheight  sheight
## 1 65.04851 59.77827
## 2 63.25094 63.21404
## 3 64.95532 63.34242
## 4 65.75250 62.79238
## 5 61.13723 64.28113
## 6 63.02254 64.24221

To fit a linear regression we use the lm function, which creates the relationship model between the response and the predictor variables.

heightsLM <- lm(sheight ~ fheight, data = father.son)
heightsLM
##
## Call:
## lm(formula = sheight ~ fheight, data = father.son)
##
## Coefficients:
## (Intercept)      fheight 
##     33.8866       0.5141

Here, the formula notation sheight ~ fheight specifies that sheight is regressed on fheight. The interpretation of this result is that for every extra inch of height in a father, we expect about an extra half inch (0.514 inch) of height in his son.
The intercept does not make much sense here, because it represents the height of a son whose father had zero height. To understand the model more clearly, we need to look at its full report:

summary(heightsLM)

##
## Call:
## lm(formula = sheight ~ fheight, data = father.son)

## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.8772 -1.5144 -0.0079  1.6285  8.9685
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 33.88660    1.83235   18.49   <2e-16 ***
## fheight      0.51409    0.02705   19.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.437 on 1076 degrees of freedom
## Multiple R-squared:  0.2513, Adjusted R-squared:  0.2506
## F-statistic: 361.2 on 1 and 1076 DF,  p-value: < 2.2e-16

This generates a lot of information about the model, including standard errors, t-test values and p-values for the coefficients. This diagnostic information is used to check the fit of the model.
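Some of these numbers can be reproduced by hand. The t value is the estimate divided by its standard error (for fheight, 0.51409 / 0.02705 ≈ 19.0, matching the reported 19.01). With a single predictor, the Multiple R-squared equals the squared correlation between x and y, and the F-statistic equals the square of the slope's t value (19.01² ≈ 361.4, close to the reported 361.2 once rounding is accounted for). The sketch below checks these identities, again assuming the built-in cars dataset as a stand-in for father.son:

```r
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)
tab <- coef(s)  # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t value = Estimate / Std. Error
t_manual <- tab[, "Estimate"] / tab[, "Std. Error"]
print(cbind(reported = tab[, "t value"], manual = t_manual))

# With one predictor, Multiple R-squared = squared correlation of x and y
r2_manual <- cor(cars$speed, cars$dist)^2
print(c(reported = s$r.squared, manual = r2_manual))

# With one predictor, F-statistic = (t value of the slope)^2
f_manual <- tab["speed", "t value"]^2
print(c(reported = unname(s$fstatistic["value"]), manual = f_manual))
```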

ggplot(data = father.son, aes(x = fheight, y = sheight)) +
  geom_point(col = "darkgreen") +
  geom_smooth(method = "lm") +
  labs(x = "Father", y = "Sons")


In this graph, the blue line running through the points is the regression line, and the grey band around it represents the uncertainty in the fit. Linear regression is mainly used for prediction.
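Since the model is mainly used for prediction, here is a short sketch of predict(). With the heights model from this post the call would be predict(heightsLM, newdata = data.frame(fheight = 70)), which by hand is 33.8866 + 0.5141 × 70 ≈ 69.87 inches. To keep the example self-contained, it assumes the built-in cars dataset as a stand-in, and the new speed values are hypothetical:

```r
fit <- lm(dist ~ speed, data = cars)

# Hypothetical new predictor values
new_data <- data.frame(speed = c(10, 20))

# Point predictions: intercept + slope * speed
point <- predict(fit, newdata = new_data)
print(point)

# 95% prediction interval for individual new observations
pred_int <- predict(fit, newdata = new_data, interval = "prediction", level = 0.95)
print(pred_int)

# 95% confidence interval for the mean response
# (the same kind of interval as geom_smooth's grey band)
conf_int <- predict(fit, newdata = new_data, interval = "confidence", level = 0.95)
print(conf_int)
```

The prediction interval is always wider than the confidence interval, because it adds the scatter of individual observations to the uncertainty in the fitted line.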

Monday, June 6, 2016

Access MySQL Data using R


With the increasing prevalence of data in our daily lives, new and better tools are needed to analyze the deluge. In addition to transforming and analyzing data, R can produce amazing graphics and reports with ease.
R is a programming language and free software environment for statistical computing and graphics. It is one of the best tools for analyzing big data. R is used by statisticians and data miners with advanced machine learning training, and also by people who are not necessarily trained in advanced data analysis but are tired of using Excel.
To analyze data, some analysts and data scientists use MySQL as their database. One way to do this is to extract the data from the database and import it into statistical software. The DBI package in R provides a uniform, client-side interface to different database management systems, such as MySQL, PostgreSQL, and Oracle. For example, the RMySQL package extends the DBI package to provide a MySQL driver and the detailed inner workings for the generic functions to connect, disconnect, and submit and track queries.

The R code the user writes to load a driver, connect to a database, and request results is essentially the same for all SQL-standard database managers; only the driver changes.
Here is a simple example of how to extract data from a MySQL database in an R session.

Step 1: To connect to a MySQL database, install the package and load the library.
# Install RMySQL Package
install.packages("RMySQL")

# Load library
library(RMySQL)

When you load the RMySQL library, it may generate an error like: unloadNamespace(package): namespace ‘DBI’ is imported by ‘twitteR’ so cannot be loaded.
To solve this error, install (or update) the DBI package with install.packages("DBI").

Step 2:  Load a driver for a MySQL-type database:

# Load Driver
drv = dbDriver("MySQL")

Step 3: Create a connection to the database management server

# create a database connection object.
mydb = dbConnect(drv, user='root', password='12345', dbname='mysql', host='localhost')

Listing Tables and Fields:
Once the connection is established, queries can be sent to the database. Let's discuss each function one by one.
·        dbListTables(mydb)
It returns the names of the tables from the database.

# Display list of all tables from database
dbListTables(mydb)
[1] "columns_priv" "db" "engine_cost"
[4] "event" "func" "general_log"
[7] "gtid_executed" "help_category" "help_keyword"
[10] "help_relation" "help_topic" "innodb_index_stats"
[13] "innodb_table_stats" "ndb_binlog_index" "plugin"
[16] "proc" "procs_priv" "proxies_priv"
[19] "server_cost" "servers" "slave_master_info"
[22] "slave_relay_log_info" "slave_worker_info" "slow_log"
[25] "tables_priv" "time_zone" "time_zone_leap_second"
[28] "time_zone_name" "time_zone_transition" "time_zone_transition_type"
[31] "user"

·        dbListFields(mydb, 'table_name')
It returns the names of the fields (columns) of a table in the database.

# Display fields of a table
dbListFields(mydb, 'user')
[1] "Host" "User" "Select_priv"
[4] "Insert_priv" "Update_priv" "Delete_priv"
[7] "Create_priv" "Drop_priv" "Reload_priv"
[10] "Shutdown_priv" "Process_priv" "File_priv"
[13] "Grant_priv" "References_priv" "Index_priv"
[16] "Alter_priv" "Show_db_priv" "Super_priv"
[19] "Create_tmp_table_priv" "Lock_tables_priv" "Execute_priv"
[22] "Repl_slave_priv" "Repl_client_priv" "Create_view_priv"
[25] "Show_view_priv" "Create_routine_priv" "Alter_routine_priv"
[28] "Create_user_priv" "Event_priv" "Trigger_priv"
[31] "Create_tablespace_priv" "ssl_type" "ssl_cipher"
[34] "x509_issuer" "x509_subject" "max_questions"
[37] "max_updates" "max_connections" "max_user_connections"
[40] "plugin" "authentication_string" "password_expired"
[43] "password_last_changed" "password_lifetime" "account_locked"

·        dbSendQuery(mydb, 'drop table if exists some_table, some_other_table')

This function sends an SQL statement to the database. Here we use it to create a table in the MySQL database from R and to insert data into it.
# Create a table in the MySQL database from R
res <- dbSendQuery(mydb, "CREATE TABLE authors
            (author_id INT AUTO_INCREMENT PRIMARY KEY,
            author_last VARCHAR(50),
            author_first VARCHAR(50),
            country VARCHAR(50));")
dbClearResult(res)  # release the result object before the next query

# Add data into the authors table
res <- dbSendQuery(mydb, "INSERT INTO authors
            (author_last, author_first, country)
            VALUES('Kumar','Manoj','India');")
dbClearResult(res)

·        dbReadTable(mydb, "table_name", row.names = "user_id")

This function reads a table from the MySQL database into an R data frame.

# Extract data from mysql to R
userstable <- dbReadTable(mydb, "authors")
userstable


Now we can also check from the MySQL command line whether the table we created from R exists in the database.

MYSQL Commands
# View Database
show databases;

# Select a database, for example the database named mysql
use mysql;

# View tables inside of database
show tables;

# View data of a table
select * from authors;


This way we can easily load data from MySQL database to R and vice versa.
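The same DBI functions work with any backend that provides a driver. The sketch below repeats the create/insert/read workflow with dbExecute() and dbGetQuery() and closes the connection with dbDisconnect(). It assumes the RSQLite package as an in-memory stand-in for MySQL so it runs without a database server; with RMySQL only the dbConnect() call (and the AUTO_INCREMENT column syntax) would change, and placeholder support through params may differ between drivers:

```r
library(DBI)

# ASSUMPTION: in-memory SQLite database used instead of a MySQL server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# dbExecute() runs a statement and returns the number of affected rows.
# In SQLite an INTEGER PRIMARY KEY column is filled in automatically.
dbExecute(con, "CREATE TABLE authors (
                  author_id    INTEGER PRIMARY KEY,
                  author_last  VARCHAR(50),
                  author_first VARCHAR(50),
                  country      VARCHAR(50))")

# Parameterised insert: values are bound to the ? placeholders
# instead of being pasted into the SQL string
dbExecute(con,
          "INSERT INTO authors (author_last, author_first, country) VALUES (?, ?, ?)",
          params = list("Kumar", "Manoj", "India"))

print(dbListTables(con))
print(dbListFields(con, "authors"))

# dbGetQuery() sends a SELECT, fetches all rows and frees the result
authors <- dbGetQuery(con, "SELECT * FROM authors")
print(authors)

# Close the connection when you are done
dbDisconnect(con)
```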



