Saturday, May 10, 2014

Cloudera Distribution of Hadoop CDH4 Installation



Creating Databases for the Cloudera Manager Services

Create and configure MySQL databases for the Cloudera Management Services (Activity Monitor, Service Monitor, Host Monitor, and Report Manager), the Hive Metastore and Cloudera Navigator (optional). The databases must be configured to support UTF-8 character set encoding.

For performance reasons, it is generally a good idea to run the Hive Metastore Server on the same host as its database.

By default, Cloudera Manager uses Derby for Oozie’s database and SQLite for Hue’s database. You can switch either one to MySQL after Cloudera Manager is installed.

1.       Connect to MySQL on myhost1 as the root user.
$ mysql -u root -p
Enter password:

2.       Create a database for the Activity Monitor.
mysql> create database amon DEFAULT CHARACTER SET utf8;
mysql> grant all on amon.* TO 'amon'@'%' IDENTIFIED BY 'passwd';

3.       Create a database for the Service Monitor.
mysql> create database smon DEFAULT CHARACTER SET utf8;
mysql> grant all on smon.* TO 'smon'@'%' IDENTIFIED BY 'passwd';

4.       Create a database for the Host Monitor.
mysql> create database hmon DEFAULT CHARACTER SET utf8;
mysql> grant all on hmon.* TO 'hmon'@'%' IDENTIFIED BY 'passwd';

5.       Create a database for the Report Manager.
mysql> create database rman DEFAULT CHARACTER SET utf8;
mysql> grant all on rman.* TO 'rman'@'%' IDENTIFIED BY 'passwd';

6.       Create a database for Cloudera Navigator (optional).
mysql> create database nav DEFAULT CHARACTER SET utf8;
mysql> grant all on nav.* TO 'nav'@'%' IDENTIFIED BY 'passwd';

7.       Create a database for the Hive metastore. Create a separate metastore for each Hive service, if you have more than one.
mysql> create database hive DEFAULT CHARACTER SET utf8;
mysql> grant all on hive.* TO 'hive'@'%' IDENTIFIED BY 'passwd';

8.       Create a database for Hue.
mysql> create database hue DEFAULT CHARACTER SET utf8;
mysql> grant all on hue.* to 'hue'@'%' IDENTIFIED BY 'passwd';

9.       Create a database for Oozie.
mysql> create database oozie DEFAULT CHARACTER SET utf8;
mysql> grant all on oozie.* to 'oozie'@'%' IDENTIFIED BY 'passwd';
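
Steps 2 through 9 repeat the same two statements with a different database name each time. As a minimal sketch, the shell function below prints those statements for any list of names, so they can be reviewed and then piped into the mysql client. The function name gen_cm_sql is hypothetical, it assumes (as in the steps above) that each database is owned by a user of the same name, and passwd remains a placeholder that you should replace.

```shell
# Print a CREATE DATABASE / GRANT pair for every database name given.
# Usage: gen_cm_sql PASSWORD DBNAME [DBNAME...]
# Assumption: each database is owned by a MySQL user of the same name.
gen_cm_sql() {
    password="$1"
    shift
    for db in "$@"; do
        printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8;\n" "$db"
        # %% prints a literal % (the MySQL "any host" wildcard).
        printf "GRANT ALL ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s';\n" "$db" "$db" "$password"
    done
}
```

For example, the following would run all sixteen statements in a single session:
$ gen_cm_sql 'passwd' amon smon hmon rman nav hive hue oozie | mysql -u root -p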


Installing the MySQL JDBC Connector

Install the JDBC connector on the Cloudera Manager Server host, as well as on the hosts to which you assign the Activity Monitor, Service Monitor, Host Monitor, Report Manager, Hive Metastore, and Cloudera Navigator roles. In this case, all of them are on the same host.

Download JDBC Driver for MySQL from http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.35.tar.gz
# tar zxvf mysql-connector-java-5.1.35.tar.gz
# cp mysql-connector-java-5.1.35/mysql-connector-java-5.1.35-bin.jar /usr/share/java/
# ln -s /usr/share/java/mysql-connector-java-5.1.35-bin.jar /usr/share/java/mysql-connector-java.jar
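
A quick check that the symlink resolves can save a confusing driver-not-found error later. The sketch below is a hypothetical helper (check_jdbc_link is not part of any Cloudera tooling) that only verifies that a path is a symlink pointing at an existing file.

```shell
# Succeed only if the given path is a symlink that resolves to a regular file.
# Usage: check_jdbc_link /usr/share/java/mysql-connector-java.jar
check_jdbc_link() {
    link="$1"
    if [ ! -L "$link" ]; then
        echo "not a symlink: $link"
        return 1
    fi
    # -f follows the link, so this fails when the target is missing.
    if [ ! -f "$link" ]; then
        echo "dangling symlink: $link"
        return 1
    fi
    echo "ok: $link"
}
```

The same check can be pointed at the connector symlinks created for other services later in this post.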

Add the Cloudera Manager Repository

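# cd /etc/yum.repos.d
Place the Cloudera Manager .repo file for your operating system version in this directory so that yum can find the packages installed in the next section.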

Installing the Cloudera Manager Server

Install the Cloudera Manager Server either on the machine where the database is installed, or on a machine that has access to the database. This machine need not be a host in the cluster that you want to manage with Cloudera Manager. The Cloudera Manager Server does not require CDH4 to be installed on the same machine.

Note:

1.       Cloudera Manager requires Hadoop to be installed on all hosts, but Hadoop must not be configured and must not be running.

2.       The Activity Monitor in Cloudera Manager 4.0 requires the hue-plugins package to be installed on the JobTracker host, regardless of whether you are using Hue. If you are using Hue, the hue-plugins package must be installed on all hosts.

# yum -y install cloudera-manager-daemons cloudera-manager-server

Configuring the Database for the Cloudera Manager Server

As we are not using the embedded database, remove /etc/cloudera-scm-server/db.mgmt.properties.

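Enable the Cloudera Manager Server to connect to the external database by running the scm_prepare_database.sh script on the Cloudera Manager Server host. The positional arguments are the database type (mysql) followed by the database name, user name, and password for the Cloudera Manager Server database (all scm here). The -h option names the database host, --scm-host names the Cloudera Manager Server host, and -u and -p supply a MySQL account that is allowed to create that database and user.

# /usr/share/cmf/schema/scm_prepare_database.sh mysql -h localhost -u abitra -p --scm-host localhost scm scm scm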

Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing:  /usr/java/jdk1.6.0_31/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
[                          main] DbCommandExecutor              INFO  Successfully connected to database.
All done, your SCM database is configured correctly!

Retrieving the Database Host, User Name, or Password

# cat /etc/cloudera-scm-server/db.properties
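
If you only need one value from that file, for example in a backup script, a small helper can pull it out. This is a sketch under assumptions: get_db_prop is a hypothetical name, and the key names used in the usage line (such as com.cloudera.cmf.db.host) should be checked against what your own db.properties actually contains.

```shell
# Print the value of a single key from a Java-style properties file.
# Usage: get_db_prop FILE KEY
# Only the first matching line is used; everything after the first '=' is the value.
get_db_prop() {
    awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}
```

For example:
# get_db_prop /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.host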

Start the Cloudera Manager Server

# service cloudera-scm-server start

Configuring Services

Log in to the Cloudera Manager Admin Console at http://myhost1.example.com:7180/

The default credentials are username admin and password admin.

On the Welcome to Cloudera page, click Continue.

If you uploaded a license, restart Cloudera Manager so that the new license takes effect. Existing clusters and services are unaffected by the restart.

# service cloudera-scm-server restart
Stopping cloudera-scm-server:                              [  OK  ]
Starting cloudera-scm-server:                              [  OK  ]

After the Cloudera Manager Server restarts, log in again.

Cloudera Manager lets you choose packages for the services listed below. Click Continue.

·         Apache Hadoop (Common, HDFS, MapReduce, YARN)
·         Apache HBase
·         Apache ZooKeeper
·         Apache Oozie
·         Apache Hive
·         Hue (Apache licensed)
·         Apache Flume
·         Cloudera Impala (Apache licensed)
·         Apache Sqoop
·         Cloudera Search (Apache licensed)

Click Continue to proceed with the installation.

Specify the hosts for your CDH cluster installation, including the Cloudera Manager Server host, and then click Search to find the cluster hosts.

myhost1, myhost2, myhost3, myhost4, myhost5

Cloudera Manager identifies the hosts so that it can configure them for CDH.
Select the hosts where you want to install CDH and click Continue.

On the Cluster Installation page, select the repository type you want to use for the installation. Choose Use Parcels. Under More Options, you can add a repository for previous versions.
Select the specific releases of Impala and Solr to install on your hosts. If you do not want to install those products, choose None.
Select the specific release of the Cloudera Manager Agent you want to install on your hosts.
Cloudera Manager and the Cloudera Distribution of Hadoop (CDH) consist of a set of services. These services interact with each other and use databases to complete tasks.

Provide the SSH login credentials for root.

The Cloudera Manager daemons, the Cloudera Manager Agent, and the JDK are installed on the previously selected hosts. Click Continue.

The selected parcels are installed on the previously selected hosts. Click Continue.

Choose cluster services you want to install. You can choose one of the standard combinations: Core Hadoop, Real-Time Delivery (previously known as HBase Services), Real-Time Query (which includes HDFS, Hive and Impala), or All Services; these combinations take into account the dependencies between the Hadoop services. Alternatively, you can choose Custom Services, and select the services individually.

Note:
Some services depend on others; for example, HBase requires HDFS and ZooKeeper.
The Cloudera Management Services, which are added to each package, are Cloudera Manager processes that run to support monitoring and management features in Cloudera Manager. Cloudera Navigator is a system to support enforcement of compliance with company policies for data stored in a Hadoop Distributed File System (HDFS) deployment.

After selecting the services, customize the role assignments for each node in the cluster. Click Inspect Role Assignments.

On the Database Setup page, enter the information for the Service Monitor, Activity Monitor, Host Monitor, Report Manager, and Hive metastore databases. Click Test Connection to confirm that Cloudera Manager can communicate with the databases. This transaction takes two heartbeats to complete (about 30 seconds with the default heartbeat interval). If the test succeeds in all cases, click Continue; otherwise check and correct the information you have provided for the databases and then try the test again.

On the Review Configuration Changes page, confirm the settings entered for file system paths, such as the NameNode Data Directory and the DataNode Data Directory. Supply the name of the mail server (it can be localhost), the mail server user, and the mail recipients.

The wizard starts the services on your cluster.

When all of the services are started, click Continue.

Start the Cloudera Manager Agent

# service cloudera-scm-agent start
Starting cloudera-scm-agent:                               [  OK  ]

Change the Default Administrator Password

Change the default administrator password as soon as you begin using Cloudera Manager.

1.       From the Administration tab, select Users.
2.       Click the Change Password button next to the admin account.
3.       Enter a new password twice and then click Update.

Specifying the Racks for Hosts

Cloudera Manager includes internal rack awareness scripts, but you must specify the racks where the hosts in your cluster are located. If your cluster contains more than 10 hosts, Cloudera recommends that you specify the rack for each host. HDFS and MapReduce will automatically use the racks you specify.

1.       Click the Hosts tab.
2.       Select the host(s) for a particular rack.
3.       From the Actions for Selected tab, click Assign Rack.
4.       Enter a new rack name such as /rack1, and then click Confirm.

After assigning racks, restart affected services.
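
For background, a Hadoop rack awareness script is handed one or more host names or IP addresses as arguments and prints one rack path per argument. The sketch below only illustrates that contract. It is not Cloudera Manager's internal script, the function name resolve_racks and the host-to-rack table are hypothetical, and /default-rack is used as the fallback because that is Hadoop's conventional default rack name.

```shell
# Print one rack path per host argument, in the same order as the arguments.
# The host-to-rack mapping below is made up for illustration.
resolve_racks() {
    for host in "$@"; do
        case "$host" in
            myhost1|myhost2)         echo "/rack1" ;;
            myhost3|myhost4|myhost5) echo "/rack2" ;;
            *)                       echo "/default-rack" ;;
        esac
    done
}
```

With Cloudera Manager you do not write such a script yourself; the rack names you assign in the Hosts tab feed its own scripts.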

Checking Host Heartbeats

By default, every Agent must send a successful heartbeat every 15 seconds.

1.       Click the Hosts tab.
2.       Review the list of hosts along with the Last Heartbeat value for each one.
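
If you copy the host list out of the console, a small filter can flag the hosts that have fallen behind. This is only a sketch: stale_hosts is a hypothetical helper, and the input format of one "hostname seconds-since-last-heartbeat" pair per line is an assumption for illustration, not something Cloudera Manager exports in that form.

```shell
# Read "hostname seconds-since-last-heartbeat" lines on stdin and print the
# hosts whose last heartbeat is older than the allowed interval.
# Usage: stale_hosts [INTERVAL_SECONDS]   (defaults to 15, the value quoted above)
stale_hosts() {
    interval="${1:-15}"
    awk -v max="$interval" 'NF >= 2 && $2 + 0 > max + 0 { print $1 }'
}
```

For example, feeding it the lines "myhost1 4" and "myhost2 42" with the default interval prints only myhost2.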

Configure MySQL JDBC driver for Hive

# ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar

Enabling Oozie Web Console

1.       Download ext-2.2.zip from http://extjs.com/deploy/ext-2.2.zip
2.       Extract the contents of the file to /usr/lib/oozie/libext on the Oozie server.
3.       On the Oozie service page, select Configuration > View and Edit.
4.       Check Enable Oozie Server Web Console.
5.       Click on Save Changes.
6.       Restart the Oozie Service.

Configuring MySQL for Oozie

By default, Cloudera Manager uses Derby for the Oozie database.

1.       On the Oozie service page, select Configuration > View and Edit.
2.       In the Category Pane, expand Oozie Server (Default) and click Database.
3.       Specify the settings for Oozie Server Database Type, Oozie Server Database Name, Oozie Server Database Host, Oozie Server Database User, and Oozie Server Database Password.
4.       Create a symlink for the MySQL connector.
# ln -s /usr/share/java/mysql-connector-java.jar /var/lib/oozie/mysql-connector-java.jar
5.       Start the Oozie Service.

Configuring MySQL for Hue

By default, Cloudera Manager uses SQLite for the Hue database.

1.       From the Hue service instance page, click Actions > Stop. Confirm that you want to stop the service by clicking Stop.
2.       Click Configuration > View and Edit. In the Category Pane, expand Service-Wide and click Database.
3.       Specify the settings for Hue Database Type, Hue Database Hostname, Hue Database Port, Hue Database Username, Hue Database Password, and Hue Database Name.
4.       Restart the Hue service.
