Saturday, January 14, 2017

Install Hadoop, Hive and Spark in Cluster

Install Hadoop in cluster

On all Nodes

Update /etc/hosts
192.168.1.10 master-node-01
192.168.1.11 core-node-01
192.168.1.12 core-node-02
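
Before moving on, it can help to confirm that every node name is present in the hosts file. The following is a minimal sketch; missing_hosts is a hypothetical helper name chosen for illustration, and it only inspects the file you pass in rather than querying DNS.

```shell
# Sketch: print each given name that does NOT appear in a hosts file.
# missing_hosts is a made-up helper name, not part of any standard tool.
missing_hosts() {
    hosts_file="$1"; shift
    for name in "$@"; do
        # Look for the name in any field after the IP, skipping comment lines.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

For example, missing_hosts /etc/hosts master-node-01 core-node-01 core-node-02 prints nothing when all three entries are in place.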

Uncomment in /etc/ssh/sshd_config
PasswordAuthentication yes

Configure passwordless SSH on master-node-01 only.

$ ssh-keygen
$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@core-node-01
$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@core-node-02

Install JDK

$ cd /tmp
$ wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http://www.oracle.com/; oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.rpm
$ sudo yum localinstall jdk-8u161-linux-x64.rpm

To install on all nodes instead, loop over a host list in a single command.

$ cat ~/hosts.lst
master-node-01
core-node-01
core-node-02

$ for each in `cat ~/hosts.lst`;do echo $each; scp jdk-8u161-linux-x64.rpm $each:/tmp; done
$ for each in `cat ~/hosts.lst`;do echo $each; ssh -t -q $each "sudo su - -c 'sudo yum -y localinstall /tmp/jdk-8u161-linux-x64.rpm'";done
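
The loops above repeat the same pattern, so it may be convenient to wrap it in a function. This is only a sketch: run_on_all and the DRY_RUN variable are names invented here, and the function assumes the passwordless SSH setup described earlier.

```shell
# Sketch: run one command on every host listed in a file.
# run_on_all and DRY_RUN are hypothetical names used for illustration.
run_on_all() {
    hosts_file="$1"; shift
    # Read the list on fd 3 so ssh keeps the terminal on stdin (needed for -t).
    while IFS= read -r host <&3 || [ -n "$host" ]; do
        # Skip blank lines and comment lines in the host list.
        case "$host" in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only show what would be executed.
            printf 'ssh -t -q %s %s\n' "$host" "$*"
        else
            ssh -t -q "$host" "$@" || echo "FAILED on $host" >&2
        fi
    done 3< "$hosts_file"
}
```

Running DRY_RUN=1 run_on_all ~/hosts.lst "sudo yum -y localinstall /tmp/jdk-8u161-linux-x64.rpm" prints the ssh commands without executing them, which is a cheap way to check the host list first.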

$ wget http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip
$ unzip -j jce_policy-8.zip
$ sudo cp /tmp/*policy.jar /usr/java/jdk1.8.0_161/jre/lib/security/

$ for each in `cat ~/hosts.lst`;do echo $each; scp /tmp/*policy.jar $each:/tmp; done
$ for each in `cat ~/hosts.lst`;do echo $each; ssh -t -q $each "sudo su - -c 'sudo cp /tmp/*policy.jar /usr/java/jdk1.8.0_161/jre/lib/security/'";done

$ ls -l /usr/java
total 4
lrwxrwxrwx 1 root root   16 Jul 22  2014 default -> /usr/java/latest
drwxr-xr-x 9 root root 4096 Mar  6 19:20 jdk1.8.0_161
lrwxrwxrwx 1 root root   22 Mar  6 19:20 latest -> /usr/java/jdk1.8.0_161

Install Hadoop

$ cd /tmp
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.6/hadoop-2.7.6.tar.gz
$ cd /opt
$ sudo tar zxvf /tmp/hadoop-2.7.6.tar.gz
$ sudo ln -s hadoop-2.7.6 hadoop

Edit .bashrc

export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/opt/hadoop
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin
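
After reloading .bashrc, a quick sanity check can confirm that the variables point at real directories. The helper below is a sketch; check_home is a hypothetical name, and it takes the variable's name rather than its value.

```shell
# Sketch: verify that an environment variable is set and names a directory.
# check_home is a made-up helper; $1 is the NAME of the variable.
check_home() {
    eval "dir=\${$1:-}"
    if [ -z "$dir" ]; then
        echo "$1 is not set"; return 1
    elif [ ! -d "$dir" ]; then
        echo "$1=$dir is not a directory"; return 1
    fi
    echo "$1=$dir OK"
}
```

Typical use would be check_home JAVA_HOME && check_home HADOOP_HOME; symlinks such as /usr/java/latest and /opt/hadoop are followed.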

Edit hadoop-env.sh
$ cd $HADOOP_HOME/etc/hadoop
export JAVA_HOME=/usr/java/latest

Edit slaves

core-node-01
core-node-02

Edit core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master-node-01</value> <!-- hostname or HA-enabled logical URI -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
</property>
</configuration>
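
To double-check what a site file actually contains, a small reader for the flat name/value layout used in this post can be handy. This is a sketch, not a real XML parser: get_prop is a hypothetical helper that assumes each name and value sits on its own line. On a working installation, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves.

```shell
# Sketch: print the value of one property from a Hadoop-style *-site.xml.
# get_prop is a made-up helper; it assumes <name> and <value> are on
# separate lines, as in the examples in this post.
get_prop() {
    awk -v want="$2" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$1"
}
```

For example, get_prop $HADOOP_HOME/etc/hadoop/core-site.xml fs.defaultFS should print hdfs://master-node-01 with the configuration above.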

hdfs-site.xml on the NameNode:
The NameNode stores its metadata and edit logs in the directory set by dfs.namenode.name.dir.

<configuration>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data1/dfs/nn</value>
</property>
</configuration>

$ sudo mkdir -p /data1/dfs/nn
$ sudo chown hdfs:hadoop /data1/dfs/nn
$ sudo chmod 700 /data1/dfs/nn

hdfs-site.xml on the DataNode:
Configure the disks on DataNode in a JBOD configuration. The DataNode stores HDFS blocks.

<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data1/dfs/dn,file:///data2/dfs/dn</value>
</property>
</configuration>

$ sudo mkdir -p /data1/dfs/dn /data2/dfs/dn
$ sudo chown hdfs:hadoop /data1/dfs/dn /data2/dfs/dn
$ sudo chmod 700 /data1/dfs/dn /data2/dfs/dn
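
With more disks, typing every path three times gets error-prone. One option is to derive the local paths from the same comma-separated value used in dfs.datanode.data.dir. The function name uris_to_paths is an assumption made for this sketch.

```shell
# Sketch: turn a comma-separated list of file:// URIs into local paths,
# one per line. uris_to_paths is a hypothetical helper name.
uris_to_paths() {
    printf '%s\n' "$1" | tr ',' '\n' | sed 's|^file://||'
}

# Possible use on a DataNode (not executed here):
#   for p in $(uris_to_paths "file:///data1/dfs/dn,file:///data2/dfs/dn"); do
#       sudo mkdir -p "$p" && sudo chown hdfs:hadoop "$p" && sudo chmod 700 "$p"
#   done
```

The commented loop applies the same mkdir, chown and chmod steps shown above to each derived path.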

Edit mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

Edit yarn-site.xml

<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master-node-01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
</configuration>

On master-node-01, format NameNode.

$ sudo -u hdfs hdfs namenode -format

Start Services

$ $HADOOP_HOME/sbin/start-dfs.sh
$ $HADOOP_HOME/sbin/start-yarn.sh

Create the /tmp directory.
$ sudo -u hdfs hdfs dfs -mkdir /tmp
$ sudo -u hdfs hdfs dfs -chmod -R 1777 /tmp

Check daemons on Master
$ jps
NameNode
SecondaryNameNode
ResourceManager

Check daemons on Slaves
$ jps
DataNode
NodeManager
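
Comparing jps output by eye works for three nodes but not for many. The sketch below reads jps output on standard input and reports which expected daemons are present; check_daemons is a hypothetical name, and it assumes the usual "PID ClassName" output format of jps.

```shell
# Sketch: check jps output (on stdin) for a list of expected daemon names.
# check_daemons is a made-up helper; returns non-zero if any are missing.
check_daemons() {
    jps_out=$(cat)
    missing=0
    for d in "$@"; do
        # jps prints "<pid> <name>", so compare the second field exactly.
        if printf '%s\n' "$jps_out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$d running"
        else
            echo "$d MISSING"
            missing=1
        fi
    done
    return "$missing"
}
```

On the master this could be run as jps | check_daemons NameNode SecondaryNameNode ResourceManager, and on a slave as jps | check_daemons DataNode NodeManager.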

HDFS NameNode              : http://master-node-01:50070/
HDFS DataNode              : http://core-node-01:50075/
YARN ResourceManager       : http://master-node-01:8088/
YARN NodeManager           : http://core-node-01:8042/
MapReduce JobHistoryServer : http://master-node-01:19888/ (start it first with $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver)

Install MySQL and JDBC Connector

On NameNode, install MySQL as per instructions at http://anandbitra.blogspot.com/2014/08/installing-mysql-server-on-centos-7.html

mysql> create database metastore DEFAULT CHARACTER SET utf8;
mysql> grant all on metastore.* TO 'hive'@'%' IDENTIFIED BY 'passwd';

Install the JDBC connector
Download JDBC Driver for MySQL from http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.35.tar.gz
$ tar zxvf  mysql-connector-java-5.1.35.tar.gz
$ sudo cp  mysql-connector-java-5.1.35/mysql-connector-java-5.1.35-bin.jar /usr/share/java/
$ sudo ln -s /usr/share/java/mysql-connector-java-5.1.35-bin.jar /usr/share/java/mysql-connector-java.jar

Install Hive in cluster

$ cd /tmp
$ wget http://apache.cs.utah.edu/hive/hive-2.3.2/apache-hive-2.3.2-bin.tar.gz
$ cd /opt
$ sudo tar zxvf /tmp/apache-hive-2.3.2-bin.tar.gz
$ sudo ln -s apache-hive-2.3.2-bin hive

Edit .bashrc

export HIVE_HOME=/opt/hive
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin
export CLASSPATH=/opt/hadoop/lib/*:/opt/hive/lib/*

Edit hive-env.sh

$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh
export JAVA_HOME=/usr/java/latest
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

Edit hive-site.xml

<configuration>
   <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
      <description>metadata is stored in a MySQL server</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>MySQL JDBC driver class</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
      <description>user name for connecting to mysql server</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>passwd</value>
      <description>password for connecting to mysql server</description>
   </property>
</configuration>

Link the MySQL JDBC driver into Hive's lib directory and initialize the metastore schema, which Hive 2.x requires before first use.

$ sudo ln -s /usr/share/java/mysql-connector-java.jar /opt/hive/lib/
$ schematool -dbType mysql -initSchema

$ hive
hive> use default;
hive> show tables;

Install Spark in cluster

$ cd /tmp
$ wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz
$ cd /opt
$ sudo tar zxvf /tmp/spark-2.2.0-bin-hadoop2.7.tgz
$ sudo ln -s spark-2.2.0-bin-hadoop2.7 spark

Edit .bashrc

export SPARK_HOME=/opt/spark
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin

Edit spark-env.sh

$ cd $SPARK_HOME/conf
$ cp spark-env.sh.template spark-env.sh
export JAVA_HOME=/usr/java/latest
export SPARK_WORKER_CORES=4
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

Add Slaves

$ cd $SPARK_HOME/conf

Edit slaves and list the worker hostnames.

core-node-01
core-node-02

Start Spark Services

$ $SPARK_HOME/sbin/start-all.sh

Check daemons on Master
$ jps
Master

Check daemons on Slaves
$ jps
Worker

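Spark Master UI     : http://master-node-01:8080/
Spark HistoryServer : http://master-node-01:18080/ (after running $SPARK_HOME/sbin/start-history-server.sh with event logging enabled)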

Configure Hive execution engine to use Spark
hive> set hive.execution.engine=spark;

$ pyspark

SparkContext available as sc, HiveContext available as sqlContext.
>>> from pyspark.context import SparkContext
>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)
>>> sqlContext.sql("use default")
DataFrame[result: string]
>>> sqlContext.sql("show tables").show()


Tuesday, January 5, 2016

Nginx and Tomcat


Nginx is an open source, high-performance reverse proxy, load balancer, and content cache. It also provides an extra layer of security for applications.

Add the nginx repository to /etc/yum.repos.d/nginx.repo on the web server.

[nginx]
name=nginx repo
baseurl=http://nginx.org/packages/mainline/rhel/6/$basearch/
enabled=1
gpgcheck=1
gpgkey=http://nginx.org/keys/nginx_signing.key

Install nginx
$ sudo yum install nginx

Start nginx
$ sudo service nginx start

Ensure nginx starts at boot.
$ sudo chkconfig nginx on
$ sudo chkconfig --list nginx
nginx           0:off   1:off   2:on    3:on    4:on    5:on    6:off

Adjust the parameters in /etc/nginx/nginx.conf for optimal performance.

worker_processes 1;   # set to the number of CPUs
worker_rlimit_nofile 200000;
access_log off;
sendfile on;
tcp_nopush on;
tcp_nodelay off;
keepalive_requests 100000;
keepalive_timeout 30;
reset_timedout_connection on;
client_body_timeout 10;
send_timeout 2;
gzip on;
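
The worker count is usually matched to the number of CPUs. As a small sketch, the directive can be generated from the online CPU count; worker_processes_line is a hypothetical helper name, and recent nginx versions also accept worker_processes auto; which achieves the same thing without any scripting.

```shell
# Sketch: print a worker_processes directive sized to the online CPU count.
# worker_processes_line is a made-up helper; falls back to 1 if getconf fails.
worker_processes_line() {
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    printf 'worker_processes %s;\n' "$cpus"
}
```

The printed line can be pasted into /etc/nginx/nginx.conf in place of the fixed value.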

Allow HTTPS traffic to the nginx server. To restrict access, replace 0.0.0.0/0 with a trusted source range.

$ sudo iptables -I INPUT -p tcp -s 0.0.0.0/0 --dport 443 -j ACCEPT


Setup locally-mounted DVD as yum repository

Mount DVD media
$ sudo mount -o ro /dev/sr0 /mnt

$ sudo cp /mnt/media.repo /etc/yum.repos.d/rhel6dvd.repo
$ sudo chmod 644 /etc/yum.repos.d/rhel6dvd.repo

Change gpgcheck=0 to gpgcheck=1 and add the lines below.

enabled=1
baseurl=file:///mnt/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

Clear the related caches.
$ sudo yum clean all
$ sudo subscription-manager clean

Install memcached 
$ sudo yum install php-pecl-memcached memcached

Change CACHESIZE and OPTIONS in /etc/sysconfig/memcached
Set CACHESIZE (the -m value, in MB) to at least 1024 for 1GB.
Add -l 127.0.0.1 to OPTIONS so memcached listens only on localhost.

Start memcached
$ sudo service memcached start

Ensure memcached starts at boot.
$ sudo chkconfig memcached on

Nginx ships with a built-in memcached module that serves responses straight from a memcached server. Nginx only reads from memcached, so the application must store the responses itself. On a cache miss the module returns an error (404), which the configuration below catches and hands to the application server for processing.

# cat /etc/nginx/sites-available/default
server {
    listen 80;
    server_name domain;

    location / {
        set  $memcached_key    "$uri?$args";
        memcached_pass    127.0.0.1:11211;
        error_page        404 502 504 = @cache_miss;
    }
    location @cache_miss  {
        proxy_pass  http://app_server:8080/;
    }
}
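
Because nginx only reads from the cache, something has to write entries under exactly the key nginx computes. With "$uri?$args", a request for /index.html without a query string is looked up as /index.html? (note the trailing question mark). The sketch below builds a memcached text-protocol set command for a file; memcached_set_cmd is a hypothetical helper name, and sending its output with nc is an assumption about the tools available on your host.

```shell
# Sketch: emit a memcached text-protocol "set" for the contents of a file.
# memcached_set_cmd is a made-up helper. Arguments: key, TTL in seconds, file.
memcached_set_cmd() {
    key="$1"; ttl="$2"; file="$3"
    # Arithmetic expansion strips any padding wc may add around the count.
    bytes=$(( $(wc -c < "$file") ))
    # Format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    printf 'set %s 0 %s %s\r\n' "$key" "$ttl" "$bytes"
    cat "$file"
    printf '\r\n'
}
```

One way to use it would be memcached_set_cmd '/index.html?' 300 index.html | nc 127.0.0.1 11211, after which nginx can answer that URL from the cache for five minutes.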

Load Balancing

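Requests are proxied to the application server group myapp1 in round-robin order, which is the default method. Reference the group from a location block with proxy_pass http://myapp1;

upstream myapp1 {
    server appsrv1.example.com;
    server appsrv2.example.com;
    server appsrv3.example.com;
}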

Create a Self-Signed SSL Certificate

Generate a Private Key
$ sudo openssl genrsa -des3 -out server.key 2048

Generate a CSR
$ sudo openssl req -new -key server.key -out server.csr

Remove Passphrase from Key
$ sudo cp server.key server.key.org
$ sudo openssl rsa -in server.key.org -out server.key

Generate Self-Signed Certificate
$ sudo openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
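
A mismatched key and certificate is a common reason for nginx refusing to start. As a sketch, the two can be compared by their RSA modulus; cert_matches_key is a hypothetical helper name, and it assumes an RSA key with the passphrase already removed.

```shell
# Sketch: succeed only if an RSA certificate and private key share a modulus.
# cert_matches_key is a made-up helper. Arguments: certificate file, key file.
cert_matches_key() {
    cert_mod=$(openssl x509 -noout -modulus -in "$1") || return 2
    key_mod=$(openssl rsa -noout -modulus -in "$2") || return 2
    [ "$cert_mod" = "$key_mod" ]
}
```

For the files generated above, cert_matches_key server.crt server.key && echo "pair OK" should print the confirmation.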

Hardening SSL Configuration

    ssl_session_cache shared:SSL:20m;
    ssl_session_timeout 10m;

    ssl_prefer_server_ciphers     on;
    ssl_protocols       TLSv1.2;
    ssl_ciphers     ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS;
    add_header Strict-Transport-Security "max-age=31536000";


To serve domain1 and domain2 from the ROOT contexts of two separate applications, app1 and app2, with virtual hosts app1_host and app2_host respectively, deploy each application on a Tomcat instance under /opt/tomcat and put a reverse proxy in front of them.

Create the file /etc/nginx/conf.d/domain1.conf with the following contents. Configure memcached as above and add the SSL hardening directives after ssl_certificate_key.

server {
    listen    443 ssl;
    ssl_certificate     /etc/ssl/domain1.crt;
    ssl_certificate_key /etc/ssl/domain1.key;

    server_name  domain1;
    access_log /var/log/nginx/domain1.access.log;
    error_log /var/log/nginx/domain1.error.log;

    location / {
        proxy_set_header X-Real-IP  $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        proxy_pass http://app1_host:8080/;
    }
}

And then create the file /etc/nginx/conf.d/domain2.conf with the following contents:

server {
    listen    443 ssl;
    ssl_certificate     /etc/ssl/domain2.pem;
    ssl_certificate_key /etc/ssl/domain2.key;

    server_name  domain2;
    access_log /var/log/nginx/domain2.access.log;
    error_log /var/log/nginx/domain2.error.log;

    location / {
        proxy_set_header X-Real-IP  $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        proxy_pass http://app2_host:8080/;
    }
}

Then restart nginx.
$ sudo service nginx restart

Download Oracle JDK 8 and install on Application server

$ sudo wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u66-b17/jdk-8u66-linux-x64.rpm"

$ sudo rpm -ivh jdk-8u66-linux-x64.rpm*

Download Tomcat 8 and install on Application server

$ cd /opt
$ sudo wget ftp://apache.cs.utah.edu/apache.org/tomcat/tomcat-8/v8.0.30/bin/apache-tomcat-8.0.30.tar.gz
$ sudo tar -xvf apache-tomcat-8.0.30.tar.gz
$ sudo ln -s apache-tomcat-8.0.30 tomcat

To access the manager webapp, add the following to $CATALINA_HOME/conf/tomcat-users.xml

  <role rolename="manager-gui"/>
  <user username="tomcat" password="s3cret" roles="manager-gui"/>

Start Tomcat server
$ sudo  /opt/tomcat/bin/startup.sh

If Tomcat is running successfully, you can see the Tomcat Welcome page at http://localhost:8080/

Configure SSL (Optional) 

Tomcat listens on port 8080, as configured in /opt/tomcat/conf/server.xml. Comment out the following connector and configure SSL.

           <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />

Create a local self-signed certificate.
$ cd /opt/tomcat
$ sudo keytool -genkey -alias tomcat -keystore conf/keystore.jks -keyalg RSA -keysize 2048
$ sudo keytool -list -keystore conf/keystore.jks

Uncomment https and add keystore info to server.xml

<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
               maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
               clientAuth="false" sslProtocol="TLS"
               keystoreFile="conf/keystore.jks"
               keystorePass="changeit" />

To force the webapp to use SSL, add the following to web.xml.

<security-constraint>
   <web-resource-collection>
        <web-resource-name>securedapp</web-resource-name>
        <url-pattern>/*</url-pattern>
    </web-resource-collection>
    <user-data-constraint>
        <transport-guarantee>CONFIDENTIAL</transport-guarantee>
    </user-data-constraint>
</security-constraint>

Configure Virtual Hosting

Copy app1.war and app2.war into the app1 and app2 folders under /opt/tomcat, renaming each to ROOT.war so that it is deployed as the ROOT context.

Virtual Hosting - Edit the Engine portion of /opt/tomcat/conf/server.xml

     <Host name="app1_host"  appBase="app1"
            unpackWARs="true" autoDeploy="true">
        <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
               prefix="domain1_access_log" suffix=".txt"
               pattern="%h %l %u %t &quot;%r&quot; %s %b" />
     </Host>
     <Host name="app2_host"  appBase="app2"
            unpackWARs="true" autoDeploy="true">
        <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
               prefix="domain2_access_log" suffix=".txt"
               pattern="%h %l %u %t &quot;%r&quot; %s %b" />
     </Host>
     </Engine>
    
Restart tomcat server.

Auto stop/start tomcat


# cd /etc/rc.d/init.d
# cat tomcat
#!/bin/bash
# chkconfig: 2345 80 20
# description: Tomcat Start Stop Restart
# processname: tomcat
JAVA_HOME=/usr/java/jdk1.8.0_66
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
export PATH
CATALINA_HOME=/opt/tomcat

case "$1" in
start)
sh $CATALINA_HOME/bin/startup.sh
;;
stop)
sh $CATALINA_HOME/bin/shutdown.sh
;;
restart)
sh $CATALINA_HOME/bin/shutdown.sh
sh $CATALINA_HOME/bin/startup.sh
;;
*)
echo "Usage: $0 {start|stop|restart}"
exit 1
;;
esac
exit 0
 
# chmod 755 tomcat
# chkconfig --add tomcat
# chkconfig tomcat on
# chkconfig --list