Saturday, September 13, 2014

MySQL Replication

Replication between master and slave servers is based on the binary logging mechanism. The master writes updates and changes as events to its binary log; the slaves read the binary log from the master and execute those events on their own copies of the data.

Install MySQL Server on both master and slave servers as shown at http://anandbitra.blogspot.com/2014/08/installing-mysql-server-on-centos-7.html

Configure the Master Database

Create a dedicated user for replication and grant it the REPLICATION SLAVE privilege.
mysql> CREATE USER 'replica'@'%' IDENTIFIED BY 'password';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'replica'@'%';
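
Optionally, confirm the grant before continuing. Note that the '%' host pattern allows connections from any host; in production you may want to restrict it to the slave's address.
mysql> SHOW GRANTS FOR 'replica'@'%';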

Update the /etc/my.cnf file.

[mysqld]
server_id = 1
log_bin = /var/lib/mysql/mysql_binlog
binlog_format = mixed
# one binlog_do_db line per database to be replicated
binlog_do_db = <db_name>
binlog_do_db = <db_name>
#binlog_ignore_db = mysql
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

Restart MySQL
# service mysqld restart

Acquire a global read lock so that a consistent backup can be taken.
mysql> FLUSH TABLES WITH READ LOCK;
In another session, obtain the current binary log file and position. This information is needed to set up the slave later.
mysql> SHOW MASTER STATUS;
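
The output looks roughly like this (the file name and position are illustrative; use the values your master reports):
+---------------------+----------+--------------+------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+---------------------+----------+--------------+------------------+
| mysql_binlog.000002 |      106 | metastore    |                  |
+---------------------+----------+--------------+------------------+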

Back up the metastore database (the database being replicated in this example).
$ mysqldump -u root -p metastore > metastore.sql

Once the dump completes, release the read lock.
mysql> UNLOCK TABLES;

Configure the Slave Database
  
Create the database.
mysql> CREATE DATABASE metastore;

Copy the dump taken on the master to the slave (here /tmp/metastore.sql) and import it.
$ mysql -u root -p metastore < /tmp/metastore.sql
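
As a quick sanity check, confirm that the tables were imported:
$ mysql -u root -p -e "SHOW TABLES IN metastore;"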

Update the /etc/my.cnf file.

[mysqld]
server-id = 2
log_bin = /var/lib/mysql/mysql_binlog
binlog_format = mixed
binlog_do_db = metastore
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
relay_log = /var/lib/mysql/mysql_relaylog

Restart MySQL
# service mysqld restart

Start the replication by pointing the slave at the master, using the binary log file and position recorded from SHOW MASTER STATUS.
mysql> CHANGE MASTER TO master_host='192.168.1.6', master_user='replica', master_password='password', master_log_file='mysql_binlog.000002', master_log_pos=106;
mysql> START SLAVE;

Verify replication status

mysql> SHOW SLAVE STATUS\G
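
In the output, the key fields to check are shown below (illustrative values). Both threads should report Yes, and Seconds_Behind_Master should drop to 0 once the slave has caught up.
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0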

Saturday, August 9, 2014

Installing MySQL Server on CentOS 7



This procedure requires root access on the target server.

Download MySQL rpm-bundle at https://dev.mysql.com/downloads/file/?id=473261

After downloading the bundle, unpack it with the following commands.

# cd /tmp
# tar xvf MySQL-5.6.38-1.el6.x86_64.rpm-bundle.tar

 Install MySQL

         $ sudo yum localinstall MySQL-*

Update /etc/my.cnf

[mysqld]
datadir = /var/lib/mysql                                                                                                  
socket = /var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links = 0

key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1

max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M

log_bin=/var/lib/mysql/mysql_binlog

binlog_format = mixed

read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M

# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 1G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

Start MySQL Community Server.

         $ sudo systemctl start mysql
         OR
         $ sudo service mysql start

Ensure the MySQL Server starts at boot.
         $ sudo chkconfig mysql on
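
Optionally verify that the service is registered to start at boot:
         $ sudo chkconfig --list mysql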
  
Perform post-install security activities. The initial root password is blank, so press Enter when prompted for the current password.

   $ sudo /usr/bin/mysql_secure_installation


NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MySQL
      SERVERS IN PRODUCTION USE!  PLEASE READ EACH STEP CAREFULLY!

In order to log into MySQL to secure it, we'll need the current
password for the root user.  If you've just installed MySQL, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.

Enter current password for root (enter for none):
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MySQL
root user without the proper authorisation.

Set root password? [Y/n] Y
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
 ... Success!


By default, a MySQL installation has an anonymous user, allowing anyone
to log into MySQL without having to have a user account created for
them.  This is intended only for testing, and to make the installation
go a bit smoother.  You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n] Y
 ... Success!

Normally, root should only be allowed to connect from 'localhost'.  This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n] Y
 ... Success!

By default, MySQL comes with a database named 'test' that anyone can
access.  This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n] Y
 - Dropping test database...
ERROR 1008 (HY000) at line 1: Can't drop database 'test'; database doesn't exist

 ... Failed!  Not critical, keep moving...
 - Removing privileges on test database...
 ... Success!

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n] Y
 ... Success!


All done!  If you've completed all of the above steps, your MySQL
installation should now be secure.

Thanks for using MySQL!

Cleaning up...


Purge Binary Logs

To delete binary logs older than 7 days (everything before midnight 7 days ago):
mysql> PURGE BINARY LOGS BEFORE DATE(NOW() - INTERVAL 7 DAY) + INTERVAL 0 SECOND;
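
To see which binary logs remain after the purge:
mysql> SHOW BINARY LOGS;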

To keep 7 days' worth of binary logs, set expire_logs_days at runtime
mysql> SET GLOBAL expire_logs_days = 7;

and make the setting persistent across restarts by adding it to /etc/my.cnf
[mysqld]
expire_logs_days=7
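
Verify that the setting is in effect:
mysql> SHOW VARIABLES LIKE 'expire_logs_days';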

When replication is configured, check the slave status first and purge binary logs only up to the file shown in the slave's Relay_Master_Log_File, so that no log the slave still needs is removed.

mysql> PURGE BINARY LOGS TO 'Relay_Master_Log_File_On_Slave';

Saturday, May 24, 2014

Hadoop DistCp



DistCp (distributed copy) is a tool for large inter/intra-cluster copying.

Run the DistCp command for inter-cluster copying.
$ hadoop distcp hdfs://nn1:8020/basePath hdfs://nn2:8020/basePath
$ hadoop distcp hdfs://nn1:8020/basePath1 hdfs://nn1:8020/basePath2 hdfs://nn2:8020/basePath
$ hadoop distcp -f hdfs://nn1:8020/srclist hdfs://nn2:8020/basePath
Where srclist contains hdfs://nn1:8020/basePath1 and hdfs://nn1:8020/basePath2.

This will expand the namespace under /basePath on nn1 into a temporary file, partition its contents among a set of map tasks, and start a copy on each TaskTracker from nn1 to nn2.
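
A few DistCp options are often useful for large copies; the command below is a sketch with placeholder paths and map count: -update copies only files that are missing or differ at the destination, -m limits the number of simultaneous map tasks, and -log writes per-file copy logs to the given directory.
$ hadoop distcp -update -m 20 -log hdfs://nn2:8020/distcp_logs hdfs://nn1:8020/basePath hdfs://nn2:8020/basePath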

When copying between different versions of Hadoop, run the DistCp command on the destination cluster only, reading from the source cluster over HFTP.

$ hadoop distcp hftp://cdh3-namenode:50070/ hdfs://cdh4-namenode/

For a specific path, such as /hbase:
$ hadoop distcp hftp://cdh3-namenode:50070/basePath hdfs://cdh4-namenode/basePath

Here cdh3-namenode refers to the source NameNode hostname as defined by the config fs.default.name, 50070 is the NameNode HTTP port as defined by the config dfs.http.address, and cdh4-namenode is the destination NameNode as defined by the config fs.defaultFS. You can also use the destination nameservice ID. basePath in both URIs is the directory to copy.
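
A simple way to sanity-check the copy is to compare the directory, file, and byte counts reported on each cluster; the numbers should match once the copy is complete.
$ hadoop fs -count /basePath     # run on the source cluster
$ hadoop fs -count /basePath     # run on the destination cluster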


Saturday, May 17, 2014

Running a MapReduce Job



WordCount is a simple application that counts the number of occurrences of each word in an input set.

1.       Create the input directory in HDFS.
# useradd cloudera
$ sudo su hdfs                                                                                    
$ hadoop fs -mkdir /user/cloudera
$ hadoop fs -chown cloudera /user/cloudera
$ exit
$ sudo su - cloudera
$ pwd
/home/cloudera
$ hadoop fs -mkdir /user/cloudera/wordcount /user/cloudera/wordcount/input

2.       Create sample text files and copy the files into HDFS under the input directory.
$ echo "Hello World Bye World" > file0
$ echo "Hello Hadoop Goodbye Hadoop" > file1
$ hadoop fs -put file* /user/cloudera/wordcount/input
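
Optionally confirm the files are in place before running the job:
$ hadoop fs -ls /user/cloudera/wordcount/input
$ hadoop fs -cat /user/cloudera/wordcount/input/file0
Hello World Bye World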

3.       Create a java program.
$ vi WordCount.java

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

  // Mapper: emit (word, 1) for every token in the input line
  public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Reducer: sum the counts for each word
  public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

4.       Compile WordCount.java.
$ mkdir wordcount_classes
$ javac -cp /opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop/client-0.20/* -d wordcount_classes WordCount.java

5.       Create a JAR.
$ jar -cvf wordcount.jar -C wordcount_classes/ .
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/WordCount$Map.class(in = 1938) (out= 798)(deflated 58%)
adding: org/myorg/WordCount$Reduce.class(in = 1611) (out= 649)(deflated 59%)
adding: org/myorg/WordCount.class(in = 1546) (out= 749)(deflated 51%)

6.       Run the application.
$ hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/wordcount/input /user/cloudera/wordcount/output
14/02/22 19:36:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/22 19:36:53 INFO mapred.FileInputFormat: Total input paths to process : 2
14/02/22 19:36:58 INFO mapred.JobClient: Running job: job_201402221622_0001
14/02/22 19:37:00 INFO mapred.JobClient:  map 0% reduce 0%
14/02/22 19:39:07 INFO mapred.JobClient:  map 33% reduce 0%
14/02/22 19:39:31 INFO mapred.JobClient:  map 67% reduce 0%
14/02/22 19:39:32 INFO mapred.JobClient:  map 100% reduce 0%
14/02/22 19:39:43 INFO mapred.JobClient:  map 100% reduce 100%
14/02/22 19:39:50 INFO mapred.JobClient: Job complete: job_201402221622_0001
14/02/22 19:39:50 INFO mapred.JobClient: Counters: 33
14/02/22 19:39:51 INFO mapred.JobClient:   File System Counters
14/02/22 19:39:51 INFO mapred.JobClient:     FILE: Number of bytes read=79
14/02/22 19:39:51 INFO mapred.JobClient:     FILE: Number of bytes written=651887
14/02/22 19:39:51 INFO mapred.JobClient:     FILE: Number of read operations=0
14/02/22 19:39:51 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/02/22 19:39:51 INFO mapred.JobClient:     FILE: Number of write operations=0
14/02/22 19:39:51 INFO mapred.JobClient:     HDFS: Number of bytes read=413
14/02/22 19:39:51 INFO mapred.JobClient:     HDFS: Number of bytes written=41
14/02/22 19:39:51 INFO mapred.JobClient:     HDFS: Number of read operations=7
14/02/22 19:39:51 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/02/22 19:39:51 INFO mapred.JobClient:     HDFS: Number of write operations=2
14/02/22 19:39:51 INFO mapred.JobClient:   Job Counters
14/02/22 19:39:51 INFO mapred.JobClient:     Launched map tasks=3
14/02/22 19:39:51 INFO mapred.JobClient:     Launched reduce tasks=1
14/02/22 19:39:51 INFO mapred.JobClient:     Data-local map tasks=3
14/02/22 19:39:51 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=210815
14/02/22 19:39:51 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=10176
14/02/22 19:39:51 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/02/22 19:39:51 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/22 19:39:51 INFO mapred.JobClient:   Map-Reduce Framework
14/02/22 19:39:51 INFO mapred.JobClient:     Map input records=2
14/02/22 19:39:51 INFO mapred.JobClient:     Map output records=8
14/02/22 19:39:51 INFO mapred.JobClient:     Map output bytes=82
14/02/22 19:39:51 INFO mapred.JobClient:     Input split bytes=360
14/02/22 19:39:51 INFO mapred.JobClient:     Combine input records=8
14/02/22 19:39:51 INFO mapred.JobClient:     Combine output records=6
14/02/22 19:39:51 INFO mapred.JobClient:     Reduce input groups=5
14/02/22 19:39:51 INFO mapred.JobClient:     Reduce shuffle bytes=117
14/02/22 19:39:51 INFO mapred.JobClient:     Reduce input records=6
14/02/22 19:39:51 INFO mapred.JobClient:     Reduce output records=5
14/02/22 19:39:51 INFO mapred.JobClient:     Spilled Records=12
14/02/22 19:39:51 INFO mapred.JobClient:     CPU time spent (ms)=2630
14/02/22 19:39:51 INFO mapred.JobClient:     Physical memory (bytes) snapshot=566894592
14/02/22 19:39:51 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2479079424
14/02/22 19:39:51 INFO mapred.JobClient:     Total committed heap usage (bytes)=280698880
14/02/22 19:39:51 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
14/02/22 19:39:51 INFO mapred.JobClient:     BYTES_READ=50

7.       View the results of the running job by selecting Activities > mapreduce1 Jobs in Cloudera Manager.

8.       Examine the output.
$  hadoop fs -cat /user/cloudera/wordcount/output/part-00000
Bye        1
Goodbye    1
Hadoop     2
Hello      2
World      2

9.       Remove the output directory so that you can run the sample again.   
$ hadoop fs -rm -r /user/cloudera/wordcount/output
Moved: 'hdfs://myhost2.example.com:8020/user/cloudera/wordcount/output' to trash at: hdfs://myhost2.example.com:8020/user/hdfs/.Trash/Current