Adding Snappy Compression to HBase and Hadoop
# tar -zxvf snappy-1.0.5.tar.gz
# cd snappy-1.0.5
# ./configure
# make
# make install
# vi /etc/ld.so.conf.d/snappy.conf
/usr/local/lib
# ldconfig
# ldconfig -p |grep snappy
# ldconfig -v |grep snappy
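For scripted installs, the two ldconfig checks above can be wrapped in a small helper. This is only a sketch: the function name has_shared_lib is hypothetical, and it reads `ldconfig -p` style output from standard input instead of calling ldconfig itself.

```shell
# Hypothetical helper (not part of snappy, Hadoop or HBase).
# Succeeds if the text on stdin, expected to be `ldconfig -p` output,
# contains an entry for lib<name>.so.
has_shared_lib() {
    grep -q "lib$1\.so"
}
```

It could be used as `ldconfig -p | has_shared_lib snappy && echo "snappy is registered"`.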
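The next two commands prevent the problem where the snappy native library originally bundled with Hadoop cannot be used on a different operating system: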
# cp -P /usr/local/lib/libsnappy* $HADOOP_HOME/lib/native/Linux-amd64-64/
# cp -P /usr/local/lib/libsnappy* $HBASE_HOME/lib/native/Linux-amd64-64/
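The two copies above can be expressed as one helper that also creates the target directory if it is missing. This is a sketch under assumptions: install_snappy_libs is a hypothetical name, and the native directory (Linux-amd64-64 in these notes) is passed in as an argument because it may be named differently on other platforms.

```shell
# Hypothetical helper: copy libsnappy* from a source lib directory ($1)
# into a native-library directory ($2), creating the target first.
# cp -P copies symbolic links as links instead of following them, so the
# libsnappy.so symlink keeps pointing at the versioned library file.
install_snappy_libs() {
    mkdir -p "$2" || return 1
    cp -P "$1"/libsnappy* "$2"/
}
```

For example: `install_snappy_libs /usr/local/lib "$HADOOP_HOME/lib/native/Linux-amd64-64"`, and the same again with `$HBASE_HOME`.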
# vi $HADOOP_HOME/conf/core-site.xml
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
# vi $HADOOP_HOME/conf/mapred-site.xml
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
<description>If the job outputs are compressed, how should they be compressed?
</description>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
<description>Should the outputs of the maps be compressed before being
sent across the network. Uses SequenceFile compression.
</description>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
<description>If the map outputs are compressed, how should they be
compressed?
</description>
</property>
<property>
<name>mapred.output.compress</name>
<value>true</value>
<description>Should the job outputs be compressed?
</description>
</property>
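To confirm that the properties above actually landed in the file, a small lookup helper can print the value of one property. This is a minimal sketch, not a real XML parser: hadoop_prop is a hypothetical name, and it assumes each <name> and <value> element sits on a single line, as in the fragments above.

```shell
# Hypothetical helper: print the <value> of property $2 in Hadoop config file $1.
# Assumes <name>...</name> and <value>...</value> are each on one line.
hadoop_prop() {
    awk -v key="$2" '
        # Remember that the requested property name has been seen.
        index($0, "<name>" key "</name>") { found = 1 }
        # Print the text between the first following <value> and </value>.
        found && match($0, /<value>.*<\/value>/) {
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}
```

For example, `hadoop_prop $HADOOP_HOME/conf/mapred-site.xml mapred.map.output.compression.codec` should print the codec class configured above, and prints nothing if the property is absent.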
Add $HADOOP_HOME/conf to the classpath in hbase-env.sh; HADOOP_HOME must be an absolute path:
# vi hbase-env.sh
export HADOOP_HOME=/data1/oHive/hadoop-0.20.2-cdh3u2
export HBASE_CLASSPATH=${HADOOP_HOME}/conf
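Since HADOOP_HOME has to be an absolute path, a guard like the following could be run before starting HBase. It is a sketch only; is_absolute_path is a hypothetical helper, and it checks just the leading slash, not whether the directory exists.

```shell
# Hypothetical helper: succeed only if the argument starts with "/".
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}
```

For example: `is_absolute_path "$HADOOP_HOME" || echo "HADOOP_HOME must be an absolute path" >&2`.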
Test:
# ./hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/aaa snappy
# ./hbase shell
hbase(main):003:0> create 't1', { NAME => 'cf1', COMPRESSION => 'SNAPPY' }
hbase(main):004:0> describe 't1'
DESCRIPTION ENABLED
{NAME => 't1', FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE true
=> '0', COMPRESSION => 'SNAPPY', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE =>
'65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.5370 seconds
hbase(main):014:0> put 't1', 'r1', 'cf1', '1234'
hbase(main):020:0> put 't1', 'r2', 'cf1', '1234'
hbase(main):016:0> get 't1', 'r1'
COLUMN CELL
cf1: timestamp=1347353707019, value=1234
1 row(s) in 0.0900 seconds
hbase(main):021:0> get 't1', 'r2'
COLUMN CELL
cf1: timestamp=1347353737455, value=1234
1 row(s) in 0.0870 seconds