
HBase Filters

Date: 2023-01-31 14:00:00

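Contents

I. Introduction

1. HBase comparison operators

2. HBase filter comparators

II. Code

1. Creating the HBase table

2. Inserting data

3. Adding dependencies

4. Value filter

5. Single-column value filter

6. Single-column value exclude filter

7. Row key filter

8. Row key prefix filter: PrefixFilter

9. Column family filter

10. Column qualifier filter

11. Combining filters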


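I. Introduction

  • Value filter
  • SingleColumnValueFilter: single-column value filter
  • SingleColumnValueExcludeFilter: single-column value exclude filter
  • Row key filter
  • Row key prefix filter: PrefixFilter
  • Column family filter
  • Column qualifier filter
  • PageFilter: paging filter
  • Paging filter, improved version
  • Combining multiple filters

So far, data in an HBase table has been queried with get and scan. A get returns only a single row. A scan can return a range of rows, but that range is defined solely by row keys; unlike MySQL or Hive, it cannot select rows by the values stored in their columns.

Filters can select data by column family, column, version, and other criteria. Because HBase keeps data sorted in three dimensions (row key, column, version), filters can do this work efficiently. The filter travels with the RPC query request to every RegionServer and is evaluated there (it is a server-side filter), which reduces the amount of data sent over the network. A comparison filter needs at least two parameters: an abstract comparison operator and a comparator.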

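1. HBase comparison operators

The operators are the constants of CompareFilter.CompareOp and are referenced as, for example, CompareFilter.CompareOp.LESS_OR_EQUAL:

LESS                <
LESS_OR_EQUAL       <=
EQUAL               =
NOT_EQUAL           <>
GREATER_OR_EQUAL    >=
GREATER             >
NO_OP               excludes everything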

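2. HBase filter comparators

  • BinaryComparator: compares the given byte array in lexicographic byte order, using Bytes.compareTo(byte[], byte[]).
  • BinaryPrefixComparator: same as above, but compares only the leading bytes, that is, whether the value starts with the given bytes.
  • NullComparator: checks whether the given value is null.
  • BitComparator: bitwise comparison; its BitwiseOp enum provides XOR, AND, and OR.
  • RegexStringComparator: regular-expression comparator; supports only EQUAL and NOT_EQUAL.
  • SubstringComparator: checks whether the given substring occurs in the value stored in the table.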
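To make the interplay of operator and comparator concrete, here is a minimal plain-Java sketch. It does not use the HBase client: the class ComparatorSketch, its Op enum, and its methods are hypothetical stand-ins that only imitate the byte-wise ordering described above, assuming ASCII data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Plain-Java sketch of comparator and operator semantics; hypothetical, not HBase code. */
final class ComparatorSketch {

    /** Hypothetical stand-in for CompareFilter.CompareOp (NO_OP left out). */
    enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    /** Lexicographic comparison of unsigned bytes, in the spirit of BinaryComparator. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter array sorts first
        return a.length - b.length;
    }

    /** Compares only the leading bytes of value, in the spirit of BinaryPrefixComparator. */
    static int comparePrefix(byte[] value, byte[] prefix) {
        byte[] head = Arrays.copyOf(value, Math.min(value.length, prefix.length));
        return compareBytes(head, prefix);
    }

    /** True when "value OP reference" holds, given cmp = compare(value, reference). */
    static boolean matches(Op op, int cmp) {
        switch (op) {
            case LESS: return cmp < 0;
            case LESS_OR_EQUAL: return cmp <= 0;
            case EQUAL: return cmp == 0;
            case NOT_EQUAL: return cmp != 0;
            case GREATER_OR_EQUAL: return cmp >= 0;
            case GREATER: return cmp > 0;
            default: throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // '5' sorts after '3', so "5000" > "30000" byte-wise although 5000 < 30000 numerically
        System.out.println(matches(Op.GREATER, compareBytes(utf8("5000"), utf8("30000")))); // true
        // Prefix comparison: "1001" starts with "100", "1201" does not
        System.out.println(matches(Op.EQUAL, comparePrefix(utf8("1001"), utf8("100")))); // true
        System.out.println(matches(Op.EQUAL, comparePrefix(utf8("1201"), utf8("100")))); // false
    }
}
```

The first line of output is the point to remember: numbers stored as strings are ordered by their bytes, not by their numeric value.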

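Comparison filters take an operator and a comparator, and can match on the row key, column family, column qualifier, or cell value:

  • Value filter: ValueFilter
  • Column qualifier filter: QualifierFilter
  • Column family filter: FamilyFilter
  • Row key filter: RowFilter

Dedicated filters each serve one specific purpose:

  • Single-column value filter: SingleColumnValueFilter
  • Single-column value exclude filter: SingleColumnValueExcludeFilter
  • Row key prefix filter: PrefixFilter
  • Paging filter: PageFilter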

II. Code

1. Creating the HBase table

hbase(main):002:0> create 'emp2','info'
0 row(s) in 1.7460 seconds

=> Hbase::Table - emp2

2. Inserting data

put 'emp2','1001','info:name','zhangsan'
put 'emp2','1001','info:job','preader'
put 'emp2','1001','info:salary','35000'
put 'emp2','1001','info:deptName','TP'

put 'emp2','1002','info:name','lisi'
put 'emp2','1002','info:job','preader'
put 'emp2','1002','info:salary','35000'
put 'emp2','1002','info:deptName','AC'

# Row key lost in the source; 1201 is assumed from the 1202-1206 sequence
put 'emp2','1201','info:name','gopal'
put 'emp2','1201','info:job','manager'
put 'emp2','1201','info:salary','50000'
put 'emp2','1201','info:deptName','TP'

put 'emp2','1202','info:name','manisha'
put 'emp2','1202','info:job','preader'
put 'emp2','1202','info:salary','50000'
put 'emp2','1202','info:deptName','TP'

put 'emp2','1203','info:name','kalil'
put 'emp2','1203','info:job','phpdev'
put 'emp2','1203','info:salary','30000'
put 'emp2','1203','info:deptName','AC'

put 'emp2','1204','info:name','prasanth'
put 'emp2','1204','info:job','phpdev'
put 'emp2','1204','info:salary','30000'
put 'emp2','1204','info:deptName','AC'

put 'emp2','1205','info:name','kranthi'
put 'emp2','1205','info:job','admin'
put 'emp2','1205','info:salary','20000'
put 'emp2','1205','info:deptName','TP'

put 'emp2','1206','info:name','satishp'
put 'emp2','1206','info:job','grpdes'
put 'emp2','1206','info:salary','20000'
put 'emp2','1206','info:deptName','GR'

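3. Adding dependencies

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
</dependency>

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.4.13</version>
</dependency>

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.4.13</version>
</dependency>

The test classes below also use the JUnit 4 annotations org.junit.Test, org.junit.Before, and org.junit.After, so JUnit 4 must be on the classpath as well.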

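4. Value filter

A value filter tests only the values stored in cells. A cell that satisfies the condition formed by the comparison operator and the comparator is kept; any other cell is dropped, so it shows up as null when the row is printed.

The condition in this example is meant to be salary > 30000, yet the code never names the salary column. A value filter therefore compares the value of every cell in every column against the condition, keeps the cells that satisfy it, and filters out the rest.

The name, job, and deptName values still appear in the output because the comparator works on raw byte arrays: those values start with letters, which sort after the digit 3, so they also count as "greater than 30000". The row key (printed as id) is not a cell value and is never tested by this filter.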
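The cell-by-cell behaviour can be imitated without a cluster. The following is a rough plain-Java simulation, not HBase code: ValueFilterSketch and its methods are hypothetical names, and String.compareTo stands in for the byte-wise comparison, which gives the same order for ASCII data such as row 1205 of the sample table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java simulation of cell-level value filtering; hypothetical, not HBase code. */
final class ValueFilterSketch {

    /** Keeps the cells whose value sorts after reference (ASCII assumed, so char order equals byte order). */
    static Map<String, String> keepGreater(Map<String, String> cells, String reference) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            if (cell.getValue().compareTo(reference) > 0) {
                kept.put(cell.getKey(), cell.getValue());
            }
        }
        return kept;
    }

    /** Row 1205 of the sample data: kranthi, admin, 20000, TP. */
    static Map<String, String> row1205() {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("info:name", "kranthi");
        cells.put("info:job", "admin");
        cells.put("info:salary", "20000");
        cells.put("info:deptName", "TP");
        return cells;
    }

    public static void main(String[] args) {
        // Only the salary cell is dropped; the text columns all sort after "30000"
        System.out.println(keepGreater(row1205(), "30000"));
        // prints {info:name=kranthi, info:job=admin, info:deptName=TP}
    }
}
```

In the real scan the row would still be returned, with salary printed as null, because three of its four cells pass the filter.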

package com.lenovo.Filter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

public class ValueFilterDemo {
    // Fields default to null and are assigned in createConnection()
    Connection connection;
    Admin admin;
    TableName tableName;
    Table table;

    /**
     * @Date 2022.04.26
     * @Description Open the connection and get the table object
     */
    @Before
    public void createConnection() {
        // Build the client configuration; replace the placeholder with the ZooKeeper quorum address
        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", "IP address");
        try {
            // Open the connection
            connection = ConnectionFactory.createConnection(configuration);
            // Get the admin object
            admin = connection.getAdmin();
            // valueOf is a static method, so call it on the TableName class
            tableName = TableName.valueOf("emp2");
            // Get the table object
            table = connection.getTable(tableName);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * @Date 2022.04.26
     * @Description Value filter
     * Condition: cell value > "30000" (byte-wise)
     */
    @Test
    public void test() {
        // Comparator holding the reference value
        BinaryComparator binaryComparator = new BinaryComparator(Bytes.toBytes("30000"));
        // Filter built from the operator and the comparator
        ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.GREATER, binaryComparator);
        print(valueFilter);
    }

    // Runs a scan with the given filter and prints every returned row
    private void print(Filter filter) {
        Scan scan = new Scan();
        scan.setFilter(filter);
        // try-with-resources closes the scanner when the loop finishes
        try (ResultScanner scanner = table.getScanner(scan)) {
            // A scan returns multiple rows; handle them one at a time
            for (Result result : scanner) {
                // getRow returns the row key as a byte array
                String rowKey = Bytes.toString(result.getRow());
                // getValue returns the cell identified by column family and qualifier,
                // or null if the filter dropped that cell
                String name = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
                String job = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("job")));
                String salary = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("salary")));
                String deptName = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("deptName")));
                System.out.println("id:" + rowKey + ",name:" + name + ",job:" + job + ",salary:" + salary + ",deptName:" + deptName);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @After
    public void close() {
        try {
            table.close();
            admin.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

        try {
            connection.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


5. Single-column value filter

    // This test method goes into the ValueFilterDemo class from section 4,
    // which supplies the connection setup, the print helper, and the teardown.
    /**
     * @Date 2022.04.26
     * @Description Single-column value filter
     * Filters on one specified column.
     * If the cell in that column satisfies the condition, the whole row is returned.
     * If it does not, the whole row is filtered out.
     * Rows that do not contain the column at all are returned by default,
     * with that column printed as null.
     * Condition: salary > 30000
     */
    @Test
    public void singleColumnValueFilterTest() {
        SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter(
                Bytes.toBytes("info"),
                Bytes.toBytes("salary"),
                CompareFilter.CompareOp.GREATER,
                Bytes.toBytes("30000")
        );
        print(singleColumnValueFilter);
    }
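The note about rows that lack the tested column is easy to overlook. The sketch below imitates the row-level decision in plain Java, under stated assumptions: SingleColumnSketch, its methods, and row 9999 are hypothetical; ASCII data is assumed so that String.compareTo matches byte order; and the filterIfMissing flag is modelled on SingleColumnValueFilter.setFilterIfMissing(boolean), which defaults to false in the HBase client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation of row-level filtering on one column; hypothetical, not HBase code. */
final class SingleColumnSketch {

    /** Returns the keys of the rows that pass "column > reference" (ASCII assumed). */
    static List<String> rowsGreaterThan(Map<String, Map<String, String>> rows,
                                        String column, String reference,
                                        boolean filterIfMissing) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            // A row without the column passes unless filterIfMissing is set
            boolean keep = (value == null) ? !filterIfMissing : value.compareTo(reference) > 0;
            if (keep) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    /** Builds one row from alternating column/value arguments. */
    static Map<String, String> row(String... columnsAndValues) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i + 1 < columnsAndValues.length; i += 2) {
            cells.put(columnsAndValues[i], columnsAndValues[i + 1]);
        }
        return cells;
    }

    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("1001", row("name", "zhangsan", "salary", "35000"));
        rows.put("1203", row("name", "kalil", "salary", "30000"));
        rows.put("9999", row("name", "nosalary")); // hypothetical row without a salary column
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(rowsGreaterThan(sampleRows(), "salary", "30000", false)); // [1001, 9999]
        System.out.println(rowsGreaterThan(sampleRows(), "salary", "30000", true));  // [1001]
    }
}
```

With the real filter, calling setFilterIfMissing(true) before the scan should give the second behaviour, so that rows without a salary cell no longer slip through.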


6. Single-column value exclude filter

    // This test method goes into the ValueFilterDemo class from section 4,
    // which supplies the connection setup, the print helper, and the teardown.
    /**
     * @Date 2022.04.26
     * @Description Single-column value exclude filter
     * A value filter visits every row and keeps the matching cells, whereas a
     * single-column value filter returns only the rows that satisfy the condition.
     * The exclude variant selects rows the same way but leaves the tested column
     * (salary) out of the returned rows, so it prints as null.
     * Condition: salary starts with 30000
     */
    @Test
    public void singleColumnValueExcludeFilterTest() {
        BinaryPrefixComparator binaryPrefixComparator = new BinaryPrefixComparator(Bytes.toBytes("30000"));
        // The original listing created a SingleColumnValueFilter here; the exclude filter is the one this section is about
        SingleColumnValueExcludeFilter singleColumnValueExcludeFilter = new SingleColumnValueExcludeFilter(
                Bytes.toBytes("info"),
                Bytes.toBytes("salary"),
                CompareFilter.CompareOp.EQUAL,
                binaryPrefixComparator
        );
        print(singleColumnValueExcludeFilter);
    }


7. Row key filter

    // This test method goes into the ValueFilterDemo class from section 4,
    // which supplies the connection setup, the print helper, and the teardown.
    /**
     * @Date 2022.04.26
     * @Description Row key filter
     * A RowFilter combined with a prefix comparator.
     * Keeps the rows whose row key (id) starts with 100.
     */
    @Test
    public void rowKeyFilter() {
        BinaryPrefixComparator binaryPrefixComparator = new BinaryPrefixComparator(Bytes.toBytes("100"));
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, binaryPrefixComparator);
        print(rowFilter);
    }


8. Row key prefix filter: PrefixFilter

    // This test method goes into the ValueFilterDemo class from section 4,
    // which supplies the connection setup, the print helper, and the teardown.
    /**
     * @Date 2022.04.26
     * @Description Row key prefix filter
     * Keeps the rows whose row key (id) starts with 100.
     * A RowFilter with a prefix comparator returns the same rows as this PrefixFilter.
     */
    @Test
    public void prefixFilterTest() {
        PrefixFilter prefixFilter = new PrefixFilter(Bytes.toBytes("100"));
        print(prefixFilter);
    }
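Both variants above are filters, so as far as I know the scan still begins at the first row of the table and the filter only decides what to keep. A common complement is to bound the scan by row key as well: start at the prefix and stop at the smallest key that can no longer carry it. The plain-Java sketch below shows one way to compute that stop key. PrefixRangeSketch and stopRowForPrefix are hypothetical names; in HBase 1.x the result would typically be passed to Scan.setStartRow and Scan.setStopRow, and Scan.setRowPrefixFilter appears to perform a similar calculation internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Plain-Java sketch of turning a row key prefix into a scan range; hypothetical, not HBase code. */
final class PrefixRangeSketch {

    /**
     * Returns the smallest key that sorts after every key starting with prefix.
     * An empty array means there is no upper bound.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // bump the last remaining byte by one
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "100".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(start);
        // Rows 1001 and 1002 fall inside ["100", "101"); row 1201 does not
        System.out.println(new String(stop, StandardCharsets.US_ASCII)); // 101
    }
}
```

With the sample data, a scan limited to the range from 100 (inclusive) to 101 (exclusive) would touch only rows 1001 and 1002.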


9. Column family filter

    /**
     * @Date 2022.04.26
     * @Description Column family filter
     * Matches column family names against a regular expression.
     */
    @Test
    public void familyFilterTest() {
        RegexStringComparator regexStringComparator = new RegexStringComparator("i[a-zA-Z]");
        FamilyFilter familyFilter = new FamilyFilter(CompareFilter.CompareOp.EQUAL, regexStringComparator);
        print(familyFilter);
    }


10. Column qualifier filter

    /**
     * @Date 2022.04.26
     * @Description Column qualifier filter
     * Keeps the columns whose qualifier contains the given substring.
     * All other columns are printed as null.
     */
    @Test
    public void substringFilterTest() {
        SubstringComparator substringComparator = new SubstringComparator("me");
        QualifierFilter qualifierFilter = new QualifierFilter(CompareFilter.CompareOp.EQUAL, substringComparator);
        print(qualifierFilter);
    }


11. Combining filters

    @Test
    public void manyColumnFilter() {
        // Keep the columns whose qualifier contains "a"
        SubstringComparator substringComparator = new SubstringComparator("a");
        QualifierFilter qualifierFilter = new QualifierFilter(CompareFilter.CompareOp.EQUAL, substringComparator);
        // Keep the rows with salary > 30000
        BinaryComparator binaryComparator = new BinaryComparator(Bytes.toBytes("30000"));
        SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter(Bytes.toBytes("info"), Bytes.toBytes("salary"), CompareFilter.CompareOp.GREATER, binaryComparator);
        // A FilterList groups several filters so they are applied together;
        // with the no-argument constructor every filter must pass (MUST_PASS_ALL)
        FilterList filterList = new FilterList();
        filterList.addFilter(qualifierFilter);
        filterList.addFilter(singleColumnValueFilter);
        print(filterList);
    }
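To reason about what this combination returns, it can help to imitate it on in-memory data. The following plain-Java sketch is only a simulation of the intended combined effect when both filters must pass; FilterListSketch and its methods are hypothetical, ASCII data is assumed, and the real FilterList evaluates the filters cell by cell on the RegionServer rather than in two passes as done here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Plain-Java simulation of the two combined filters; hypothetical, not HBase code. */
final class FilterListSketch {

    /** Row-level test standing in for the single-column value filter (a missing salary passes). */
    static boolean salaryGreater(Map<String, String> cells, String reference) {
        String salary = cells.get("salary");
        return salary == null || salary.compareTo(reference) > 0;
    }

    /** Cell-level test standing in for the qualifier filter: case-insensitive substring match. */
    static Map<String, String> keepQualifiers(Map<String, String> cells, String substring) {
        Map<String, String> kept = new LinkedHashMap<>();
        String needle = substring.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            if (cell.getKey().toLowerCase(Locale.ROOT).contains(needle)) {
                kept.put(cell.getKey(), cell.getValue());
            }
        }
        return kept;
    }

    /** Both conditions must hold; a rejected row is returned as an empty map. */
    static Map<String, String> passAll(Map<String, String> cells) {
        if (!salaryGreater(cells, "30000")) {
            return new LinkedHashMap<>();
        }
        return keepQualifiers(cells, "a");
    }

    static Map<String, String> row(String name, String job, String salary, String deptName) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("name", name);
        cells.put("job", job);
        cells.put("salary", salary);
        cells.put("deptName", deptName);
        return cells;
    }

    public static void main(String[] args) {
        // Row 1001 passes the salary test; job is dropped because "job" contains no "a"
        System.out.println(passAll(row("zhangsan", "preader", "35000", "TP")));
        // prints {name=zhangsan, salary=35000, deptName=TP}
        // Row 1205 fails the salary test and is rejected as a whole
        System.out.println(passAll(row("kranthi", "admin", "20000", "TP")));
        // prints {}
    }
}
```

If rows should be kept when either filter passes, the HBase FilterList can instead be constructed with FilterList.Operator.MUST_PASS_ONE.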

