About the row_prefix parameter in happybase

Copyright notice: this is an original article from this site, published by 萌叔. When reposting, please credit 萌叔 | http://vearne.cc

Background: when using happybase to access HBase, the scan function has the following signature:

    def scan(self, row_start=None, row_stop=None, row_prefix=None,
             columns=None, filter=None, timestamp=None,
             include_timestamp=False, batch_size=1000, scan_batching=None,
             limit=None, sorted_columns=False):

The scan function takes a row_prefix parameter, yet this parameter does not appear in the corresponding function of the Java client. What does it actually do?

Looking at the source code, we can see:

    if row_prefix is not None:
        if row_start is not None or row_stop is not None:
            raise TypeError(
                "'row_prefix' cannot be combined with 'row_start' "
                "or 'row_stop'")

        row_start = row_prefix
        row_stop = str_increment(row_prefix)

The code of str_increment itself:

    def str_increment(s):
        """Increment and truncate a byte string (for sorting purposes)

        This functions returns the shortest string that sorts after the given
        string when compared using regular string comparison semantics.

        This function increments the last byte that is smaller than ``0xFF``, and
        drops everything after it. If the string only contains ``0xFF`` bytes,
        `None` is returned.
        """
        for i in xrange(len(s) - 1, -1, -1):
            if s[i] != '\xff':
                return s[:i] + chr(ord(s[i]) + 1)

        return None

After reading the code, the behavior should be clear: row_prefix is converted into row_start and row_stop. Consider the following scenario ...
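As a rough illustration of the prefix-to-range conversion quoted in this excerpt, here is a minimal Python 3 sketch. It is not happybase's own code: the names `bytes_increment` and `prefix_to_range` and the sample row keys are hypothetical, and the list filter only stands in for the byte-wise key ordering that an HBase scan relies on.

```python
def bytes_increment(prefix: bytes):
    """Python 3 port of the quoted str_increment idea (hypothetical name).

    Bumps the last byte below 0xFF and drops everything after it.
    Returns None if every byte is 0xFF (no upper bound exists).
    """
    for i in range(len(prefix) - 1, -1, -1):
        if prefix[i] != 0xFF:
            return prefix[:i] + bytes([prefix[i] + 1])
    return None


def prefix_to_range(prefix: bytes):
    """Turn a row prefix into (row_start, row_stop), with row_stop exclusive."""
    return prefix, bytes_increment(prefix)


# Made-up row keys, already in byte-wise sorted order.
keys = [b"user_0", b"user_1", b"user_1\xff", b"user_10", b"user_2", b"uses"]

start, stop = prefix_to_range(b"user_1")

# Emulate a range scan: start is inclusive, stop is exclusive.
matched = [k for k in keys if k >= start and (stop is None or k < stop)]

print(start, stop)  # b'user_1' b'user_2'
print(matched)      # [b'user_1', b'user_1\xff', b'user_10']
```

The range [start, stop) selects the same keys as a plain "starts with the prefix" test, which is why a prefix scan can be expressed with only a start row and a stop row.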

January 2, 2018 · 1 min

Does happybase's put() use batching by default?

Copyright notice: this is an original article from this site, published by 萌叔. When reposting, please credit 萌叔 | http://vearne.cc

Background: a while ago, we changed the code that writes data to HBase through happybase from put() to batch(), only to find that performance did not improve. Reading the code, I found that put() is itself implemented with a batch insert.

table.py

    def put(self, row, data, timestamp=None, wal=True):
        """Store data in the table.

        This method stores the data in the `data` argument for the row
        specified by `row`. The `data` argument is dictionary that maps columns
        to values. Column names must include a family and qualifier part, e.g.
        `cf:col`, though the qualifier part may be the empty string, e.g.
        `cf:`.

        Note that, in many situations, :py:meth:`batch()` is a more appropriate
        method to manipulate data.

        .. versionadded:: 0.7
           `wal` argument

        :param str row: the row key
        :param dict data: the data to store
        :param int timestamp: timestamp (optional)
        :param wal bool: whether to write to the WAL (optional)
        """
        with self.batch(timestamp=timestamp, wal=wal) as batch:
            batch.put(row, data)  # clearly a batch operation

batch.py ...
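Since put() already wraps each row in its own batch, a plausible reading is that batch() only helps when a single batch is kept open across many rows. The sketch below shows that pattern under stated assumptions: `write_rows` is a hypothetical helper, `table` is assumed to be a happybase Table (or any object exposing the same `batch()` context manager), and the `batch_size` value is arbitrary.

```python
def write_rows(table, rows, batch_size=1000):
    """Write many rows through one shared batch (hypothetical helper).

    `table` is assumed to behave like a happybase Table.
    `rows` is an iterable of (row_key, data_dict) pairs.
    Returns the number of rows handed to the batch.
    """
    count = 0
    # One batch for the whole loop, instead of one implicit batch per put().
    with table.batch(batch_size=batch_size) as batch:
        for row_key, data in rows:
            batch.put(row_key, data)
            count += 1
    return count
```

With a real connection this would be called as something like `write_rows(connection.table('my_table'), rows)`, where the table name is a placeholder. Whether it actually improves throughput depends on the workload and is not something this sketch measures.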

January 1, 2018 · 2 min