HBase

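Copyright notice: original article by 萌叔. When reposting, please credit 萌叔 | http://vearne.cc

Preface: A while ago I gave an internal talk at my company summarizing some of the problems I have run into with various databases, and I am sharing it here as well. Note that I put "pitfalls" in quotes on purpose. Some of these pitfalls are simply characteristics of a particular database, or problems we caused ourselves by doing the wrong thing. They do not necessarily mean the database itself is at fault.

1. Business

1) Business scenarios
Unreasonable business design is always the biggest source of pain for programmers. In one system I maintain, users want to download a full year (or half a year) of public-opinion data in one go. The volume is huge: a single task can reach several million records. Pushing millions of records through any system in a short time is not easy, especially when many such tasks run at once. The maximum time span has since been reduced to 3 months. This reminds me of how 12306 (China's railway ticketing site) staggers the release times of train tickets. Optimizing from the business side always brings immediate results.

2) Field design
In one system I maintain, the same metric is stored under different field names in different tables, which has caused us a great deal of pain. I recommend expressing and storing the same metric or entity under the same field name; otherwise the conversions alone will be a nightmare later on.

3) Denormalized table design
In big-data scenarios, do not let relational normal-form design influence you too much. If a data structure can be nested, nest it instead of flattening it. Take a Sina Weibo repost as an example. A single repost contains:
  the author of this post: user
  the content of this post: text
  the original post: retweeted_status
  the content of the original post: retweeted_status.text
  the author of the original post: retweeted_status.user
  …
One record holds the repost together with most of the content related to it, so no join is needed when it is used, and it can easily be stored in a NoSQL database:

{
  "created_at": "Tue May 31 17:46:55 +0800 2011",
  "id": 11488058246,
  "text": "求关注。",
  "source": "<a href=\"http://weibo.com\" rel=\"nofollow\">新浪微博</a>",
  "favorited": false,
  "truncated": false,
  "in_reply_to_status_id": "",
  "in_reply_to_user_id": "",
  "in_reply_to_screen_name": "",
  "geo": null,
  "mid": "5612814510546515491",
  "reposts_count": 8,
  "comments_count": 9,
  "annotations": [],
  "user": {
    "id": 1404376560,
    "screen_name": "zaku",
    "name": "zaku",
    "province": "11",
    "city": "5",
    "location": "北京朝阳区",
    "description": "人生五十年，乃如梦如幻；有生斯有死，壮士复何憾。",
    "url": "http://blog.sina.com.cn/zaku",
    "profile_image_url": "http://tp1.sinaimg.cn/1404376560/50/0/1",
    "domain": "zaku",
    "gender": "m",
    "followers_count": 1204,
    "friends_count": 447,
    "statuses_count": 2908,
    "favourites_count": 0,
    "created_at": "Fri Aug 28 00:00:00 +0800 2009",
    "following": false,
    "allow_all_act_msg": false,
    "remark": "",
    "geo_enabled": true,
    "verified": false,
    "allow_all_comment": true,
    "avatar_large": "http://tp1.sinaimg.cn/1404376560/180/0/1",
    "verified_reason": "",
    "follow_me": false,
    "online_status": 0,
    "bi_followers_count": 215
  },
  "retweeted_status": {
    "created_at": "Tue May 24 18:04:53 +0800 2011",
    "id": 11142488790,
    "text": "我的相机到了。",
    "source": "<a href=\"http://weibo.com\" rel=\"nofollow\">新浪微博</a>",
    "favorited": false,
    "truncated": false,
    "in_reply_to_status_id": "",
    "in_reply_to_user_id": "",
    "in_reply_to_screen_name": "",
    "geo": null,
    "mid": "5610221544300749636",
    "annotations": [],
    "reposts_count": 5,
    "comments_count": 8,
    "user": {
      "id": 1073880650,
      "screen_name": "檀木幻想",
      "name": "檀木幻想",
      "province": "11",
      "city": "5",
      "location": "北京朝阳区",
      "description": "请访问微博分析家。",
      "url": "http://www.weibo007.com/",
      "profile_image_url": "http://tp3.sinaimg.cn/1073880650/50/1285051202/1",
      "domain": "woodfantasy",
      "gender": "m",
      "followers_count": 723,
      "friends_count": 415,
      "statuses_count": 587,
      "favourites_count": 107,
      "created_at": "Sat Nov 14 00:00:00 +0800 2009",
      "following": true,
      "allow_all_act_msg": true,
      "remark": "",
      "geo_enabled": true,
      "verified": false,
      "allow_all_comment": true,
      "avatar_large": "http://tp3.sinaimg.cn/1073880650/180/1285051202/1",
      "verified_reason": "",
      "follow_me": true,
      "online_status": 0,
      "bi_followers_count": 199
    }
  }
}

2. HBase

1) No way to build indexes
HBase's biggest problem is that you cannot build secondary indexes on it: rows can only be located efficiently by row key. Two indirect ways to build indexes ...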
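Why the lack of indexes hurts can be seen in a toy model. HBase keeps rows sorted by row key, so a point get by key and a scan over a key prefix are cheap, while a lookup by any other column value has to examine every row. The in-memory sketch below illustrates only that cost difference, under stated assumptions: `SortedRowStore` and its methods are hypothetical names, not the HBase or happybase API, and the sketch does not reproduce the two workarounds mentioned above.

```python
from bisect import bisect_left, insort
from typing import Dict, Iterator, List, Optional, Tuple

Row = Dict[str, str]  # column name -> value


class SortedRowStore:
    """Toy model of an HBase table whose rows are kept sorted by row key.

    Hypothetical class for illustration only; not the HBase or happybase API.
    """

    def __init__(self) -> None:
        self._keys: List[str] = []       # row keys in sorted order
        self._rows: Dict[str, Row] = {}  # row key -> columns

    def put(self, key: str, row: Row) -> None:
        if key not in self._rows:
            insort(self._keys, key)      # keep the key list sorted on insert
        # Simplification: replaces the whole row (real HBase merges columns).
        self._rows[key] = row

    def get(self, key: str) -> Optional[Row]:
        # Point lookup by row key: the access path HBase is designed for.
        return self._rows.get(key)

    def scan_prefix(self, prefix: str) -> Iterator[Tuple[str, Row]]:
        # Keys sharing a prefix are contiguous once sorted, so binary-search to
        # the first candidate and stop at the first key that no longer matches.
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1

    def find_by_column(self, column: str, value: str) -> List[str]:
        # No secondary index on `column`: every row has to be examined.
        return [k for k in self._keys if self._rows[k].get(column) == value]


store = SortedRowStore()
store.put("u1#p1", {"city": "5"})
store.put("u2#p1", {"city": "5"})
store.put("u1#p2", {"city": "11"})
print([k for k, _ in store.scan_prefix("u1#")])  # ['u1#p1', 'u1#p2']
print(store.find_by_column("city", "5"))          # ['u1#p1', 'u2#p1']
```

The keys `u1#p1` and so on are made up; the point is only that `scan_prefix` touches the matching key range, while `find_by_column` walks every row in the table.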
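Going back to the denormalization advice: because the Weibo repost record above embeds the original post and both authors, a consumer can read everything it needs from one document with plain key lookups, with no second query playing the role of a join. A minimal Python sketch, assuming the record has already been parsed with `json.loads`; `summarize_repost` is a hypothetical helper, and the trimmed record keeps only a few fields from the sample.

```python
import json
from typing import Any, Dict

# Trimmed copy of the repost record above (most fields omitted for brevity).
RAW = """
{
  "id": 11488058246,
  "text": "求关注。",
  "user": {"id": 1404376560, "screen_name": "zaku"},
  "retweeted_status": {
    "id": 11142488790,
    "text": "我的相机到了。",
    "user": {"id": 1073880650, "screen_name": "檀木幻想"}
  }
}
"""


def summarize_repost(record: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the repost and its original post from one denormalized record.

    Every value comes from nested keys of the same document, so no extra
    lookup is needed. Hypothetical helper for illustration.
    """
    original = record.get("retweeted_status") or {}  # absent for non-reposts
    return {
        "repost_author": record["user"]["screen_name"],
        "repost_text": record["text"],
        "original_author": original.get("user", {}).get("screen_name"),
        "original_text": original.get("text"),
    }


summary = summarize_repost(json.loads(RAW))
print(summary["original_author"])  # 檀木幻想
```

For a plain (non-repost) record without `retweeted_status`, the original-post fields simply come back as `None`.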


Does happybase's put() use batching by default?

The database "pitfalls" I have stepped into