萌叔

Speed Comparison of Symmetric and Asymmetric Encryption Algorithms

Copyright notice: this is an original article from this site, published by 萌叔. When reposting, please credit 萌叔 | http://vearne.cc

Test environment:
CPU: 1 core, Intel 2.2 GHz
Memory: 1 GB

Algorithms:
Symmetric encryption: AES, CBC mode
Asymmetric encryption: RSA 256

The plaintext is 160 bytes long, and each algorithm is run 10000 times. Here is the code.

test_aes.py

    from Crypto.Cipher import AES
    import time

    obj = AES.new('This is a key123', AES.MODE_CBC, 'This is an IV456')
    message = 'a' * 160
    t1 = time.time()
    for i in xrange(10000):
        ciphertext = obj.encrypt(message)
        obj2 = AES.new('This is a key123', AES.MODE_CBC, 'This is an IV456')
        text = obj2.decrypt(ciphertext)
        #print text
    t2 = time.time()
    print t2 - t1

test_rsa.py ...

Does happybase put() Use Batching by Default?

Background: a while ago we replaced the put() calls that write data to HBase through happybase with batch(), and found that performance did not improve. Reading the code, I found that put() is itself implemented as a batch insert.

table.py

    def put(self, row, data, timestamp=None, wal=True):
        """Store data in the table.

        This method stores the data in the `data` argument for the row
        specified by `row`. The `data` argument is dictionary that maps
        columns to values. Column names must include a family and qualifier
        part, e.g. `cf:col`, though the qualifier part may be the empty
        string, e.g. `cf:`.

        Note that, in many situations, :py:meth:`batch()` is a more
        appropriate method to manipulate data.

        .. versionadded:: 0.7
           `wal` argument

        :param str row: the row key
        :param dict data: the data to store
        :param int timestamp: timestamp (optional)
        :param wal bool: whether to write to the WAL (optional)
        """
        with self.batch(timestamp=timestamp, wal=wal) as batch:
            batch.put(row, data)  # clearly a batch operation

batch.py ...
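Why the switch brought no gain can be illustrated with a small stand-in. This is a sketch under assumptions: `ToyTable` and `ToyBatch` are hypothetical classes written for this note, not happybase's implementation, and `requests_sent` only counts simulated round trips. The sketch mirrors the structure of the put() shown above: each put() opens a batch that holds a single row, so a saving appears only once many rows share one batch.

```python
class ToyBatch:
    """Hypothetical stand-in for a batch: buffers rows, sends once on exit."""

    def __init__(self, table):
        self._table = table
        self._rows = []

    def put(self, row, data):
        # Only buffer the mutation; nothing is "sent" yet.
        self._rows.append((row, data))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None and self._rows:
            self._table.requests_sent += 1  # one simulated round trip
            self._table.rows_stored += len(self._rows)
        return False


class ToyTable:
    """Hypothetical table whose put() is a one-row batch, as in the excerpt."""

    def __init__(self):
        self.requests_sent = 0
        self.rows_stored = 0

    def batch(self):
        return ToyBatch(self)

    def put(self, row, data):
        with self.batch() as batch:
            batch.put(row, data)


def write_one_by_one(table, rows):
    # One put() per row: one batch, and one round trip, per row.
    for row, data in rows:
        table.put(row, data)


def write_single_batch(table, rows):
    # All rows share one batch: a single round trip in this toy model.
    with table.batch() as batch:
        for row, data in rows:
            batch.put(row, data)
```

In this toy model, writing 100 rows one by one costs 100 round trips, while putting the same rows into one shared batch costs one. Opening a new batch per row behaves like calling put().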

A Brief Look at How HTTPS Works

Background: on Thursday afternoon I am giving a talk on HTTPS at my company, so I am taking the PPT I have already written and sharing it on CSDN as well.

References:
加密与解密 (Encryption and Decryption), 电子工业出版社
HTTP Over TLS, RFC 2818
The Transport Layer Security (TLS) Protocol, RFC 5246
https://en.wikipedia.org/wiki/Transport_Layer_Security
大型网站的 HTTPS 实践 (HTTPS Practice for Large Websites) http://blog.jobbole.com/86660/

All of the references above are good, especially the last four; they are required reading if you want to understand HTTPS thoroughly.

What is HTTPS?

HTTPS (also called HTTP over TLS, HTTP over SSL and HTTP Secure) is a protocol for secure communication over a computer network which is widely used on the Internet. HTTPS consists of communication over Hypertext Transfer Protocol (HTTP) within a connection encrypted by Transport Layer Security or its predecessor, Secure Sockets Layer. ...
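The layering in that definition (plain HTTP carried inside a TLS-encrypted connection) can be sketched with Python's standard library. This is a minimal illustration, not material from the talk: `build_tls_context` and `https_get` are names chosen here, and `https_get` needs network access to run.

```python
import socket
import ssl


def build_tls_context():
    # The default context verifies the server certificate and its host name.
    return ssl.create_default_context()


def https_get(host, path="/", port=443, timeout=10):
    """Send a plain HTTP/1.1 GET inside a TLS connection (needs network)."""
    context = build_tls_context()
    request = (
        "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n".format(path, host)
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # The TLS handshake happens in wrap_socket; after it, the bytes
        # written below travel encrypted, but they are still ordinary HTTP.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(request)
            chunks = []
            while True:
                data = tls_sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

The request bytes are the same as they would be over plain HTTP; only the transport underneath changes.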

Redis Startup Warnings and How to Handle Them

Background: a Redis machine in production reported

    Can't save in background: fork: Cannot allocate memory

which caused the Redis service to stop, even though the machine had 64 GB of memory and Redis was using only a little over 40 GB.

As we all know, when persistence is enabled, both bgsave in RDB mode and rewriting appendonly.aof in AOF mode make Redis fork a child process. But isn't a process fork in the operating system supposed to be copy-on-write? This made me pay attention to the Redis startup log again. First, let's look at the log Redis prints at startup:

    1610:M 12 Sep 07:46:20.524 # Server started, Redis version 3.0.1
    1610:M 12 Sep 07:46:20.524 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
    1610:M 12 Sep 07:46:20.524 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
    1610:M 12 Sep 07:46:20.525 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
    1610:M 12 Sep 07:46:20.525 * The server is now ready to accept connections on port 6379
    1610:M 12 Sep 07:57:21.819 * Background saving started by pid 1615
    1615:C 12 Sep 07:57:21.827 * DB saved on disk
    1615:C 12 Sep 07:57:21.827 * RDB: 4 MB of memory used by copy-on-write
    1610:M 12 Sep 07:57:21.925 * Background saving terminated with success

As you can see, there are 3 warnings ...
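The three kernel settings named in the warnings can be inspected before Redis is started. The sketch below is an illustration under assumptions: the helper names are made up for this note, the file paths are the ones quoted in the log, and `root` is a parameter added only so the functions can be pointed at a test directory. It flags THP whenever the active mode is not `never`, which matches the fix the log suggests and is not necessarily the exact check Redis performs.

```python
import os


def read_setting(relative_path, root="/"):
    """Return the stripped file content, or None if the file cannot be read."""
    try:
        with open(os.path.join(root, relative_path)) as f:
            return f.read().strip()
    except OSError:
        return None


def active_thp_mode(raw):
    """THP files look like 'always madvise [never]'; return the bracketed word."""
    if raw is None:
        return None
    for word in raw.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return raw


def check_redis_host(root="/", tcp_backlog=511):
    """Return one message per setting that would trigger a startup warning."""
    problems = []

    if read_setting("proc/sys/vm/overcommit_memory", root) == "0":
        problems.append("vm.overcommit_memory is 0; background save may fail")

    thp = active_thp_mode(
        read_setting("sys/kernel/mm/transparent_hugepage/enabled", root)
    )
    if thp is not None and thp != "never":
        problems.append("transparent huge pages mode is '%s', not 'never'" % thp)

    somaxconn = read_setting("proc/sys/net/core/somaxconn", root)
    if somaxconn is not None and somaxconn.isdigit() and int(somaxconn) < tcp_backlog:
        problems.append(
            "somaxconn is %s, lower than the TCP backlog %d" % (somaxconn, tcp_backlog)
        )

    return problems
```

On the fork question: copy-on-write means the child rarely needs a second copy of the data, but with vm.overcommit_memory set to 0 the kernel may still refuse the fork of a very large process, which is the situation the first warning describes.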

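Better No Design Patterns Than Abused Ones

Preamble: this topic reminds me of a story I experienced first-hand and cannot help complaining about.

When it comes to design patterns, nobody uses them more than Java programmers, especially those who love to talk about abstraction and inheritance. There was once a case like this: business type B and business type C both generate orders, and B orders and C orders share some common fields, such as c1, c2, ...

A senior programmer at the company designed the classes like this:

    public class Common { }
    class B extends Common { }
    class C extends Common { }

At the database level, the data was then split into 3 tables:

    common table
    B table
    C table

A single table was forcibly split in two. The split makes sense only in theory: in practice it saves no storage space and makes the data very troublesome to use. Back then I was still a novice and raised no objection; looking back, it is really quite laughable.

PS: I feel that Java programmers (especially those who build business systems) have a rather narrow field of view. They really should learn some other languages and frameworks, and get to know denormalized design and NoSQL databases. The overly deep inheritance hierarchies (4 or 5 levels) in some Java code are truly detestable.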

The "Pitfalls" I Have Hit with Databases

Foreword: a while ago I gave an internal talk at my company summarizing some of the problems I have run into while using various databases. I am sharing it here as well. Note that I put "pitfalls" in quotation marks: some of them are merely characteristics of a particular database, or problems caused by our own mistakes, and do not necessarily mean that the database itself is at fault.

1. Business

1) Business scenarios

Unreasonable business design is always the greatest source of pain for programmers. In a system I maintain there is a scenario where a user downloads a full year or half a year of public-opinion data in one go. The data volume is large; a single task can reach several million records. For any system, pushing several million records through in a short time is not easy, especially when there are many such tasks. The time span has since been reduced to 3 months. This reminds me of how 12306 staggers the times at which train tickets go on sale. Optimizing from the business side always brings immediate results.

2) Field design

In one system I maintain, the same metric is stored under different field names in different tables, which caused us enormous pain. So I recommend using the same field name to express and store the same metric or thing; otherwise the conversions alone will be the death of you later on.

3) Denormalized table design

In big-data scenarios, do not let the normal forms of relational databases influence you too much. Where the data structure can be nested, make it nested rather than flat. Take a repost on Sina Weibo as an example. A repost contains:

the author of this post: user
the content of this post: text
the original post: retweeted_status
the content of the original post: retweeted_status.text
the author of the original post: retweeted_status.user
…

A single record contains the repost together with most of the content related to it, so no table join is needed in actual use, and it can conveniently be stored in a NoSQL database.

    {
      "created_at": "Tue May 31 17:46:55 +0800 2011",
      "id": 11488058246,
      "text": "求关注。",
      "source": "<a href=\"http://weibo.com\" rel=\"nofollow\">新浪微博</a>",
      "favorited": false,
      "truncated": false,
      "in_reply_to_status_id": "",
      "in_reply_to_user_id": "",
      "in_reply_to_screen_name": "",
      "geo": null,
      "mid": "5612814510546515491",
      "reposts_count": 8,
      "comments_count": 9,
      "annotations": [],
      "user": {
        "id": 1404376560,
        "screen_name": "zaku",
        "name": "zaku",
        "province": "11",
        "city": "5",
        "location": "北京朝阳区",
        "description": "人生五十年，乃如梦如幻；有生斯有死，壮士复何憾。",
        "url": "http://blog.sina.com.cn/zaku",
        "profile_image_url": "http://tp1.sinaimg.cn/1404376560/50/0/1",
        "domain": "zaku",
        "gender": "m",
        "followers_count": 1204,
        "friends_count": 447,
        "statuses_count": 2908,
        "favourites_count": 0,
        "created_at": "Fri Aug 28 00:00:00 +0800 2009",
        "following": false,
        "allow_all_act_msg": false,
        "remark": "",
        "geo_enabled": true,
        "verified": false,
        "allow_all_comment": true,
        "avatar_large": "http://tp1.sinaimg.cn/1404376560/180/0/1",
        "verified_reason": "",
        "follow_me": false,
        "online_status": 0,
        "bi_followers_count": 215
      },
      "retweeted_status": {
        "created_at": "Tue May 24 18:04:53 +0800 2011",
        "id": 11142488790,
        "text": "我的相机到了。",
        "source": "<a href=\"http://weibo.com\" rel=\"nofollow\">新浪微博</a>",
        "favorited": false,
        "truncated": false,
        "in_reply_to_status_id": "",
        "in_reply_to_user_id": "",
        "in_reply_to_screen_name": "",
        "geo": null,
        "mid": "5610221544300749636",
        "annotations": [],
        "reposts_count": 5,
        "comments_count": 8,
        "user": {
          "id": 1073880650,
          "screen_name": "檀木幻想",
          "name": "檀木幻想",
          "province": "11",
          "city": "5",
          "location": "北京朝阳区",
          "description": "请访问微博分析家。",
          "url": "http://www.weibo007.com/",
          "profile_image_url": "http://tp3.sinaimg.cn/1073880650/50/1285051202/1",
          "domain": "woodfantasy",
          "gender": "m",
          "followers_count": 723,
          "friends_count": 415,
          "statuses_count": 587,
          "favourites_count": 107,
          "created_at": "Sat Nov 14 00:00:00 +0800 2009",
          "following": true,
          "allow_all_act_msg": true,
          "remark": "",
          "geo_enabled": true,
          "verified": false,
          "allow_all_comment": true,
          "avatar_large": "http://tp3.sinaimg.cn/1073880650/180/1285051202/1",
          "verified_reason": "",
          "follow_me": true,
          "online_status": 0,
          "bi_followers_count": 199
        }
      }
    }

2. HBase

1) No indexes

The biggest problem with HBase is that you cannot create indexes. Two indirect ways to build an index ...
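Returning to the denormalized repost record in section 1: because the original post and both authors are embedded in the same document, reading them is plain nested access rather than a join. A minimal sketch, using a trimmed copy of the record shown earlier (only a few fields are kept, and `summarize_repost` is a name chosen here for illustration):

```python
import json

# Trimmed copy of the repost record; most fields are omitted for brevity.
RECORD = json.loads("""
{
  "id": 11488058246,
  "text": "求关注。",
  "user": {"id": 1404376560, "screen_name": "zaku"},
  "retweeted_status": {
    "id": 11142488790,
    "text": "我的相机到了。",
    "user": {"id": 1073880650, "screen_name": "檀木幻想"}
  }
}
""")


def summarize_repost(status):
    """Collect the repost and the original post from one nested document."""
    # An original (non-repost) status has no retweeted_status key.
    original = status.get("retweeted_status") or {}
    return {
        "author": status["user"]["screen_name"],
        "text": status["text"],
        "original_author": original.get("user", {}).get("screen_name"),
        "original_text": original.get("text"),
    }


summary = summarize_repost(RECORD)
```

In a normalized schema the same summary would need the status table joined to the user table twice and to itself once.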

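Rate Limiting in a Distributed Environment with Redis

Redis has built-in counters that support an atomic increment-by-1 operation, which makes it particularly well suited to rate limiting in a distributed environment.

    # coding:utf-8
    import time
    import threading
    from redis import StrictRedis


    class Counter(object):
        def __init__(self, redis_url):
            self.redis_client = StrictRedis.from_url(redis_url)

        def increment(self, key):
            t = int(time.time())
            sign = t // 60  # number of the current one-minute window
            redis_key = key + ':' + str(sign)
            counter = self.redis_client.incr(redis_key)
            # Note: setting the key's expiry does not need to be in the
            # same transaction as the atomic increment.
            self.redis_client.expire(redis_key, 300)  # expire the key after 300 seconds
            return counter


    if __name__ == '__main__':
        redis_url = 'redis://127.0.0.1:6379/0'
        c = Counter(redis_url)
        for i in range(100):
            time.sleep(0.2)
            x = c.increment('hello')
            if x > 50:
                print "over limit"
            print x

This approach has one precondition: the clocks of the machines in the distributed environment must be kept as closely synchronized as possible. In the example above, the rate limit is 50 requests per minute.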
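The windowing logic can be tried without a Redis server. The sketch below is a stand-in written under assumptions, not the code above: `InMemoryStore` is a hypothetical replacement for the Redis client that only imitates `incr`, `FixedWindowLimiter` is a name chosen here, and `now` is a parameter added so that time can be controlled in a test.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for Redis: incr() returns the new counter value."""

    def __init__(self):
        self._values = {}

    def incr(self, key):
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]


class FixedWindowLimiter:
    def __init__(self, store, limit=50, window_seconds=60):
        self.store = store
        self.limit = limit
        self.window_seconds = window_seconds

    def allow(self, key, now=None):
        """Count one request and report whether it is within the limit."""
        t = int(time.time() if now is None else now)
        # Integer division gives the window number; every request in the
        # same window increments the same counter key.
        window = t // self.window_seconds
        count = self.store.incr("{}:{}".format(key, window))
        return count <= self.limit
```

Because the counter restarts with every window, a client can send up to twice the limit in a short burst that straddles a window boundary; that is a known property of fixed-window counting.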

UTF8 encoding is longer than the max length 32766

Background: a colleague received the error below while inserting data into ES.

The mapping is as follows:

    {
      "test": {
        "mappings": {
          "test_ignore32766": {
            "properties": {
              "message": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          }
        }
      }
    }

The error:

    {
      "error": "RemoteTransportException[[Pietro Maximoff][inet[/10.1.1.51:9300]][indices:data/write/index]]; nested: IllegalArgumentException[Document contains at least one immense term in field=\"message\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[-28, -72, -83, -27, -101, -67, -25, -69, -113, -26, -75, -114, -26, -83, -93, -27, -100, -88, -25, -69, -113, -27, -114, -122, -26, -106, -80, -28, -72, -128]...', original message: bytes can be at most 32766 in length; got 69345]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 69345]; ",
      "status": 400
    }

The cause is this: the message field is set to not_analyzed, which means the field is not tokenized for indexing, but the field value itself is still indexed; in other words, it can be searched with a term query ...
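The numbers in the error can be checked by hand. The limit applies to the UTF-8 encoded bytes of the term, not to its character count, and the prefix in the message is printed as signed bytes. A small sketch (the helper names are chosen here for illustration):

```python
MAX_TERM_BYTES = 32766  # limit quoted in the error message

# Prefix of the immense term, copied from the error message.
PREFIX = [-28, -72, -83, -27, -101, -67, -25, -69, -113, -26, -75, -114,
          -26, -83, -93, -27, -100, -88, -25, -69, -113, -27, -114, -122,
          -26, -106, -80, -28, -72, -128]


def decode_signed_bytes(values):
    """Turn signed bytes (-128..127) back into UTF-8 text."""
    return bytes(v & 0xFF for v in values).decode("utf-8")


def utf8_length(text):
    """Length of the text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def exceeds_term_limit(text, limit=MAX_TERM_BYTES):
    return utf8_length(text) > limit


prefix_text = decode_signed_bytes(PREFIX)
```

Decoding the prefix shows that the field holds Chinese text. Common Chinese characters take 3 bytes each in UTF-8, so a value of about 10,922 such characters already reaches the limit, and the 69345 bytes reported here are far beyond it.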

Database Initialization in Tornado

Background: in web services built with Tornado we often need to access a database. What is the friendliest way to set up the database connection? We might typically write it like this:

db.py

    class DB(object):
        def __init__(self):
            self.mysql_db = MySQLDatabase(host=mysql_conf['host'],
                                          user=mysql_conf['username'],
                                          passwd=mysql_conf['password'],
                                          database='test')
            self.cache_redis = Redis(host=redis_conf['host'],
                                     port=redis_conf['port'],
                                     db=redis_conf['db'])

Then:

    import tornado.ioloop
    import tornado.web
    from db import DB

    db = DB()  # *** note this line ***

    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            db.mysql_db.excute()
            ...

    application = tornado.web.Application([
        (r"/", MainHandler),
    ])

    if __name__ == "__main__":
        print 'starting....'
        application.listen(8090)
        tornado.ioloop.IOLoop.instance().start()

This usually causes no problems: Tornado is based on the epoll model and runs in a single thread, handling the events it receives one at a time, so the database connection is never accessed concurrently and there is no thread-safety issue. However, exposing db as a module-level variable looks rather abrupt. A more cohesive approach might be this:

    from db import DB

    class MainHandler(tornado.web.RequestHandler):
        def initialize(self):
            self.db = DB()  # *** note this line ***

        def get(self):
            self.db.mysql_db.excute()
            ...

    application = tornado.web.Application([
        (r"/", MainHandler),
    ])

    if __name__ == "__main__":
        print 'starting....'
        application.listen(8090)
        tornado.ioloop.IOLoop.instance().start()

initialize(self) is a hook method reserved in RequestHandler for subclasses to override. It is used for initialization work such as setting up the database, and it is called exactly once, when the corresponding handler is initialized ...
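One point worth checking when DB() moves into initialize(): Tornado creates a new RequestHandler instance for every request, so initialize() runs once per request. The sketch below does not use Tornado; `ToyApplication`, `ToyHandler` and `FakeDB` are hypothetical stand-ins that only imitate that dispatch order. It compares constructing the object inside initialize() with passing a shared one in as an argument, which Tornado supports through the optional dict in a URL spec.

```python
class FakeDB:
    """Hypothetical stand-in for DB; counts how many times it is constructed."""
    instances = 0

    def __init__(self):
        FakeDB.instances += 1


class ToyHandler:
    """Imitates the hook order: the constructor calls initialize(**kwargs)."""

    def __init__(self, **kwargs):
        self.initialize(**kwargs)

    def initialize(self):
        pass


class PerRequestHandler(ToyHandler):
    def initialize(self):
        self.db = FakeDB()  # a new object every time a handler is created


class SharedHandler(ToyHandler):
    def initialize(self, db):
        self.db = db  # reuse the object that was passed in


class ToyApplication:
    """Creates one handler instance per simulated request."""

    def __init__(self, handler_class, init_kwargs=None):
        self.handler_class = handler_class
        self.init_kwargs = init_kwargs or {}

    def handle_request(self):
        return self.handler_class(**self.init_kwargs)
```

In this toy model, three requests through `PerRequestHandler` construct three `FakeDB` objects, while `SharedHandler` keeps using the single object it was given.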

Algorithm Problem (3)

Compute a person's next birthday, ignoring leap years and leap days.

    from datetime import datetime, timedelta

    def solve(birthday):
        now = datetime.now()
        now = datetime(now.year, now.month, now.day)  # today at midnight
        year = (now.year - birthday.year) + birthday.year
        x = datetime(year, birthday.month, birthday.day)  # birthday in the current year
        if x < now:
            # already passed this year, so the next one is next year
            x = datetime(x.year + 1, x.month, x.day)
            return x
        else:
            return x

    b1 = datetime(1985, 11, 1)
    print solve(b1)
    b2 = datetime(1982, 3, 15)
    print solve(b2)
    b3 = datetime(1985, 7, 15)
    print solve(b3)
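For testing, it helps to pass the current date in rather than read the clock inside the function. A Python 3 sketch of the same logic under the same assumption (leap days are ignored, so a 29 February birthday is not handled); the function name and the `today` parameter are additions made here:

```python
from datetime import date


def next_birthday(birthday, today=None):
    """Return the next occurrence of the birthday on or after `today`."""
    if today is None:
        today = date.today()
    # Try the birthday in the current year first.
    candidate = date(today.year, birthday.month, birthday.day)
    if candidate < today:
        # It has already passed this year, so move to next year.
        candidate = date(today.year + 1, birthday.month, birthday.day)
    return candidate
```

With `today` fixed at 1 June 2024, a 1 November birthday falls in 2024 and a 15 March birthday falls in 2025. A birthday that is today is returned as today, matching the original code.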