Programming Languages | 萌叔

tornado database initialization

Copyright notice: original articles on this site are published by 萌叔. When reposting, please credit 萌叔 | http://vearne.cc

Background: In a web service built with tornado, we often need to access a database. What is the friendliest way to set up the database connections? A typical way to write it might look like this:

db.py

    class DB(object):
        def __init__(self):
            self.mysql_db = MySQLDatabase(host=mysql_conf['host'],
                                          user=mysql_conf['username'],
                                          passwd=mysql_conf['password'],
                                          database='test')
            self.cache_redis = Redis(host=redis_conf['host'],
                                     port=redis_conf['port'],
                                     db=redis_conf['db'])

Then:

    import tornado.ioloop
    import tornado.web
    from db import DB

    db = DB()  # *** note this line ***

    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            db.mysql_db.execute()
            ...

    application = tornado.web.Application([
        (r"/", MainHandler),
    ])

    if __name__ == "__main__":
        print('starting....')
        application.listen(8090)
        tornado.ioloop.IOLoop.instance().start()

This usually causes no problems. tornado is built on the epoll model and runs in a single thread, handling the events it receives one at a time, so the database connection is never accessed concurrently and there is no thread-safety issue. Still, exposing db as a module-level variable feels out of place. A more cohesive approach might look like this:

    from db import DB

    class MainHandler(tornado.web.RequestHandler):
        def initialize(self):
            self.db = DB()  # *** note this line ***

        def get(self):
            self.db.mysql_db.execute()
            ...

    application = tornado.web.Application([
        (r"/", MainHandler),
    ])

    if __name__ == "__main__":
        print('starting....')
        application.listen(8090)
        tornado.ioloop.IOLoop.instance().start()

initialize(self) is a hook that RequestHandler reserves for subclasses to override. It is meant for initialization work such as database setup, and it is called exactly once, when the corresponding handler is initialized ...

Algorithm problem (3)

Compute a person's next birthday, ignoring leap years and leap days.

    from datetime import datetime

    def solve(birthday):
        now = datetime.now()
        # truncate to midnight so that a birthday falling today still counts
        now = datetime(now.year, now.month, now.day)
        x = datetime(now.year, birthday.month, birthday.day)
        if x < now:
            # this year's birthday has already passed, so use next year's
            x = datetime(x.year + 1, x.month, x.day)
        return x

    b1 = datetime(1985, 11, 1)
    print(solve(b1))
    b2 = datetime(1982, 3, 15)
    print(solve(b2))
    b3 = datetime(1985, 7, 15)
    print(solve(b3))
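The version above reads the clock and deliberately ignores leap days. As a minimal sketch only, here is a variant that takes today's date as a parameter so the result is reproducible. The function name next_birthday and the choice to fall back to Feb 28 for a Feb 29 birthday are assumptions made for this example, not part of the original post.

```python
from datetime import date

def next_birthday(birthday, today):
    """Return the first birthday date that is on or after `today`."""
    def birthday_in(year):
        try:
            return date(year, birthday.month, birthday.day)
        except ValueError:
            # Only reachable for a Feb 29 birthday in a non-leap year.
            # Assumption for this sketch: celebrate on Feb 28 instead.
            return date(year, 2, 28)

    candidate = birthday_in(today.year)
    if candidate < today:
        # This year's birthday has already passed.
        candidate = birthday_in(today.year + 1)
    return candidate

print(next_birthday(date(1985, 11, 1), date(2024, 6, 1)))   # 2024-11-01
print(next_birthday(date(1982, 3, 15), date(2024, 6, 1)))   # 2025-03-15
print(next_birthday(date(2000, 2, 29), date(2023, 1, 10)))  # 2023-02-28
```

Passing `today` in explicitly keeps the logic identical to the original while making it easy to check by hand.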

Algorithm problem (4)

Problem: There are ten balls in a bag, numbered 0 to n - 1 (so n = 10). Draw any 2 of them and list all possible outcomes.

    def choice2(n):
        ll = []
        for i in range(n):
            # start at i + 1 so each unordered pair appears exactly once
            for j in range(i + 1, n):
                ll.append((i, j))
        return ll

    res = choice2(10)
    print(len(res))
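For comparison, the same pairs can be produced with the standard library. This is just a sketch; the name choice2_itertools is made up for this example. The count should equal n(n-1)/2, which is 45 for n = 10.

```python
from itertools import combinations

def choice2_itertools(n):
    """All unordered pairs (i, j) with 0 <= i < j < n."""
    return list(combinations(range(n), 2))

pairs = choice2_itertools(10)
print(len(pairs))  # 45
print(pairs[:3])   # [(0, 1), (0, 2), (0, 3)]
```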

Algorithm problem (5)

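Problem: Given a line from 0 to 10000 and a segment L = (x, y), find all segments that contain L.

The segments were shown in a figure in the original post. Suppose (x, y) is (2, 4). The combinations that cover L are then:

(0, 4) (0, 5) (0, 6)
(1, 4) (1, 5) (1, 6)
(2, 4) (2, 5) (2, 6)

(The listed pairs stop at 6, which suggests the figure used a line shorter than 0 to 10000.)

Approach: Looking at the segments, every combination (t1, t2) that covers L must have its left endpoint t1 satisfy 0 <= t1 <= x and its right endpoint t2 satisfy y <= t2 <= 10000.

    def find_segment(x, y):
        res_list = []
        for t1 in range(0, x + 1):
            for t2 in range(y, 10000 + 1):
                res_list.append((t1, t2))
        return res_list

    print(find_segment(2, 4))

What if we want all overlapping segments instead?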
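One possible answer to the closing question, as a sketch under stated assumptions: endpoints are integers, a candidate segment has t1 < t2, and touching at a single endpoint counts as overlapping. Under those assumptions, (t1, t2) overlaps (x, y) exactly when t1 <= y and t2 >= x. The function name find_overlapping and the upper parameter are introduced here for illustration only.

```python
def find_overlapping(x, y, upper=10000):
    """All integer segments (t1, t2) on [0, upper] that overlap (x, y).

    Assumptions: t1 < t2, and sharing one endpoint counts as overlap.
    """
    res = []
    for t1 in range(0, y + 1):                       # left end must not start after y
        for t2 in range(max(t1 + 1, x), upper + 1):  # right end must reach x
            res.append((t1, t2))
    return res

# A short line (0 to 6) keeps the output small enough to inspect.
segments = find_overlapping(2, 4, upper=6)
print(len(segments))  # 19
print(segments[:5])   # [(0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
```

Every segment that contains L also overlaps it, so the nine covering pairs from the example appear in this result as well.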

An unconventional use of the requests library (stream)

Background: A colleague asked me to crawl a batch of URLs and extract the text of the <title> tag for each one. Setting aside the question of sites blocking the crawler, this is not a particularly troublesome task. When I actually ran it, though, the traffic went very high, over 10Mb/s. After some investigation, it turned out that many of the URLs were really download links that trigger a file download, and whatever they return contains no <title> tag at all. The handling logic is then clear: fetch the headers first, read the Content-Type, and check whether it is text/html. If it is not, there is no need to read the body of that Response. I looked up the relevant requests documentation:

By default, when you make a request, the body of the response is downloaded immediately. You can override this behaviour and defer downloading the response body until you access the Response.content attribute with the stream parameter:

    tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
    r = requests.get(tarball_url, stream=True)

At this point only the response headers have been downloaded and the connection remains open, hence allowing us to make content retrieval conditional: ...
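The logic described above (headers first, body only for text/html) might be sketched as follows. This is an illustration rather than the author's code: the function name fetch_title, the regular expression, and the timeout value are assumptions. The `http` argument is expected to be the requests module or a requests.Session(), and only get(..., stream=True), .headers, .text and .close() are used.

```python
import re

# Simplistic title matcher, good enough for a sketch.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def fetch_title(http, url, timeout=10):
    """Return the page title, or None when the URL is not an HTML page.

    `http` is anything with a requests-style get(), for example the
    requests module itself or a requests.Session().
    """
    r = http.get(url, stream=True, timeout=timeout)
    try:
        content_type = r.headers.get("Content-Type", "")
        if not content_type.lower().startswith("text/html"):
            return None  # the body is never read for non-HTML responses
        html = r.text    # accessing .text is what reads the body
    finally:
        r.close()        # release the connection in every case
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None
```

With requests installed, a call would look like `fetch_title(requests, url)`. Passing the client in as a parameter also makes the function easy to exercise with a stub.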

Algorithm problem (6)

Problem: Given a matrix

    matrix = [
        ['A', 'P', 'H', 'S'],
        ['U', 'L', 'O', 'A'],
        ['O', 'M', 'L', 'K'],
        ['F', 'B', 'I', 'R'],
    ]

look up multiple strings in it. The number of strings may be large.

    ["LFK", "HMM", "RPOOI"]

The string "LFK" consists of 3 characters, 'L', 'F' and 'K', whose positions in the matrix are (1,1), (3,0) and (2,3) respectively. For these three strings, return:

LFK: True, [(1,1),(3,0),(2,3)]
HMM: False, [] (each character in the matrix can be used only once, and there is only one M in the matrix, so HMM cannot be found)
RPOOI: True, [(3,3),(0,1),(1,2),(2,0),(3,2)]

Analysis: The matrix holds a limited set of characters (all uppercase letters), and each character can be used only once. Because there are many strings to look up, we build an index for the lookup. The final code:

    # encoding=utf-8
    from collections import defaultdict

    def get_position(matrix, str_list):
        # build the index: character -> list of positions
        dd = defaultdict(list)
        for i in range(len(matrix)):
            for j in range(len(matrix[i])):
                ch = matrix[i][j]
                dd[ch].append((i, j))
        # process each string
        res = []
        for ss in str_list:
            print(ss)
            # how many times each character has been used for this string
            curr_dd = defaultdict(int)
            flag = True
            position_list = []
            for ch in ss:
                curr_dd[ch] += 1
                if curr_dd[ch] <= len(dd[ch]):
                    position_list.append(dd[ch][curr_dd[ch] - 1])
                else:
                    flag = False
                    res.append((False, []))
                    break
            if flag:
                res.append((True, position_list))
        return res

    if __name__ == '__main__':
        matrix = [
            ['A', 'P', 'H', 'S'],
            ['U', 'L', 'O', 'A'],
            ['O', 'M', 'L', 'K'],
            ['F', 'B', 'I', 'R'],
        ]
        str_list = ["ISIS", "ALLAHU"]
        res = get_position(matrix, str_list)
        for item in res:
            print(item)
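The three example strings from the problem statement can be checked with a compact variant of the same index-based idea. This is a sketch for verification only; the helper names build_index and find_word are made up here and do not appear in the original code.

```python
from collections import defaultdict

def build_index(matrix):
    """Map each character to the list of its (row, col) positions, in row-major order."""
    index = defaultdict(list)
    for i, row in enumerate(matrix):
        for j, ch in enumerate(row):
            index[ch].append((i, j))
    return index

def find_word(index, word):
    """Return (found, positions), using each matrix cell at most once per word."""
    used = defaultdict(int)  # how many cells of each character are taken so far
    positions = []
    for ch in word:
        cells = index.get(ch, [])
        if used[ch] >= len(cells):
            return False, []
        positions.append(cells[used[ch]])
        used[ch] += 1
    return True, positions

matrix = [
    ['A', 'P', 'H', 'S'],
    ['U', 'L', 'O', 'A'],
    ['O', 'M', 'L', 'K'],
    ['F', 'B', 'I', 'R'],
]
index = build_index(matrix)
for word in ["LFK", "HMM", "RPOOI"]:
    print(word, find_word(index, word))
```

The index is built once and reused for every string, which is the point of indexing when the list of strings is long.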

Loading two different versions of a library in one Python script

Background: I was copying data from ES cluster A to ES cluster B and then comparing the two, ID by ID. Cluster A runs version 1.4.x and cluster B runs version 5.3.x, so the same ES client package cannot be used for both.

1. Loading different versions of the client package

The comparison takes the article IDs within the same publication-time range from each cluster and diffs them. In pseudocode:

    es_A_ids = get_es_A_ids()
    es_B_ids = get_es_B_ids()
    diff_ids = es_A_ids - es_B_ids

The obvious idea is to reload the elasticsearch library after finishing with cluster A:

    ## load elasticsearch==1.4.0
    es_A_ids = get_es_A_ids()
    ## load elasticsearch==5.3.0
    es_B_ids = get_es_B_ids()
    diff_ids = es_A_ids - es_B_ids

Interestingly, though, once the first version of elasticsearch has been loaded, its version does not change.

2. Clearing the modules that are already loaded

After some reading, the cause became clear: a Python module is loaded once and only once. To get a different version of a module loaded again, the previously loaded module has to be cleared first. All of the ES modules have names starting with elasticsearch, so we remove every one of them:

    for key in list(sys.modules.keys()):
        if key.startswith('elasticsearch'):
            del sys.modules[key]

Complete code:

    import importlib
    import sys

    # use elasticsearch 5.x
    es_lib_path = "/Users/woshiaotian/es_5x/lib/python2.7/site-packages"
    # Put es_lib_path at the front of sys.path so that the ES library
    # in this directory is found first when the module is loaded.
    sys.path.insert(0, es_lib_path)
    #print sys.path
    elasticsearch = importlib.import_module("elasticsearch")
    print(elasticsearch)
    sys.path.pop(0)

    # list(...) takes a snapshot, so deleting while looping is safe
    for key in list(sys.modules.keys()):
        if key.startswith('elasticsearch'):
            print(key)
            del sys.modules[key]

    # use elasticsearch 1.x
    es_lib_path = "/Users/woshiaotian/es_1x/lib/python2.7/site-packages"
    sys.path.insert(0, es_lib_path)
    elasticsearch = importlib.import_module("elasticsearch")
    print(elasticsearch)
    sys.path.pop(0)
    #print sys.path

    for key in list(sys.modules.keys()):
        if key.startswith('elasticsearch'):
            print(key)
            del sys.modules[key]

sys.modules

This is a dictionary that maps module names to modules which have already been loaded. This can be manipulated to force reloading of modules and other tricks. However, replacing the dictionary will not necessarily work as expected and deleting essential items from the dictionary may cause Python to fail. ...
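The same technique can be tried without installing two copies of elasticsearch. The sketch below writes two throwaway packages with different version strings into temporary directories and loads them one after the other. The package name fakelib and the helpers make_fake_package and load_version are invented for this demonstration. The match on `name` or `name + "."` is slightly stricter than the startswith check above, to avoid removing unrelated modules that merely share a prefix.

```python
import importlib
import os
import sys
import tempfile

def make_fake_package(version):
    """Create <tmpdir>/fakelib/__init__.py and return <tmpdir>."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "fakelib")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("__version__ = %r\n" % version)
    return root

def load_version(lib_path, name):
    """Import `name` from lib_path, then forget it so another copy can load later."""
    sys.path.insert(0, lib_path)  # search this directory first
    try:
        importlib.invalidate_caches()  # the files were only just created
        module = importlib.import_module(name)
    finally:
        sys.path.remove(lib_path)
    # Drop the package and its submodules from the module cache.
    for key in list(sys.modules):
        if key == name or key.startswith(name + "."):
            del sys.modules[key]
    return module

old = load_version(make_fake_package("1.4.0"), "fakelib")
new = load_version(make_fake_package("5.3.0"), "fakelib")
print(old.__version__, new.__version__)  # 1.4.0 5.3.0
```

Both module objects stay usable through the returned references even though neither remains registered in sys.modules.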

Using variadic parameters in golang

Background: While working with Redis, I needed to use LPUSH to write several values to one key in a single call. I use the garyburd/redigo library, where the function is defined as follows:

    // Do sends a command to the server and returns the received reply.
    func Do(commandName string, args ...interface{}) (reply interface{}, err error)

Clearly the function takes variadic parameters.

Solutions

1. List

    package main

    import (
        "fmt"

        "github.com/garyburd/redigo/redis"
    )

    func main() {
        dialOption1 := redis.DialDatabase(0)
        dialOption2 := redis.DialPassword("xxxx")
        rs, err := redis.Dial("tcp", "127.0.0.1:6379", dialOption1, dialOption2)
        if err != nil {
            fmt.Println(err)
            return // do not use the connection after a failed dial
        }
        defer rs.Close()

        // the redis key is "mykey"
        args := []interface{}{"mykey"}
        args = append(args, 10, 20)
        args = append(args, 30)
        // args... expands the slice into the variadic parameter
        count, err := redis.Int(rs.Do("LPUSH", args...))
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Println("count:", count)
    }

2. Dictionary

    package main

    import (
        "fmt"

        "github.com/garyburd/redigo/redis"
    )

    func main() {
        dialOption1 := redis.DialDatabase(0)
        dialOption2 := redis.DialPassword("xxxx")
        rs, err := redis.Dial("tcp", "127.0.0.1:6379", dialOption1, dialOption2)
        if err != nil {
            fmt.Println(err)
            return // do not use the connection after a failed dial
        }
        defer rs.Close()

        test_map := make(map[string]int)
        test_map["zhangsan"] = 1
        test_map["lisi"] = 2
        test_map["wangwu"] = 3

        // the redis key is "my_hash"
        args := []interface{}{"my_hash"}
        // flatten the map into field, value, field, value, ...
        for f, v := range test_map {
            args = append(args, f, v)
        }
        str, err := redis.String(rs.Do("HMSET", args...))
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Println(str)
    }

References

How do I call a command with a variable number of arguments
Go by Example: Variadic Functions