Fork me on GitHub

版权声明 本站原创文章 由 萌叔 发表
转载请注明 萌叔 | https://vearne.cc

因为知识有限,如果文章中有错误,请指出,谢谢。

起因:

我们生产环境的一台机器,使用的是SSD硬盘,但是测试的结果,读写速度只能到200MB/s, 而按照网上提供给的数据,SSD至少能够达到500MB/s, 顺序读写
2018110403.jpeg-176.6kB

会不会我对SSD的设置有问题

1. 文件大小和文件占用空间

在操作系统中,文件大小和文件占用空间大小是2个不同的概念
文件大小可以理解为文件实际大小,比如字节数
文件占用空间大小是为了存储这个文件所分配的硬盘空间的大小
1) 文件 xxx.log
2018110404.png-60.2kB
文件的大小仅有7个字节,而实际占用了512 * 8 = 4k个字节

2) 文件 xxx2.log

[root@xxx tmp]#stat /tmp/xxx2.log
  File: `/tmp/xxx2.log'
  Size: 38890       Blocks: 80         IO Block: 4096   regular file
Device: fc01h/64513d    Inode: 814342      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-11-13 12:24:42.966209408 +0800
Modify: 2017-11-13 12:24:42.971209436 +0800
Change: 2017-11-13 12:24:42.971209436 +0800

细心的朋友会发现,文件空间总是大于文件大小,而且Blocks恰好为8的整数倍,这是因为操作系统为了和内存分页大小保持一致,便于数据从硬盘加载到内存中,以及便于从内存将文件回写到硬盘上,所以所做的妥协。
内存页大小

[root@xxxx tmp]#getconf PAGE_SIZE
4096

2. 4k问题

概念1: 除非开启了HugePage,否则现在的操作系统默认内存页大小都是4k
概念2: 操作系统对硬盘进行读写是以Block为单位的,且一个Block只能被分配给一个文件

IO Block: 4096
概念3: 目前大部分的硬盘的Physical Block也是4KB
2018110405.png-47.1kB

要确保,尽可能少的对硬盘进行读写,就要分区的起始边界与Physical Block的边界对齐
2018110406.png-20.1kB

Why Align Partition
4K alignment is a quite important issue for hard disks which employ 4K sectors, including both Solid State Drive (SSD) and advanced formatted mechanical hard disk. And actually it refers to the alignment between 4K physical sector and cluster.

A cluster is the smallest logical amount of disk space that can be allocated to hold a file. One cluster can only hold content of one file no matter how small the file is.
-------- From Wikipedia
When we are creating partitions on hard disks which use 4K physical sector, the partition always does not start from the starting position of a physical sector, which means physical sector and cluster will be shifted:
As a result, reading data saved in one cluster will access 2 physical sectors, thus increasing access time or slowing down read speed. Similarly, writing a file will operate 2 physical sectors at least, which definitely will increase writing times and waste disk space. Therefore, it is very necessary to perform 4K alignment, and to align partition can help reach the goal.

如果不能对齐,那么每次操作系统的读写操作都会带来2倍的IO操作,性能当然大打折扣。

3. 正确的分区

[root@xxxx]# fdisk /dev/sdn
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x4395e819.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.
# 其实这里fdisk已经提示我们了要确保对齐

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): c
DOS Compatibility flag is not set

Command (m for help): u
Changing display/entry units to sectors

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
# 确保分区的起始的sector是8的整数倍
First sector (2048-3519069871, default 2048): 2048 
... ...

4. 硬盘读写性能测试

目前我看到资料,有2种测速方法

4.1 hdparm

[root@xxx]# hdparm -Tt /dev/sdn

/dev/sdn:
 Timing cached reads:   25226 MB in  2.00 seconds = 12627.23 MB/sec
 Timing buffered disk reads: 1060 MB in  3.00 seconds = 352.91 MB/sec

-t Perform timings of device reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading through the buffer cache to the disk without any prior caching of data. This measurement is an indication of how fast the drive can sustain sequential data reads under Linux, without any filesystem overhead. To ensure accurate measurements, the buffer cache is flushed during the processing of -t using the BLKFLSBUF ioctl.

-T Perform timings of cache reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading directly from the Linux buffer cache without disk access. This measurement is essentially an indication of the throughput of the processor, cache, and memory of the system under test.

-T 并不真的对硬盘读,主要测试的是CPU/cache/内存的吞吐能力
-t 才是对硬盘实际读的速度

4.2 dd

dd if=/dev/zero of=/bbd/ssd01/test.bdf  bs=2M count=1000 oflag=direct

dd if=/bbd/ssd01/test.bdf of=/dev/null  bs=2M count=1000 iflag=direct

5. 结论

经过重新格式化后,磁盘的读写速度大概有1倍左右的提升,说明对齐之后,性能确实有提升。

参考资料:

  1. SSD指标
  2. hugepage
  3. 硬盘概念
  4. 4k Alignment
  5. 测试硬盘的读写速度

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注