• Welcome to LiuJason's Blog!

Using an external journal device for ext4 on Debian 10 | Accelerating ext4 with an SSD

Linux Notes | Jason | 2020-01-29

Introduction

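Recently our company set up a cluster: three 90 TB storage servers and three compute servers (more compute servers will be added later), connected over an internal 10 GbE switch. At first we used NFS to share the storage servers' disk space with the compute servers, but the compute servers do a lot of small-file reads and writes, and once the thread count goes up, I/O delay appears, even though the actual NIC traffic stays low.
In other words, the bottleneck is network I/O latency, not bandwidth!
Solution: switch to iSCSI and add a local cache.
Candidate approaches: ZFS (RAM cache), Bcache (SSD cache), LVM + cache, or an ext4 journal (write journaling).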

Why the ext4 journal

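My first idea was to enable local caching directly on NFS (FS-Cache), but caching there brings a number of problems, for example consistency errors when the same export is mounted in several places. Since I'm not the only person managing this platform, I was worried that someone would forget about it later and run into trouble, so a one-to-one iSCSI setup seemed more appropriate. Block-level iSCSI also performs somewhat better than file-level NFS (SCSI is just about the fastest protocol you'll find. It's basically straight disk block access over a wire).
With iSCSI in place, I originally wanted to use ZFS for the local cache. Since it's a single iSCSI disk, there would be no worries about array-related issues or about ZFS tuning disk I/O based on hardware information; redundancy is already handled by the hardware RAID on the storage servers. Some people online have indeed done this, but its stability and safety haven't been proven, and since this is a production environment, I'd rather play it safe.
LVM and ext4 are not that different here. If future expansion matters, LVM is actually the better choice, but the capacity allocated for now looks sufficient. Also, VM disks stored on ext4 only take up the space they actually use, whereas with LVM every disk occupies its full allocated size unless you use thin provisioning.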
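The "only takes up the space actually used" point about VM disks on ext4 comes from sparse files. A minimal sketch to see it for yourself, assuming GNU coreutils (`truncate`, `du --apparent-size`): it creates a 1 GiB image in a throwaway temp directory (the file name `vm-disk.raw` and the helper name `sparse_demo` are hypothetical), writes 1 MiB, and prints the apparent size next to the space actually allocated.

```shell
#!/bin/sh
# Show that a file-backed VM disk only occupies the blocks that were
# actually written. "vm-disk.raw" is a hypothetical image name; everything
# happens in a throwaway temporary directory.
sparse_demo() {
    dir=$(mktemp -d)
    img="$dir/vm-disk.raw"
    # Create a 1 GiB image without allocating any data blocks (a sparse file).
    truncate -s 1G "$img"
    # Write 1 MiB of real data at the start, as a guest OS might.
    dd if=/dev/zero of="$img" bs=1M count=1 conv=notrunc 2>/dev/null
    apparent_kib=$(du -k --apparent-size "$img" | cut -f1)   # size the guest sees
    allocated_kib=$(du -k "$img" | cut -f1)                 # space used on disk
    echo "$apparent_kib $allocated_kib"
    rm -rf "$dir"
}
# Example: sparse_demo   (prints apparent and allocated size in KiB)
```

On ext4 the second number should be roughly the 1 MiB that was written, while a regular (non-thin) LV of the same size would reserve the full 1 GiB up front.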
A cache performance comparison chart is attached here, for reference only.

Prerequisites

For configuring and mounting the iSCSI target, see: https://www.liujason.com/article/502.html
Then do the following:

  • Format the iSCSI disk: mkfs.ext4 /dev/sdc
  • Create a mount point: mkdir /iscsi
  • Add to /etc/fstab: /dev/sdc /iscsi ext4 defaults 0 0
  • Mount it: mount -a
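The fstab step above can be scripted so that re-running it never adds a duplicate line. A minimal sketch, assuming a POSIX shell with awk; `add_fstab_entry` is a hypothetical helper, and the fstab path is a parameter so you can try it on a copy before touching /etc/fstab.

```shell
#!/bin/sh
# Append an ext4 entry to an fstab-style file unless that mount point is
# already listed on a non-comment line.
# Usage: add_fstab_entry FSTAB DEVICE MOUNTPOINT OPTIONS
add_fstab_entry() {
    fstab=$1 dev=$2 mnt=$3 opts=$4
    # awk exits 0 if some non-comment line already uses this mount point
    if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
        echo "entry for $mnt already present, leaving $fstab unchanged" >&2
        return 0
    fi
    printf '%s %s ext4 %s 0 0\n' "$dev" "$mnt" "$opts" >> "$fstab"
}
```

For example, `add_fstab_entry /etc/fstab /dev/disk/by-id/scsi-360000000000000000e00000000010001 /iscsi defaults,_netdev`. Using the by-id path and adding `_netdev` are suggestions, not what the original setup used: `_netdev` marks the filesystem as network-backed so it is mounted only after the network (and thus the iSCSI session) is up.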

After mounting, check the disks:

root@PVE-EU-1 ~ # df -h
Filesystem                Size  Used Avail Use% Mounted on
udev                      126G     0  126G   0% /dev
tmpfs                      26G  1.3M   26G   1% /run
/dev/mapper/vg0-root      781G   37G  711G   5% /
tmpfs                     126G   66M  126G   1% /dev/shm
tmpfs                     5.0M     0  5.0M   0% /run/lock
tmpfs                     126G     0  126G   0% /sys/fs/cgroup
/dev/md0                  487M  110M  352M  24% /boot
/dev/fuse                  30M  152K   30M   1% /etc/pve
tmpfs                      26G     0   26G   0% /run/user/0
172.17.2.1:/storage        74T  5.8T   69T   8% /mnt/pve/NFS-EU-1
172.17.3.1:/pve-eu-3-zfs   54T  619G   54T   2% /mnt/pve/NFS-EU-3
/dev/sdc                  5.0T   14G  4.7T   1% /iscsi

Configuring the cache

First, carve an LV out of the VG on the SSD to serve as the cache (/dev/vg0/cache), then format this block device as an external journal device:

root@PVE-EU-1 ~ # mke2fs -O journal_dev /dev/vg0/cache
mke2fs 1.44.5 (15-Dec-2018)
Discarding device blocks: done                            
Creating filesystem with 8388608 4k blocks and 0 inodes
Filesystem UUID: d56e9b69-e012-4a6f-808c-474de9182b55
Superblock backups stored on blocks: 

Zeroing journal device:
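The mke2fs output also tells us how large the journal LV is: 8388608 blocks of 4 KiB each.

$$
8\,388\,608 \times 4096\ \text{B} = 2^{23} \times 2^{12}\ \text{B} = 2^{35}\ \text{B} = 32\ \text{GiB}
$$

So the LV was presumably created with something like `lvcreate -L 32G -n cache vg0` (the actual command is not shown). An external journal must have the same block size as the ext4 filesystem it serves, which is why the later mkfs.ext4 run reports "Using journal device's blocksize: 4096".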

Find the ID of the iSCSI disk:

root@PVE-EU-1 ~ # ls -l /dev/disk/by-path/ip-* /dev/disk/by-id/scsi-*
lrwxrwxrwx 1 root root 9 Jan 30 10:12 /dev/disk/by-id/scsi-360000000000000000e00000000010001 -> ../../sdd
lrwxrwxrwx 1 root root 9 Jan 30 10:12 /dev/disk/by-path/ip-172.17.3.107:3260-iscsi-iqn.2020-01.pve-eu-3:iscsieu1-lun-1 -> ../../sdd
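Kernel names such as sdc/sdd are assigned in discovery order and can change between boots or iSCSI re-logins, so it is safer to use the by-id link. A small sketch for looking it up from a kernel device name, assuming GNU `readlink -f`; `byid_for_dev` is a hypothetical helper, and the directory is a parameter (defaulting to /dev/disk/by-id) so the logic can be tried without real devices.

```shell
#!/bin/sh
# Print the scsi-* link(s) in a by-id directory that resolve to the given
# kernel device node.
# Usage: byid_for_dev /dev/sdd [/dev/disk/by-id]
byid_for_dev() {
    target=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/scsi-*; do
        [ -L "$link" ] || continue      # glob matched nothing, or not a symlink
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}
# Example: byid_for_dev /dev/sdd
```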

Create the ext4 filesystem with the external journal:

root@PVE-EU-1 ~ # mkfs.ext4 -J device=/dev/vg0/cache /dev/disk/by-id/scsi-360000000000000000e00000000010001
mke2fs 1.44.5 (15-Dec-2018)
Using journal device's blocksize: 4096
Creating filesystem with 1342177280 4k blocks and 167772160 inodes
Filesystem UUID: b217c0e8-3fc9-4b5b-97ef-629a2a295b22
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
        102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done                            
Writing inode tables: done                            
Adding journal to device /dev/vg0/cache: done
Writing superblocks and filesystem accounting information: done

Then, based on the performance chart at the beginning, set the fstab options:

/dev/disk/by-id/scsi-360000000000000000e00000000010001 /iscsi ext4 rw,relatime,journal_checksum,journal_async_commit 0 0

Then just mount it.
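One way to confirm that the mounted filesystem really uses the SSD journal is to compare the "Journal UUID" stored in its superblock with the UUID that mke2fs printed for the journal device (d56e9b69-e012-4a6f-808c-474de9182b55 in the run above). A small sketch; `journal_uuid_of` is a hypothetical helper that only parses `dumpe2fs -h` output from stdin.

```shell
#!/bin/sh
# Read `dumpe2fs -h` output on stdin and print the "Journal UUID" field.
# With an external journal this should match the journal device's UUID.
journal_uuid_of() {
    awk '/^Journal UUID:/ { sub(/^Journal UUID:[ \t]*/, ""); print }'
}
# Example (as root on the host; both commands should print the same UUID):
#   dumpe2fs -h /dev/disk/by-id/scsi-360000000000000000e00000000010001 2>/dev/null | journal_uuid_of
#   blkid -s UUID -o value /dev/vg0/cache
```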

Performance tests

I used fio to test I/O read/write performance.
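The fio command lines are not shown in the original. The following is a hypothetical reconstruction that matches the shape of the output: 30 jobs reported as one group (`groupid=0, jobs=30`, hence `--group_reporting`), queue depth 1 (`IO depths: 1=100.0%`), about 10 s per run (`run=10001msec`), and a 16 KiB block size (for example 620 MiB/s ÷ 39.7k IOPS ≈ 16 KiB). The target directory, `--size`, and the default ioengine are assumptions, and whether `--direct=1` was used cannot be read off the output. `run_fio_suite` and the `FIO` override (handy for a dry run with `FIO=echo`) are hypothetical.

```shell
#!/bin/sh
# Hypothetical reconstruction of the four fio runs, in the order reported:
# sequential read, sequential write, random read, random write.
# Inferred: numjobs=30, iodepth=1, ~10 s runtime, bs=16k, group reporting.
# Assumed: target directory, --size=1G, default ioengine.
run_fio_suite() {
    dir=$1
    for rw in read write randread randwrite; do
        ${FIO:-fio} --name=mytest --directory="$dir" --rw="$rw" \
            --bs=16k --numjobs=30 --iodepth=1 --runtime=10 \
            --size=1G --group_reporting
    done
}
# Example: run_fio_suite /iscsi
```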
Before caching:

#Sequential read
mytest: (groupid=0, jobs=30): err= 0: pid=18491: Wed Jan 29 23:06:05 2020
  read: IOPS=39.7k, BW=620MiB/s (651MB/s)(6204MiB/10001msec)
    clat (usec): min=112, max=31486, avg=753.33, stdev=435.77
     lat (usec): min=112, max=31487, avg=753.44, stdev=435.77
    clat percentiles (usec):
     |  1.00th=[  400],  5.00th=[  490], 10.00th=[  545], 20.00th=[  603],
     | 30.00th=[  644], 40.00th=[  685], 50.00th=[  717], 60.00th=[  750],
     | 70.00th=[  791], 80.00th=[  857], 90.00th=[  971], 95.00th=[ 1090],
     | 99.00th=[ 1401], 99.50th=[ 1713], 99.90th=[ 4228], 99.95th=[ 6849],
     | 99.99th=[21890]
   bw (  KiB/s): min=16448, max=23776, per=3.32%, avg=21116.73, stdev=1854.02, samples=578
   iops        : min= 1028, max= 1486, avg=1319.78, stdev=115.89, samples=578
  lat (usec)   : 250=0.02%, 500=5.69%, 750=54.01%, 1000=31.81%
  lat (msec)   : 2=8.15%, 4=0.22%, 10=0.08%, 20=0.02%, 50=0.01%
  cpu          : usr=0.28%, sys=0.83%, ctx=397789, majf=0, minf=120
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=397086,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=620MiB/s (651MB/s), 620MiB/s-620MiB/s (651MB/s-651MB/s), io=6204MiB (6506MB), run=10001-10001msec

Disk stats (read/write):
  sdc: ios=353516/28, merge=33863/2, ticks=265290/62, in_queue=2936, util=98.47%
#Sequential write
mytest: (groupid=0, jobs=30): err= 0: pid=19381: Wed Jan 29 23:08:33 2020
  write: IOPS=18.8k, BW=294MiB/s (309MB/s)(2944MiB/10002msec); 0 zone resets
    clat (usec): min=550, max=56581, avg=1590.78, stdev=1617.61
     lat (usec): min=550, max=56581, avg=1591.07, stdev=1617.61
    clat percentiles (usec):
     |  1.00th=[  938],  5.00th=[ 1020], 10.00th=[ 1074], 20.00th=[ 1139],
     | 30.00th=[ 1188], 40.00th=[ 1254], 50.00th=[ 1319], 60.00th=[ 1401],
     | 70.00th=[ 1500], 80.00th=[ 1729], 90.00th=[ 2180], 95.00th=[ 2474],
     | 99.00th=[ 5669], 99.50th=[10814], 99.90th=[25822], 99.95th=[30802],
     | 99.99th=[54789]
   bw (  KiB/s): min= 5280, max=13024, per=3.33%, avg=10031.98, stdev=2006.96, samples=594
   iops        : min=  330, max=  814, avg=626.99, stdev=125.44, samples=594
  lat (usec)   : 750=0.02%, 1000=3.18%
  lat (msec)   : 2=83.85%, 4=11.32%, 10=1.09%, 20=0.37%, 50=0.15%
  lat (msec)   : 100=0.02%
  cpu          : usr=0.19%, sys=1.08%, ctx=188705, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,188435,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=294MiB/s (309MB/s), 294MiB/s-294MiB/s (309MB/s-309MB/s), io=2944MiB (3087MB), run=10002-10002msec

Disk stats (read/write):
  sdc: ios=0/184696, merge=0/4796, ticks=0/294868, in_queue=29140, util=95.73%
#Random read
mytest: (groupid=0, jobs=30): err= 0: pid=20702: Wed Jan 29 23:12:15 2020
  read: IOPS=39.0k, BW=610MiB/s (639MB/s)(6098MiB/10002msec)
    clat (usec): min=278, max=102296, avg=767.45, stdev=299.23
     lat (usec): min=279, max=102296, avg=767.57, stdev=299.24
    clat percentiles (usec):
     |  1.00th=[  465],  5.00th=[  562], 10.00th=[  611], 20.00th=[  660],
     | 30.00th=[  693], 40.00th=[  717], 50.00th=[  742], 60.00th=[  775],
     | 70.00th=[  807], 80.00th=[  848], 90.00th=[  930], 95.00th=[ 1020],
     | 99.00th=[ 1254], 99.50th=[ 1532], 99.90th=[ 3261], 99.95th=[ 4555],
     | 99.99th=[ 8979]
   bw (  KiB/s): min=16672, max=22400, per=3.33%, avg=20813.92, stdev=821.16, samples=577
   iops        : min= 1042, max= 1400, avg=1300.85, stdev=51.33, samples=577
  lat (usec)   : 500=1.82%, 750=49.96%, 1000=42.44%
  lat (msec)   : 2=5.58%, 4=0.14%, 10=0.06%, 20=0.01%, 100=0.01%
  lat (msec)   : 250=0.01%
  cpu          : usr=0.31%, sys=1.37%, ctx=390537, majf=0, minf=120
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=390285,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=610MiB/s (639MB/s), 610MiB/s-610MiB/s (639MB/s-639MB/s), io=6098MiB (6394MB), run=10002-10002msec

Disk stats (read/write):
  sdc: ios=386399/7, merge=0/2, ticks=293426/13, in_queue=1516, util=99.04%
#Random write
mytest: (groupid=0, jobs=30): err= 0: pid=21040: Wed Jan 29 23:13:18 2020
  write: IOPS=1441, BW=22.5MiB/s (23.6MB/s)(227MiB/10068msec); 0 zone resets
    clat (usec): min=152, max=275020, avg=20724.80, stdev=31748.12
     lat (usec): min=152, max=275020, avg=20725.15, stdev=31748.13
    clat percentiles (usec):
     |  1.00th=[   180],  5.00th=[   204], 10.00th=[   221], 20.00th=[   269],
     | 30.00th=[  8455], 40.00th=[ 10028], 50.00th=[ 10814], 60.00th=[ 11731],
     | 70.00th=[ 13042], 80.00th=[ 28967], 90.00th=[ 63177], 95.00th=[ 91751],
     | 99.00th=[141558], 99.50th=[158335], 99.90th=[274727], 99.95th=[274727],
     | 99.99th=[274727]
   bw (  KiB/s): min=   96, max= 2432, per=3.36%, avg=774.74, stdev=635.51, samples=597
   iops        : min=    6, max=  152, avg=48.39, stdev=39.72, samples=597
  lat (usec)   : 250=16.76%, 500=11.93%, 750=0.26%, 1000=0.08%
  lat (msec)   : 2=0.05%, 4=0.10%, 10=10.40%, 20=38.11%, 50=9.51%
  lat (msec)   : 100=8.40%, 250=4.31%, 500=0.11%
  cpu          : usr=0.04%, sys=0.17%, ctx=29101, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,14515,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=22.5MiB/s (23.6MB/s), 22.5MiB/s-22.5MiB/s (23.6MB/s-23.6MB/s), io=227MiB (238MB), run=10068-10068msec

Disk stats (read/write):
  sdc: ios=0/14333, merge=0/2, ticks=0/9903, in_queue=5636, util=53.16%

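The random write results are truly dismal. If random writes could be turned into sequential writes, performance would improve dramatically!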
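The poor random-write figure follows directly from latency. Every run above uses 30 jobs at queue depth 1 ("IO depths: 1=100.0%"), so at most 30 requests are in flight, and by Little's law the achievable IOPS is roughly the number of outstanding requests divided by the average latency. Checking this against the measured numbers, with the 16 KiB block size implied by BW/IOPS:

$$
\text{IOPS} \approx \frac{N_{\text{jobs}} \times \text{iodepth}}{\bar{t}_{\text{lat}}}
$$

$$
\text{seq. read: } \frac{30 \times 1}{753.4\ \mu\text{s}} \approx 39.8\text{k}\ (\text{measured } 39.7\text{k}),\qquad
\text{rand. write: } \frac{30 \times 1}{20.72\ \text{ms}} \approx 1448\ (\text{measured } 1441)
$$

$$
1441\ \text{IOPS} \times 16\ \text{KiB} \approx 22.5\ \text{MiB/s} \ll 10\ \text{Gbit/s} \approx 1.25\ \text{GB/s}
$$

So the 10 GbE link sits mostly idle while each job waits about 20.7 ms for every random write to complete; lowering per-request latency, not adding bandwidth, is what raises throughput here.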
After caching:

#Random read
mytest: (groupid=0, jobs=30): err= 0: pid=3640: Wed Jan 29 23:57:58 2020
  read: IOPS=43.3k, BW=677MiB/s (710MB/s)(300MiB/443msec)
    clat (usec): min=157, max=1523, avg=683.14, stdev=120.39
     lat (usec): min=157, max=1523, avg=683.25, stdev=120.40
    clat percentiles (usec):
     |  1.00th=[  388],  5.00th=[  494], 10.00th=[  545], 20.00th=[  603],
     | 30.00th=[  635], 40.00th=[  660], 50.00th=[  685], 60.00th=[  709],
     | 70.00th=[  734], 80.00th=[  758], 90.00th=[  807], 95.00th=[  857],
     | 99.00th=[ 1057], 99.50th=[ 1188], 99.90th=[ 1369], 99.95th=[ 1401],
     | 99.99th=[ 1516]
  lat (usec)   : 250=0.15%, 500=5.40%, 750=71.23%, 1000=21.57%
  lat (msec)   : 2=1.65%
  cpu          : usr=0.67%, sys=0.67%, ctx=19555, majf=0, minf=120
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=19200,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=677MiB/s (710MB/s), 677MiB/s-677MiB/s (710MB/s-710MB/s), io=300MiB (315MB), run=443-443msec

Disk stats (read/write):
  sdc: ios=17055/0, merge=62/0, ticks=11589/0, in_queue=0, util=80.65%
#Random write
mytest: (groupid=0, jobs=30): err= 0: pid=2942: Wed Jan 29 23:56:17 2020
  write: IOPS=12.5k, BW=195MiB/s (205MB/s)(1952MiB/10002msec); 0 zone resets
    clat (usec): min=344, max=168170, avg=2393.38, stdev=6630.45
     lat (usec): min=345, max=168170, avg=2393.65, stdev=6630.46
    clat percentiles (usec):
     |  1.00th=[   717],  5.00th=[   930], 10.00th=[  1057], 20.00th=[  1188],
     | 30.00th=[  1270], 40.00th=[  1336], 50.00th=[  1418], 60.00th=[  1516],
     | 70.00th=[  1647], 80.00th=[  1844], 90.00th=[  2573], 95.00th=[  4752],
     | 99.00th=[ 23200], 99.50th=[ 38011], 99.90th=[106431], 99.95th=[116917],
     | 99.99th=[164627]
   bw (  KiB/s): min=  768, max=11488, per=3.30%, avg=6591.90, stdev=2992.13, samples=586
   iops        : min=   48, max=  718, avg=411.97, stdev=187.00, samples=586
  lat (usec)   : 500=0.02%, 750=1.38%, 1000=5.52%
  lat (msec)   : 2=77.11%, 4=10.43%, 10=2.90%, 20=1.48%, 50=0.84%
  lat (msec)   : 100=0.10%, 250=0.22%
  cpu          : usr=0.13%, sys=0.50%, ctx=125486, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,124957,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=195MiB/s (205MB/s), 195MiB/s-195MiB/s (205MB/s-205MB/s), io=1952MiB (2047MB), run=10002-10002msec

Disk stats (read/write):
  sdc: ios=0/122966, merge=0/887, ticks=0/294579, in_queue=109328, util=79.43%
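Putting the before/after random-write numbers side by side (same 30 jobs at queue depth 1):

$$
\frac{12.5\text{k IOPS}}{1441\ \text{IOPS}} \approx \frac{195\ \text{MiB/s}}{22.5\ \text{MiB/s}} \approx 8.7\times,\qquad
\bar{t}_{\text{lat}}: 20.72\ \text{ms} \rightarrow 2.394\ \text{ms},\qquad
\frac{30}{2.394\ \text{ms}} \approx 12.5\text{k}
$$

The 99th-percentile completion latency drops from 141.6 ms to 23.2 ms as well. The random-read comparison (43.3k vs 39.0k IOPS, about +11%) is weaker evidence: the after-cache run finished in 443 ms after reading only 300 MiB, versus a 10 s run before caching.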

This article is under CC BY-NC-SA 4.0 license.
Please quote the original link:https://www.liujason.com/article/505.html