Debugging qemu's disk I/O path

It's been a while since I posted anything... I'm bored at home, and since I've been poking at virtualization lately anyway, let's trace how a file write flows through qemu.

By "write" I mean this: when file I/O happens inside a VM that qemu is running, how does qemu know to update the corresponding virtual disk file? I'm pretty green at qemu; honestly I've been at it for less than a week, so there's no shortage of easy material to post about. This post almost certainly contains mistakes... whatever, let's start here anyway.

First, the configure flags, so next time I switch machines I won't have to hunt for them again... just copy-paste:
./configure --target-list=x86_64-softmmu --enable-kvm --enable-debug --enable-debug-info --enable-modules --enable-vnc --disable-strip

To make debugging easy, I made the guest TinyCore Linux (http://www.tinycorelinux.net/). I'm still back home with no Linux box to be had, so the actual setup is Windows running VirtualBox, running Linux, which in turn runs qemu. A full-sized guest distro would probably grind this old machine to a halt, so everything is kept minimal; a command-line-only install of TinyCore is plenty.

(Postscript: because I got the launch parameters wrong, the whole VM ended up running under TCG, which is why it stayed slow. Never mind that for now; let's first see how a disk write gets signaled under TCG, and whether it differs from KVM.)
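For future reference, a launch command along these lines would have requested KVM; the image name is made up, and forgetting -enable-kvm is presumably the kind of mistake that silently dropped my guest into TCG:

./x86_64-softmmu/qemu-system-x86_64 -m 512 -hda tinycore.qcow2 -enable-kvm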

I gave the VM a qcow2-format disk. But where to start; in other words, which function do I break on? As is well known (or perhaps not), most of the block-device code lives under block/. So I searched block/ for qcow2 AND write, and quickly turned up a few functions, among them qcow2_pre_write_overlap_check, which looks like a handy validation hook. Attach gdb to qemu, set a breakpoint on it, and it fires almost immediately.
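For reference, the session looks roughly like this (the image name is made up; the binary path follows from the configure flags above):

$ gdb --args ./x86_64-softmmu/qemu-system-x86_64 -m 512 -hda tinycore.qcow2
(gdb) b qcow2_pre_write_overlap_check
(gdb) run

Trigger some I/O in the guest (touch a file and sync), and when the breakpoint hits, thread apply all bt shows (only the interesting thread kept):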

Thread 5 (Thread 0x7f8f31d33700 (LWP 23615)):
#0  0x0000562359abf4f0 in qcow2_pre_write_overlap_check (bs=0x56235abb8280, ign=0, offset=359936, size=4096, data_file=true) at block/qcow2-refcount.c:2817
#1  0x0000562359ab132a in qcow2_co_pwritev_part (bs=0x56235abb8280, offset=32256, bytes=4096, qiov=0x7f8f14136db0, qiov_offset=0, flags=0) at block/qcow2.c:2513
#2  0x0000562359afe694 in bdrv_driver_pwritev (bs=0x56235abb8280, offset=32256, bytes=4096, qiov=0x7f8f14136db0, qiov_offset=0, flags=0) at block/io.c:1171
#3  0x0000562359b0066a in bdrv_aligned_pwritev (child=0x56235aa76db0, req=0x7f8f183e9e10, offset=32256, bytes=4096, align=1, qiov=0x7f8f14136db0, qiov_offset=0, flags=0) at block/io.c:1980
#4  0x0000562359b00e44 in bdrv_co_pwritev_part (child=0x56235aa76db0, offset=32256, bytes=4096, qiov=0x7f8f14136db0, qiov_offset=0, flags=0) at block/io.c:2137
#5  0x0000562359ae736b in blk_co_pwritev_part (blk=0x56235aaa6ed0, offset=32256, bytes=4096, qiov=0x7f8f14136db0, qiov_offset=0, flags=0) at block/block-backend.c:1211
#6  0x0000562359ae73bd in blk_co_pwritev (blk=0x56235aaa6ed0, offset=32256, bytes=4096, qiov=0x7f8f14136db0, flags=0) at block/block-backend.c:1221
#7  0x0000562359ae7b93 in blk_aio_write_entry (opaque=0x7f8f14024650) at block/block-backend.c:1415
#8  0x0000562359beafcb in coroutine_trampoline (i0=335845504, i1=32655) at util/coroutine-ucontext.c:115
#9  0x00007f8f504286b0 in __start_context () at /lib/x86_64-linux-gnu/libc.so.6
#10 0x00007f8f31d2ef80 in  ()
#11 0x0000000000000000 in  ()

coroutine_trampoline is the workhorse of qemu's coroutine implementation, and the entry function it jumps into here is blk_aio_write_entry.
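That trampoline lives in util/coroutine-ucontext.c (frame #8 above), qemu's default coroutine backend on Linux, which is built on the ucontext API. Below is a minimal standalone sketch of the same trampoline pattern; this is not qemu's code, just the mechanism it rests on:

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;

/* Runs on the coroutine's private stack; swapcontext() yields back. */
static void trampoline(void)
{
    printf("entered coroutine\n");
    swapcontext(&co_ctx, &main_ctx);   /* yield to the caller */
    printf("resumed coroutine\n");
}                                      /* falling off the end follows uc_link */

int main(void)
{
    static char stack[64 * 1024];

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = sizeof(stack);
    co_ctx.uc_link = &main_ctx;        /* where to go when trampoline returns */
    makecontext(&co_ctx, trampoline, 0);

    swapcontext(&main_ctx, &co_ctx);   /* first entry, like qemu_coroutine_enter */
    printf("back in main\n");
    swapcontext(&main_ctx, &co_ctx);   /* resume the coroutine */
    printf("coroutine done\n");
    return 0;
}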

Searching for references to blk_aio_write_entry turns up just these two call sites:

block-backend.c
1424    return blk_aio_prwv(blk, offset, count, NULL, blk_aio_write_entry,
1428                        blk_aio_write_entry, flags, cb, opaque);

located, respectively, in:

1424:
blk_aio_pwrite_zeroes -> blk_aio_prwv

1428:
blk_aio_pwritev -> blk_aio_prwv
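For context, here is roughly how those two callers funnel into blk_aio_prwv. This is reconstructed from memory of the 4.2 tree rather than copied verbatim, so the exact asserts and flag handling may differ:

BlockAIOCB *blk_aio_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                                  int count, BdrvRequestFlags flags,
                                  BlockCompletionFunc *cb, void *opaque)
{
    /* zero-fill writes carry no payload, only a byte count */
    return blk_aio_prwv(blk, offset, count, NULL, blk_aio_write_entry,
                        flags | BDRV_REQ_ZERO_WRITE, cb, opaque);
}

BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
                            QEMUIOVector *qiov, BdrvRequestFlags flags,
                            BlockCompletionFunc *cb, void *opaque)
{
    /* ordinary writes pass the iovec and take their length from it */
    return blk_aio_prwv(blk, offset, qiov->size, qiov,
                        blk_aio_write_entry, flags, cb, opaque);
}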

And inside blk_aio_prwv, the coroutine's creation is plain to see:

static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
                                void *iobuf, CoroutineEntry co_entry,
                                BdrvRequestFlags flags,
                                BlockCompletionFunc *cb, void *opaque)
{
    BlkAioEmAIOCB *acb;
    Coroutine *co;

    blk_inc_in_flight(blk);
    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
    acb->rwco = (BlkRwCo) {
        .blk    = blk,
        .offset = offset,
        .iobuf  = iobuf,
        .flags  = flags,
        .ret    = NOT_DONE,
    };
    acb->bytes = bytes;
    acb->has_returned = false;

    /* HERE */
    co = qemu_coroutine_create(co_entry, acb);
    bdrv_coroutine_enter(blk_bs(blk), co);

    acb->has_returned = true;
    if (acb->rwco.ret != NOT_DONE) {
        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
                                         blk_aio_complete_bh, acb);
    }

    return &acb->common;
}

Coroutines are a lot like threads, except that coroutines multitask cooperatively while threads are typically preemptive. That means coroutines give you concurrency, not parallelism.
Knowing where the coroutine gets created makes things easy: move the breakpoint one level up, onto blk_aio_prwv.
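In gdb, continuing the session from before:

(gdb) b blk_aio_prwv
(gdb) c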

Soon we get a stack like the one below, and since it runs all the way back to the vCPU execution loop, the breakpoint is clearly in the right place. (This particular hit happens to be a read request, note co_entry=blk_aio_read_entry in frame #0, because blk_aio_prwv serves both directions; the delivery path is the same either way.)

#0  blk_aio_prwv (blk=0x55a4a09c5800, offset=0, bytes=4096, iobuf=0x7f1dc8036c60, co_entry=0x55a49e41d9d0 <blk_aio_read_entry>, flags=0, cb=0x55a49e0ddbc2 <dma_blk_cb>, opaque=0x7f1dc8036c00)
    at block/block-backend.c:1360
#1  0x000055a49e41ddc5 in blk_aio_preadv (blk=0x55a4a09c5800, offset=0, qiov=0x7f1dc8036c60, flags=0, cb=0x55a49e0ddbc2 <dma_blk_cb>, opaque=0x7f1dc8036c00) at block/block-backend.c:1479
#2  0x000055a49e0de16a in dma_blk_read_io_func (offset=0, iov=0x7f1dc8036c60, cb=0x55a49e0ddbc2 <dma_blk_cb>, cb_opaque=0x7f1dc8036c00, opaque=0x55a4a09c5800) at dma-helpers.c:243
#3  0x000055a49e0dde9a in dma_blk_cb (opaque=0x7f1dc8036c00, ret=0) at dma-helpers.c:168
#4  0x000055a49e0de119 in dma_blk_io (ctx=0x55a4a08876d0, sg=0x55a4a171b788, offset=0, align=512, io_func=0x55a49e0de11f <dma_blk_read_io_func>, io_func_opaque=0x55a4a09c5800, 
    cb=0x55a49e1cadf1 <ide_dma_cb>, opaque=0x55a4a171b460, dir=DMA_DIRECTION_FROM_DEVICE) at dma-helpers.c:232
#5  0x000055a49e0de1c7 in dma_blk_read (blk=0x55a4a09c5800, sg=0x55a4a171b788, offset=0, align=512, cb=0x55a49e1cadf1 <ide_dma_cb>, opaque=0x55a4a171b460) at dma-helpers.c:250
#6  0x000055a49e1cb11f in ide_dma_cb (opaque=0x55a4a171b460, ret=0) at hw/ide/core.c:915
#7  0x000055a49e1d4d79 in bmdma_cmd_writeb (bm=0x55a4a171c5b0, val=9) at hw/ide/pci.c:306
#8  0x000055a49e1d5aad in bmdma_write (opaque=0x55a4a171c5b0, addr=0, val=9, size=1) at hw/ide/piix.c:75
#9  0x000055a49df42831 in memory_region_write_accessor (mr=0x55a4a171c700, addr=0, value=0x7f1dd8ea5a48, size=1, shift=0, mask=255, attrs=...) at /home/leon/qemu-4.2.0/memory.c:483
#10 0x000055a49df42a18 in access_with_adjusted_size (addr=0, value=0x7f1dd8ea5a48, size=1, access_size_min=1, access_size_max=4, access_fn=0x55a49df42771 <memory_region_write_accessor>, 
    mr=0x55a4a171c700, attrs=...) at /home/leon/qemu-4.2.0/memory.c:544
#11 0x000055a49df459c2 in memory_region_dispatch_write (mr=0x55a4a171c700, addr=0, data=9, op=MO_8, attrs=...) at /home/leon/qemu-4.2.0/memory.c:1475
#12 0x000055a49dee5a07 in address_space_stb (as=0x55a49eeac0e0 <address_space_io>, addr=49216, val=9, attrs=..., result=0x0) at /home/leon/qemu-4.2.0/memory_ldst.inc.c:378
#13 0x000055a49e0a7d16 in helper_outb (env=0x55a4a0bfa3e0, port=49216, data=9) at /home/leon/qemu-4.2.0/target/i386/misc_helper.c:33
#14 0x00007f1dbd998d65 in code_gen_buffer ()
#15 0x000055a49df7ad63 in cpu_tb_exec (cpu=0x55a4a0bf1b80, itb=0x7f1dbde60980 <code_gen_buffer+31852886>) at /home/leon/qemu-4.2.0/accel/tcg/cpu-exec.c:172
#16 0x000055a49df7bc47 in cpu_loop_exec_tb (cpu=0x55a4a0bf1b80, tb=0x7f1dbde60980 <code_gen_buffer+31852886>, last_tb=0x7f1dd8ea6078, tb_exit=0x7f1dd8ea6070)
    at /home/leon/qemu-4.2.0/accel/tcg/cpu-exec.c:618
#17 0x000055a49df7bf61 in cpu_exec (cpu=0x55a4a0bf1b80) at /home/leon/qemu-4.2.0/accel/tcg/cpu-exec.c:731
#18 0x000055a49df33eb8 in tcg_cpu_exec (cpu=0x55a4a0bf1b80) at /home/leon/qemu-4.2.0/cpus.c:1473
#19 0x000055a49df3470e in qemu_tcg_cpu_thread_fn (arg=0x55a4a0bf1b80) at /home/leon/qemu-4.2.0/cpus.c:1781
#20 0x000055a49e50488c in qemu_thread_start (args=0x55a4a0956070) at util/qemu-thread-posix.c:519
#21 0x00007f1df39476db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#22 0x00007f1df366988f in clone () from /lib/x86_64-linux-gnu/libc.so.6

So the notification is basically a direct ioport write: the guest pokes the IDE controller's bus-master DMA command register (frame #7, bmdma_cmd_writeb; judging by hw/ide/piix.c in the stack, the device model is the PIIX IDE), and that write kicks off ide_dma_cb and everything behind it. The details can wait; once I get to a Linux machine I'll check whether KVM's notification path is any different, though my guess is that it's the same.
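As for how the guest's OUT instruction finds bmdma_write in the first place: a device model registers a MemoryRegionOps table for its bus-master I/O BAR, and the dispatch in frames #9 through #11 ends at that table's write callback. The following is only a schematic sketch of the registration pattern against qemu's internal API (the names are made up, and it only compiles inside the qemu tree), not the real piix.c code:

#include "qemu/osdep.h"
#include "exec/memory.h"
#include "hw/pci/pci.h"

/* Called by the memory core when the guest reads the region. */
static uint64_t my_bmdma_read(void *opaque, hwaddr addr, unsigned size)
{
    return 0;
}

/* Called when the guest writes the region, e.g. an OUT to the BMDMA
 * command port; memory_region_dispatch_write routes here via mr->ops. */
static void my_bmdma_write(void *opaque, hwaddr addr,
                           uint64_t val, unsigned size)
{
    /* bit 0 of the command register starts the DMA engine, which is
     * what eventually reaches ide_dma_cb in the stack above */
}

static const MemoryRegionOps my_bmdma_ops = {
    .read = my_bmdma_read,
    .write = my_bmdma_write,
    .endianness = DEVICE_LITTLE_ENDIAN,
};

/* During device realize: map the ops into an I/O BAR, so a guest port
 * number (49216 == 0xC040 in frame #12) resolves to this region. */
static void my_ide_realize(PCIDevice *dev, Error **errp)
{
    static MemoryRegion bmdma_io;   /* normally lives in the device state */

    memory_region_init_io(&bmdma_io, OBJECT(dev), &my_bmdma_ops,
                          dev, "my-bmdma", 16);
    pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &bmdma_io);
}

Everything from ide_dma_cb down into the block layer then runs synchronously inside that write callback, which is exactly what the stack shows.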
