
diskstream produces extra 2MB chunk in fixed vhd #39

Open
mischief opened this issue Nov 23, 2016 · 12 comments

Comments

@mischief

hello,

i've recently written some tooling using this repo's code for uploading VHD images to Azure while converting from Dynamic to Fixed disks on the fly. unfortunately, there seems to be an extra 2 MB block inserted in the file, which appears to have caused Azure to reject it.

this is trivially reproducible by creating some Dynamic VHD file, e.g. qemu-img create -f vpc small.vhd 1G, and then using DiskStream.Read to read it and write an on-the-fly converted Fixed disk (e.g., using io.Copy). if the result is compared with a conversion mechanism that works (qemu-img convert -f vpc -O raw small.vhd small.raw; qemu-img convert -f raw -O vpc -o subformat=fixed small.raw small-2.vhd), you will notice 2 MB of extra data in the file produced by azure-vhd-utils.
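
for reference, a minimal sketch of the conversion step i mean, assuming the diskstream package's CreateNewDiskStream constructor and a Close method (check vhdcore/diskstream for the exact API):

package main

import (
	"io"
	"log"
	"os"

	"github.com/Microsoft/azure-vhd-utils/vhdcore/diskstream"
)

func main() {
	// Open the dynamic VHD as a linear, fixed-layout stream
	// (constructor name assumed; see the upload code path).
	stream, err := diskstream.CreateNewDiskStream("small.vhd")
	if err != nil {
		log.Fatal(err)
	}
	defer stream.Close()

	out, err := os.Create("small-fixed.vhd")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// DiskStream implements io.Reader, so the on-the-fly conversion
	// is just a plain copy; the output should be a fixed VHD.
	n, err := io.Copy(out, stream)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("wrote %d bytes; compare against qemu-img's fixed output", n)
}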

i haven't been able to find the offending code yet, but i'm reporting this to get more eyes.

i will add some more detail later.

@anuchandy
Member

@mischief can you elaborate on the part "which appears to have caused Azure to reject it"? I take it you are able to upload the VHD; which subsequent operation on Azure causes the issue? I am trying to repro this.

I will also take a look at why the extra 2 MB block is getting inserted.

@mischief
Author

@anuchandy some folks at Azure made us aware of the issue:

We have root-caused that the reason the images are failing to start is
that the physical size of the VHD blobs does not match the size
reported in the VHD footer.  It is off by 2 MB.

InvalidVhd: The footer of VHD specifies size (32633782272) which does
not match VHD blob size (32635879936) - VHD Footer Size (512)

@anuchandy
Member

anuchandy commented Nov 25, 2016

Ok, I have access to one of the CoreOS dynamic VHDs prepared for Azure. Inspecting the footer and header of this VHD:

Footer

$./vhd inspect footer --path /Users/anuthomaschandy/git-workspace/vhds/coreos_production_azure_image.vhd 
.....
CreatorApplication: qemu
CreatorVersion    : 327683
CreatorHostOsType : Windows
PhysicalSize      : 32633782272 bytes
VirtualSize       : 32633782272 bytes
DiskGeometry      : Cylinder:63232 Heads:16 Sectors:63
DiskType          : Dynamic
.....

Header

$./vhd inspect header --path /Users/anuthomaschandy/git-workspace/vhds/coreos_production_azure_image.vhd 
Cookie            : cxsparse
DataOffset        : -1
TableOffset       : 1536
HeaderVersion     : 65536
MaxTableEntries   : 15562
BlockSize         : 2097152 bytes
CheckSum          : 4294964081
ParentUniqueID    : 0-0-0-00-000000
ParentTimeStamp   : 2000-01-01 00:00:00 +0000 UTC
Reserved          : 0
.....

As per the VHD spec (https://technet.microsoft.com/en-us/virtualization/bb676673.aspx#ECD), Max Table Entries is described as the field holding the maximum number of entries present in the Block Allocation Table (BAT); it should equal the number of blocks in the disk, i.e. the disk size divided by the block size.

footer.VirtualSize / header.BlockSize = header.MaxTableEntries

32633782272 / 2097152 = 15561

So as per the VHD spec the expected value is 15561, but the actual value is 15562.

DiskStream.Size() is calculated as header.MaxTableEntries * header.BlockSize + 512, not footer.VirtualSize + 512. I would argue that calculating it from the number of BAT entries (i.e. MaxTableEntries) is correct: if we used the size in the footer, the last BAT entry would point to a block that is smaller than the true block size, since the footer would overlap the last 512 bytes of that block.
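
To make the arithmetic concrete, a small sketch using the numbers from this VHD (constant names are illustrative, not the library's identifiers):

package main

import "fmt"

func main() {
	const (
		footerSize      int64 = 512
		blockSize       int64 = 2097152     // header.BlockSize
		maxTableEntries int64 = 15562       // header.MaxTableEntries (one too many)
		virtualSize     int64 = 32633782272 // footer.VirtualSize
	)

	dataSize := maxTableEntries * blockSize // what the disk stream is sized from
	blobSize := dataSize + footerSize       // DiskStream.Size(), i.e. the uploaded blob

	fmt.Println(dataSize)               // 32635879424
	fmt.Println(blobSize)               // 32635879936, the blob size in the error above
	fmt.Println(dataSize - virtualSize) // 2097152: exactly one extra 2 MB block
}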

Two options here:

  1. Regenerate the footer with the PhysicalSize and VirtualSize fields set to header.MaxTableEntries * header.BlockSize. This will match blobSize - 512.
  2. Report this issue to qemu, since the tool does not seem to honor the VHD specification.

Option 1 is really a workaround for the bug in option 2. What do you think?
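
A rough sketch of what option 1 amounts to (names illustrative; the real change would also need to recompute the footer checksum before serializing):

package main

import "fmt"

// fixedFooterSize returns the value option 1 would write into the footer's
// PhysicalSize and VirtualSize fields so that it matches blobSize - 512.
func fixedFooterSize(maxTableEntries, blockSize int64) int64 {
	return maxTableEntries * blockSize
}

func main() {
	size := fixedFooterSize(15562, 2097152)
	fmt.Println(size)     // 32635879424
	fmt.Println(size+512) // 32635879936: the blob size the platform sees
}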

@anuchandy
Member

anuchandy commented Nov 25, 2016

The fix discussed above is in master; could you please test it?

@mischief
Author

@anuchandy i'll test it today. thanks for looking into this.

@anuchandy
Member

thanks @mischief, please note that you will still see the extra 2 MB, but this time the size in the footer will be set to match the total size of the data blocks in the disk stream, so what we want to validate is whether the produced fixed disk boots in Azure or not.

@mischief
Author

@anuchandy unfortunately, if i recall correctly, that will be difficult. i was able to boot instances i uploaded myself, but we later got a report from Microsoft that other customers could not, because the hypervisor rejected the image. i would rather all inconsistencies (the size difference, the footer size derived from the block sum) be fixed before going through the process again :-)

i will look more into qemu-img's implementation.

perhaps the validation code in https://github.com/Microsoft/azure-vhd-utils/blob/master/vhdcore/validator/validator.go should catch this condition? seems like a good place for it.
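
a sketch of the kind of check that validator could add (hypothetical standalone helper, not the library's existing API):

package main

import "fmt"

// checkBATSize verifies the spec relation: MaxTableEntries must equal the
// number of blocks in the disk, i.e. VirtualSize / BlockSize, rounded up for
// disks whose size is not a multiple of the block size.
func checkBATSize(virtualSize, blockSize, maxTableEntries int64) error {
	expected := (virtualSize + blockSize - 1) / blockSize // ceiling division
	if maxTableEntries != expected {
		return fmt.Errorf("BAT has %d entries, expected %d (virtual size %d, block size %d)",
			maxTableEntries, expected, virtualSize, blockSize)
	}
	return nil
}

func main() {
	// the CoreOS image from this thread fails the check:
	fmt.Println(checkBATSize(32633782272, 2097152, 15562))
	// a spec-conforming image passes (prints <nil>):
	fmt.Println(checkBATSize(32633782272, 2097152, 15561))
}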

@mischief
Author

@anuchandy this appears to be the calculation that qemu uses (https://github.com/qemu/qemu/blob/v2.7.0/block/vpc.c#L790):

num_bat_entries = (total_sectors + block_size / 512) / (block_size / 512);

i am going to try patching qemu to drop the + block_size / 512 term, since that seems to produce the correct MaxTableEntries.
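
working through qemu's formula with the numbers from the CoreOS image above shows the off-by-one (the + block_size / 512 term rounds up one block too far when the size is already block-aligned):

package main

import "fmt"

func main() {
	const (
		virtualSize int64 = 32633782272 // footer.VirtualSize
		blockSize   int64 = 2097152     // 2 MB blocks
		sectorSize  int64 = 512
	)
	totalSectors := virtualSize / sectorSize // 63737856
	blockSectors := blockSize / sectorSize   // 4096

	qemuEntries := (totalSectors + blockSectors) / blockSectors // 15562: one block too many
	specEntries := totalSectors / blockSectors                  // 15561: VirtualSize / BlockSize

	// a plain ceiling division would round up only for non-aligned sizes:
	ceilEntries := (totalSectors + blockSectors - 1) / blockSectors // 15561 here

	fmt.Println(qemuEntries, specEntries, ceilEntries)
}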

@mischief
Author

@anuchandy ironically it looks like i am rediscovering an issue from almost 2 years ago - https://github.com/Azure/azure-xplat-cli/issues/1600#issuecomment-76768106

@anuchandy
Member

anuchandy commented Nov 28, 2016

@mischief yes, it's the same issue. The fix was the same for the node CLI as well, but i am still not sure how to fix this in qemu.

@mischief
Author

@anuchandy i've patched qemu and we're testing it now. since this appears to be a miscalculation in qemu-img when creating vpc (vhd) images, i think there's nothing that needs to be fixed in azure-vhd-utils.

however, if you think it is appropriate, there should be some validation in this library to check that the BAT size matches virtualsize/blocksize.

@anuchandy
Member

@mischief cool, yes it makes sense to add a validation - will do that.
