Shrinking the daily builds

When Blender 2.80 entered beta I decided to try porting some add-ons I maintain to it. In the beta phase, the add-on API has mostly stabilized, but there are still occasional changes that can make add-ons misbehave. So to make sure my add-ons work, I like to test them against the most recent version of Blender’s code.

There are two options to get the latest and greatest updates to Blender: downloading the source code and compiling it yourself, or downloading one of the daily Blender buildbot releases. While compiling Blender from source is fairly straightforward, I’ve found downloading daily builds to be easier.

After downloading daily builds, I sometimes compare them against previous releases with a diff tool to see exactly what changed. During these comparisons I’ve noticed that a large number of files inside any two daily releases are byte-to-byte identical. Seeing all the duplicate files, I was curious how much bandwidth could be saved if downloading just the files that changed were an option instead of downloading an entire build.

To investigate this I compared two daily builds I had on my hard drive against one another:

blender-2.80.0-git-dc3b5024be1a-windows64 (2019-01-23)
Directory info: 4,824 files, 498 directories
Total build size:
Compressed: 119 MiB (125,167,090 bytes)
Uncompressed: 329 MiB (345,067,650 bytes)

blender-2.80.0-git-a1ae04d15a9f-windows64 (2019-01-30)
Directory info: 4,820 files, 498 directories
Total build size:
Compressed: 119 MiB (125,176,390 bytes)
Uncompressed: 329 MiB (345,072,704 bytes)

To make sure there was a fair number of changes between the builds I used daily builds released roughly a week apart.
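A comparison like this can be scripted. Here is a minimal sketch (the two extracted build directories are the assumed inputs) that hashes every file in both trees and classifies each relative path as identical, changed, removed, or added:

```python
import hashlib
from pathlib import Path

def tree_hashes(root):
    """Map each relative file path under root to the SHA-256 of its bytes."""
    root = Path(root)
    return {p.relative_to(root): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in root.rglob("*") if p.is_file()}

def compare_trees(old_root, new_root):
    """Classify files as identical, changed, removed, or added between two trees."""
    old, new = tree_hashes(old_root), tree_hashes(new_root)
    identical = [p for p in old if p in new and old[p] == new[p]]
    changed = [p for p in old if p in new and old[p] != new[p]]
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    return identical, changed, removed, added
```

Running something like this against the two builds is how you can confirm that most files are duplicates and only a small portion of the release really changes.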

With daily builds, it’s usually just the Blender executable and scripts directory that will be updated. This was the case with the above two builds. Here is how the executable and scripts directory from the two builds compare:

Build dc3b5024be1a (2019-01-23):
Blender executable size:
Compressed: 30 MiB (31,514,665 bytes)
Uncompressed: 81 MiB (85,029,376 bytes)
Scripts directory size:
Compressed: 30.7 MiB (32,224,547 bytes)
Uncompressed: 86.8 MiB (91,072,440 bytes)

Build a1ae04d15a9f (2019-01-30):
Blender executable size:
Compressed: 30.0 MiB (31,529,457 bytes)
Uncompressed: 81.1 MiB (85,066,752 bytes)
Scripts directory size:
Compressed: 30.7 MiB (32,220,885 bytes)
Uncompressed: 86.8 MiB (91,040,118 bytes)

The Blender executable and scripts directory usually take up around 160 MiB uncompressed, but they can be squeezed down to around 60 MiB by zipping them. 60 MiB is a substantial reduction from the usual 120 MiB file size for the Windows daily releases, but there are better ways to reduce download size.

To really save on file size and bandwidth, file patching is the way to go. File patching is basically updating software to a newer version by applying a patch containing only the bytes in a file that changed instead of downloading a whole file. This practice is very common with operating system updates, video games, web browsers, and other software, but not so much with Blender releases. So I wanted to see how much benefit file patching could provide.
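As a toy illustration of what a patch contains (real tools like xdelta3 also handle insertions, deletions, and moved blocks, which this sketch does not), a patch for two same-length byte strings can simply record the runs of bytes that differ:

```python
def make_patch(old, new):
    """Record (offset, replacement bytes) for every run where old and new differ.
    Toy version: assumes both byte strings have the same length."""
    assert len(old) == len(new)
    patch, run_start = [], None
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b and run_start is None:
            run_start = i          # a differing run begins here
        elif a == b and run_start is not None:
            patch.append((run_start, new[run_start:i]))
            run_start = None       # the run ended at the previous byte
    if run_start is not None:
        patch.append((run_start, new[run_start:]))
    return patch

def apply_patch(old, patch):
    """Rebuild the new file by overwriting the recorded runs."""
    buf = bytearray(old)
    for offset, data in patch:
        buf[offset:offset + len(data)] = data
    return bytes(buf)
```

A real delta encoder earns most of its savings by also matching data that moved or shifted position, which is exactly what this naive version cannot do.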

As this was just for testing, I used the first open-source option I could find with a directory patching feature. In this case that was “xdelta3”, with the aid of the “xdelta3-dir-patcher” Python script. With this setup, creating and applying patches is done from the command line. To create a patch (or “diff”) from the Blender builds, this was the command I used:

python C:\patch_test\ diff C:\patch_test\blender-2.80.0-git-dc3b5024be1a-windows64 C:\patch_test\blender-2.80.0-git-a1ae04d15a9f-windows64 patch_dc3b5024be1a_to_a1ae04d15a9f.tar.gz

The resulting patch file…

13.0 MiB (13,677,202 bytes)

Yes, the patch is an impressive 13 MiB in size. That means if this patch was available online, it would be possible to update from the “dc3b5024be1a” build to the “a1ae04d15a9f” build just by downloading the 13 MiB file and applying the patch instead of having to download the full 119 MiB daily build (close to a 90% reduction in file size). This is obviously a huge bandwidth saver.

To verify the patch works, I applied the newly generated patch back to the older build directory:

python C:\patch_test\ apply C:\patch_test\blender-2.80.0-git-dc3b5024be1a-windows64 patch_dc3b5024be1a_to_a1ae04d15a9f.tar.gz C:\patch_test\new-a1ae04d15a9f-dir --ignore-euid

The newly created new-a1ae04d15a9f-dir was byte-to-byte identical to the contents of the “blender-2.80.0-git-a1ae04d15a9f-windows64” daily build.
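That kind of byte-to-byte verification can itself be scripted with Python’s standard library; here is a recursive sketch using filecmp (a minimal illustration, not the exact check I ran):

```python
import filecmp

def trees_identical(dir_a, dir_b):
    """True if both directory trees contain the same files with identical bytes."""
    cmp = filecmp.dircmp(dir_a, dir_b)
    if cmp.left_only or cmp.right_only or cmp.funny_files:
        return False
    # shallow=False forces byte-by-byte comparison instead of trusting stat() info
    _, mismatch, errors = filecmp.cmpfiles(dir_a, dir_b, cmp.common_files,
                                           shallow=False)
    if mismatch or errors:
        return False
    # Recurse into every common subdirectory
    return all(trees_identical(sub.left, sub.right)
               for sub in cmp.subdirs.values())
```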

So this raises the question: could this setup be used with the current Blender daily builds? It’s possible, but there are several caveats worth mentioning:

  • This could likely be done directly with buildbot, but generating patches for daily releases after building them would create some additional overhead
  • Xdelta3 stores some username information in file patches which may not always be desirable
  • The xdelta3-dir-patcher script does not currently support patching over date and time info on Windows (which is why the --ignore-euid option was used above). This means that for Windows users, every file in the “patched” directory would have its creation time set to the time the patch was applied instead of when the updated build was released.
  • There are more robust (but more CPU intensive) patching options available like Google’s Courgette (these weren’t tested as I was unable to find an “off-the-shelf” directory patching option)
  • Significantly reducing the size of updates will certainly lower bandwidth consumption for people downloading Blender, but it may actually increase bandwidth consumption on the other end because of something called Jevons Paradox
  • As a counterpoint to the above, patching could also lead to code updates getting more end-user testing if it led to end-users updating daily builds more often

Compression and bandwidth savings are always interesting topics. That said, you also have to consider how many people would care enough to warrant the extra hassle of patching a file, versus just downloading the build they want.

In the end, there may only be 5 people that download our software more than once, and of those, maybe only 2 that are bandwidth constrained enough to care. If it were that big of a deal, they would probably just start compiling their own Blender, which would reduce the bandwidth down to a few hundred kilobytes a day worth of git pulls. Hard to beat that.

Seems very similar to mp3 vs. ogg. Ogg was always “better”, but people stopped providing it because only 3 or 4 people would download it, compared to the time and effort it took to encode and host it.

Anyway, just some things to consider. Cheers!

It would be great to have an easy way to update daily builds with minimal download size. By itself patching seems like too much of a hassle; it needs to be part of an integrated system.


How does 10 KB sound for an update? :smiley:

That’s how large the updates for my commercial build of Blender are. Of course, my build is customized to be built as individual DLLs, so I am kinda cheating there. I did not use this architecture for tiny updates though; my aim was to modularize my code, almost completely eliminate build/compile time, and support live coding. But it is a nice side effect.

I will have to agree with Brecht: patching is a bit tedious because the user has to know exactly which build to apply the patch to, unless someone makes an auto-updater to handle this.

Also, I am not so sure there is a real need for this; how many people actually bother downloading experimental builds? Many need convincing to even switch to the 2.8 beta. But if there are enough users, I guess it’s a nice solution.

Call me crazy, but I am fascinated by torrent technology. I always thought you need a lot of seeders to get things done, but I recently experimented with torrenting a zip file between two computers connected to the same network via WiFi, and it went better than expected. No, the speed was not close to your usual download speeds, but it was not that bad either for one seeder. I am examining the libtorrent library for several uses; auto-update is one of them. Probably a poor replacement for the standard practice, but workable nonetheless.

Compared to the 30 seconds it takes me to download the full file?

At some point you just end up reinventing Steam, or some of the stuff that Ubuntu uses to distribute files, such as jigdo. Even docker would be sufficient here, provided that you are shipping to Linux users (although technically Windows too now, iirc).

Torrents were something I was looking into on our VMs. Getting the tracker up is fine, but you still have to seed it (I had some issues here due to NAT, so I was unable to fully test). It works, but it doesn’t save bandwidth, it just distributes it, which kinda defeats the whole thread’s suggestion of reducing the size. Unless you are talking about P2P for the patches, but that gets back to the idea of Steam or some fancier Linux package managers like pacman/yum that use binary deltas.

We do have an rsync server which could possibly help in some ways, but only if we extract the files. If we are going to go so far as wanting binary deltas, though, we might as well just have an SVN server that we commit the files to.

Out of all of these things, I think offering a Steam “beta” snapshot is the best all-around option, due to the multi-arch support already there. The tricky part is automating the upload of these files into Steam in a secure fashion! After all, that would make our buildbot pretty much the juiciest target on the planet.

Not worth the hassle imho; there are relatively few people who update to every single build we do (there are days the buildbots get kicked multiple times, it’s not just the daily build), so you’ll get multiple 13 MiB updates stacked and you may as well end up downloading the complete package (especially if people only update once every few days/weeks).

We could do it for official builds only (i.e. 2.79a -> 2.79b), but there it’s not worth the hassle either, because on our normal schedule we have like 3-4 releases a year? Lots of hassle so people can save a 100 MB download 3-4 times a year.

Sounds like something we could do, but I’m pretty sure our time is better spent elsewhere.


For people who would want this, doing your own builds seems like a better and more complete solution. It gets you the latest changes as much as a day earlier than waiting for the nightly build, and git seems quite efficient at synchronizing with remotes. I can usually update and build faster than I can download and extract the daily zip.

I just don’t think there are many people who are going to want to download almost every daily build who are not already doing their own builds. Almost certainly not enough to justify supporting this as a general public feature.

And there would likely be substantial ongoing support issues keeping an incremental update feature working across all platforms and every individual user environment.

Also, just saying: diffing two Blender versions “to see what changed” is usually going to tell you very little about what changed. It’s much more effective to skim the commit history in Diffusion for the project(s) you’re interested in than to guess based on which files changed. And doing so might eliminate the need to download that day’s version at all, if there are no changes that seem relevant to you.

I do think there would be interest in this. There’s already a large number of people using the 2.80 daily builds and I’ve seen a lot of requests for 2.80 versions of various add-ons. There was also a studio that developed a GUI app (and a command line app) just for downloading the latest builds that gained a fair bit of attention a while back.

I agree that the setup I used in my tests is not going to work for most end-users, but (as was already mentioned) if it could be wrapped inside an easy-to-use package (like the BlenderUpdater app) I think it would get a lot of use.

I’m not sure Steam would be a good fit, as I don’t think it allows “rollbacks” to previous versions (for cases where a new daily build is DOA). I also don’t think torrenting would work for something that would need a new tracker every day.

While it’s not possible to match the efficiency of “git pull”, in order to reach that level of efficiency most end-users would first have to download over 10 GB worth of dependencies (at least on Windows; I can’t remember the situation on Linux and Mac). That’s a fairly deep hole for people to climb out of before they start seeing bandwidth savings at the “git pull” stage. In comparison, none of the patching tools I mentioned are more than a few hundred kilobytes. The biggest dependency for patching (other than the patches themselves) might be Python, but most Linux and macOS environments already ship with Python installed and the Windows installer is only around 30 MB.

Even with a dev environment set up, I still find myself often using daily builds, in part to make sure I am using the same build as the people using my add-ons.

As far as portability concerns go, the patching tools I listed have already been tested on the 3 operating systems Blender supports (the Mac build of Xdelta may be too outdated, but I think there may be a more recent Mac version in multipatch). Another option could be porting the XDelta3 script to work with Courgette (which should build on all major platforms as it’s what Google uses to create updates for Chrome).

Thanks for all the replies. I’m still not sure how viable this idea is, but my early results were quite promising, so I wanted to share them and get some opinions before I sink any more time in.

Your idea is very viable; as always, it’s a matter of where to find the time to cram in another cool feature and who will do it.

Even for my project, I debate how high a priority an auto-updater is for the near future.

One of the things I discovered during my torrent experiment is that if I re-torrent the same folder, then because BitTorrent uses checksum hashes to identify files, the person that got the previous torrent will only fetch the updated files instead of the entire folder again. Unfortunately, because Blender’s size is mainly the executable itself, this cannot be as efficient as patching. However, the bonus is that the torrent file can be sent as a link to a user, which makes it easier to communicate updates: there is no need to create patches or worry about whether the patch is applied over the correct version, etc. The torrent describes the entire folder, so there is no chance of screwing things up or even compromising security with malicious patches. Thus the torrent takes responsibility for making sure the user gets the exact files intended, without any special setup. This is standard behavior you can get with any torrent client; I have tested it with qBittorrent.

The Blizzard updater (the Battle.net client) uses libtorrent this way to provide updates for all its games, so this is heavily mature technology for updates. Although I do not know the exact torrent technology Blizzard uses, it being closed source.

This technique can be used to go forward in version, backward, or even update a normal build with a customized version, append additional add-ons and files, etc. If download speed or a lack of seeders is an issue, BitTorrent also supports direct HTTPS downloads. If one is paranoid, it also supports downloading without a tracker by declaring one trusted source. There are virtually no limitations.

Most likely there is an open-source implementation out there that does this with patches.

Compression-wise, the good news is that Blender comes with the 7-Zip SDK, known as LZMA.

In my build, that takes the 83 MB Blender.exe and turns it into a 20 MB Blender.7z, and we can safely make the assumption here that the user already has an existing build of Blender. That makes it possible to turn Blender itself into an auto-updater.
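Python ships this same algorithm in its standard lzma module, so a hedged sketch of that compression step (file names and the exact ratio are illustrative; the real ratio depends on the binary) looks like:

```python
import lzma

def compress_file(src_path, dst_path):
    """Compress a file with LZMA, the algorithm behind 7-Zip's .7z format."""
    with open(src_path, "rb") as src, lzma.open(dst_path, "wb", preset=9) as dst:
        dst.write(src.read())

def compression_ratio(data):
    """Compressed size as a fraction of the original size."""
    return len(lzma.compress(data, preset=9)) / len(data)
```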

You could do it with something like rsync. If you simply had a publicly accessible rsync daemon set up with the extracted daily zip archives, you could point an rsync client at the location of the version/platform you want, and rsync would go through and compare checksums on each file; for a file that’s different, it will even recursively compare the parts of the file so that it only needs to transfer the parts that differ.

The beauty of this scheme is that it puts all the responsibility on the client side, and there’s no new development required on the server side apart from some trusted party setting up the public rsync daemon and a process to extract each night’s build.

If it actually became popular, you could implement an rsync add-on within Blender to automate it, and then you might actually be able to substantially reduce bandwidth needed for Blender downloads and thus save BF some money.
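The “compare the parts of the file” step works because rsync uses a weak rolling checksum that can slide through a file one byte at a time in constant time per step. A simplified Adler-style sketch of that idea (an illustration, not rsync’s exact algorithm):

```python
MOD = 65521  # prime modulus, as used by Adler-32

def weak_checksum(block):
    """Adler-style weak checksum over a block of bytes: (plain sum, weighted sum)."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte, add in_byte, in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b
```

Because rolling is O(1), the client can test every byte offset of its local file against the server’s block checksums cheaply, then confirm candidate matches with a strong hash.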

You realise that we get sponsored rack space and free bandwidth from XS4ALL in the rack at the data center, right? :wink:

Also we already have an rsync available, it’s just not anonymous and doesn’t serve the daily builds.

@kilon Oh, you meant using torrents for the hashing feature instead of the distribution setup. Honestly, I don’t think this would be worth the effort. I’ve actually tried to use torrents for patching files in the past, and quite often it failed hard. The reason is that the torrent format doesn’t actually hash files, but file “chunks”.

To see why using a torrent to patch a Blender directory is problematic, I’ll use an arbitrary example. Say a torrent’s chunk size is set to 8 bytes and the torrent contains a single 10-byte file. That file would be split into 2 chunks to be hashed: the first 8 bytes of the file go into the 1st chunk hash and the latter 2 bytes go into a second chunk hash. So what happens if a single byte is added to the beginning of the 10-byte file in an update and a second torrent is created? Anyone trying to “patch” the old 10-byte file using the second torrent will end up re-downloading the whole file. Why? Because torrents check the chunk hashes by their location in the file:

old_file: ABCDEFGHIJ   chunk_1: ABCDEFGH  chunk_2: IJ
new_file: KABCDEFGHIJ  chunk_1: KABCDEFG  chunk_2: HIJ
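This can be demonstrated in a few lines: with the toy 8-byte chunk size from above, prepending a single byte shifts every chunk boundary, so no positional chunk hash matches and a torrent client would consider the whole file stale:

```python
import hashlib

def chunk_hashes(data, chunk_size=8):
    """Hash fixed-size chunks by position, as the torrent format does."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

old = b"ABCDEFGHIJ"
new = b"K" + old  # one byte prepended, as in the update above

old_hashes, new_hashes = chunk_hashes(old), chunk_hashes(new)
# Every chunk boundary has shifted, so no positional hash survives the update
matches = sum(a == b for a, b in zip(old_hashes, new_hashes))
```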

For this reason, I suspect Blizzard may be using some non-standard derivation of the torrent format rather than the original specification. I did a quick search, but did not find any info on it.

Thanks for the info on LZMA though. I didn’t know that was included with Blender.

@Zoot That sounds interesting, but is there something like rsync that’s cross platform? The last time I checked, rsync support on Windows was limited to non-existent (and the rsync version Mac OS shipped with was ancient). If there’s no support on one of Blender’s largest platforms, you would need a custom solution for that platform and we’re back at square one.


Yeap, I agree with your assessment. I am not so sure how it would work in practice; however, from what I’ve seen in libtorrent, it does allow you to customise how you treat patches. You are correct that if the standard BitTorrent protocol is followed, it is not patch-friendly, and you are probably correct that Blizzard most likely used a customised version.

An implementation that could work here is a diff extension for torrents: make a client that reads the binary file, finds which pieces are identical, and then examines the pieces that are not; if it finds similarity inside different pieces, it breaks them into smaller pieces and adds the piece needed for patching. Of course, that should happen at the server end, not at the client, but I think it’s not that hard to do. So I was thinking more of a unification of patching and torrent technology instead of 100% torrent.

In the end, as I said, the main issue is Blender.exe, which is the trickiest to patch because it’s the largest and most frequently updated file. So yeah, it will need a customisation of torrent technology to make this super efficient and minimise the size of the patch. But that is the fun part :wink:

I am not suggesting you should do it; the reason I am discussing it is that I am considering this method for sharing 3D assets without the need for a server, where anyone would be able to upload their own 3D assets, download them, and have access to a potentially “infinite” asset library, and less so for providing software updates. I have shelved the idea for now, but it’s certainly something I will implement at some point, because I am fascinated by torrent technology and also by the idea of not having to mess with web dev at all (eeewww :smiley:), and I threw it in here just for inspiration.


I have not read all the comments, so someone may have already thought about it…
I would welcome an “upgrade” mode where Blender scans for the files that have changed and downloads only those…
This would be a good and simple compromise for everyone…

Something like a real operating system does…

A quick search suggests there are multiple rsync clients and rsync wrapper apps for Windows, though it looks like the free ones require Cygwin and the native ones are all proprietary from what I’ve found so far, so maybe not as good a solution as I was hoping.

Y’all seem to forget that the reason so many people are on buildbot builds right now is that it’s the only way to get 2.8. Once 2.8 stabilizes and gets an official build, we’ll be back to our regular couple-of-times-a-year releases, for which the current infrastructure of manual downloads or automatic updates through Steam/Windows Store will suffice just fine for most users.

With Windows being my platform, I have very little interest in duct-taping some stuff together with rsync on Windows or some hybrid torrent and having to maintain it till the end of time, just because a few of you want to have an easy couple of months before the official release of 2.8.

On LZMA: when we did the last 2.79 release I suggested offering 7z packages (116M zip, 75M 7z), since the packager we use already supports this, but it was decided it would be too confusing for some users.


And you seem to forget that I never suggested torrents as the default way to download Blender 2.8; this is something I research for my own reasons. Quite the contrary, I entered this thread by agreeing with Brecht that there are not that many potential users to make this worthwhile. Not even _nburn pushed for this to be implemented; quite the contrary, he started this thread by asking whether it is possible.

Which is why I joined this discussion.

What you are discussing is whether we should do this officially, when the OP is not even certain this is a viable and worthwhile idea; something I also question about my customised BitTorrent solution for my own set of problems. It’s likely I will do it, but not in the near future.

Bottom line, we use this thread to brainstorm and discuss possible solutions to problems; I do not think we do any harm. Also, I fail to see why your platform of choice is relevant to the BitTorrent protocol. I also fail to see what you are talking about when you say “easy couple of months before the official release of 2.8”. Why is the release of 2.8 relevant to this discussion?

7-Zip can also create SFX packages, which are self-extracting exe files; the increase in size compared to 7z files is negligible.

What you are discussing is whether we should do this officially

I am, given I’m the one who ends up with it.

when the OP is not even certain if this is a viable and worthwhile idea

And I’m giving a heads-up that I have yet to see a convincing reason to do this; hence the energy spent solving the possible technical issues of a problem currently not worth solving may be better spent elsewhere.

But in all fairness, I can’t make you do anything or stop you from doing anything; you’re free to do whatever you want, that is what makes Blender great!

That being said, I personally dislike the combative way you argue and tend to stay out of threads you’re in. I made my standpoint clear, I have nothing further to add; the thread is all yours!


@kilon Heh, the implementation you describe sounds somewhat similar to an idea I came up with back when I was experimenting with torrent patching. I think my idea was to include “binary samples” (say first 8 bytes of the chunk) along with the chunk hashes that would be checked before doing any hashes to ensure you were working with the same data. I even did a few tests based on this idea to see how doable it was, but I can’t remember how they turned out. When I get a chance, I’ll see if I can dig up my notes on that.
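To make that “binary sample” idea concrete, here is a hypothetical sketch (chunk and sample sizes are toy values I picked, not anything from my old notes): store a short leading sample next to each chunk hash, scan the new file for the sample, and only run the hash where the sample matches, so chunks that merely shifted position can be relocated instead of re-downloaded:

```python
import hashlib

CHUNK, SAMPLE = 8, 4  # toy sizes; real chunks would be far larger

def index_chunks(data):
    """For each fixed chunk, record its short leading byte sample plus its hash."""
    return [(data[i:i + SAMPLE], hashlib.sha1(data[i:i + CHUNK]).digest())
            for i in range(0, len(data), CHUNK)]

def relocate(new_data, sample, digest):
    """Find an old chunk inside new_data: scan for the sample, confirm by hash."""
    start = new_data.find(sample)
    while start != -1:
        if hashlib.sha1(new_data[start:start + CHUNK]).digest() == digest:
            return start  # chunk survives at a new offset; no re-download needed
        start = new_data.find(sample, start + 1)
    return None  # chunk's data really did change
```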

For the Blender.exe issue, if you haven’t checked out bsdiff and Courgette I would recommend looking into them. IIRC, they were both purpose-built around patching x86/x64 executables. If you need a Windows version, there’s a compiled 64-bit copy of Courgette in this post. Using that build of Courgette I was able to shrink an update patch for a Blender executable down to around 1 MB! :open_mouth:

@Zoot Yeah, that sounds like the situation I ran into. I tried to set up rsync (or maybe something similar) using Cygwin before when trying to back up files from an old computer, but I ended up giving up in frustration. Maybe the situation with Cygwin and rsync is better now, but the recent comments I saw on stackoverflow and similar sites weren’t very reassuring.


Looks like there are some Python rsync implementations out there (slow), and at least one with accelerated librsync that should theoretically be cross-platform, so there are some pieces available if someone wanted to play with making an add-on.