Binary patching

Hey, it sounds great, right? You have this huge app that takes people a long time to download, and you would like to save them some of that time.

So why not calculate some difference and only send that?

Well... first, where does patching work?

Patching text files (esp. source) works best for small, localized changes.

Here's an example:

Old:
const char message[]="Unable to load library because:%s";
New:
const char message[]="Unable to load library because: %s";

OK, so how expensive is the diff for this change? Probably less than five lines.

When is it not worth it to ship this change as a patch?

When the file is less than five lines.

When is it worth it?

Probably when the file has more than 20 lines.

OK, so for small changes to source files it can be worth it.
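
To put rough numbers on that, here's a minimal sketch using Python's difflib. The filler lines, the file names and the helper functions are all made up for illustration; only the two message lines come from the example above. Note that a unified diff carries headers and context lines, so it's a bit fatter than the plain diff assumed above.

```python
import difflib

OLD_LINE = 'const char message[]="Unable to load library because:%s";'
NEW_LINE = 'const char message[]="Unable to load library because: %s";'


def make_file(interesting_line, total_lines):
    """Build a made-up source file: filler comments with one real line in the middle."""
    filler = [f"/* filler line {i} */" for i in range(total_lines - 1)]
    middle = len(filler) // 2
    return filler[:middle] + [interesting_line] + filler[middle:]


def patch_sizes(total_lines, context=3):
    """Return (lines in the unified diff, lines in the new file)."""
    old = make_file(OLD_LINE, total_lines)
    new = make_file(NEW_LINE, total_lines)
    diff = list(difflib.unified_diff(old, new, "old.c", "new.c",
                                     n=context, lineterm=""))
    return len(diff), len(new)


for size in (3, 5, 20, 200):
    diff_lines, file_lines = patch_sizes(size)
    print(f"{size:>3}-line file: diff is {diff_lines} lines, file is {file_lines} lines")
```

The diff stays about the same size however long the file gets, so for tiny files it's bigger than the file itself, and it only starts to pay off once the file is several times longer than the patch.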

What about changes to XML files?

Well, let's look at a sample XML fragment (which could represent a XUL document if you read 'a' as 'hbox', 'b' as 'label', and 'c' as 'vbox'):

Old:
<a>
  <b/>
</a>
New:
<c>
  <a>
    <b/>
  </a>
</c>

For your standard diff application, the patch is 10+ lines, and frequently a change like this affects the entire file (ignoring the license, which probably shouldn't have been shipped in the first place). So you have a 50-line file and a 110-line diff. This would, of course, never be worth it.
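
Here's a small sketch of why, again with Python's difflib standing in for "your standard diff application" (the file names and the changed_lines helper are invented for the example). It diffs the old and new fragments line by line, first as written and then with leading whitespace stripped, which is the "cheat" a whitespace-insensitive tool could get away with.

```python
import difflib

OLD = """<a>
  <b/>
</a>""".splitlines()

NEW = """<c>
  <a>
    <b/>
  </a>
</c>""".splitlines()


def changed_lines(old, new):
    """Return the added/removed lines of a unified diff, without the two file headers."""
    diff = list(difflib.unified_diff(old, new, "old.xml", "new.xml", lineterm=""))
    body = diff[2:]  # skip the '---' and '+++' header lines
    return [line for line in body if line.startswith(("+", "-"))]


# As written: re-indenting makes every line look different to a line-based diff.
raw = changed_lines(OLD, NEW)

# Ignoring indentation: only the new wrapper element shows up.
stripped = changed_lines([l.strip() for l in OLD], [l.strip() for l in NEW])

print(len(raw), "changed lines with whitespace,", len(stripped), "without")
```

Wrapping three lines in one new element touches all of them once indentation counts, which is exactly how a small structural change turns into a whole-file diff.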

OK, so perhaps standard diff is bad. What about an XPath-based XML diff application, especially if whitespace doesn't matter? Well, that's an interesting question, and I can't answer it. I tried looking for such an app and couldn't find one.

What about binary files?

  • Changes to them come in two flavors: the simple change (fixing an off-by-one error, or adding/changing a letter in a string), which is similar to the .c patch in the first case, and bigger changes, which are more like the XML case. Unlike the XML case, you can't cheat on whitespace; everything is significant.

    So how do changes happen in Mozilla binary files? Well, frequently we change an API in XPCOM or some other core library, and all users of it have to change. Now, it's true that a change could be as simple as changing the spelling of a function, but generally the API change affects the entire calling convention.

    Instead of:

    - void ProcessPendingReqests();
    + void ProcessPendingRequests();
    

    and a few matching changes to the callers, we have things like:

    - nsresult nsIContent::GetDocument(nsIDocument **aDoc);
    + nsIDocument * nsIContent::GetDocument();
    

    Such a change will affect hundreds of files and significantly change both the source code and the generated binary code.

    For some types of changes you can recycle patches (e.g. a change that says replace all instances of ' teh ' with ' the ', or one where you happen to shift the location of a global function by a certain amount), but for binary changes like the ones that change calling conventions, the adjacent code is likely to be very different, and you won't be able to use a simple drop-in replacement system.
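
To see how different the two flavors look at the byte level, here's a toy sketch. The "binary" is just a made-up byte pattern, and positional_diff_count is a hypothetical helper that compares offset by offset; no real binary diff tool is claimed to work this way. It contrasts an in-place one-byte fix with an insertion that shifts everything after it.

```python
def positional_diff_count(old: bytes, new: bytes) -> int:
    """Count offsets where the two byte strings differ, plus any difference in length."""
    differing = sum(1 for a, b in zip(old, new) if a != b)
    return differing + abs(len(old) - len(new))


# A made-up 4 KiB "binary image" (an arbitrary non-repeating-looking pattern).
old_image = bytes(i % 251 for i in range(4096))

# Flavor 1: an in-place fix that flips a single byte.
patched = bytearray(old_image)
patched[100] ^= 0xFF
in_place = bytes(patched)

# Flavor 2: four extra bytes inserted at offset 100, shifting everything after them.
shifted = old_image[:100] + b"\x90" * 4 + old_image[100:]

print("in-place fix:", positional_diff_count(old_image, in_place), "bytes differ")
print("4-byte shift:", positional_diff_count(old_image, shifted), "bytes differ")
```

The shift case is the recyclable kind: one rule ("everything after offset 100 moved by 4") describes it. A calling-convention change has no such rule, since the code around every call site changes in its own way.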

    OK, so that's nice.

    What about pictures?

    They're simple, right?

    Well, yes:

    So when does it make sense to make binary patches?

    When you have very few releases, a very big product, very few changes, and a very large user-base of people who don't make other changes to your product.

    Hrm, so where does Mozilla fit?

    Well.