I’m not familiar with the VSE code, but to me this sounds like a plan to implement a node system. I guess going in that direction would have to be a strategic decision.
- Using a meta strip (and a transform effect strip) would be the obvious way to transform logically connected / “hierarchical” image strips. But there is the concatenation problem.
- Using only image strips (no meta strip) can be a kind of solution, but it is not user friendly at all. It takes more work to achieve the “same” result as with meta strips, and it is very hard to modify things once the image strips are animated: we have to key every image strip instead of one meta strip. AND the transformation is not filtered (a related topic is here).
- Using image strips and transform effect strips (no meta strip) has basically the same issue as the first option: there is no transform concatenation.
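To make clear what I mean by transform concatenation, here is a minimal sketch in plain Python. It is not VSE code and does not use any Blender API; the helper names (`affine`, `concat`, `apply`) and the numbers are made up for illustration. The idea is that the meta strip's transform and the image strip's transform are multiplied into one matrix first, so the image would only have to be resampled (filtered) once.

```python
import math

def affine(tx=0.0, ty=0.0, angle_deg=0.0, scale=1.0):
    """Hypothetical helper: 3x3 matrix that scales, rotates, then translates."""
    a = math.radians(angle_deg)
    c, s = math.cos(a) * scale, math.sin(a) * scale
    return [[c, -s, tx],
            [s,  c, ty],
            [0.0, 0.0, 1.0]]

def concat(parent, child):
    """Matrix product parent * child: the child is applied first, then the parent."""
    return [[sum(parent[i][k] * child[k][j] for k in range(3))
             for j in range(3)]
            for i in range(3)]

def apply(m, x, y):
    """Transform the 2D point (x, y) with the 3x3 matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Assumed example values: the meta strip rotates and moves the whole group,
# the image strip scales and offsets one image inside it.
meta = affine(tx=100.0, ty=0.0, angle_deg=90.0)
image = affine(tx=10.0, ty=0.0, scale=2.0)

# Concatenated: one matrix, so one resampling pass would be enough.
combined = concat(meta, image)
point = apply(combined, 1.0, 0.0)

# Without concatenation: two separate passes, each one a resampling step.
step1 = apply(image, 1.0, 0.0)
two_pass = apply(meta, *step1)
```

Both ways put the point in the same place; the difference is that the two-pass route resamples the pixels twice, while the concatenated matrix would need only one filtered pass.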
I hope this helps.
Side note: unfortunately, File / External Data / Pack Resources does not work with the VSE.