For adaptive sampling you may not need the entire image, just the local neighborhood, as with denoising. If the entire image is needed, it’s possible to make that work at the cost of extra memory usage or the overhead of saving and loading data to and from a disk cache.
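To make the "local neighborhood" idea concrete, here is a minimal sketch of the window a tile would need to read: the tile grown by some radius and clamped to the image bounds. The names `Rect` and `padded_window` and the `radius` parameter are hypothetical and only for illustration; this is not code from the renderer.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Pixel rectangle; x0/y0 are inclusive, x1/y1 are exclusive. (Hypothetical type.)"""
    x0: int
    y0: int
    x1: int
    y1: int


def padded_window(tile: Rect, radius: int, width: int, height: int) -> Rect:
    """Return the tile grown by `radius` pixels on every side, clamped to the image.

    `radius` stands in for however far the adaptive sampling or denoising
    filter looks around a pixel; the actual value is an assumption here.
    """
    return Rect(
        max(tile.x0 - radius, 0),
        max(tile.y0 - radius, 0),
        min(tile.x1 + radius, width),
        min(tile.y1 + radius, height),
    )
```

Only the pixels inside this window would have to stay in memory while the tile is processed, which is what keeps the cost local rather than tied to the full image.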
For work scheduling, my plan is to allow devices to render many small tiles at once for better work distribution. For CPU rendering it would also be possible to add code for multiple cores to work on the same tile, distributing the individual pixels between them. I don’t think dividing tiles into smaller tiles is the right strategy.
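As a rough illustration of several cores sharing one tile, the sketch below splits a tile's pixels into near-equal contiguous index ranges, one per worker. The function names and the choice of contiguous ranges are assumptions made for this example; an interleaved or work-stealing split would be just as valid.

```python
def split_tile_pixels(tile_width: int, tile_height: int, num_workers: int) -> list[range]:
    """Split the tile's linear pixel indices into one contiguous range per worker.

    Range sizes differ by at most one pixel, so no worker gets noticeably
    more work than another (assuming roughly equal cost per pixel).
    """
    total = tile_width * tile_height
    base, extra = divmod(total, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional pixel each.
        count = base + (1 if worker < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges


def pixel_coords(index: int, tile_width: int) -> tuple[int, int]:
    """Convert a linear pixel index within the tile to (x, y), row-major."""
    y, x = divmod(index, tile_width)
    return x, y
```

Each worker would then loop over its own range and call `pixel_coords` to find the pixel to render, so the tile itself never has to be subdivided into smaller tiles.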