I'm experimenting with the layered window support and finding that it can't "keep up" (i.e., sustain 30 fps) with full-screen (1680x1050) flips/updates on midrange graphics hardware. I'm hoping the original designer of this code (or someone who has hacked on it) is reading this board.

I have a guess as to why the implementation is sluggish, but it's just a theory I'm trying to get to the bottom of.

I noticed that with is_layered() == true, a "shadow_window" is created for off-screen rendering, and at update time the bits are sucked out of that window into a DIB and then shown with the Win32 UpdateLayeredWindow call.

I did some profiling and I believe the "suck out of the shadow window into the DIB" step is pretty expensive, probably because graphics hardware is optimized in one direction: blitting from system RAM into graphics RAM is fast, but reading back the other way is not. (Just a theory!)
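To put a rough number on that readback, here is a back-of-the-envelope sketch. It assumes a 32-bit frame (4 bytes per pixel), which is my assumption rather than something I've confirmed in the code, and the function names are made up for illustration.

```cpp
#include <cstdint>

// Bytes in one full frame, ASSUMING 32 bits (4 bytes) per pixel.
constexpr std::uint64_t frame_bytes(std::uint64_t width, std::uint64_t height)
{
    return width * height * 4;
}

// Bytes per second that have to come back from the graphics card
// if every frame is read back in full at the given frame rate.
constexpr std::uint64_t readback_bytes_per_second(std::uint64_t width,
                                                  std::uint64_t height,
                                                  std::uint64_t fps)
{
    return frame_bytes(width, height) * fps;
}
```

For 1680x1050 that works out to 7,056,000 bytes per frame, or 211,680,000 bytes per second (roughly 212 MB/s) at 30 fps, all of it moving in the "slow" direction if my theory holds.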

My question is (and I'll probably experiment with this): why not render directly into a DIB, and then blit that? My hunch is that much higher frame rates would be achievable. I noticed a commented-out PFD_DRAW_TO_BITMAP in opengl_creation_helper.cpp, so I wonder whether someone has already experimented with this technique.
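For concreteness, here is a minimal sketch of what I mean by rendering directly into a DIB. It is not taken from the library: dib_target, create_dib_target and dib_stride_bytes are hypothetical names, the 32-bit color, 8-bit alpha and 16-bit depth requests are assumptions, and ChoosePixelFormat only returns the closest match to what is asked for. The Win32 part is guarded so the stride helper still compiles elsewhere.

```cpp
#include <cstddef>

// DIB scanlines are padded to a multiple of 4 bytes.
constexpr std::size_t dib_stride_bytes(std::size_t width_px, std::size_t bits_per_pixel)
{
    return ((width_px * bits_per_pixel + 31) / 32) * 4;
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical holder for a memory DC with a DIB section selected into it.
struct dib_target
{
    HDC     dc     = nullptr;
    HBITMAP bitmap = nullptr;
    void*   bits   = nullptr; // pixel memory, directly readable by the CPU
};

// Creates a 32-bit DIB section and gives its DC a PFD_DRAW_TO_BITMAP pixel
// format. Returns false (and cleans up) if any step fails.
inline bool create_dib_target(int width, int height, dib_target& out)
{
    BITMAPINFO bmi = {};
    bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth       = width;
    bmi.bmiHeader.biHeight      = height; // positive height: bottom-up DIB (the default)
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    out.dc = CreateCompatibleDC(nullptr);
    if (!out.dc) return false;

    out.bitmap = CreateDIBSection(out.dc, &bmi, DIB_RGB_COLORS, &out.bits, nullptr, 0);
    if (!out.bitmap) { DeleteDC(out.dc); out = dib_target{}; return false; }
    SelectObject(out.dc, out.bitmap);

    PIXELFORMATDESCRIPTOR pfd = {};
    pfd.nSize      = sizeof(pfd);
    pfd.nVersion   = 1;
    // No PFD_DOUBLEBUFFER: bitmap targets are single-buffered.
    pfd.dwFlags    = PFD_DRAW_TO_BITMAP | PFD_SUPPORT_OPENGL | PFD_SUPPORT_GDI;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32; // assumed; must agree with the DIB's bit depth
    pfd.cAlphaBits = 8;  // requested only; the chosen format may differ
    pfd.cDepthBits = 16; // assumed
    pfd.iLayerType = PFD_MAIN_PLANE;

    const int format = ChoosePixelFormat(out.dc, &pfd);
    if (format == 0 || !SetPixelFormat(out.dc, format, &pfd))
    {
        DeleteObject(out.bitmap);
        DeleteDC(out.dc);
        out = dib_target{};
        return false;
    }
    return true; // caller can now try wglCreateContext(out.dc)
}
#endif // _WIN32
```

The idea would be to create a GL context on out.dc, draw, call glFinish(), and pass out.dc straight to UpdateLayeredWindow as the source DC, with no readback step. One caveat I'm aware of: on Windows, PFD_DRAW_TO_BITMAP formats are typically served by Microsoft's generic software OpenGL implementation rather than the hardware driver, so this may just trade the readback cost for software rendering. That is exactly what I'd want to measure.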

Any comments, tips, or wisdom from prior attempts are *more* than welcome!