This way, I can use immediate mode, which is way faster than using
buffers for some reason. Since I'm not using profiles anymore, I
dropped the minimum requirement to OpenGL 3.1. If a driver doesn't
support Legacy GL, then it can use the slow buffer code.
But seriously, I need to figure out why using buffers is so slow.
If slow buffers were a common problem, Modern OpenGL wouldn't have
made them the only option.
For some reason CPU usage is still double that of the SDLTexture
backend (SDL2 uses OpenGL 2.1, with glEnable/glDisable-style
immediate mode).
If I downgrade to OpenGL 2.1, and use VBO-less glDrawArrays, I get
great performance. I just wish I knew what the AMD driver is doing
that's so much faster.
Turns out performance is absolutely abysmal on my laptop's copy of
Windows 10 (AMD A9 APU).
This is only one of the weird bottlenecks: glFramebufferTexture2D
is a CPU sinkhole, so don't call it often.
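
One way to act on "don't call it often" is to remember which texture
is currently attached and skip the call when nothing changes. This is
a minimal sketch, not necessarily what this commit does: TargetCache,
AttachFn and CountingAttach are hypothetical names, and the function
pointer stands in for the real glFramebufferTexture2D call so the
example runs without a GL context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the expensive attach. In a renderer this
   would wrap glFramebufferTexture2D(GL_FRAMEBUFFER,
   GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, texture, 0). */
typedef void (*AttachFn)(unsigned int texture);

typedef struct {
    AttachFn attach;       /* performs the actual attach */
    unsigned int current;  /* texture attached by the last call */
    int has_current;       /* 0 until the first attach happens */
} TargetCache;

/* Stub used for demonstration: it only counts how often it runs. */
static unsigned int attach_calls = 0;
static void CountingAttach(unsigned int texture)
{
    (void)texture;
    ++attach_calls;
}

static void TargetCache_Init(TargetCache *cache, AttachFn attach)
{
    assert(cache != NULL && attach != NULL);
    cache->attach = attach;
    cache->current = 0;
    cache->has_current = 0;
}

/* Returns 1 if the attach was issued, 0 if it was skipped because the
   requested texture is already the current target. */
static int TargetCache_Set(TargetCache *cache, unsigned int texture)
{
    if (cache->has_current && cache->current == texture)
        return 0;
    cache->attach(texture);
    cache->current = texture;
    cache->has_current = 1;
    return 1;
}
```

With this in place, drawing ten quads to the same target costs one
attach instead of ten. The cache has to be reset if anything else
changes the framebuffer attachment behind its back.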
This fixes a weird glitch you'd see in the enhanced branch when the
screen is scaled (which happens if you resize the window).
When the screen is scaled, OpenGL uses linear interpolation. This
would cause it to fetch samples from outside the texture. IIRC, the
default wrap mode is GL_REPEAT, which causes it to blend with samples
from the opposite side of the texture, creating strange pixels at the
edges of the screen.
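
The effect is easy to reproduce on the CPU. The sketch below is a
simplified one-dimensional model of linear filtering, not driver code,
and WrapMode, WrapIndex and SampleLinear are made-up names. In OpenGL
itself the wrap mode is selected with glTexParameteri on
GL_TEXTURE_WRAP_S and GL_TEXTURE_WRAP_T, and GL_CLAMP_TO_EDGE is the
mode that avoids the wrap-around.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { WRAP_REPEAT, WRAP_CLAMP_TO_EDGE } WrapMode;

/* Floor without pulling in libm. */
static int FloorToInt(float x)
{
    int i = (int)x;  /* truncates towards zero */
    return ((float)i > x) ? i - 1 : i;
}

/* Maps a possibly out-of-range texel index back into the texture. */
static int WrapIndex(int i, int size, WrapMode mode)
{
    if (mode == WRAP_REPEAT)
        return ((i % size) + size) % size;
    if (i < 0)
        return 0;
    if (i >= size)
        return size - 1;
    return i;
}

/* Linearly filtered lookup at normalised coordinate u in [0, 1]. */
static float SampleLinear(const float *texels, int size, float u,
                          WrapMode mode)
{
    float x, frac;
    int i0;

    assert(texels != NULL && size > 0);
    x = u * (float)size - 0.5f;  /* texel centres sit at i + 0.5 */
    i0 = FloorToInt(x);
    frac = x - (float)i0;
    return texels[WrapIndex(i0, size, mode)] * (1.0f - frac)
         + texels[WrapIndex(i0 + 1, size, mode)] * frac;
}
```

For a four-texel row that is black except for a white rightmost texel
({0, 0, 0, 1}), sampling at the left edge (u = 0) gives 0.5 with
WRAP_REPEAT, because half of the white texel bleeds in from the other
side, and 0.0 with WRAP_CLAMP_TO_EDGE.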
Goddammit 3.2 is so complex.
The reason I want 3.2 is that I'm not convinced Legacy OpenGL will
be well-supported in the future. 3.2 is about as far back as I can go
without breaking forward-compatibility (it was the first version to
introduce the Core Profile).
I'd replace it with a custom matrix, but I get the feeling the
overhead of uploading a 4x4 matrix every quad is higher than just
manipulating the vertices CPU-side.
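
For reference, the CPU-side approach amounts to something like the
following. It is a sketch under assumptions: Vertex and
QuadToClipSpace are hypothetical names, and it assumes a pixel-space
rectangle with the origin at the top-left being mapped to OpenGL's
-1..1 clip space with Y pointing up.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vertex;

/* Converts a rectangle given in pixels (origin top-left, Y down) to
   four clip-space corners, ordered top-left, top-right,
   bottom-right, bottom-left. */
static void QuadToClipSpace(Vertex out[4],
                            float left, float top,
                            float right, float bottom,
                            float target_width, float target_height)
{
    float x0, x1, y0, y1;

    assert(out != NULL);
    assert(target_width > 0.0f && target_height > 0.0f);

    x0 = left / target_width * 2.0f - 1.0f;
    x1 = right / target_width * 2.0f - 1.0f;
    y0 = 1.0f - top / target_height * 2.0f;     /* flip Y */
    y1 = 1.0f - bottom / target_height * 2.0f;

    out[0].x = x0; out[0].y = y0;
    out[1].x = x1; out[1].y = y0;
    out[2].x = x1; out[2].y = y1;
    out[3].x = x0; out[3].y = y1;
}
```

That is four multiply-adds per quad, against sixteen floats pushed
through a uniform upload for the matrix version. Whether it actually
wins depends on the driver, so it's worth measuring.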
Saves us having to do a bunch of extra legwork (setting up the blank
default texture, setting the colour modifier to white, etc.), and
should improve performance.
While I was at it, I made the colour modifier a uniform, so it only
has to be set once per quad, rather than once per vertex.
This replaces the glBegin/glEnd stuff. Even though vertex arrays
aren't removed from newer OpenGL versions like glBegin/glEnd are,
they *are* deprecated, so I want to switch to VBOs eventually.