Making A Humble OpenGL Rotating Cube

Some years ago I was running Ubuntu 20.04 on my desktop computer and I suddenly wanted to talk to the GPU driver. Not through Vulkan or some fancy abstraction; just a "hello world" in the most direct way possible. I didn't document the findings back then, so I will do it now, running Windows 10 on my laptop.

Before pasting textures or making specular/ambient/diffuse lights glow, I had to set up the environment, so I focused on rendering a humble but colorful triangle first.

The Legacy Mode

First, we need to call GLFW to create a window and initialize an OpenGL context. That will be the interface through which the GPU will receive drawing commands. Then, GLEW is initialized to make sure the program can use all the modern OpenGL functions exposed by the driver.

Once that’s done, we configure a projection matrix with gluPerspective() (to simulate a 3D view), enable depth testing (so objects closer to the camera hide those behind them), and set a neutral gray background color with glClearColor().

Inside the main loop, the screen will be cleared every frame, and three vertices will be sent to the GPU using the old immediate mode commands (glBegin, glVertex, glEnd). Each vertex is assigned a color, and the GPU beautifully interpolates them across the triangle. Finally, buffers are swapped (glfwSwapBuffers) to show the rendered image, and events are polled so the window stays responsive.

When the user closes the window, everything will be cleaned up with glfwTerminate().

#include <stdio.h>
#include <GL/glew.h>
#include <GL/glu.h>   /* declares gluPerspective(); we link against glu32 below */
#include <GLFW/glfw3.h>

int main(void) {
    if (!glfwInit()) {
        fprintf(stderr, "Failed to initialize GLFW\n");
        return -1;
    }

    GLFWwindow* window = glfwCreateWindow(800, 600, "Colorful Triangle", NULL, NULL);
    if (!window) {
        fprintf(stderr, "Failed to create window\n");
        glfwTerminate();
        return -1;
    }

    glfwMakeContextCurrent(window);

    // Legacy matrix calls resolve through opengl32.dll / glu32.dll,
    // so they work even before GLEW loads the modern entry points.
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(45.0, 800.0/600.0, 0.1, 100.0);
    glMatrixMode(GL_MODELVIEW);

    if (glewInit() != GLEW_OK) {
        fprintf(stderr, "Failed to initialize GLEW\n");
        glfwTerminate();
        return -1;
    }

    glEnable(GL_DEPTH_TEST);
    glClearColor(0.2f, 0.2f, 0.2f, 1.0f);
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -5.0f);  // push the scene away from the camera

    while (!glfwWindowShouldClose(window)) {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glBegin(GL_TRIANGLES);
            glColor3f(1.0f, 0.0f, 0.0f);
            glVertex3f(-0.5f, -0.5f, 0.0f);
            glColor3f(0.0f, 1.0f, 0.0f);
            glVertex3f( 0.5f, -0.5f, 0.0f);
            glColor3f(0.0f, 0.0f, 1.0f);
            glVertex3f( 0.0f,  0.5f, 0.0f);
        glEnd();
        glfwSwapBuffers(window);
        glfwPollEvents();
    }

    glfwTerminate();
    return 0;
}

Someone could say: "but we don't have multiple objects! gluPerspective() could be omitted safely". The line gluPerspective(45.0, 800.0/600.0, 0.1, 100.0); sets up the aforementioned projection matrix, which defines how 3D coordinates are mapped to the 2D screen. The first gluPerspective parameter is the field of view (FOV) in degrees, the second is the aspect ratio of the window, and the last two define the near and far clipping planes.

Without this projection, OpenGL stays in its default orthographic mode (a flat, untransformed coordinate space where everything between -1 and +1 on each axis is directly mapped to screen space). Since our triangle is translated back with glTranslatef(0.0f, 0.0f, -5.0f), it ends up outside that visible range, and nothing appears on screen. gluPerspective() switches to a perspective projection, where objects farther away appear smaller, and our triangle at z = -5 finally becomes visible.
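
To make the point concrete, here is a minimal sketch (my illustration, not part of the original program) of two ways the triangle stays visible without gluPerspective(). Pick one or the other:

/* Option 1: don't translate at all; the default clip volume
   already spans -1..+1 on every axis, so a triangle at z = 0 is visible. */
glLoadIdentity();                 /* no glTranslatef(0, 0, -5) */

/* Option 2: keep the translation, but use an orthographic volume
   whose near/far planes actually cover z = -5. */
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(-1.0, 1.0, -1.0, 1.0, 0.1, 100.0);
glMatrixMode(GL_MODELVIEW);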

OK, then we can build this using gcc from a terminal. We don’t copy any of the .h header files into the project folder itself. Instead, we declare them in the source code with normal #include statements and tell the compiler where to find them using -I flags (include paths). That keeps the project clean and portable.

gcc triangle.c -o triangle.exe ^
 -I"C:\glew-2.1.0\include" ^
 -I"C:\glfw-3.4\include" ^
 -I"C:\glm" ^
 -L"C:\glew-2.1.0\lib\Release\x64" ^
 -L"C:\glfw-3.4\lib-mingw-w64" ^
 -lglew32 -lglfw3 -lglu32 -lopengl32 -lgdi32 -luser32 -lkernel32

This way, the compiler knows where to look for GL/glew.h, GLFW/glfw3.h, and glm/glm.hpp, without polluting the working directory. Running the built .exe requires having glew32.dll in the same folder. I've uploaded everything (source, glew32.dll and triangle.exe) to a GitHub repository.

The triangle above uses the old fixed pipeline, which, of course, is now deprecated. Modern OpenGL (3.3 and newer) drops all the built-in transformations and immediate-mode calls. Instead, we send vertex data and transformation matrices directly to the GPU through buffer objects and GLSL shaders. The program structure typically expands to three or four small files — one for the main logic and one for each shader (vertex.glsl, fragment.glsl).

Legacy API | What it did implicitly | Modern Replacement
glBegin, glVertex3f, glColor3f | Sent each vertex and color to the driver immediately | Store all vertices in a VBO (Vertex Buffer Object) and draw them with glDrawArrays or glDrawElements
glMatrixMode, glTranslatef, gluPerspective | Managed model, view, and projection matrices internally | You create these matrices manually (with code or GLM) and upload them to the GPU as uniforms
Built-in lighting and colors | Fixed pipeline computed lighting | The fragment shader computes colors and shading
No explicit shader code | GPU ran a default internal program | You now write GLSL shaders explicitly

The Modern Triangle

To make the first “modern” example as small as possible, I rewrote the triangle using:

• GLFW to create the window and OpenGL context
• GLEW to load the modern OpenGL function pointers
• GLSL shaders (vertex + fragment)
• A single array of vertices stored in GPU memory

This version stops using deprecated calls. It still draws the same colored triangle (maybe a bit bigger), but now the vertices go to a buffer, the shader receives them, and the GPU paints the pixels.
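
The core of that rewrite looks roughly like this (a sketch, not the full file: window/context creation and the shader program `prog` are set up exactly as in the cube program shown later):

/* One interleaved buffer, one draw call, no glBegin/glEnd anywhere.
   Assumes a 3.3 core context and a linked program `prog` with
   layout(location=0) vec3 position, layout(location=1) vec3 color. */
float verts[] = {
    /*  x      y     z     r     g     b */
    -0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 0.0f,
     0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 0.0f,
     0.0f,  0.5f, 0.0f, 0.0f, 0.0f, 1.0f
};

GLuint vao, vbo;
glGenVertexArrays(1, &vao);
glGenBuffers(1, &vbo);
glBindVertexArray(vao);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW); /* upload once */
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), (void*)0);
glEnableVertexAttribArray(0);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), (void*)(3 * sizeof(float)));
glEnableVertexAttribArray(1);

/* per frame: */
glUseProgram(prog);
glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, 3);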

Files from this "modern" triangle can be downloaded from the GitHub repository.

From Triangle to Cube

Once that worked, I moved from a 2D triangle to a rotating 3D cube. This required adding a few more things:

• A depth buffer (glEnable(GL_DEPTH_TEST)) so closer faces hide the far ones
• Three matrices (model, view, projection) combined into a single MVP uniform
• A vertex index buffer (EBO) so vertices can be reused for multiple faces
• A simple math layer in C to replace GLM (mat_perspective, mat_rotate, etc.)
• Continuous rotation over time, using glfwGetTime()

At this point, we’re writing modern OpenGL in pure C (that is: no C++, no GLU, no legacy functions) but still drawing efficiently (we are looking for the humble rotating cube, after all) and talking directly to the GPU.

The main.c looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <GL/glew.h>
#include <GLFW/glfw3.h>

#define DEG2RAD(x) ((x) * 0.017453292519943295769f)

// Simple 4x4 matrix utilities (column-major)
static void mat_identity(float *m) {
    for (int i = 0; i < 16; i++) m[i] = 0.0f;
    m[0] = m[5] = m[10] = m[15] = 1.0f;
}

static void mat_multiply(float *r, const float *a, const float *b) {
    float tmp[16];
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            tmp[i*4+j] =
                a[i*4+0]*b[0*4+j] +
                a[i*4+1]*b[1*4+j] +
                a[i*4+2]*b[2*4+j] +
                a[i*4+3]*b[3*4+j];
    for (int i = 0; i < 16; i++) r[i] = tmp[i];
}

static void mat_perspective(float *m, float fov, float aspect, float znear, float zfar) {
    float f = 1.0f / tanf(DEG2RAD(fov) / 2.0f);
    mat_identity(m);
    m[0] = f / aspect;
    m[5] = f;
    m[10] = (zfar + znear) / (znear - zfar);
    m[11] = -1.0f;
    m[14] = (2.0f * zfar * znear) / (znear - zfar);
    m[15] = 0.0f;
}

static void mat_translate(float *m, float x, float y, float z) {
    mat_identity(m);
    m[12] = x;
    m[13] = y;
    m[14] = z;
}

static void mat_rotate(float *m, float angle, float x, float y, float z) {
    float c = cosf(angle), s = sinf(angle);
    float len = sqrtf(x*x + y*y + z*z);
    if (len == 0.0f) { mat_identity(m); return; }
    x /= len; y /= len; z /= len;
    mat_identity(m);
    m[0] = x*x*(1-c)+c;   m[4] = x*y*(1-c)-z*s; m[8]  = x*z*(1-c)+y*s;
    m[1] = y*x*(1-c)+z*s; m[5] = y*y*(1-c)+c;   m[9]  = y*z*(1-c)-x*s;
    m[2] = x*z*(1-c)-y*s; m[6] = y*z*(1-c)+x*s; m[10] = z*z*(1-c)+c;
}

static GLuint compile_shader(GLenum type, const char *src) {
    GLuint s = glCreateShader(type);
    glShaderSource(s, 1, &src, NULL);
    glCompileShader(s);
    GLint ok;
    glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[512];
        glGetShaderInfoLog(s, 512, NULL, log);
        fprintf(stderr, "Shader error: %s\n", log);
    }
    return s;
}

int main(void) {
    if (!glfwInit()) {
        fprintf(stderr, "Failed to init GLFW\n");
        return -1;
    }

    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);

    GLFWwindow *win = glfwCreateWindow(800, 600, "Rotating Cube (C)", NULL, NULL);
    if (!win) { fprintf(stderr, "Window creation failed\n"); glfwTerminate(); return -1; }
    glfwMakeContextCurrent(win);

    glewExperimental = GL_TRUE;
    if (glewInit() != GLEW_OK) {
        fprintf(stderr, "GLEW init failed\n");
        glfwTerminate();
        return -1;
    }

    const char *vshader =
        "#version 330 core\n"
        "layout(location=0) in vec3 aPos;\n"
        "layout(location=1) in vec3 aColor;\n"
        "out vec3 vColor;\n"
        "uniform mat4 MVP;\n"
        "void main(){\n"
        "  gl_Position = MVP * vec4(aPos,1.0);\n"
        "  vColor = aColor;\n"
        "}\n";

    const char *fshader =
        "#version 330 core\n"
        "in vec3 vColor;\n"
        "out vec4 FragColor;\n"
        "void main(){ FragColor = vec4(vColor,1.0); }\n";

    GLuint vs = compile_shader(GL_VERTEX_SHADER, vshader);
    GLuint fs = compile_shader(GL_FRAGMENT_SHADER, fshader);
    GLuint prog = glCreateProgram();
    glAttachShader(prog, vs);
    glAttachShader(prog, fs);
    glLinkProgram(prog);
    glDeleteShader(vs);
    glDeleteShader(fs);

    float vertices[] = {
        // positions        // colors
        -1,-1,-1,  1,0,0,   1,-1,-1,  0,1,0,   1, 1,-1,  0,0,1,  -1, 1,-1,  1,1,0,
        -1,-1, 1,  1,0,1,   1,-1, 1,  0,1,1,   1, 1, 1,  1,1,1,  -1, 1, 1,  0,0,0
    };
    unsigned int indices[] = {
        0,1,2, 2,3,0,  1,5,6, 6,2,1,
        5,4,7, 7,6,5,  4,0,3, 3,7,4,
        3,2,6, 6,7,3,  4,5,1, 1,0,4
    };

    GLuint VAO, VBO, EBO;
    glGenVertexArrays(1, &VAO);
    glGenBuffers(1, &VBO);
    glGenBuffers(1, &EBO);

    glBindVertexArray(VAO);
    glBindBuffer(GL_ARRAY_BUFFER, VBO);
    glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, EBO);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 6*sizeof(float), (void*)0);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 6*sizeof(float), (void*)(3*sizeof(float)));
    glEnableVertexAttribArray(1);

    glEnable(GL_DEPTH_TEST);

    GLint mvpLoc = glGetUniformLocation(prog, "MVP");

    while (!glfwWindowShouldClose(win)) {
        glClearColor(0.1f, 0.1f, 0.1f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        float t = (float)glfwGetTime();
        float rot[16], view[16], proj[16], mv[16], mvp[16];
        mat_rotate(rot, t, 1.0f, 1.0f, 0.5f);
        mat_translate(view, 0.0f, 0.0f, -6.0f);
        mat_perspective(proj, 45.0f, 800.0f/600.0f, 0.1f, 100.0f);

        // mat_multiply(r, a, b) effectively computes b * a for these
        // column-major matrices, so the two calls below build
        // proj * view * model, which is exactly the MVP we want
        mat_multiply(mv, rot, view);
        mat_multiply(mvp, mv, proj);

        glUseProgram(prog);
        glUniformMatrix4fv(mvpLoc, 1, GL_FALSE, mvp);
        glBindVertexArray(VAO);
        glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0);

        glfwSwapBuffers(win);
        glfwPollEvents();
    }

    glfwTerminate();
    return 0;
}

I compile the thing using gcc:

gcc main.c -o cube.exe -I"C:\glew-2.1.0\include" -I"C:\glfw-3.4\include" -I"C:\glm" -L"C:\glew-2.1.0\lib\Release\x64" -L"C:\glfw-3.4\lib-mingw-w64" -lglew32 -lglfw3 -lopengl32 -lgdi32 -luser32 -lkernel32

And then our colorful rotating cube is built:

A Brief Lesson on How Rendering Evolved

From Fixed to Programmable Pipelines

Back in the "immediate mode days" (using techniques like in our first humble 'legacy triangle' render), every call like

glVertex3f(...);

glColor3f(...);

had to cross the CPU → GPU boundary individually. Each vertex, color, and transformation was sent across the system bus (AGP or PCI Express) one call at a time.

That meant:

• Thousands of context switches per frame (driver overhead)
• Redundant data transfers for geometry that never changed (every polygon of the same cardboard tree was transferred each frame)
• The CPU spent more time preparing draw calls than doing actual work

The old fixed pipeline was convenient because it hid most of the complexity. OpenGL handled transformations, lighting, and shading automatically. Developers could just call glVertex3f() or glColor3f() and see results on screen without worrying about how the GPU worked internally. It spared you from thinking about shaders or matrices, but at the cost of doing everything the hard way under the hood. Every draw call forced the driver to rebuild state and resend geometry, making the whole process simple to write but terribly inefficient to run. It looked something like this:

CPU --> Driver --> GPU (draw one triangle, forget it)

From Fixed Function to Programmable Shaders

Modern GPUs shifted the model completely.

Instead of pushing one vertex at a time, the CPU now uploads entire buffers of vertex data to VRAM once through the PCIe bus. The GPU keeps that geometry resident, and the CPU simply tells it what to draw and with which program (shader).

So now the flow is:

CPU --> GPU (upload once)

GPU --> GPU (transform + shade + rasterize per frame)

The CPU does orchestration; the GPU does the math. This is the core reason the Fixed Function Pipeline (FFP) became inefficient. Every glBegin/glEnd block was essentially a separate draw command, wasting bandwidth and driver (CPU) time.

In DirectX terms, this is roughly the leap from the fixed-function era (DirectX 7) to the fully programmable shader models of DirectX 9 and 10: the moment when GPUs stopped being hardwired “triangle drawers” and became general-purpose math engines.

Before shader hardware was unified, GPUs like the GeForce 6800 or Radeon X850 XT had separate vertex and pixel (fragment) processing units. The X850 XT, for instance, featured 16 pixel shaders, 6 vertex shaders, 16 texture mapping units (TMUs), and 16 render output units (ROPs), all specialized hardware blocks.

Shader Type | Runs On | Purpose | Workload
Vertex Shader | GPU vertex units | Transforms 3D coordinates into screen space (applying model, view, projection matrices) | Per vertex (usually thousands per frame)
Fragment Shader | GPU pixel units | Calculates the final color of each pixel (lighting, texture, reflection) | Per pixel (millions per frame)

When programmable shaders arrived (DirectX 9+ / OpenGL 2.0), those dedicated hardware blocks began evolving into general-purpose arithmetic units, capable of running custom shader code written in GLSL or HLSL. That’s when GPUs stopped being rigid “raster machines” and became parallel math processors, capable of everything from game graphics to physics simulation and AI inference.

The Hardware Perspective

Every GPU frame involves two types of bandwidth:

1. PCIe/AGP bandwidth: how fast data moves between CPU and GPU
2. VRAM bandwidth: how fast the GPU can fetch its own geometry, textures, and shader data

The old pipeline kept the PCI/AGP/PCIe bus very busy: every vertex crossed the bus again and again. The modern model minimizes that by keeping all vertex data on the GPU side and only changing what’s necessary (uniforms, transformation matrices, etc.). The modern pipeline prioritizes high VRAM bandwidth, which is far faster than the PCI/AGP/PCIe connection to system RAM.
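
A back-of-the-envelope example (the numbers are illustrative, not measured): a 100,000-triangle scene with 24-byte vertices, re-sent every frame in immediate mode, costs 100,000 × 3 × 24 ≈ 7.2 MB per frame, or roughly 432 MB/s at 60 FPS, before a single pixel is shaded. Uploaded once into a VBO, that same geometry crosses the bus exactly once, and each subsequent frame only a few dozen bytes of uniforms (like our MVP matrix) travel over PCIe.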

That’s why VRAM size matters more than ever: it stores not just textures but entire scene meshes, shader constants, and framebuffers. And it’s why PCIe bandwidth still matters, but less for rendering and more for streaming dynamic content (textures, geometry streaming, or compute buffers).

In the 90s and early 2000s, the CPU often computed transformations, lighting, and even texture effects before sending the results to the GPU. That limited frame rates and made graphics scale linearly with CPU clock speed.

With shaders, almost everything runs on GPU cores (in parallel), freeing the CPU to handle game logic, physics, and I/O. The GPU became a general-purpose parallel processor specialized in vector math and floating-point throughput.

That’s the practical reason we buy newer GPUs: not just for more VRAM or MHz, but for more shader ALUs, register space, and bandwidth to feed those parallel units.

TL;DR: modern OpenGL (and DirectX, Vulkan, etc.) lets you use your GPU for what it was actually designed to do, instead of making your CPU fake it.

Loading Textures: From Colored Cube to Textured Surface

After building the colorful rotating cube, the next step was obvious: make one of its faces carry an actual image. I decided to go full DIY: read a pattern of bytes from disk and map it onto geometry. I explain how later, in the "Notes on Texture Management" section.

Until now, the GPU was shading each pixel based purely on vertex colors. Those colors existed only in GPU registers: values we sent directly through glColor3f (in the fixed pipeline) or vertex attributes (in the shader pipeline). The rasterizer simply interpolated them across triangles to fill the surfaces.

Here I’m using my childhood interpretation of the OpenGL Lego World: a dinosaur playing with Legos.

With texturing, we introduce a second data stream: pixels (or texels) stored in GPU memory. Instead of “coloring” geometry, we now sample color from an image, using 2D coordinates (U,V) that correspond to each vertex.

That means every vertex now carries:

• Its position (x, y, z)
• Its color (r, g, b)
• Its texture coordinates (u, v)
• Optionally, a flag or index telling the shader which texture to use

This turns the vertex into a kind of “data packet”: geometry, color, and image mapping info, all flowing through the same programmable pipeline.
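
In code, that “data packet” is just a wider interleaved vertex. A sketch of the layout, assuming the attribute locations from the cube program plus a new location 2 for UVs and location 3 for the flag (the aIsTextured attribute is this article's convention, not a fixed OpenGL concept):

/* One vertex = position + color + UV + "is textured" flag: 9 floats. */
float vertex0[] = {
    -1.0f, -1.0f, 1.0f,   /* position (x, y, z)          */
     1.0f,  0.0f, 0.0f,   /* color (r, g, b)             */
     0.0f,  0.0f,         /* texture coordinates (u, v)  */
     1.0f                 /* 1 = sample texture, 0 = use vertex color */
};

GLsizei stride = 9 * sizeof(float);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, stride, (void*)0);                   /* aPos        */
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, stride, (void*)(3 * sizeof(float))); /* aColor      */
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, stride, (void*)(6 * sizeof(float))); /* aUV         */
glVertexAttribPointer(3, 1, GL_FLOAT, GL_FALSE, stride, (void*)(8 * sizeof(float))); /* aIsTextured */
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glEnableVertexAttribArray(2);
glEnableVertexAttribArray(3);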

Adding a texture doesn’t just make things prettier, it changes what the GPU does:

Without Textures | With Textures
The fragment shader only interpolates color values. | Each fragment now performs a memory lookup into a texture stored in VRAM.
All color data comes from vertex attributes. | Color data comes from both vertex attributes and image samples.
The memory footprint is negligible. | Texture memory becomes a large consumer of VRAM and bandwidth.
The CPU only sets a few uniform variables. | The CPU must now manage texture objects, filtering modes, and mipmaps.

I went from painting triangles (like a wireframe) to feeding data into a texture pipeline, and that’s the foundation of everything from sprites to physically based rendering.

Why Start with One Textured Face?

Mapping a single image onto one face of the cube makes you understand:

• How texture coordinates (U,V) relate to geometry.
• What happens when vertices are shared between textured and untextured polygons.
• Why identical geometry sometimes needs duplicated vertices with different attributes.

By isolating one textured face and leaving the others colorful, you can literally see how the texture behaves (if it interpolates, bleeds, and responds to transformations), all without the complexity of lighting or multiple materials.

Summarizing:

Stage | Concept Introduced | What Changes in Code
🎨 Colorful Triangle | Vertex color interpolation | Simple per-vertex RGB
🧊 Rotating Cube | Model-view-projection transforms | Added matrices & depth test
🧩 Textured Face | Texture sampling and UV mapping | Added image loading, texture unit, and fragment shader sampling

The files for the cube (sources, DLLs and binaries) are here in my GitHub repository.

Notes on Texture Management

I used a 'raw file' approach to load a texture:

GLuint LoadBMP(const char *imagepath) {
    // Data read from the header of the BMP file
    unsigned char header[54];
    unsigned int dataPos;
    unsigned int imageSize;
    unsigned int width, height;
    // Actual RGB data
    unsigned char *data;

    // Open the file
    FILE *file = fopen(imagepath, "rb");
    if (!file) {
        printf("Image could not be opened\n");
        return 0;
    }

    // Read the header, i.e. the 54 first bytes

    // If less than 54 bytes are read, problem
    if (fread(header, 1, 54, file) != 54) {
        printf("Not a correct BMP file\n");
        fclose(file);
        return 0;
    }

    // A BMP file always begins with "BM"
    if (header[0] != 'B' || header[1] != 'M') {
        printf("Not a correct BMP file\n");
        fclose(file);
        return 0;
    }

    // Make sure this is an uncompressed 24bpp file
    if (*(int*)&(header[0x1E]) != 0) {
        printf("Not a correct BMP file\n");
        fclose(file);
        return 0;
    }
    if (*(int*)&(header[0x1C]) != 24) {
        printf("Not a correct BMP file\n");
        fclose(file);
        return 0;
    }

    // Read the information about the image
    dataPos = *(int*)&(header[0x0A]);
    imageSize = *(int*)&(header[0x22]);
    width = *(int*)&(header[0x12]);
    height = *(int*)&(header[0x16]);

    // Some BMP files are misformatted, guess missing information
    if (imageSize == 0) imageSize = width * height * 3; // 3 : one byte for each Red, Green and Blue component
    if (dataPos == 0) dataPos = 54; // The BMP header is done that way

    // Create a buffer
    data = malloc(imageSize);

    // Jump to the pixel data (usually right after the 54-byte header)
    // and read it from the file into the buffer
    fseek(file, dataPos, SEEK_SET);
    fread(data, 1, imageSize, file);

    // Everything is in memory now, the file can be closed
    fclose(file);

    // Create one OpenGL texture
    GLuint textureID;
    glGenTextures(1, &textureID);

    // "Bind" the newly created texture: all future texture functions will modify this texture
    glBindTexture(GL_TEXTURE_2D, textureID);

    // Give the image to OpenGL
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_BGR, GL_UNSIGNED_BYTE, data);

    // OpenGL has now copied the data. Free our own version
    free(data);

    // Poor filtering, or ...
    //glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    //glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

    // ... nice trilinear filtering ...
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glGenerateMipmap(GL_TEXTURE_2D);

    // Return the ID of the texture we just created
    return textureID;
}
                                            

LoadBMP() is a sort of "bare-metal bitmap loader and OpenGL texture uploader". It:

• Opens a .bmp file directly in binary mode.
• Parses its 54-byte header to extract width, height, bit depth, and data offset.
• Reads the raw RGB data into memory.
• Creates a texture object with glGenTextures and glBindTexture.
• Uploads that data to the GPU using glTexImage2D.
• Configures texture filtering and mipmaps.
• Frees the local buffer and returns the texture ID.
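
Once loaded, using the returned texture in the render loop is short. A minimal sketch (the file name and the uTexture uniform are illustrative, not from the repo):

/* At startup: load the BMP once and keep the handle. */
GLuint tex = LoadBMP("dino.bmp");   /* hypothetical asset name */

/* Each frame, before the draw call: */
glUseProgram(prog);
glActiveTexture(GL_TEXTURE0);                           /* select texture unit 0  */
glBindTexture(GL_TEXTURE_2D, tex);                      /* bind our texture to it */
glUniform1i(glGetUniformLocation(prog, "uTexture"), 0); /* sampler reads unit 0   */
glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0);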

Essentially, it’s the manual path from disk to GPU texture memory. I wanted not just to load a texture, but to witness what happens when raw bytes become color on a polygon. Moreover:

• It avoids external dependencies, so it’s ideal for minimal examples and environments with only core OpenGL + libc.
• It makes the data layout explicit: you literally read the header and compute offsets.
• It reinforces how OpenGL doesn’t care where the data came from: it just needs a correctly formatted memory block and a glTexImage2D call.

You’d rarely ship this in a production engine, but it’s interesting to know what happens behind stbi_load().

Use it when:

• You’re learning the texture pipeline and want to deliberately avoid abstraction.
• You need a minimal self-contained OpenGL sample (no dependencies).
• You control your assets (simple 24-bit BMPs) and portability isn’t a concern.

Avoid it when:

• You need to support PNG, JPEG, DDS, or alpha channels (BMP is limited).
• You need robustness (no endian handling, no compression support, no safety checks).
• You’re targeting modern OpenGL or a real application. In that case use stb_image.h, SOIL2, SDL_image, etc.

What’s Interesting About Raw Texture Loading

• Header parsing: The offsets (0x0A, 0x12, 0x16, etc.) are literal BMP header byte positions.
• Color order: GL_BGR is used because BMP stores pixels as BGR, not RGB.
• Memory ownership: The data is freed immediately after glTexImage2D, because OpenGL copies it into GPU memory.
• Filtering & mipmaps: The function sets up linear filtering and auto-generates mipmaps.

Understanding UV Bleeding

What you’re seeing isn’t a bug. Here, OpenGL politely does texture coordinate interpolation across triangles that don’t have proper texcoords.

When I told the GPU that some cube faces aren’t textured (the ones with aIsTextured = 0), they still share vertices with the textured face, so their corner vertices still contain valid texture coordinates (0,0 … 1,1). OpenGL dutifully interpolates them, even if the shader later discards them, and the fragment shader ends up sampling part of the dinosaur 🦖.
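
A sketch of the fragment shader this implies (the names follow the aIsTextured convention above, and the mix() approach is my illustration of the idea; the actual shader in the repo may differ):

#version 330 core
in vec3 vColor;        // interpolated vertex color
in vec2 vUV;           // interpolated texture coordinates
in float vIsTextured;  // 1.0 on the textured face, 0.0 elsewhere
out vec4 FragColor;
uniform sampler2D uTexture;

void main() {
    vec4 texel = texture(uTexture, vUV);          // sampled even on "untextured" faces
    vec4 base  = vec4(vColor, 1.0);
    FragColor  = mix(base, texel, vIsTextured);   // flag = 0 -> vertex color, 1 -> texture
}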

Each cube face shares corner vertices with the adjacent ones. When one face uses UVs (texture coordinates) and another doesn’t, the shared vertex ends up having UVs anyway; the GPU can’t guess that I didn’t want them used.

The fix: I defined 24 distinct vertices (4 per face) so that each face has independent texture coordinates and attributes. This removes “UV bleeding” between adjacent faces, and it’s also the correct base if you ever want to put different textures (or lighting) per face.
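
For illustration, one face of that 24-vertex layout might look like this (values assumed for the sketch, not copied from the repo):

/* Front face only: 4 dedicated vertices, shared with no other face.
   Layout per vertex: position, color, UV, isTextured flag (9 floats). */
float front_face[] = {
    /*  x      y     z     r     g     b     u     v    flag */
    -1.0f, -1.0f, 1.0f,  1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f,
     1.0f, -1.0f, 1.0f,  1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f,
     1.0f,  1.0f, 1.0f,  1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
    -1.0f,  1.0f, 1.0f,  1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f,
};
/* Two triangles referencing only these four vertices: */
unsigned int front_indices[] = { 0, 1, 2,  2, 3, 0 };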

Closing Thoughts

Rebuilding the lost "rotating cube" with a bit of help from LLMs (GPT-5, Claude 4.5 and Gemini 2.5) and the collective wisdom buried in old blogs, forums, and Stack Overflow threads, I could retrace the full development path: from the fixed-function triangle to a textured cube running entirely on shaders.

This is about understanding what actually happens between the code, the driver, and the GPU. That same pipeline, once made of a few registers and dedicated units, has evolved into a massively parallel machine that can render a cube with a dinosaur, simulate a galaxy, or train a neural network.

Revisiting a “humble” example like this reminds us how far graphics programming has come, and how elegant the fundamentals still are when stripped back to their bare essentials.
