Renderer 2.x - Basic 3D algorithms

 

 

    (November 2004)

One of my hobbies that has persisted over the years is my real-time pure-software 3D renderer. I began writing it in the days of Hercules and CGA cards... so it has come a long way :-)

I always try to find ways to make it (a) clearer in terms of code, and (b) faster in terms of execution.

The former is mostly accomplished via C++ templates, which can completely unify the rendering logic for most rasterizer modes. As for speed, we are now firmly in the age of multi-core CPUs, so real-time software rendering can (finally) do per-pixel lighting and soft shadows.

Rendered statue
Looks like ray-tracing, but runs
at 60fps on a 4-core Phenom!

Features

This is a (more or less) clean implementation of the basic algorithms in polygon-based 3D graphics. The code includes...

  • 3D transformations (from object coordinates, to world coordinates, to screen coordinates)
  • Point rendering (vertex-based or triangle-based)
  • Gouraud shading (complete Phong equation calculated per vertex)
  • Phong shading (complete Phong equation calculated per pixel)
  • Z-Buffer
  • Shadow mapping (with soft shadows)
  • Portable display and keyboard handling through libSDL
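
The difference between the Gouraud and Phong entries above is only where the lighting equation is evaluated: once per vertex (then interpolated) or once per pixel. As a rough sketch of that equation for a single light, and not the renderer's actual code, here is the classic Phong reflection model. The names `Vec3`, `Material` and `phongIntensity` are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type; the renderer's own types may differ.
struct Vec3 { double x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Illustrative material coefficients (assumed names, not from the sources).
struct Material { double ambient, diffuse, specular, shininess; };

// Phong reflection model for one light: ambient + diffuse + specular.
// n       : surface normal (unit length)
// toLight : direction from the surface point to the light (unit length)
// toEye   : direction from the surface point to the viewer (unit length)
inline double phongIntensity(Vec3 n, Vec3 toLight, Vec3 toEye, const Material& m)
{
    assert(m.shininess >= 0.0);
    const double nDotL = dot(n, toLight);
    if (nDotL <= 0.0)
        return m.ambient;                       // light is behind the surface
    // Mirror the light direction about the normal: r = 2(n.l)n - l
    const Vec3 r = n * (2.0 * nDotL) - toLight;
    const double rDotV = std::max(0.0, dot(r, toEye));
    return m.ambient + m.diffuse * nDotL + m.specular * std::pow(rDotV, m.shininess);
}
```

A Gouraud-style rasterizer would call `phongIntensity` three times per triangle and interpolate the results, while a Phong-style one would interpolate the normal and call it for every pixel, which is why the latter is so much more expensive.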

The supported 3D formats are:

  • .3ds, i.e. the well known 3D Studio format (via lib3ds), and...
  • .tri, a simple binary dump of vertex and triangle data.
  • .ply, only objects saved from shadevis are supported.

Implementation wise, the code...

  • Is orchestrated via autoconf/automake, so it will compile and run cleanly on most platforms (tested so far on Linux/x86, Mac OS/X, Windows (using MinGW gcc), OpenSolaris (GCC/TBB), OpenBSD/amd64 and FreeBSD/amd64)
  • Includes a separate VisualC directory for Windows/MSVC users, with all dependencies pre-packaged for easy compilation
  • Can be configured to use OpenMP, provided that your compiler's support for OpenMP is mature enough (e.g. GCC since version 4.3.2)
  • Can be configured to use Intel Threading Building Blocks, thus taking advantage of multi-core CPUs and executing faster
  • Uses C++ template-based metaprogramming, in order to move as much rendering logic as possible from run-time to compile-time.
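
To give an idea of what moving rendering logic to compile-time can look like, here is a minimal sketch, assuming hypothetical mode classes `AmbientMode` and `PhongMode` and a hypothetical `fillSpan` helper (none of these names come from the actual sources). The scanline loop is written once, and the per-pixel decision is resolved by the compiler instead of by a run-time `switch`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mode classes: each one supplies the per-pixel shading rule.
struct AmbientMode {
    static float shade(float ambient, float /*lit*/) { return ambient; }
};
struct PhongMode {
    static float shade(float ambient, float lit) { return ambient + lit; }
};

// One scanline loop shared by all modes. Mode::shade is selected at
// compile time, so the inner loop contains no per-pixel mode test.
template <typename Mode>
void fillSpan(std::vector<float>& row, std::size_t x0, std::size_t x1,
              float ambient, float lit)
{
    assert(x0 <= x1);
    for (std::size_t x = x0; x < x1 && x < row.size(); ++x)
        row[x] = Mode::shade(ambient, lit);
}
```

Calling `fillSpan<PhongMode>(row, 1, 3, 0.25f, 0.5f)` instantiates a loop specialized for that mode; a real rasterizer would pass interpolated normals, depths and colors rather than two floats.
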
Speed-wise...

This is a software-only renderer, so don't expect hardware-class (OpenGL) speeds. Then again, speed is a relative thing: the train object (available inside the source package, in the "3D-Objects" folder) was rendered at a meager 6fps on my Athlon XP, back in 2003. Around 2005, however, my Pentium4 desktop at work took this up to 11fps. As of 2007, by way of Intel's Threading Building Blocks (or OpenMP), the code uses both cores of my Core2Duo to run at 23fps... And since it uses TBB/OpenMP, it will automatically make use of any additional cores... so give the CPUs a few more years... :-)

Update, November 2009: On a 4-core AMD Phenom at 3.2GHz, the train now spins at 80 frames per second... Guess I proved my point :-)

The code also runs 20-25% faster if compiled under 64-bit environments.

Screenshots

Shadow mapping can render self-shadowing 3D objects in real-time (61KB image). And it can work with multiple light sources (the same 3D object, this time lit by two lights) (63KB image). Self-shadowing is easily identifiable, especially if complex models are used and you zoom in.
Chessboard
This scene displays at around 50fps... I must hook this up with gnuchess! :-)
 
The program allows changing the rendering mode at runtime, thus allowing interactive control of the balance between rendering speed and rendering quality:
 
Rendering modes, left to right: Points, Ambient occlusion, Per-pixel Phong, Shadow maps

In case you were wondering, here is what a shadow map looks like (67KB image).

Shadow maps are "special pictures" rendered along the normal rendering pipeline, but from the point of view of the light source. They provide the "light-height" information that tells the rasterizer when the pixel drawn is in shadow and when not. Normally, these shadows are sharp, and don't look good; so instead of looking at one "shadow pixel" only, we also look at its neighbours, and we thus get nice looking soft-shadows...
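
The neighbour-averaging idea in the paragraph above can be sketched as follows. This is an illustration under assumptions, not the renderer's code: `ShadowMap`, `softShadow`, the 3x3 window and the `bias` value are all hypothetical choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shadow map: for each texel, the distance from the light
// to the closest surface seen through that texel.
struct ShadowMap {
    int width, height;
    std::vector<float> depth;   // width * height entries, row-major
    float at(int x, int y) const {
        return depth[static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                     + static_cast<std::size_t>(x)];
    }
};

// Returns the fraction of light reaching a point that projects onto
// shadow-map texel (sx, sy) at distance `dist` from the light:
// 1.0 = fully lit, 0.0 = fully in shadow. Testing a single texel gives
// a hard 0/1 edge; averaging the test over the 3x3 neighbourhood
// produces intermediate values, i.e. a soft edge.
inline float softShadow(const ShadowMap& map, int sx, int sy, float dist,
                        float bias = 0.01f)
{
    assert(map.depth.size() ==
           static_cast<std::size_t>(map.width) * static_cast<std::size_t>(map.height));
    int lit = 0, samples = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const int x = sx + dx, y = sy + dy;
            if (x < 0 || y < 0 || x >= map.width || y >= map.height)
                continue;                        // skip texels outside the map
            ++samples;
            if (dist <= map.at(x, y) + bias)     // nothing closer to the light
                ++lit;
        }
    }
    return samples ? static_cast<float>(lit) / static_cast<float>(samples) : 1.0f;
}
```

The small `bias` is a common guard against a surface shadowing itself because of rounding; the rasterizer would multiply the diffuse and specular terms by the returned factor.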

Castle
Self-shadowing at its finest...

Download, compile and run

Pharaoh
  
Lit with two light sources
  
The source code is available under the GPL (current version: 2.1n, November 15, 2009). Windows binaries are also available (compiled with TDM/MinGW and Pthread-W32).
Update: I used the renderer to create simple models of spiral galaxies for my nephews :-) (The galaxies are rendered as point clouds; here are the Windows binaries.)

For Windows/MSVC users:

Just open the project solution (under VisualC/) and compile for Release mode. It is configured by default to use Intel TBB for multithreading, since Microsoft decided to omit OpenMP support from the free version of its compiler (the Visual C++ Express Edition). All dependencies (include files and libraries for SDL and TBB) are pre-packaged under VisualC/, so compilation is as easy as it can get.

When the binary is built, right-click on "Renderer-2.x" in the Solution explorer, and select "Properties". Click on "Configuration Properties/Debugging", and enter ..\..\3D-Objects\statue.ply inside the "Command Arguments" text box. Click on OK, hit Ctrl-F5, and you should be seeing the statue spinning. Use the controls described below to fly around the object.

The default compilation options are set for maximum optimization, using SSE2 instructions.

If you have the commercial version of the compiler (which supports OpenMP) you can switch from TBB to OpenMP:

  • Configuration Properties - C/C++ - Language - OpenMP: Set To "Yes"
  • Configuration Properties - Preprocessor - Definitions: Change USE_TBB to USE_OPENMP
...and recompile.

For everybody else (Linux, BSDs, Mac OS/X, etc)

Compilation follows the well known procedure...

  bash$ ./configure
  bash$ make
The source package includes a copy of the sources for lib3ds 1.3.0, and the build process will automatically build lib3ds first.

CPU-specific optimizations can also be used if CXXFLAGS is passed to configure, like this:

  bash$ CXXFLAGS="-O3 -mfpmath=sse -march=core2 -msse \
      -msse2 -msse3 -mrecip" ./configure
  bash$ make
On my Core2Duo these options increase rendering speed by 24%. Compiling under 64-bit environments (e.g. AMD64 or Intel EM64T) improves speed further; compiled with the same options, the code runs 25% faster under my 64-bit Debian.
A note for Mac OS/X developers: The default Mac OS/X development environment (Xcode) includes an old version of GCC (4.2.x). This version is known to have issues with OpenMP, so if you do use it, your only available option on multicore machines is Intel TBB (which works fine). You can, however, download the latest GCC from High Performance Computing for Mac OS/X, which offers the GCC 4.4.x series. Results are much better this way: the 4.4.x series supports the SSE-based -mrecip option, which boosts the speed by more than 30%, and it also includes mature OpenMP support.

Stolen Parthenon (Elgin) marbles
Some of the stolen Parthenon art (Elgin marbles)...
 
After a successful make, fly around the objects with:
  bash$ cd 3D-Objects
  bash$ ../src/renderer/renderer statue.ply
  • Hit 'R' to stop/start auto-spin.
  • Use the cursor keys, 'A' and 'Z' to pilot.
  • Rotate the light with 'W', 'Q'.
  • 'S' and 'F' are 'strafe' left/right, 'E' and 'D' are 'strafe' up/down.
    (strafe keys don't work in auto-spin mode).
  • Page up/page down change the rendering mode, cycling through:
    • Points
    • Points via triangle culling
    • Ambient (when ambient occlusion data are available in the 3D model, this actually looks good)
    • Gouraud (complete Phong lighting per vertex)
    • Phong (complete Phong lighting per pixel)
    • Phong and shadow maps
    • Phong and soft shadow maps
  • ESC quits.
Try the other 3D objects, too: trainColor.tri, legocar.3ds, pharaoh.ply, etc...

Command line parameters

Usage: renderer [OPTIONS] [FILENAME]

  -h         this help
  -r         print FPS reports to stdout (every 5 seconds)
  -b         benchmark rendering of N frames (default: 100)
  -n N       set number of benchmarking frames
  -m <mode>  rendering mode:
       1 : point mode
       2 : points based on triangles (culling,color)
       3 : triangles, ambient colors
       4 : triangles, Gouraud shading, ZBuffer
       5 : triangles, per-pixel Phong, ZBuffer
       6 : triangles, per-pixel Phong, ZBuffer, Shadowmaps
       7 : triangles, per-pixel Phong, ZBuffer, Soft shadowmaps

Creating more 3D objects on your own

  1. Use MeshLab to convert your 3D object to .PLY.
  2. Load it up in shadevis and hit ENTER to have shadevis calculate the ambient occlusion factors per vertex. After that, hit 'D' as many times as necessary to lower the diffuse light to 0%, and hit 'a' to pump up the ambient to 100%. Hit 'S' to save the object.
  3. Load the saved '..._vis.ply' with my renderer.
Enjoy!

 

Rant 1: Why did you do this, you crazy person?
Tie fighter
The dark side... of coding SMP
  

Well...

I've always loved coding real-time 3D graphics. I always experimented with algorithms, always tried to make things run faster, look better... And as a side effect, I became a better coder for it :-)

Anyway, these sources are my "reference" implementations. At some point around 2003, I decided that it was time to clean up the code mess that I had been hacking on over the years and focus on code clarity, ignoring execution speed. To that end, floating point is used almost everywhere (fixed-point begone!) and, this being Phong shading, the complete lighting equation is calculated per pixel. I basically created a "clean" implementation of everything I have ever learned about polygon-related graphics. The clarity of the code also paved the way for the OpenGL version...

Rant 2: Tales of Multicore

This code was single threaded until late 2007. At that point, I heard about OpenMP, and decided to try it out. I was amazed at how easy it was to make the code "OpenMP-aware": I simply added a couple of pragmas in the for-loops that drew the triangles and the shadow buffers, and ...presto!

The only things I had to change were static variables, which had to be moved to stack space. Threaded code can't tolerate shared global/static data: race conditions appeared immediately, as soon as more than one thread worked on them.
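
As a sketch of that kind of change (with hypothetical names `shadeTriangle` and `shadeAll`, not taken from the sources): a scratch buffer that used to be `static`, and therefore shared by all threads, becomes an ordinary local variable, so each thread works on its own copy. The `#ifdef _OPENMP` guard is only there so the snippet also builds cleanly without OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-triangle work. Had `scratch` been declared `static`,
// every thread would read and write the same three ints: a race condition.
inline int shadeTriangle(int triangleIndex)
{
    int scratch[3];                      // stack storage: one copy per call/thread
    for (int i = 0; i < 3; ++i)
        scratch[i] = triangleIndex * 3 + i;
    return scratch[0] + scratch[1] + scratch[2];
}

inline std::vector<int> shadeAll(int triangleCount)
{
    assert(triangleCount >= 0);
    std::vector<int> result(static_cast<std::size_t>(triangleCount));
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (int i = 0; i < triangleCount; ++i)
        result[static_cast<std::size_t>(i)] = shadeTriangle(i);  // each iteration owns its slot
    return result;
}
```

Because every iteration only touches its own locals and its own output slot, the loop gives the same result with one thread or many.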

Skeleton
Once I began using OpenMP, the
renderer crashed many C++ compilers.
As of 2009, they have finally adapted!

Only two compilers truly supported OpenMP at the time: Intel's compiler (version 8.1) and Microsoft's CL. GCC unfortunately died with 'internal compiler error'. I reported this to the GCC forums, found out that I was not the only one who had noticed, and was told (by the forum guys) to wait.

While waiting for GCC to catch up, I kept researching multicore technologies. Functional languages seem particularly well suited to SMP, and I've put them next in line in my R&D agenda (OCaml and F# in particular). Before leaving C++ behind, though, I heard about Intel Threading Building Blocks (TBB) and decided to put them to the test. TBB is a portable set of C++ templates that makes writing threaded code a lot easier than legacy APIs (CreateThread, _beginthread, pthread_create, etc). TBB is also open-source, so it was easy to work with it and figure out its internals. Truth be told, it also required more changes in my code (OpenMP required almost none). Still, it is a vast improvement compared to conventional threading APIs.

I must also confess that I have not invested a lot of effort in using these technologies; I only enhanced two of my main rendering loops to make them SMP aware. Still, this was enough to boost the speed (on my Core2Duo) by 80%! Judging by the gain/effort ratio, this is one of the best bargains I've ever come across...

As of now (October 2008), GCC 4.3.2 is up to speed and compiles OpenMP code just fine. TBB is of course running perfectly (since it is simply a C++ template library), so choose freely between the two, and easily achieve portable multithreading.

When I say portable, I mean it:

  1. OpenMP binaries (./configure --enable-openmp --disable-tbb) for...
    • Windows (via TDM/MinGW GCC 4.3.2)
    • Linux (via GCC >= 4.3.2 in both 32 and 64bit)
    • Linux (via Intel's compiler in 32 bit)
    • Mac OS/X (follow these instructions to get a GCC 4.4.x, which supports important SSE optimizations (-mrecip) and has stable support for OpenMP - Xcode's GCC 4.2.x is too old for OpenMP).
  2. TBB binaries (./configure --disable-openmp --enable-tbb) for...
    • Linux (via GCC in both 32 and 64bit)
    • Linux (via Intel's compiler in 32 bit)
    • Mac OS/X (even with Xcode's old GCC 4.2.x)
    • FreeBSD 7.0/64bit
    • OpenSolaris (tested with 2008.11 / GCC 3.4.3)
  3. Single-threaded binaries for...
    • Poor OpenBSD4.3/64: it doesn't have real, SMP threads. Not yet, at least :-) It only has user-space ones (as Linux did at some point). But it does compile the code, albeit in single-threaded mode.

Talk about portable code!

Torus
Dynamic scheduling makes sure all cores
are kept busy, even for low tessellations
  

If you're still in the... dark ages and use legacy APIs (CreateThread, _beginthread, pthread_create, etc) you are really missing out: Under both OpenMP and Intel TBB, I increased the rendering frame rate of the train object by more than 40%, by simply replacing...

#pragma omp parallel for
with
#pragma omp parallel for schedule(dynamic,100)
(similar change for TBB, at code inside Scene.cc).

Why? Because these modern threading APIs allow us to easily adapt to different loads per thread, by using dynamic thread scheduling.
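
To show where the one-line change above sits, here is a minimal sketch of a triangle loop whose iterations have very uneven cost. `drawTriangle` and `drawScene` are hypothetical stand-ins (the real loop lives in the renderer's sources), and the `reduction` clause exists only because this toy version sums a result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for rasterizing one triangle: the cost grows
// with the number of pixels the triangle covers on screen.
inline long drawTriangle(int pixelCount)
{
    assert(pixelCount >= 0);
    long work = 0;
    for (int p = 0; p < pixelCount; ++p)
        work += p % 7;
    return work;
}

inline long drawScene(const std::vector<int>& trianglePixels)
{
    long total = 0;
    const int n = static_cast<int>(trianglePixels.size());
#ifdef _OPENMP
    // With plain "parallel for", each thread usually gets one fixed block
    // of triangles up front. With schedule(dynamic,100), chunks of 100
    // triangles are handed to whichever thread becomes idle, so a thread
    // that drew small triangles simply picks up more chunks.
    #pragma omp parallel for schedule(dynamic,100) reduction(+:total)
#endif
    for (int i = 0; i < n; ++i)
        total += drawTriangle(trianglePixels[static_cast<std::size_t>(i)]);
    return total;
}
```

The chunk size is a trade-off: smaller chunks balance the load better but cost more scheduling overhead, so 100 should be read as the value that worked for the train object rather than as a universal constant.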


Last update on: Sun Dec 6 13:49:10 2009