Texture and environment mapping
I've added texture mapping to the mix. This includes mapping mathematically defined spheres with arbitrary images and also (finally) actually using the UV-mapping
coordinates that are (kindly) provided with many of the 3D models (meshes) you can download from e.g. turbosquid.com.
So now, I can import meshes (like I did before), but I can
actually use the provided texture images.
Having implemented spherical texture mapping, it was only a small leap to do spherical environment mapping, so I implemented that too.
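For illustration, here is a minimal sketch of how a point on a sphere (or a ray direction, for the environment map) can be turned into texture coordinates. The names (`Vec3`, `UV`, `sphericalUV`, `sphereTextureUV`, `environmentUV`) and the axis convention (y is up, v is 0 at the top pole) are assumptions made for this example, not the actual code of the ray tracer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct UV { double u, v; };

// Map a direction to (u, v) in [0, 1] x [0, 1].
// Assumed convention: y is "up", u wraps around the y axis,
// v runs from 0 at the top pole (+y) to 1 at the bottom pole (-y).
UV sphericalUV(const Vec3& d) {
    const double kPi = 3.14159265358979323846;
    const double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    assert(len > 0.0);  // a zero vector has no direction
    const double x = d.x / len;
    const double z = d.z / len;
    // Clamp y so rounding errors never push asin() outside its domain.
    const double y = std::max(-1.0, std::min(1.0, d.y / len));
    UV uv;
    uv.u = 0.5 + std::atan2(z, x) / (2.0 * kPi);
    uv.v = 0.5 - std::asin(y) / kPi;
    return uv;
}

// Texture lookup on a sphere: use the direction from the centre to the hit point.
UV sphereTextureUV(const Vec3& hit, const Vec3& centre) {
    return sphericalUV({hit.x - centre.x, hit.y - centre.y, hit.z - centre.z});
}

// Environment lookup: use the ray direction itself, as if the
// environment sphere were infinitely far away.
UV environmentUV(const Vec3& rayDir) { return sphericalUV(rayDir); }
```

The only difference between the two lookups is which vector goes in, which is why environment mapping is such a small step once spherical texture mapping works. Multiplying `u` and `v` by the image width and height gives the pixel to sample.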
I gratefully use ASSIMP to import a great variety of 3D mesh formats (.3ds, .obj, .fbx, .blend, .xxx)
and it is versatile... but it seems to fight back.
Maybe it's because ASSIMP targets OpenGL, which I absolutely do not, and converting their format to my own structure
of vertices, polygons, meshes and textures and, and, and is... well... cumbersome at best.
I'm finding myself manually investigating which texture
images belong to which of the meshes in the scenes I load with ASSIMP, using a different third-party image library to load the images, and then mapping them onto the meshes using the provided UV coordinates.
Among other things, this means that my scene files are becoming more and more complex. Here's an example of a relatively simple scene with all the pool balls from 1 to 15, a checkered floor and a spherical environment map (XML):
<?xml version="1.0" encoding="utf-8"?>
<shape type ="ENVIRONMENT_SPHERE" name="SkyEnvironment" active="yes">
<position>0.0, 0.0, 0.0</position>
<shape type ="INFINITE_PLANE" name="Plane" active="yes">
<checker_texture>White Black 5 5</checker_texture>
<p1>0.0, -20.0, 0.0</p1>
<p2>1.0, -20.0, 0.0</p2>
<p3>0.0, -20.0, 1.0</p3>
<shape type ="IMPORT_SCENE" name="Cabin" active="yes">
... which resulted in this (faulty) image (not all the pool balls got an image mapped onto them, so some are just black with the basic material values I gave them):
See the image in better resolution further below.
I also made a 'slim' version of my ray tracer that runs only in the console (no immediate output to the screen), in preparation for a hopefully future version
that will run on multiple cores _without_ the need to do threading. With threading comes the inherent problem of protecting simultaneous access to data. If I could get my ray tracer to do parts of images or whole scenes in different processes, even on different
CPUs in a network...
To approach this idea, I did a preliminary test with a simple (but fully capable) version of my ray tracer, and it worked beautifully. Actually, but not surprisingly, with all of the graphics stuff (MFC) and Windows-specific
crap eliminated, it was a LOT faster and less error prone.
In fact, this was the first time (in a long time) I could build a _release_ version of my code and REALLY see what it was capable of. At least a factor of 20 faster!
I imagine a front-end
piece of software that intelligently controls a host of ray tracer processes that, in turn, complete parts of or whole scenes, depending on the complexity.
If the scene complexity means that each image will take days to render, I will dedicate
a lot of CPU power (and eventually GPU power via CUDA) to render parts of each frame (so we can see results pretty fast).
If the scene is not so complex, I'll have each CPU/core render whole frames.
In any case, I hope to kill the need for managing
threads on my own. The operating systems are fully capable of deciding what's best as it is, so running a number of rendering processes that matches the individual PC's (!) capability is my best guess.
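To make the "parts of each frame" idea concrete, here is a minimal sketch of how a front end could carve one frame into horizontal bands, one per ray tracer process. `RowBand` and `splitRows` are hypothetical names invented for this example. How the band is handed to each process (command-line arguments, a job file, a socket) is left open, since that front end does not exist yet.

```cpp
#include <cassert>
#include <vector>

// A horizontal band of the image: rows [firstRow, lastRow).
struct RowBand { int firstRow; int lastRow; };

// Split `height` rows into `workers` bands whose sizes differ by at most one row.
std::vector<RowBand> splitRows(int height, int workers) {
    assert(height >= 0 && workers > 0);
    std::vector<RowBand> bands;
    const int base = height / workers;   // rows every worker gets
    const int extra = height % workers;  // leftover rows, one each for the first workers
    int row = 0;
    for (int i = 0; i < workers; ++i) {
        const int rows = base + (i < extra ? 1 : 0);
        bands.push_back({row, row + rows});
        row += rows;
    }
    return bands;
}
```

Because the processes share no memory, there is nothing to lock: each one loads the scene, renders its band and writes its own partial image, and the front end only has to stitch the bands together. Equal-sized bands are the simplest choice, but they can leave some processes idle when one part of the frame is much harder to render than the rest, so smaller tiles handed out on demand may balance better.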
Berating comments expected ;-)
So... I've had a small pause developing but took it up again.
Actually I stopped working on my ray tracer because I didn't know what to do next.
I'd achieved all my immediate goals including some of the more
advanced stuff like Depth of Field (DOF), glossy surfaces and translucency.
I know what I _want_ with my ray tracer, but that wish is just not humanly possible with only one single coder.
I want the whole shebang: an editor where you can model your scene and make the camera fly through the scene, select materials, lights and many, many more things that can
only be accomplished with a lot more coders in a large company.
... so I stick to what I like best: improving my ray tracer ;-)
As an example, I started out naively thinking I would create my own 3D editor for my ray tracer, so I could create my own meshes and edit them on a vertex level... and wasted a lot of time on that. Then I figured I'd
look for a library that could help me read various mesh formats.
Come ASSIMP to the rescue!
That took care of the whole "how to get great meshes"-problem. Now I just download them from TurboSquid.
Still, the more advanced my ray tracer became, the more obvious became the need for speed.
The introduction of distribution ray tracing techniques forced me to take a serious look at my code.
These techniques do not come for free. Like I stated elsewhere: each time I introduced a new distribution technique, the rendering time exploded exponentially.
For every pixel, add 1000+ DepthOfField rays.
For each of these 1000 extra rays, another 1000+ soft shadow rays have to be shot.
1 million rays/pixel already (without acceleration techniques, but still).
Add adaptive anti-aliasing to the process and multiply this
1.000.000 by perhaps 15 rays... now it's 15 million rays/pixel.
Add glossy surfaces to the gamble, and multiply the 15 million rays/pixel by 1000... now 15 billion (15.000.000.000).
Add translucency and... all hell does NOT break loose! It takes forever and a day.
Add a LOT of adaptive acceleration techniques, and you can minimize the number of rays very significantly.
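The arithmetic above is just a product of the per-effect sample counts, which a few lines of code make explicit. This is a sketch of the worst case only: `raysPerPixel` is a hypothetical helper, and it assumes every effect is nested inside every other with no adaptive early-outs.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Worst-case rays per pixel: the product of the sample counts of all
// nested distribution effects (no adaptive sampling, no early termination).
std::uint64_t raysPerPixel(std::initializer_list<std::uint64_t> samplesPerEffect) {
    std::uint64_t total = 1;
    for (std::uint64_t n : samplesPerEffect) {
        assert(n > 0);  // an effect that shoots zero rays would zero the product
        total *= n;
    }
    return total;
}
```

With the counts used in the text, `raysPerPixel({1000, 1000})` gives 1 million for DOF plus soft shadows, `raysPerPixel({1000, 1000, 15})` gives 15 million with anti-aliasing, and `raysPerPixel({1000, 1000, 15, 1000})` gives 15 billion once glossy surfaces join in. Every new effect multiplies the total instead of adding to it, which is why cutting the sample count of even one effect pays off so much.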
This simple scene, with a couple of spheres and some chrome/steel square
pillars and both soft shadows _and_ DOF active, took almost 24 hours to render:
This is not an especially interesting scene; it just tells me that using multiple distribution techniques simultaneously is HEAVY on the CPU. It was rendered with 6 threads on my new MSI GS73VR 6RF Stealth Pro laptop. 4 physical, 8 logical cores... all maxed out: 24 hours!
The same scene with a slightly different camera angle, DOF still active but with hard shadows, took less than an hour:
So... sticking to the purist thinking of a ray tracer, not giving in to shortcuts in the pursuit of the greatest-looking images, I'm thinking: having applied many
of the common acceleration techniques, I have to somehow be able to speed this up significantly without compromising the quality.
There are two basic ways of getting there:
Smarter acceleration techniques
Soft Shadow Volumes for Ray Tracing
This technique seems to effectively limit the number of needed shadow testing rays to _one_ ray/pixel vs. e.g. 1000+. Very worth looking into.
Perfecting my scene-structure
I've used various techniques to speed up the process of finding which object and polygon
the primary (and secondary) rays hit (octrees, quadtrees, lights in normal direction, objects 'visible' in normal direction etc.) and, at the same time, to keep the memory usage at a reasonable level.
I still have to investigate and use other stuff in that direction, which may greatly improve rendering time.
To be honest, my approach to multithreading
and the problem of synchronization was, well... to not synchronize at all. Every time you enter a critical section, you effectively stop the execution of other threads trying to access the same data (variable).
My solution was to simply copy the entire scene to each thread, so each thread had its own entire set of data to work on. 4 threads = four entire scenes.
I tried to keep it to a minimum, and succeeded to a certain extent.
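The copy-per-thread idea can be sketched as follows. This is not the ray tracer's real code: `Scene`, `renderRow` and `renderWithPrivateScenes` are placeholders made up for this example, and the "rendering" is a dummy lookup so the sketch stays short.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real scene data (meshes, textures, lights...).
struct Scene { std::vector<double> data; };

// Placeholder for tracing one row; a real tracer would shoot rays into `scene`.
double renderRow(const Scene& scene, int row) {
    if (scene.data.empty()) return 0.0;
    return scene.data[static_cast<std::size_t>(row) % scene.data.size()];
}

// Render `height` rows with `threadCount` threads and no locks:
// every thread owns a private Scene copy and writes only to its own rows.
std::vector<double> renderWithPrivateScenes(const Scene& master, int height, int threadCount) {
    assert(height >= 0 && threadCount > 0);
    std::vector<double> image(static_cast<std::size_t>(height), 0.0);
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        // `master` is captured by value, so each lambda carries its own copy.
        threads.emplace_back([&image, master, height, threadCount, t]() {
            for (int row = t; row < height; row += threadCount)
                image[static_cast<std::size_t>(row)] = renderRow(master, row);
        });
    }
    for (std::thread& th : threads) th.join();
    return image;
}
```

No critical section is needed because no two threads ever touch the same element of `image`, and each thread reads only its private scene. The price is memory: four threads hold four full scenes. If the scene is never modified while rendering, sharing a single read-only scene between the threads would avoid the copies without needing locks either.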
The reality is that ray tracing _is_ a painstakingly slow process, and rightly so. It tries to mimic real scenes as closely as mathematically possible barring real life.
So far I've
added threading to the process and I've had great success with this approach on a stand-alone PC. I was able to almost fully exploit the power of every physical core of my PC. This meant that I could speed up the rendering by almost a factor of four.
This is not enough, though, if I process scenes that take 24 hours and easily more.
I need raw CPU POWER....
and a LOT of it.
I'll be back! :-)
Added translucency. It looks fantastic but like glossy surfaces, it is a slow process.
The code for
translucency is almost a copy-paste from the glossy surfaces algorithm, except that I need to handle the ray differently at depths and, of course, use transmission rays ;-)
Multithreading is old news... it works great, except my Core i7, 4-core ASUS G750J can't handle using all (or even three) cores simultaneously any longer.... it just crashes after a few minutes (cooling fans doing overtime).... need me some 18-core
XEON based thing that's designed for heavy loads 24/7.
For the first time I realize the potential of true multicore workstations... XEON, OPTERON based systems. Imagine having
a server with two CPU slots, each with an 18 core XEON !!
36 stable cores or more at your fingertips... no heat problems!
Moved on to glossy surfaces, which is really what prompted my wish for more and stable cores.
Glossy surfaces are another of the distribution ray
tracing techs. Glossy surfaces are everywhere in real life, but the effect is exceedingly 'expensive' to copy in ray tracing. It requires a lot of rays to reproduce... again, one of the recursive traces that just explodes the number of rays.
It basically reproduces the imperfections on a surface, and makes the reflections look 'blurry', instead of the completely sharp reflections you get from the 'first' reflection vector (perfect
reflection) you calculated.
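As a rough illustration of the principle, the sketch below computes the perfect reflection vector and then jitters it to get one glossy sample. The mirror formula is the standard one. The jitter scheme (a square offset in the plane perpendicular to the mirror direction) and all names (`glossyDirection`, `roughness`, `u1`, `u2`) are assumptions chosen to keep the example short, not the method actually used in the ray tracer.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    const double len = std::sqrt(dot(a, a));
    assert(len > 0.0);  // cannot normalize a zero vector
    return (1.0 / len) * a;
}

// Perfect mirror reflection of incoming direction d about unit normal n.
Vec3 reflect(Vec3 d, Vec3 n) { return d - 2.0 * dot(d, n) * n; }

// One jittered reflection direction. u1 and u2 are uniform random numbers
// in [0, 1); roughness = 0 returns the mirror direction, larger values
// spread the samples further and give blurrier reflections.
Vec3 glossyDirection(Vec3 d, Vec3 n, double roughness, double u1, double u2) {
    const Vec3 r = normalize(reflect(d, n));
    // Build two unit vectors t and b perpendicular to r and to each other.
    const Vec3 helper = std::fabs(r.x) < 0.9 ? Vec3{1.0, 0.0, 0.0} : Vec3{0.0, 1.0, 0.0};
    const Vec3 t = normalize(cross(helper, r));
    const Vec3 b = cross(r, t);
    // Offset inside a square of half-width `roughness` around r.
    const double du = roughness * (2.0 * u1 - 1.0);
    const double dv = roughness * (2.0 * u2 - 1.0);
    return normalize(r + du * t + dv * b);
}
```

A glossy pixel is then the average of the colours returned by many such jittered rays, each of which may hit another glossy surface and spawn its own batch, which is where the explosion in ray count comes from. A jittered direction can end up pointing below the surface at grazing angles, so a real implementation would have to reject or resample those.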
To be continued....
I've added multithreading to the mix and it works beautifully!
I have a PC with 4 physical cores (older core i7 model CPU):
Processor: Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz, 2401 Mhz, 4 Core(s), 8 Logical Processor(s)
With four threads, utilizing all physical cores, the PC apparently overheats and crashes after a few minutes... the cooling fans work like crazy.
Running with three threads,
I get about 90% from each extra core, and so an image that took 103 seconds to render without multithreading now takes 44 seconds.
I've included two videos (Camtasia recordings)
of the rendering processes, to show the difference. The quality of the recordings sucks, but they show the difference just fine.
If I use all four cores, the cooling fans
do overtime, and my PC actually crashes after about 2 minutes, so I stick with three threads until I figure out how to control it.