Profile Lib in Action
Prev: Specification | Next: Specification |
VSProfileLib is heavily based on the article by Steve Rabin, “Real-Time In-Game Profiling“, published in Game Programming Gems I (see References). I read the article when the book came out, but didn’t manage to get the companion code running properly. Since the idea was so simple, I decided to build my own, and that is where VSProfileLib comes from.
VSProfileLib allows us to measure the segments of code we want, and a hierarchy of segments can be built to break down how the time is being spent. As any profiler tool, VSProfileLib itself takes its time, hence, VSProfileLib also computes the time it takes doing the profiling, so that we can subtract it from total time spent by the application, and get a more accurate performance assessment.
VSProfileLib not only profiles CPU code but also the OpenGL GPU side using OpenGL timers.
Note that profiling the GPU takes its toll performance wise, so use it wisely.
The following image shows the profiler running on top of an OpenGL application.
The profiler report shows, in real time and on top of the rendered scene, the number of calls (#c), the CPU total time in milliseconds (ms), the GPU time, also in milliseconds, and the time it took to execute the profile itself (wt), for each profile section. To build a profile section just enclose a sequence of instructions in the source code as described below. As you can see from the figure above, the profiler allows nested sections, i.e. a hierarchy of sections can be built and the report will provide indentation to show the relation between sections. There is a section labelled “Frame” which encloses all other sections. This outer section corresponds to a frame in the source code. Inside it several more sections are defined. The section “Render one bench” is called 341 times, and the accumulated number of milliseconds spent executing is 8.27. Inside this section there are subsections. For instance, there is one labelled “Render Array”. This subsection is executed 682 times per “Frame”, or 2 times per “Render one bench”. This is because the bench has two meshes. This subsection took 2.57 milliseconds on the CPU, and 5.49 milliseconds on the GPU. Naturally, we can profile the sequence of instructions used to display the profile report, and that’s what the section “Dump” is actually doing.
Note that the VSProfileLib does not have functionality to write on top of an OpenGL application. VSFontLib was used for this purpose.
Settings
The behavior of the profiler is defined by two constants: VSPL_PROFILE and VSPL_CLOCK. The first one defines what to profile, CPU and GPU, as well as providing a means to disabling profiling. The second one allows several options for the CPU clock. See the specification section for possible values for these constants.
CPU Profiling
To define a profiler section, select a sequence of instructions in the application source code, provide a label for it, and the profiler does the rest. Lets take a look at an example. Consider the following render function:
void renderScene(void) { glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); vsml->loadIdentity(VSMathLib::MODELVIEW); vsml->lookAt(camX, camY, camZ, 0,0,0, 0,1,0); vsml->scale(scaleFactor, scaleFactor, scaleFactor); glUseProgram(myShader.getProgramIndex()); for (int i = -1; i < 2; ++i) { for (int j = -1; j < 2; ++j) { vsml->pushMatrix(VSMathLib::MODELVIEW); vsml->translate(20.0f * i, 0.0f, 20.0f * j); myModel.render(); vsml->popMatrix(VSMathLib::MODELVIEW); } } glutSwapBuffers(); }
This function draws a frame of our scene. The scene consists of a 3×3 grid of benches, similar to the above figure, but with less benches.
Now lets add some CPU profiling sections. A profiler section is defined as follows:
{ PROFILE("label"); // the code that is part of the section }
The open curly bracket, followed by the instruction PROFILE("label")
defines the beginning of the profile section. The end of the profile section is the corresponding closing curly bracket.
So to add a profile section for the whole frame, using the label “Frame”, all that is required is:
void renderScene(void) { PROFILE("Frame"); glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); vsml->loadIdentity(VSMathLib::MODELVIEW); vsml->lookAt(camX, camY, camZ, 0,0,0, 0,1,0); vsml->scale(scaleFactor, scaleFactor, scaleFactor); glUseProgram(myShader.getProgramIndex()); for (int i = -1; i < 2; ++i) { for (int j = -1; j < 2; ++j) { vsml->pushMatrix(VSMathLib::MODELVIEW); vsml->translate(20.0f * i, 0.0f, 20.0f * j); myModel.render(); vsml->popMatrix(VSMathLib::MODELVIEW); } } glutSwapBuffers(); }
Note that, when profiling a section that is completely inside curly brackets, there is no need to add more brackets.
To profile the for
cycle where the benches are drawn we could do:
void renderScene(void) { PROFILE("Frame"); glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); vsml->loadIdentity(VSMathLib::MODELVIEW); vsml->lookAt(camX, camY, camZ, 0,0,0, 0,1,0); vsml->scale(scaleFactor, scaleFactor, scaleFactor); glUseProgram(myShader.getProgramIndex()); { PROFILE("Render Models"); for (int i = -1; i < 2; ++i) { for (int j = -1; j < 2; ++j) { vsml->pushMatrix(VSMathLib::MODELVIEW); vsml->translate(20.0f * i, 0.0f, 20.0f * j); myModel.render(); vsml->popMatrix(VSMathLib::MODELVIEW); } } } glutSwapBuffers(); }
As shown in the example above, we had to enclose the section in curly brackets, and add the PROFILE("Render Models")
statement.
Here is another example with the same function with a few more profile sections:
void renderScene(void) { PROFILE("Frame"); glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); vsml->loadIdentity(VSMathLib::MODELVIEW); vsml->lookAt(camX, camY, camZ, 0,0,0, 0,1,0); vsml->scale(scaleFactor, scaleFactor, scaleFactor); glUseProgram(myShader.getProgramIndex()); { PROFILE("Render Models"); for (int i = -1; i < 2; ++i) { for (int j = -1; j < 2; ++j) { vsml->pushMatrix(VSMathLib::MODELVIEW); vsml->translate(20.0f * i, 0.0f, 20.0f * j); { PROFILE("Render One Bench"); myModel.render(); } vsml->popMatrix(VSMathLib::MODELVIEW); } } } { PROFILE("Swap"); glutSwapBuffers(); } }
One thing left to do: show the results on top of our scene. VSProfileLib provides a static method, const std::string &DumpLevels()
to get a string with the profile report. Once we get that string we can render it using VSFontLib as shown below:
{ PROFILE("Dump"); glUseProgram(myShader.getProgramIndex()); std::string s = VSProfileLib::DumpLevels(); vsfl.prepareSentence(profileSentence, s); vsfl.renderSentence(10, 30, profileSentence); }
Note that we are profiling this section to know how much time we’re wasting getting and printing the report. This code assumes that vsfl
is a properly initialized global variable of type VSFLFont
, and that profileSentence
is a valid sentence, i.e. it is the return value of genSentence()
, from the VSFontLib library. The following code in the main function should do the trick:
profileSentence = vsfl.genSentence(); vsfl.loadFont("couriernew10",true);
If in doubt regarding VSFontLib check out its pages. It is recommend to use a font that can be displayed as fixed size, such as Courier New, for proper report formatting. This is all you have to do to use the profiler.
Actually, there is still one thing left. What happens when you’re done profiling and want to get your app running without the profiler? There could be many profile sections, and commenting the code for each wouldn’t be a very elegant solution. Fortunately, there is only one line of code to change (oh, and comment the printout of the profile report, but that’s only one segment of code).
To disable profiling just go to vsProfileLib.h and define VSPL_PROFILE
as:
#define VSPL_PROFILE VSPL_PROFILE_NONE
And that’s it. No more profiling!
GPU Profiling
In the above example, two sections also profile the GPU, namely “Binds” and “Render Array”. The OpenGL time queries do take time and that can been seen in the wasted time column.
To profile the GPU we have to set VSPL_PROFILE
to VSPL_PROFILE_CPU_AND_GPU
. See the specification section for more details. Profiling the GPU actually profiles both CPU and GPU. There is no option to profile the GPU alone.
The procedure to profile the GPU is the same as before, but instead of using PROFILE
we use PROFILE_GL
.
For instance to profile a call to glDrawElements
we can do:
{ PROFILE_GL("Render Array"); glDrawElements(GL_TRIANGLES,myMeshes[nd->mMeshes[n]].numFaces*3,GL_UNSIGNED_INT,0); }
The code above uses OpenGL time queries. To collect the results use:
VSProfileLib::CollectQueryResults();
Then, when the results are displayed the query results will be presented along the CPU times. As mentioned before OpenGL time queries do take their time, so its a good idea to profile the CollectQueryResults
function as in the following example:
{ PROFILE("Collect GL Queries Time"); VSProfileLib::CollectQueryResults(); }
To keep the impact of the time queries to a minimum, this lib uses a double buffer approach for the queries. So in fact, when calling CollectQueryResults
we are getting the query results for the previous frame, not the actual frame.
Final Notes
When profiling a CG application it is natural that the work load varies a lot between frames. So in one instance we may be looking at a part of our scene that has little geometry to display, and a few frames latter we enter an area where there are tons of triangles. The profiler is based on averages, so it will require a lot of time to display the proper averages for the triangle crowded area as it will be averaging these new results with the old values it got from the previous frames. To work around this issue a method is provided to reset the profiler counters. This static method is void VSProfileLib::Reset()
. It is useful to set up a key, that when pressed, calls this method.
And now it is really over. That’s all we need to use the profiler. Still, as usual, there is a short section for the VSProfileLib specification, and of course the header and cpp files are also available in the next sections.
Prev: Specification | Next: Specification |
One Response to “Profile Lib in Action”
Leave a Reply Cancel reply
This site uses Akismet to reduce spam. Learn how your comment data is processed.
Thanks for creating the library. I’m just wondering, when I tried to use it and call
{
PROFILE(“Collect GL Queries Time”);
VSPL::CollectQueryResults();
}
at the end of a frame like in the tutorial, I get an assert in VSPL::accumulate() line 200 trying to access element of the vector that doesn’t exist. If i don’t call PROFILE(“Collect GL Queries Time”) before calling VSPL::CollectQueryResults() it seems fine.