
Thread: ClanLib 2.3 significantly slower than 2.2 ?

  1. #1
    Lesser Knight
    Join Date
    Jan 2011
    Location
    Vienna, Austria
    Posts
    30

    Default ClanLib 2.3 significantly slower than 2.2 ?

    Hi everyone,

    I just tested the latest ClanLib 2.3 svn, and I get significantly fewer fps with the new version than with 2.2 (~90 fps with 2.3 vs ~750 with 2.2). I know that 90 fps is still far more than necessary, but this makes me wonder.
    I am using
    Code:
    CL_GUIWindowManagerTexture.process();
    CL_GraphicsContext.clear();
    CL_GUIWindowManagerTexture.drawWindows(getGC());
    CL_DisplayWindow->flip();
    CL_KeepAlive::process();
    in my main loop. The program also uses the clanlib OpenGL classes. I do quite a lot of texture switching each time I draw, and at the moment I also draw each texture on its own with gc.draw_primitives (no batching yet).


    Has anyone else noticed something like this? Any ideas what might cause the problem, where I can start looking? My code is exactly the same when testing with 2.2 and with 2.3.

    Cheers,
    spin

  2. #2
    Lesser Knight
    Join Date
    Jan 2011
    Posts
    32

    Default

    Are you using Release build in both cases?

  3. #3
    Lesser Knight
    Join Date
    Jan 2011
    Location
    Vienna, Austria
    Posts
    30

    Default

    I think so.
    Just did an svn checkout, autogen, configure, make, make install without any options for clanlib.
    For my code, definitely yes.

  4. #4
    ClanLib Developer
    Join Date
    May 2007
    Posts
    1,824

    Default

    If the cause is the following...

    In Sources/opengl_graphic_context_provider.cpp :-

    Code:
       CL_OpenGLGraphicContextProvider::draw_primitives()
    
        Comment out the "CL_OpenGL::get_opengl_version_major() >= 3" block containing draw_primitives_legacy()
    
      And in CL_OpenGLGraphicContextProvider::set_primitives_array()
        Change if (CL_OpenGL::get_opengl_version_major() < 3) to always true (eg change 3 to 30000)
    ...If that speeds things up, then ClanLib has a problem!

    OpenGL 3.0 and above requires Vertex arrays to have vertex buffer objects.

    I was hoping that creating vertex buffer objects would be very fast. It is possible that's not the case.

    If so, ClanLib would need to hold a vertex buffer object in various places (e.g. CL_RenderBatch2D )

    NVidia and ATI drivers with compatibility enabled work without vertex buffer objects in an OpenGL 3.0 context. But on my machine with ATI, OpenGL 3.2 does not work without them, either with or without compatibility.

  5. #5
    ClanLib Developer
    Join Date
    May 2007
    Posts
    1,824

    Default

    I have just tried it.

    It is partially that.

    I tried to create a 4 buffered vertex buffer object to prevent GPU memory reallocations and wait states

    But that did not achieve anything. As expected I think. If a vertex buffer object is still being used internally, the driver should move it into an internal temporary one ( see http://www.opengl.org/wiki/Buffer_Ob...-specification )

    ClanLib 2.2 = 100 FPS
    ClanLib 2.3 = 40 FPS
    ClanLib 2.3 With ring buffer patch (ring.patch) = 40 FPS
    ClanLib 2.3 With vertex arrays without vertex buffer objects = 52 FPS

    This is a more complex problem - The performance is significantly reduced elsewhere.

    Note, this is using the "GUI/DirectRender" example with all the gui windows enabled.
    The Pacman example seems to be okay.

    Note, the "vertex client arrays" restriction is in OpenGL 3.1, not OpenGL 3.0. So that "hack" can be removed by default, since ClanLib creates an OpenGL 3.0 context unless specified otherwise.
    Last edited by rombust; 08-01-2011 at 10:59 AM.

  6. #6
    ClanLib Developer
    Join Date
    May 2007
    Posts
    1,824

    Default

    Committed to SVN: "Client vertex arrays must have a vertex buffer object for opengl 3.1"

    So OpenGL 3.0 allows vertex arrays.

    I think that's correct. The specification suggests so.

  7. #7
    Lesser Knight
    Join Date
    Jan 2011
    Location
    Vienna, Austria
    Posts
    30

    Default

    For my program, this fixes the speed issues. I'm back to hundreds of fps again.
    I tried both the fix you posted here and svn update, and both fixed the problem.

    Thank you!

  8. #8
    Lesser Knight
    Join Date
    Jan 2011
    Location
    Vienna, Austria
    Posts
    30

    Default

    I'm sorry I have to resurrect this thread, but I'm experiencing the very same problem again, this time with Windows 7.
    Using the precompiled 2.2.12 libs the framerate is approx. 15 times higher than with 2.3.3.
    On my Linux machine (Ubuntu 10.10) the latest 2.3.3 svn version still achieves high framerates.

    Any tips? Can I request a specific OpenGL version on windows somehow?

    Thanks,
    spin

  9. #9
    ClanLib Developer
    Join Date
    May 2007
    Posts
    1,824

    Default

    I have noticed a similar, possibly connected problem.

    For some reason in my 3d engine, it has started to throw the following exception:
    "Since OpenGL 3.0, vertex array attributes must contain a buffer object. Use CL_VertexArrayBuffer"

    But only with the latest nvidia driver.

    This is very odd, since by default ClanLib creates an OpenGL 3.0 context.

    I guess a bug in the nvidia driver.

    Either way, ClanLib requires a nicer solution. I do not know what.

  10. #10
    ClanLib Developer
    Join Date
    Sep 2006
    Location
    Denmark
    Posts
    554

    Default

    Do we have a specific ClanLib example/test that is much slower in 2.3 than in 2.2?

  11. #11
    Lesser Knight
    Join Date
    Jan 2011
    Location
    Vienna, Austria
    Posts
    30

    Default

    You can use Display_Shaders/HSVSprite with only a slight modification to reproduce the fps problem.
    In hsv_sprite_batch.h, replace
    Code:
    enum { num_vertices = 6*256 };
    with
    Code:
    enum { num_vertices = 6 };
    With this modification (i.e. the batcher is useless now) the frame rate with clanlib 2.2 is approx twice as high as with clanlib 2.3.

  12. #12
    ClanLib Developer
    Join Date
    Sep 2006
    Location
    Denmark
    Posts
    554

    Default

    OK, I tried profiling this a little bit using Very Sleepy.

    According to the profiler the modified HSVSprite example spends 80.03% of its time in operator new, with 99.69% of those calls originating from constructing a CL_SharedPtr<CL_PrimitivesArray_Impl> in CL_GraphicContext_Impl::create_prim_array().

    One of the differences between 2.2 and 2.3 is that I swapped our own implementation of CL_SharedPtr with std::shared_ptr. Apparently a bad idea.

    There are several ways to address this issue:

    1. Resurrect the old CL_SharedPtr. Unfortunately it was not entirely API compatible with std::shared_ptr, so its API would have to be ported to be compatible or the rest of ClanLib 2.3+ won't compile.
    2. Change CL_PrimitivesArray to not use CL_SharedPtr, but a more lightweight reference tracking class. This is probably nicer long term.
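
    A minimal sketch of what option 2 could look like, assuming an intrusive reference count stored inside the object itself. The names RefCounted, IntrusivePtr and PrimArrayImpl are hypothetical and are not ClanLib classes. The point is only that a handle built this way needs a single allocation for the object and none for a separate control block. The count is not thread safe.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical base class: the reference count lives inside the object,
// so a handle needs no separately allocated control block.
class RefCounted
{
public:
    RefCounted() : ref_count(0) {}
    // Copies of an object start with their own, fresh count.
    RefCounted(const RefCounted &) : ref_count(0) {}
    RefCounted &operator=(const RefCounted &) { return *this; }

    void add_ref() { ++ref_count; }
    // Returns true when the last reference has been released.
    bool release() { return --ref_count == 0; }
    std::size_t use_count() const { return ref_count; }

private:
    std::size_t ref_count; // not atomic: single-threaded use only
};

// Hypothetical lightweight handle for types derived from RefCounted.
template <typename T>
class IntrusivePtr
{
public:
    IntrusivePtr() : ptr(nullptr) {}
    explicit IntrusivePtr(T *p) : ptr(p) { if (ptr) ptr->add_ref(); }
    IntrusivePtr(const IntrusivePtr &other) : ptr(other.ptr) { if (ptr) ptr->add_ref(); }
    // Copy-and-swap: the by-value parameter holds the old pointer and releases it.
    IntrusivePtr &operator=(IntrusivePtr other) { std::swap(ptr, other.ptr); return *this; }
    ~IntrusivePtr() { if (ptr && ptr->release()) delete ptr; }

    T *operator->() const { return ptr; }
    T *get() const { return ptr; }

private:
    T *ptr;
};

// Hypothetical stand-in for an implementation object such as a primitives array.
struct PrimArrayImpl : RefCounted
{
    int num_attributes = 0;
};
```

    Copying an IntrusivePtr<PrimArrayImpl> only increments a counter, and the object is deleted when the last handle goes away.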


    But with all this said and done, there is a reason why the batcher exists. The overhead of doing many small draw_primitives() calls should be reduced, but no matter what, asking the GPU to draw only a handful of primitives at a time will always be slow. Could the problem in your project be that drawing operations aren't being properly batched?
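
    To illustrate the batching idea, here is a rough sketch that collects quads in a vertex array and only issues a draw when the texture changes or the array is full. QuadBatcher, Vertex and the draw callback are hypothetical names; the callback stands in for one draw_primitives() call, and none of this is ClanLib's actual batcher.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Vertex { float x, y, u, v; };

// Hypothetical batcher: 6 vertices (two triangles) per quad.
class QuadBatcher
{
public:
    // Stand-in for a single draw call on the graphics context.
    using DrawFn = std::function<void(int texture_id, const Vertex *vertices, std::size_t count)>;

    QuadBatcher(std::size_t max_quads, DrawFn draw_fn)
        : max_vertices(max_quads * 6), current_texture(-1), draw_callback(std::move(draw_fn))
    {
        vertices.reserve(max_vertices);
    }

    void add_quad(int texture_id, float x, float y, float w, float h)
    {
        // A texture switch or a full array forces the pending quads out first.
        if (texture_id != current_texture || vertices.size() + 6 > max_vertices)
            flush();
        current_texture = texture_id;

        const Vertex tl = {x, y, 0.0f, 0.0f};
        const Vertex tr = {x + w, y, 1.0f, 0.0f};
        const Vertex bl = {x, y + h, 0.0f, 1.0f};
        const Vertex br = {x + w, y + h, 1.0f, 1.0f};
        const Vertex quad[6] = {tl, tr, bl, tr, br, bl};
        vertices.insert(vertices.end(), quad, quad + 6);
    }

    // Issues one draw for everything collected so far; call once more at end of frame.
    void flush()
    {
        if (vertices.empty())
            return;
        draw_callback(current_texture, vertices.data(), vertices.size());
        vertices.clear();
    }

private:
    std::size_t max_vertices;
    int current_texture;
    DrawFn draw_callback;
    std::vector<Vertex> vertices;
};
```

    With a capacity of 256 quads, drawing 1000 quads that share one texture results in 4 draw calls instead of 1000. Sorting the quads by texture before adding them keeps the number of forced flushes low.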

  13. #13
    Lesser Knight
    Join Date
    Jan 2011
    Location
    Vienna, Austria
    Posts
    30

    Default

    Yes, you are right, that surely is part of the problem. I have not implemented any batching yet.
    Still, I wonder why this is not an issue in Linux. There, I can draw well over 1000 textures, each with its own call to draw_primitives, and still have several hundred frames per second. (Yes, I know this is bad style. Batching is high up on my todo list.) On Windows, the same scene slows down to 20 fps. That just sounds like a lot of impact for a bad shared pointer implementation.

    I'll try to do some profiling on my own and see what I can come up with.

  14. #14
    ClanLib Developer
    Join Date
    May 2007
    Posts
    1,824

    Default

    I know what the problem is:

    On my development machine, I ran some tests:

    (With an ATI card)

    With: num_vertices = 6*256
    ClanLib 2.2 : 2314 FPS
    ClanLib 2.3 : 2322 FPS
    ClanLib 2.4 : 2356 FPS

    With: num_vertices = 6
    ClanLib 2.2 : 716 FPS
    ClanLib 2.3 : 812 FPS
    ClanLib 2.4 : 792 FPS

    So, approximately the same.

    On my other machine with an NVidia GPU, I did some checks.

    Unfortunately I do not have ClanLib 2.2 on that machine... but...

    Using the NVIDIA 280.26 (GeForce) Driver and the NVIDIA 285.38 Beta (GeForce) Driver

    CL_OpenGLGraphicContextProvider::draw_primitives_legacy() is called!

    This is very unusual: an OpenGL 3.1 context is created,
    but ClanLib asked the driver for an OpenGL 3.0 context.
    That explains the slowness.

    So
    1) NVIDIA needs to fix their driver
    2) Fix ClanLib so draw_primitives_legacy() is not required.

  15. #15
    ClanLib Developer
    Join Date
    May 2007
    Posts
    1,824

    Default

    Confirmed with NVidia:

    ClanLib 2.2: 444fps
    ClanLib 2.3: 200fps

  16. #16
    ClanLib Developer
    Join Date
    May 2007
    Posts
    1,824

    Default

    NVidia is correct:

    http://www.opengl.org/registry/specs...te_context.txt
    The attribute names WGL_CONTEXT_MAJOR_VERSION_ARB and
    WGL_CONTEXT_MINOR_VERSION_ARB request an OpenGL context supporting
    the specified version of the API. If successful, the context
    returned must be backwards compatible with the context requested.
    Backwards compatibility is determined as follows:

    If a version less than or equal to 3.0 is requested, the context
    returned may implement any of the following versions:

    * Any version no less than that requested and no greater than 3.0.
    * Version 3.1, if the GL_ARB_compatibility extension is also implemented.
    * The compatibility profile of version 3.2 or greater.

    Ideally, we should change ClanLib CL_RenderBatch2D::SpriteVertex to use a vertex buffer object.
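
    The backwards-compatibility rule quoted above can be written down as a small check. This is only a sketch of that one rule: GLVersion and is_permitted_for_legacy_request are hypothetical names, and the two flags are assumed to be supplied by the caller (for example from the extension string and the context profile).

```cpp
// Hypothetical version pair; field names avoid clashing with the
// major()/minor() macros that some system headers define.
struct GLVersion
{
    int major_ver;
    int minor_ver;
};

// Maps 3.1 to 31 and so on; assumes the minor version is a single digit.
static int as_number(GLVersion v)
{
    return v.major_ver * 10 + v.minor_ver;
}

// Encodes only the quoted rule for requests of version 3.0 or lower.
// Returns false for requests above 3.0, because the quoted text does not cover them.
bool is_permitted_for_legacy_request(GLVersion requested, GLVersion returned,
                                     bool has_arb_compatibility,
                                     bool is_compatibility_profile)
{
    const int req = as_number(requested);
    const int ret = as_number(returned);

    if (req > 30)
        return false;
    if (ret >= req && ret <= 30)
        return true;                      // no less than requested, no greater than 3.0
    if (ret == 31)
        return has_arb_compatibility;     // 3.1 only with GL_ARB_compatibility
    if (ret >= 32)
        return is_compatibility_profile;  // 3.2+ only as a compatibility profile
    return false;                         // lower than requested
}
```

    Under this rule, a driver that answers a 3.0 request with a 3.1 context is within its rights as long as GL_ARB_compatibility is implemented, which matches the behaviour seen with the NVIDIA drivers.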

  17. #17
    ClanLib Developer
    Join Date
    May 2007
    Posts
    1,824

    Default

    Applied a fix to ClanLib 2.3 SVN and 2.4 SVN


    Fix the OpenGL "allow_vertex_array_without_buffer_object" detection code by using the CL_OpenGLWindowDescription instead of the reported OpenGL version. This increases ClanGL speed on certain GPUs
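
    Roughly, the difference between the two detection strategies looks like this. It is a sketch under assumptions, not the committed code: GLVersion and both function names are hypothetical, and the requested version is assumed to be the one stored in the window description. The 3.0 cut-off follows the earlier posts in this thread.

```cpp
// Hypothetical version pair; field names avoid clashing with the
// major()/minor() macros that some system headers define.
struct GLVersion
{
    int major_ver;
    int minor_ver;
};

// Shared rule from this thread: vertex arrays without a buffer object
// are treated as allowed up to and including OpenGL 3.0.
static bool is_3_0_or_lower(GLVersion v)
{
    return v.major_ver < 3 || (v.major_ver == 3 && v.minor_ver == 0);
}

// Old behaviour (sketch): decide from the version the driver reports.
// A driver that returns a compatible 3.1+ context for a 3.0 request
// pushes the application onto the slow legacy path.
bool allow_without_vbo_from_reported(GLVersion reported)
{
    return is_3_0_or_lower(reported);
}

// New behaviour (sketch): decide from the version that was requested
// in the window description, since the returned context must be
// backwards compatible with that request.
bool allow_without_vbo_from_requested(GLVersion requested)
{
    return is_3_0_or_lower(requested);
}
```

    For a 3.0 request answered with a 3.1 context, the first function returns false and the second returns true, which is the case that was slow on the NVIDIA drivers.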
