Borderlands shadows and framerate performance

There’s been a lot of discussion on the Gearbox forums and elsewhere about Borderlands (PC) having horrible framerates. Numerous people have reported that disabling Dynamic Shadows in the Video configuration menu (equivalent to DynamicShadows=False in WillowEngine.ini) increased their framerates dramatically (almost 2x).

People have also complained about slow mouse response, recommending all kinds of weird workarounds.

Well, I can confirm both statements on my own system, which uses an nVidia GeForce 9800GT (model BFGE981024GTGE). All of the tests below were done on that card, on Windows XP SP3, with the latest DirectX 9 runtimes and nVidia driver version 195.62. Your experience may vary depending on what card you have and other whatnots.

I began by looking at WillowCompat.ini and determined that the game chooses options for you depending on what model of video card you have, based on its PCI Device ID. You can look the ID up using GPU-Z; keep in mind that there are multiple revisions of the same type of card. In my case, my card is PCI Device ID 10DE-0614 (the last 4 hexadecimal digits define the card model). There appear to be some oddities in WillowCompat.ini too; for example, Trilinear=True, which isn’t defined or used in any other INI file. This probably enables/disables trilinear filtering, but I’m surprised that setting isn’t defined anywhere else.
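
For anyone scripting against these IDs, here is a minimal Python sketch of splitting such a string into its two halves. The function name split_pci_id is made up for illustration, and the VVVV-DDDD layout is assumed purely from the 10DE-0614 example above.

```python
def split_pci_id(pci_id):
    """Split a 'VVVV-DDDD' string into (vendor, device) integers.

    The format is an assumption based on the 10DE-0614 example; the first
    half is the vendor, the last 4 hex digits identify the card model.
    """
    vendor_hex, device_hex = pci_id.strip().split("-")
    if len(vendor_hex) != 4 or len(device_hex) != 4:
        raise ValueError(f"expected VVVV-DDDD, got {pci_id!r}")
    return int(vendor_hex, 16), int(device_hex, 16)


VENDOR, DEVICE = split_pci_id("10DE-0614")
print(f"vendor=0x{VENDOR:04X} device=0x{DEVICE:04X}")
```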

Anyway, I experimented with combinations of certain WillowEngine.ini settings, as I wanted to find out which config options, or combination of them, were causing the framerate drop. I spent a couple of hours this weekend doing tests, using the entrance at Krom’s Canyon as my testing environment. I chose Krom’s because it loads very quickly and contains a good number of high-polygon models, textures which make use of reflections (think bump mapping), and a large number of shadows. I actually saw some reflection/texture rendering issues while in Krom’s, which is what started me on this whole goose chase to begin with… :-)

Finally, regarding mouse responsiveness, I should note that I have bEnableMouseSmoothing=False set in WillowInput.ini, as well as OneFrameThreadLag=False in WillowEngine.ini. The “Mouse resp.” column in the table below represents how the mouse feels both in-game and in the UI; players very likely know what I mean by “floaty” vs. “accurate”.

avg fps  Mouse resp.  DynamicShadows  LightEnvironmentShadows  bEnablePSSMShadows  Comments
     33  Floaty       True            True                     True
     64  Accurate     True            True                     False               No object or terrain shadows
     38  Floaty       True            False                    True
     65  Accurate     True            False                    False               No object or terrain shadows
     38  Floaty       False           True                     True                No gun shadows
     64  Accurate     False           True                     False               No object or terrain shadows
     38  Floaty       False           False                    True                No gun shadows
     65  Accurate     False           False                    False               No object or terrain shadows

Screenshots were taken for each of the above tests.

What this chart shows is that bEnablePSSMShadows is what’s responsible for the framerate drop as well as the floaty mouse response. Given that, there’s no real point to DynamicShadows=True when PSSM is disabled, since the object and terrain shadows are already gone at that point. DynamicShadows=False seemed to disable things like rendering shadows across your gun and so on; not really something I care about.
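
To put a number on that, here is a quick Python sketch that takes the eight results from the table and, for each setting, compares the mean fps with it off against the mean fps with it on. The data is copied straight from the table; the names RESULTS, SETTINGS and fps_gap are just mine for illustration.

```python
from statistics import mean

# (avg fps, DynamicShadows, LightEnvironmentShadows, bEnablePSSMShadows)
RESULTS = [
    (33, True,  True,  True),
    (64, True,  True,  False),
    (38, True,  False, True),
    (65, True,  False, False),
    (38, False, True,  True),
    (64, False, True,  False),
    (38, False, False, True),
    (65, False, False, False),
]
SETTINGS = ["DynamicShadows", "LightEnvironmentShadows", "bEnablePSSMShadows"]


def fps_gap(results, column):
    """Mean fps with the setting off minus mean fps with it on."""
    on = [row[0] for row in results if row[column]]
    off = [row[0] for row in results if not row[column]]
    return mean(off) - mean(on)


# Column 0 is the fps figure, so setting i lives in column i + 1.
GAPS = {name: fps_gap(RESULTS, i + 1) for i, name in enumerate(SETTINGS)}
for name, gap in GAPS.items():
    print(f"{name:<24} off vs. on: {gap:+.2f} fps")
```

Turning PSSM off is worth roughly 28 fps on average here, while the other two settings each account for less than 2 fps.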

This, of course, made me ask: “What exactly is PSSM?” Thankfully there’s a significant amount of documentation on Parallel-Split Shadow Maps. Here are a few resources I found online which visually demonstrate the concept:

http://appsrv.cse.cuhk.edu.hk/~fzhang/pssm_project/
http://appsrv.cse.cuhk.edu.hk/~fzhang/pssm_vrcia/
http://www.stevestreeting.com/2008/08/21/parallel-split-shadow-maps-are-cool/
http://hax.fi/asko/PSSM.html

Unreal Engine (UE) offers several shadow mapping methods: PSSM, VSM (Variance Shadow Maps), and something called PCF (Percentage-Closer Filtering) or Branching PCF. Here are some more resources:

http://www.punkuser.net/vsm/
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch08.html
http://pixelstoomany.wordpress.com/2007/09/03/a-not-so-little-teaser/
http://forum.beyond3d.com/showthread.php?t=40805

http://www.gamerendering.com/2008/11/15/percentage-closer-filtering-for-shadow-mapping/
http://www.fabiensanglard.net/shadowmappingPCF/index.php
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter17.html

I tried many config combinations of these features to see if I could keep some form of shadowing without the performance hit. Needless to say, I wasn’t successful.

So it seems that the best way to achieve proper framerates, at least with a 9800GT, is to set bEnablePSSMShadows=False and DynamicShadows=False (for a few extra fps) and accept the results.
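
If you’d rather script the change than hand-edit the file, something along these lines should do. It’s only a sketch: the function name apply_overrides and the [ExampleSection] header in the sample text are placeholders of mine, not the real layout of WillowEngine.ini. It rewrites matching key=value lines in place instead of parsing the file, on the assumption that an engine INI may contain repeated keys and comments that a stricter parser would choke on. Back up the file before trying anything like this on the real thing.

```python
import re


def apply_overrides(ini_text, overrides):
    """Rewrite every `key=value` line whose key is in `overrides`.

    Returns (new_text, changed), where `changed` counts rewritten lines per
    key. All other lines are passed through untouched. Keys are matched
    exactly (case-sensitive).
    """
    out = []
    changed = {key: 0 for key in overrides}
    for line in ini_text.splitlines(keepends=True):
        # Key = first token on the line, unless the line is a comment
        # (starts with ';') or a section header (starts with '[').
        match = re.match(r"^(\s*)([^=;\[\s]+)\s*=", line)
        key = match.group(2) if match else None
        if key in overrides:
            line_ending = line[len(line.rstrip("\r\n")):]
            line = f"{match.group(1)}{key}={overrides[key]}{line_ending}"
            changed[key] += 1
        out.append(line)
    return "".join(out), changed


# Placeholder sample; the section name is hypothetical.
SAMPLE = (
    "[ExampleSection]\n"
    "; shadow settings\n"
    "DynamicShadows=True\n"
    "LightEnvironmentShadows=True\n"
    "bEnablePSSMShadows=True\n"
)

NEW_TEXT, CHANGED = apply_overrides(
    SAMPLE, {"bEnablePSSMShadows": "False", "DynamicShadows": "False"}
)
print(NEW_TEXT)
```

A key whose count in `changed` comes back as 0 was not found in the file at all, which is worth checking before assuming the tweak took effect.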

Other footnotes:

  • Adjusting MinShadowResolution and MaxShadowResolution had no effect in any of these tests: no visual difference, and no difference in framerate. I mention this because I’ve seen forum users advocating decreasing these settings (from 1024 to 256) to solve framerate/mouse responsiveness issues.
  • I’d really like to know what the bNVIDIA3d config setting does. It’s not mentioned in the UE documentation or the Wiki. Hmm…

RedHat — security? What’s that?

My friend mdl came across this gold mine of a bug report over at RedHat’s bug repository:

https://bugzilla.redhat.com/show_bug.cgi?id=534047

The bug itself isn’t what makes this a gold mine; the comments from the RedHat developers are. Here are some which others and I found pristine:

It’s not insecure. We’ve had the mechanism checked.

I don’t particularly care how UNIX has always worked.

Regarding users being able to mount devices without root credentials: Of course he should. It’s been this way for years for console users e.g. mounting storage devices.

This concept amazes me. Comment #126 in the bug is pretty awesome too — worst workaround I’ve ever heard of. What planet are these developers from?

Writing FreeBSD memstick.img to a USB drive in Windows

I’ve received some hits on my blog from people looking for how to write the FreeBSD memstick.img image to a USB flash drive under Windows. The official FreeBSD procedure works great, but only applies if you already have access to a FreeBSD box. Accomplishing the same under Windows is more of a hassle, but not by much.

The solution: download John Newbigin’s dd for Windows. This is an enhanced version of dd which also lets you list raw devices in Windows, including USB sticks.

In the example below, I have a 4GB USB flash drive (HP, model v100w) connected to a USB port, under Windows XP SP3. This is the drive I want 8.0-RC2-amd64-memstick.img written to. The commands I typed follow the C:\> prompt, and the device strings associated with the USB flash drive are the ones under \\?\Device\Harddisk1:

C:\>dd --list
rawwrite dd for windows version 0.5.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by the GPL.  See copying.txt for details
Win32 Available Volume Information
\\.\Volume{1ff1b266-ab71-11de-b1e8-806d6172696f}\
  link to \\?\Device\HarddiskVolume1
  fixed media
  Mounted on \\.\c:

\\.\Volume{808faa36-bdbc-11de-a116-806d6172696f}\
  link to \\?\Device\HarddiskVolume2
  fixed media
  Mounted on \\.\d:

\\.\Volume{1ff1b262-ab71-11de-b1e8-806d6172696f}\
  link to \\?\Device\CdRom0
  CD-ROM
  Mounted on \\.\e:

\\.\Volume{3794d0ff-abb4-11de-9377-00221578190a}\
  link to \\?\Device\CdRom1
  CD-ROM
  Mounted on \\.\f:

\\.\Volume{ec4923e1-c907-11de-a118-00221578190a}\
  link to \\?\Device\Harddisk1\DP(1)0-0+12
  removeable media
  Mounted on \\.\g:

NT Block Device Objects
\\?\Device\CdRom0
  size is 2147483647 bytes
\\?\Device\CdRom1
  size is 2147483647 bytes
\\?\Device\Harddisk0\Partition0
  link to \\?\Device\Harddisk0\DR0
  Fixed hard disk media. Block size = 512
  size is 300069052416 bytes
\\?\Device\Harddisk0\Partition1
  link to \\?\Device\HarddiskVolume1
\\?\Device\Harddisk0\Partition2
  link to \\?\Device\HarddiskVolume2
\\?\Device\Harddisk1\Partition0
  link to \\?\Device\Harddisk1\DR17
  Removable media other than floppy. Block size = 512
  size is 4009754624 bytes
\\?\Device\Harddisk1\Partition1
  link to \\?\Device\Harddisk1\DP(1)0-0+12
  Removable media other than floppy. Block size = 512
  size is 4009730048 bytes
...

The device string we want is the NT Block Device, not the Win32 Volume, and we’re interested in the Partition0 entry. Now that we know the device path, we can write memstick.img directly to that, using the exact same block size as what the official FreeBSD procedure recommends.

Note that the conv=sync parameter has been removed (not needed here, and this version of dd doesn’t understand it anyway), and I’ve added the --progress flag, which shows how many bytes have been written in real time (useful).

Finally: please be sure you pick the correct device string! I won’t be held accountable if you screw this up and destroy your Windows machine’s hard disk. :-)
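
As a safety net, you could filter the dd --list output by script so that only removable whole-disk entries are shown. Below is a rough Python sketch that assumes the listing layout seen above (an unindented device path followed by indented detail lines); the LISTING text is a trimmed excerpt of that output, and the function name removable_whole_disks is my own. Treat it as an aid and still eyeball the result.

```python
import re

# Trimmed excerpt of the "NT Block Device Objects" part of dd --list.
LISTING = r"""NT Block Device Objects
\\?\Device\CdRom0
  size is 2147483647 bytes
\\?\Device\Harddisk0\Partition0
  link to \\?\Device\Harddisk0\DR0
  Fixed hard disk media. Block size = 512
  size is 300069052416 bytes
\\?\Device\Harddisk1\Partition0
  link to \\?\Device\Harddisk1\DR17
  Removable media other than floppy. Block size = 512
  size is 4009754624 bytes
\\?\Device\Harddisk1\Partition1
  link to \\?\Device\Harddisk1\DP(1)0-0+12
  Removable media other than floppy. Block size = 512
  size is 4009730048 bytes
"""


def removable_whole_disks(listing):
    """Return {device path: size in bytes} for removable Partition0 entries."""
    devices = {}
    current = None
    in_block_section = False
    for line in listing.splitlines():
        if line.strip() == "NT Block Device Objects":
            in_block_section = True
            continue
        if not in_block_section or not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new device entry.
            current = {"removable": False, "size": None}
            devices[line.strip()] = current
        elif current is not None:
            detail = line.strip()
            if detail.startswith("Removable media"):
                current["removable"] = True
            size = re.fullmatch(r"size is (\d+) bytes", detail)
            if size:
                current["size"] = int(size.group(1))
    return {
        path: info["size"]
        for path, info in devices.items()
        if info["removable"] and path.endswith(r"\Partition0")
    }


CANDIDATES = removable_whole_disks(LISTING)
for path, size in CANDIDATES.items():
    print(f"{path}  {size / 1e9:.2f} GB")
```

The internal hard disk (Harddisk0) is reported as fixed media, so it never makes the list.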

C:\>dd if=8.0-RC2-amd64-memstick.img of=\\?\Device\Harddisk1\Partition0 bs=10240 --progress
rawwrite dd for windows version 0.5.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by the GPL.  See copying.txt for details
1,044,858,880
102037+0 records in
102037+0 records out

Voilà.
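
A quick sanity check on those numbers, as a few lines of Python: the record count times the block size should equal the final --progress figure, and the total should fit on the stick. All four values are taken from the output above; the variable names are mine.

```python
BLOCK_SIZE = 10240               # bs= value from the dd command line
RECORDS_OUT = 102037             # "102037+0 records out"
PROGRESS_BYTES = 1_044_858_880   # final figure printed by --progress
DEVICE_BYTES = 4_009_754_624     # size of Harddisk1 Partition0 per dd --list

WRITTEN = BLOCK_SIZE * RECORDS_OUT
print("bytes written:", WRITTEN)
print("matches --progress:", WRITTEN == PROGRESS_BYTES)
print("fits on the stick:", WRITTEN <= DEVICE_BYTES)
```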

Testing out FreeBSD 8.0-RC2

Those who haven’t read about my 8.0-RC1 experience should do so first:

Basically, my experience with 8.0-RC2 was identical to that of RC1, except some of the bugs/issues I experienced are now gone (hooray!).

Fixes/improvements:

  • The issue I experienced with the Boot Manager selection phase of installation has been fixed. Also, Standard is now the default option (first choice).
  • The “geometry does not match label” problem has been addressed by fixing the FreeBSD slice editor in sysinstall/sade; see below.
  • libdisk has been modified to work properly with GEOM; the FreeBSD slice editor links to libdisk — thanks to Randi Harper for tracking this commit down! Commonly when installing FreeBSD on a box, people go into the slice editor and press “a” to use the entire disk. Previously, users would end up with a disk where the first 63 sectors were unused (probably for the PBR/MBR and overall alignment), then the FreeBSD slice, and a third “unused” portion of the disk (which, if I remember correctly, was done solely for alignment reasons). Example:
    Offset       Size(ST)        End     Name  PType       Desc  Subtype    Flags
    
             0         63         62        -     12     unused        0
            63  390716802  390716864    ad8s1      8    freebsd      165    A
     390716865       5103  390721967        -     12     unused        0
    

    Starting with RC2, this is what you’ll see:

    Offset       Size(ST)        End     Name  PType       Desc  Subtype    Flags
    
             0         63         62        -     12     unused        0
            63  390721905  390721967    ad8s1      8    freebsd      165    A
    

    Note the lack of the last “unused” section.

    Sadly, this also means people will need to reinstall FreeBSD (specifically, deleting the slice and re-creating it) to benefit from this. As far as I know, you can’t fix this without a full reinstallation.

  • The EOF issue for ttys (re: ^D being shown) has been fixed and committed to CURRENT (FreeBSD 9.0), but hasn’t been MFC’d to RELENG_8 yet. Yes, it’s scheduled to be (in about 2 weeks). Big thanks to Ed Schouten for fixing this!
  • There were some ZFS commits between RC1 and RC2 which may mean that the ARC exhausting all available kmem is no longer possible. I haven’t been able to verify the fix myself, but looking at the code, it may be sufficient. I’d need to get in touch with Kip Macy to confirm.
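
For the curious, the two slice editor listings in the libdisk item above can be checked with a bit of arithmetic. The Python sketch below verifies that each entry’s End equals Offset + Size - 1, that the entries are contiguous, and works out how much space the slice gains. The byte figure assumes 512-byte sectors, which is my assumption and not something the slice editor output states.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors

# (offset, size, end) in sectors, copied from the two listings
BEFORE = [(0, 63, 62), (63, 390716802, 390716864), (390716865, 5103, 390721967)]
AFTER = [(0, 63, 62), (63, 390721905, 390721967)]


def ends_match(entries):
    """True when every End column equals Offset + Size - 1."""
    return all(offset + size - 1 == end for offset, size, end in entries)


def is_contiguous(entries):
    """True when each entry starts on the sector after the previous End."""
    return all(prev[2] + 1 == nxt[0] for prev, nxt in zip(entries, entries[1:]))


# The slice is the second entry in both listings; compare its Size column.
GAINED_SECTORS = AFTER[1][1] - BEFORE[1][1]
print(ends_match(BEFORE), ends_match(AFTER))
print(is_contiguous(BEFORE), is_contiguous(AFTER))
print(GAINED_SECTORS, "sectors =", GAINED_SECTORS * SECTOR_BYTES, "bytes")
```

The gain is exactly the 5103 sectors of the old trailing “unused” entry, which works out to about 2.5 MB under the 512-byte assumption.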

Issues that are still pending:

  • bsdlabel still behaves incorrectly (“Class not found”). Instead, users should use gpart to write the bootstraps as follows: gpart bootcode [disk], where [disk] is ad4 or similar. Note that you pick the disk itself now, not the slice like in bsdlabel (unless you were using dangerously dedicated disks :-) ).
  • The ZFS notice pertaining to vfs.zfs.prefetch_disable when the system has less than 4GB RAM available has been re-worded again, but is still vague/unclear. A little bit of ego here: the person committing these changes should really consider changing the message to what I proposed.
  • I still haven’t received a reply to my request for clarification on ZFS stabilisation. Is /boot/loader.conf tuning for kmem-related parameters still required? We still need an official statement on this matter.

I also want to take a moment to send a shout-out to John Baldwin, who has been working incredibly hard on the FreeBSD kernel (specifically VM and ACPI) over the past 4 weeks. John, I’ve seen/followed your commits, and I appreciate the improvements! Thank you!