The C# Column
Performance optimisations in the .NET world
This article discusses various ways to improve the performance
of managed applications running on the .NET platform.
Exception Handling
Throwing exceptions can be very expensive in terms of program execution. This is
because the stack has to be unwound to make sure that any local objects are
cleaned up properly. An application that throws a lot of exceptions may well be
a correct program, but it will not be a fast one. So make sure that an
application does not throw too many exceptions.
The Perfmon utility can be used to see how many exceptions
are being thrown by an application. Note that execution speed suffers only when
an exception is actually thrown at run time. So, in general, the mere presence
of try/catch blocks does not directly affect the performance of an application.
Moral? Do not use exception handlers to make run-time decisions.
Use a simple if/else construct whenever possible. Also, even if we don't
explicitly throw exceptions, we may call library functions that do, and the
run time itself may throw exceptions. So a simple count of the exceptions coded
in a program cannot reveal the true number of exceptions thrown when the
program runs.
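As a rough illustration of preferring an if/else test over an exception handler, the sketch below parses a number in two ways. The class and method names are hypothetical, and int.TryParse is assumed to be available (it was added in .NET 2.0). On bad input the first method pays for a thrown FormatException, while the second simply takes the else branch.

```csharp
using System;

static class ParseDemo
{
    // Decision made by exception: costly whenever the input is bad.
    public static int ParseWithCatch(string text, int fallback)
    {
        try
        {
            return int.Parse(text);
        }
        catch (FormatException)
        {
            return fallback;
        }
    }

    // Decision made by a plain if/else: nothing is thrown on bad input.
    public static int ParseWithCheck(string text, int fallback)
    {
        int value;
        if (int.TryParse(text, out value))
            return value;
        else
            return fallback;
    }

    static void Main()
    {
        Console.WriteLine(ParseWithCatch("abc", -1));
        Console.WriteLine(ParseWithCheck("abc", -1));
    }
}
```

Both methods return the same results; only the cost of the failure path differs.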
VB.NET turns on integer checking by default, to make sure that
things like overflow and divide-by-zero throw exceptions. This can be turned
off in order to gain performance. When using Interop services, error codes
(HRESULTs) returned by the underlying COM routines also cause exceptions.
Calling Un-Managed Code
Try to design the application so that it doesn't rely on small, frequent calls
across boundaries that carry more overhead than ordinary intra-AppDomain method
calls (P/Invoke, interop and remoting calls). A function call that performs
several tasks, such as a method that initialises several fields of an object, is
certainly better than methods that do very simple tasks and require multiple
calls to get things done (such as setting every field of an object with a
different call).
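The difference between many small calls and one larger call can be sketched as follows. The Customer class, its members and the CallCount field are all hypothetical; the counter stands in for the number of times an expensive boundary (P/Invoke, interop or remoting) would be crossed if the object lived on the other side of it.

```csharp
using System;

// Hypothetical type; imagine each method call crosses an expensive boundary.
class Customer
{
    private string name;
    private string city;
    private int age;

    // Counts calls made on this object, for illustration only.
    public int CallCount;

    // Small, frequent calls: one crossing per field.
    public void SetName(string value) { name = value; CallCount++; }
    public void SetCity(string value) { city = value; CallCount++; }
    public void SetAge(int value) { age = value; CallCount++; }

    // One larger call: all fields set in a single crossing.
    public void Initialise(string newName, string newCity, int newAge)
    {
        name = newName;
        city = newCity;
        age = newAge;
        CallCount++;
    }

    public override string ToString()
    {
        return name + ", " + city + ", " + age;
    }
}

static class CallDemo
{
    static void Main()
    {
        Customer a = new Customer();
        a.SetName("Asha");
        a.SetCity("Nagpur");
        a.SetAge(30);

        Customer b = new Customer();
        b.Initialise("Asha", "Nagpur", 30);

        Console.WriteLine(a.CallCount); // three calls
        Console.WriteLine(b.CallCount); // one call
    }
}
```

Within a single AppDomain the two styles cost about the same; the saving matters only when each call carries transition overhead.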
Note that whenever we call an unmanaged routine from managed code, the run time
has to perform several time-consuming activities, such as:
(a) Perform data marshalling
(b) Fix Calling Convention
(c) Protect callee-saved registers
(d) Switch thread mode so that the GC won't block unmanaged threads
(e) Erect an exception-handling frame on calls into managed code
(f) Take control of the thread (optional)
To keep unmanaged calls fast we can make use of P/Invoke, since COM interop is
much more expensive than regular P/Invoke calls.
Data Marshalling
Data marshalling plays a vital role in the performance of
a managed application.
Data marshalling has to be done when calling through P/Invoke. For
primitive types, however, marshalling is not required at all. Classes that use
the explicit layout attribute are also cheap in terms of performance.
The real performance penalty has to be paid when data translation,
such as text conversion from ASCII to Unicode, occurs. A lot of marshalling
overhead can be cut down by simply agreeing on a certain data type or format
across the managed and unmanaged modules.
The sbyte, byte, short, ushort, int, uint, long, ulong, float
and double data types do not require any kind of marshalling. Any ValueTypes
and single-dimensional arrays containing any of these data types also get
marshalled very quickly. Using simple structs (whenever possible) instead of
classes can also lead to performance improvements, as a lot of boxing and
unboxing is avoided.
Excessive boxing / unboxing of value types can result in
degraded performance. We can keep track of how heavily boxing and unboxing is
being done by looking at GC allocations and collections using the Perfmon tool.
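To make the boxing cost concrete, here is a small sketch with hypothetical method names. It assumes the generic List<T> collection (available from .NET 2.0) as the non-boxing alternative to ArrayList, which is one way among several to avoid boxing. Every int added to the ArrayList is boxed into a heap object and unboxed again when read back.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class BoxingDemo
{
    // ArrayList stores object references, so each int is boxed on Add
    // and unboxed by the cast when it is read back.
    public static long SumBoxed(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++)
            list.Add(i);

        long sum = 0;
        foreach (object item in list)
            sum += (int)item;
        return sum;
    }

    // List<int> stores the ints directly, so no boxing takes place.
    public static long SumUnboxed(int count)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < count; i++)
            list.Add(i);

        long sum = 0;
        foreach (int item in list)
            sum += item;
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumBoxed(1000));
        Console.WriteLine(SumUnboxed(1000));
    }
}
```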
Collection Classes
Use the AddRange( ) method to add a whole collection, rather
than adding each item in the collection iteratively. Nearly all Windows controls
and collections have both Add and AddRange methods, and each is optimised for
a different purpose. AddRange has some extra overhead, but it clearly improves
performance when adding multiple items.
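The two approaches look like this in code. This is only a sketch: the method names are hypothetical and List<int> is used as a convenient example of a collection that offers both Add and AddRange.

```csharp
using System;
using System.Collections.Generic;

static class AddRangeDemo
{
    // One Add call per item.
    public static List<int> CopyOneByOne(int[] source)
    {
        List<int> target = new List<int>();
        for (int i = 0; i < source.Length; i++)
            target.Add(source[i]);
        return target;
    }

    // A single AddRange call for the whole collection.
    public static List<int> CopyWithAddRange(int[] source)
    {
        List<int> target = new List<int>();
        target.AddRange(source);
        return target;
    }

    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };
        Console.WriteLine(CopyOneByOne(numbers).Count);
        Console.WriteLine(CopyWithAddRange(numbers).Count);
    }
}
```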
The foreach keyword allows us to walk across items in a list,
string, etc., and perform operations on each item. This is a very powerful tool,
since it acts as a general-purpose enumerator over many types. The tradeoff
for this generalisation is speed, and if we rely heavily on string iteration
we should use a simple for loop instead. Since strings are simple character
arrays, they can be iterated much faster, with much less overhead, than other
structures. The JIT is smart enough to optimise away bounds-checking and other
things inside a for loop, but is prohibited from doing this on foreach.
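A minimal sketch of the two ways of iterating over a string follows. The method names are hypothetical, and how much faster the for loop actually runs depends on the runtime version, so it is worth measuring before relying on it.

```csharp
using System;

static class StringLoopDemo
{
    // Index-based for loop over the characters of the string.
    public static int CountSpacesFor(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == ' ')
                count++;
        }
        return count;
    }

    // The same work done through the general-purpose foreach.
    public static int CountSpacesForeach(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == ' ')
                count++;
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountSpacesFor("a b c"));
        Console.WriteLine(CountSpacesForeach("a b c"));
    }
}
```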
Arrays
Multidimensional Arrays
The JIT is better at handling jagged arrays than multidimensional ones, so use
jagged arrays instead.
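For readers unsure of the distinction, the sketch below sums the same grid of values stored first as a jagged array (an array of arrays, int[][]) and then as a multidimensional array (int[,]). All names are hypothetical.

```csharp
using System;

static class ArrayDemo
{
    // Builds a jagged array: each row is its own one-dimensional array.
    public static int[][] MakeJagged(int rows, int cols)
    {
        int[][] grid = new int[rows][];
        for (int r = 0; r < rows; r++)
        {
            grid[r] = new int[cols];
            for (int c = 0; c < cols; c++)
                grid[r][c] = r * cols + c;
        }
        return grid;
    }

    // Builds a multidimensional (rectangular) array with the same values.
    public static int[,] MakeRectangular(int rows, int cols)
    {
        int[,] grid = new int[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                grid[r, c] = r * cols + c;
        return grid;
    }

    public static int SumJagged(int[][] grid)
    {
        int sum = 0;
        for (int r = 0; r < grid.Length; r++)
        {
            int[] row = grid[r]; // plain one-dimensional array access below
            for (int c = 0; c < row.Length; c++)
                sum += row[c];
        }
        return sum;
    }

    public static int SumRectangular(int[,] grid)
    {
        int sum = 0;
        for (int r = 0; r < grid.GetLength(0); r++)
            for (int c = 0; c < grid.GetLength(1); c++)
                sum += grid[r, c];
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumJagged(MakeJagged(3, 4)));
        Console.WriteLine(SumRectangular(MakeRectangular(3, 4)));
    }
}
```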
Strings
When a string is modified, the run time will create a new
string and return it, leaving the original to be garbage collected. Most of
the time this is a fast and simple way to do it, but when a string is being
modified repeatedly it begins to be a burden on performance: all of those allocations
eventually get expensive.
To solve this performance problem a StringBuilder object can be used.
However, a StringBuilder object itself carries extra overhead in terms of time
and memory. On a machine with fast memory, a StringBuilder becomes worthwhile
if you're performing several string-modifying operations repeatedly.
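The contrast between repeated concatenation and a StringBuilder can be sketched like this. The method names are hypothetical; the point at which the StringBuilder starts to win depends on the machine and the workload.

```csharp
using System;
using System.Text;

static class StringDemo
{
    // Each += allocates a brand-new string and abandons the old one.
    public static string ConcatDigits(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
            result += i.ToString();
        return result;
    }

    // StringBuilder appends into an internal buffer and builds
    // the final string once, at the end.
    public static string BuildDigits(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
            builder.Append(i);
        return builder.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ConcatDigits(5));
        Console.WriteLine(BuildDigits(5));
    }
}
```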
Assemblies
The number of assemblies that are loaded by an application
directly contributes to the amount of memory required for running the application.
If we load an entire assembly just to use one or two methods, we're paying
a tremendous cost for very little benefit. Instead, duplicating that
method's functionality in our own code can be a better way out.
Methods are JITed when they are first used, which means that
we pay a larger startup penalty if our application does a lot of method calling
during startup. Windows Forms use a lot of shared libraries in the OS, and the
overhead in starting them can be much higher than for other kinds of applications.
Pre-compiling Windows Forms applications usually results in improved performance.
The ngen.exe utility can be used to precompile an application.
Moreover, this can be done at install time to make sure that the application is
optimised for the machine on which it is being installed. If we instead run
ngen.exe before we ship the program, the optimisations are restricted to the
ones available on the development machine.
Database
Stored procedures are highly optimised tools that result
in excellent performance when used effectively. Set up stored procedures to
handle inserts, updates, and deletes with the data adapter. Stored procedures
do not have to be interpreted, compiled or even transmitted from the client,
so they cut down on both network traffic and server overhead.
Using Managed C++
The managed world takes a lot of burden off our
shoulders by taking care of routine jobs like memory management, thread scheduling
and type coercions. This allows us to focus our energies on the parts of the
program that need it. With Managed C++, we can choose exactly how much control
we want to keep, since we can mix managed and unmanaged code in the same application.
For example, in MC++ we can take the address of an item in
the middle of a character array and use a loop to iterate through the characters
in the array, instead of using foreach, to speed things up. We can also do a
linked-list traversal in MC++ by taking the address of the next field. Neither
of these is possible in VB.NET, or in C# without unsafe code.
MC++ gives us a lot more control over the boxing / unboxing
of value data types, so that we won't have to dynamically (or statically)
unbox to access values. This is another performance enhancement. Just place
the __box keyword before any type to represent its boxed form.
Yashavant Kanetkar, one of the first
Express Computer columnists, is an established software expert, speaker
and author with several best-sellers to his credit, including titles like
“Let Us C” and the “Fundas” series. Contact him at kanetkar@dcubesoft.com