Faster Composite Key For Dictionary

After watching some YouTube videos by Nick Chapsas about performance and memory allocations, I began wondering whether there are better ways to use the Dictionary<> class when a composite key needs to be converted into a single key before being added to the dictionary. I have needed to do this myself, and I have seen other developers who needed to do the same thing. I decided to compare a few techniques for solving this, using time and memory usage as my main units of comparison, with BenchmarkDotNet helping to determine the best one. This article describes my simple research into the topic.

A Few Techniques

To try this out, I made a solution I called MultiIndex in Visual Studio with various techniques to see how they compared using some random fake celebrity data.

A few ideas came to mind that I figured would vary in performance; I imagine there are other techniques that are not top-of-mind. The techniques below are listed from fastest to slowest. Each one uses either a string, a string array, or a long as the key.

Add With Concatenated Hash Codes

This is the fastest technique of the bunch. To generate the composite key, I call the GetHashCode() method on the celebrity's first name, cast the result to a long, shift it left 32 bits, and finally bitwise OR it with the last name's GetHashCode() result cast to an unsigned int.

[Benchmark]
public void AddWithConcatenatedHashCodes()
{
    var myDictionary = new Dictionary<long, CelebInfo>();

    foreach (var celeb in testData)
    {
        myDictionary.Add(((long)celeb.FirstName.GetHashCode() << 32) | (uint)celeb.LastName.GetHashCode(), celeb);
    }
}

I was not expecting this to be the best one.
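For completeness, here is a small sketch of my own (not code from the MultiIndex solution; CelebInfo here is a minimal stand-in) showing that a later lookup must rebuild the key from the same two strings:

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the article's CelebInfo type.
internal sealed record CelebInfo(string FirstName, string LastName);

internal static class Program
{
    // Builds the composite key the same way the benchmark does.
    private static long MakeKey(string firstName, string lastName) =>
        ((long)firstName.GetHashCode() << 32) | (uint)lastName.GetHashCode();

    private static void Main()
    {
        var myDictionary = new Dictionary<long, CelebInfo>();
        var celeb = new CelebInfo("Ada", "Lovelace");
        myDictionary.Add(MakeKey(celeb.FirstName, celeb.LastName), celeb);

        // A lookup rebuilds the key from the same two strings.
        if (myDictionary.TryGetValue(MakeKey("Ada", "Lovelace"), out var found))
            Console.WriteLine(found.FirstName); // prints Ada
    }
}
```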

Add With Unioned Hash Codes

This is an example of creating a composite key using a "union" in C#, and it was the technique I expected to run the best. As in the previous example, the GetHashCode() method is used to help generate a composite long key, this time with the help of the TwoKeyUnion struct.

[Benchmark]
public void AddWithUnionedHashCodes()
{
    var myDictionary = new Dictionary<long, CelebInfo>();
    TwoKeyUnion keyBuilder;
    keyBuilder.fullKey = 0; // to avoid compiler error

    foreach (var celeb in testData)
    {
        keyBuilder.key0 = celeb.FirstName.GetHashCode();
        keyBuilder.key1 = celeb.LastName.GetHashCode();
        myDictionary.Add(keyBuilder.fullKey, celeb);
    }
}

The following is how I implemented the union in C#.

/// <summary>
/// This struct is used to "concatenate" two ints (key0 and key1) into a long (fullKey).
/// </summary>
[StructLayout(LayoutKind.Explicit)]
internal struct TwoKeyUnion
{
    [FieldOffset(4)]
    public int key0;

    [FieldOffset(0)]
    public int key1;

    [FieldOffset(0)]
    public long fullKey;
}

I have heard that bit shifts and OR'ing integers are very simple, fast operations for CPUs, but I suspected that using the union would be faster because neither of those operations would be needed, so I expected this to be the best technique for creating the key. Perhaps a deeper analysis of the IL would reveal the answer, or maybe a better understanding of processor pipelining and data forwarding within the pipeline would explain it, but I do not know why this is not the fastest technique. On the other hand, I do not think it is as intuitive as the Add With Concatenated Hash Codes technique shown above, so it seems harder to read, learn, and maintain in my opinion.
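As a sanity check, the following sketch (my own, not part of the benchmark project) shows that on a little-endian machine the union produces exactly the same long as the shift-and-OR approach, because key0 at byte offset 4 occupies the high 32 bits of fullKey:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
internal struct TwoKeyUnion
{
    [FieldOffset(4)] public int key0;   // high 32 bits on a little-endian machine
    [FieldOffset(0)] public int key1;   // low 32 bits
    [FieldOffset(0)] public long fullKey;
}

internal static class Program
{
    private static void Main()
    {
        int first = "Ada".GetHashCode();
        int last = "Lovelace".GetHashCode();

        long shifted = ((long)first << 32) | (uint)last;

        TwoKeyUnion u;
        u.fullKey = 0; // to avoid a definite-assignment compiler error
        u.key0 = first;
        u.key1 = last;

        Console.WriteLine(shifted == u.fullKey); // True on little-endian hardware
    }
}
```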

Add With String Concatenated Key

This technique to create a composite key uses the plus operator on the first and last names with the separator between them.

[Benchmark]
public void AddWithStringConcatenatedKey()
{
    var myDictionary = new Dictionary<string, CelebInfo>();

    foreach(var celeb in testData)
    {
        myDictionary.Add(celeb.FirstName + Separator + celeb.LastName, celeb);
    }
}

This is only concatenating three strings to generate the composite key, so I should not be surprised that the performance is so close to the Add With String Interpolated Key example.

Add With String Interpolated Key

Like the previous technique, this one uses a string as the composite key. For this case, I am using an interpolated string with the first and last names separated by a separator string.

[Benchmark]
public void AddWithStringInterpolatedKey()
{
    var myDictionary = new Dictionary<string, CelebInfo>();

    foreach(var celeb in testData)
    {
        myDictionary.Add($"{celeb.FirstName}{Separator}{celeb.LastName}", celeb);
    }
}

I was actually expecting this one to be worse than the Add With String Concatenated Key technique, because it had been explained to me that string interpolation is equivalent to using the StringBuilder class, which would require more allocations. That made the results much more interesting to me. As I understand it, for a simple interpolation like this one, where every hole is a string and there are no format specifiers, the modern C# compiler lowers the interpolated string to a String.Concat call, which would explain why the two techniques perform almost identically.

Add With String Array Key

This technique uses an array of strings as the composite key.

[Benchmark]
public void AddWithStringArrayKey()
{
    var myDictionary = new Dictionary<string[], CelebInfo>();

    foreach(var celeb in testData)
    {
        string[] twoKeys = [celeb.FirstName, celeb.LastName];
        myDictionary.Add(twoKeys, celeb);
    }
}

According to the results from BenchmarkDotNet, this technique uses less memory than both Add With String Concatenated Key and Add With String Interpolated Key. One caveat: arrays are compared by reference by default, so a later lookup with a newly created string[] would not find the entry unless the dictionary is constructed with a custom IEqualityComparer<string[]>.
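If contents-based lookups are needed, a comparer can be supplied when the dictionary is constructed. This is a sketch of my own (StringArrayComparer is a hypothetical name, not part of the MultiIndex solution):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer so string[] keys compare by contents rather than by reference.
internal sealed class StringArrayComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[]? x, string[]? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (!string.Equals(x[i], y[i], StringComparison.Ordinal)) return false;
        return true;
    }

    public int GetHashCode(string[] obj)
    {
        var hash = new HashCode();
        foreach (var s in obj) hash.Add(s, StringComparer.Ordinal);
        return hash.ToHashCode();
    }
}

internal static class Program
{
    private static void Main()
    {
        var dict = new Dictionary<string[], string>(new StringArrayComparer())
        {
            { new[] { "Grace", "Hopper" }, "found" }
        };

        // A freshly created array with the same contents now finds the entry.
        Console.WriteLine(dict[new[] { "Grace", "Hopper" }]); // prints found
    }
}
```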

Add With String Builder Made Key

This technique uses the StringBuilder class to help create the composite key.

[Benchmark]
public void AddWithStringBuilderMadeKey()
{
    var myDictionary = new Dictionary<string, CelebInfo>();
    var compositeKeyBuilder = new StringBuilder();

    foreach(var celeb in testData)
    {
        compositeKeyBuilder.Append(celeb.FirstName);
        compositeKeyBuilder.Append(Separator);
        compositeKeyBuilder.Append(celeb.LastName);
        myDictionary.Add(compositeKeyBuilder.ToString(), celeb);
        compositeKeyBuilder.Clear();
    }
}

I was expecting this to be equivalent to using an interpolated string to generate the composite key.

Add With String Joined Key

This technique uses the String.Join(...) method to create the composite key.

[Benchmark]
public void AddWithStringJoinedKey()
{
    var myDictionary = new Dictionary<string, CelebInfo>();

    foreach(var celeb in testData)
    {
        myDictionary.Add(String.Join(Separator, celeb.FirstName, celeb.LastName), celeb);
    }
}

Before looking at the results from BenchmarkDotNet, I thought I had heard that some fast magic occurred in the String.Join(...) method, so I was expecting this to run faster and use less memory than it does. Maybe that is only true when many parameters are passed into the method.

Add To Nested Dictionaries

This technique does not actually combine the keys into a single composite key. Instead, it nests a dictionary inside a dictionary to solve the problem in a different way.

[Benchmark]
public void AddToNestedDictionaries()
{
    Dictionary<string, Dictionary<string, CelebInfo>> outerDictionary = new Dictionary<string, Dictionary<string, CelebInfo>>();

    foreach(var celeb in testData)
    {
        if (!outerDictionary.TryGetValue(celeb.FirstName, out Dictionary<string, CelebInfo>? innerDictionary))
        {
            innerDictionary = new Dictionary<string, CelebInfo>();
            outerDictionary.Add(celeb.FirstName, innerDictionary);
        }
        if (!innerDictionary.ContainsKey(celeb.LastName))
        {
            innerDictionary.Add(celeb.LastName, celeb);
        }
    }
}

This technique was the slowest and used the most allocated memory.

Data Set-up

To run these benchmark tests, I created a TestData.txt file with the first and last names of 100 celebrities. When the test data is loaded into the CelebInfo objects, a bogus address is also added. The address is not used in the tests; it was added just to make the data more realistic, since the values in the dictionary should be more than just the key for the string- and string-array-based techniques.
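Loading the data might look something like the following sketch. The actual file format and CelebInfo shape are assumptions here, not taken from the MultiIndex repository; the in-memory lines array stands in for reading TestData.txt:

```csharp
using System;
using System.Linq;

// Hypothetical shape: the real CelebInfo and TestData.txt format live in the MultiIndex repo.
internal sealed record CelebInfo(string FirstName, string LastName, string Address);

internal static class Program
{
    private static void Main()
    {
        // Stand-in for File.ReadLines("TestData.txt"); one "First Last" pair per line is assumed.
        var lines = new[] { "Ada Lovelace", "Grace Hopper" };

        var testData = lines
            .Select(line => line.Split(' ', 2))
            .Select(parts => new CelebInfo(parts[0], parts[1], "123 Bogus Ave.")) // bogus address
            .ToArray();

        Console.WriteLine(testData.Length); // prints 2
    }
}
```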

The Results

Below is the output relevant to this research.

| Method                       | Mean      | Error     | StdDev    | Gen0   | Gen1   | Allocated |
|----------------------------- |----------:|----------:|----------:|-------:|-------:|----------:|
| AddWithConcatenatedHashCodes |  9.034 us | 0.1025 us | 0.0909 us | 2.4261 |      - |   9.95 KB |
| AddWithUnionedHashCodes      |  9.302 us | 0.0818 us | 0.0639 us | 2.4261 |      - |   9.95 KB |
| AddWithStringConcatenatedKey |  9.709 us | 0.0422 us | 0.0374 us | 3.6316 |      - |  14.89 KB |
| AddWithStringInterpolatedKey |  9.735 us | 0.0908 us | 0.0849 us | 3.6316 |      - |  14.89 KB |
| AddWithStringArrayKey        | 10.747 us | 0.1585 us | 0.1405 us | 3.3875 |      - |  13.86 KB |
| AddWithStringBuilderMadeKey  | 11.523 us | 0.0497 us | 0.0441 us | 3.7537 |      - |  15.34 KB |
| AddWithStringJoinedKey       | 12.844 us | 0.1184 us | 0.1107 us | 4.5929 |      - |   18.8 KB |
| AddToNestedDictionaries      | 20.599 us | 0.1199 us | 0.1001 us | 5.2490 | 0.0305 |  21.45 KB |

I found it interesting that the Add With String Array Key technique allocates less memory than the two faster string techniques shown before it. This makes me think that in some contexts it could be a better option than those two.

Set-up - Hardware

The results shown above were run on an older Dell XPS running Windows 10. It has an Intel Core i7 CPU (920) at 2.67 GHz and 6 GB of RAM. Your mileage may vary on different hardware. The code project targets .NET 8.

GetHashCode Method

Here are some links that I found interesting when doing some research on the GetHashCode() method.

Try it Yourself

The MultiIndex code has been uploaded to GitHub if you would like to see everything and try it yourself.

Limitations and Perspectives

Based on what I have read, the GetHashCode() method on strings returns different results every time an app starts up (unless it is running on the older .NET Framework). This is something to consider if keys need to be the same across app domains or when the app restarts for some reason; be careful when using it. Also keep in mind that combining two 32-bit hash codes does not guarantee uniqueness: two different name pairs could collide on the same long, in which case Add would throw or a lookup would return the wrong celebrity.
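If stable keys are needed across restarts, one option is to compute a deterministic hash yourself. This is a minimal FNV-1a sketch of my own, not code from the article's solution; unlike string.GetHashCode() on modern .NET, its result never changes between runs:

```csharp
using System;

internal static class StableHash
{
    // FNV-1a: a simple, deterministic 32-bit hash of a string.
    public static int Fnv1a(string text)
    {
        unchecked
        {
            uint hash = 2166136261;      // FNV offset basis
            foreach (char c in text)
            {
                hash ^= c;
                hash *= 16777619;        // FNV prime
            }
            return (int)hash;
        }
    }
}

internal static class Program
{
    private static void Main()
    {
        // Same shift-and-OR composition as before, but with run-to-run stable hashes.
        long key = ((long)StableHash.Fnv1a("Ada") << 32) | (uint)StableHash.Fnv1a("Lovelace");
        Console.WriteLine(key);
    }
}
```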

Looking back, these techniques are really about taking two strings as input and deriving a single identifier/key from them; they do not have much to do with keys and dictionaries specifically. I think that is OK, since the same approaches would apply anywhere a single composite identifier is needed.

Conclusion

The speeds shown in the results table, with the exception of the last one, seem pretty close to each other. I am not trying to say with these results that any specific technique should or should not be used. My hope is that this information helps developers choose the technique that is best for their application. There are always many factors to consider besides run-time and memory efficiency. I do think performance and efficiency should be factors when writing code, but, as many say, premature optimization is the root of all evil. Find a good balance of factors to create good code.