How can dereferencing the first character of a string take longer when the string is longer? I’m looking only at the first character, which should be constant time.

Consider this program.

char* malloc_random_string_length(int length)
{
 char* s = (char*)malloc(length + 1);
 for (int i = 0; i < length; i++) {
  s[i] = '0' + (rand() % 10); // fill with random digit characters
 }
 s[length] = '\0';
 return s;
}
int test()
{
 char* array1[NUMBER_OF_STRINGS];
 char* array2[NUMBER_OF_STRINGS];
 int i;
 int matches = 0;

 for (i = 0; i < NUMBER_OF_STRINGS; i++) {
  array1[i] = malloc_random_string_length(STRING_LENGTH);
  array2[i] = malloc_random_string_length(STRING_LENGTH);
 }

 // Okay, now time this loop
 matches = 0;
 for (i = 0; i < NUMBER_OF_STRINGS; i++) {
  if (compare_in_some_way(array1[i], array2[i])) {
   matches++;
  }
 }

 // Return this value so the compiler won't optimize it out
 return matches;
}

This code creates two arrays, each containing NUMBER_OF_STRINGS random strings of length STRING_LENGTH. It then calls compare_in_some_way on each pair of strings and tallies how many of them pass the test.

Consider this comparison function:

int compare_in_some_way(char* a, char* b)
{
 return a == b; // just compare the raw pointers
}

When run with various values of NUMBER_OF_STRINGS and STRING_LENGTH, this code's running time is proportional to NUMBER_OF_STRINGS, and STRING_LENGTH plays no role.

On the other hand, consider this alternate comparison function:

int compare_in_some_way(char* a, char* b)
{
 return *a == *b; // compare the first characters
}

This compares the first characters of the strings. With this version, it naturally runs slower as NUMBER_OF_STRINGS increases, but surprisingly, it also runs slower as STRING_LENGTH increases.

How can the length of the string play a factor in how long it takes to compare the first character? The function doesn't even know what the length of the string is!

What we're seeing is the effect of data locality.

In the first version that compares only pointer values, the only memory accesses are to the memory for the arrays themselves. The data in those arrays are tightly packed, so the cache is used efficiently. Since you don't dereference the pointers, it doesn't matter where they point.

Reading the first character from the string adds another memory access, and the characteristics of that memory access vary depending on the length of the string.

From the experimental data, one can conclude that the string data is stored in roughly contiguous memory. When the strings are short, the first characters of each string are closer to each other, since there are fewer other characters in between. This means that they are more likely to occupy the same cache line.

As the strings get longer, the distance between their first characters increases, and fewer strings will fit inside a cache line. Eventually, the strings are long enough that each string ends up on a separate cache line, and you don't gain any significant benefit from locality.

Although the access to the first character of the string is always O(1), the constant inside the O can vary wildly depending on cache conditions.

Comments (4)
  1. Erik F says:

    It makes a lot of sense that this is true: the pathological case where both strings have been paged out could make the comparison time exponentially longer! It’s unlikely that this would happen very often nowadays, but in the days of Windows 95 it seemed to happen quite frequently.

  2. pierrebai says:

    Actually, the reason for the slow-down is more likely the cache eviction during the random string *initialization*, not the cache eviction during access. Allocating and initializing larger strings will more likely overflow the caches, so the comparison loop will now fetch each character from memory.

    If the strings really are very short, maybe two will fit inside a single cache line, but that becomes false very fast. Cache lines are not big. I’m pretty sure the main culprit is the initialization. It could be tested by making the cache hot again by doing a first loop of untimed comparisons, then doing it a second time. As long as NUMBER_OF_STRINGS does not overflow the main cache, and STRING_LENGTH is not extremely small (say, smaller than 16), it should become constant again.

  3. paulmoore100 says:

    This is why it would be nice for perfmon, resource monitor, etc. to include cache miss/hit info (as well as instruction pipeline bubbles, branch mispredicts, etc.). Today 100% CPU usage can mean the CPU is 100% busy or the CPU is 90% idle due to cache issues. It's almost impossible to find this info out.

    1. DanStur says:

      The Intel performance profiler includes that information iirc.

      On Linux you can also use one of the tools in the Valgrind suite to get the same information.

Comments are closed.
