Friday, January 2, 2015

Are noncontiguous arrays performant?


In C#, when a user creates a List and adds items to it, it may run out of space and need to allocate more. It allocates a new array of double (or some other multiple of) the previous size, copies the items over, and discards the reference to the old array. The list grows exponentially because each allocation is expensive, and doubling limits it to O(log n) allocations for n items, whereas adding a fixed 10 extra slots each time would result in O(n) allocations.
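To make the O(log n) vs. O(n) claim concrete, here is a small Python sketch (not the actual List implementation) that counts how many reallocate-and-copy steps each growth policy needs to reach a given size:

```python
# Count how many times a growable array must reallocate to hold n items,
# given a function that maps the current capacity to the next capacity.
def allocations(n, grow):
    capacity, allocs = 1, 0
    while capacity < n:
        capacity = grow(capacity)
        allocs += 1
    return allocs

doubling = allocations(1_000_000, lambda c: c * 2)    # exponential growth
fixed_10 = allocations(1_000_000, lambda c: c + 10)   # fixed-step growth
print(doubling, fixed_10)  # 20 vs. 100000 reallocations for a million items
```

Doubling reaches a million items in about 20 reallocations, while growing by 10 slots at a time takes 100,000, which is why exponential growth is the standard policy despite the wasted slack.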


However, for large array sizes this can waste a lot of space, possibly nearly half the array. To reduce memory use I wrote a similar class, NonContiguousArrayList, which uses a List as its backing store while it holds fewer than 4M items, then allocates additional 4M-sized arrays as NonContiguousArrayList grows.


Unlike List, these arrays are non-contiguous, so no existing data is copied around; growth is just an additional 4M allocation. When an item is looked up, the index is divided by 4M to get the index of the array containing the item, then taken modulo 4M to get the index within that array.
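The indexing scheme described above can be sketched as follows. This is a minimal Python illustration of the idea, not the actual NonContiguousArrayList code; the class and member names here are mine, and the chunk size is a parameter so the example stays small:

```python
class ChunkedList:
    """Growable list backed by fixed-size, non-contiguous blocks."""

    def __init__(self, chunk_size=4 * 1024 * 1024):  # 4M items per block
        self.chunk_size = chunk_size
        self.blocks = []  # each entry is one fixed-size block
        self.count = 0

    def append(self, item):
        if self.count % self.chunk_size == 0:
            # Current block is full (or the list is empty): allocate a new
            # block. Unlike a contiguous resize, no existing data is copied.
            self.blocks.append([None] * self.chunk_size)
        self.blocks[self.count // self.chunk_size][self.count % self.chunk_size] = item
        self.count += 1

    def __getitem__(self, index):
        # The extra level of indirection: block index, then offset in block.
        return self.blocks[index // self.chunk_size][index % self.chunk_size]
```

With a chunk size of 4, appending ten items allocates three blocks, and `lst[7]` resolves to block 1, offset 3.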


Can you point out problems with this approach? Here is my list:



  • Noncontiguous arrays lack cache locality, which hurts performance. However, at a 4M block size it seems like there would be enough locality within each block for good caching.

  • Accessing an item is not quite as simple: there's an extra level of indirection. Would this get optimized away? Would it cause cache problems?

  • Since growth is linear after the 4M limit is hit, you could have many more allocations than you ordinarily would (say, a maximum of 250 allocations for 1GB of memory). No memory is copied after 4M, but I'm not sure whether the extra allocations are more expensive than copying large chunks of memory.
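As a back-of-envelope check of the "max of 250 allocations for 1GB" figure in the last bullet (assuming 4M-element blocks of 1-byte elements; the element size is not stated above):

```python
# 1 GiB of 1-byte items split into 4M-item blocks.
GIB = 1024 ** 3
BLOCK_ITEMS = 4 * 1024 * 1024
print(GIB // BLOCK_ITEMS)  # 256 blocks, roughly the ~250 allocations cited
```

Larger element sizes would need proportionally fewer blocks for the same item count, so 256 is the worst case under this assumption.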




