This article will talk about Cache levels, their impact on performance, and touch on Data-Oriented Design.

The Facts

Here are some facts/truths/constraints which come with the current computer architecture:

$lscpu

$lscpu

L1, L2, L3 heirarchy

L1, L2, L3 heirarchy

http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/

http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/

From Game Programming Patterns

From Game Programming Patterns

Few Observations:

A close enough mental model

Cache Line

Cache Line

Natural Alignment and Memory Layout

For best performance, align data as follows:

  • Align 8-bit data at any address.
  • Align 16-bit data to be contained within an aligned 4-byte word.
  • Align 32-bit data so that its base address is a multiple of four.
  • Align 64-bit data so that its base address is a multiple of eight.
  • Align 80-bit data so that its base address is a multiple of sixteen.
  • Align 128-bit data so that its base address is a multiple of sixteen.
Memory Layout

Memory Layout


// Natural Alignment: 8 bytes
// Size: 24 bytes (8 + 8 + 8)

struct {
	a: u32,
	b: u64,
	c: u32
}


// Natural Alignment: 8 bytes
// Size: 16 bytes (4 + 4 + 8)

struct {
	a: u32,
	b: u32,
	c: u64
}


// Natural Alignment: 8 bytes
// Size: 24 bytes (8 + 8 + 8) + 1 (1 is packed in the u32 empty 4/8)

struct {
	a: u32,
	b: u32,
	c: u64,
	d: bool
}

Optimzation

The OG Example


// Version 1

#include <stdio.h>
#include <stdlib.h>

main () {
  int i,j;
  static int x[4000][4000];
  for (i = 0; i < 4000; i++) {
    for (j = 0; j < 4000; j++) {
      x[j][i] = i + j; }
  }
}

// ==========================================

// Verison 2

#include <stdio.h>
#include <stdlib.h>

main () {
  int i,j;
  static int x[4000][4000];
  for (j = 0; j < 4000; j++) {
     for (i = 0; i < 4000; i++) {
       x[j][i] = i + j; }
   }
}


Data-Oriented Design

Key Principles of Data-Oriented Design:

  1. Data Contiguity: DOD emphasizes keeping related data close together in memory, preferably in contiguous blocks, to promote spatial locality. This reduces cache misses and improves data access speed.

  2. Data Layout Optimization: Data-Oriented Design focuses on structuring data in a way that minimizes padding and aligns data elements to match the CPU’s natural alignment requirements. This optimization aims to make data structures compact and cache-friendly.

  3. Cache Consciousness: DOD seeks to maximize the utilization of CPU caches by designing algorithms and data structures that work well with the cache hierarchy. This includes designing data access patterns that exploit temporal and spatial locality to minimize cache misses.

  4. Avoiding Object Overhead: DOD aims to minimize the overhead introduced by object-oriented programming constructs, like vtables, virtual functions, and extra memory overhead for object metadata.

  5. Array-of-Structs (AoS) to Struct-of-Arrays (SoA) Transformation: In some cases, DOD promotes transforming data from an AoS layout (where each object contains all its properties) to an SoA layout (where all properties are stored in separate arrays). This transformation can improve cache efficiency and data locality.

Some practical tips from the land of DOD (straight out of Andrew Kelley’s Practical DOD):

Closing Notes