CAS is about latency, not bandwidth
As usual, the world is not that simple, and Real World (TM) performance depends on more
than plain memory bandwidth. If you had tried a more diverse set of performance measurements,
notably a pointer-chasing loop, you'd have observed something quite different: a low CAS latency can be more important than a higher memory clock.
CL2 @ 166MHz = 12ns
CL1.5 @ 100MHz = 15ns
CL2 @ 133MHz = 15ns
CL3 @ 166MHz = 18ns
CL2 @ 100MHz = 20ns
CL3 @ 133MHz = 22ns
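
The figures above are just the CL cycle count divided by the clock frequency. A minimal sketch of that arithmetic in Python, assuming the nominal clocks are exactly 100, 133 and 166 MHz (the function name `cas_latency_ns` is made up for illustration):

```python
def cas_latency_ns(cl_cycles, clock_mhz):
    """Return the CAS latency in nanoseconds for a CL setting at a given clock."""
    # One clock cycle lasts 1000 / clock_mhz nanoseconds.
    return cl_cycles * 1000.0 / clock_mhz


# (CL, clock in MHz) pairs from the list above.
settings = [(2, 166), (1.5, 100), (2, 133), (3, 166), (2, 100), (3, 133)]

# Print the settings from lowest to highest latency.
for cl, mhz in sorted(settings, key=lambda s: cas_latency_ns(*s)):
    print(f"CL{cl} @ {mhz}MHz = {cas_latency_ns(cl, mhz):.1f}ns")
```

The results differ from the list by a fraction of a nanosecond, since the list is rounded to whole nanoseconds.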
The discrete CL (CAS latency in cycles) is derived from the real continuous-time latency
constraints, so the optimal CL and CLK setting depends on how close you can get to them.
For example, if that constraint is 14ns, clearly you're better off going with CL2@133MHz (15ns) than
with CL3@166MHz (18ns) FROM A LATENCY PERSPECTIVE. For bandwidth, the latter is better. The optimum choice depends on the benchmark.
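
To make the 14ns example concrete, here is a rough sketch that picks, for each clock, the smallest CL whose latency still meets the constraint, then sorts the candidates by resulting latency. It assumes only CL2 and CL3 are selectable, as in the comparison above; that list and the helper names are hypothetical, not something any particular chipset guarantees.

```python
AVAILABLE_CL = [2, 3]  # assumed selectable CL steps, in ascending order


def lowest_valid_cl(min_latency_ns, clock_mhz):
    """Smallest available CL whose latency is at least min_latency_ns, or None."""
    cycle_ns = 1000.0 / clock_mhz
    for cl in AVAILABLE_CL:
        if cl * cycle_ns >= min_latency_ns:
            return cl
    return None  # constraint cannot be met at this clock


def best_settings(min_latency_ns, clocks_mhz):
    """Return (latency_ns, CL, clock_mhz) tuples, lowest latency first."""
    results = []
    for mhz in clocks_mhz:
        cl = lowest_valid_cl(min_latency_ns, mhz)
        if cl is not None:
            results.append((cl * 1000.0 / mhz, cl, mhz))
    return sorted(results)


for latency, cl, mhz in best_settings(14.0, [100, 133, 166]):
    print(f"CL{cl} @ {mhz}MHz = {latency:.1f}ns")
```

Under these assumptions the 166MHz clock is forced up to CL3, so the 133MHz setting ends up with the lowest latency even though its clock is slower.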