previous approach:
* cache 64K on first read
* cache 1M on subsequent intersecting reads
new approach:
* cache 64K on first read
* cache 1M on the next intersecting read
* cache 8M on subsequent intersecting reads
* cache 4M on standalone reads at offsets >1M
improves performance by 50% on windows
and should help on high-latency connections
not even the deprecationwarning that got silently generated burning
20~30% of all CPU-time without actually displaying it anywhere, nice
python 3.12.0 is now only 5% slower than 3.11.6
also fixes some other, less-performance-fatal deprecations