在PNX的PR#1302中,我们引入了Int2ObjectNonBlockingMap和Long2ObjectNonBlockingMap(简称为NBMap),一种无锁、非阻塞、以原始类型为键的并发映射,相比于Java中的原生HashMap具有显著的内存优势且性能相当,相比于Java中的原生ConcurrentHashMap具有显著的性能优势,相比于fastUtil中的原始类型Map具有显著的并发优势。
这有望被用在Level类中,以高性能地实现真正的并发实体和方块实体修改及更新、并发区块加载/卸载和并发方块更新,可以进一步释放提升PNX的多核性能。
NonBlockingMap最早是在Cliff Click(现任Azul JVM首席架构师)的博士论文中描述的,当时java1.4被广泛使用,此PR将其修改为适应java17的版本,并去除了其对Unsafe的依赖(虽然这会使得性能略有降低,但安全性是值得的,况且即使降低后仍有显著优势)。
NonBlockingMap使用原子更新以无锁方式实现并发修改/访问,由于对于原始类型的原子更新在如今的大部分硬件上都有直接快速支持,所以它具有相当强的性能优势:
256 long keys, 8 threads, write/read = 0/1 :
(比ConcurrentHashMap快132.9%)
256 long keys, 2 threads, write/read = 0/1 :
(比ConcurrentHashMap快29.7%)
256 long keys, 12 threads, write/read = 1/2 :
(比ConcurrentHashMap快324.2%)
512 int keys, 8 threads, write/read = 1/2 :
(比ConcurrentHashMap快382.2%)
当然它也有缺点:
这有望被用在Level类中,以高性能地实现真正的并发实体和方块实体修改及更新、并发区块加载/卸载和并发方块更新,可以进一步释放提升PNX的多核性能。
NonBlockingMap最早是在Cliff Click(现任Azul JVM首席架构师)的博士论文中描述的,当时java1.4被广泛使用,此PR将其修改为适应java17的版本,并去除了其对Unsafe的依赖(虽然这会使得性能略有降低,但安全性是值得的,况且即使降低后仍有显著优势)。
NonBlockingMap使用原子更新以无锁方式实现并发修改/访问,由于对于原始类型的原子更新在如今的大部分硬件上都有直接快速支持,所以它具有相当强的性能优势:
256 long keys, 8 threads, write/read = 0/1 :
Benchmark Mode Cnt Score Error Units
MapReadLongTest.fastUtilLong2ObjectOpenHashMapRead thrpt 3 5550619.283 ± 9132062.819 ops/s
MapReadLongTest.fastUtilSyncLong2ObjectOpenHashMapRead thrpt 3 136044.923 ± 5639.646 ops/s
MapReadLongTest.jdkConcurrentHashMapRead thrpt 3 2490831.642 ± 3942055.626 ops/s
MapReadLongTest.jdkConcurrentSkipListMapRead thrpt 3 1035536.300 ± 645028.817 ops/s
MapReadLongTest.jdkHashMapRead thrpt 3 2757844.972 ± 372380.896 ops/s
MapReadLongTest.jdkSyncHashMapRead thrpt 3 116349.828 ± 35263.728 ops/s
MapReadLongTest.pnxLong2ObjectNonBlockingMapRead thrpt 3 5801655.792 ± 4950593.359 ops/s
MapReadLongTest.pnxLong2ObjectTrieMapRead thrpt 3 2651341.065 ± 330637.698 ops/s
(比ConcurrentHashMap快132.9%)
256 long keys, 2 threads, write/read = 0/1 :
Benchmark Mode Cnt Score Error Units
MapReadLongTest.fastUtilLong2ObjectOpenHashMapRead thrpt 3 2036140.324 ± 2569303.018 ops/s
MapReadLongTest.fastUtilSyncLong2ObjectOpenHashMapRead thrpt 3 249825.366 ± 1082145.757 ops/s
MapReadLongTest.jdkConcurrentHashMapRead thrpt 3 1467582.543 ± 4666334.233 ops/s
MapReadLongTest.jdkConcurrentSkipListMapRead thrpt 3 274615.071 ± 340934.780 ops/s
MapReadLongTest.jdkHashMapRead thrpt 3 1538100.796 ± 3797169.855 ops/s
MapReadLongTest.jdkSyncHashMapRead thrpt 3 177881.333 ± 797864.546 ops/s
MapReadLongTest.pnxLong2ObjectNonBlockingMapRead thrpt 3 1903236.322 ± 101104.362 ops/s
MapReadLongTest.pnxLong2ObjectTrieMapRead thrpt 3 793226.461 ± 167108.484 ops/s
(比ConcurrentHashMap快29.7%)
256 long keys, 12 threads, write/read = 1/2 :
Benchmark Mode Cnt Score Error Units
MapReadWriteLongTest.fastUtilSyncLong2ObjectOpenHashMapRead2Write1 thrpt 3 35906.821 ± 32243.899 ops/s
MapReadWriteLongTest.jdkConcurrentHashMapRead2Write1 thrpt 3 488247.551 ± 137092.387 ops/s
MapReadWriteLongTest.jdkConcurrentSkipListMapRead2Write1 thrpt 3 219483.022 ± 150224.878 ops/s
MapReadWriteLongTest.jdkSyncHashMapRead2Write1 thrpt 3 26034.814 ± 6287.120 ops/s
MapReadWriteLongTest.pnxLong2ObjectNonBlockingMapRead2Write1 thrpt 3 2071133.489 ± 455302.753 ops/s
MapReadWriteLongTest.pnxLong2ObjectTrieMapRead2Write1 thrpt 3 199878.330 ± 22399.482 ops/s
MapReadWriteLongTest.scalaTrieMapRead2Write1 thrpt 3 228861.791 ± 12761.860 ops/s
(比ConcurrentHashMap快324.2%)
512 int keys, 8 threads, write/read = 1/2 :
Benchmark Mode Cnt Score Error Units
MapReadWriteIntTest.fastUtilSyncLong2ObjectOpenHashMapRead2Write1 thrpt 3 17538.200 ± 1040.242 ops/s
MapReadWriteIntTest.jdkConcurrentHashMapRead2Write1 thrpt 3 183612.654 ± 63912.804 ops/s
MapReadWriteIntTest.jdkConcurrentSkipListMapRead2Write1 thrpt 3 92152.849 ± 84604.770 ops/s
MapReadWriteIntTest.jdkSyncHashMapRead2Write1 thrpt 3 13889.279 ± 22683.491 ops/s
MapReadWriteIntTest.pnxLong2ObjectNonBlockingMapRead2Write1 thrpt 3 885445.319 ± 303816.798 ops/s
MapReadWriteIntTest.pnxLong2ObjectTrieMapRead2Write1 thrpt 3 86731.122 ± 9286.965 ops/s
MapReadWriteIntTest.scalaTrieMapRead2Write1 thrpt 3 96288.765 ± 7182.620 ops/s
(比ConcurrentHashMap快382.2%)
当然它也有缺点:
- 由于大量使用硬件cas指令,如果读写冲突十分剧烈,将会导致大量发热,CPU降频(这也是测试中误差较大的原因)
- Cliff Click在他的论文中声称无锁数据结构不可能实现事务的原子性,只能保证数据原子性(虽然没有给出证明),故NonBlockingMap不支持compute(IfAbsent/IfPresent)的原子性,仅保证最终一致性,若算子开销较大,这可能导致性能急剧下降