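In a single-processor system with a cache, there are usually two write policies: write through and write back. Both are defined for the write hit case: write through writes both the cache and main memory; write back writes only the cache and uses a dirty flag to record the modification, and the modified content is written back to main memory only when the modified cache block is replaced. What about a write miss, where the address to be written is not in the cache? One option is to write the data directly to main memory, which is called the no write allocate policy; the other is to first load the block containing the target address from main memory into the cache and then write the cache, which is called the write allocate policy.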
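The four combinations of these policies can be tried out in a small simulation. The following is only a minimal sketch, not a model of any real processor: the class name Cache, the one-word-per-line direct-mapped layout, and the memory_writes counter are all assumptions made for illustration.

```python
class Cache:
    """Toy direct-mapped cache (one word per line) in front of a dict-backed memory."""

    def __init__(self, memory, num_lines=4, write_back=True, write_allocate=True):
        self.memory = memory              # main memory: addr -> value
        self.num_lines = num_lines
        self.write_back = write_back      # False means write through
        self.write_allocate = write_allocate
        self.lines = {}                   # index -> [addr, value, dirty]
        self.memory_writes = 0            # writes that actually reached main memory

    def _write_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

    def _evict(self, index):
        # A dirty line is written back only at the moment it is replaced.
        line = self.lines.pop(index, None)
        if line is not None and line[2]:
            self._write_memory(line[0], line[1])

    def _fill(self, addr):
        index = addr % self.num_lines
        self._evict(index)
        self.lines[index] = [addr, self.memory.get(addr, 0), False]
        return self.lines[index]

    def _lookup(self, addr):
        line = self.lines.get(addr % self.num_lines)
        return line if line is not None and line[0] == addr else None

    def read(self, addr):
        line = self._lookup(addr) or self._fill(addr)
        return line[1]

    def write(self, addr, value):
        line = self._lookup(addr)
        if line is None:                  # write miss
            if not self.write_allocate:
                self._write_memory(addr, value)   # no write allocate: bypass the cache
                return
            line = self._fill(addr)       # write allocate: load the block first
        line[1] = value                   # write hit (or freshly allocated line)
        if self.write_back:
            line[2] = True                # just mark dirty
        else:
            self._write_memory(addr, value)       # write through


mem = {}
wb = Cache(mem, write_back=True, write_allocate=True)
wb.write(0, 1)
wb.write(0, 2)
print(wb.memory_writes)   # 0: both writes stayed in the cache
wb.write(4, 9)            # address 4 maps to the same line as address 0
print(mem[0])             # 2: the dirty line was written back on replacement
```

With write_back=False and write_allocate=False, every write goes straight to main memory and the cache stays empty, which is the usual pairing of write through with no write allocate.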
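A multiprocessor system with caches still has write misses, and both the no write allocate policy and the write allocate policy can still be used. The only question is how the snooping cache should react when a write miss occurs. Suppose P1 performs the write and P2's cache is the one snooping. Whether P1's write is a write hit or a write miss, P2's cache (the snooping cache) checks whether the address written by P1 is present in P2's cache. Under a write invalidate policy, the snooping cache either marks the corresponding block invalid or does nothing (because it does not hold the block). It will certainly not load the block that another processor is writing into its own cache, because doing so would serve no purpose.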
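The snooping behavior described here can be sketched as follows. This is a simplified illustration under stated assumptions, not a full coherence protocol: the names SnoopingCache and Bus are hypothetical, each block is a single word, and the writer is modeled as write through with write allocate only to keep memory up to date without tracking dirty state.

```python
class SnoopingCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}                   # addr -> value; presence means the block is valid

    def snoop_write(self, addr):
        """React to a write by another processor that was observed on the bus."""
        if addr in self.lines:
            del self.lines[addr]          # write invalidate: drop our stale copy
            return "invalidated"
        return "ignored"                  # we do not hold the block, so nothing to do


class Bus:
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, requester, addr):
        if addr not in requester.lines:   # read miss: fetch from memory
            requester.lines[addr] = self.memory.get(addr, 0)
        return requester.lines[addr]

    def write(self, writer, addr, value):
        # Every other cache snoops the write, whether it hit or missed in the writer.
        reactions = {c.name: c.snoop_write(addr)
                     for c in self.caches if c is not writer}
        writer.lines[addr] = value        # only the writer's own cache is loaded
        self.memory[addr] = value         # assumption: write through, for simplicity
        return reactions


bus = Bus({0x10: 1, 0x20: 5})
p1, p2 = SnoopingCache("P1"), SnoopingCache("P2")
bus.attach(p1)
bus.attach(p2)

bus.read(p2, 0x10)                        # P2 now holds block 0x10
print(bus.write(p1, 0x10, 7))             # {'P2': 'invalidated'}
print(bus.write(p1, 0x20, 8))             # {'P2': 'ignored'}
print(0x20 in p2.lines)                   # False: the snooping cache was not loaded
```

In both cases P1's write was a write miss in P1's own cache, yet P2 never allocates the written block; it only invalidates a copy it already had.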
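Based on this analysis, the first sentence of the second paragraph on p597 of the original English edition, "Another variant is loading the snooping cache on write misses", does not express the meaning accurately. It is also the only place in that whole paragraph where the term snooping cache appears. So I think the accurate wording should be: "Another variant is loading the cache on write misses".
The text above is reposted from a teacher. For diagrams, the ones on this wiki page are fairly intuitive.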
http://en.wikipedia.org/wiki/Cache_(computing)
The ARM manual actually covers this in great detail as well, although it takes more patience to read.
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0488c/DDI0488C_cortex_a57_mpcore_r1p0_trm.pdf
Write-Back Read-Write-Allocate
This is expected to be the most common and highest performance memory type. Any read or
write to this memory type searches the cache to determine if the line is resident. If it is, the line
is read or updated. A store that hits a Write-Back cache line does not update main memory.
If the required cache line is not in the cache, one or more cache lines is requested from the L2
cache. The L2 cache can obtain the lines from its cache, from another coherent L1 cache, or
from memory. The line is then placed in the L1 cache, and the operation completes from the L1
cache.
Write-Back No-Allocate
Use Write-Back No-Allocate memory to access data that might be in the cache because other
virtual pages that are mapped to the same Physical Address are Write-Back
Read-Write-Allocate. Write-Back No-Allocate memory avoids polluting the caches when
accessing large memory structures that are used only one time. The cache is searched and the
correct data is delivered or updated if the data resides in one of the caches. However, if the
request misses the L1 or L2 cache, the line is not allocated into that cache. For a read that misses
all caches, the required data is read to satisfy the memory request, but the line is not added to
the cache. For a write that misses in all caches, the modified bytes are updated in memory.
Note
The No-Allocate allocation hint is only a performance hint. The processor might in some cases,
allocate Write-Back No-Allocat
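The point about avoiding cache pollution can be illustrated with a toy experiment. This is a minimal sketch and does not model the Cortex-A57: the LRUCache class, the capacity of 8 lines, and the address ranges are hypothetical, and the sketch treats No-Allocate as strictly obeyed, whereas the Note above says it is only a hint.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement; tracks line presence only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr, allocate=True):
        """Return True on a hit. On a miss, allocate the line only if allowed."""
        if addr in self.lines:
            self.lines.move_to_end(addr)          # mark as most recently used
            return True
        if allocate:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)    # evict the least recently used line
            self.lines[addr] = True
        return False


def hot_hits_after_scan(allocate_stream):
    cache = LRUCache(capacity=8)
    hot = range(4)                                # small working set that is reused
    for addr in hot:
        cache.access(addr)                        # warm the cache
    for addr in range(1000, 1064):                # large structure, touched only once
        cache.access(addr, allocate=allocate_stream)
    return sum(cache.access(addr) for addr in hot)


print(hot_hits_after_scan(allocate_stream=True))   # 0: the scan evicted the hot lines
print(hot_hits_after_scan(allocate_stream=False))  # 4: the hot lines survived
```

When the one-time scan is allowed to allocate, it pushes the reused lines out of the cache; when it is not, the working set stays resident.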
6.4.4 Non-cacheable streaming enhancement
You can enable the CPUACTLR[24], Non-cacheable streaming enhancement bit, only if your
memory system meets the requirement that cache line fill requests from the multiprocessor are
atomic. Specifically, if the multiprocessor requests a cache line fill on the AXI master read
address channel, any given write request from a different master is ordered completely before
or after the cache line fill read. This means that after the memory read for the cache line fill
starts, writes from any other master to the same cache line are stalled until that memory read
completes. Setting this bit enables higher performance for applications with streaming reads
from memory types that do not allocate into the cache.
Because it is possible to build an AXI interconnect that does not comply with the specified
requirement, the CPUACTLR[24] bit defaults to disabled.
So memory operations that go from bus to bus need not pass through a cache line. Isn't that just DMA?
6.4.7 Preload instruction behavior
The multiprocessor supports the PLD, PLDW, and PRFM prefetch hint instructions. For Normal
Write-Back Cacheable memory page, the PLD, PLDW, and PRFM L1 instructions cause the line to be
allocated to the L1 data cache of the executing processor. The PLD instruction brings the line into
the cache in Exclusive or Shared state and the PLDW instruction brings the line into the cache in
Exclusive state. The preload instruction cache, PLDI, is treated as a NOP. PLD and PLDW instructions
are performance hints instructions (?)only and might be dropped in some cases.
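How should "performance hints instructions" be understood? "Hint" means a suggestion or an implication.
For now I read it as "instructions that only hint at a performance optimization", which fits the same sentence saying they might be dropped in some cases.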