- Latency devices (CPU cores)
- Throughput devices (GPU cores)
- Use the best match for the job (heterogeneity in mobile SoCs)
- CPU: Latency-Oriented Design
  - Powerful ALUs
    - Reduced operation latency
  - Large caches
    - Convert long-latency memory accesses into short-latency cache accesses
  - Sophisticated control
    - Branch prediction for reduced branch latency
    - Data forwarding for reduced data latency
- GPU: Throughput-Oriented Design
  - Small caches
    - To boost memory throughput
  - Simple control
    - No branch prediction
    - No data forwarding
  - Energy-efficient ALUs
    - Many ALUs, each long-latency but heavily pipelined for high throughput
- Scalability
- Portability
- SPMD – Single Program, Multiple Data
- Threads within a block cooperate via shared memory, atomic operations, and barrier synchronization
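
The SPMD model and block-level cooperation can be sketched on the CPU with ordinary Python threads. This is only an emulation under stated assumptions, not GPU code: `run_block`, `kernel`, `BLOCK_SIZE`, and `arrivals` are hypothetical names chosen for illustration, a plain list stands in for per-block shared memory, a lock stands in for an atomic increment, and `threading.Barrier` stands in for barrier synchronization. Every thread runs the same function and differs only in its thread index.

```python
import threading

BLOCK_SIZE = 8  # hypothetical block size; must be a power of two for this reduction


def run_block(data):
    """Sum `data` using BLOCK_SIZE cooperating threads (an emulated thread block)."""
    assert len(data) == BLOCK_SIZE
    shared = [0] * BLOCK_SIZE                # stands in for per-block shared memory
    barrier = threading.Barrier(BLOCK_SIZE)  # stands in for barrier synchronization
    lock = threading.Lock()                  # used to emulate an atomic operation
    arrivals = [0]                           # counter updated by every thread

    def kernel(tid):
        # SPMD: all threads execute this same program; only `tid` differs.
        shared[tid] = data[tid]
        with lock:                           # emulated atomic increment
            arrivals[0] += 1
        barrier.wait()                       # all loads are visible before reducing

        # Tree reduction: halve the number of active threads at each step.
        stride = BLOCK_SIZE // 2
        while stride > 0:
            if tid < stride:
                shared[tid] += shared[tid + stride]
            barrier.wait()                   # finish this step before the next one
            stride //= 2

    threads = [threading.Thread(target=kernel, args=(t,)) for t in range(BLOCK_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0], arrivals[0]
```

For example, `run_block([1, 2, 3, 4, 5, 6, 7, 8])` returns `(36, 8)`: the sum of the inputs and the number of threads that performed the emulated atomic increment. Every thread reaches each barrier the same number of times, which is what keeps the sketch from deadlocking.
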
Time: 2024-10-27 08:40:57