July 5: High-Performance Computing (HPC) Methodology#
Overview#
- Basic Theories for HPC
- Performance Analysis and Optimization Methodology
- Practical Optimization Strategies
- HPC Skill Tree
- How to Learn HPC/CS
Basic Theories for HPC#
Factors affecting performance:
- Algorithms
- Models
- Software
- Hardware
- Physics
Example: Large Matrix Multiplication (for details, see MPI optimization and BLAS matrix computation in AIPP; GPUs typically run this faster than CPUs)
Performance Analysis and Optimization Methodology#
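- Fibonacci sequence computation: the compiler applies optimizations (-O2, -O3). You can disassemble the binary with IDA to see the code that actually runs.
- Maximize performance: speed, throughput, latency; or resources are limited (quota)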
- black box Dominant component
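Roofline Performance Model:
Arithmetic Intensity (AI) = FLOPs / Bytes (this can be used to judge a program's performance)
The roofline can be used to judge how well the CPU and cache are being used. Our ultimate goal is to bring the program to the ridge point (the knee of the roofline)!
Since 2020, deep learning models have also been trained to predict overall performance as a black box.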
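As a rough illustration of the roofline model described above, the following sketch computes attainable performance as the lower of the compute roof and the memory roof. The function names and the machine numbers (1000 GFLOP/s peak, 100 GB/s bandwidth) are assumptions chosen for illustration, not measurements of any real system.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: the lower of the compute roof and the memory roof.

    intensity is the arithmetic intensity in FLOPs per byte.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOPs/byte) at which the two roofs meet."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine (assumed numbers, for illustration only).
PEAK, BW = 1000.0, 100.0

memory_bound = attainable_gflops(PEAK, BW, 0.25)   # left of the ridge point
compute_bound = attainable_gflops(PEAK, BW, 50.0)  # right of the ridge point
```

A kernel whose intensity is below `ridge_point(PEAK, BW)` is limited by memory bandwidth; above it, by peak compute.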
- Amdahl's law (the bucket effect: fix the shortest stave first, i.e. the bottleneck)
- Methods: mathematical analysis; hardware simulators; profiling (sampling the usage of a resource); tracing (collecting highly detailed data about the execution of a system).
- General Optimization Pipeline
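Amdahl's law from the list above can be written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of runtime that can be parallelized and n is the number of workers. A minimal sketch follows; the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Overall speedup when only `parallel_fraction` of the runtime scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction):
    """Upper bound on speedup as the number of workers grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction
```

Even with 90% of the runtime parallelized, the speedup can never exceed roughly 10x, which is why the serial bottleneck is worth fixing first.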
Practical Optimization Strategies#
- Algorithm Optimization
- Prefetch & Prediction
- Caching: store results from previous executions; cache size is limited.
- Lock-Free: use atomic primitives (CAS, atomic add)
Negative example: the GIL in Python
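To illustrate the caching item in the list above (results stored from previous executions, with a limited cache size), here is a small sketch using `functools.lru_cache` from the Python standard library. The Fibonacci function and the `maxsize=128` bound are illustrative choices, not something prescribed by the lecture.

```python
from functools import lru_cache


@lru_cache(maxsize=128)  # bounded cache: least recently used entries are evicted
def fib(n):
    """Fibonacci with memoization; each distinct n is computed only once."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


value = fib(80)          # finishes quickly because sub-results are reused
stats = fib.cache_info()  # named tuple: hits, misses, maxsize, currsize
```

Without the decorator the same recursion takes exponential time; with it, `stats.misses` counts the distinct arguments that actually had to be computed.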
- Load Balancing (keep all cores working)
- Reduce Precision
- Reduce Branching (e.g. a skip list, or arranging branches like a binary tree)
- Vectorization (high-level: vectorized computation graph; instruction-level: SIMD instructions)
See lab2
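For the load-balancing item above, one simple way to keep all workers busy is a greedy "longest task first" assignment: sort tasks by cost and always give the next task to the least-loaded worker. This is only a sketch of one common heuristic, not necessarily the scheme used in the labs; the helper names and task costs are hypothetical.

```python
import heapq


def balance(task_costs, num_workers):
    """Greedy assignment: largest task first, to the least-loaded worker."""
    heap = [(0, worker) for worker in range(num_workers)]  # (load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]
    for cost in sorted(task_costs, reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(cost)
        heapq.heappush(heap, (load + cost, worker))
    return assignment


def makespan(assignment):
    """Finish time of the busiest worker; this is what balancing minimizes."""
    return max(sum(tasks) for tasks in assignment)


costs = [8, 7, 6, 5, 4, 3, 2, 1]
contiguous = [costs[:4], costs[4:]]  # naive split into two contiguous halves
balanced = balance(costs, 2)
```

With these costs the contiguous split has a makespan of 26, while the greedy assignment reaches 18, so neither worker sits idle waiting for the other.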
- Optimize Memory Access Locality
- GEMM
- Blocking
- Loop Permutation
- Array Packing
See lab3
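The GEMM items above (blocking and loop permutation) can be sketched in plain Python. Pure Python will not show the cache benefit that a compiled implementation would, so treat this only as an illustration of the loop structure; the function names and the default `block=2` are assumptions made for the example.

```python
def matmul_naive(a, b):
    """Reference triple loop in i-j-p order."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_blocked(a, b, block=2):
    """Blocked (tiled) GEMM with the inner loops permuted to i-p-j.

    Each tile of a, b and c is reused while it is still "hot", and the
    innermost loop walks rows of b and c contiguously.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    row_c = c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

Both functions compute the same product; only the order in which the elements are visited changes.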
- Instruction / Data Alignment
e.g. the compiler can optimize this automatically (for example, struct members are automatically aligned in memory).
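The automatic struct alignment mentioned above can be observed from Python through `ctypes`, which lays out `Structure` fields the way the platform's C compiler would. The two struct definitions below are hypothetical examples; exact sizes depend on the platform ABI, so the comments avoid quoting specific byte counts.

```python
import ctypes


class Padded(ctypes.Structure):
    # char, int, char: padding is inserted so that `value` is aligned.
    _fields_ = [
        ("tag", ctypes.c_char),
        ("value", ctypes.c_int),
        ("flag", ctypes.c_char),
    ]


class Reordered(ctypes.Structure):
    # Same members, widest first: less padding is needed.
    _fields_ = [
        ("value", ctypes.c_int),
        ("tag", ctypes.c_char),
        ("flag", ctypes.c_char),
    ]


int_alignment = ctypes.alignment(ctypes.c_int)
value_offset = Padded.value.offset     # a multiple of int_alignment
padded_size = ctypes.sizeof(Padded)
reordered_size = ctypes.sizeof(Reordered)
```

Comparing `padded_size` with `reordered_size` shows how member order alone can change the memory footprint of a struct.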
Discussion#
- Domain Specific Language
- Manual Optimization is indispensable
- Core Affinity (NUMA: non-uniform memory access)
- Adapts general code to local machine
- Auto-learning, e.g. black-box methods: TVM
- It is also worth learning about TPUs, DPUs, and FPGAs.
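As a small illustration of the core-affinity item above, the sketch below pins the current process to a chosen set of cores using `os.sched_setaffinity`, which Python exposes only on some platforms (notably Linux). The helper name `pin_current_process` is hypothetical, and the sketch does not control NUMA memory placement, which needs separate tooling.

```python
import os


def pin_current_process(cores):
    """Pin the calling process to `cores` and return the resulting CPU set.

    Returns None on platforms where Python does not expose
    os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cores)  # pid 0 means the calling process
    return os.sched_getaffinity(0)
```

A typical use is to pin a worker to cores on the socket that owns its data, so that memory accesses stay on the local NUMA node.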
HPC Skill Tree#
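- Linux: operating system fundamentals, basic Linux structure, shell usage
- Cluster operations and network management (distributed): NFS;
- Collaborative development and version control
- Script automation (Linux shell or Python)
- Manually compiling and linking programs with dependencies
- Parallel program design, testing, and optimization
- Power control and parameter tuning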
How to Learn HPC/CS#
Last update: 2023-09-27 10:52:36
Created: 2023-07-03 16:54:46