High-Performance Artificial Intelligence Systems
- Course Category: Talent Cultivation Project for Intelligent Networking Technology and Applications / RISC-V Course Development Project
- Course Introduction:
- Overview of the Course: Why and How to Develop High Performance AI Systems with Hardware-Software Co-Design - class1, class2, class3
- Introduction to High Performance AI Systems - class1, class2, class3, class4, class5, class6
- Performance Analysis for Deep Learning Systems - class1, class2, class3, class4, class5, class6, class7
- Performance Analysis for Deep Learning Systems - Profiling Deep Software Stacks with SOFA - class1, class2, class3, class4, class5
- Discussion: Research Papers
References:
1. Amdahl’s Law in the Datacenter Era: A Market for Fair Processor Allocation
Winner of the Best Paper Award at HPCA 2018, this paper explores the future of HPC; although only loosely related to AI, it is still well worth a look. (Amdahl's Law itself is restated in the short sketch after this reference list.)
2. Towards Pervasive and User Satisfactory CNN across GPU Microarchitecture
Winner of the Best Paper Award at HPCA 2017, this paper examines CNN performance issues across GPU microarchitectures.
3. Large-Scale Hierarchical K-Means for Heterogeneous Many-Core Supercomputers
Published at SC 2018, this paper examines how to perform large-scale K-means computation.
4. Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures
Published at SC 2018, this paper dissects deep-learning convolution workloads on SIMD architectures.
5. CosmoFlow: Using Deep Learning to Learn the Universe at Scale
Published at SC 2018, this paper explores how deep learning can be used to study cosmology.
6. Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search
A Best Paper Award candidate at DAC 2019, this paper explores how to automatically search for and optimize neural networks for FPGA implementation.
7. RAPIDNN: In-Memory Deep Neural Network Acceleration Framework
Explores how to accelerate deep neural networks from the memory-design (in-memory computing) perspective.
8. A Survey of Model Compression and Acceleration for Deep Neural Networks
Surveys how compression techniques can be used to accelerate deep neural networks.
9. A Survey of FPGA-Based Neural Network Inference Accelerator
Surveys how to design FPGA-based inference accelerators for deep neural networks.
10. Towards Federated Learning at Scale: System Design
Google's explanation of how its large-scale federated learning system is designed and operated.
11. SCALE-Sim: Systolic CNN Accelerator Simulator
Discusses how to design a tool that simulates systolic-array CNN accelerators.
12. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
A classic, early paper proposing compression as a way to accelerate deep neural networks. (A minimal pruning-and-quantization sketch appears after this reference list.)
13. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
A broad collection of parallel and distributed deep-learning techniques; if you work through it all, your skills will grow considerably.
14. Performance Comparison of the Digital Neuromorphic Hardware SpiNNaker and the Neural Network Simulation Software NEST for a Full-Scale Cortical Microcircuit Model
AI is not just about DNNs; if you are interested in spiking neural networks, this paper is worth studying.
15. Horovod: fast and easy distributed deep learning in TensorFlow
This paper examines how to optimize distributed deep learning built on TensorFlow.
16. In-Datacenter Performance Analysis of a Tensor Processing Unit
Published by Google at ISCA 2017, this paper discusses the architecture and performance of the TPU.
17. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
TensorFlow's debut paper, discussing its design philosophy for large-scale machine learning on heterogeneous distributed systems.
18. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
Proposes a software architecture that reduces communication overhead for distributed deep learning on large GPU clusters.
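For reference 1, a minimal worked statement of Amdahl's Law, written in standard textbook notation rather than taken from the paper itself:

```latex
% Amdahl's Law: speedup from parallelizing a fraction p of the work on n processors.
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}, \qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
% Example: p = 0.95, n = 64 gives S(64) = 1 / (0.05 + 0.95/64) \approx 15.4,
% far below the ideal 64x, which is why the serial fraction dominates at scale.
```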
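For reference 12, a minimal Python sketch of the two core ideas, magnitude pruning and weight quantization. This is an illustrative simplification with made-up shapes, thresholds, and function names; the paper's actual pipeline adds iterative pruning with retraining, k-means weight sharing, and Huffman coding.

```python
# Illustrative sketch only: magnitude pruning + uniform weight quantization,
# loosely in the spirit of Deep Compression (reference 12).
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so roughly `sparsity` of them are removed."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def uniform_quantize(weights: np.ndarray, num_bits: int = 4) -> np.ndarray:
    """Snap the surviving (non-zero) weights onto 2**num_bits evenly spaced levels."""
    nonzero = weights != 0
    if not nonzero.any():
        return weights
    w_min, w_max = weights[nonzero].min(), weights[nonzero].max()
    scale = (w_max - w_min) / (2 ** num_bits - 1)
    if scale == 0:
        scale = 1.0
    quantized = np.round((weights - w_min) / scale) * scale + w_min
    return np.where(nonzero, quantized, 0.0)  # keep pruned entries at exactly zero

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256))       # stand-in for one layer's weight matrix
    w = magnitude_prune(w, sparsity=0.9)  # keep only the largest 10% of weights
    w = uniform_quantize(w, num_bits=4)   # at most 16 shared non-zero values
    print(f"non-zero fraction: {np.count_nonzero(w) / w.size:.1%}, "
          f"distinct values: {np.unique(w).size}")
```

The point of the toy example is that after pruning and quantization the tensor contains mostly zeros and only a handful of distinct values, which is what makes the subsequent entropy coding in the paper effective.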
Course Attachments
Overview of the Course
- 智慧聯網-RISC-V-高效能人工智慧系統-課程概述-1.pptx(2.29 MB)