High-Performance Artificial Intelligence Systems
- Course Category: Talent Cultivation Project for Intelligent Networking Technology and Applications / RISC-V Course Development Project
- Course Introduction:
- Overview of the Course: Why and How to Develop High Performance AI Systems with Hardware-Software Co-Design - class1, class2, class3
- Introduction to High Performance AI Systems - class1, class2, class3, class4, class5, class6
- Performance Analysis for Deep Learning Systems - class1, class2, class3, class4, class5, class6, class7
- Performance Analysis for Deep Learning Systems - Profiling Deep Software Stacks with SOFA - class1, class2, class3, class4, class5
- Discussion: Research Papers
References:
1. Amdahl’s Law in the Datacenter Era: A Market for Fair Processor Allocation
Winner of the Best Paper Award at HPCA 2018, this paper explores the future of HPC; although only loosely related to AI, it is still well worth a look. (Amdahl's Law itself is restated in the short sketch after this reference list.)
2. Towards Pervasive and User Satisfactory CNN across GPU Microarchitecture
Winner of the Best Paper Award at HPCA 2017, this paper examines CNN performance issues across GPU microarchitectures.
3. Large-Scale Hierarchical K-Means for Heterogeneous Many-Core Supercomputers
Published at SC 2018, this paper examines how to perform large-scale K-means computation.
4. Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures
Published at SC 2018, this paper dissects deep-learning convolution workloads on SIMD architectures.
5. CosmoFlow: Using Deep Learning to Learn the Universe at Scale
Published at SC 2018, this paper explores how deep learning can be used to study cosmology.
6. Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search
A Best Paper Award candidate at DAC 2019, this paper explores how to automatically search for and optimize neural networks for FPGA implementation.
7. RAPIDNN: In-Memory Deep Neural Network Acceleration Framework
Explores how to accelerate deep neural networks from the memory-design (in-memory computing) perspective.
8. A Survey of Model Compression and Acceleration for Deep Neural Networks
Surveys how compression techniques can be used to accelerate deep neural networks.
9. A Survey of FPGA-Based Neural Network Inference Accelerator
Surveys how to design FPGA-based inference accelerators for deep neural networks.
10. Towards Federated Learning at Scale: System Design
Google's explanation of how its large-scale federated learning system is designed and operated.
11. SCALE-Sim: Systolic CNN Accelerator Simulator
Discusses how to design a tool that simulates systolic-array CNN accelerators.
12. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
A classic, early paper proposing compression as a way to accelerate deep neural networks. (A minimal pruning-and-quantization sketch appears after this reference list.)
13. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
A broad collection of parallel and distributed deep-learning techniques; if you work through it all, your skills will grow considerably.
14. Performance Comparison of the Digital Neuromorphic Hardware SpiNNaker and the Neural Network Simulation Software NEST for a Full-Scale Cortical Microcircuit Model
AI is not just about DNNs; if you are interested in spiking neural networks, this paper is worth studying.
15. Horovod: fast and easy distributed deep learning in TensorFlow
This paper examines how to optimize distributed deep learning built on TensorFlow.
16. In-Datacenter Performance Analysis of a Tensor Processing Unit
Published by Google at ISCA 2017, this paper discusses the architecture and performance of the TPU.
17. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
TensorFlow's debut paper, discussing its design philosophy for large-scale machine learning on heterogeneous distributed systems.
18. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
Proposes a software architecture that reduces communication overhead for distributed deep learning on large GPU clusters.
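For reference 1, a minimal worked statement of Amdahl's Law, written in standard textbook notation rather than taken from the paper itself:

```latex
% Amdahl's Law: speedup from parallelizing a fraction p of the work on n processors.
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}, \qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
% Example: p = 0.95, n = 64 gives S(64) = 1 / (0.05 + 0.95/64) \approx 15.4,
% far below the ideal 64x, which is why the serial fraction dominates at scale.
```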
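For reference 12, a minimal Python sketch of the two core ideas, magnitude pruning and weight quantization. This is an illustrative simplification with made-up shapes, thresholds, and function names; the paper's actual pipeline adds iterative pruning with retraining, k-means weight sharing, and Huffman coding.

```python
# Illustrative sketch only: magnitude pruning + uniform weight quantization,
# loosely in the spirit of Deep Compression (reference 12).
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so roughly `sparsity` of them are removed."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def uniform_quantize(weights: np.ndarray, num_bits: int = 4) -> np.ndarray:
    """Snap the surviving (non-zero) weights onto 2**num_bits evenly spaced levels."""
    nonzero = weights != 0
    if not nonzero.any():
        return weights
    w_min, w_max = weights[nonzero].min(), weights[nonzero].max()
    scale = (w_max - w_min) / (2 ** num_bits - 1)
    if scale == 0:
        scale = 1.0
    quantized = np.round((weights - w_min) / scale) * scale + w_min
    return np.where(nonzero, quantized, 0.0)  # keep pruned entries at exactly zero

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256))       # stand-in for one layer's weight matrix
    w = magnitude_prune(w, sparsity=0.9)  # keep only the largest 10% of weights
    w = uniform_quantize(w, num_bits=4)   # at most 16 shared non-zero values
    print(f"non-zero fraction: {np.count_nonzero(w) / w.size:.1%}, "
          f"distinct values: {np.unique(w).size}")
```

The point of the toy example is that after pruning and quantization the tensor contains mostly zeros and only a handful of distinct values, which is what makes the subsequent entropy coding in the paper effective.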
Course Attachments
Overview of the Course
- 智慧聯網-RISC-V-高效能人工智慧系統-課程概述-1.pptx(2.29 MB)