CSCE 990-005: Hardware Acceleration for Machine Learning
Instructor: Arman Roohi
Time: Tue and Thu 9:30 a.m. to 10:45 a.m.
Location: TBA
Track: Systems
Course Description:
Machine learning (ML) is now widely used in many advanced artificial intelligence (AI) applications [1]. Breakthroughs in computing capability have made it possible to run complex ML algorithms in a relatively short time, enabling real-time human-machine interaction in applications such as face detection for video surveillance, advanced driver-assistance systems (ADAS), and image recognition for early cancer detection [2, 3]. In all of these applications, high detection accuracy requires complex ML models, which comes at the cost of high computational complexity and places heavy demands on the hardware platform. Currently, most applications run on general-purpose compute engines, especially graphics processing units (GPUs). However, recent work from both industry and academia shows a trend toward designing application-specific integrated circuits (ASICs) for ML, especially for deep neural networks (DNNs). This course gives an overview of hardware accelerator design, the various types of ML acceleration, and the techniques used to improve the hardware efficiency of ML computation, especially non-von Neumann architectures built on post-CMOS technologies, including spintronic and memristive devices.
Course Objectives:
• HW+ML for the compute-heavy deep neural network (DNN) models of machine learning
• Foundations of ML and deep learning (DL) algorithms
• Compute and memory behavior of DL workloads
o Pros/cons of different compute platforms (CPU/GPU)
• Custom HW Accelerators
o Minimizing computation, data movement, memory overhead
• Co-design of ML algorithms and accelerators
o E.g., model compression/retraining for fixed-point arithmetic
o E.g., memory access strategy to reduce data movement
• Cross-layer perspective: algorithmic, architectural, and circuit-level
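As a rough illustration of the fixed-point co-design objective listed above, the sketch below rounds floating-point weights to signed 8-bit fixed-point values and converts them back. It is not course material: the function names, the 8-bit width, and the 4 fractional bits are all assumptions chosen for the example, and real accelerator flows typically add retraining to recover accuracy.

```python
def quantize_fixed_point(values, frac_bits=4, total_bits=8):
    """Round floats to signed fixed-point integers, saturating on overflow.

    frac_bits and total_bits are illustrative defaults, not a recommendation.
    """
    scale = 1 << frac_bits                     # 2**frac_bits steps per unit
    q_min = -(1 << (total_bits - 1))           # most negative representable code
    q_max = (1 << (total_bits - 1)) - 1        # most positive representable code
    quantized = []
    for v in values:
        q = int(round(v * scale))              # nearest fixed-point code
        quantized.append(max(q_min, min(q_max, q)))  # saturate out-of-range values
    return quantized


def dequantize_fixed_point(q_values, frac_bits=4):
    """Map fixed-point integer codes back to floats."""
    scale = 1 << frac_bits
    return [q / scale for q in q_values]


weights = [0.5, -0.26, 1.3, 9.0]               # hypothetical example weights
q = quantize_fixed_point(weights)
restored = dequantize_fixed_point(q)
print(q)         # [8, -4, 21, 127]
print(restored)  # [0.5, -0.25, 1.3125, 7.9375]
```

The last weight (9.0) falls outside the representable range and saturates at 127/16 = 7.9375, which is the kind of precision and range loss that model compression and retraining try to compensate for.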