iph.ar
  • home
  • Misc.
  • Gaming
  • (de)Engineering
    • Using LLMs to Simulate Wild Thinking and Convergence
    • Visualizing Song Structure with Self-Similarity Matrices
    • PyHPC – Self-Guided Parallel Programming Workshop with Python
      • 00 - How This Course Was Made
      • 01 - Multiprocessing
      • 02 - Multithreading and GIL
      • 03 - MPI with mpi4py
      • 04 - GPU with PyCUDA/Numba
      • 05 - Parallel Libraries
    • Mapping Consonants as Percussion: A Small Experiment with Whisper and Audio Analysis
    • Semantic Self-Similarity or How I Split a Conversation into Scenes Using Language Models
    • Modeling the Noise: Building a Tinnitus Generator in Python
    • Making A Humble OpenGL Rotating Cube
  • IT

PyHPC – Self-Guided Parallel Programming Workshop with Python

July 2025

This project is a self-paced workshop designed to explore parallel and high-performance computing (HPC) concepts and tools using the Python programming language.

It's inspired by typical content from introductory graduate-level HPC courses, but adapted to be practical, flexible, and free from academic bureaucracy.

Each chapter covers a different technique to exploit parallelism at the CPU, GPU, or cluster level, with simple examples and performance comparisons.

Workshop Structure

00 - How this course was made

  • Source
  • Workflow
  • Why

01 - Multiprocessing and Concurrent Programming Fundamentals

  • HPC landscape overview: where different techniques fit
  • Introduction to the multiprocessing module
  • Using Process, Queue, Pipe, and Pool
  • Introduction to concurrent.futures for cleaner parallel execution
  • Examples of CPU-bound tasks with performance profiling
  • Comparison with sequential execution
  • Synchronization and locking considerations
  • Profiling CPU-bound vs I/O-bound tasks
  • Proposed exercises:
    • Parallel prime number calculation
    • Signal filtering comparison (sequential vs parallel)
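As a taste of the first exercise, here is a minimal sketch of a parallel prime count using concurrent.futures. It is not the workshop's reference solution: the function names (is_prime, count_primes, split_range, count_primes_parallel), the chunking strategy, and the limit of 200,000 are all assumptions chosen for illustration.

```python
import math
import time
from concurrent.futures import ProcessPoolExecutor


def is_prime(n: int) -> bool:
    """Trial division: deliberately CPU-bound, so it benefits from processes."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def count_primes(bounds: tuple[int, int]) -> int:
    """Count primes in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(1 for n in range(start, stop) if is_prime(n))


def split_range(limit: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, limit) into contiguous chunks (assumes limit >= 1)."""
    step = math.ceil(limit / parts)
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def count_primes_parallel(limit: int, workers: int = 4) -> int:
    """Hand one chunk to each worker process and add up the partial counts."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))


if __name__ == "__main__":
    limit = 200_000  # hypothetical problem size
    t0 = time.perf_counter()
    sequential = count_primes((0, limit))
    t1 = time.perf_counter()
    parallel = count_primes_parallel(limit)
    t2 = time.perf_counter()
    print(f"sequential: {sequential} primes in {t1 - t0:.2f} s")
    print(f"parallel:   {parallel} primes in {t2 - t1:.2f} s")
```

Equal-width chunks are the simplest split, but the chunks with larger numbers cost more per test, so the workers finish unevenly. Smaller chunks, or the chunksize argument of map, would be one way to balance the load.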

02 - Multithreading, Async Programming, and the GIL

  • What is the GIL (Global Interpreter Lock) in CPython
  • Introduction to asyncio for I/O-bound tasks
  • The threading module: using Thread, Lock, RLock, Event, Condition
  • When threading is useful vs when to avoid it
  • Practical differences between threading, asyncio, and multiprocessing
  • Advanced synchronization patterns
  • Proposed exercises:
    • Multiple threads reading files
    • Web scraping with asyncio
    • Concurrent data acquisition simulation
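To make the threading vs asyncio contrast concrete, the sketch below runs the same simulated I/O task both ways. The sleep calls stand in for real network or disk waits, and the names (fetch_blocking, run_threads, fetch_async, run_asyncio) and payload strings are hypothetical, not taken from the workshop material.

```python
import asyncio
import threading
import time


def fetch_blocking(task_id: int, delay: float, results: dict, lock: threading.Lock) -> None:
    """Simulated blocking I/O; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    with lock:  # guard the shared dict explicitly
        results[task_id] = f"payload-{task_id}"


def run_threads(n_tasks: int, delay: float) -> dict:
    """Start one thread per task and wait for all of them."""
    results: dict = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=fetch_blocking, args=(i, delay, results, lock))
        for i in range(n_tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


async def fetch_async(task_id: int, delay: float) -> str:
    """Simulated non-blocking I/O; await hands control back to the event loop."""
    await asyncio.sleep(delay)
    return f"payload-{task_id}"


async def run_asyncio(n_tasks: int, delay: float) -> list:
    """Run all coroutines concurrently on a single thread."""
    return await asyncio.gather(*(fetch_async(i, delay) for i in range(n_tasks)))


if __name__ == "__main__":
    t0 = time.perf_counter()
    run_threads(10, 0.2)
    t1 = time.perf_counter()
    asyncio.run(run_asyncio(10, 0.2))
    t2 = time.perf_counter()
    print(f"threads: {t1 - t0:.2f} s, asyncio: {t2 - t1:.2f} s")
```

Both versions should take roughly one delay rather than ten, because the waits overlap. Swapping the sleep for a CPU-bound loop is a quick way to see the GIL remove that advantage in the threaded version.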

03 - Distributed Computing with MPI and Modern Alternatives

  • Introduction to MPI and mpi4py
  • mpiexec and distributed execution
  • Process communication: send, recv, broadcast, scatter, gather
  • Modern alternatives: joblib and Ray for distributed computing
  • Comparison: MPI vs modern distributed frameworks
  • Proposed exercises:
    • Distributed FFT computation
    • Monte Carlo simulation across processes
    • Fallback: joblib parallel processing if MPI setup fails
    • Required infrastructure: real cluster or local simulation
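The Monte Carlo exercise could look something like the sketch below, which estimates pi by gathering per-rank hit counts on rank 0. It assumes mpi4py and an MPI runtime are installed. If the import fails it falls back to a single process (a simplification for this sketch, not the joblib fallback listed above). The function names, the seed scheme, and the sample count are illustrative assumptions.

```python
import random
from typing import Optional

try:
    from mpi4py import MPI  # third-party; needs an MPI runtime underneath
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    comm, rank, size = None, 0, 1  # single-process fallback for this sketch


def count_hits(n_samples: int, seed: int) -> int:
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples: int) -> Optional[float]:
    """Each rank samples its share; rank 0 combines the counts, others get None."""
    local_n = total_samples // size
    local_hits = count_hits(local_n, seed=1234 + rank)  # a different stream per rank
    if comm is None:
        all_hits = [local_hits]
    else:
        all_hits = comm.gather(local_hits, root=0)  # a list on rank 0, None elsewhere
    if rank != 0:
        return None
    return 4.0 * sum(all_hits) / (local_n * size)


if __name__ == "__main__":
    estimate = estimate_pi(400_000)
    if rank == 0:
        print(f"pi is roughly {estimate:.4f} using {size} process(es)")
```

Saved under a hypothetical name such as mc_pi.py, this would be launched with something like `mpiexec -n 4 python mc_pi.py`. Running it with plain `python` also works and simply uses one process.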

04 - GPU Programming with CUDA and ROCm

  • Introduction to CUDA: concept of kernels and grids
  • PyCUDA vs Numba CUDA: similarities and differences
  • First kernel with numba.cuda
  • CuPy for NumPy-like GPU operations
  • ROCm/HIP alternatives for AMD GPUs
  • Memory management: host-device transfers, memory coalescing
  • Benchmark: operations on CPU vs GPU
  • Proposed exercises:
    • 2D convolution on GPU
    • Matrix multiplication with memory optimization
    • Signal processing kernels
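For a rough idea of what a first numba.cuda kernel involves, here is a vector addition sketch showing the kernel, the grid sizing, and the explicit host-device copies. It assumes Numba and a CUDA-capable GPU. When either is missing it falls back to plain Python so the script still runs. The names add_kernel and vector_add and the block size of 256 are choices made for this example.

```python
try:
    import numpy as np
    from numba import cuda  # third-party; assumed installed for the GPU path
    GPU_OK = cuda.is_available()
except ImportError:
    GPU_OK = False

if GPU_OK:
    @cuda.jit
    def add_kernel(a, b, out):
        i = cuda.grid(1)   # absolute index of this thread in a 1D grid
        if i < out.size:   # the last block may overshoot the array
            out[i] = a[i] + b[i]


def vector_add(xs, ys):
    """Element-wise sum; uses the GPU when available, plain Python otherwise."""
    if not GPU_OK:
        return [x + y for x, y in zip(xs, ys)]
    a = np.asarray(xs, dtype=np.float32)
    b = np.asarray(ys, dtype=np.float32)
    d_a = cuda.to_device(a)             # host -> device copies
    d_b = cuda.to_device(b)
    d_out = cuda.device_array_like(a)   # uninitialised buffer on the device
    threads_per_block = 256
    blocks = (a.size + threads_per_block - 1) // threads_per_block
    add_kernel[blocks, threads_per_block](d_a, d_b, d_out)
    return d_out.copy_to_host().tolist()  # device -> host copy


if __name__ == "__main__":
    label = "GPU path" if GPU_OK else "CPU fallback"
    print(label, vector_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

For arrays this small the transfers dominate and the GPU is unlikely to win, which is the kind of effect a CPU vs GPU benchmark should bring out.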

05 - Parallel Libraries and Performance Optimization

  • Quick overview of parallel libraries:
    • Numba with @jit(parallel=True)
    • Dask for large structures and automatic parallelism
    • JAX for scientific computing and automatic differentiation
    • PyTorch and GPU usage (torch.cuda)
  • Performance profiling tools: time, timeit, line_profiler, py-spy
  • Numerical precision considerations in parallel computing
  • Choosing the right tool for different problem types
  • Comparative benchmark: CPU vs GPU vs distributed
  • Proposed exercises:
    • Complete signal processing pipeline using multiple approaches
    • Performance comparison: multiprocessing vs Numba vs GPU
    • Real-world optimization problem
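Comparative benchmarks need a consistent timing harness before any tool is swapped in. The sketch below builds one on timeit and applies it to three pure-Python implementations of the same computation. These stand in for the multiprocessing, Numba, and GPU variants, which would plug into the same helper. The helper name best_time and the sum-of-squares workload are assumptions for illustration.

```python
import timeit
from typing import Callable


def best_time(fn: Callable[[], object], repeat: int = 5, number: int = 10) -> float:
    """Best per-call time in seconds over `repeat` rounds of `number` calls."""
    timings = timeit.repeat(fn, repeat=repeat, number=number)
    return min(timings) / number  # the minimum is the least noisy estimate


def sum_squares_loop(n: int) -> int:
    """Explicit loop: the baseline."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n: int) -> int:
    """Same work pushed into the built-in sum."""
    return sum(i * i for i in range(n))


def sum_squares_formula(n: int) -> int:
    """Closed form for 0^2 + 1^2 + ... + (n-1)^2: a better algorithm, not a faster loop."""
    return (n - 1) * n * (2 * n - 1) // 6


if __name__ == "__main__":
    n = 100_000
    for fn in (sum_squares_loop, sum_squares_builtin, sum_squares_formula):
        seconds = best_time(lambda: fn(n))
        print(f"{fn.__name__:22s} {seconds * 1e3:8.3f} ms per call")
```

Checking that all variants return the same result before timing them is worth the extra line, since a fast wrong answer is easy to produce when porting code between backends.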

License

MIT – free to use, copy, and modify.

© 2025 iph.ar