iph.ar
  • home
  • Misc.
  • Gaming
  • (de)Engineering
    • Using LLMs to Simulate Wild Thinking and Convergence
    • Visualizing Song Structure with Self-Similarity Matrices
    • PyHPC – Self-Guided Parallel Programming Workshop with Python (WIP)
      • 00 - How This Course Was Made
      • 01 - Multiprocessing
      • 02 - Multithreading and GIL
      • 03 - MPI with mpi4py
      • 04 - GPU with PyCUDA/Numba
      • 05 - Parallel Libraries
    • Mapping Consonants as Percussion: A Small Experiment with Whisper and Audio Analysis
    • Semantic Self-Similarity or How I Split a Conversation into Scenes Using Language Models
    • Modeling the Noise: Building a Tinnitus Generator in Python
    • Making A Humble OpenGL Rotating Cube
    • Semantic Analysis Applied to "Sent From My Telephone" by Voice Actor
    • Longitudinal Sentiment Analysis of Personal Chat Logs
  • IT

PyHPC – Self-Guided Parallel Programming Workshop with Python (WIP)

July 2025

This project is a self-paced workshop designed to explore parallel and high-performance computing (HPC) concepts and tools using the Python programming language.

It's inspired by typical content from introductory graduate-level HPC courses, but adapted to be practical, flexible, and free from academic bureaucracy.

Each chapter covers a different technique to exploit parallelism at the CPU, GPU, or cluster level, with simple examples and performance comparisons.

Workshop Structure

00 - How This Course Was Made

  • Source
  • Workflow
  • Why

01 - Multiprocessing and Concurrent Programming Fundamentals

  • HPC landscape overview: where different techniques fit
  • Introduction to the multiprocessing module
  • Using Process, Queue, Pipe, and Pool
  • Introduction to concurrent.futures for cleaner parallel execution
  • Examples of CPU-bound tasks with performance profiling
  • Comparison with sequential execution
  • Synchronization and locking considerations
  • Profiling CPU-bound vs I/O-bound tasks
  • Proposed exercises:
    • Parallel prime number calculation
    • Signal filtering comparison (sequential vs parallel)
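
The first proposed exercise, parallel prime counting, can be sketched with `Pool.map`; the function and parameter names here (`is_prime`, `count_primes`, `workers`) are illustrative, not part of the course material:

```python
from multiprocessing import Pool

def is_prime(n):
    """Trial-division primality test: CPU-bound for large inputs."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def count_primes(limit, workers=4):
    """Count primes below `limit` by mapping is_prime over a worker pool."""
    with Pool(workers) as pool:
        # chunksize batches tasks to cut inter-process messaging overhead
        flags = pool.map(is_prime, range(limit), chunksize=1000)
    return sum(flags)

if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn workers
    # by re-importing the module (Windows, macOS by default)
    print(count_primes(10_000))  # 1229 primes below 10,000
```

Swapping `Pool` for `concurrent.futures.ProcessPoolExecutor` gives the same behavior with the cleaner futures-based API the chapter covers.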

02 - Multithreading, Async Programming, and the GIL

  • What is the GIL (Global Interpreter Lock) in CPython
  • Introduction to asyncio for I/O-bound tasks
  • The threading module: using Thread, Lock, RLock, Event, Condition
  • When threading is useful vs when to avoid it
  • Practical differences between threading, asyncio, and multiprocessing
  • Advanced synchronization patterns
  • Proposed exercises:
    • Multiple threads reading files
    • Web scraping with asyncio
    • Concurrent data acquisition simulation
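
A minimal sketch of the `Lock` primitive from the chapter: four threads increment a shared counter, and the lock is what keeps the result deterministic, since `counter += 1` is not atomic under the GIL. The thread counts are arbitrary.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:  # without the lock, interleaved read-modify-writes lose increments
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000, every time, thanks to the lock
```

Because the GIL serializes bytecode execution, this workload gains no speed from threads; the point of the example is correctness of shared state, which is exactly why CPU-bound work belongs in `multiprocessing` instead.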

03 - Distributed Computing with MPI and Modern Alternatives

  • Introduction to MPI and mpi4py
  • mpiexec and distributed execution
  • Process communication: send, recv, broadcast, scatter, gather
  • Modern alternatives: joblib and Ray for distributed computing
  • Comparison: MPI vs modern distributed frameworks
  • Proposed exercises:
    • Distributed FFT computation
    • Monte Carlo simulation across processes
    • Fallback: joblib parallel processing if the MPI setup fails
  • Required infrastructure: a real cluster or a local multi-process simulation

04 - GPU Programming with CUDA and ROCm

  • Introduction to CUDA: concept of kernels and grids
  • PyCUDA vs Numba CUDA: similarities and differences
  • First kernel with numba.cuda
  • CuPy for NumPy-like GPU operations
  • ROCm/HIP alternatives for AMD GPUs
  • Memory management: host-device transfers, memory coalescing
  • Benchmark: operations on CPU vs GPU
  • Proposed exercises:
    • 2D convolution on GPU
    • Matrix multiplication with memory optimization
    • Signal processing kernels

05 - Parallel Libraries and Performance Optimization

  • Quick overview of parallel libraries:
    • Numba with @jit(parallel=True)
    • Dask for large structures and automatic parallelism
    • JAX for scientific computing and automatic differentiation
    • PyTorch and GPU usage (torch.cuda)
  • Performance profiling tools: time, timeit, line_profiler, py-spy
  • Numerical precision considerations in parallel computing
  • Choosing the right tool for different problem types
  • Comparative benchmark: CPU vs GPU vs distributed
  • Proposed exercises:
    • Complete signal processing pipeline using multiple approaches
    • Performance comparison: multiprocessing vs Numba vs GPU
    • Real-world optimization problem
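
Before reaching for any of the libraries above, the chapter's profiling tools establish a baseline. A minimal `timeit` sketch, with an arbitrary pure-Python workload (`dot`) standing in for the code under test:

```python
import timeit

def dot(a, b):
    """Pure-Python dot product: the sequential baseline to beat."""
    return sum(x * y for x, y in zip(a, b))

a = list(range(10_000))
b = list(range(10_000))

# timeit runs the statement many times and reports the total seconds,
# which smooths out scheduler noise in micro-benchmarks
elapsed = timeit.timeit(lambda: dot(a, b), number=100)
print(f"100 runs: {elapsed:.3f}s -> {elapsed / 100 * 1e6:.1f} µs per call")
```

The same harness can then time the Numba, Dask, or GPU versions of the workload, which is how the comparative CPU vs GPU vs distributed benchmark is built.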

License

MIT – free to use, copy, and modify.

  • © 2026 iph.ar