
Solution: Signal filtering comparison

September 2025

Exercise: Signal filtering comparison (sequential vs parallel)

Objective: To understand the performance difference between sequential and parallel signal filtering using Python's multiprocessing module.

Instructions:

  • You are given a script signal_filter.py that simulates a simple signal filtering process. The script takes a list of numbers (representing a signal) and applies a simple moving average filter.
  • Modify the script to perform the filtering both sequentially and using the multiprocessing module. The signal should be divided into chunks for parallel processing.
  • Compare the execution time of the sequential and parallel versions for a signal of length 10,000. You can use the time module for timing.

Expected Learning Outcome: You should be able to compare the performance of sequential and parallel processing for a simple task and understand the basic principles of parallelization.

Solution

import time
import numpy as np
import multiprocessing as mp

# Heavier moving average filter
def moving_average(signal, window_size=1001):
    return np.convolve(signal, np.ones(window_size) / window_size, mode='valid')

def worker(signal_chunk, window_size):
    return moving_average(signal_chunk, window_size)

def parallel_filter(signal, window_size, num_procs):
    # Split into non-overlapping chunks. Each chunk is filtered independently,
    # so window_size - 1 samples are lost at every chunk boundary and the
    # concatenated output is shorter than the sequential result. Fine for a
    # timing benchmark, not for exact equivalence (see the note below).
    chunks = np.array_split(signal, num_procs)
    with mp.Pool(processes=num_procs) as pool:
        results = pool.starmap(worker, [(chunk, window_size) for chunk in chunks])
    return np.concatenate(results)

def benchmark(signal_size, window_size=1001):
    signal = np.random.rand(signal_size).astype(np.float32)

    # Sequential
    t0 = time.time()
    moving_average(signal, window_size)
    seq_time = time.time() - t0

    print(f"\nSignal size {signal_size:,}:")
    print(f"  Sequential -> {seq_time:.4f} s")

    for procs in [2, 4, 8]:
        t0 = time.time()
        parallel_filter(signal, window_size, procs)
        par_time = time.time() - t0
        speedup = seq_time / par_time if par_time > 0 else float('inf')
        print(f"  {procs} proc -> {par_time:.4f} s (speedup: {speedup:.2f}x)")

if __name__ == "__main__":
    # Bigger sizes so parallelism has a chance to shine
    for size in [10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]:
        benchmark(size, window_size=1001)
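
The chunked version above is fine for timing, but because each chunk is filtered independently, window_size - 1 samples are lost at every chunk boundary, so the parallel output does not match the sequential one exactly. Below is a minimal sketch of one way to make the results agree, by extending each chunk with an overlap before filtering (parallel_filter_exact is my name for it, not part of the original script):

import numpy as np
import multiprocessing as mp

def moving_average(signal, window_size=1001):
    # Same filter as in the solution above.
    return np.convolve(signal, np.ones(window_size) / window_size, mode='valid')

def parallel_filter_exact(signal, window_size, num_procs):
    # Extend every chunk to the right by window_size - 1 samples so that
    # each sliding window is computed by exactly one worker.
    # Assumes each chunk is at least window_size samples long.
    n = len(signal)
    bounds = np.linspace(0, n, num_procs + 1, dtype=int)
    jobs = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        stop_ext = min(stop + window_size - 1, n)
        jobs.append((signal[start:stop_ext], window_size))
    with mp.Pool(processes=num_procs) as pool:
        results = pool.starmap(moving_average, jobs)
    return np.concatenate(results)

if __name__ == "__main__":
    sig = np.random.rand(1_000_000).astype(np.float32)
    seq = moving_average(sig, 1001)
    par = parallel_filter_exact(sig, 1001, 4)
    print(np.allclose(seq, par))  # should print True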

Test results

Signal size 10,000:
  Sequential -> 0.0092 s
  2 proc -> 0.2672 s (speedup: 0.03x)
  4 proc -> 0.2906 s (speedup: 0.03x)
  8 proc -> 0.4016 s (speedup: 0.02x)

Signal size 100,000:
  Sequential -> 0.0100 s
  2 proc -> 0.2511 s (speedup: 0.04x)
  4 proc -> 0.2880 s (speedup: 0.03x)
  8 proc -> 0.4311 s (speedup: 0.02x)

Signal size 1,000,000:
  Sequential -> 0.1100 s
  2 proc -> 0.3161 s (speedup: 0.35x)
  4 proc -> 0.3281 s (speedup: 0.34x)
  8 proc -> 0.4741 s (speedup: 0.23x)

Signal size 10,000,000:
  Sequential -> 1.1523 s
  2 proc -> 1.0050 s (speedup: 1.15x)
  4 proc -> 0.7846 s (speedup: 1.47x)
  8 proc -> 0.8552 s (speedup: 1.35x)

Signal size 100,000,000:
  Sequential -> 11.6183 s
  2 proc -> 8.1328 s (speedup: 1.43x)
  4 proc -> 5.1532 s (speedup: 2.25x)
  8 proc -> 4.5892 s (speedup: 2.53x)

Observations

Small signals (10³–10⁵)

Parallel is much slower because the cost of:

  • creating processes
  • splitting arrays
  • merging results

dominates the actual computation.
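
A quick way to see that fixed cost in isolation (not part of the original exercise) is to time a pool doing essentially no work; whatever time is measured is pure process startup, pickling, and teardown overhead:

import time
import multiprocessing as mp

def noop(x):
    return x

if __name__ == "__main__":
    # With trivial work, almost all of the measured time is pool overhead.
    for procs in [2, 4, 8]:
        t0 = time.time()
        with mp.Pool(processes=procs) as pool:
            pool.map(noop, range(procs))
        print(f"{procs} processes, trivial work -> {time.time() - t0:.3f} s")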

Medium signals (~10⁶)

Parallel overhead is still heavy, so speedup is < 1×. Sequential is simpler and faster here.

Large signals (10⁷–10⁸)

Now the compute dominates the workload.

  • 2–4 processes give good speedup.
  • 8 processes add more inter-process communication overhead, so gains taper off.
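
Part of that communication cost is pickling the array chunks sent to the workers. One way to reduce it on Python 3.8+ is multiprocessing.shared_memory, which lets workers attach to the signal instead of receiving a copy. A rough sketch follows (the function names are mine, results still travel back via pickling, and the same chunk-boundary caveat as in the solution applies):

import numpy as np
import multiprocessing as mp
from multiprocessing import shared_memory

def filter_slice(shm_name, shape, dtype, start, stop, window_size):
    # Attach to the shared block instead of receiving a pickled chunk.
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        signal = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        chunk = np.array(signal[start:stop])  # local copy of just this slice
        return np.convolve(chunk, np.ones(window_size) / window_size, mode='valid')
    finally:
        shm.close()

def parallel_filter_shm(signal, window_size, num_procs):
    shm = shared_memory.SharedMemory(create=True, size=signal.nbytes)
    try:
        shared = np.ndarray(signal.shape, dtype=signal.dtype, buffer=shm.buf)
        shared[:] = signal  # one copy in, instead of a pickled copy per worker
        bounds = np.linspace(0, len(signal), num_procs + 1, dtype=int)
        args = [(shm.name, signal.shape, signal.dtype,
                 int(bounds[i]), int(bounds[i + 1]), window_size)
                for i in range(num_procs)]
        with mp.Pool(processes=num_procs) as pool:
            results = pool.starmap(filter_slice, args)
        return np.concatenate(results)
    finally:
        shm.close()
        shm.unlink()

It is called the same way as parallel_filter above, e.g. parallel_filter_shm(signal, 1001, 4) inside the __main__ guard.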

Conclusions

1. Parallelization pays off only when the task is “heavy” enough.

For lightweight operations or small datasets, the overhead dwarfs any benefit.

2. There’s a sweet spot in the number of processes.

  • With 4 processes, you already hit most of the parallel benefit.
  • Going from 4 → 8 processes gave diminishing returns (2.25× → 2.53× at the largest size), showing how overhead scales with process count.

3. Improving per-core efficiency often matters more than “just adding cores.”

  • Algorithmic optimizations (vectorization, cache-friendly methods) may yield bigger gains; see the sketch below.
  • For CPU-bound tasks, better use of SIMD / NumPy / Numba often beats raw multiprocessing.
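
As a concrete instance of the algorithmic point above (a standard prefix-sum trick, not part of the original solution): the O(n · window) convolution can be replaced by an O(n) running sum on a single core, which removes most of the compute that multiprocessing was being used to hide.

import numpy as np

def moving_average_cumsum(signal, window_size=1001):
    # O(n) moving average via prefix sums; matches
    # np.convolve(signal, np.ones(w) / w, mode='valid') up to rounding.
    csum = np.concatenate(([0.0], np.cumsum(signal, dtype=np.float64)))
    return (csum[window_size:] - csum[:-window_size]) / window_size

if __name__ == "__main__":
    sig = np.random.rand(10_000_000).astype(np.float32)
    ref = np.convolve(sig, np.ones(1001) / 1001, mode='valid')
    print(np.allclose(ref, moving_average_cumsum(sig, 1001)))  # should print True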

4. Rule of thumb for HPC students

If your workload fits in cache and finishes fast sequentially, don’t bother parallelizing. If it’s large enough that one core spends seconds to minutes crunching, then multiprocessing can give a worthwhile boost.
