Solution: Signal filtering comparison
Exercise: Signal filtering comparison (sequential vs parallel)
Objective: To understand the performance difference between sequential and parallel signal filtering using Python's multiprocessing module.
Instructions:
- You are given a script signal_filter.py that simulates a
simple signal filtering process. This script takes a list of numbers
(representing a signal) and applies a simple moving average filter.
- Modify the script to perform the filtering both sequentially and using
multiprocessing. The signal should be divided into chunks for parallel
processing.
- Compare the execution time of the sequential and parallel versions for
a signal of length 10,000. You can use the time module for timing.
Expected Learning Outcome: You should be able to compare the performance of sequential and parallel processing for a simple task and understand the basic principles of parallelization.
Solution
import time
import numpy as np
import multiprocessing as mp

# Heavier moving average filter (a 1001-sample window makes the
# per-element work large enough for parallelism to matter)
def moving_average(signal, window_size=1001):
    return np.convolve(signal, np.ones(window_size) / window_size, mode='valid')

def worker(signal_chunk, window_size):
    return moving_average(signal_chunk, window_size)

def parallel_filter(signal, window_size, num_procs):
    # Split into chunks. Note: with mode='valid', each chunk loses
    # window_size - 1 samples at its edges, so the parallel output is
    # slightly shorter than the sequential one -- acceptable for a timing
    # comparison, but the chunks would need to overlap for identical results.
    chunks = np.array_split(signal, num_procs)
    with mp.Pool(processes=num_procs) as pool:
        results = pool.starmap(worker, [(chunk, window_size) for chunk in chunks])
    return np.concatenate(results)

def benchmark(signal_size, window_size=1001):
    signal = np.random.rand(signal_size).astype(np.float32)

    # Sequential baseline
    t0 = time.time()
    moving_average(signal, window_size)
    seq_time = time.time() - t0
    print(f"\nSignal size {signal_size:,}:")
    print(f"  Sequential -> {seq_time:.4f} s")

    # Parallel runs with increasing process counts
    for procs in [2, 4, 8]:
        t0 = time.time()
        parallel_filter(signal, window_size, procs)
        par_time = time.time() - t0
        speedup = seq_time / par_time if par_time > 0 else float('inf')
        print(f"  {procs} proc -> {par_time:.4f} s (speedup: {speedup:.2f}x)")

if __name__ == "__main__":
    # Bigger sizes so parallelism has a chance to shine
    for size in [10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]:
        benchmark(size, window_size=1001)
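As noted in the code comments, splitting with np.array_split and filtering each chunk in 'valid' mode drops window_size - 1 samples at every chunk boundary, so the parallel result is not identical to the sequential one. If exact agreement is needed, the chunks must overlap. A minimal sketch of such a splitter (the helper name overlapping_chunks is not part of the solution above, just an illustration):

```python
import numpy as np

def moving_average(signal, window_size=1001):
    return np.convolve(signal, np.ones(window_size) / window_size, mode='valid')

def overlapping_chunks(signal, num_chunks, window_size):
    """Split `signal` so each chunk carries window_size - 1 extra samples
    beyond its share of output positions; 'valid' convolution on the chunks
    then concatenates to exactly the sequential result."""
    n_out = len(signal) - window_size + 1           # sequential output length
    bounds = np.linspace(0, n_out, num_chunks + 1, dtype=int)
    # chunk i produces output samples bounds[i] .. bounds[i+1]-1, which
    # requires input samples bounds[i] .. bounds[i+1] + window_size - 2
    return [signal[bounds[i]: bounds[i + 1] + window_size - 1]
            for i in range(num_chunks)]

if __name__ == "__main__":
    sig = np.random.rand(5000)
    w = 101
    seq = moving_average(sig, w)
    par = np.concatenate([moving_average(c, w)
                          for c in overlapping_chunks(sig, 4, w)])
    print(np.allclose(par, seq), par.shape == seq.shape)
```

The same chunks could of course be passed to pool.starmap exactly as in the solution; only the splitting step changes.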
Test results
Signal size 10,000:
Sequential -> 0.0092 s
2 proc -> 0.2672 s (speedup: 0.03x)
4 proc -> 0.2906 s (speedup: 0.03x)
8 proc -> 0.4016 s (speedup: 0.02x)
Signal size 100,000:
Sequential -> 0.0100 s
2 proc -> 0.2511 s (speedup: 0.04x)
4 proc -> 0.2880 s (speedup: 0.03x)
8 proc -> 0.4311 s (speedup: 0.02x)
Signal size 1,000,000:
Sequential -> 0.1100 s
2 proc -> 0.3161 s (speedup: 0.35x)
4 proc -> 0.3281 s (speedup: 0.34x)
8 proc -> 0.4741 s (speedup: 0.23x)
Signal size 10,000,000:
Sequential -> 1.1523 s
2 proc -> 1.0050 s (speedup: 1.15x)
4 proc -> 0.7846 s (speedup: 1.47x)
8 proc -> 0.8552 s (speedup: 1.35x)
Signal size 100,000,000:
Sequential -> 11.6183 s
2 proc -> 8.1328 s (speedup: 1.43x)
4 proc -> 5.1532 s (speedup: 2.25x)
8 proc -> 4.5892 s (speedup: 2.53x)
Observations
Small signals (10³–10⁵)
Parallel is much slower because the cost of:
- creating processes
- splitting arrays
- merging results
dominates the actual computation.
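The roughly 0.25 s floor visible in the small-signal runs is almost entirely process startup and IPC, not filtering. One way to see this (a quick sketch, not part of the graded solution; pool_overhead is a hypothetical helper) is to time a Pool that does essentially no work:

```python
import time
import multiprocessing as mp

def noop(x):
    # Trivial task: the round-trip cost is dominated by process
    # creation and pickling, not computation
    return x

def pool_overhead(num_procs):
    """Return the wall-clock time for creating a Pool, dispatching one
    trivial task per process, and tearing the Pool down."""
    t0 = time.perf_counter()
    with mp.Pool(processes=num_procs) as pool:
        pool.map(noop, range(num_procs))
    return time.perf_counter() - t0

if __name__ == "__main__":
    for p in [2, 4, 8]:
        print(f"{p} processes: {pool_overhead(p):.3f} s of pure overhead")
```

On the machine above, this overhead alone exceeds the 0.01 s the sequential filter needs for a 100,000-sample signal, which explains the sub-0.1x "speedups".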
Medium signals (~10⁶)
Parallel overhead is still heavy, so speedup is < 1×. Sequential is simpler and faster here.
Large signals (10⁷–10⁸)
Now the computation dominates the parallelization overhead.
- 2–4 processes give a solid speedup.
- 8 processes add more inter-process communication and memory traffic, so the gains taper off.
Conclusions
1. Parallelization pays off only when the task is “heavy” enough.
For lightweight operations or small datasets, the overhead dwarfs any benefit.
2. There’s a sweet spot in the number of processes.
- With 4 processes, you already hit most of the parallel benefit.
- Going from 4 → 8 processes gave only a modest further gain (2.25× → 2.53×), because coordination overhead grows with the process count.
3. Improving per-core efficiency often matters more than “just adding cores.”
- Algorithmic optimizations (vectorization, cache-friendly methods) may yield bigger gains.
- For CPU-bound tasks, better use of **SIMD / NumPy / Numba** often beats raw multiprocessing.
4. Rule of thumb for HPC students
If your workload fits in cache and finishes fast sequentially, don’t bother parallelizing. If it’s large enough that one core spends seconds to minutes crunching, then multiprocessing can give a worthwhile boost.
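To make conclusion 3 concrete: the np.convolve filter does O(n · window_size) work, but a moving average can be computed in O(n) with a cumulative sum. This standard trick (not part of the solution above) beats all the multiprocessing variants on a single core for large windows:

```python
import numpy as np

def moving_average_cumsum(signal, window_size=1001):
    """O(n) moving average: each output sample is the difference of two
    cumulative sums divided by the window length. Matches
    np.convolve(signal, ones(w)/w, mode='valid') up to floating-point error.
    Use float64 input: cumsum accumulates rounding error in float32."""
    c = np.cumsum(np.concatenate(([0.0], signal)))   # c[k] = sum of first k samples
    return (c[window_size:] - c[:-window_size]) / window_size

if __name__ == "__main__":
    sig = np.random.rand(1_000_000)
    w = 1001
    fast = moving_average_cumsum(sig, w)
    slow = np.convolve(sig, np.ones(w) / w, mode='valid')
    print(np.allclose(fast, slow))
```

Independent of core count, this drops the arithmetic per output sample from ~1001 operations to 2, which is exactly the kind of algorithmic win that "just adding cores" cannot match.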