# Parallelization

PyFlowReg provides multiple parallelization backends for efficient batch processing implemented through a runtime registry system.

## Parallelization Backends

### Sequential

**Single-threaded processing, most memory-efficient**

```python
from pyflowreg.motion_correction import compensate_recording, OFOptions
from pyflowreg.motion_correction.compensate_recording import RegistrationConfig

config = RegistrationConfig(parallelization="sequential")
compensate_recording(options, config=config)
```

**When to use:**
- Debugging (easier to trace errors)
- Limited memory systems
- Small datasets

**Pros:**
- Lowest memory footprint
- Deterministic execution
- Simple error handling

**Cons:**
- Slowest processing
- No CPU parallelization

### Threading

**Parallel processing using Python threads**

```python
config = RegistrationConfig(
    parallelization="threading",
    n_jobs=8  # Number of worker threads
)
```

**When to use:**
- I/O-bound workloads
- Limited by disk/network speed
- Moderate CPU resources

**Pros:**
- Lower memory overhead than multiprocessing
- Good for I/O-bound operations
- Shared memory between threads

**Cons:**
- Limited by Python GIL for CPU-bound tasks
- Slower than multiprocessing for pure computation

### Multiprocessing

**Parallel processing using processes with shared memory (default)**

```python
config = RegistrationConfig(
    parallelization="multiprocessing",
    n_jobs=-1  # Use all available CPUs
)
```

**When to use:**
- CPU-bound workloads (recommended)
- Large multi-core systems
- Production processing

**Pros:**
- True parallel execution (bypasses GIL)
- Best performance for CPU-intensive tasks
- Shared memory for efficient data transfer

**Cons:**
- Higher memory overhead
- Slightly more complex error handling

## Choosing a Backend

PyFlowReg auto-selects the best available backend by default:

```python
# Auto-selection order: multiprocessing → threading → sequential
compensate_recording(options)  # Uses multiprocessing if available
```

Manual selection overrides auto-detection:

```python
config = RegistrationConfig(parallelization="threading")  # Force threading
```

## Backend Compatibility

Different optical flow backends support different parallelization modes due to their architecture and Python package compatibility.

### CPU Backend (`flowreg`)

The default NumPy/Numba backend supports all parallelization modes:

```python
options = OFOptions(flow_backend="flowreg")

# All of these work:
config = RegistrationConfig(parallelization="sequential")
config = RegistrationConfig(parallelization="threading")
config = RegistrationConfig(parallelization="multiprocessing")
```

### GPU Backends (`flowreg_torch`, `flowreg_cuda`)

GPU backends only support sequential execution due to package compatibility constraints:

```python
from pyflowreg.motion_correction import OFOptions
from pyflowreg.motion_correction.compensate_recording import RegistrationConfig

options = OFOptions(
    flow_backend="flowreg_torch",
    backend_params={"device": "cuda"}
)

# GPU backends require sequential
config = RegistrationConfig(parallelization="sequential")
```

**Automatic fallback:** If you request multiprocessing or threading with a GPU backend, PyFlowReg automatically falls back to sequential execution with a warning:

```python
options = OFOptions(flow_backend="flowreg_cuda")

# Requesting multiprocessing with GPU backend
config = RegistrationConfig(parallelization="multiprocessing")

# Warning: Backend 'flowreg_cuda' does not support 'multiprocessing' executor.
# Supported executors: ['sequential']. Falling back to 'sequential'.
compensate_recording(options, config=config)  # Uses sequential
```

## Configuration

### RegistrationConfig Parameters

```python
@dataclass
class RegistrationConfig:
    n_jobs: int = -1  # -1 = all cores, N = use N workers
    verbose: bool = False  # Verbose logging
    parallelization: Optional[str] = None  # None=auto, 'sequential', 'threading', or 'multiprocessing'
```

### Number of Workers

```python
config = RegistrationConfig(
    n_jobs=-1  # Use all CPU cores
)

# Or specify exact number:
config = RegistrationConfig(
    n_jobs=4  # Use 4 workers
)
```

**Guidelines:**
- `n_jobs=-1`: Use all available CPUs (recommended)
- `n_jobs=N`: Use N worker processes/threads
- Sequential ignores `n_jobs` (always 1)

### Buffer Size

The buffer size controls how many frames are read and processed per batch. This is configured in `OFOptions`, not `RegistrationConfig`:

```python
from pyflowreg.motion_correction import OFOptions

options = OFOptions(
    buffer_size=400,  # Frames per batch (default: 400)
    # ... other options
)
```

**Buffer size tradeoffs:**
- **Larger buffers** (500-1000):
  - Fewer I/O operations
  - Higher memory usage
  - Better for fast storage (SSD)
- **Smaller buffers** (50-100):
  - More frequent I/O
  - Lower memory usage
  - Better for limited RAM or slow storage (HDD)

### Complete Configuration

```python
from pyflowreg.motion_correction import compensate_recording, OFOptions
from pyflowreg.motion_correction.compensate_recording import RegistrationConfig

options = OFOptions(
    input_file="large_video.h5",
    output_path="results/",
    quality_setting="balanced",
    buffer_size=400  # Batch size from MATLAB
)

config = RegistrationConfig(
    parallelization="multiprocessing",  # Backend selection
    n_jobs=-1  # All CPU cores
)

compensate_recording(options, config=config)
```

## Default Behavior

If no config is provided, defaults are used:

```python
# These are equivalent:
compensate_recording(options)
compensate_recording(options, config=RegistrationConfig())
```

Default values:
- `n_jobs=-1` (all cores)
- `verbose=False`
- `parallelization=None` (auto-select)

## Performance Tips

### Memory Management

**For limited RAM systems:**
```python
# Reduce memory usage
config = RegistrationConfig(
    parallelization="threading",  # Lower memory than multiprocessing
    n_jobs=4  # Limit workers
)

options = OFOptions(
    buffer_size=50  # Smaller batches
)
```

**For high-RAM systems:**
```python
# Maximize throughput
config = RegistrationConfig(
    parallelization="multiprocessing",
    n_jobs=-1  # All cores
)

options = OFOptions(
    buffer_size=500  # Larger batches
)
```

### CPU Utilization

**Check CPU usage during processing:**
- Sequential: 100% on single core
- Threading: Distributed but limited by GIL
- Multiprocessing: Near 100% on all cores (best)

**Reserve cores for other tasks:**
```python
import os
n_cores = os.cpu_count()
config = RegistrationConfig(n_jobs=n_cores - 2)  # Leave 2 cores free
```

### I/O Optimization

**For SSD/fast storage:**
```python
# Multiprocessing with large buffers
config = RegistrationConfig(parallelization="multiprocessing")
options = OFOptions(buffer_size=500)
```

**For HDD/slow storage:**
```python
# Threading to avoid I/O contention
config = RegistrationConfig(parallelization="threading")
options = OFOptions(buffer_size=100)
```

## Examples

### Fast Preview Processing

```python
# Quick preview with minimal resources
options = OFOptions(
    input_file="video.h5",
    quality_setting="fast",
    buffer_size=50
)

config = RegistrationConfig(parallelization="sequential")
compensate_recording(options, config=config)
```

### Production Processing

```python
# Maximum performance for production
options = OFOptions(
    input_file="large_dataset.h5",
    quality_setting="balanced",
    buffer_size=500
)

config = RegistrationConfig(
    parallelization="multiprocessing",
    n_jobs=-1
)

compensate_recording(options, config=config)
```

### Memory-Constrained System

```python
# Optimize for limited RAM
options = OFOptions(
    input_file="video.h5",
    quality_setting="balanced",
    buffer_size=100
)

config = RegistrationConfig(
    parallelization="threading",
    n_jobs=4
)

compensate_recording(options, config=config)
```

## Backend Registration

PyFlowReg uses a runtime registry for parallelization backends. Backends auto-register on import:

```python
from pyflowreg._runtime import RuntimeContext

# Check available backends
available = RuntimeContext.get("available_parallelization", set())
print(f"Available backends: {sorted(available)}")
```

## Troubleshooting

### High Memory Usage

**Problem:** Memory usage exceeds available RAM

**Solutions:**
1. Reduce `buffer_size` in OFOptions
2. Switch to threading or sequential
3. Reduce `n_jobs`

```python
# Low-memory configuration
config = RegistrationConfig(
    parallelization="sequential",
    n_jobs=1
)
options = OFOptions(buffer_size=50)
```

### Slow Processing

**Problem:** Processing slower than expected

**Solutions:**
1. Use multiprocessing instead of threading
2. Increase `buffer_size` if RAM allows
3. Increase `n_jobs`
4. Use `quality_setting="fast"` for preview

```python
# Fast processing configuration
config = RegistrationConfig(
    parallelization="multiprocessing",
    n_jobs=-1
)
options = OFOptions(
    quality_setting="fast",
    buffer_size=500
)
```

### Process Hangs or Crashes

**Problem:** Processing hangs or workers crash

**Solutions:**
1. Switch to sequential for debugging
2. Reduce `n_jobs` to avoid resource contention
3. Check for memory issues
4. Verify input file integrity

```python
# Debug configuration
config = RegistrationConfig(
    parallelization="sequential"  # Easier to debug
)
```