Research · October 2025 · 5 min read

Reflections on Cross-Subject EEG Generalization

One of the hardest problems in BCI research is making models that work across different people. Here's what I learned building a subject-independent neural architecture that achieves 95.35% LOSO accuracy.

EEG data is notoriously personal. Your brain's electrical signals during a mental task look subtly — and sometimes dramatically — different from everyone else's. Anatomy, electrode placement, attention levels, fatigue, and dozens of other factors all imprint themselves on the signal. This makes building generalizable BCI models genuinely hard.

When I set out to build a subject-independent event recognition model using a lightweight 1D neural network, the primary challenge wasn't the architecture itself — it was thinking carefully about what generalization actually means in this context and how to honestly evaluate it.

The Evaluation Problem

Most classification papers report accuracy on a held-out test set drawn from the same distribution as training. For EEG, this means the model has likely seen data from the same subjects it's being evaluated on. The numbers look great. The real-world performance, when deployed on a new user, does not.

Leave-one-subject-out (LOSO) cross-validation became my north star. The protocol: train on N-1 subjects, test on the held-out subject, repeat for every subject, average the results. This gives you an honest estimate of how your model performs on someone it has never encountered.
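
In code, the protocol is just a grouped split. Here's a minimal sketch using scikit-learn's LeaveOneGroupOut; `X`, `y`, `subject_ids`, and `make_model` are placeholders rather than the actual pipeline:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def loso_accuracy(X, y, subject_ids, make_model):
    """Train on N-1 subjects, test on the held-out subject, average over folds."""
    scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subject_ids):
        model = make_model()                    # fresh model for every fold
        model.fit(X[train_idx], y[train_idx])   # never sees the held-out subject
        scores.append(model.score(X[test_idx], y[test_idx]))
    return np.mean(scores)
```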

LOSO is uncomfortable because it exposes how well your model truly generalizes. A model that scores 99% on within-subject splits might score 60% under LOSO. That gap is the truth about your architecture.

Architectural Choices That Mattered

The architecture I ended up with was deliberately lightweight — not because compute was limited, but because simpler models generalize better when you have limited training data per subject and high inter-subject variability.

The key choices, with a code sketch after the list:

  • Depthwise separable convolutions — dramatically reduce parameter count without sacrificing receptive field. Fewer parameters mean less capacity to overfit to subject-specific patterns.
  • Batch normalization — helps the model adapt to distribution shifts between subjects by normalizing activations layer by layer.
  • Temporal feature extraction — small 1D kernels to capture local oscillatory patterns, which are more likely to be subject-independent than global signal characteristics.
  • Dropout at multiple layers — standard regularization, but particularly important when the effective training set size is small (N-1 subjects).
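
To make these choices concrete, here is a minimal PyTorch sketch of one block in that style. The channel counts, kernel size, and dropout rate are illustrative defaults, not the published configuration:

```python
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """Depthwise temporal conv + pointwise channel mixing, with BN and dropout."""
    def __init__(self, in_ch, out_ch, kernel_size=7, p_drop=0.3):
        super().__init__()
        self.block = nn.Sequential(
            # Depthwise: one small temporal kernel per channel (groups=in_ch)
            nn.Conv1d(in_ch, in_ch, kernel_size,
                      padding=kernel_size // 2, groups=in_ch),
            # Pointwise: 1x1 conv mixes channels; together far fewer
            # parameters than a full convolution of the same shape
            nn.Conv1d(in_ch, out_ch, kernel_size=1),
            nn.BatchNorm1d(out_ch),   # normalize activations layer by layer
            nn.ELU(),
            nn.Dropout(p_drop),       # regularization at every block
        )

    def forward(self, x):             # x: (batch, channels, time)
        return self.block(x)
```

Stacking a few of these blocks with pooling and a small classifier head keeps the total parameter count low, which is the whole point when inter-subject variability is high.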

What 95.35% Actually Means

Achieving 95.35% accuracy under LOSO validation felt significant. But it's worth contextualizing: the task is specific (SSVEP-based event recognition), the electrode montage is controlled, and the stimulus protocol is fixed. Real-world BCI deployment is harder.

What the result tells us is that the architectural approach works — the combination of depthwise separable convolutions, temporal feature extraction, and careful regularization extracts patterns that are genuinely shared across subjects, rather than memorizing individual signal idiosyncrasies.

What I Would Do Differently

Looking back, I would invest more in data augmentation during training. Techniques like adding subject-simulated noise, random temporal shifts, and channel dropout would have made the model more robust to the exact kinds of variability that cause LOSO performance to drop.
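
As a sketch, those augmentations could be applied on the fly to each training batch. The magnitudes here (`noise_std`, `max_shift`, `p_channel_drop`) are illustrative guesses, not tuned values:

```python
import torch

def augment(x, noise_std=0.05, max_shift=10, p_channel_drop=0.1):
    """x: (batch, channels, time). Noise, random temporal shift, channel dropout."""
    x = x + noise_std * torch.randn_like(x)            # additive noise
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    x = torch.roll(x, shifts=shift, dims=-1)           # shared temporal shift
    keep = (torch.rand(x.size(0), x.size(1), 1,
                       device=x.device) > p_channel_drop).float()
    return x * keep                                    # zero out random channels
```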

I would also explore domain adaptation methods — training a shared feature extractor with explicit alignment between subject distributions. This is a more principled approach than simply hoping regularization handles the inter-subject gap.
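
One concrete option is a CORAL-style penalty that aligns the second-order statistics of features drawn from different subjects. The sketch below follows Deep CORAL and is an assumption about how this could look, not what the paper did:

```python
import torch

def coral_loss(source_feats, target_feats):
    """Align feature covariances between two subject domains (Deep CORAL style).
    Inputs: (n_samples, n_features) feature batches from two subjects."""
    d = source_feats.size(1)

    def cov(f):
        f = f - f.mean(dim=0, keepdim=True)
        return (f.t() @ f) / (f.size(0) - 1)

    # Frobenius distance between the two covariance matrices
    return ((cov(source_feats) - cov(target_feats)) ** 2).sum() / (4 * d * d)
```

In training, this term would be added to the classification loss with a small weight, pulling the feature distributions of different subjects toward each other instead of leaving the gap to regularization alone.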


This work was published at UCICS 2026. The subject-independent architecture was designed during my final year at the HCI Lab, University of Rajshahi.
