# Architecture & Theory
The Spiking Decision Transformer (SNN-DT) represents a paradigm shift from dense, matrix-multiplication-heavy Transformers to sparse, event-driven Spiking Neural Networks.
## System Overview
Traditional Decision Transformers (DTs) model trajectories \(\tau\) using causal masking and dense attention. SNN-DT preserves this sequence modeling capability but executes it in the spiking domain.
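Concretely, in the standard DT formulation a trajectory is serialized as an interleaved token sequence of returns-to-go, states, and actions, which the causal mask then processes autoregressively:

\[
\tau = \left( \hat{R}_1, s_1, a_1,\; \hat{R}_2, s_2, a_2,\; \ldots,\; \hat{R}_T, s_T, a_T \right), \qquad \hat{R}_t = \sum_{t'=t}^{T} r_{t'}
\]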
```mermaid
graph TD
    subgraph "SNN-DT Block"
        A[Spike Input] --> B[LIF Projection Q/K/V]
        B --> C[Spiking Self-Attention]
        C --> D[Dendritic Routing]
        D --> E[LIF Feed-Forward]
        E --> F[Output Spikes]
    end
    subgraph "Plasticity"
        G[Local 3-Factor Rule] -.-> B
        G -.-> E
    end
```
## Spiking Self-Attention (SSA)
We replace the dot-product attention with a spike-based operation.
- **LIF Projections:** Input spikes are projected via learnable weights into Query (\(Q\)), Key (\(K\)), and Value (\(V\)) spike trains.
- **Attention formulation:** Instead of \(\mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d}}\right)\), we use an accumulation of coincidence detections regulated by the firing thresholds of the postsynaptic neurons (see the sketch below).
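A minimal sketch of this coincidence-accumulation attention, assuming binary spike tensors and a soft-reset LIF readout — the function and parameter names here are illustrative, not the repository's actual API:

```python
import torch

def spiking_self_attention(q, k, v, beta=0.9, theta=1.0):
    """Coincidence-based spiking attention over T time steps (a sketch).

    q, k, v: binary spike tensors of shape [T, L, d] (time, tokens, features).
    Returns output spikes of the same shape.
    """
    T, L, d = q.shape
    u = torch.zeros(L, d)                    # membrane of the output neurons
    out = torch.zeros(T, L, d)
    causal = torch.tril(torch.ones(L, L))    # causal mask, as in a standard DT
    for t in range(T):
        # Coincidence detection: Q and K spikes arriving at the same step
        # contribute charge; no softmax, just masked spike-count overlap.
        scores = (q[t] @ k[t].T) * causal    # [L, L]
        u = beta * u + scores @ v[t]         # leaky integration of routed V spikes
        s = (u >= theta).float()             # postsynaptic threshold gates output
        u = u - s * theta                    # soft reset
        out[t] = s
    return out
```

Because \(q\), \(k\), and \(v\) are binary, the matrix products reduce to sparse accumulate operations, and the threshold \(\theta\) plays the normalizing role that softmax plays in a dense DT.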
### LIF Neuron Dynamics
The membrane potential \(u_t\) evolves according to:

\[
u_t = \beta \left( u_{t-1} - S_{t-1}\, u_{\mathrm{th}} \right) + W x_t, \qquad S_t = \Theta\!\left( u_t - u_{\mathrm{th}} \right)
\]

where:

- \(u_t\): Membrane potential at time \(t\).
- \(\beta\): Decay factor (\(e^{-dt/\tau}\)).
- \(W x_t\): Synaptic input current from presynaptic spikes \(x_t\).
- \(u_{\mathrm{th}}\): Firing threshold; the \(S_{t-1}\, u_{\mathrm{th}}\) term implements the soft reset after a spike.
- \(S_t\): Discrete spike output (\(S_t \in \{0, 1\}\)), with \(\Theta\) the Heaviside step.
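The same update in code, as a minimal sketch (shapes and default values are illustrative):

```python
import torch

def lif_step(u, s_prev, x, w, beta=0.9, u_th=1.0):
    """One discrete LIF update, mirroring the equation above.

    u: membrane potentials [N], s_prev: previous spikes [N],
    x: input spikes [M], w: synaptic weights [N, M].
    """
    u = beta * (u - s_prev * u_th) + w @ x   # leak, soft reset, integrate input
    s = (u >= u_th).float()                  # fire on threshold crossing
    return u, s

# Example: drive 4 neurons with 3 Bernoulli input channels for 10 steps
u, s = torch.zeros(4), torch.zeros(4)
w = torch.rand(4, 3)
for _ in range(10):
    x = torch.bernoulli(torch.full((3,), 0.2))
    u, s = lif_step(u, s, x, w)
```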
## Dendritic Routing and Sparsity
To minimize energy consumption, SNN-DT employs two complementary mechanisms to induce sparsity:
- **Phase-Coding:** Encodes values in the timing of spikes, reducing the total number of spikes needed to represent continuous values.
- **Dendritic Routing:** A gating mechanism that dynamically prevents activity propagation to irrelevant heads or tokens, effectively “pruning” the computation graph on the fly (both mechanisms are sketched below).
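A rough sketch of both mechanisms, assuming a fixed window of \(T\) time steps and a per-head gate; every name and threshold below is hypothetical:

```python
import torch

def phase_encode(x, T=8):
    """Encode values x in [0, 1] as single-spike timings over T steps:
    larger values fire earlier, so one spike carries what a rate code
    would need many spikes to express (hypothetical scheme)."""
    t_fire = ((1.0 - x) * (T - 1)).round().long()   # firing step per value
    spikes = torch.zeros(T, *x.shape)
    spikes.scatter_(0, t_fire.unsqueeze(0), 1.0)    # one spike per value
    return spikes

def dendritic_gate(head_spikes, gate_w, gate_th=0.5):
    """Zero out heads whose gated activity falls below threshold, so no
    downstream computation (or energy) is spent on them (illustrative rule).

    head_spikes: [H, L, d] per-head spikes at one step; gate_w: [H] learnable.
    """
    activity = head_spikes.mean(dim=(1, 2))                       # [H] firing rates
    gate = (torch.sigmoid(gate_w) * activity > gate_th).float()   # open/closed per head
    return head_spikes * gate.view(-1, 1, 1)                      # prune inactive heads
```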
## Learning with Plasticity
We use a surrogate gradient method for end-to-end training, augmented by local plasticity:

- **Surrogate gradients:** The non-differentiable spike function is replaced by a smooth approximation on the backward pass, enabling standard backpropagation through the spiking layers.
- **Local three-factor rule:** The projection and feed-forward weights additionally receive a local update gated by a modulatory third factor (the “Plasticity” subgraph in the diagram above).
This hybrid approach stabilizes training and improves generalization to unseen tasks (e.g., changes in gravity or friction in Gym environments).
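For concreteness, here is a minimal sketch of both ingredients: a fast-sigmoid surrogate gradient (one common choice; the exact surrogate used by SNN-DT may differ) and a generic three-factor update in which a scalar modulator gates a local pre/post eligibility term. All names and defaults are illustrative:

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike on the forward pass, fast-sigmoid gradient on the
    backward pass (a common surrogate; SNN-DT's exact choice may differ)."""

    @staticmethod
    def forward(ctx, u, u_th=1.0, slope=25.0):
        ctx.save_for_backward(u)
        ctx.u_th, ctx.slope = u_th, slope
        return (u >= u_th).float()

    @staticmethod
    def backward(ctx, grad_out):
        (u,) = ctx.saved_tensors
        # d(spike)/du ≈ 1 / (1 + slope * |u - u_th|)^2  (fast sigmoid)
        grad_u = grad_out / (1.0 + ctx.slope * (u - ctx.u_th).abs()) ** 2
        return grad_u, None, None

def three_factor_update(w, pre_trace, post_trace, modulator, lr=1e-3):
    """Local three-factor rule (generic form): Δw = lr · modulator · post ⊗ pre.

    pre_trace [M] and post_trace [N] are locally available activity traces;
    modulator is a scalar third factor (e.g., reward or a global error signal).
    """
    with torch.no_grad():
        w += lr * modulator * torch.outer(post_trace, pre_trace)
    return w
```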