Federated Online Adaptation for Deep Stereo

CVPR 2024

Matteo Poggi
Fabio Tosi

University of Bologna

Federated adaptation in challenging environments. When facing a domain very different from those observed at training time -- e.g., nighttime images (a) -- stereo models suffer drops in accuracy (b). By enabling online adaptation (c) the network can improve its predictions, at the cost of a drastically reduced framerate. In our federated framework, the model can delegate the adaptation process to the cloud, enjoying its benefits while maintaining the original processing speed (d).


"We introduce a novel approach for adapting deep stereo networks in a collaborative manner. Building on principles of federated learning, we develop a distributed framework that delegates the optimization process to a number of clients deployed in different environments. This makes it possible, for a deep stereo network running on resource-constrained devices, to capitalize on the adaptation process carried out by other instances of the same architecture, and thus improve its accuracy in challenging environments even when it cannot carry out adaptation on its own. Experimental results show how federated adaptation performs equivalently to on-device adaptation, and even better when dealing with challenging environments."


1 - Online Adaptation for Stereo

  • Full Adaptation: For any incoming stereo pair \(b_t\), the network predicts a disparity map (or multiple, depending on the design) according to current weights \(w_t\). Subsequently, it updates them by minimizing a loss function, typically the sum of multiple terms \(\ell_i\)

  • \( w_{t+1} \leftarrow w_t - \eta \nabla \sum_i \ell_i(w_t, b_t) \)
  • Modular Adaptation: Tonioni et al. (CVPR 2019) introduced MADNet, a network made of 5 encoder-decoder blocks predicting disparity maps at different scales. At each adaptation step \(t\), a block \(i\) is sampled according to a probability distribution, then only the corresponding output is used to compute the loss and optimize the subset of weights \(w_t[i]\):

  • \( i = \text{sample}( \text{softmax}(H) ) \)
    \( w_{t+1}[i] \leftarrow w_t[i] - \eta \nabla \ell_i(w_t, b_t) \)
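The modular adaptation step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the per-block gradients are assumed to be computed elsewhere by the stereo network's backward pass, and the function names are ours.

```python
import numpy as np

def softmax(h):
    # numerically stable softmax over the block scores H
    e = np.exp(h - h.max())
    return e / e.sum()

def mad_step(weights, grads_per_block, h, eta=1e-4, rng=None):
    """One modular adaptation step: sample a block i ~ softmax(H),
    then update only that block's weights.

    weights         : list of per-block parameter arrays w_t[i]
    grads_per_block : grads_per_block[i] is the gradient of loss l_i
                      w.r.t. weights[i] (assumed precomputed)
    """
    rng = rng or np.random.default_rng()
    p = softmax(h)
    i = rng.choice(len(weights), p=p)                    # i = sample(softmax(H))
    weights[i] = weights[i] - eta * grads_per_block[i]   # w[i] <- w[i] - eta * grad
    return i, weights
```

Only one block's parameters are touched per step, which is what keeps the per-frame adaptation cost low compared to full back-propagation through the whole network.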

2 - Federated Adaptation

We define a set of active nodes \(A\), capable of adapting independently, and other listening clients \(C\) which delegate the adaptation process to the former. The two categories are managed by a central server, in charge of receiving updated weights and distributing them to the listening nodes.

  • FedFULL: The server runs a loop during which it waits for updated weights transmitted by the active clients. Once it has received the updates from each active client, the server aggregates them by averaging the weights as in FedAvg (McMahan et al., PMLR 2017) and dispatches the updated model to clients \(C\). Clients \(A\) send their updates periodically, after performing \(T\) steps of adaptation. This way, clients \(C\) receive weight updates and improve their accuracy without running any GPU-intensive extra computation. However, significant data traffic between \(A\), the server, and \(C\) is introduced, proportional to the number of parameters in the stereo network, the number of clients, and the update interval \(T\).

  • FedMAD: At each adaptation step, the client keeps track of which blocks it has updated (possibly all of them). It then samples a single block according to a probability distribution favoring the most frequently updated blocks, sends only that block to the server, and decays the block's update counter. On the server side, averaging is performed only over the subset of blocks received.
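The two server-side aggregation schemes can be sketched as follows. This is a hedged NumPy sketch under our own naming conventions, not the authors' code: FedFULL averages complete weight lists as in FedAvg, while the FedMAD-style variant averages only the blocks the active clients actually transmitted.

```python
import numpy as np

def fedavg(client_weights):
    """FedAvg-style aggregation (McMahan et al., 2017): element-wise mean
    of the full per-block weight lists sent by the active clients."""
    return [np.mean(block_group, axis=0)
            for block_group in zip(*client_weights)]

def fedmad_aggregate(global_weights, updates):
    """Partial aggregation in the spirit of FedMAD: each active client
    sends a single (block_index, block_weights) pair, so the server
    averages only the blocks it received and keeps the rest unchanged."""
    by_block = {}
    for idx, w in updates:
        by_block.setdefault(idx, []).append(w)
    new_weights = list(global_weights)
    for idx, ws in by_block.items():
        new_weights[idx] = np.mean(ws, axis=0)
    return new_weights
```

Averaging a single sampled block per client is what reduces the data traffic of FedMAD relative to FedFULL, whose payload grows with the full parameter count of the stereo network.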

Qualitative Results

KITTI - Residential sequence

DrivingStereo - Rainy sequence

DSEC - Night#4 sequence


		@InProceedings{Poggi_2024_CVPR,
		  author    = {Poggi, Matteo and Tosi, Fabio},
		  title     = {Federated Online Adaptation for Deep Stereo},
		  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
		  month     = {June},
		  year      = {2024},
		}