On the trade-off between flatness and optimization in distributed learning