Contextual stochastic optimization has become popular for designing efficient data-driven algorithms. However, most approaches struggle to scale to large combinatorial problems. Recently, policies encoded as neural networks with a combinatorial optimization layer have proven effective on various large-scale combinatorial applications when trained in a decision-focused manner. Until now, such policies have been trained by supervised learning, often based on Fenchel-Young losses, and treated as heuristics. We instead focus on risk minimization. By leveraging the connection between Fenchel-Young losses and Bregman divergences, we introduce a deep learning-compatible algorithm and show that, in certain settings, it converges to the optimum of the empirical risk minimization problem. This new approach has two advantages. First, numerical experiments show that it learns better policies. Second, we prove generalization guarantees in the form of upper bounds on the optimality gap that hold in expectation.
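To make the connection alluded to above concrete, we recall its standard formulation; the notation below is illustrative and is not the notation used in the body of the paper. The Fenchel-Young loss generated by a convex regularizer $\Omega$ compares a score vector $\theta$ with a target $y$ via
\[
  L_\Omega(\theta; y) \;=\; \Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle ,
\]
where $\Omega^*$ denotes the Fenchel conjugate of $\Omega$. When $\Omega$ is of Legendre type, this loss coincides with the Bregman divergence generated by $\Omega$, evaluated between $y$ and the regularized prediction $\widehat{y}_\Omega(\theta) = \nabla \Omega^*(\theta)$:
\[
  L_\Omega(\theta; y) \;=\; D_\Omega\bigl(y,\, \widehat{y}_\Omega(\theta)\bigr),
  \qquad
  D_\Omega(y, y') \;=\; \Omega(y) - \Omega(y') - \langle \nabla \Omega(y'),\, y - y' \rangle .
\]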