Simon Bussy

Founder & CEO @Califrais
PhD in Machine Learning


About me

I'm the founder & CEO of Califrais, the startup that decarbonizes the food supply chain with AI. Califrais has a foot in both the supply chain and food industries: two sectors with a huge environmental impact yet very little technological progress. It's particularly exciting to bring AI innovations to those areas!

Thanks to our LabCom LOPF (Large-scale Optimization of Product Flows), a unique collaborative structure with multiple academic research labs and with the support of our historical sponsors CNRS and Sorbonne Université, our mission is to invent AI-supported technological solutions to optimize large-scale food flows. We've deployed our technology in the largest fresh produce market in the world: Rungis. In this first use case, we proved that our solutions reduce food waste by a factor of 2 and CO2 emissions by a factor of 7.

I received a PhD in Machine Learning from Sorbonne University in 2019, prepared at LPSM. During my PhD, I was interested in problems related to prognosis studies in high dimension, with a particular emphasis on the survival analysis framework and its applications to personalized medicine.

Honours

Doctor Norbert Marx Award 2019   (French Statistical Society)
PhD Thesis Award Daniel Schwartz 2020   (French Biometric Society)

Research interests

Statistical Learning

Statistical learning theory deals with the problem of finding a predictive function based on data.

The information era has witnessed an explosion in the collection of data in a variety of fields such as medicine, biology, marketing and finance. With it have come new theoretical and algorithmic challenges that I find fascinating. Statistical learning is precisely the field that provides a theoretical framework for the design and analysis of predictive algorithms.

High-dimensional Statistics

Problems in which the ambient dimension is of the same order as, or substantially larger than, the sample size.

High-dimensional statistics has become the focus of increasing attention in the modern era of big data. Today, massive data sets (with potentially thousands of variables) play an important role in almost every branch of modern human activity. I am interested in the development of new statistical methods to separate the signal from the noise.

Survival Analysis

Set of methods for analyzing data where the outcome variable is the time until the occurrence of an event of interest.

Survival analysis is the analysis of time-to-event data. Such data describe the length of time from a time origin to an endpoint of interest. I am particularly interested in designing new methods for medical applications such as prospective cohort studies with longitudinal data in high-dimensional settings.
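As a minimal illustration of time-to-event data (a sketch that is not taken from any of the papers listed on this page), the classical Kaplan-Meier estimator computes the probability of surviving past each observed event time while accounting for censored subjects. The function name `kaplan_meier` and the toy data are assumptions made for this example.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: observed time for each subject (event or censoring time).
    events: 1 if the event was observed, 0 if the subject was censored.
    Returns the distinct event times and the estimated survival
    probability just after each of them.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(durations[events])  # sorted distinct event times
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)             # subjects still under observation at t
        observed = np.sum((durations == t) & events)  # events occurring exactly at t
        s *= 1.0 - observed / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical toy cohort: five subjects, two of them censored.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
times, surv = kaplan_meier(durations, events)
```

On this toy cohort the estimate drops at times 2, 3 and 5, and the censored subjects only contribute through the size of the risk set.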

Time series

Problems in which the data is a series of points ordered in time and the goal is usually to make a forecast for the future.

Roughly speaking, time series forecasting is the use of a model to predict future values based on previously observed values. Time series are widely used for non-stationary data, such as economic, weather, stock price, or retail sales data. I am particularly interested in designing new methods to model longitudinal data in high-dimensional settings.

Reinforcement learning

Problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment.

Reinforcement learning is an area of Machine Learning. It is about taking suitable action to maximize a cumulative reward in a particular situation. It can be employed in various applications to find the best possible behavior or path to take in a specific situation. I am interested in designing new methods in this context as well as new applications.
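A minimal sketch of trial-and-error learning, assuming the simplest possible setting: a multi-armed bandit with Bernoulli rewards and an epsilon-greedy agent. This is a textbook illustration, not a method from the work described on this page; the function name `epsilon_greedy`, the reward probabilities and the default parameters are all hypothetical.

```python
import numpy as np

def epsilon_greedy(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    true_means: success probability of each arm (unknown to the agent).
    Returns the pull counts, the estimated mean reward of each arm,
    and the cumulative reward collected.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    counts = np.zeros(n_arms, dtype=int)
    values = np.zeros(n_arms)  # running average reward per arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))  # explore: pick a random arm
        else:
            arm = int(np.argmax(values))     # exploit: pick the best arm so far
        reward = float(rng.random() < true_means[arm])
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return counts, values, total_reward

# Hypothetical environment with three arms of increasing quality.
counts, values, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
```

The parameter `epsilon` sets the trade-off between exploring arms at random and exploiting the arm that currently looks best; the cumulative reward is the quantity the agent tries to maximize.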

Deep learning

Models composed of multiple layers to learn data representations with multiple levels of abstraction.

Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features. These methods have dramatically improved the state-of-the-art in various applications, and the research perspectives are still huge.

Papers

FLASH: a Fast joint model for Longitudinal And Survival data in High dimension

V.T. Nguyen, A. Fermanian, A. Guilloux, A. Barbieri, S. Zohar, A.S. Jannot, S. Bussy

Biometrics (2024)

Paper

Online Inventory Problems: Beyond the i.i.d. Setting with Online Convex Optimization

M. Hihat, S. Gaïffas, G. Garrigos, S. Bussy

NeurIPS (2023)

Paper

Binacox: automatic cut-point detection in high-dimensional Cox model with applications in genetics

S. Bussy, M.Z. Alaya, A. Guilloux, A.S. Jannot

Biometrics (2021)

Paper GitHub

Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework

S. Bussy, R. Veil, V. Looten, A. Burgun, S. Gaïffas, A. Guilloux, B. Ranque, A.S. Jannot

BMC Medical Research Methodology (2019)

Paper GitHub

Binarsity: a penalization for one-hot encoded features in linear supervised learning

M.Z. Alaya, S. Bussy, S. Gaïffas, A. Guilloux

Journal of Machine Learning Research (2019)

Paper GitHub

Trajectories of Biological Values and Vital Parameters: An Observational Cohort Study of Adult Patients with Sickle Cell Disease Hospitalized for a Non-Complicated Vaso-Occlusive Crisis

R. Veil, S. Bussy, V. Looten, J.B. Arlet, J. Pouchot, A.S. Jannot, B. Ranque

Journal of Clinical Medicine (2019)

Paper GitHub

C-mix: A high-dimensional mixture model for censored durations, with applications to genetic data

Doctor Norbert Marx 2019 Award

S. Bussy, A. Guilloux, S. Gaïffas, A.S. Jannot

Statistical Methods in Medical Research (2018)

Paper GitHub

Other manuscripts

  • PhD manuscript, Laboratoire de Probabilités, Statistique et Modélisation (LPSM, UMR 8001)

    Supervised by A. Guilloux, A.S. Jannot, S. Gaïffas. Paris - France, October 2015-October 2018

    Introduction of high-dimensional interpretable machine learning models and their applications
  • Research internship, Centre de Mathématiques Appliquées of École Polytechnique,

    Supervised by A.S. Jannot, S. Gaïffas, A. Guilloux. Palaiseau - France, April-September 2015

    New machine learning techniques for medicine
  • Kaggle in class, Dreem startup, ENS Paris-Saclay,

    Paris - France, January 2015 - March 2015

    Prediction of slow oscillation from EEG signals
  • Research project on reinforcement learning, ENS Paris-Saclay,

    Supervised by Emilie Kaufmann. Paris - France, October 2014 - January 2015

    A movie recommendation system based on Multi-action bandits
  • Data scientist internship, Orange Silicon Valley,

    San Francisco - United States, February - August 2014

    Machine learning models to predict startups valuation trends
  • Deep learning research project, Telecom SudParis,

    Supervised by Jérémie Jakubowicz. Évry - France, October 2013 - January 2014

    Layer-wise training of deep generative models
  • Research project, Telecom SudParis,

    Supervised by Wojciech Pieczynski. Évry - France, April - June 2013

    Optimal unsupervised segmentation of pairwise Markov process

Talks, Teaching & Supervision

Education

  • 2018-2019

    Postdoctoral research position, INSERM

    We introduced a prognostic method called lights to deal with the problem of joint modeling of longitudinal data and censored durations in a high-dimensional context.

  • 2015-2018

    PhD in Statistics, Sorbonne University

    At the intersection between theory and applications, my work was focused on the design and analysis of statistical methods for high-dimensional problems, with a particular emphasis on survival analysis settings.

  • 2014-2015

    MSc in Machine Learning (MVA), ENS Paris-Saclay, Mention Très Bien

    Harmonic analysis, wavelet analysis and signal processing, optimization, information theory and pattern recognition, statistical learning and high dimensional statistics, kernel methods, reinforcement learning, graphical models, computer vision.

  • 2011-2014

    MSc in Statistics & Applied Mathematics, Télécom SudParis

    Courses (grades): maths (A), data analysis (A), probability & statistics (A), data mining (A+), numerical analysis (A), optimization (A), information theory (A+), stochastic processes (A+), queuing theory (A+), database management (A).


Contact

simon.bussy@califrais.fr simon.bussy@gmail.com
  • Califrais' Machine Learning Lab
  • 4 rue Martel
  • 75010 Paris