Simon Bussy

Leading Machine Learning Research @Califrais

I am leading the Machine Learning Research Lab of Califrais, the startup I co-founded, which is revolutionizing wholesale procurement of fresh, local produce for restaurants through advanced tech tools, automation and an eco-friendly mindset. I manage the web development team and the data science lab. Califrais has a foot in both the supply chain and food industries: two industries with a huge environmental impact yet very little technological progress. It's particularly exciting to be able to bring AI innovations to these industries!
My research focuses on high-dimensional statistics, Machine Learning and Reinforcement Learning.

I received my PhD in Statistics from LPSM (Sorbonne University) on January 16th 2019, under the supervision of Pr. Agathe Guilloux, Dr. Anne-Sophie Jannot (MCU-PH) and Pr. Stéphane Gaïffas. During my PhD, I was interested in problems related to prognosis studies in high dimension, with a particular emphasis on the survival analysis framework and its applications to personalized medicine.
I received the 2019 Norbert Marx Prize from the SFdS for my work on the C-mix model.


Education

  • 2015-2018

    PhD in Machine Learning, UPMC - Sorbonne Universities

    At the intersection between theory and applications, my work focuses on the design and analysis of statistical methods for high-dimensional problems, with a particular emphasis on survival analysis settings. I had the chance to collaborate on various exciting projects.

  • 2014-2015

    ENS Paris-Saclay, MVA Master, Machine Learning, Mathematics, Statistics, Computer Science (Mention Très Bien)

    Harmonic analysis, wavelet analysis and signal processing, optimization, information theory and pattern recognition, statistical learning and high-dimensional statistics, compressed sensing, Bayesian networks, kernel methods, reinforcement learning, graphical models, computer vision.

  • 2011-2014

    Télécom SudParis, Engineer's degree, Statistics & Data Mining

    Courses (grade): maths (A), data analysis (A), probability & statistics (A), data mining (A+), numerical analysis (A), optimization (A), information theory (A+), stochastic processes (A+), queuing theory (A+), database management (A).

Research interests

Statistical Learning

Statistical learning theory deals with the problem of finding a predictive function based on data.

The information era has witnessed an explosion in the collection of data in a variety of fields such as medicine, biology, marketing and finance. With it have come new theoretical and algorithmic challenges, which I find fascinating. Statistical learning provides precisely the theoretical framework for designing and analyzing predictive algorithms.

High-dimensional Statistics

Problems in which the ambient dimension is of the same order as, or substantially larger than, the sample size.

High-dimensional statistics has become the focus of increasing attention in the modern era of big data. Today, massive data sets (with potentially thousands of variables) play an important role in almost every branch of modern human activity. I am interested in the development of new statistical methods to separate the signal from the noise.
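One classical way to separate signal from noise when there are many variables is the Lasso (ℓ1-penalized least squares), which shrinks small coefficients exactly to zero. Below is a minimal sketch, assuming the standard objective (1/(2n))·||y − Xb||² + λ·||b||₁ solved by cyclic coordinate descent with soft-thresholding. It is a generic textbook method, not one of the methods from the papers listed on this page; the function names and toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: the proximal map of lam * |.|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual: remove the contribution of every feature except j.
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j) for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # Exact minimizer of the objective in coordinate j, others held fixed.
            b[j] = soft_threshold(rho, lam) / z
    return b


# Toy example: orthogonal columns, only feature 0 carries signal (true b = [3, 0, 0, 0]).
X = [[1.0, 1.0, 1.0, 1.0],
     [1.0, -1.0, 1.0, -1.0],
     [1.0, 1.0, -1.0, -1.0],
     [1.0, -1.0, -1.0, 1.0]]
y = [3.0, 3.0, 3.0, 3.0]
print(lasso_coordinate_descent(X, y, lam=0.1))  # [2.9, 0.0, 0.0, 0.0]
```

With orthogonal columns the solution is simply each least-squares coefficient soft-thresholded by λ, which is why the signal coefficient is shrunk from 3 to 2.9 while the noise coefficients are set exactly to zero.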

Survival Analysis

Set of methods for analyzing data where the outcome variable is the time until the occurrence of an event of interest.

Survival analysis is the analysis of time-to-event data. Such data describe the length of time from a time origin to an endpoint of interest. I am particularly interested in designing new methods for medical applications such as prospective cohort studies with longitudinal data in high-dimensional settings.
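What makes time-to-event data special is censoring: for some subjects the event has not yet been observed when follow-up ends. The Kaplan-Meier estimator is the standard nonparametric way to estimate the survival function S(t) while accounting for this. Here is a minimal sketch in plain Python; the function name and the toy cohort are hypothetical, chosen only for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t).

    times:  observed durations (event time or censoring time)
    events: 1 if the event was observed, 0 if the subject was censored
    Returns a list of (t, S(t)) at each time where at least one event occurs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths, removed = 0, 0
        # Group all subjects sharing the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censored subjects leave the risk set after t.
        n_at_risk -= removed
    return curve


# Six subjects; the ones at t = 2 (second) and t = 4 are censored.
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

In this toy cohort the estimated survival drops to 5/6, 2/3, 4/9 and finally 0 at t = 1, 2, 3 and 5; the censored subjects never cause a drop but do shrink the risk set.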

Time series

Problems in which the data is a series of points ordered in time and the goal is usually to make a forecast for the future.

Roughly speaking, time series forecasting is the use of a model to predict future values based on previously observed values. Time series methods are widely used for data that are often non-stationary, such as economic indicators, weather, stock prices, or retail sales. I am particularly interested in designing new methods to model longitudinal data in high-dimensional settings.
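The simplest instance of "predicting future values from previously observed values" is an autoregressive model of order 1, x_t = c + φ·x_{t−1} + noise, fitted by ordinary least squares on consecutive pairs. The sketch below is a generic illustration with hypothetical function names. Note that an AR model assumes (weak) stationarity, so a non-stationary series would typically be differenced or detrended first.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1} + noise. Returns (c, phi)."""
    prev, nxt = series[:-1], series[1:]
    m = len(prev)
    mean_prev = sum(prev) / m
    mean_next = sum(nxt) / m
    # Slope of the regression of x_t on x_{t-1}.
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(prev, nxt))
    var = sum((a - mean_prev) ** 2 for a in prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_ar1(series, c, phi, horizon=1):
    """Iterate the fitted recursion to forecast `horizon` steps ahead."""
    preds = []
    last = series[-1]
    for _ in range(horizon):
        last = c + phi * last
        preds.append(last)
    return preds


# Noise-free toy series generated by x_t = 1 + 0.5 * x_{t-1}, starting at 0.
series = [0.0, 1.0, 1.5, 1.75, 1.875, 1.9375]
c, phi = fit_ar1(series)
print(c, phi, forecast_ar1(series, c, phi, horizon=2))
```

Because the toy series is noise-free, the fit recovers c = 1 and φ = 0.5 (up to floating-point error), and the one-step forecast is 1 + 0.5 × 1.9375 = 1.96875.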

Reinforcement learning

Problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment.

Reinforcement learning is an area of Machine Learning concerned with taking suitable actions to maximize a cumulative reward in a given situation. It can be employed in various applications to find the best possible behavior or path to take in a specific situation. I am interested in designing new methods in this context as well as new applications.
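To make the trial-and-error idea concrete, here is a sketch of the ε-greedy strategy on a multi-armed bandit, the simplest reinforcement learning setting (a single state, several actions). The agent mostly exploits the action with the best estimated reward and explores a random action with probability ε. The Bernoulli reward model, the parameter values and the function name are hypothetical choices for illustration.

```python
import random


def epsilon_greedy(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli multi-armed bandit.

    true_means: success probability of each arm (unknown to the agent).
    Returns (pull counts per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    estimates = [0.0] * k
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick a random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit the best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward


counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(e, 2) for e in estimates], total)
```

The fixed ε keeps exploring forever, which guarantees every arm's estimate keeps improving but costs some reward; decaying ε or optimism-based rules such as UCB are common alternatives.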

Deep learning

Models composed of multiple layers to learn data representations with multiple levels of abstraction.

Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features. These methods have dramatically improved the state of the art in various applications, and many research directions remain open.
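A minimal sketch of what "multiple layers" means computationally: each layer applies an affine map followed by a nonlinearity (ReLU here) to the previous layer's output, so higher-level features are functions of lower-level ones. The weights below are arbitrary hand-picked numbers rather than learned parameters, the function names are hypothetical, and training (e.g. by backpropagation) is omitted.

```python
def relu(v):
    """Elementwise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(W, b, x):
    """Affine layer: W is a list of rows, b a list of biases, x the input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp_forward(layers, x):
    """Forward pass: ReLU after every layer except the last (linear output)."""
    h = x
    for i, (W, b) in enumerate(layers):
        h = dense(W, b, h)
        if i < len(layers) - 1:
            h = relu(h)  # hidden representation built from the previous layer
    return h


# Two inputs -> two hidden features -> one output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 2.0]], [0.5]),
]
print(mlp_forward(layers, [2.0, 1.0]))  # [4.5]
```

The first hidden feature responds to the difference of the inputs and the second to their average; the output is then defined in terms of these intermediate features rather than the raw inputs.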

Papers

Binacox: automatic cut-point detection in high-dimensional Cox model with applications in genetics

S. Bussy, M.Z. Alaya, A. Guilloux, A.S. Jannot

Preprint (2019)

Paper GitHub

Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework

S. Bussy, R. Veil, V. Looten, A. Burgun, S. Gaïffas, A. Guilloux, B. Ranque, A.S. Jannot

BMC Medical Research Methodology (2019)

Paper GitHub

Binarsity: a penalization for one-hot encoded features in linear supervised learning

M.Z. Alaya, S. Bussy, S. Gaïffas, A. Guilloux

Journal of Machine Learning Research (2019)

Paper GitHub

Trajectories of Biological Values and Vital Parameters: An Observational Cohort Study of Adult Patients with Sickle Cell Disease Hospitalized for a Non-Complicated Vaso-Occlusive Crisis

R. Veil, S. Bussy, V. Looten, J.B. Arlet, J. Pouchot, A.S. Jannot, B. Ranque

Journal of Clinical Medicine (2019)

Paper GitHub

C-mix: A high-dimensional mixture model for censored durations, with applications to genetic data

S. Bussy, A. Guilloux, S. Gaïffas, A.S. Jannot

Statistical Methods in Medical Research (2018)

Doctor Norbert Marx 2019 Prize

Paper GitHub

Other manuscripts

  • PhD manuscript, Laboratoire de Probabilités, Statistique et Modélisation (LPSM, UMR 8001)

    Supervised by A. Guilloux, A.S. Jannot, S. Gaïffas. Paris - France, October 2015-October 2018

    Introduction of high-dimensional interpretable machine learning models and their applications
  • Research internship, Centre de Mathématiques Appliquées of École Polytechnique,

    Supervised by A.S. Jannot, S. Gaïffas, A. Guilloux. Palaiseau - France, April-September 2015

    New machine learning techniques for medicine
  • Kaggle in class, Dreem startup, ENS Paris-Saclay,

    Paris - France, January 2015 - March 2015

    Prediction of slow oscillation from EEG signals
  • Research project on reinforcement learning, ENS Paris-Saclay,

    Supervised by Emilie Kaufmann. Paris - France, October 2014 - January 2015

    A movie recommendation system based on Multi-action bandits
  • Data scientist internship, Orange Silicon Valley,

    San Francisco - United States, February - August 2014

    Machine learning models to predict startup valuation trends
  • Deep learning research project, Telecom SudParis,

    Supervised by Jérémie Jakubowicz. Évry - France, October 2013 - January 2014

    Layer-wise training of deep generative models
  • Research project, Telecom SudParis,

    Supervised by Wojciech Pieczynski. Évry - France, April - June 2013

    Optimal unsupervised segmentation of pairwise Markov process

Talks, Teaching & Supervision

Contact

simon.bussy@califrais.com simon.bussy@gmail.com
  • Califrais Machine Learning Lab
  • 6 Rue Jean Jaurès
  • 93170 Bagnolet