Binacox: automatic cut-point detection in high-dimensional Cox model with applications in genetics
S. Bussy, M.Z. Alaya, A. Guilloux, A.S. Jannot
PhD in Statistics
Co-founder & CTO @Califrais
Research fellow @inserm
I lead the Machine Learning Research Lab at Califrais,
a startup that is transforming the wholesale procurement of fresh, local
produce through advanced tech tools, automation, and an eco-friendly
mindset. I manage the web-dev team and the data science lab.
Califrais has a foot in both the
supply chain and food industries: two sectors with a huge environmental impact yet very little
technological progress. It's particularly exciting to bring AI innovations to those sectors.
My research focuses on high-dimensional statistics, Machine Learning and Reinforcement Learning.
I am very excited to launch the LabCom LOPF (Large-scale Optimization of Product Flows) in March 2021, in partnership with the LPSM, CNRS and Sorbonne Université. Our goal will be to scale up our AI technology at the Rungis market level, and beyond.
I received a PhD in Statistics from Sorbonne University in 2019, prepared at LPSM under the supervision of Pr. Agathe Guilloux, Dr. Anne-Sophie Jannot (MCU-PH) and Pr. Stéphane Gaïffas. During my PhD, I was interested in problems related to prognosis studies in high dimension, with a particular emphasis on the survival analysis framework and its applications to personalized medicine.
Doctor Norbert Marx Award 2019 (French Statistical Society)
PhD Thesis Award Daniel Schwartz 2020 (French Biometric Society)
Statistical learning theory deals with the problem of finding a predictive function based on data.
The information era has witnessed an explosion in the collection of data in a variety of fields such as medicine, biology, marketing and finance. With it have come new theoretical and algorithmic challenges that I find fascinating. Statistical learning is precisely the field that provides a theoretical framework for the design and analysis of predictive algorithms.
Problems in which the ambient dimension is of the same order as, or substantially larger than, the sample size.
High-dimensional statistics has become the focus of increasing attention in the modern era of big data. Today, massive data sets (with potentially thousands of variables) play an important role in almost every branch of modern human activity. I am interested in the development of new statistical methods to separate the signal from the noise.
Set of methods for analyzing data where the outcome variable is the time until the occurrence of an event of interest.
Survival analysis is the analysis of time-to-event data. Such data describe the length of time from a time origin to an endpoint of interest. I am particularly interested in designing new methods for medical applications such as prospective cohort studies with longitudinal data in high-dimensional settings.
Problems in which the data is a series of points ordered in time and the goal is usually to make a forecast for the future.
Roughly speaking, time series forecasting is the use of a model to predict future values based on previously observed values. Time series models are widely used for non-stationary data such as economic indicators, weather, stock prices, or retail sales. I am particularly interested in designing new methods to model longitudinal data in high-dimensional settings.
Problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment.
Reinforcement learning is an area of Machine Learning concerned with taking suitable actions to maximize a cumulative reward in a given situation. It can be employed in various applications to find the best possible behavior or path to take in a specific situation. I am interested in designing new methods in this context as well as new applications.
Models composed of multiple layers to learn data representations with multiple levels of abstraction.
Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features. These methods have dramatically improved the state of the art in various applications, and many research directions remain open.
R. Veil, S. Bussy, V. Looten, J.B. Arlet, J. Pouchot, A.S. Jannot, B. Ranque
Journal of Clinical Medicine (2019)
Doctor Norbert Marx Award 2019
S. Bussy, A. Guilloux, S. Gaïffas, A.S. Jannot
Statistical Methods in Medical Research (2018)
Supervised by A. Guilloux, A.S. Jannot, S. Gaïffas. Paris - France, October 2015-October 2018
Supervised by A.S. Jannot, S. Gaïffas, A. Guilloux. Palaiseau - France, April-September 2015
Paris - France, January 2015 - March 2015
Supervised by Emilie Kaufmann. Paris - France, October 2014 - January 2015
San Francisco - United States, February - August 2014
Supervised by Jérémie Jakubowicz. Évry - France, October 2013 - January 2014
Supervised by Wojciech Pieczynski. Évry - France, April - June 2013
We introduced a prognostic method called lights to deal with the problem of joint modeling of longitudinal data and censored durations in a high-dimensional context.
At the intersection between theory and applications, my work was focused on the design and analysis of statistical methods for high-dimensional problems, with a particular emphasis on survival analysis settings.
Harmonic analysis, wavelet analysis and signal processing, optimization, information theory and pattern recognition, statistical learning and high dimensional statistics, kernel methods, reinforcement learning, graphical models, computer vision.
Courses (grades): maths (A), data analysis (A), probability & statistics (A), data mining (A+), numerical analysis (A), optimization (A), information theory (A+), stochastic processes (A+), queueing theory (A+), database management (A).