About

This page describes my research, background, and hobbies.

Bio

I hold a Bachelor’s degree in Computer Engineering and work at the European Organisation for Nuclear Research (CERN). I use deep learning to address problems in high-energy physics, including particle track reconstruction and multi-jet classification.

Research Interests - “ML4Science”

My interests lie in developing strategies for using the cornerstone of artificial intelligence to advance the natural sciences. I seek to work on Bayesian statistics and machine learning theory to build approaches that can deal with the growing scale, complexity, and variety of datasets in a computationally efficient manner.

I am researching graph-based approaches to particle track reconstruction (similar to the TrackML Challenge on Kaggle): specifically, representing 3D point cloud data as a (lower-dimensional) graph and then training a graph neural network on it, possibly conditioned on additional physical information (metadata). Problems in high-energy physics, and in science in general, prove to be a rich testbed for statistical machine learning and Bayesian inference. It is exciting to see a growing focus on making this area more practical, especially as optimization toolkits and features are released within popular frameworks.
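To make the point-cloud-to-graph step concrete, here is a minimal sketch in Python. It assumes one simple construction, a radius graph that links any two hits lying within a fixed distance of each other; the function name `build_radius_graph`, the toy hit coordinates, and the threshold are all hypothetical, and this is not necessarily the construction used in my own work. The resulting edge list is the kind of input a graph neural network would then be trained on.

```python
import math
from itertools import combinations


def build_radius_graph(hits, max_dist):
    """Return undirected edges (i, j), i < j, between hits closer than max_dist.

    `hits` is a list of (x, y, z) tuples. This brute-force version compares
    every pair, which is fine for a toy example but scales quadratically.
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(hits), 2):
        if math.dist(a, b) <= max_dist:  # Euclidean distance in 3D
            edges.append((i, j))
    return edges


# Hypothetical toy hits: two roughly straight "tracks" far apart in y.
hits = [
    (0.0, 0.0, 0.0),
    (1.0, 0.1, 0.0),
    (2.0, 0.2, 0.0),
    (0.0, 5.0, 0.0),
    (1.0, 5.1, 0.0),
]

edges = build_radius_graph(hits, max_dist=1.5)
print(edges)
```

With a suitable threshold, neighbouring hits on the same toy track are linked while hits on different tracks are not; real detector data would need physically motivated edge criteria and a faster neighbour search.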

I’ve built projects across a variety of domains. A few examples include a framework to prototype chatbots based on Jack the Reader (ACL, 2018), an automated assessment of basic Blender (3D modeling) assignments, and a dynamic, automated workflow for an award-winning cybersecurity tool.

Research Motivation

In [1], Mjolsness and DeCoste describe the general framework of scientific discovery: a loop involving observational studies, hypothesis generation, experimental design, iterative testing, and feedback. Almost as if in accordance with their predictions from two decades ago, machine learning has provided a toolkit for accelerating the scientific method. Simultaneously, it has also drawn from disciplines like quantum mechanics that provide foundational principles for techniques including functional analysis, energy-based models, and derivatives of the Boltzmann distribution. John Tukey once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.”

Deep learning offers a distinct way to approach the data-intensive stages of the scientific process, with its own pros and cons. While neural networks boost resource efficiency, there is the issue of interpretability: the model behaves as a ‘black box’, failing to yield a consistent rationale supporting its predictions. For science, where a formal underpinning is a prerequisite for accepting hypotheses, this poses a non-trivial issue. Science often resorts to the Bayesian perspective, incorporating prior knowledge to improve on simply “throwing” more layers and data at a model. There are proposals for physics-informed learning: models conditioned on physical constraints [2] or utilising ‘physics-based’ loss functions [3].

A known constraint for ML in the physical sciences is also the intractable nature of the likelihood function in complex, high-dimensional spaces. This motivates strategies grounded in Bayesian statistics, for instance approximate Bayesian computation, which samples from an approximate posterior by comparing simulated data with observations instead of evaluating the likelihood. More broadly, there is a class of likelihood-free inference techniques, each of which comes with its own set of constraints. In [4] and [5], for instance, the authors propose a neural network as a surrogate model to learn the joint likelihood ratio over a latent variable obtained from the simulation, demonstrating that this can serve as a more sample-efficient technique for certain classes of problems with an intractable likelihood.
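As an illustration of approximate Bayesian computation, here is a minimal rejection-sampling sketch in Python. Everything in it is an assumption made for the example: the toy Gaussian simulator, the uniform prior on [-5, 5], the sample mean as summary statistic, and the tolerance. It is not the method of any paper cited here; it only shows the basic idea of keeping parameter draws whose simulated data resemble the observations, without ever evaluating a likelihood.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Toy simulator (assumed): n draws from a unit-variance Gaussian with mean mu."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated sample mean is within `tolerance`
    of the observed sample mean."""
    obs_mean = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # draw from the (assumed) uniform prior
        sim = simulate(mu, len(observed), rng)
        if abs(statistics.fmean(sim) - obs_mean) <= tolerance:
            accepted.append(mu)
    return accepted


rng = random.Random(0)                  # fixed seed for reproducibility
observed = simulate(2.0, 50, rng)       # stand-in "data", true mean 2.0
posterior = abc_rejection(observed, n_draws=5000, tolerance=0.1, rng=rng)
print(len(posterior), statistics.fmean(posterior))
```

The accepted values approximate the posterior over the mean. Shrinking the tolerance tightens the approximation but rejects more draws, which is one reason more sample-efficient likelihood-free methods are attractive.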

There has been a resurgence of graph-based techniques in machine learning, possibly linked to the increase in graph-structured data and compute. A recent, comprehensive review is offered by researchers at DeepMind in [6], which argues for the capacity of graphs to model inter-object relationships effectively. In my work on particle track reconstruction, graphical representations can encode entity relationships into lower-dimensional latent spaces; [7] presents a survey of approaches to this end. I believe that, as computational feasibility ceases to be a bottleneck, graph-based models present a promising approach to learning complex relationships over large datasets.

Knowledge Transfer: DJ Unicode

I am passionate about knowledge transfer and actively work with DJ Unicode, a student-run organisation that I co-founded. Unicode was born of the need for skill development at the grassroots level, in addition to the need for a rapport between college freshmen, sophomores, and juniors. Our aim is to devote time and effort towards programming and mentoring, eschewing both ‘bureaucratic’ practices and administrative logjams.

Unicode started as 5 teams of 6 sophomores mentored by 4 juniors and a senior, initially restricted to the Computer Department. They worked on 5 projects (3 web, 2 app), all of which are standalone, tested, and available on GitHub. Today, we are a thriving community of 70+ members (including ~30 women), with teams winning hackathons, international internship offers, and selections for Google Summer of Code 2018!

Personal

  • Look up my tech articles published in the Open Source for You (OSFY) Magazine

  • I (like to think that I) am an artist.

  • Apart from cooking and biking, I enjoy participating in hackathons, where you are likely to find me scrounging for food around midnight. I’m partial to a steaming cup of sweet, milky tea (also termed ‘cutting chai’ by the Indian streetside tea stalls).

  • I like to run the occasional marathon.

Sites

Feel free to browse through some of my older posts on Blogger and The CCDev Blog.