About

Bio

I am a Ph.D. student at NYU Data Science working with the Center for Social Media and Politics and collaborating with folks at Oxford University. My research deals with controlling misinformation on social networks using tools from simulation-based inference and causality. I use probabilistic programs to simulate user behavior and information propagation on social networks.

2022

  • SimPPL has been accepted to be part of the NYU Tech Venture Workshop, 2022!
  • I will present our work on studying the causal effects of Twitter’s interventions on Donald Trump’s tweets at the first Stanford Trust and Safety Research Conference in September 2022.
  • My work with Oxford and CSMAP on studying the effects of influence operations on content ranking and recommendations was accepted for an Oral presentation (top 3/26 papers) at the AI for Agent-based Models Workshop at ICML 2022 – see y’all in Baltimore!
  • I’m interning at Twitter’s Civic Integrity team with Curren Pangler and Mridu Atray, working on misinformation detection in tweets for Summer 2022
  • After a wonderful 4 months working on SimPPL, we showcased our product at Demo Day and launched a pilot with VTDigger, a local news organization based in Vermont
  • Our workshop “AI for Everyone: Learnings from the Local News Challenge” was accepted at Computation + Journalism 2022, at Columbia University! I will present SimPPL at the session.
  • After remotely collaborating for nearly a year, I am spending May 2022 in the UK with Oxford’s Torr Vision Group visiting Prof. Philip Torr and Prof. Atilim Gunes Baydin to work on social networks, disinformation, and recommendation systems. One year into the original offer, we finally beat the pandemic to make it an in-person collaboration :)
  • I’ve successfully passed my Depth-qualifying Exam!

2021

  • My team, SimPPL, is part of the NYC Media Lab’s AI and Local News Challenge!
  • Early work on SimPPL: simulating interventions in social networks was invited for a presentation to
    • Twitter’s Cortex Org.
    • Facebook’s Probability Org.
  • I received an offer to be a Data Scientist Intern with Microsoft’s Azure ML Team
  • My (preliminary) work on COVID-modeling with probabilistic programs with an MS student, Noah Kasmanoff, has been accepted as a poster at PROBPROG 2021 at MIT
  • I received a generous grant from Google Research India to teach an (independent) 13-week Unicode ML Summer Course in Summer 2021 to students from Tier II and Tier III universities in India. Much obliged to my TAs from Unicode Research for their hard work and commitment!
    • We had a fantastic Demo Day showcasing student projects at the Unicode ML Summer Course!
  • I’m co-lead organizer (with fellow CDS Ph.D. student Angelica Chen) for the 2022 NYU AI School, expanding to over 300 students this year!
  • I’m teaching a weekend class (Jan - May 2021) at Unicode Research on Advanced Statistics based on notes from the amazing Cosma Shalizi (videos on YouTube)
  • I delivered a talk on probabilistic programming to the Flatiron Computational Biology Group
  • I’m co-organizing the NYU AI School 2021
  • I’m attending the Nordic Probabilistic AI School, 2021
  • I’ve been accepted to the ML Summer School, Taipei, 2021
  • My internship research “Open Domain Trending Hashtag Recommendation for Videos” with Adobe Research was accepted at the International Symposium for Multimedia, 2021

2019 - 2020

2018: CERN Technical Student

  • I won the Google Cloud Award at the Deep Learning Indaba for our work on ML x Particle Physics using DeepJet
  • I was selected as one of the youngest attendees and presented DeepJet at the ML Summer School, Madrid
  • I placed 2nd in the CodaLab Challenge at the ML for High-energy Physics School organised by Yandex at Oxford University
  • I was one of the youngest finalists for the $150,000 Reliance Dhirubhai Scholarship for pursuing an MBA degree at Stanford University (declined)
  • I presented our work on DeepJet at the CERN IML Workshop and other Working Group Meetings at CERN

SimPPL: Social Network Simulation

I am building SimPPL - a social network simulator to demonstrate how to combine heterogeneous datasets in a principled manner so as to create an expressive model of online social networks that is conditioned on real-world data. It is part of my ongoing research on misinformation control to highlight the applications of such a tool towards understanding the diffusion of information and the evolution of beliefs on platforms like Facebook and Twitter. There is a rich set of downstream applications of such a simulator, including interventions to curb misinformation spread and the causal modeling of online user behavior.

I am collaborating with the Torr Vision Group at Oxford on applying such a simulation tool to estimating the effects of coordinated inauthentic behavior on content recommendations in social networks!

Probabilistic Programming

Here is a set of tutorials I delivered at weekly reading group sessions with students I mentor at Unicode Research, a research group I founded in 2019. It is a research effort within the larger organization I co-founded in 2017, called DJ Unicode.

I hold a Bachelor’s degree in Computer Engineering and worked at Adobe Research on trending hashtag recommendation (accepted at ISM 2021) and at the European Organisation for Nuclear Research (CERN) on particle physics. I’ve worked on graph neural networks, machine learning for high-energy physics, recommender systems, and machine learning in cybersecurity, and I have a working knowledge of language models and deep convolutional neural networks.

Causal Effects of Interventions on Misinformation

I am interested in examining the impact of interventions taken by social networks to limit the spread of misinformation. In recent work with Jim Bisbee at NYU CSMAP, I extend the analysis of Sanderson et al., 2021 to analyse the causal effects of warning labels and tweet removal on Twitter, Facebook, Instagram, and Reddit. The results will be presented at the Stanford Internet Observatory’s first Trust and Safety Research Conference, 2022.

Probabilistic Programming and COVID-19 Models

This project offers an exposition of COVID-19 modeling techniques based on the ideas and problem setup highlighted in Wood et al. (2020). We define a generative model corresponding to our intuition about epidemiological modeling using the probabilistic programming framework Pyro and apply probabilistic inference to draw insights into controlling the COVID-19 pandemic through interventions. In particular, we estimate the confidence intervals for the outbreak parameters to ensure that a predetermined goal is achieved. We are not epidemiologists; the sole aim of this study is to serve as a guide to generative modeling, not to draw inferences about the real-world impact of policy-making for COVID-19.
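To give a flavour of the "inference towards a goal" idea, here is a minimal sketch in plain Python rather than Pyro. Everything in it is an illustrative assumption and not the project's actual model: a deterministic SIR simulator with daily Euler steps, a uniform prior over a transmission rate `beta`, a goal of keeping the peak infected fraction under 10%, and simple rejection sampling in place of Pyro's inference engines.

```python
import random

def peak_infected_fraction(beta, gamma=0.1, initial_infected=0.01, days=365):
    """Deterministic SIR model with daily Euler steps; returns the peak infected fraction."""
    s, i = 1.0 - initial_infected, initial_infected
    peak = i
    for _ in range(days):
        new_infections = beta * s * i
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak

def betas_meeting_goal(goal=0.10, num_samples=500, seed=0):
    """Rejection sampling: keep transmission rates whose simulated peak stays under the goal."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_samples):
        beta = rng.uniform(0.05, 0.5)  # hypothetical prior over the transmission rate
        if peak_infected_fraction(beta) <= goal:
            accepted.append(beta)
    return sorted(accepted)

def central_interval(sorted_values, mass=0.9):
    """Empirical central interval covering roughly `mass` of the accepted samples."""
    tail = (1.0 - mass) / 2.0
    last = len(sorted_values) - 1
    return sorted_values[int(tail * last)], sorted_values[int((1.0 - tail) * last)]

accepted = betas_meeting_goal()
low, high = central_interval(accepted)
print(f"accepted {len(accepted)} of 500 samples; 90% interval for beta: [{low:.3f}, {high:.3f}]")
```

The accepted samples approximate the distribution of transmission rates that are consistent with meeting the goal; a probabilistic programming framework replaces the hand-written rejection loop with more efficient, general-purpose inference.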

Recommender Systems, Graph Neural Networks

Recommendations determine the type, ranking, and placement of most content appearing on our screen ranging from social networks to e-commerce sites and advertisements. A laser focus on personalization has led to a plethora of issues from bias and lack of interpretability to filter bubbles and echo chambers. My work at Adobe dealt with a zero-shot prediction problem, building a production-ready graph attention-network based system and a novel hashtag matching algorithm that, in combination, effectively matched trending hashtags with relevant videos for improving content discovery via all Adobe products. Part of our motivation was to develop a tool to address the content discovery problem exacerbated by poorly designed recommendations. Furthermore, I am using open-source recommendation systems in SimPPL: A Social Network Simulator with Probabilistic Programs in order to simulate content spread and shilling attacks (bad actors using fake reviews to boost virality) and stimulate research on detection and control.

ML x Particle Physics, Graph Neural Nets

I’ve also developed strategies using the cornerstone of artificial intelligence to advance the natural sciences. I used to work on graph-based approaches to particle track reconstruction (similar to the TrackML Challenge on Kaggle) - specifically, representing 3D point cloud data as a (lower-dimensional) graph and then training a graph neural network on it, possibly conditioned on additional physical information (metadata). Problems in high-energy physics, and in science in general, prove to be a rich testbed for statistical machine learning and Bayesian inference. It is exciting to see a growing focus on making this area more practical, especially as optimization toolkits and features are released within popular frameworks.
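As a rough illustration of the first step, turning a 3D point cloud into a graph, here is a small sketch that links each point to its nearest neighbours. The k-nearest-neighbour rule, the function name `knn_graph`, and the toy hit coordinates are assumptions made for this example; real track-reconstruction pipelines typically use detector geometry to decide which hits to connect.

```python
import math

def knn_graph(points, k=2):
    """Return directed edges (i, j) linking each 3D point to its k nearest neighbours."""
    edges = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to point i.
        distances = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in distances[:k])
    return edges

# Toy "hits": three points along a line plus one far-away outlier.
hits = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.1, 0.0), (10.0, 10.0, 10.0)]
edges = knn_graph(hits, k=1)
print(edges)  # [(0, 1), (1, 0), (2, 1), (3, 2)]
```

The resulting edge list, together with per-point features, is the kind of input a graph neural network would consume to score which edges belong to the same track.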

Natural Language Processing, 3D Modeling, Cybersecurity

As part of my Bachelor’s thesis, my teammates and I designed a framework to prototype chatbots with context-based question-answering models based on Jack the Reader (ACL, 2018).

I have also built an automated assessment tool for Blender (3D Modeling) Assignments, and a dynamic, automated workflow for an award-winning cybersecurity tool, IllusionBlack.

Knowledge Transfer: DJ Unicode

I am passionate about knowledge transfer, and I actively work with a student-run organisation that I co-founded. Unicode was born of the need for skill development at the grassroots level, in addition to the need for a rapport between college freshmen, sophomores, and juniors at universities that don’t offer such opportunities by means of the course structure. Our aim is to extend the ‘summer-of-code’ workflow to the rest of the year, helping our students to build a strong foundational understanding of software development. I’m leading the expansion of our mentorship into teaching math and statistics for machine learning through comprehensive reading groups on standard texts in the subject.

Unicode started in 2017 with 15-20 students separated into 5 teams based on their projects. Today, we are a thriving community of 200+ members, with teams winning hackathons, students receiving international internship offers, multiple selections for Google Summer of Code each year, and alumni at Ivy League universities and FAANG companies in the USA!

Unicode Research

I founded a research arm within Unicode, focused on doing collaborative research in statistics and machine learning, with particular emphasis on AI for social good. This includes extensions of projects by students in our ML Summer Course, and ideas by Unicode students and collaborators. I was joined by Dr. Akash Srivastava from the MIT-IBM AI Lab to help teach the students about deep generative models in a palatable fashion, introducing them to probabilistic machine learning.

Ongoing research projects at Unicode Research include estimating the causal effect of mentorship on student career outcomes, social network analysis using probabilistic machine learning, and other topics in deep generative modeling.

Personal

  • Look up my tech articles published in the Open Source for You (OSFY) Magazine

  • I (like to think that I) am an artist.

  • Apart from cooking and biking, I enjoy participating in hackathons, where you are likely to find me scrounging for food around midnight. I’m partial to a steaming cup of sweet, milky tea (also termed ‘cutting chai’ by the Indian streetside tea stalls).

  • I like to run the occasional marathon.

Sites

Feel free to browse through some of my older posts on Blogger and [defunct] The CCDev Blog.