email: {lastname}(dot){firstname}001(at)gmail(dot)com

who am i

i'm a cs and math double major at uva (class of 2027). i love solving puzzles of all kinds. i work across a variety of fields, from typical full-stack work at Forge (where i'm director of engineering) to replicating mech interp papers for my research role at uva.

i've published at emnlp 2024 and aaai 2026, and my current research is on calibrating work on introspective awareness in large language models.

i used to be an avid ctf player, and you may find some of my writeups in the archive on my blog and in my github repository graveyard. i still think ctfs are really fun, but other things interest me more now.

i'll be interning at google in sunnyvale this summer, so if you're in the area, feel free to send me an email and we can connect over coffee.

projects

  1. sparse malicious finetuning - how small amounts of malicious supervised fine-tuning (SFT) can flip safety-aligned LMs from refusal to compliance. a much more robust version of this result was published by anthropic while we were working on it, but it was still a fun project.
  2. spmspmmul - the code and writeup for how we wrote a really fast sparse-matrix-sparse-matrix multiplication kernel.
  3. zero-shot-realignment - we replicate the results from here and here onto gemma-4-e4b-it for the first time.
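the core idea behind sparse-matrix-sparse-matrix multiplication can be sketched on the cpu with gustavson's row-wise algorithm over csr matrices: each row of C is built by scaling and merging the rows of B selected by the nonzeros in the matching row of A. this is a minimal pure-python reference, not the kernel in spmspmmul; the `(indptr, indices, data, n_cols)` tuple layout and the helper names `spgemm_csr` and `to_dense` are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# hypothetical CSR layout for this sketch: (indptr, indices, data, n_cols)
CSR = Tuple[List[int], List[int], List[float], int]


def spgemm_csr(a: CSR, b: CSR) -> CSR:
    """Compute C = A @ B for two CSR matrices using Gustavson's row-wise algorithm."""
    a_ptr, a_idx, a_val, a_cols = a
    b_ptr, b_idx, b_val, b_cols = b
    if a_cols != len(b_ptr) - 1:
        raise ValueError("inner dimensions do not match")

    c_ptr: List[int] = [0]
    c_idx: List[int] = []
    c_val: List[float] = []
    for i in range(len(a_ptr) - 1):
        # sparse accumulator: output column -> partial sum for row i of C
        acc: Dict[int, float] = {}
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # row i of C += A[i, k] * (row k of B)
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # emit row i with sorted column indices; entries that cancel to 0.0 are kept
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val, b_cols


def to_dense(m: CSR) -> List[List[float]]:
    """Expand a CSR matrix into a dense list of rows (for checking small examples)."""
    ptr, idx, val, n_cols = m
    rows = []
    for i in range(len(ptr) - 1):
        row = [0.0] * n_cols
        for p in range(ptr[i], ptr[i + 1]):
            row[idx[p]] += val[p]
        rows.append(row)
    return rows


# A = [[1, 0, 2], [0, 3, 0]],  B = [[0, 4], [5, 0], [6, 0]]
A: CSR = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], 3)
B: CSR = ([0, 1, 2, 3], [1, 0, 0], [4.0, 5.0, 6.0], 2)
C = spgemm_csr(A, B)
print(to_dense(C))  # → [[12.0, 4.0], [15.0, 0.0]]
```

the per-row hash map is the simple part to write on a cpu; gpu spgemm implementations commonly split the work into a symbolic pass that sizes each output row and a numeric pass that fills it in, since dynamic per-row allocation maps poorly onto thousands of threads.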

Pinned

  1. zero-shot-realignment

    Studying the transferability of EM corrections on small model organisms.

    Python

  2. gender-bias-interpretability

    We try to answer whether RLHF removes or suppresses gender-bias circuits.

    Jupyter Notebook

  3. open-introspection

    Rigorous replication of Anthropic's recent introspection experiments (Lindsey, 2025). We replicate the results on Llama-3.3-70B-instruct and Llama-3.3-405B-instruct.

    Jupyter Notebook

  4. spmspmmul

    PoC for highly efficient GPU-accelerated sparse-matrix-sparse-matrix multiplication.

    Cuda

  5. habichuela.dev

    Portfolio website.

    Astro

  6. sparse-malicious-sft

    We study how small amounts of malicious supervised fine-tuning (SFT) can flip safety-aligned LMs from refusal to compliance.

    Python