software engineer @ tabapay working on all things payments.
interested in cross-lingual llms.
graduated early with a 4.0 GPA.
trained an embedding model from scratch with skip-gram negative sampling (sgns) in pytorch (rough sketch below). the api server communicates with the vector services over grpc. more training runs in progress.
at some point, i may completely overhaul the model and train it to align semantically similar sentences from different languages into a shared vector space. this would let users search documents in whatever language they like.
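a minimal sketch of the sgns objective mentioned above; the class and variable names here are illustrative, not from the actual codebase:

```python
# minimal skip-gram with negative sampling (sgns) loss in pytorch.
# Sgns / vocab_size / embed_dim are illustrative names, not the real repo's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Sgns(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, embed_dim)   # center-word vectors
        self.out_embed = nn.Embedding(vocab_size, embed_dim)  # context-word vectors

    def forward(self, center, context, negatives):
        # center: (B,), context: (B,), negatives: (B, K) of sampled word ids
        v = self.in_embed(center)                              # (B, D)
        u_pos = self.out_embed(context)                        # (B, D)
        u_neg = self.out_embed(negatives)                      # (B, K, D)
        pos = F.logsigmoid((v * u_pos).sum(-1))                # (B,)
        neg = F.logsigmoid(-torch.bmm(u_neg, v.unsqueeze(-1)).squeeze(-1))  # (B, K)
        # maximize log-likelihood of true pairs, minimize it for negatives
        return -(pos + neg.sum(-1)).mean()
```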
achieves uniform compression and fertility across 10 diverse scripts. training toward 5.0 compression is in progress.
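for reference, a minimal sketch of the two metrics; `tokenize` is a stand-in for the tokenizer being trained, and defining compression as utf-8 bytes per token is my assumption:

```python
# fertility = tokens per word; compression = utf-8 bytes per token (assumed definition).
# `tokenize` is a placeholder for the tokenizer under training.
def fertility(text: str, tokenize) -> float:
    # average number of tokens emitted per whitespace-delimited word
    return len(tokenize(text)) / max(len(text.split()), 1)

def compression(text: str, tokenize) -> float:
    # average number of utf-8 bytes each token represents (higher = more compression)
    return len(text.encode("utf-8")) / max(len(tokenize(text)), 1)
```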
leverages pinecone, langchain with gpt-3.5-turbo, and integrations with the gmail, youtube transcript, and stripe apis.
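a minimal sketch of the retrieval step only, assuming the v3 pinecone python client; the index name and embedding input are placeholders, not the project's real names:

```python
# retrieval-only sketch; assumes the v3 pinecone python client.
# "docs" and search() are placeholders, not the project's actual names.
from pinecone import Pinecone

pc = Pinecone(api_key="...")   # a real key would come from the environment
index = pc.Index("docs")

def search(query_embedding: list[float], k: int = 5):
    # return the k nearest stored chunks, with metadata, for the query vector
    return index.query(vector=query_embedding, top_k=k, include_metadata=True)
```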
generated hinglish code-mixed datasets using pos tagging with stanza/spacy, then performed sft + dpo for alignment. additional tests in progress.
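a minimal sketch of the pos-driven replacement idea with stanza; the translation dictionary, target tags, and mixing probability are all illustrative, not the actual pipeline:

```python
# pos-driven code-mixing sketch with stanza; translations / tags / p are placeholders.
# assumes stanza.download("en") has been run once beforehand.
import random
import stanza

nlp = stanza.Pipeline("en", processors="tokenize,pos")

def code_mix(sentence: str, translations: dict[str, str], p: float = 0.5) -> str:
    # swap selected nouns/verbs for their hindi counterparts with probability p
    out = []
    for sent in nlp(sentence).sentences:
        for word in sent.words:
            hi = translations.get(word.text.lower())
            if hi and word.upos in ("NOUN", "VERB") and random.random() < p:
                out.append(hi)
            else:
                out.append(word.text)
    return " ".join(out)
```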
uses VADER sentiment analysis, daily selenium web scrapers, the yfinance api, and an xgboost + random forest ensemble.
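a minimal sketch of how these pieces could fit together, with vader scores as features feeding a soft-voting xgboost + random forest ensemble; the feature set and hyperparameters are illustrative, not the project's actual config:

```python
# vader headline scores -> features for an xgboost + random-forest voting ensemble.
# feature choice and default hyperparameters are illustrative.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from xgboost import XGBClassifier

analyzer = SentimentIntensityAnalyzer()

def headline_features(headline: str) -> list[float]:
    # neg/neu/pos sum to 1; compound is a normalized score in [-1, 1]
    s = analyzer.polarity_scores(headline)
    return [s["neg"], s["neu"], s["pos"], s["compound"]]

# soft voting averages the predicted probabilities of both models
ensemble = VotingClassifier(
    estimators=[("xgb", XGBClassifier()), ("rf", RandomForestClassifier())],
    voting="soft",
)
# ensemble.fit(X, y) once features are joined with yfinance price data
```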
big shoutout to Dr. Ajay Bansal and his PhD student James Smith for their support in elevating this project.
Journal paper - Nominated by AIKE 2023 chairs for the IJSC 2024 edition
Long paper - AIKE 2023, Published in IEEE Xplore
Long paper - AIKE 2022, Published in IEEE Xplore
Short paper - ICSC 2022, Published in IEEE Xplore
1. love the llm ~= os analogy
2. start enabling your tools/projects to be easily used by agents.
3. fewer humans clicking around to set up your tool, more agents doing it.
incredible video on the before and after of the deepseek moment in january '25
loved the quote 'build so that you get to wake up tomorrow and build again'
around the same time this paper came out, i was playing around with a project i called "interspersed bilingual decoder" that was similar in its random replacement of certain POS. clearly, they have much stronger evals.
great summary of the existing state of multilingual llms. his RomanSetu paper is probably the most fun i've had reading a paper
video from last year that i rewatched recently. it inspired me to build polyglot.
takeaway: being skeptical is essential in science
p.s. i wish i took Dr. Rao's courses during my time @ asu
would love to run the experiments on a code-switched mix of two low-resource languages and plot language pairs against one another to quantify the "universality" of entrainment.
loved the discussion on how tariffs can breed globally uncompetitive products
an essay on the importance of creating substantial work that i found particularly eye-opening
i attended this live lecture in san francisco; it was incredibly informative
his wisdom on focus, meditation, and serving his people is inspiring