software engineer @ tabapay working on all things payments.
interested in cross-lingual llms.
graduated early with a 4.0 GPA
some large efforts, some weekend hacking projects
trained an embedding model from scratch via sgns + pytorch. apiserver communicates with vector services via grpc. more training runs in progress.
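the sgns objective can be sketched in a few lines. this is a pure-numpy toy (the actual project uses pytorch), and all sizes, names, and hyperparameters here are illustrative:

```python
import numpy as np

# minimal skip-gram negative sampling (sgns) sketch in numpy; the real
# project trains this in pytorch. vocab/dim/lr values are placeholders.
rng = np.random.default_rng(0)
vocab, dim = 50, 16
W_in = rng.normal(0, 0.1, (vocab, dim))   # center-word embeddings
W_out = rng.normal(0, 0.1, (vocab, dim))  # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, negatives, lr=0.05):
    """one sgd step: pull (center, context) together, push negatives apart."""
    v = W_in[center]
    u = W_out[context]
    g = sigmoid(v @ u) - 1.0            # grad of -log sigmoid(v.u)
    W_out[context] -= lr * g * v
    grad_v = g * u
    for neg in negatives:               # grad of -log sigmoid(-v.u_neg)
        u_n = W_out[neg]
        g_n = sigmoid(v @ u_n)
        W_out[neg] -= lr * g_n * v
        grad_v += g_n * u_n
    W_in[center] -= lr * grad_v

# repeatedly train one toy positive pair; its similarity should rise
before = W_in[1] @ W_out[2]
for _ in range(100):
    sgns_step(1, 2, negatives=rng.integers(3, vocab, size=5))
after = W_in[1] @ W_out[2]
print(after > before)
```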
at some point, i may completely overhaul the model and train it to align semantically similar sentences from different languages into similar vector spaces. this would allow users to search various documents in whatever language they would like.
achieves uniform compression and fertility across 10 diverse scripts. training to achieve 5.0 compression is in progress.
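for concreteness, here is how the two metrics above are commonly defined, computed with a stand-in character-bigram tokenizer (the real project measures them over 10 scripts with its trained vocab; the definitions are the usual ones, assumed here):

```python
# fertility   = tokens produced per whitespace word
# compression = utf-8 bytes represented per token
# toy_tokenize is a stand-in for the trained tokenizer.

def toy_tokenize(text):
    """split each word into non-overlapping character bigrams."""
    toks = []
    for word in text.split():
        toks.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return toks

def fertility(text):
    return len(toy_tokenize(text)) / len(text.split())

def compression(text):
    return len(text.encode("utf-8")) / len(toy_tokenize(text))

sample = "tokenizers trade off fertility against compression"
print(round(fertility(sample), 2), round(compression(sample), 2))
```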
leverages pinecone, langchain with gpt-3.5-turbo, gmail api, youtube transcription api, and stripe api integrations.
dynamically discovers and subscribes to new markets, computes a spread asymmetry pressure metric, stores data in redis, and sends email alerts on threshold breaches
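the exact "spread asymmetry pressure" formula isn't spelled out here, so the following is a hypothetical version for illustration only: compare resting volume near the best bid vs the best ask, normalized to [-1, 1]:

```python
# hypothetical asymmetry metric, not the project's actual formula.
def spread_asymmetry(bids, asks, levels=3):
    """bids/asks: lists of (price, size), best level first.
    > 0 means more pressure on the bid side (buyers leaning in)."""
    bid_vol = sum(size for _, size in bids[:levels])
    ask_vol = sum(size for _, size in asks[:levels])
    return (bid_vol - ask_vol) / (bid_vol + ask_vol)

bids = [(99.9, 40), (99.8, 25), (99.7, 10)]
asks = [(100.1, 10), (100.2, 10), (100.3, 5)]
print(spread_asymmetry(bids, asks))  # 0.5: bid side is heavier
```

a metric like this is cheap to recompute per tick, which is what makes the redis-store-then-alert-on-threshold loop practical.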
1. built a dataset generation agent with langgraph + ollama (llama3.2 3b)
2. finetuned gpt-oss-20b with peft + lora (shoutout unsloth!)
3. will create an stt pipeline soon with the finetuned model
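the peft + lora setup in step 2 looks roughly like this config fragment; the rank, alpha, and target modules shown are illustrative placeholders, not the values actually used:

```python
from peft import LoraConfig, get_peft_model

# illustrative lora hyperparameters; swap in your own values.
lora_config = LoraConfig(
    r=16,                      # adapter rank
    lora_alpha=32,             # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, lora_config)
```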
1. rapidapi for access to zillow data
2. s3 to store reports
3. weekly cronjob set up with github actions for workflow automation
generated hinglish code-mixed datasets using pos tagging with stanza/spacy, performed sft + dpo for alignment.
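a heavily simplified sketch of the pos-guided code-mixing step: swap english nouns for hindi equivalents to synthesize a hinglish sentence. in the real pipeline the (word, pos) tags come from stanza/spacy and the lexicon is far larger; both are stubbed out here for illustration:

```python
# toy romanized-hindi lexicon; the real one is much larger.
HI_NOUNS = {"water": "paani", "work": "kaam", "money": "paisa"}

def code_mix(tagged_tokens, swap_pos=("NOUN",)):
    """tagged_tokens: list of (word, pos) pairs, e.g. from stanza/spacy."""
    out = []
    for word, pos in tagged_tokens:
        if pos in swap_pos and word.lower() in HI_NOUNS:
            out.append(HI_NOUNS[word.lower()])
        else:
            out.append(word)
    return " ".join(out)

tagged = [("the", "DET"), ("work", "NOUN"), ("needs", "VERB"),
          ("money", "NOUN")]
print(code_mix(tagged))  # the kaam needs paisa
```

gating the swap on pos tags is what keeps the generated mix grammatical enough to be useful as sft/dpo data.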
uses VADER sentiment analysis models, daily web scrapers using selenium, yfinance api, xgboost and random forest ensemble models
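a highly simplified lexicon scorer in the spirit of vader (the actual project uses nltk's SentimentIntensityAnalyzer; the lexicon values and negation rule here are made up for illustration):

```python
# toy valence lexicon and crude negation flip, vader-style.
LEXICON = {"beat": 1.5, "surge": 2.0, "miss": -1.5, "plunge": -2.5}
NEGATORS = {"not", "no", "never"}

def score(text):
    total, prev = 0.0, None
    for tok in text.lower().split():
        val = LEXICON.get(tok, 0.0)
        if prev in NEGATORS:   # flip valence after a negator
            val = -val
        total += val
        prev = tok
    return total

print(score("shares surge after earnings beat"))  # 3.5
print(score("did not miss estimates"))            # 1.5
```

scores like these become one feature column alongside the scraped and yfinance data that the xgboost/random-forest ensemble trains on.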
big shoutout to Dr. Ajay Bansal and his PhD student James Smith for their support in elevating this project.
Journal paper - Nominated by AIKE 2023 chairs for the IJSC 2024 edition
Long paper - AIKE 2023, Published in IEEE Xplore
Long paper - AIKE 2022, Published in IEEE Xplore
Short paper - ICSC 2022, Published in IEEE Xplore
no correlation between length of thinking traces and complexity of the task.
love the authenticity of this talk: there are no silver bullets in distributed systems, every decision is a tradeoff.
the unit economic breakdowns this channel gives in its case studies are always impressive, and the analysis of additional value creation from the demerger in this video is the best example of that.
an insightful essay on navigating decision making, risk management, and the general randomness of life.
i've gone down somewhat of a rabbit hole on this topic... but i'm just now finding out how deep-seated western biases are in these models, even after preference tuning.
a truly perplexing finding: pre-alignment, models agree with Nigerian public opinion the most; post-alignment, with the USA's.
(paraphrased) since writing is thinking, ai will make the "write" and "write-not" into the "think" and "think-not"
tokenization is a balancing act between:
1. bloated vocab sizes due to multi-word merges
2. improved token fertility and inference speed
3. preventing suboptimal greedy merges early on
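point 3 can be made concrete with one greedy bpe merge step over a toy corpus: the merge chosen is whatever pair is most frequent *right now*, which early in training can lock in merges a global view would avoid (corpus and counts below are illustrative):

```python
from collections import Counter

def most_frequent_pair(words):
    """words: dict mapping a tokenized word (tuple of symbols) to count."""
    pairs = Counter()
    for word, count in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += count
    return pairs.most_common(1)[0]

def apply_merge(words, pair):
    """rewrite every word with the chosen pair fused into one symbol."""
    a, b = pair
    merged = {}
    for word, count in words.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + count
    return merged

words = {tuple("lower"): 5, tuple("lowest"): 3, tuple("low"): 2}
pair, freq = most_frequent_pair(words)   # purely local, greedy choice
words = apply_merge(words, pair)
print(pair, freq)
```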
love his skepticism (watch his dwarkesh interview) and relatively high clarity on the meaning of intelligence
"nothing is truly unique; we are surrounded by isomorphisms."
"intelligence is the efficiency with which you operationalize the past to deal with the future"
great summary on the existing state of multilingual llms. his RomanSetu paper is probably the most fun i've had reading a paper
1. love the llm ~= os analogy
2. start enabling your tools/projects to be easily used by agents.
3. less humans clicking around to set up your tool, more agents doing it.
incredible video on the before and after of the deepseek moment in january '25
loved the quote 'build so that you get to wake up tomorrow and build again'
around the same time this paper came out, i was playing around with a project i called "interspersed bilingual decoder" that was similar in its random replacement of certain POS. clearly, they have much stronger evals.
video from last year that i rewatched recently. it inspired me to build polyglot.
takeaway: being skeptical is essential in science
p.s. i wish i took Dr. Rao's courses during my time @ asu
would love to run the experiments on a code-switched mix of two low-resource languages, plot language pairs against one another to quantify the "universality" of entrainment.
loved the discussion on how tariffs can breed globally uncompetitive products
an essay on the importance of creating substantial work that i found particularly eye-opening
i attended this live lecture in san francisco; it was incredibly informative
his wisdom on focus, meditation, and serving his people is inspiring