Venkat Ramaraju
engineer @ tabapay working on all things payments.
interested in cross-lingual llms.
experience
- swe iii @ tabapay jan'23 - present
- sde @ amazon lab126 jun'22 - jan'23
- swe @ redhat may'21 - jun'22
- researcher @ compact x-ray free electron lab jan'21 - may'21
- teaching assistant, cse 110 @ asu 3 semesters
education
-
bachelors in computer science @ arizona state university
aug'18 - dec'21
graduated early with a 4.0 GPA
projects
some large efforts, some weekend hacking projects
-
polydb: a vector database written from scratch in go
trained an embedding model from scratch via sgns + pytorch. apiserver communicates with vector services via grpc. more training runs in progress.
at some point, i may completely overhaul the model and train it to map semantically similar sentences from different languages to nearby points in a shared vector space. this would let users search documents in whatever language they like.
-
flowcast: an xgboost model that predicts 15-minute net bike flow for lyft bike stations in the bay area
1. trained an xgboost model on 4 years of lyft bike rides to predict net bike flow throughout the day for each station based on weather, day/time and other signals.
2. achieved an MAE of 1.07 on the validation set.
3. built a fullstack app (fastapi + react) to interactively run model inference.
4. need to expand the feature set, perhaps adding information about ongoing events near each station.
-
polyglot: a multilingual tokenizer implemented from scratch in go via the byte-pair encoding algorithm
achieves uniform compression and fertility across 10 diverse scripts. training to achieve 5.0 compression is in progress.
-
venkbot: a personal agentic toolkit to automate mundane tasks in my life
whatsapp chat (twilio) hits my self-hosted server; an llm-backed dispatcher invokes the right mix of bespoke tools + mcps to perform the task.
-
dataquest.ai: an authenticated ai natural language querying tool for documents, datasets, videos, emails, etc.
this application implements rate limiting, request caching, and connection pooling from scratch. leverages pinecone, langchain with gpt-3.5 turbo, gmail api, youtube transcription api, and stripe api integrations.
-
whaletracker: realtime whale trade tracker with polymarket websockets
dynamically discovers and subscribes to new markets, computes a spread asymmetry pressure metric, stores data in redis, and sends email alerts on threshold breach.
-
fb-finetuned: finetuned gpt-oss-20b on 1.5 yrs of my facebook texts to learn my texting style
1. built a dataset generation agent with langgraph + ollama (llamab3.2b)
2. finetuned gpt-oss-20b with peft + lora (shoutout unsloth!)
3. will create an stt pipeline soon with the finetuned model
-
zillow-bot: a bot that emails you new weekly zillow postings based on your search criteria
1. rapid api for access to zillow data
2. s3 to store reports
3. weekly cronjob set up with github actions for workflow automation
-
interspersed bilingual decoder (ibd): a code-mixing decoder-only model finetuned on top of Llama-3.1.
generated hinglish code-mixed datasets using pos tagging with stanza/spacy, performed sft + dpo for alignment.
-
agora: stock recommender based on public sentiment
uses VADER sentiment analysis models, daily web scrapers using selenium, yfinance api, xgboost and random forest ensemble models
big shoutout to Dr. Ajay Bansal and his PhD student James Smith for their support in elevating this project.
papers
-
Forecasting Stock Market Performance: An Ensemble Learning-Based Approach
Journal paper - Nominated by AIKE 2023 chairs for the IJSC 2024 edition
-
Forecasting Stock Market Performance: An Ensemble Learning-Based Approach
Long paper - AIKE 2023, Published in IEEE Xplore
-
A Sentiment Analysis Based Stock Recommendation System
Long paper - AIKE 2022, Published in IEEE Xplore
-
Agora: Introducing the Internet's Opinion to Traditional Stock Analysis and Prediction
Short paper - ICSC 2022, Published in IEEE Xplore