AVIKSHITH REDDY YELAKONDA
DATA SCIENCE · MACHINE LEARNING · AI
DATA SCIENTIST · ML ENGINEER · APPLIED AI

PRODUCTION ML · ANALYTICS · GENAI SYSTEMS

I build production-grade machine learning, analytics, and generative AI systems that turn messy data into decision-ready products: predictive models, RAG pipelines, experimentation platforms, and monitored deployment workflows.

Python · SQL · LLMs · RAG · ML Pipelines · MLOps
Dallas, Texas · MS Computer Science (AI/ML), SMU

From Data to Production Decision Systems

I bridge analytics, predictive modeling, and applied AI to ship systems that are measurable, production-aware, and useful to real teams.

My work centers on three strengths: predictive modeling for decisioning, LLM and RAG systems for knowledge workflows, and data/ML platforms that make models repeatable, observable, and deployable.

What I Deliver
  • Business problem framing tied to measurable KPIs and decision outcomes
  • Rigorous modeling, experimentation, retrieval, and evaluation pipelines
  • Deployment-ready systems with monitoring, reproducibility, and iteration loops
Python · SQL · Data Science · Machine Learning · LLMs & RAG · Docker · MLflow

Predictive Modeling

  • Classification, regression, forecasting, segmentation, and imbalanced decisioning problems
  • Feature engineering, validation, calibration, explainability, and statistically grounded evaluation

Applied AI Systems

  • LLM applications, RAG pipelines, semantic retrieval, OCR workflows, recommendation, and NLP systems
  • Strong focus on grounded outputs, evaluation, and user-facing automation rather than demo-only prototypes

Production Delivery

  • ETL and feature pipelines with Spark, Databricks, Airflow, SQL, and cloud platforms
  • Model deployment, experiment tracking, monitoring, and CI/CD with Docker, MLflow, and cloud tooling

Selected Impact

Causal Impact Measurement

Built experimentation and uplift modeling workflows to identify high-response segments and quantify true incremental impact.

+65% relative conversion lift · p < 0.001

Mortgage Risk Modeling

Improved regulated decision support by building supervised models for borrower behavior and portfolio risk analytics.

+14% default prediction accuracy

Lead Scoring Performance

Handled imbalanced conversion data with calibrated ranking, explainability, and business-focused prioritization.

ROC‑AUC ≈ 0.683 · 2.35× top-decile lift

Forecasting & Analytics Delivery

Developed forecasting and analytics systems with evaluation diagnostics, reporting layers, and stakeholder-ready outputs.

MAPE ≈ 20.8% · 35% reduction in reporting effort

Skills

Predictive Modeling & Decision Science

Core

Translate business questions into modeling problems with clear metrics, interpretable outputs, and measurable operational impact.

Methods
Regression · Classification · Segmentation · Forecasting · Uplift / Causal · A/B Testing
Strengths
Feature engineering · Model evaluation · Calibration · Explainability (SHAP)

LLM, RAG & Applied AI Systems

Core

Build user-facing AI systems with retrieval, ranking, generation, and evaluation loops that prioritize grounded outputs and practical usefulness.

Methods
RAG · Embeddings · Semantic Search · Ranking · Fine-tuning · Prompt Engineering
Strengths
NLP pipelines · Citation-grounded retrieval · RAG evaluation

Data Platforms & MLOps

Core

Build reproducible data and model pipelines with monitoring, lineage, and deployment practices that support reliable production delivery.

Systems
ETL/ELT · Spark · Airflow · Databricks · Snowflake
Ops
Docker · MLflow · CI/CD · Monitoring

Core Languages

Python · SQL · R · Bash · C++ · SAS

ML / AI Stack

scikit-learn · PyTorch · TensorFlow · Hugging Face · FAISS · LlamaIndex

Data & Cloud Platforms

AWS · Azure · GCP · Databricks · Snowflake · BigQuery

Analytics & Reporting

Power BI · Tableau · Looker · Matplotlib · Seaborn · Excel

ML Skill Analysis

evidence-backed profile

This section ranks the strongest ML signals in my profile by scanning portfolio project content, resume text, and GitHub repository metadata. It highlights where the portfolio shows the most depth rather than just listing tools.

Top skill clusters

Scores are weighted by repeated evidence across projects, experience, and recent repositories.

Signal cloud

A word-cloud style view of the highest-frequency ML themes appearing across the portfolio footprint.

Why these rank high

Each theme is tied back to concrete portfolio, resume, or repository evidence.

Projects

A full portfolio of machine learning, analytics, and applied AI systems spanning predictive modeling, experimentation, recommendation, OCR, time-series forecasting, and retrieval-augmented generation. Each project is framed around a business problem, a technical approach, and an operational outcome.

Reinforcement Learning · Q-Learning · Interactive

Dino RL Sandbox

Built an interactive reinforcement-learning sandbox that makes Q-learning legible to non-specialists through environment controls, policy updates, and live value-map visualization.

Architecture
Grid World → State/Action → Reward Shaping → Q-Learning Update → Value Map
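
A minimal sketch of the tabular Q-learning update that a sandbox like this visualizes. It is illustrative only and not the project's code: the 1-D grid, the reward values, the hyperparameters, and the names `step`, `q_update`, `train`, and `value_map` are all assumptions.

```python
import random
from collections import defaultdict

# Hypothetical 1-D grid world: states 0..4, goal at state 4 (all values are assumptions).
N_STATES = 5
ACTIONS = (-1, +1)  # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate


def step(state, action):
    """Apply an action, clip to the grid, and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01  # small step penalty as simple reward shaping
    return next_state, reward, done


def q_update(q, state, action, reward, next_state, done):
    """Q-learning update: Q(s, a) += alpha * (target - Q(s, a))."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def train(episodes=200, seed=0):
    """Run epsilon-greedy episodes from state 0 and return the learned Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            q_update(q, state, action, reward, next_state, done)
            state = next_state
    return q


def value_map(q):
    """State values V(s) = max_a Q(s, a), the quantity a value-map view would plot."""
    return [max(q[(s, a)] for a in ACTIONS) for s in range(N_STATES)]
```

Calling `value_map(train())` returns one value per state. Values for the non-terminal states rise toward the goal, while the terminal state stays at 0 because it is never updated.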
NLP · RAG · Reviews Intelligence

Executive Review Intelligence (LLM + RAG)

Designed a reviews intelligence system that clusters recurring issues, attaches grounded evidence, scores outputs with RAGAS, and routes executive alerts so teams can move from raw feedback to action quickly.

Architecture
Reviews → Cleaning → Embeddings → Topic Clustering → RAG Evidence → RAGAS Eval → Slack Alerts

OCR · CV · Automation

AI Expense Claim Processing (OCR + automation)

Automated a high-friction expense workflow with OCR, field extraction, confidence scoring, and human review, reducing manual handling while improving auditability and downstream analytics readiness.

Architecture
Receipts → OCR → Field Extraction → Validation → API → Review UI

LLM · RAG · Finance

Financial LLM Chatbot

Built a finance-focused LLM assistant that combines fine-tuning with retrieval over SEC filings to deliver cited answers, explain key metrics, and support analyst research workflows.

Architecture
Filings → Chunking → Embeddings → FAISS → LLM → Cited Answer

RAG · Education · Search

Teaching Assistant RAG

Built a multi-course teaching assistant that uses hybrid retrieval and course-isolated knowledge stores to deliver citation-grounded answers with lower hallucination risk and better student self-service.

Architecture
Syllabus/Notes → TF‑IDF + Embeddings → Vector Store → RAG → Streamlit UI

Experimentation · Causal Inference

A/B Testing & Uplift Modeling

Built an experimentation workflow for incremental-impact measurement, combining power analysis, uplift modeling, and segment targeting to identify the customers most likely to convert under treatment.

Architecture
Experiment Data → Power/Lift → Uplift Model → Segment Targeting
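
A minimal sketch of the lift and significance step in a workflow like this, assuming a simple two-arm conversion experiment. The function names and the example counts are hypothetical and are not project data. It covers only relative lift and a pooled two-proportion z-test, not the uplift model itself.

```python
from math import sqrt
from statistics import NormalDist


def relative_lift(conv_control, n_control, conv_treat, n_treat):
    """Relative conversion lift of treatment over control: (p_t - p_c) / p_c."""
    p_c = conv_control / n_control
    p_t = conv_treat / n_treat
    return (p_t - p_c) / p_c


def two_proportion_z_test(conv_control, n_control, conv_treat, n_treat):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_c = conv_control / n_control
    p_t = conv_treat / n_treat
    pooled = (conv_control + conv_treat) / (n_control + n_treat)
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treat))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts for illustration: 100/5000 control vs 150/5000 treatment conversions.
lift = relative_lift(100, 5000, 150, 5000)
z, p_value = two_proportion_z_test(100, 5000, 150, 5000)
print(f"relative lift = {lift:.0%}, z = {z:.2f}, p = {p_value:.4f}")
```

The pooled z-test is a reasonable default for large samples. For small samples or very low conversion rates, an exact test may be a better fit.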
Forecasting · Retail

Walmart Demand Forecasting

Compared classical and deep-learning forecasting approaches for short-horizon retail demand planning, using feature engineering and error diagnostics to support staffing and inventory decisions.

Architecture
Sales Data → Feature Engineering → Model Compare → Forecast → Error Tracking
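
MAPE is one error diagnostic that could sit in the Error Tracking step. A minimal sketch follows; the function name `mape` and the choice to skip zero actuals are assumptions, not the project's implementation.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Periods with a zero actual are skipped, because the percentage error is
    undefined there (an assumption; other conventions exist).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Illustrative values only: per-period errors are 10%, 10%, and 0%.
print(round(mape([100, 200, 400], [110, 180, 400]), 2))
```

Because MAPE divides by the actual value, it penalizes errors on low-volume periods heavily, so it is usually read alongside an absolute-error metric.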
Classification · Lead Scoring

Lead Conversion Modeling (Consumer Affairs)

Built a lead-scoring pipeline for imbalanced conversion data, using feature engineering, calibrated XGBoost, and SHAP-based interpretation to prioritize outreach by expected business value.

Architecture
Leads → Feature Engineering → Model → SHAP → Ranked Outreach
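
Top-decile lift, a common ranking metric for lead scoring, can be sketched as below. The function name and the toy data are hypothetical. It compares the conversion rate among the top 10% of scored leads with the overall conversion rate.

```python
def top_decile_lift(scores, labels):
    """Conversion rate in the top 10% of leads by score, divided by the base rate.

    `labels` are 1 for converted leads and 0 otherwise.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and the same length")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no conversions")
    return top_rate / base_rate


# Toy data: 20 leads, 4 conversions, and both of the 2 highest-scored leads convert.
scores = list(range(20))
labels = [0] * 20
for i in (0, 5, 18, 19):
    labels[i] = 1
print(top_decile_lift(scores, labels))
```

A lift of 1.0 means the ranking is no better than random outreach. Higher values mean the top-scored leads convert at a multiple of the base rate.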
A/B Testing · Marketing Mix

A/B Testing for Marketing Campaign Optimization

Evaluated marketing-channel performance with statistical testing and time-aware regression analysis to inform budget allocation and choose the higher-return campaign strategy.

Architecture
Channel Spend → Experiment → Statistical Test → ROI Decision

Ranking · Recommenders

Movie Recommendation & Ranking

Built a recommendation and ranking workflow over user-item interactions to generate personalized Top-N suggestions and support experimentation around engagement and discovery quality.

Architecture
User Events → Feature Engineering → Ranking Model → Top‑N Recommendations

Churn · Segmentation

Customer Churn & Segmentation

Combined churn prediction and segmentation to help retention teams identify at-risk customers, prioritize intervention, and tailor actions to higher-value behavioral segments.

Architecture
Customer Data → Feature Engineering → Churn Model → Segmentation → Retention Playbooks

Risk · Fraud Detection

Fraud Detection Modeling

Developed an imbalance-aware fraud detection pipeline that generates review-ready risk scores, helping teams surface suspicious transactions while controlling false positives.

Architecture
Transactions → Feature Engineering → Fraud Model → Risk Scoring → Review Queue

Healthcare · EHR Analytics

EHR Analytics & Predictive Modeling

Standardized noisy EHR data into modeling-ready datasets and built predictive baselines that support clinical risk analysis and more reliable downstream healthcare analytics.

Architecture
EHR Data → Cleaning → Feature Engineering → Predictive Model → Clinical Insights

Time Series · Finance

Stock Market Time‑Series Analysis

Analyzed equity time series with indicator engineering and visual diagnostics to compare trend, volatility, and portfolio behavior across multiple public-market assets.

Architecture
Price Data → Indicators → Diagnostics → Visualization

Clustering · Healthcare

Heart Disease Risk Clustering (Healthcare)

Applied dimensionality reduction and clustering to uncover patient-risk groupings in partially labeled clinical data, enabling more targeted downstream analysis.

Architecture
Clinical Data → PCA → Clustering → Risk Groups

Matching · Ranking

Tenant Matcher

Built a preference-based property matching system that scores candidate listings, ranks them for fit, and automates recommendation delivery to renters.

Architecture
Preferences → Scoring → Ranking → Email Delivery

Data Collection · Scraping

Job Web Scraping (job_web)

Created reusable scraping pipelines that collect job-market listings, normalize them into structured datasets, and support downstream labor-market analysis.

Architecture
Sites → Scraper → Normalization → Dataset

Experience

Data Scientist — PioneerSoft (Client: Freddie Mac)

Jul 2025 – Present · United States
  • Own end-to-end ML delivery for regulated mortgage and housing finance use cases, spanning data ingestion, feature engineering, model training, validation, deployment support, and audit-ready lifecycle documentation.
  • Build predictive models for loan performance, credit risk, and portfolio behavior using Python and SQL, applying feature engineering, cross-validation, backtesting, and drift checks to improve reliability in production decision support.
  • Develop NLP and LLM workflows including RAG, embeddings, and structured extraction to convert policy memos, underwriting notes, and servicing documents into searchable, measurable analytics assets.
  • Engineer scalable data and feature pipelines with quality gates, lineage, and governance controls to support repeatable experiments and stable production training.
  • Operationalize models with MLflow, Docker, CI/CD, and cloud tooling; create monitoring dashboards for performance, data drift, and compliance reporting.
  • Partner with risk, compliance, and business stakeholders to turn regulatory requirements into explainable, production-ready ML outputs.
Python · SQL · ML Systems · LLMs · RAG · MLOps · MLflow · Docker · CI/CD · Data Pipelines · Cloud

Data Analyst — Yogin

May 2021 – May 2023 · India · Hybrid
  • Delivered end-to-end analytics and ML workflows that transformed multi-source operational data into curated datasets, models, and dashboards for product and business teams.
  • Built forecasting, segmentation, and classification models with Python and SQL; applied feature engineering, hypothesis testing, and evaluation to improve predictive performance and trust in results.
  • Designed ETL pipelines and automated reporting layers that improved data consistency, reduced manual effort, and enabled more scalable analysis.
  • Built KPI dashboards in Power BI and Tableau to turn analytical findings into decision-ready narratives for non-technical stakeholders.
  • Prototyped early NLP capabilities including text classification, topic modeling, and entity extraction, integrating them into practical analytics workflows.
Python · SQL · Forecasting · Machine Learning · ETL Pipelines · Power BI · Tableau · NLP

Additional Leadership & Operations Experience

Southern Methodist University · 2023 – 2025 · Part-time
  • Supported high-trust operations across campus roles that required confidentiality, responsiveness, and reliable execution under time pressure.
  • Served in student-facing leadership positions that strengthened communication, documentation discipline, and cross-functional coordination.
  • Applied the same professional habits used in technical work: ownership, prioritization, stakeholder support, and calm handling of urgent issues.
Communication · Leadership · Problem Solving · Stakeholder Support · Operations · Execution

Education & Professional Development

M.S. Computer Science — Artificial Intelligence & Machine Learning

Southern Methodist University · Dallas · Aug 2023 – May 2025

GPA: 3.5 / 4.0. Graduate focus in applied machine learning, AI systems, and production-oriented data science.

B.Tech Computer Engineering

CVR College of Engineering · Hyderabad · 2019 – 2023

Built the technical foundation in software, analytics, and machine learning that shaped later AI/ML specialization.

Leadership Roles

  • Board Member, SMU International Student Board; represented student interests and supported community programming.
  • Program Coordinator for SMU residential leadership initiatives; improved peer engagement and operational coordination.
  • Coordinator Head for a college annual fest; drove cross-team planning and execution at scale.
  • Coordinator for Street Cause at CVR College of Engineering; organized volunteers and community-facing initiatives.

Recognition

  • Nominee, Unsung Hero Award — SMU RLSH Year-End Student Banquet.

Certifications

  • British Airways: Data Science (Forage, May 2025)
  • Deloitte AU: Data Analytics (Forage, May 2025)
  • Quantium: Data Analytics (Forage, May 2025)
  • Tata: Data Visualisation (Forage, May 2025)
  • Data Analyst Certification, OneRoadmap (Apr 2025)
  • Google Project Management, Coursera

Hackathons & Events

  • Hackathon, SMU
  • Career Development Programme (Oracle), CVR

Let's Connect

I’m pursuing Data Scientist, ML Engineer, and Applied AI roles where I can own analytics and machine learning systems from raw data through deployment-ready outputs.