Hi, my name is
Daksh Sharma.
AI-native Data Engineer with 2+ years of experience specialising in cloud-native ETL/ELT pipelines, Snowflake, and ML-powered analytics. Currently building intelligent data systems at DS Group, Noida. First-Class MSc in Data Analytics from National College of Ireland.
01.About Me
I'm a data engineer and analyst based in Delhi, India — recently back after three years working across Dublin's FMCG and tech scene, and now diving into the Indian market headfirst.
My work spans cloud-native Snowflake pipelines (Snowpipe, CDC Streams, Cortex AI agents), machine learning (LightGBM, XGBoost, BERT, Neural Networks), and no-code/low-code BI tools that let non-technical teams actually interact with their data and not just receive a report about it. If a business analyst can answer their own questions without raising a ticket, I've done my job.
When I'm not at a keyboard I'm on a badminton court, travelling somewhere new, or working through a brain teaser that has no business taking this long to solve.
02.Experience
03.Featured Projects
End-to-end platform with 6 ML models. XGBoost price predictor R²=0.9988, RMSE=₹16.59. LightGBM category classifier 74% accuracy across 24 classes. TF-IDF recommendation system on 8,011 products. Full Snowflake ETL from GCS, 4-page Streamlit dashboard with Cortex AI.
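As a rough illustration of how a TF-IDF recommender of this kind can work, here is a minimal pure-Python sketch. It is not the project's code: the product strings, the function names (`tfidf_vectors`, `cosine`, `recommend`), and the smoothed IDF formula are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = max(len(tokens), 1)  # guard against empty descriptions
        vectors.append({
            # Smoothed IDF (an assumption for this sketch).
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(vectors, index, k=1):
    """Return the indices of the k items most similar to vectors[index]."""
    scores = [
        (cosine(vectors[index], vec), i)
        for i, vec in enumerate(vectors)
        if i != index
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scores[:k]]


# Hypothetical product descriptions, for illustration only.
products = ["red cotton shirt", "blue cotton shirt", "steel water bottle"]
vectors = tfidf_vectors(products)
top = recommend(vectors, 0, k=1)
```

With these toy descriptions, the closest match to the first product is the second one, since they share two of three terms.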
103K records from Snowflake Marketplace. LightGBM price predictor R²=0.9804, RMSE=¥6,739. Sell-speed classifier 62.1% accuracy, +9.3pp above baseline. Cortex AI agent with 4-stage architecture, rolling memory, verified query cache. 7-tab Streamlit dashboard.
Self-initiated tool for DS Group's Snowflake migration. 4-layer architecture: ACCOUNT_USAGE → CORE views → RECOMMENDATIONS → Streamlit + Task. 5 business rules with HIGH/MEDIUM/LOW severity. Daily 9 AM HTML email report. Presented to engineering leadership.
MSc dissertation. Novel consensus-labelling: tweets agreed on by both VADER and RoBERTa used as pseudo ground-truth for BERT fine-tuning. Best model 83.95% accuracy, F1=74.47%. 18 hyperparameter combinations tested per model. Supervisor: Mrs. Harshani Nagahamulla.
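The consensus-labelling idea can be sketched in a few lines. This is a simplified illustration, not the dissertation code: it assumes each tweet already has one label string from each model, and the toy tweets, label names, and function names are hypothetical.

```python
from typing import List, Optional, Tuple


def consensus_label(vader_label: str, roberta_label: str) -> Optional[str]:
    """Return the shared label when both labellers agree, otherwise None."""
    return vader_label if vader_label == roberta_label else None


def build_pseudo_ground_truth(
    tweets: List[str],
    vader_labels: List[str],
    roberta_labels: List[str],
) -> List[Tuple[str, str]]:
    """Keep only the tweets on which the two labellers agree."""
    kept = []
    for tweet, v_label, r_label in zip(tweets, vader_labels, roberta_labels):
        label = consensus_label(v_label, r_label)
        if label is not None:
            kept.append((tweet, label))
    return kept


# Hypothetical toy data: in practice the labels would come from the models.
tweets = ["tweet a", "tweet b", "tweet c"]
vader_labels = ["positive", "negative", "neutral"]
roberta_labels = ["positive", "neutral", "neutral"]

pseudo_labelled = build_pseudo_ground_truth(tweets, vader_labels, roberta_labels)
```

Here "tweet b" is dropped because the two labellers disagree; the remaining pairs would form the training set for fine-tuning.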
Last-Mile Delivery Simulation
SimPy discrete-event simulation with Floyd-Warshall shortest paths and TSP via PuLP ILP. 193 parcels simulated; 65.8% same-day delivery over 10 days. MSc — Modelling, Simulation & Optimisation.
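For readers unfamiliar with Floyd-Warshall, a minimal sketch of the all-pairs shortest-path step is shown below. The four-node road graph, its weights, and the assumption that roads are undirected are invented for illustration and are not taken from the project.

```python
import math


def floyd_warshall(n, edges):
    """All-pairs shortest path lengths for n nodes and (u, v, weight) edges."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        # Assumption: roads are undirected, so store both directions.
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Allow each node k in turn as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Hypothetical road network: node 0 could stand for the depot.
edges = [(0, 1, 4.0), (1, 2, 1.0), (0, 2, 7.0), (2, 3, 2.0)]
dist = floyd_warshall(4, edges)
```

A distance matrix like `dist` is the usual input to a TSP formulation, which then chooses the visiting order for a day's parcels.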
Plant Leaf Disease Detection — Custom CNN
Custom 3-block CNN from scratch in TensorFlow/Keras. Binary healthy/diseased classifier across 3,601 images from 11 plant species. 256×256 RGB, Adam optimiser, TensorBoard. MSc group project.
Nurse Roster Optimisation
OR-Tools CP-SAT constraint solver. 12 constraints including EU EWTD 48-hour limit, team depletion prevention, night-shift continuity, weekend equity. All 10 nurses within 1 shift of optimal. Interview assessment task.
04.Technical Arsenal
Languages
Data Platforms
Data Engineering
AI & Machine Learning
BI & Visualisation
Tools & Methods
05.Education
Dissertation: Opinion Mining on Ukraine-Russia War Tweets — novel consensus-labelling, 83.95% BERT accuracy. Modules: ML, Simulation & Optimisation, Data Mining, Research Methods.
Statistical inference, probability theory, regression analysis, time series, forecasting, operations research, and data analysis using R.
16 Certifications
06. What's Next?
Get In Touch
Open to data engineering, analytics, and ML engineering roles worldwide. Whether you have a question, an opportunity, or just want to say hi — my inbox is always open.