Hamza Zaman

Data Analyst ↔ Data Scientist — SQL • Power BI • Python • ML/NLP/LLMs

Snapshot

Work authorisation: Eligible to work in the UK (open to relocation)

Location: London, United Kingdom

Email: hamzazaman04@gmail.com

LinkedIn: linkedin.com/in/hamza-zaman-data

GitHub: github.com/hamza-zaman

Tech stack for commercial work

Analyst Core

  • SQL (T-SQL, window functions, CTEs)
  • Power BI (DAX, Power Query/M)
  • Excel (Power Query, PivotTables, VBA basics)
  • Python for analysis (pandas, NumPy)

DS / ML

  • scikit-learn, XGBoost/LightGBM
  • Feature engineering, cross-validation, ROC/AUC
  • Time-series & forecasting
  • Experiment tracking (MLflow)

Modern AI / NLP

  • LLMs (open-source: Llama/Mistral)
  • RAG pipelines (LangChain/LlamaIndex)
  • Vector DBs (FAISS / pgvector / Chroma)
  • Prompt engineering, evaluation & guardrails

Data Eng & Ops

  • APIs (REST/GraphQL), ETL in Python
  • Azure SQL, star schemas
  • Git, CI/CD (GitHub Actions), Docker (basic)
  • Azure ML / SageMaker (exposure)

Soft skills that ship value

  • Stakeholder communication & expectation setting
  • KPI design, experimentation, and measurement
  • Data storytelling for non-technical audiences
  • Prioritisation & delivery under time constraints
  • Data quality, governance, and GDPR awareness
  • Coaching & enabling self-serve analytics

Outcomes

Reporting time: 3h → 15m (automation, Power BI + Python ETL)

Decisions 40% faster (API-powered dashboards: monday.com, GA, Hootsuite, Cvent)

£50k+ savings (spend/variance insights in Power BI)

Revenue +£10k/week (GA Trainline analysis, process fixes)

95% availability during COVID (demand forecasting)

Selected projects

Car Insurance Claim Prediction

Python, XGBoost
  • Gradient boosting model to estimate claim risk for personal auto policies.
  • Feature engineering improved premium accuracy and profitability.

London Fire Brigade Incident Analytics

Python, scikit-learn
  • Classification & clustering to flag false alarms; ~20% cost premium exposed.
  • Actions cut avoidable call-outs by ~33% (deployment planning).

Osteoporosis Fracture-Risk Prediction

Python, scikit-learn
  • Benchmarked KNN, Random Forest, Logistic Regression, and SVM models; KNN (k=7) achieved the highest accuracy at 88% on the balanced dataset.
  • Top predictive factors identified from Random Forest feature importance were BMD and Age.

Master’s Thesis — NLP Lip-Reading

TensorFlow, Seq2Seq (GRU-attention)
  • Optimised on 45k LRS2 sentences (TPU); phoneme-viseme features.
  • ~3% WER and 0.92 BLEU (~15 pp better than baseline).