John Rowe
Principal Data Engineer | Machine Learning Engineer
Atlanta, Georgia, United States
With 13+ years of experience in large-scale data systems, cloud data platforms, and ML deployment, I specialize in building enterprise-grade data infrastructure that powers critical business decisions. My expertise spans R&D, healthcare, and finance domains, where I've architected solutions processing petabytes of data across multi-cloud environments.
Experience
Principal Data Engineer
GSK
California, United States
September 2024 – Present
Leading cloud data platform architecture and ML deployment initiatives for enterprise-scale data systems.
- Architected cloud data platform with Azure Synapse, Data Lake, Databricks, Delta Lake, ADLS Gen2, and Event Hubs, enabling scalable data processing at petabyte scale
- Designed and implemented 40+ automated ETL/ELT pipelines using PySpark, Databricks, ADF, Airflow, dbt, and Logic Apps, reducing manual processing time by 95%
- Built real-time streaming systems with Kafka, Spark Structured Streaming, Flink, and Event Hubs for mission-critical data ingestion and processing
- Established lakehouse architecture with Delta Lake, Delta Live Tables, Parquet, Hive Metastore, partitioning, and Z-Ordering for optimized query performance
- Optimized high-performance data processing frameworks using PySpark, Koalas, UDFs, and vectorized operations, achieving a 65% reduction in processing time
- Implemented containerization and DataOps practices with Docker, Kubernetes, Helm, Terraform, and CI/CD pipelines for automated deployments
- Deployed ML models to production using MLflow, Azure ML, Databricks ML, scikit-learn, TensorFlow, and PyTorch with comprehensive monitoring
- Established comprehensive monitoring and governance with Prometheus, Grafana, ELK, CloudWatch, Datadog, Databricks metrics, Azure Purview, Great Expectations, IAM/RBAC, and encryption/KMS
- Mentored 12+ engineers and led PoCs on vector databases, orchestrators, and next-generation lakehouse tools
Senior Data Engineer
The World Bank
Global
September 2012 – June 2024
Built enterprise data infrastructure supporting global development initiatives across 70+ countries.
- Architected enterprise data warehouse integrating 60+ cross-regional data sources using Redshift, BigQuery, Snowflake, and custom Go/Python connectors
- Developed ETL frameworks on Airflow, AWS Glue, Dataflow, and Databricks, achieving a 65% reduction in processing time and automating 95% of manual workflows
- Designed and managed data lakes and MDM systems on S3, GCS, Delta Lake, and BigLake at petabyte scale for global project tracking
- Created real-time dashboards with Power BI, Looker, and Data Studio for lending risk analysis, climate metrics, and project KPIs used by 100+ regulators
- Implemented comprehensive data governance and lineage with Lake Formation, Dataplex, Great Expectations, and Collibra for compliance across 70+ countries
- Deployed production ML models on SageMaker, Vertex AI, and Databricks ML for risk scoring, forecasting, and project evaluation, improving accuracy by 25%
- Conducted mentorship, code reviews, and global workshops to uplift regional technical teams across multiple continents
Leadership & Highlights
Multi-Cloud Migration
Migrated legacy systems to multi-cloud infrastructure (Azure/AWS/GCP) with zero downtime, enabling scalable and resilient data operations.
Pipeline Monitoring & Automation
Designed comprehensive monitoring, alerting, and automated recovery systems for big data pipelines, reducing incident response time by 60%.
Agile & DevOps Transformation
Led the agile transformation of GSK's data engineering group, establishing DevOps and CI/CD practices that tripled deployment velocity.
Real-Time Tracking System
Received the World Bank Technology Excellence Award (2020) for the first real-time tracking system for global development indicators across 70+ countries.
Skills & Platforms
Languages & Platforms
Big Data & Databases
Cloud Technologies
Data Engineering
DevOps & Automation
Visualization
AI/ML Engineering
Data Governance
Business & Leadership
Certifications
Azure Data Engineer Associate
Microsoft
2024
AWS Certified Data Analytics – Specialty
AWS
2023
Google Cloud Professional Data Engineer
Google Cloud
2023
Databricks Certified Spark Developer
Databricks
2024
Education
Bachelor's Degree in Computer Science
Madison Media Institute
2006 – 2011
Publications, Speaking & Awards
Publications
Scaling Streaming Analytics for Global Health
ACM SIGMOD, 2022
Co-authored research paper on scalable streaming analytics systems for global health data processing.
Speaking & Professional Development
Regular Speaker
PyData, DataEngConf, Open Source Summit, 2020
Regular speaker at industry conferences on data engineering, streaming analytics, and open source technologies.
Data Ethics & Security Workshops
GSK/World Bank Workshops, 2024
Conducted workshops on data ethics, security, and emerging technologies for technical teams.
Awards
Technology Innovator Award
GSK, 2025
Recognized for innovation in cloud data platform architecture and ML deployment systems.
Technology Excellence Award
World Bank, 2020
Recognized for the first real-time tracking system for global development indicators across 70+ countries.
Contact
Get in Touch
Interested in collaborating on data platform architecture, ML systems, or large-scale analytics? Let's connect.