John Rowe

Principal Data Engineer | Machine Learning Engineer

Atlanta, Georgia, United States

With 13+ years of experience in large-scale data systems, cloud data platforms, and ML deployment, I specialize in building enterprise-grade data infrastructure that powers critical business decisions. My expertise spans R&D, healthcare, and finance domains, where I've architected solutions processing petabytes of data across multi-cloud environments.

13+ Years Experience
Cloud Platforms: Azure · AWS · GCP (Multi-Cloud)
40+ Automated Pipelines
Global Reach: 70+ Countries

Experience

Principal Data Engineer

GSK

California, United States

September 2024 – Present

Leading cloud data platform architecture and ML deployment initiatives for enterprise-scale data systems.

  • Architected cloud data platform with Azure Synapse, Data Lake, Databricks, Delta Lake, ADLS Gen2, and Event Hubs, enabling scalable data processing at petabyte scale
  • Designed and implemented 40+ automated ETL/ELT pipelines using PySpark, Databricks, ADF, Airflow, dbt, and Logic Apps, reducing manual processing time by 95%
  • Built real-time streaming systems with Kafka, Spark Structured Streaming, Flink, and Event Hubs for mission-critical data ingestion and processing
  • Established lakehouse architecture with Delta Lake, Delta Live Tables, Parquet, Hive Metastore, partitioning, and Z-Ordering for optimized query performance
  • Optimized high-performance data processing frameworks using PySpark, Koalas, UDFs, and vectorized operations, achieving 65% reduction in processing time
  • Implemented containerization and DataOps practices with Docker, Kubernetes, Helm, Terraform, and CI/CD pipelines for automated deployments
  • Deployed ML models to production using MLflow, Azure ML, Databricks ML, scikit-learn, TensorFlow, and PyTorch with comprehensive monitoring
  • Established comprehensive monitoring and governance with Prometheus, Grafana, ELK, CloudWatch, Datadog, Databricks metrics, Azure Purview, Great Expectations, IAM/RBAC, and encryption/KMS
  • Mentored 12+ engineers and led PoCs on vector databases, orchestrators, and next-generation lakehouse tools
Azure Synapse · Databricks · Delta Lake · ADLS Gen2 · PySpark · Kafka · Airflow · Docker · Kubernetes · MLflow · Terraform

Senior Data Engineer

The World Bank

Global

September 2012 – June 2024

Built enterprise data infrastructure supporting global development initiatives across 70+ countries.

  • Architected enterprise data warehouse integrating 60+ cross-regional data sources using Redshift, BigQuery, Snowflake, and custom Go/Python connectors
  • Developed ETL frameworks on Airflow, AWS Glue, Dataflow, and Databricks, achieving 65% reduction in processing time and 95% automation of manual workflows
  • Designed and managed data lakes and MDM systems on S3, GCS, Delta Lake, and BigLake at petabyte scale for global project tracking
  • Created real-time dashboards with Power BI, Looker, and Data Studio for lending risk analysis, climate metrics, and project KPIs used by 100+ regulators
  • Implemented comprehensive data governance and lineage with Lake Formation, Dataplex, Great Expectations, and Collibra for compliance across 70+ countries
  • Deployed production ML models on SageMaker, Vertex AI, and Databricks ML for risk scoring, forecasting, and project evaluation, improving accuracy by 25%
  • Conducted mentorship, code reviews, and global workshops to uplift regional technical teams across multiple continents
Redshift · BigQuery · Snowflake · Airflow · AWS Glue · S3 · SageMaker · Power BI · Looker · Go · Python

Leadership & Highlights

Multi-Cloud Migration

Migrated legacy systems to multi-cloud infrastructure (Azure/AWS/GCP) with zero downtime, enabling scalable and resilient data operations.

Zero downtime migration

Pipeline Monitoring & Automation

Designed comprehensive monitoring, alerting, and automated recovery systems for big data pipelines, reducing incident response time by 60%.

60% faster incident response

Agile & DevOps Transformation

Led agile transformation for GSK's data engineering group, establishing DevOps and CI/CD practices that improved deployment velocity by 3x.

3x deployment velocity

Real-Time Tracking System

Received the World Bank Technology Excellence Award (2020) for building the first real-time tracking system for global development indicators across 70+ countries.

Technology Excellence Award 2020

Skills & Platforms

Languages & Platforms

Go · Python · Java · Scala · Bash · SQL · PySpark · C++

Big Data & Databases

PostgreSQL · MySQL · SQL Server · Snowflake · Redshift · BigQuery · Hadoop · Spark · Databricks

Cloud Technologies

Azure: ADF, Synapse, Delta Lake, Data Lake, ADLSv2
AWS: EC2, S3, Glue, Lambda, SageMaker
GCP: BigQuery, Dataflow, Vertex AI

Data Engineering

ETL/ELT Pipeline Architecture · Batch & Streaming Ingestion (Kafka, Kinesis) · OLAP/OLTP Systems · Data Modeling (Star/Snowflake) · Data Warehousing · Distributed Data Management

DevOps & Automation

Docker · Kubernetes · Jenkins · Terraform · CI/CD · Airflow · Prefect · Git · Azure DevOps

Visualization

Power BI · Tableau · Looker · Superset

AI/ML Engineering

MLflow · TensorFlow · PyTorch · Feature Engineering · Production ML Workflows · Model Monitoring

Data Governance

Data Quality Frameworks · HIPAA/GDPR Compliance · Data Cataloging · Metadata Management · Azure Purview · Great Expectations

Business & Leadership

Stakeholder Management · Technical Mentorship · Team Leadership · Cross-Domain Collaboration · Requirements Analysis

Certifications

Azure Data Engineer Associate

Microsoft

2024

AWS Certified Data Analytics – Specialty

AWS

2023

Google Cloud Professional Data Engineer

Google Cloud

2023

Databricks Certified Spark Developer

Databricks

2024

Education

Bachelor's Degree in Computer Science

Madison Media Institute

2006 – 2011

Publications, Speaking & Awards

Publications

Scaling Streaming Analytics for Global Health

ACM SIGMOD, 2022

Co-authored research paper on scalable streaming analytics systems for global health data processing.

Speaking & Professional Development

Regular Speaker

PyData, DataEngConf, Open Source Summit, 2020

Regular speaker at industry conferences on data engineering, streaming analytics, and open source technologies.

Data Ethics & Security Workshops

GSK/World Bank Workshops, 2024

Conducted workshops on data ethics, security, and emerging technologies for technical teams.

Awards

Technology Innovator Award

GSK, 2025

Recognized for innovation in cloud data platform architecture and ML deployment systems.

Technology Excellence Award

World Bank, 2020

Recognized for building the first real-time tracking system for global development indicators across 70+ countries.

Contact

Get in Touch

Interested in collaborating on data platform architecture, ML systems, or large-scale analytics? Let's connect.