Senior Cloud Architect Job Details

Job Description

Job Title: Senior Cloud Architect

City: Mountain View

State/Province: California

Posting Start Date: 3/13/26

Wipro Limited (NYSE: WIT, BSE: 507685, NSE: WIPRO) is a leading technology services and consulting company focused on building innovative solutions that address clients’ most complex digital transformation needs. Leveraging our holistic portfolio of capabilities in consulting, design, engineering, and operations, we help clients realize their boldest ambitions and build future-ready, sustainable businesses. With over 230,000 employees and business partners across 65 countries, we deliver on the promise of helping our customers, colleagues, and communities thrive in an ever-changing world. For additional information, visit us at www.wipro.com.

Job Description: Cloud Architect (GPU/TPU Infrastructure)

Location: Mountain View, CA

Experience Level: 10–15+ Years

Engineering Function: Cloud Infrastructure / AI & Data Engineering

Role Objective

You will be the lead architect responsible for designing scalable, high-performance cloud infrastructure optimized for AI/ML workloads. Your goal is to architect environments that maximize the compute efficiency of NVIDIA H100/B200 GPUs and Google Cloud TPUs, ensuring low-latency communication and high-throughput data pipelines for enterprise-scale AI.

Key Responsibilities

Cluster Design: Architect and deploy large-scale GPU/TPU clusters using Kubernetes (GKE/EKS) or specialized orchestrators like Slurm.
High-Performance Networking: Design the interconnect fabric (e.g., InfiniBand, RoCE v2, or Google’s ICI) to prevent "communication bottlenecks" during distributed training.
Storage Optimization: Implement high-speed data solutions (e.g., Lustre, Weka, or GPFS) to feed massive datasets to accelerators without starving the processors.
Cost & Capacity Orchestration: Balance performance vs. cost by implementing "Spot" instance strategies, autoscaling, and resource quotas to prevent $100k+ overruns.
Framework Integration: Optimize the infrastructure for AI frameworks like PyTorch, JAX, and TensorFlow, ensuring proper driver/library (CUDA, cuDNN) compatibility.

Technical Requirements & Skills

Category	Requirements
Compute	Expertise in NVIDIA HGX/DGX architectures and Google TPU v5p/Trillium pods.
Orchestration	Mastery of Kubernetes (specifically Device Plugins for GPUs) and Terraform/Ansible for "Infrastructure as Code."
Networking	Deep understanding of RDMA (Remote Direct Memory Access) and non-blocking Clos topologies.
AI Workloads	Familiarity with Distributed Training techniques (Data Parallelism, Model Parallelism, Pipeline Parallelism).
Cloud Platforms	Professional certifications in GCP with a focus on high-performance computing (HPC) instances.

Experience Screening:

Distributed Training at Scale: Proven experience managing jobs across 128+ GPUs or multiple TPU pods.
Telemetry & Monitoring: Experience setting up Prometheus/Grafana dashboards specifically for GPU metrics (utilization, memory bandwidth, thermal throttling).
Security: Implementing "Confidential Computing" and secure data enclaves for sensitive AI training data.

Technical Interview Scorecard: GPU/TPU Cloud Architect

1. Compute & Accelerator Architecture

Focus: Understanding the "metal" and how the OS interacts with it.

The Question: "Explain the architectural difference between an NVIDIA H100 GPU and a Google TPU v5p. In what scenarios would you recommend one over the other for a client?"
What to look for: Mentions of HBM3 (High Bandwidth Memory), systolic arrays (TPU) vs. Streaming Multiprocessors (GPU), and the difference between CUDA (vendor-locked) and JAX/XLA (portable/optimized for TPUs).
Red Flag: Treating a GPU/TPU like a standard CPU instance that just "runs faster."

2. Distributed Training & Interconnects

Focus: Networking is almost always the bottleneck in AI.

The Question: "A client’s LLM training job is showing high 'GPU Wait' times during the All-Reduce step. How do you diagnose and fix this at the infrastructure level?"
What to look for: Discussion of RDMA (Remote Direct Memory Access), InfiniBand vs. RoCE v2, and ensuring a non-blocking Clos Topology. They should mention checking for "noisy neighbors" on the network or incorrect NIC-to-GPU mapping.
Red Flag: Suggesting more RAM or a faster CPU; these rarely fix inter-node communication lag.
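To illustrate why the interconnect, rather than CPU or RAM, tends to dominate this step, here is a minimal sketch of a bandwidth-only estimate for a ring All-Reduce. In a ring, each GPU sends and receives roughly 2(N-1)/N times the message size. The function name, the 10 GB gradient size, and the link speeds are assumptions chosen for illustration, not figures from this posting, and real collectives (for example NCCL's tree algorithms) may behave differently.

```python
def ring_allreduce_seconds(message_bytes: float, num_gpus: int,
                           link_bytes_per_s: float) -> float:
    """Bandwidth-only estimate of one ring All-Reduce.

    Ignores per-hop latency, compute/communication overlap, and
    congestion, so treat the result as a lower bound.
    """
    if num_gpus < 2:
        return 0.0  # nothing to reduce across a single device
    # Reduce-scatter plus all-gather: each GPU moves 2*(N-1)/N of the data.
    traffic = 2 * (num_gpus - 1) / num_gpus * message_bytes
    return traffic / link_bytes_per_s


# Hypothetical job: 64 GPUs synchronizing 10 GB of gradients per step.
GRADIENT_BYTES = 10e9
fast_link = ring_allreduce_seconds(GRADIENT_BYTES, 64, 50e9)    # ~400 Gb/s
slow_link = ring_allreduce_seconds(GRADIENT_BYTES, 64, 12.5e9)  # ~100 Gb/s
print(f"400 Gb/s link: {fast_link:.3f} s, 100 Gb/s link: {slow_link:.3f} s")
```

Under these assumptions the slower link costs about four times as long per step (roughly 1.58 s versus 0.39 s), which is the kind of gap a candidate should expect to trace back to fabric bandwidth or NIC-to-GPU mapping.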

3. Orchestration & Scheduling

Focus: Kubernetes is the standard, but it wasn't built for AI.

The Question: "How do you handle 'Gang Scheduling' in a Kubernetes environment for a job that requires 64 GPUs across 8 nodes?"
What to look for: Familiarity with tools like Kueue, Volcano, or Slurm. They should explain that in distributed training, all pods of a job must start together; if one node fails to spin up, the entire job must wait or fail rather than hold the other GPUs idle.
Red Flag: Assuming standard K8s Horizontal Pod Autoscaling (HPA) works for deep learning jobs.
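The all-or-nothing idea behind gang scheduling can be sketched in a few lines. This is a toy admission check, not how Kueue, Volcano, or Slurm are implemented; the function name `gang_place` and the node names are hypothetical.

```python
from typing import Dict, List, Optional


def gang_place(free_gpus: Dict[str, int], nodes_needed: int,
               gpus_per_node: int) -> Optional[List[str]]:
    """Return the nodes for a job only if the whole gang fits.

    Returns None (admit nothing) when fewer than `nodes_needed` nodes
    have `gpus_per_node` GPUs free, so no partial job holds GPUs idle.
    """
    eligible = sorted(
        name for name, free in free_gpus.items() if free >= gpus_per_node
    )
    if len(eligible) < nodes_needed:
        return None
    return eligible[:nodes_needed]


# The 64-GPU job from the question: 8 nodes with 8 GPUs each.
healthy = {f"node-{i}": 8 for i in range(8)}
degraded = dict(healthy, **{"node-3": 4})  # one node has only 4 GPUs free

print(gang_place(healthy, nodes_needed=8, gpus_per_node=8))
print(gang_place(degraded, nodes_needed=8, gpus_per_node=8))
```

With the degraded cluster the job is not admitted at all, which is the behavior a default scheduler placing pods one at a time does not give you.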

4. Storage & Data I/O

Focus: Feeding the beast.

The Question: "An H100 can process data at massive speeds. How do you design the storage layer to ensure the GPU isn't 'starving' for data?"
What to look for: Knowledge of GPUDirect Storage (GDS), parallel file systems like Lustre or WekaIO, and the use of local NVMe SSDs for caching intermediate checkpoints.
Red Flag: Suggesting standard S3/Object storage for direct training without a caching or high-speed middle layer.
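A candidate might back this answer with a quick sizing calculation. The sketch below estimates the aggregate read throughput the storage layer must sustain; the function name and every number in it (GPU count, samples per second, sample size) are illustrative assumptions, not requirements from this posting.

```python
def required_read_gb_per_s(num_gpus: int, samples_per_gpu_per_s: float,
                           bytes_per_sample: float) -> float:
    """Aggregate read rate (GB/s) needed to keep every GPU fed.

    Assumes no caching and that each sample is read once per use.
    """
    return num_gpus * samples_per_gpu_per_s * bytes_per_sample / 1e9


# Hypothetical workload: 64 GPUs, 2,000 samples/s each, 150 KB per sample.
need = required_read_gb_per_s(64, 2000, 150_000)
print(f"Storage must sustain about {need:.1f} GB/s of reads")
```

Comparing that figure with what the chosen file system and network can actually deliver shows whether a caching tier or local NVMe staging is needed.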

Scoring Rubric

Score	Level	Description
1-2	Novice	Understands Cloud (EC2/S3) but treats GPUs as "black boxes." No RDMA knowledge.
3	Intermediate	Can set up a GPU node and run a container; understands CUDA versions.
4	Advanced	Understands multi-node scaling, InfiniBand, and the impact of the software stack (NCCL/RCCL).
5	Expert	Can design a 1024-GPU "AI Factory" from scratch, including power, cooling, and high-speed fabric.

Mandatory Skills: Cloud Engineering GCP.

Experience: 8-10 Years.

The expected compensation for this role ranges from $100,000 to $180,000.

Final compensation will depend on various factors, including your geographical location, minimum wage obligations, skills, and relevant experience. Based on the position, the role is also eligible for Wipro's standard benefits, including a full range of medical and dental benefit options, disability insurance, paid time off (inclusive of sick leave), and other paid and unpaid leave options.

Applicants are advised that employment in some roles may be conditioned on successful completion of a post-offer drug screening, subject to applicable state law.

Wipro provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. Applications from veterans and people with disabilities are explicitly welcome.

Reinvent your world. We are building a modern Wipro. We are an end-to-end digital transformation partner with the boldest ambitions. To realize them, we need people inspired by reinvention. Of yourself, your career, and your skills. We want to see the constant evolution of our business and our industry. It has always been in our DNA - as the world around us changes, so do we. Join a business powered by purpose and a place that empowers you to design your own reinvention.

If you encounter any suspicious mail, advertisements, or persons who offer jobs at Wipro, please email us at helpdesk.recruitment@wipro.com. Do not email your resume to this ID as it is not monitored for resumes and career applications.

Any complaints or concerns regarding unethical/unfair hiring practices should be directed to our Ombuds Group at ombuds.person@wipro.com.

We are an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, caste, creed, religion, gender, marital status, age, ethnic and national origin, gender identity, gender expression, sexual orientation, political orientation, disability status, protected veteran status, or any other characteristic protected by law.

Wipro is committed to creating an accessible, supportive, and inclusive workplace. Reasonable accommodation will be provided to all applicants, including persons with disabilities, throughout the recruitment and selection process. Accommodation requests should be communicated in advance of the application, where possible, and will be reviewed on an individual basis. Wipro provides equal opportunities to all and values diversity.