Job Description
Role Purpose
The purpose of the role is to create exceptional architectural solution designs, provide thought leadership, and enable delivery teams to achieve exceptional client engagement and satisfaction.
As the AI Systems Architect, you’ll own the end-to-end design and delivery of production-grade agentic and Generative AI systems. This is a highly hands-on role requiring deep architectural insight, coding proficiency, and an obsession with performance, scalability, and reliability. You’ll architect secure, cost-efficient AI platforms on AWS, guide developers through complex debugging and optimization, and ensure all systems are observable, governed, and production-ready.
- Architect Production AI Systems: Design robust end-to-end architectures for agentic systems (planning, reasoning, tool-calling), GenAI/RAG pipelines, and evaluation workflows. Create detailed design documents, including flow, UML, and sequence diagrams and AWS deployment topologies. Ensure architectures also support advanced LLM training and inference workflows, using distributed strategies for scalability.
- Optimize for Cost & Performance: Model and tune throughput, latency, concurrency, autoscaling, CPU/GPU sizing, and vector index performance to ensure scalable, efficient deployments. Optimize multi-node GPU clusters and distributed training to reduce compute overhead.
- Lead Debugging & Stability Efforts: Conduct deep-dive debugging, fix critical defects, and resolve production incidents; pair-program with developers to improve code quality and performance. Apply MLOps-driven stability practices, leveraging configuration management and automated recovery for high availability.
- Standardize Agentic Frameworks: Build reference implementations using Semantic Kernel (preferred), LangGraph, AutoGen, or CrewAI with strong schema validation, grounding, and memory management.
- Implement Observability & Monitoring: Set up distributed tracing, metrics, and logging via OpenTelemetry and Datadog. Standardize dashboards, alerts, and incident response workflows.
- Govern Evaluation & Rollouts: Build test and evaluation frameworks (golden sets, A/B experiments, regression suites, and controlled rollouts) to ensure consistent quality across releases.
- Establish Engineering Standards: Create reusable SDKs, connectors, CI/CD templates, and architecture review checklists to promote consistency across teams.
- Cross-Functional Leadership: Collaborate with product, data, and SRE teams on capacity planning, disaster recovery (DR) strategies, and post-incident root cause analysis (RCA) reviews. Mentor engineers to strengthen design and reliability practices.
- Education: Bachelor’s or Master’s degree in Computer Science, AI, or a related field from a top-tier institute (IIT or other Tier-1).
- 7–10 years in software/AI engineering, including 4+ years in GenAI application development and 2+ years architecting agentic AI systems.
- Expert in Python 3.11+ (asyncio, typing, packaging, profiling, pytest).
- Hands-on experience with Semantic Kernel, LangGraph, AutoGen, or CrewAI.
- Proven delivery of GenAI/RAG systems on AWS Bedrock or an equivalent platform, backed by vector stores such as OpenSearch Serverless, Pinecone, or Redis.
- Deep understanding of the AWS ecosystem (EKS, Bedrock, S3, SQS/SNS, RDS, ElastiCache, Secrets Manager, IAM) and related identity and API tooling (Okta, Kong API Gateway).
Experience: 8–10 years.
Reinvent your world. We are building a modern Wipro. We are an end-to-end digital transformation partner with the boldest ambitions. To realize them, we need people inspired by reinvention. Of yourself, your career, and your skills. We want to see the constant evolution of our business and our industry. It has always been in our DNA - as the world around us changes, so do we. Join a business powered by purpose and a place that empowers you to design your own reinvention.