
Senior Data & AI Platform Engineer


CoinDCX

📍 Bengaluru, Karnataka, India | 💰 Competitive | 🕐 Posted
Data Engineer
python, pyspark, spark-sql, kafka, airflow, mlflow, databricks, delta-lake, langchain, langgraph

Job Description

About Us

At CoinDCX, our mission is clear – to make crypto and blockchain accessible to every Indian and enable them to participate in the future of finance.

As India's first crypto unicorn valued at $2.45B, we are reshaping the financial ecosystem by building safe, transparent, and scalable products that power adoption at scale.

We believe that change starts together. It begins with bold ideas, relentless execution and people who want to build what's next. If you're driven by purpose and thrive in environments where your work defines the next chapter of an industry, you'll feel right at home here.

The Role

Operating a premier crypto exchange means moving at the absolute speed of the market. Crypto never sleeps, risk patterns evolve continuously, and malicious actors iterate by the minute. At CoinDCX, our data engineering foundation is already highly mature, processing billions of events daily through Databricks and Kafka. We aren't looking for someone to build basic data pipelines. We are looking for an exceptional engineer to construct the CoinDCX AI Value Platform.

This horizontal infrastructure layer will transform our massive data footprint into automated, intelligent action. You will build the frameworks, model registries, and context stores that allow both classic machine learning models and state-of-the-art agentic AI systems to execute critical workflows safely, spanning real-time account takeover (ATO) containment, algorithmic crypto withdrawal risk-tiering, referral abuse detection, and AI-assisted wealth intelligence.

Responsibilities

  • Engineer the CoinDCX Entity 360 & Semantic Layers
    • Architect and optimize the Entity 360 Platform, unifying disparate data streams into high-performance, real-time context stores, including User 360, Wallet 360 (on-chain/off-chain balance states), and Token 360
    • Build and govern a centralized Semantic and Metrics Layer so that data models, internal engines, and AI agents all reference identical, deterministic definitions for core crypto metrics (e.g., active trader, malicious wallet cluster, referral loop, and crypto deposit/withdrawal (CDW) eligibility)
  • Standardize Exchange-Scale MLOps & Lifecycle Tracking
    • Own the deployment and standardization of MLflow (Model Registry, Tracking, Recipes) across CoinDCX to catalog, version, and safely deploy predictive models into our 24/7 production environment
    • Set up automated evaluation pipelines and tracing frameworks via MLflow LLM Tracking to capture live inputs and outputs, monitor data and feature drift, and benchmark model accuracy against real-world crypto market fluctuations
  • Build Agentic AI & Advanced LLM Infrastructure
    • Design and scale the data-routing backends required for multi-agent systems (using LangGraph, CrewAI, or similar frameworks) to automate intricate compliance and operational journeys, such as auto-summarizing AML cases, evaluating token listing/delisting intelligence, and routing customer support requests to the right agent
    • Build low-latency Retrieval-Augmented Generation (RAG) data systems, optimizing data chunking strategies, embedding generation, vector indexing (via Databricks Vector Search), and semantic caching to minimize hallucinations in customer-facing applications
  • Leverage & Fuel Core Feature Stores
    • Build and maintain low-latency Feature Stores that pull directly from live Databricks (PySpark, Spark SQL, Delta Lake) environments to serve unified real-time signals to downstream transaction-monitoring and threat-detection models
    • Integrate seamlessly with active Kafka/MSK, Auto Loader, and Change Data Capture (CDC) architectures so that downstream AI applications scale without impacting existing core ledger or reporting SLAs
  • Implement Web3 Governance & Guardrails
    • Build institutional-grade security guardrails directly into the platform, including automated PII tokenization, wallet-masking policies, and rigorous access control via Unity Catalog

Requirements

  • 4+ years of intensive platform or data engineering experience
  • SDE-2 or early SDE-3 level candidate with elite programming fundamentals, massive learning velocity, and zero fear of shifting paradigms
  • Expert-level mastery of Python, PySpark, and Spark SQL optimization
  • Deep understanding of distributed memory management and how to process massive datasets efficiently
  • Direct experience with Kafka/MSK and Apache Airflow (or Databricks Workflows) for complex, high-dependency system workflows
  • Practical implementation experience with MLflow for production model lifecycles
  • Strong conceptual or practical exposure to vector architectures and LLM orchestration frameworks (LangChain, LangGraph, or LlamaIndex)

Nice to Have

  • Prior exposure to high-integrity transactional spaces—such as order matching engines, double-entry ledgers, blockchain nodes, risk compliance systems, or real-time payment gateways
  • Experience in FinTech or crypto

Success Metrics

  • Fully standardize and operationalize MLflow pipelines across the team, bringing the first set of live account takeover (ATO) and referral abuse detection models under structured lifecycle management
  • Successfully ship the production data layers for User 360 and Wallet 360, cleanly feeding real-time context to downstream decision engines
  • Deploy the automated data ingestion, vector indexing, and evaluation framework for a digital customer support agent or an internal intelligence agent
  • Ensure all new AI Value Platform integrations dock cleanly into our billion-event stream without introducing data lag or compromising the stability of our transactional core

Hiring Process

  • Application Review – An assessment of skills, alignment, and intent
  • Recruiter Connect – A short conversation to understand you better
  • Functional Round(s) – Deep dive into your approach, craft, and problem-solving
  • Assignment / Simulation Round – A take-home task or live problem-solving exercise to understand how you think and execute in real scenarios
  • Culture & Values Discussion – A conversation to understand our ways of working and how you thrive best
  • Founder Conversation (Optional) – For certain roles and senior levels, you may meet our founders to explore strategic alignment and long-term fit

Work Location

This role is based out of our Bangalore office. We operate as a work-from-office organization where collaboration, speed and trust come alive when teams share the same space.

Benefits

  • Flexible perks to match your lifestyle
  • Unlimited Wellness Leaves
  • Mental Wellness Support – Access to therapy and wellness resources
  • Bi-weekly learning and growth opportunities

Unchain Data provides Web3 data job aggregation as a common good. Jobs are posted by third parties and are not individually verified. Always exercise caution: never download software requested during a hiring process, avoid clicking unfamiliar links in interviews, make sure to verify URLs are legit, and use trusted meeting tools like Google Meet or Zoom.
