Data Engineer

BTSE

📍 Hong Kong, Hong Kong SAR | 💰 Competitive | 🕐 Posted June 3, 2026

Data Engineer, Remote

python, sql, kafka, websockets


Job Description

About BTSE

BTSE Group is a global leader in fintech and blockchain technology, anchored by three core business pillars: Exchange, Payments, and Infrastructure Development. Serving over 100 corporate clients worldwide, we provide white-label exchange and payment solutions. Our offerings encompass everything from exchange infrastructure hosting and development to custody, wallets, payments, blockchain integration, trading, and more.

About The Opportunity

We're building an AI-powered research platform for institutional investors. Our platform turns vast amounts of market, alternative, and proprietary data into actionable intelligence — powered by AI agents that depend on clean, reliable, real-time data to do their job.

We need someone to own data. Not manage a team that does data. Own it — from finding the right sources, to getting them flowing, to making sure they stay healthy at scale.

Today we ingest from hundreds of sources. That number is growing fast. The sources are diverse: real-time market feeds, regulatory filings, on-chain blockchain data, news, social sentiment, alternative datasets, and proprietary client data. Some are free APIs. Some are $10K/month enterprise contracts. Some are clients pushing their own data into our platform. Every one of them is different, and most of them will break in ways you don't expect.

You'll evaluate vendors, negotiate deals, build integrations, monitor quality, track costs, and make the call on what's worth paying for. When something breaks at 2 AM, you'll know why before the alert finishes firing.

This is an end-to-end ownership role. No handoffs.

Responsibilities

Build and maintain integrations with a large and growing number of external data sources — APIs, WebSockets, file drops, streams, scrapers, and formats you haven't seen yet
Evaluate and compare data vendors across quality, reliability, coverage, cost, and terms of service
Negotiate contracts and manage commercial relationships with data providers
Design and operate high-throughput ingestion pipelines handling mixed workloads (real-time, near-real-time, batch, event-driven)
Build monitoring that tells you — before anyone else — when data is late, wrong, incomplete, or drifting
Manage data quality at scale: anomaly detection, cross-source validation, schema drift detection, gap filling
Handle both structured data (time-series, tabular) and unstructured data (documents, text, images) with appropriate extraction and storage
Track costs per source, usage per consumer, and ROI — recommend what to keep, upgrade, or cancel
Build tooling that makes adding the next data source faster than the last one
Use AI tools aggressively in your daily work — for code generation, testing, documentation, anomaly analysis, and anything else that makes you faster

Requirements

You've Done This Before:

5+ years building data pipelines that run in production, 24/7, with real SLAs
Deep hands-on experience with SQL databases and time-series data
Python as your primary language, comfortable with async programming
You've integrated with dozens of external APIs and dealt with the reality of unreliable vendors, changing schemas, rate limits, and bad documentation
You've built monitoring and alerting for data systems — not as an afterthought but as part of how you work

You Think About The Whole Picture:

You don't just connect to an API. You think about what happens when it goes down, when the schema changes, when the data is wrong, when the bill doubles
You understand that data has a cost and a value, and not every source is worth keeping
You've worked with data vendors commercially — contracts, pricing tiers, usage negotiations

You Use AI Daily:

AI coding tools are part of your workflow today, not something you're curious about
You can articulate specifically how AI makes you faster and where it doesn't help
You'd be frustrated if you couldn't use AI in your work

Nice to Have

Experience with financial or crypto market data
Experience with streaming systems (Kafka or similar) at scale
Vector database or embedding pipeline experience
Experience with unstructured data extraction (PDFs, documents, NLP)

Benefits

Senior individual contributor role with full ownership of the data domain
Direct access to leadership — no bureaucracy, fast decisions
AI tools provided and encouraged across all work
Remote-friendly, async-first
Compensation commensurate with experience

