Data Scientist AI Data LLM Specialist

Eclipse

📍 Remote   💰 Est. $90k - $102k   🕐 Posted November 11, 2025

Data Scientist, Remote, ethereum, l2

python, pandas, numpy, scikit-learn, spacy, hugging-face

Job Description

About Us

Eclipse is building an AI agent-first marketplace that connects intelligence with real-world tasks, starting with data collection and labeling. We are a team backed by top investors including Polychain, Tribe Capital, Placeholder, and DBA.

About the Role

We are seeking a Data Scientist to establish the foundation for how our data is labeled, processed, and prepared for consumption by next-generation Large Language Models (LLMs). Your work will be critical in transforming our raw data collections into valuable, AI-ready datasets.

Responsibilities

Develop Data Labeling Strategies: Design and document a formal data annotation strategy, including clear, scalable, and efficient guidelines for labeling our data
Define and enforce quality metrics, including inter-annotator agreement
Optimize Datasets for LLM Consumption: Research and prototype optimal data formats, structures, and pre-processing steps for fine-tuning and training LLMs
Establish automated processes to analyze data quality and provide feedback to improve data collection workflows
Collaborate closely with the engineering team to implement data processing pipelines

Requirements

Proven experience as a Data Scientist or Machine Learning Engineer with a focus on data quality
Strong understanding of data labeling methodologies and annotation platforms
Demonstrated experience preparing datasets for LLM training
Proficiency in Python and data science libraries (Pandas, NumPy, Scikit-learn, spaCy, Hugging Face)
Experience using APIs/SDKs to automate data annotation
Excellent communication skills, with the ability to create clear technical documentation

Nice to Have

Experience with audio data processing
Familiarity with data annotation tools
Knowledge of MLOps principles
Experience with large language model data curation and RLHF pipelines

What We Offer

Join a team that believes blockchains should be fast and highly usable. You'll do high-impact work to enhance Ethereum's scalability, with opportunities for flexible, collaborative work across synchronous and asynchronous environments.
