
Data Scientist AI Data LLM Specialist


Eclipse

📍 Remote | 💰 Est. $90k - $102k | 🕐 Posted
Tags: Data Scientist, Remote, ethereum, l2, python, pandas, numpy, scikit-learn, spacy, hugging-face

Job Description

About Us

Eclipse is building an AI agent-first marketplace that connects intelligence with real-world tasks, starting with data collection and labeling. We are a team backed by top investors including Polychain, Tribe Capital, Placeholder, and DBA.

About the Role

We are seeking a Data Scientist to establish the foundation for how our data is labeled, processed, and prepared for consumption by next-generation Large Language Models (LLMs). Your work will be critical in transforming our raw data collections into valuable, AI-ready datasets.

Responsibilities

  • Develop Data Labeling Strategies: Design and document a formal data annotation strategy, including clear, scalable, and efficient labeling guidelines
  • Define and enforce quality metrics, including inter-annotator agreement
  • Optimize Datasets for LLM Consumption: Research and prototype optimal data formats, structures, and pre-processing steps for fine-tuning and training LLMs
  • Establish automated processes to analyze data quality and provide feedback to improve data collection workflows
  • Collaborate closely with the engineering team to implement data processing pipelines
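
For candidates unfamiliar with the inter-annotator agreement metric mentioned above, here is a minimal sketch of one common measure, Cohen's kappa, for two annotators. The posting does not say which metric Eclipse uses; the function name `cohen_kappa` and the sample labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators on the same six items.
annotator_1 = ["spam", "ham", "spam", "ham", "spam", "ham"]
annotator_2 = ["spam", "ham", "ham", "ham", "spam", "ham"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

A kappa near 1 indicates strong agreement beyond chance, and a value near 0 indicates agreement no better than chance. In practice, scikit-learn's `cohen_kappa_score` in `sklearn.metrics` computes the same quantity.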

Requirements

  • Proven experience as a Data Scientist or Machine Learning Engineer with a focus on data quality
  • Strong understanding of data labeling methodologies and annotation platforms
  • Demonstrated experience preparing datasets for LLM training
  • Proficiency in Python and data science libraries (Pandas, NumPy, Scikit-learn, spaCy, Hugging Face)
  • Experience using APIs/SDKs to automate data annotation
  • Excellent communication skills and the ability to create clear technical documentation

Nice to Have

  • Experience with audio data processing
  • Familiarity with data annotation tools
  • Knowledge of MLOps principles
  • Experience with large language model data curation and RLHF pipelines

What We Offer

Join a team that believes blockchains should be fast and highly usable. You'll do high-impact work to enhance Ethereum's scalability, with opportunities for flexible, collaborative work across synchronous and asynchronous environments.

Unchain Data provides Web3 data job aggregation as a common good. Jobs are posted by third parties and are not individually verified. Always exercise caution: never download software requested during a hiring process, avoid clicking unfamiliar links in interviews, make sure to verify URLs are legit, and use trusted meeting tools like Google Meet or Zoom.
