Data Scientist AI Data LLM Specialist
Eclipse
Job Description
About Us
Eclipse is building an AI agent-first marketplace that connects intelligence with real-world tasks, starting with data collection and labeling. We are a team backed by top investors including Polychain, Tribe Capital, Placeholder, and DBA.
About the Role
We are seeking a Data Scientist to establish the foundation for how our data is labeled, processed, and prepared for consumption by next-generation Large Language Models (LLMs). Your work will be critical in transforming our raw data collections into valuable, AI-ready datasets.
Responsibilities
- Develop Data Labeling Strategies: Design and document a formal data annotation strategy, including clear, scalable, and efficient guidelines for labeling our data
- Define and enforce quality metrics, including inter-annotator agreement
- Optimize Datasets for LLM Consumption: Research and prototype optimal data formats, structures, and pre-processing steps for fine-tuning and training LLMs
- Establish automated processes to analyze data quality and provide feedback to improve data collection workflows
- Collaborate closely with the engineering team to implement data processing pipelines
Requirements
- Proven experience as a Data Scientist or Machine Learning Engineer with a focus on data quality
- Strong understanding of data labeling methodologies and annotation platforms
- Demonstrated experience preparing datasets for LLM training
- Proficiency in Python and data science libraries (Pandas, NumPy, Scikit-learn, spaCy, Hugging Face)
- Experience using APIs/SDKs to automate data annotation
- Excellent communication skills, with the ability to create clear technical documentation
Nice to Have
- Experience with audio data processing
- Familiarity with data annotation tools
- Knowledge of MLOps principles
- Experience with large language model data curation and RLHF pipelines
What We Offer
Join a team that believes blockchains should be fast and highly usable. You'll do high-impact work to enhance Ethereum's scalability, with opportunities for flexible, collaborative work across synchronous and asynchronous environments.