Role Overview
This position is for a Web Scraping Engineer focused on designing, refining, and operating scalable web scraping solutions that feed essential data pipelines. The role is fully remote for candidates based in India, with core working hours from 11am to 8pm IST, flexible within reason to support work-life balance. You'll be responsible for overcoming anti-bot measures, extracting high-quality data, and keeping systems stable and efficient over time.
Key Responsibilities
- Improve and refactor legacy scraping code to enhance stability, readability, and long-term maintainability.
- Apply software engineering best practices, including modular design, clean coding standards, and peer review processes.
- Implement advanced evasion techniques such as rotating user agents, managing cookies and headers, and leveraging both residential and data center proxies.
- Extract data from dynamic websites, handling JavaScript-heavy content and complex page structures.
- Collaborate with data analysts and internal teams to define data needs, validate outputs, and refine collection strategies.
- Develop monitoring tools and alerting systems to detect failures quickly and enable rapid response.
- Continuously assess system performance, identifying and resolving bottlenecks affecting speed or reliability.
- Recommend improvements in tooling, architecture, and methodology to strengthen data acquisition capabilities.
- Stay informed about changes in web technologies, bot detection methods, and emerging data extraction approaches.
- Document processes and support internal users in effectively integrating scraped data into reporting and analytics workflows.
Required Qualifications
- 3+ years of proven experience using tools such as Selenium, Playwright, or Puppeteer for large-scale web scraping.
- Solid grasp of HTTP, TLS/SSL, RESTful APIs, HTML parsing, and how browsers render content.
- Hands-on expertise in browser fingerprint spoofing, request signature manipulation, and evading bot-detection mechanisms.
- Deep familiarity with managing session state, cookies, headers, and proxy rotation strategies.
- Experience setting up logging, metrics collection, and alerting to maintain system uptime and performance.
- Strong problem-solving skills for diagnosing issues and improving scraper efficiency, scalability, and resilience.
- Clear communication in English with both technical peers and non-technical collaborators.
Technology Environment
Work will involve Selenium, Playwright, Puppeteer, HTTP clients, REST APIs, HTML parsing libraries, TLS/SSL handling, browser fingerprint manipulation, request signature control, proxy rotation (residential and data center), and monitoring/alerting systems.
Work Environment & Culture
This is a remote role open to candidates in India, with standard availability between 11am and 8pm IST and some flexibility around those hours. The team values ownership, transparency, and continuous learning. Growth is determined by impact, not tenure. You'll work in an environment that prioritizes trust, personal development, and meaningful contributions.
Benefits
- Competitive salary and comprehensive benefits package
- Vacation time and parental leave
- Reimbursement for learning and professional development
- Regular team events and opportunities for connection
- Focus on work-life balance and personal well-being
- Support for skill mastery and self-driven growth
Equal Opportunity
We are committed to providing equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, marital status, disability, gender identity, veteran status, or any other protected characteristic. We foster an inclusive workplace built on respect and fairness.