I. POSITION INTRODUCTION
We are looking for a professional data scraping specialist who can operate a large-scale data collection system and keep it stable, accurate, and efficient.
1. Professional Scraping System Development
Technical Requirements:
System Architecture:
- Design cross-platform Python crawling scripts
- Build scalable systems
- Develop parallel crawling solutions
- Manage large, multi-threaded data streams
Technologies:
- Scrapy, BeautifulSoup
- Selenium
- asyncio, multiprocessing
- Proxy management
- IP rotation techniques
2. Data Processing and Normalization
Processing Methods:
- Develop cleaning processes for data collected via APIs
- Data transformation algorithms
- Integrity checks
- Remove noisy data
Tools:
- Pandas
- Data validation techniques
- Machine Learning preprocessing
3. Database Management
Specialized Skills:
Advanced SQL:
- Complex queries
- Performance optimization
4. Monitoring & Optimization
Strategy:
- Manage scraping system operations
- Track scraping performance
- Handle common challenges:
  - IP blocking
  - Rate limiting
  - CAPTCHAs
II. PROFESSIONAL REQUIREMENTS
Education
- Bachelor's degree (GPA > 3.0)
- Major:
  - Data science
  - Computer engineering
  - Data-related fields
- English: TOEIC > 700 or IELTS > 5.5
Technical Skills
Python Ecosystem
- asyncio, multiprocessing
- Data cleaning techniques
- Machine Learning preprocessing
- Advanced error handling
Database & Big Data
- SQL (Intermediate to Advanced)
- NoSQL database management
- PySpark
- Data warehousing
In-depth Experience
- Minimum of 1-2 years of experience
- Project implementation experience in:
  - Web scraping
  - Automated data processing
  - Big data crawling
III. SOFT SKILLS
System analysis
Problem solving
Independent work and teamwork
Time management
Logical thinking
IV. NICE-TO-HAVE EXPERIENCE
Big Data experience
Data pipeline design
Working with diverse APIs
Professional certifications
Creativity and initiative in proposing ideas