- Integrity: Doing the right things for the right reasons
- Agility: Adapting and thriving in a dynamic environment
- Teamwork: Combining our strengths to do amazing things
- Passion: Channeling enthusiasm to drive excellence
- Creativity: Unleashing curiosity to defy the norm
About the role:
As a Software Data Engineer at 1010data, you will be responsible for designing, maintaining, and optimizing large-scale automated ELT processes. Working closely with data scientists and analysts who specialize in enterprise data warehousing, you will leverage industry-standard data orchestration tools as well as in-house proprietary scheduling and automation tools to create efficient, reliable ELT jobs that support 1010data’s product offerings and our customers’ data warehousing needs. As we incorporate more cloud technologies into our processes, you will be at the forefront of exploring and defining best practices, and of helping us make our products more scalable.
As part of the onboarding process, you will learn about 1010data’s proprietary technology stack. Our query engine, query language, database, and data storage layer were all developed and fine-tuned in-house over the lifetime of the company. ELT processes rely heavily on these components, whether they are written in Python and Airflow, K, or our proprietary data orchestration tools. You will be formally trained in the latter as a new 1010data employee. The concepts should be familiar to anyone with exposure to database techniques such as normalization, indexing, and partitioning, as well as MapReduce, columnar database architecture, and distributed systems.
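To illustrate the kind of work described above: the defining trait of ELT (as opposed to ETL) is that raw data is loaded into the warehouse first and transformed in place afterward. The sketch below is purely illustrative, using Python's standard-library sqlite3 as a stand-in warehouse; the table names and data are hypothetical and not part of 1010data's actual stack.

```python
# Minimal ELT sketch: extract -> load raw -> transform inside the database.
# sqlite3 stands in for a real warehouse; all names/data here are hypothetical.
import sqlite3

def extract():
    """Simulate pulling raw partner data (a real job might read files from S3)."""
    return [("2024-01-01", "widget", "19.99"),
            ("2024-01-02", "gadget", "5.00")]

def load(conn, rows):
    """Land the raw, untyped rows as-is -- no transformation before loading."""
    conn.execute("CREATE TABLE raw_sales (day TEXT, sku TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)

def transform(conn):
    """Transform inside the warehouse with SQL: cast types and aggregate."""
    conn.execute("""
        CREATE TABLE daily_totals AS
        SELECT day, SUM(CAST(amount AS REAL)) AS total
        FROM raw_sales
        GROUP BY day
    """)

conn = sqlite3.connect(":memory:")
load(conn, extract())
transform(conn)
grand_total = conn.execute("SELECT SUM(total) FROM daily_totals").fetchone()[0]
```

In production the same load-then-transform pattern would be expressed as orchestrated tasks (e.g. an Airflow DAG) rather than sequential function calls, but the division of responsibilities is the same.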
This role is not eligible for visa sponsorship.
What you will take on:
- Taking end-to-end ownership of data products and custom solutions for our clients
- Coordinating with the systems, core, data science, and analytics teams to build and maintain data products and custom solutions for our clients
- Designing and writing automated scripts to preprocess terabytes of data from our partners/clients
- Designing and writing new enterprise-scale ELT/ETL workflows from scratch in Python using Airflow, Docker, Kubernetes, AWS, etc.
- Modifying/redesigning legacy ELT/ETL processes to leverage cutting-edge open source and proprietary technologies
- Ensuring quality, reliability and uptime for critical automated processes
- Migrating our products and processes into the cloud while drastically reducing our in-house data center footprint
What you already have:
- 1–2 years of professional experience programming in Python
- Exposure to ETL/ELT pipeline automation
- Exposure to basic database concepts
- Good understanding of data engineering, NoSQL databases, database design, distributed systems, and/or information retrieval
- Knowledge of Apache Airflow
- Familiarity with functional/vector programming
- DBA experience
- Ability to plan and collect requirements for projects, and interact with the analyst and data science teams
- Bachelor’s degree in a STEM field required; a graduate degree is a big plus
Recently named Best Alternative Data Provider at the HFM European Quant Awards, 1010data transforms Big Data into Smart Insights to activate the high-definition enterprise that can anticipate and respond to change. Our time series-driven collaborative analytics, consumer intelligence and alternative data solutions enable over 900 clients to achieve improved business performance, efficiency and growth faster, with less risk. The world’s foremost companies, including Sam’s Club, Dollar General, Procter & Gamble, Coca Cola, GSK, 3M, Bank of America and J.P. Morgan, consider 1010data the partner of choice for optimizing company health, mastering consumer touchpoints and digitally transforming operations. 1010data delivers on the promise of Big Data, and we’re just getting started.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.