We are looking for an experienced Cloud Data Engineer to join our Data Team to design, implement & maintain our data lake and data warehouse within AWS. This is an exciting chance to help design and implement the data architecture for TPConnects. A strong background in AWS services such as AWS Glue, S3, Athena, DynamoDB, DMS, QuickSight, Redshift, Firehose, Kinesis Streams, SageMaker and Lambda is required.
TPConnects is an IATA NDC Dual Level 4 certified IT vendor and travel aggregator providing an offer and order management platform which can be consumed by travel agents and third parties wishing to sell air and land content. Domain knowledge of existing travel technology providers (e.g. Amadeus and Sabre) is a plus but not essential.
- Can start within a one-month notice period
- Fluent in English with strong verbal and written communication skills
- Degree in Computer Science or related field
- 2+ years of experience designing, managing, and implementing data lakes and warehouses in AWS
- A strong understanding of the data engineering stack within AWS and which tool is best for different challenges
- Experience with AWS Glue, Lambda, Redshift, Kinesis Streams etc.
- 3+ years of Python experience for data engineering
- Experience using PySpark with Parquet files and knowledge of how to optimise them (partitioning, bucketing, broadcasting etc.)
- Understanding of CDC pipelines & star schema design
- Advanced SQL and PostgreSQL knowledge
- Experience in automating database quality and validation checks
- Ability to document and catalogue datasets for downstream consumers
- Some experience writing Terraform code to deploy & maintain Python Glue jobs
- Experience with data modelling at the schema level
- Experience implementing production-level machine learning models using Amazon SageMaker
- NoSQL experience (e.g. DynamoDB, MongoDB etc.)
- Experience creating data APIs for both consumption and ingestion
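Of the Parquet optimisations listed above, partitioning amounts to laying files out under Hive-style `key=value` prefixes that Spark, Glue and Athena can all prune at query time. A minimal stdlib sketch of how such a prefix is derived (the bucket path and column names are hypothetical, and a real job would let Spark's `partitionBy` handle this):

```python
def partition_path(base: str, record: dict, partition_keys: list) -> str:
    """Build the Hive-style partition prefix (key=value/...) used by
    Spark, Glue and Athena when reading/writing partitioned Parquet."""
    parts = [f"{k}={record[k]}" for k in partition_keys]
    return "/".join([base.rstrip("/")] + parts)
```

Queries filtering on `year` or `month` then only touch the matching prefixes instead of scanning the whole dataset.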
- Helping design, implement & maintain the data lake/warehouse architecture.
- Using Python to convert raw data to Parquet files, then transform and clean them.
- Working with data scientists, analysts and developers to create pipelines that work well for all parties.
- Solving data quality issues at the source by working with developers.
- Onboarding new datasets and extracting the most value from them.
- Working with consumers to gather requirements that feed into data warehouse outputs.
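The raw-to-Parquet responsibility above is, at its core, a per-record transform-and-clean step. A minimal plain-Python sketch (in practice this would run as a PySpark/Glue job writing Parquet, and the field names here are hypothetical):

```python
from datetime import datetime
from typing import Optional

def clean_record(raw: dict) -> Optional[dict]:
    """Normalise one raw booking row before it is written to Parquet:
    drop rows missing a booking id, coerce types, trim strings."""
    booking_id = (raw.get("booking_id") or "").strip()
    if not booking_id:
        return None  # route to a quarantine area rather than failing the job
    return {
        "booking_id": booking_id,
        "fare": float(raw.get("fare") or 0),
        "booked_at": datetime.fromisoformat(raw["booked_at"]).date().isoformat(),
    }
```

Returning `None` for bad rows keeps the pipeline running while still surfacing data quality issues to the upstream developers.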