AWS Data Pipeline

AWS Data Pipeline is a web service on the AWS infrastructure-as-a-service (IaaS) platform. It supports moving and transforming data held in AWS storage services such as Amazon RDS, Amazon DynamoDB, and Amazon Redshift.

A pipeline can be scheduled so that it starts at a set time and then repeats at a fixed frequency. A schedule can run just once, or repeat at an interval such as every 15 minutes (the shortest period the service accepts), every hour, weekly, monthly, or yearly.
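
As a rough illustration of how such a schedule is expressed, the sketch below writes a small pipeline definition fragment containing one Schedule object that repeats every hour from the first activation. The file name schedule_sketch.json and the object id and name are hypothetical, chosen only for this example; a real definition would also need activities and resources.

```shell
# Write a hypothetical definition fragment with a single hourly Schedule.
# File name, "id" and "name" are illustrative assumptions, not from the article.
cat > schedule_sketch.json <<'EOF'
{
  "objects": [
    {
      "id": "HourlySchedule",
      "name": "Every 1 hour",
      "type": "Schedule",
      "period": "1 hours",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    }
  ]
}
EOF
```

Changing "period" to a value such as "15 minutes", "1 weeks" or "1 months" should give the other frequencies mentioned above.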

You can create, update, and delete a pipeline using either the AWS console or the AWS CLI.

Creation:

aws datapipeline create-pipeline --name TEST_PROCESS --unique-id TEST_PROCESS --tags key=ApplicationGroup,value=Test key=CostCenter,value=Test key=TEST,value=C.IT.1234567.1234 key=FundingSource,value=Test key=Test,value=1234 key=AIN,value=999 key=Landscape,value=TEST
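
The create command returns the new pipeline ID (the df-... value used in the later steps), so it can be convenient to capture it in a script. The following is a minimal sketch, assuming a helper function named create_pipeline_id (a hypothetical name) and leaving out the tags for brevity; it relies on the CLI's global --query and --output options.

```shell
# Create a pipeline and print only its ID.
# $1 = pipeline name, $2 = unique ID (used by the service to avoid duplicates)
# The function name is a hypothetical helper, not part of the AWS CLI.
create_pipeline_id() {
  aws datapipeline create-pipeline \
    --name "$1" \
    --unique-id "$2" \
    --query pipelineId \
    --output text
}
```

A call such as PIPELINE_ID="$(create_pipeline_id TEST_PROCESS TEST_PROCESS)" would then hold the ID for the activate, deactivate, and delete commands.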

Pipeline definition

aws s3 ls s3://test-t/datapipeline/test_service/jsonpath/test_process.json
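
Listing the file only confirms that the definition exists in S3; the definition still has to be attached to the pipeline before activation. One possible way to do that is sketched below, assuming the JSON is first copied to the working directory and then passed to put-pipeline-definition through a file:// path. The function name upload_definition is hypothetical.

```shell
# Copy a definition file from S3 and attach it to an existing pipeline.
# $1 = pipeline ID, $2 = S3 URI of the definition JSON
# The body runs in a subshell ( ... ) so def_file does not leak into the caller.
upload_definition() (
  def_file="$(basename "$2")"
  aws s3 cp "$2" "$def_file" &&
    aws datapipeline put-pipeline-definition \
      --pipeline-id "$1" \
      --pipeline-definition "file://$def_file"
)
```

For the pipeline in this article the call might look like: upload_definition df-00123456ASDFGLKJHGAS s3://test-t/datapipeline/test_service/jsonpath/test_process.json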

Activate the new pipeline

aws datapipeline activate-pipeline --pipeline-id df-00123456ASDFGLKJHGAS
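
After activation it can be useful to check that runs are actually being scheduled. A small sketch, assuming a hypothetical wrapper named show_runs around the list-runs command:

```shell
# Show the runs recorded for a pipeline.
# $1 = pipeline ID; show_runs is a hypothetical helper name.
show_runs() {
  aws datapipeline list-runs --pipeline-id "$1"
}
```

For example: show_runs df-00123456ASDFGLKJHGAS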

Deactivate an existing pipeline

aws datapipeline deactivate-pipeline --pipeline-id df-00123456ASDFGLKJHGAS

Delete:

Console:

List the pipelines > select the pipeline > click Actions > Delete > click Delete again to confirm.

CLI:

aws datapipeline delete-pipeline --pipeline-id df-00123456ASDFGLKJHGAS
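
To confirm that the delete took effect, the remaining pipeline IDs can be listed and searched. The sketch below assumes a hypothetical helper named pipeline_exists built on list-pipelines; it succeeds only if the given ID is still present.

```shell
# Return success (exit status 0) if the given pipeline ID is still listed.
# $1 = pipeline ID; pipeline_exists is a hypothetical helper name.
pipeline_exists() {
  aws datapipeline list-pipelines \
    --query 'pipelineIdList[].id' \
    --output text |
    tr '\t' '\n' |      # text output separates IDs with tabs; put one per line
    grep -qxF "$1"      # exact, whole-line match on the ID
}
```

It could be used as: pipeline_exists df-00123456ASDFGLKJHGAS || echo "pipeline deleted"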