You can configure an S3 Data Connector to import files from an Amazon S3 bucket, optionally restricting the import to particular folders or particular file types.
To do this, you need:
- the name of the bucket
- the code for the region that it is hosted in (e.g. "eu-central-1")
- an Access key ID
- a Secret access key
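Before pasting these four values into a connector, it can help to check that each one has a plausible shape. The following is a minimal sketch, not part of Curiosity or AWS: the class name `S3ConnectorSettings` and its `problems` method are hypothetical, and the patterns are loose approximations of the usual bucket-name and region-code formats.

```python
import re
from dataclasses import dataclass
from typing import List

# Loose shape checks only (assumption); AWS remains the authority on what is valid.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_REGION_RE = re.compile(r"[a-z]{2}(?:-[a-z]+)+-\d+")


@dataclass(frozen=True)
class S3ConnectorSettings:
    """Hypothetical container for the four required values (not a Curiosity API)."""

    bucket: str
    region: str
    access_key_id: str
    secret_access_key: str

    def problems(self) -> List[str]:
        """Return readable problems; an empty list means the shapes look plausible."""
        found = []
        if not _BUCKET_RE.fullmatch(self.bucket):
            found.append("bucket name should be 3-63 lowercase letters, digits, dots or hyphens")
        if not _REGION_RE.fullmatch(self.region):
            found.append('region should be a code such as "eu-central-1"')
        if not self.access_key_id.strip():
            found.append("Access key ID is empty")
        if not self.secret_access_key.strip():
            found.append("Secret access key is empty")
        return found


# Placeholder credentials for illustration only.
settings = S3ConnectorSettings("my-documents", "eu-central-1", "AKIAEXAMPLEKEYID", "example-secret")
print(settings.problems())
```

A check like this only catches copy-and-paste mistakes, such as entering the region's display name instead of its code; it does not confirm that the bucket exists or that the keys are valid.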
Configuration in Amazon
To find the first two values, go to https://s3.console.aws.amazon.com/s3/buckets/ and locate the bucket that you want to import from.
The name of the bucket that you selected is the first piece of information required (for example, "my-documents").
The region code is displayed under Region in the Bucket overview section; it is the hyphen-delimited code at the end of the displayed value (for example, "eu-central-1").
To create the authentication credentials (the Access key ID and Secret access key), go to https://console.aws.amazon.com/iam/home#/users and click on Add user.
Enter a user name that relates to the import task (e.g. "s3-data-connector-users" or "curiosity") and ensure that Programmatic access is enabled.
Click Next: Permissions and then Add user to group, and ensure that the bucket that you want to import files from is selected.
Click Next: Tags and optionally enter a Notes field value to document the purpose of this access key for future reference.
Click Next: Review and then Create user.
The success screen reveals the Access key ID and Secret access key values that are required for the S3 Import configuration in Curiosity. Copy both values before leaving this screen, because AWS does not display the Secret access key again.
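The permissions step above does not spell out which rights the new user needs. As a rough sketch, a read-only IAM policy scoped to a single bucket could be generated as follows. It assumes that listing the bucket and reading its objects is sufficient for the import, which this guide does not confirm; the function name `read_only_bucket_policy` is hypothetical.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document granting list and read access to one bucket.

    Assumption: s3:ListBucket and s3:GetObject are all the import needs.
    """
    return {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # Listing applies to the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Reading applies to the objects inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(read_only_bucket_policy("my-documents"), indent=2))
```

The printed JSON can be pasted into the IAM policy editor and attached to the group that the user belongs to. Keeping the policy limited to one bucket means that a leaked key exposes only that bucket's contents.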
Within Curiosity, configure a Data Connector: click the menu button at the top left, then click Connect Sources and then S3 Bucket.
Click + Add, enter descriptive names for the "Name" and "Source" fields, and paste in the Bucket, Access Key ID, Secret Key, and Region Name values.
Optionally, restrict the import to particular folders and/or content types: type each entry into the corresponding text box and click the + button to record it and to open a text box for the next value.
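To illustrate how folder and file-type filters narrow an import, here is a small sketch that selects object keys from a list. It is only a model of the idea: it assumes that folders are matched as key prefixes and that content types are given as file extensions, neither of which this guide confirms for Curiosity. The function name `select_keys` is hypothetical.

```python
from typing import Iterable, List


def select_keys(keys: Iterable[str],
                folders: Iterable[str] = (),
                extensions: Iterable[str] = ()) -> List[str]:
    """Keep keys that sit under one of the folders and end with one of the extensions.

    An empty filter means "no restriction" for that dimension (assumption).
    """
    # Normalise "reports" and "/reports/" to the prefix "reports/".
    prefixes = tuple(folder.strip("/") + "/" for folder in folders)
    # Normalise "pdf" and ".PDF" to the suffix ".pdf".
    suffixes = tuple("." + ext.lower().lstrip(".") for ext in extensions)

    selected = []
    for key in keys:
        if prefixes and not key.startswith(prefixes):
            continue
        if suffixes and not key.lower().endswith(suffixes):
            continue
        selected.append(key)
    return selected


keys = ["reports/2023/q1.pdf", "reports/notes.txt", "images/logo.png", "readme.pdf"]
print(select_keys(keys, folders=["reports"], extensions=["pdf"]))
```

With both filters set, only keys that satisfy the folder filter and the extension filter are kept; leaving both empty imports everything.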
By default, the import task runs at 1am every night. For information about specifying a different schedule and about running tasks outside of their schedule, click here.
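To estimate when newly uploaded files will first be picked up under the default schedule, the next 1am run can be computed as in the sketch below. The function name `next_nightly_run` is hypothetical, and the code assumes that the schedule uses the same clock as the `now` value passed in; how Curiosity stores its schedule is not described here.

```python
from datetime import datetime, time, timedelta


def next_nightly_run(now: datetime, run_at: time = time(1, 0)) -> datetime:
    """Return the first run time strictly after `now` for a once-a-day schedule.

    In standard five-field cron notation, 1am daily is "0 1 * * *".
    """
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's run has already happened, so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate


print(next_nightly_run(datetime(2024, 3, 5, 14, 30)))
```

A file uploaded in the afternoon therefore waits until the following night's run, unless the task is started manually outside of its schedule.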