mlx.data.core.AWSFileFetcher.__init__
- AWSFileFetcher.__init__(self: mlx.data._c.core.AWSFileFetcher, bucket: str, endpoint: str = '', region: str = '', prefix: os.PathLike = '', local_prefix: os.PathLike = '', ca_bundle: str = '', virtual_host: bool = False, verify_ssl: bool = True, connect_timeout_ms: int = 1000, num_retry_max: int = 10, num_connection_max: int = 25, buffer_size: int = 104857600, num_threads: int = 4, num_prefetch_max: int = 1, num_prefetch_threads: int = 1, num_kept_files: int = 0, access_key_id: str = '', secret_access_key: str = '', session_token: str = '', expiration: str = '', verbose: bool = False) → None
Make an AWSFileFetcher to fetch files from S3.
- Parameters:
bucket (str) – The S3 bucket to use.
endpoint (str) – The endpoint to use. (default: '')
region (str) – The region to use. (default: '')
prefix (os.PathLike) – The remote prefix to use for all files requested. (default: '')
local_prefix (os.PathLike) – The local cache directory to save the downloaded files in. (default: '')
ca_bundle (str) – The path to a certificate authority file for establishing SSL/TLS connections. The environment variable AWS_CA_BUNDLE can be used instead. (default: '')
virtual_host (bool) – Whether to use virtual-hosted-style URLs (i.e. the bucket name in the domain part of the URL). (default: False)
verify_ssl (bool) – Whether to verify SSL certificates. (default: True)
connect_timeout_ms (int) – Treat a connection attempt as timed out after this many milliseconds. (default: 1000)
num_retry_max (int) – How many times to attempt fetching a file before treating the fetch as failed. The retry strategy is exponential backoff. (default: 10)
num_connection_max (int) – Specifies the maximum number of HTTP connections to the server. (default: 25)
buffer_size (int) – The size in bytes of the parts in which files are fetched. (default: 104857600, i.e. 100 MiB)
num_threads (int) – How many parts to fetch in parallel for each file. (default: 4)
num_prefetch_max (int) – How many files to prefetch from the prefetch list. (default: 1)
num_prefetch_threads (int) – How many files to prefetch in parallel from the prefetch list. (default: 1)
num_kept_files (int) – How many files to keep in the local cache. If 0, all files are kept. If the files are larger than the local disk, set this to a positive number. (default: 0)
access_key_id (str) – The AWS access key ID used to authenticate to the remote service. (default: '')
secret_access_key (str) – The AWS secret access key used to authenticate to the remote service. (default: '')
session_token (str) – The AWS session token used to authenticate to the remote service. (default: '')
expiration (str) – A date string defining the expiration of the authentication credentials. (default: '')
verbose (bool) – Whether the file fetcher should write informational messages to the standard output. (default: False)
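As an illustration of how the parameters above fit together, the following is a minimal sketch and not part of the library. The helpers `fetcher_kwargs`, `estimated_parts`, and `make_fetcher` are hypothetical names introduced here. The mapping from the conventional AWS environment variables to constructor parameters is an assumption of this example (this page does not say whether the fetcher reads them on its own), and the part count assumes files are split into fixed-size parts of `buffer_size` bytes.

```python
import os

# Default part size from the signature above: 104857600 bytes (100 MiB).
DEFAULT_BUFFER_SIZE = 104857600

# Conventional AWS environment variable names mapped to constructor
# parameters. This mapping is an assumption made for the example.
ENV_TO_PARAM = {
    "AWS_ACCESS_KEY_ID": "access_key_id",
    "AWS_SECRET_ACCESS_KEY": "secret_access_key",
    "AWS_SESSION_TOKEN": "session_token",
    "AWS_DEFAULT_REGION": "region",
}


def fetcher_kwargs(bucket, prefix="", local_prefix="", env=None, **overrides):
    """Collect keyword arguments for AWSFileFetcher (hypothetical helper)."""
    env = os.environ if env is None else env
    kwargs = {"bucket": bucket, "prefix": prefix, "local_prefix": local_prefix}
    for var, param in ENV_TO_PARAM.items():
        value = env.get(var, "")
        if value:  # unset or empty values stay at the constructor default ('')
            kwargs[param] = value
    kwargs.update(overrides)  # explicit arguments win over the environment
    return kwargs


def estimated_parts(file_size, buffer_size=DEFAULT_BUFFER_SIZE):
    """Ceiling of file_size / buffer_size, assuming fixed-size parts."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    return -(-file_size // buffer_size)


def make_fetcher(bucket, **kwargs):
    """Build the fetcher; mlx.data is imported lazily, only when called."""
    from mlx.data.core import AWSFileFetcher

    return AWSFileFetcher(**fetcher_kwargs(bucket, **kwargs))
```

A call might look like `make_fetcher("my-bucket", prefix="datasets/", local_prefix="/tmp/cache", num_kept_files=100)`, where the bucket name and paths are placeholders. Under the fixed-size-part assumption, a 250 MiB file with the default `buffer_size` would be fetched in `estimated_parts(250 * 1024 * 1024)` parts, that is 3, of which up to `num_threads` are fetched in parallel.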