Configuring rclone for Cloud Storage Transfers
Introduction
Globus is the recommended tool for data transfer, but it is not compatible with all cloud storage locations. In particular, Globus cannot connect to Dropbox, which is ASU’s main cloud storage platform. Another powerful transfer tool that can be used is rclone.
Before you can transfer with rclone, the different endpoints that you want to transfer data to and from need to be configured. This is a one-time step for each endpoint, but it can be confusing. This guide will walk you through configuring cloud storage platforms for transfer to and from the RC supercomputers.
Configurations
Configuring rclone for Google Drive
Google Drive can be accessed from the supercomputer using rclone. It is highly recommended that these steps are followed from a Virtual Desktop session in our web portal.
module load rclone/1.58.1
rclone config
This starts a multi-step interactive configuration process that generates or updates the configuration file $HOME/.config/rclone/rclone.conf. These steps are well documented by the rclone devs but are shown here as well for the configuration of a shared drive, rc-drive. For reference, the prompts are broken out step by step below.
Some users may wish to use a virtual desktop session as provided by our webapp, as one of the steps (prompt 10) will provide an authentication link and may attempt to open a browser window. This link may also be opened locally on the user’s computer. The guide assumes that the user is in a virtual desktop session.
Creating a Client ID and a Client Secret through the Google API Console
The Green Info Box.
For those that are ready for production, rerun rclone config and edit the previous remote’s client_id and client_secret (Prompts 4 and 5). The steps below must be followed first (taken from the rclone Google Drive docs). Note that you only need one of these for all shared drives.
Here is how to create your own Google Drive client ID for rclone:
Log into the Google API Console with your Google account. It doesn’t matter what Google account you use. (It need not be the same account as the Google Drive you want to access)
Select a project or create a new project.
Under “ENABLE APIS AND SERVICES” search for “Drive”, and enable the “Google Drive API”.
Click “Credentials” in the left-side panel (not “Create credentials”, which opens the wizard), then “Create credentials”, then “OAuth client ID”. It will prompt you to set the OAuth consent screen product name, if you haven’t set one already.
Choose an application type of “Desktop App”, and click “Create”. (the default name is fine)
It will show you a client ID and client secret. Use these values in rclone config to add a new remote or edit an existing remote.
Creating a Client ID and Secret through the Google API Console
Prompt 1 – New Remote – Response: n
The first prompt looks like this:
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q>
We respond with n, as the rclone configuration does not yet exist for our shared drive.
Prompt 2 – Name Remote – Response: <your-project-name>
The second prompt:
name>
This is arbitrary, but it’s wise to use the shared drive’s name. In this case, rc-drive.
Prompt 3 – Choose Cloud – Response: drive
The third prompt lists all the available cloud backends. Currently, there are 46 enumerated options, but for simplicity, only the relevant option (# 17) is shown (N.B. in the future the number associated with Google Drive may change):
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
...
17 / Google Drive
\ "drive"
...
Storage>
drive or 17 are both proper responses.
Prompt 4 – Client ID – Response: Client ID from the Green Info Box or None
This prompt and the next involve creating an application on Google Cloud and may lead to improved throughput. Note that the steps for this are documented above in the green Info Box and also in the video documentation.
Google Application Client Id
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_id>
So the response here is to paste in the Client ID, or to press Enter and leave it blank if you have not created one.
Prompt 5 – Client Secret – Response: Client Secret from the Green Info Box or None
This prompt and the previous involve creating an application on Google Cloud and may lead to improved throughput. Note that the steps for this are documented above in the green Info Box and also in the video documentation.
Google Application Client Secret
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret>
So the response here is to paste in the Client Secret, or to press Enter and leave it blank if you have not created one.
Prompt 6 – Remote Permissions (Scopes) – Response: 1
This is another big prompt, but rclone needs full access to all files. We will limit rclone to our shared drive in Prompts 11 and 12. Scopes are addressed by the rclone devs in their documentation and defined by Google.
Scope that rclone should use when requesting access from drive.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Full access all files, excluding Application Data Folder.
\ "drive"
2 / Read-only access to file metadata and file contents.
\ "drive.readonly"
/ Access to files created by rclone only.
3 | These are visible in the drive website.
| File authorization is revoked when the user deauthorizes the app.
\ "drive.file"
/ Allows read and write access to the Application Data folder.
4 | This is not visible in the drive website.
\ "drive.appfolder"
/ Allows read-only access to file metadata but
5 | does not allow any access to read or download file content.
\ "drive.metadata.readonly"
scope>
drive or 1 are both sufficient responses.
Prompt 7 – Root Folder ID – Response: None
We leave this next prompt’s response blank, as we want to work from the root of our shared drive.
ID of the root folder
Leave blank normally.
Fill in to access "Computers" folders. (see docs).
Enter a string value. Press Enter for the default ("").
root_folder_id>
We accept the default empty string (just hit the return key).
Prompt 8 – Service Account File – Response: None
This is another advanced Google Cloud application feature, which we ignore. Service accounts may be used to automate certain tasks for users.
Service Account Credentials JSON file path
Leave blank normally.
Needed only if you want use SA instead of interactive login.
Enter a string value. Press Enter for the default ("").
service_account_file>
We accept the default empty string (just hit the return key).
Prompt 9 – Enter Advanced Config – Response: n
The basic configuration has been done by this point, and rclone offers to make it longer. The advanced configuration is optional and not recommended.
Edit advanced config? (y/n)
y) Yes
n) No
y/n>
n is the recommended response.
Prompt 10 – Auto Config – Response: n
We now inform rclone that we are on the supercomputer. Responding y here will lead to a remote browser session, which is not an issue if using a virtual desktop session as provided by our webapp. With ssh, n is recommended and assumed instead.
Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n>
We respond with n.
Prompt 11 – Configure Shared Drive – Response: y
This prompt configures rclone to only associate with a shared drive. Note that rclone refers to the shared drive as a team drive.
Configure this as a team drive?
y) Yes
n) No
y/n>
y is the recommended response.
Prompt 12 – Choose Shared Drive – Response: <integer_associated_with_shared_drive>
Assuming y was passed to Prompt 11, rclone retrieves a list of shared drives (referred to as team drives) from the user’s Google Drive.
Fetching team drive list...
Choose a number from below, or type in your own value
1 / rc-drive
\ "0XXXXXXXXXXXXXXXXXX"
Enter a Team Drive ID>
To choose the shared drive noted in the example, either the integer associated with the shared drive or the alphanumeric string may be used as the response, that is, 1 or 0XXXXXXXXXXXXXXXXXX.
Prompt 13 – Summary – Response: y
rclone then summarizes the configuration and asks for confirmation. The shared drive is specified by its alphanumeric ID from Prompt 12, and the access tokens are saved in the JSON structure token.
--------------------
[rc-drive]
type = drive
scope = drive
token = {"access_token":"XXX","token_type":"Bearer","refresh_token":"1//XXX","expiry":"2019-10-15T16:38:26.358522694-07:00"}
team_drive = 0XXXXXXXXXXXXXXXXXX
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d>
y confirms the summary.
Prompt 14 – Finished – Response: q
The configuration is complete, and rclone loops back to the first prompt but with an existing configuration.
Current remotes:
Name Type
==== ====
rc-drive drive
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>
Assuming there are no more remotes to configure, q may be passed to the prompt.
Post Prompt
Recalling that this was all done within an interactive session, the session should now be closed.
From here, rclone should be fully configured to interact with a previously created shared drive for a research project, and may now be used within new interactive sessions or preferably sbatch job submissions. When first learning how to use rclone, interactive sessions are prudent, but once the archiving section of the workflow is figured out, sbatch is highly recommended.
Be very careful with rclone from here on out. Verify that rclone remotes only have access to the shared drives created for respective research projects. Do not use rclone subcommands (e.g. lsd, ls, copy, purge, or delete) without knowing their downstream effects first. Test before application, as any files lost may be unrecoverable!
Occasionally, the ~/.config/rclone/rclone.conf file generated by rclone config may be inadvertently clobbered. If this happens, please open a ticket with us and we’ll try to recover the configuration. To guard against this, you may benefit from creating a backup copy of the conf file outside of the ~/.config/rclone directory.
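The backup suggested above could be scripted along these lines. This is only a sketch: the function name backup_rclone_conf and the default backup directory ~/rclone-backups are hypothetical choices, not part of rclone or of our systems. Because the conf file holds access tokens, the copy is kept readable only by its owner.

```shell
# Copy the rclone configuration to a timestamped backup file.
# $1 = config file (defaults to ~/.config/rclone/rclone.conf)
# $2 = backup directory (hypothetical default: ~/rclone-backups)
backup_rclone_conf() {
    conf="${1:-$HOME/.config/rclone/rclone.conf}"
    dest="${2:-$HOME/rclone-backups}"
    # Refuse to continue if there is nothing to back up.
    [ -f "$conf" ] || { echo "No config found at $conf" >&2; return 1; }
    # The file contains OAuth tokens, so restrict the directory to the owner.
    mkdir -p "$dest" && chmod 700 "$dest" || return 1
    backup="$dest/rclone.conf.$(date +%Y%m%d-%H%M%S)"
    # Copy, restrict the copy to the owner, and print where it was written.
    cp -p "$conf" "$backup" && chmod 600 "$backup" && echo "$backup"
}
```

Calling backup_rclone_conf with no arguments after a successful rclone config run prints the path of the new backup file.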
Configuring rclone for Dropbox
This section documents how to configure and use the command-line tool rclone on the supercomputer for use with your ASU-provided Dropbox account.
Be wary of storing files and data on Dropbox as a long-term storage solution. Also, please note that in practice, Dropbox will only allow files that are smaller than 350 GiB (~375.8 GB, rounded) to be uploaded via rclone.
Instructions on Configuration
Log into the supercomputer
Start a Virtual Desktop session under the Interactive Apps
Use lightwork partition and public QoS
Once the desktop is available, launch the session
Open the terminal application
module load rclone
Rclone commands for Configuration
rclone config
After starting the config you will be prompted to create a new “remotes” location with the “n” option
Fill in the name field (this can be any name) and hit Enter
Current remotes:
Name Type
==== ====
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n
name> mydropbox
Select “dropbox” for the storage connection and hit Enter
Option Storage.
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value.
1 / 1Fichier
\ "fichier"
.
.
.
12 / Dropbox
\ "dropbox"
.
.
.
Storage> dropbox
Leave the client_id blank and hit Enter
Leave the client_secret blank and hit Enter
Option client_id.
OAuth Client Id.
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_id>
Option client_secret.
OAuth Client Secret.
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret>
Use "n" and hit Enter
Edit advanced config?
y) Yes
n) No (default)
y/n> n
Use "Y" to use auto-config (which is the default) to complete the Rclone configuration.
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n> y
A browser window will open, click “Allow”
Once completed, the remote should be set up and available via the command line. To check, use the following command:
rclone listremotes
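If the check needs to happen inside a script, for example before a transfer starts, something like the following sketch could be used. The function name remote_is_configured is hypothetical; the sketch relies on rclone listremotes printing one remote per line, each followed by a colon.

```shell
# Succeed if the named remote appears in the output of "rclone listremotes".
# $1 = remote name without the trailing colon, e.g. mydropbox
remote_is_configured() {
    # -x matches the whole line, -F treats the name as a fixed string.
    rclone listremotes | grep -qxF "$1:"
}
```

For example, remote_is_configured mydropbox || echo "mydropbox is not configured yet" reports a missing remote without starting a transfer.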
Copy files and directories to Dropbox with the Command Line
Use the rclone copy subcommand:
rclone copy ~/test mydropbox:test
If the directory "test" does not exist in Dropbox, Rclone will create it when specified as above. Some recommended performance and logging flags are:
--transfers=2*N where N is the number of cores allocated (for example, --transfers=8 with 4 cores),
--checkers=4*N where N is the number of cores allocated (for example, --checkers=16 with 4 cores),
-P -vv --log-file="<a-unique-log-file-name>" to enable progress information to be printed, increased logging verbosity, and a unique log file to store that logging information (very important for ensuring successful transfers),
--fast-list --use-mmap two options likely to improve performance (caveats on fast list, details on mmap),
--dropbox-chunk-size=148M to increase performance by better utilizing memory caching (dropbox docs).
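As a rough illustration of how the flag values above can be derived from the core count, the sketch below assembles them into a single string. The function name rclone_perf_flags is hypothetical, and reading the core count from SLURM_CPUS_PER_TASK assumes the job was submitted with --cpus-per-task; outside such a job the sketch falls back to 1 core. The command is only printed so that it can be reviewed before it is run.

```shell
# Print the recommended performance and logging flags.
# $1 = number of allocated cores, $2 = log file name (without spaces)
rclone_perf_flags() {
    echo "--transfers=$((2 * $1)) --checkers=$((4 * $1)) -P -vv --log-file=$2 --fast-list --use-mmap --dropbox-chunk-size=148M"
}

# Assumption: Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set.
cores="${SLURM_CPUS_PER_TASK:-1}"
# A timestamp keeps the log file name unique from run to run.
log_file="rclone-copy-$(date +%Y%m%d-%H%M%S).log"

# Show the full command for review instead of executing it.
echo "rclone copy $(rclone_perf_flags "$cores" "$log_file") ~/test mydropbox:test"
```

With 4 cores the function yields --transfers=8 and --checkers=16, matching the 2*N and 4*N rules.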
Quickly Verify transfers
Verify if the directory "test" exists in Dropbox by listing the directories using the command:
rclone lsd mydropbox:
-1 2022-09-26 12:05:29 -1 test
You can view the contents of the "test" directory using the following command:
rclone ls --max-depth 1 mydropbox:test
The flag --max-depth 1 is included because the ls subcommand will list ALL subdirectories AND files by default.
Synchronize live directories with Dropbox
The Destination is updated to match the Source, including deleting files if necessary.
The sync subcommand will mirror a source directory on the destination. As a result, if anything already exists in the destination path, those data are subject to being updated or deleted. This is a good tool to use for directories that are actively utilized and that have cruft uploaded from a previous upload.
rclone sync SOURCE remote:DESTINATION
Given that sync involves deletion on the destination, it may be good to first test the sync with the --dry-run flag or to use the --interactive/-i flag to avoid data loss.
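One way to combine the two precautions is to preview the sync with --dry-run and only run it for real after an explicit confirmation. The wrapper below is a minimal sketch under that assumption; the function name sync_with_preview is hypothetical and not an rclone feature.

```shell
# Preview a sync with --dry-run, then run it for real only after confirmation.
# $1 = local source directory, $2 = destination such as mydropbox:DESTINATION
sync_with_preview() {
    # Show what would be copied or deleted without changing anything.
    rclone sync --dry-run "$1" "$2" || return 1
    printf 'Apply these changes to %s? [y/N] ' "$2"
    read -r answer
    case "$answer" in
        y|Y) rclone sync "$1" "$2" ;;
        *)   echo "Sync cancelled." ;;
    esac
}
```

Because the wrapper waits for keyboard input, it suits interactive sessions; for sbatch jobs, review a --dry-run log beforehand instead.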
Performance and logging flags are the same as with the copy command, but are included below:
--transfers=2*N where N is the number of cores allocated,
--checkers=4*N where N is the number of cores allocated,
-P -vv --log-file="<a-unique-log-file-name>" to enable progress information to be printed, increased logging verbosity, and a unique log file to store that logging information (very important for ensuring successful transfers),
--fast-list --use-mmap two options likely to improve performance (caveats on fast list, details on mmap),
--dropbox-chunk-size=148M to increase performance by better utilizing memory caching (dropbox docs).
Mount Dropbox Using Rclone In Linux for GUI use
First, create a mount point to mount Dropbox. For example, create a mount point named "dropbox" in your $HOME directory.
$ mkdir ~/dropbox
Next, mount the Dropbox using Rclone as shown below:
$ rclone mount mydropbox: ~/dropbox/
Here, "mydropbox" is the remote name, and "dropbox" is the mount point. Replace these values with your own.
Don't forget to add a colon (:) after the remote's name.
Under “Application” → “File Manager,” you should see a Dropbox mount point.