Building an FTP mirror with rclone + nginx: the (final?) solution for sharing the NOAA database

This is the NOAA database sharing problem again. I had already tried several approaches, and each ran into its own problems:

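  • Attempt 1: Load the data into MySQL with Python, with heavy table sharding and pagination to speed up queries. It grew to 3 TB, and I bought a pile of SSDs to speed up IO.
  • Problems: too large, slow queries, few concurrent users supported, too expensive, data had to be updated by hand or by script
  • ----------
  • Attempt 2: Format the data with Python and store it in separate directories laid out as [station]-[date]; each time a user submits the form, PHP reads the file directly and serves it as a download.
  • Problems: somewhat large, data had to be updated by hand or by script (this one was actually in use for quite a while and had no other real flaws, but running the update script every month was exhausting)
  • ----------
  • Attempt 3: Mount NOAA's FTP service in NextCloud, then share accounts and files
  • Problems: NextCloud does not refresh the directory listing, and the sheer number of files made it crash often (this was the production setup before the one described here and ran for about half a year; no more manual updates, and I could even sell accounts!)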

So, to solve the data-sharing problem, I put together a new approach: [mount the FTP with Rclone and use Nginx's directory listing to build a mirror site]

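At the time of writing this setup has been running for two months and has been very, very stable! Unless the server dies, it should keep running indefinitely. It has one drawback: there is no user authentication. Nginx password protection could be added, but that can wait (go for it, Xiao Tang; his blog, while I'm at it: https://ivistang.cloudraft.cn/).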

Tutorial:

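The machine used here is a Black Friday VPS from Virmach, codenamed KVM-LA, a box of marginal value (really marginal and a lot of hassle: https://www.liujason.com/article/118.html). The OS for this demo is CentOS-7-x64; other distributions work the same way, so pick whichever you like.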

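1. Download Rclone [if you want the shortcut, go straight to the one-line installer in 2.2 and skip this step]
Install it according to your Linux distribution. An earlier install note of mine (https://www.liujason.com/article/244.html) covers Windows, where the goal was mounting OneDrive; Linux works the same way, just download the right build.
Official download page: https://rclone.org/downloads/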

I'm on amd64 Linux, so:

curl -O https://downloads.rclone.org/rclone-current-linux-amd64.zip
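
On a machine that is not amd64 the archive name changes. Here is a minimal sketch that maps the output of `uname -m` to the suffix used in rclone's download file names. The helper name `rclone_arch` is made up for illustration, and only a few common architectures are covered:

```shell
# Hypothetical helper: map a `uname -m` value to the architecture
# suffix used in rclone's Linux download names.
rclone_arch() {
    case "$1" in
        x86_64|amd64)  echo "amd64" ;;
        i386|i686)     echo "386" ;;
        aarch64|arm64) echo "arm64" ;;
        armv6l|armv7l) echo "arm" ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the download URL for the current machine (nothing is downloaded here).
arch="$(rclone_arch "$(uname -m)")" && \
    echo "https://downloads.rclone.org/rclone-current-linux-${arch}.zip"
```

The printed URL can then be passed to `curl -O` exactly as in the command above.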

2. Install after downloading (choose either 2.1 or 2.2)
2.1 Unpack and put it on the PATH:

yum install unzip curl screen -y
# A fresh minimal system may lack unzip and curl, so install them first; screen is something I install every time
unzip rclone-current-linux-amd64.zip
cd rclone-*-linux-amd64
# Copying it into bin is enough; remember to fix ownership and permissions
sudo cp rclone /usr/bin/
sudo chown root:root /usr/bin/rclone
sudo chmod 755 /usr/bin/rclone
# Install the man page
sudo mkdir -p /usr/local/share/man/man1
sudo cp rclone.1 /usr/local/share/man/man1/
sudo mandb

2.2 While reading the official site I noticed a one-line installer, so here it is as well

yum install unzip curl -y
# A fresh minimal system may lack unzip and curl, so install them first
curl https://rclone.org/install.sh | sudo bash
# The official install script, hassle-free

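3. Mount the FTP with rclone
For basic rclone usage see --help; I've pasted the output in the appendix at the end for reference.
What we mainly need here is the mount feature. Note that if you are running in a container rather than a fully virtualized machine, FUSE must be enabled for the container (an LXC container in Proxmox needs to be set to privileged; either way, FUSE has to be available).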

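3.1 Configure rclone
Now configure rclone for the FTP mount: run rclone config and follow the prompts. I've annotated every step (normally I'd never write this much detail, but this time it's extra thorough so Xiao Tang can learn from it):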

[root@KVM-LA ~]# rclone config
2019/09/12 03:42:06 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
# Create a new remote
name> noaa
# Name of the remote
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
......
10 / FTP Connection
   \ "ftp"
......
Storage> 10
** See help for ftp backend at: https://rclone.org/ftp/ **
# Choose FTP
FTP host to connect to
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Connect to ftp.example.com
   \ "ftp.example.com"
host> ftp.ncdc.noaa.gov
# FTP server address
FTP username, leave blank for current username, root
Enter a string value. Press Enter for the default ("").
user> anonymous
# Username
FTP port, leave blank to use default (21)
Enter a string value. Press Enter for the default ("").
port> 21
# Port
FTP password
y) Yes type in my own password
g) Generate random password
y/g> y
Enter the password:
password:
Confirm the password:
password:
# Password
Use FTP over TLS (Implicit)
Enter a boolean value (true or false). Press Enter for the default ("false").
tls> 
# Whether to use TLS encryption
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[noaa]
type = ftp
host = ftp.ncdc.noaa.gov
user = anonymous
port = 21
pass = *** ENCRYPTED ***
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
noaa                 ftp

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
# For the rest, just fill it in the way I did
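
The interactive session boils down to one short section in rclone.conf, so it can also be generated. Below is a rough sketch, assuming the password has already been obscured with `rclone obscure` (listed in the help output in the appendix). The function name `ftp_remote_section` and the placeholder password are invented for this example:

```shell
# Hypothetical helper: print an rclone config section for an FTP remote.
# $1 = remote name, $2 = host, $3 = user, $4 = already-obscured password
ftp_remote_section() {
    printf '[%s]\ntype = ftp\nhost = %s\nuser = %s\nport = 21\npass = %s\n' \
        "$1" "$2" "$3" "$4"
}

# Example, commented out because it needs rclone installed and appends to
# the config file shown in the transcript above:
# ftp_remote_section noaa ftp.ncdc.noaa.gov anonymous "$(rclone obscure 'your-password')" \
#     >> /root/.config/rclone/rclone.conf
```

Running `rclone config` afterwards should list the remote under "Current remotes".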

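One thing to watch when setting up the FTP remote: for anonymous FTP there are three username/password combinations, and you have to try them one by one:
(1) username: anonymous, password: your e-mail address
(2) username: FTP, password: FTP or empty
(3) username: USER, password: pass
NOAA uses the third one; I tried them one by one, and only that one worked.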
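
Trying the combinations by hand gets tedious. Here is a rough sketch that loops over candidate pairs with curl, which was installed earlier. The function names and the placeholder e-mail address are assumptions, and the pair list simply mirrors the three cases above:

```shell
# Candidate anonymous logins, one "user:password" pair per line.
# user@example.com is a placeholder e-mail address.
ftp_candidates() {
    printf '%s\n' \
        "anonymous:user@example.com" \
        "FTP:FTP" \
        "FTP:" \
        "USER:pass"
}

# Try each pair against the host given as $1 and print the first one that
# can list the root directory. Needs network access.
try_ftp_logins() {
    host="$1"
    ftp_candidates | while IFS= read -r pair; do
        if curl --silent --list-only --max-time 20 --user "$pair" "ftp://${host}/" >/dev/null; then
            echo "login works: ${pair}"
            break
        fi
    done
}
```

Usage would be `try_ftp_logins ftp.ncdc.noaa.gov`; if nothing is printed, none of the pairs was accepted.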

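3.2 Mount the rclone remote into the filesystem
Just like mounting a disk with the system's mount, mounting an rclone virtual drive requires creating a directory first: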

sudo -u www mkdir /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa -p
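# Note: I put the noaa path under the directory of the noaa-mirror.cloud.ac.cn website (nginx), so that it can later be browsed directly as a sub-path of the site
# Also mind the permissions: run the command as the right user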

Then mount it:

screen
# Use screen to keep it running in the background
rclone mount noaa:/pub/data/ /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa --read-only --copy-links --no-gzip-encoding --no-check-certificate --allow-other --allow-non-empty --umask 000
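# Format: rclone mount remote-name:remote-path local-path --flags
# Note that this mounts a subdirectory of the original NOAA site, because much of what sits in the parent directories is not needed.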

The mount failed with an error the first time I ran it; the cause was the missing fuse package, so install it:

failed to mount FUSE fs: fusermount: exec: "fusermount": executable file not found in $PATH
yum install fuse -y
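
The FUSE prerequisites can also be checked before mounting. A small sketch follows; `check_fuse` is a name invented here, and it only tests for the `fusermount` binary and the `/dev/fuse` device:

```shell
# Hypothetical pre-flight check for FUSE support.
# Prints one line per finding; returns 0 only if both checks pass.
check_fuse() {
    status=0
    if command -v fusermount >/dev/null 2>&1; then
        echo "fusermount: found"
    else
        echo "fusermount: missing (on CentOS: yum install fuse -y)"
        status=1
    fi
    if [ -e /dev/fuse ]; then
        echo "/dev/fuse: present"
    else
        echo "/dev/fuse: missing (containers need FUSE enabled on the host side)"
        status=1
    fi
    return "$status"
}
```

Something like `check_fuse && rclone mount ...` would then refuse to start the mount when a prerequisite is missing.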

Now mount again, detach from screen, and check the mount:

[root@KVM-LA ~]# df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/vda1        9.6G  2.3G  6.9G  25% /
devtmpfs         7.8G     0  7.8G   0% /dev
tmpfs            7.8G   16K  7.8G   1% /dev/shm
tmpfs            7.8G   41M  7.8G   1% /run
tmpfs            7.8G     0  7.8G   0% /sys/fs/cgroup
tmpfs            1.6G     0  1.6G   0% /run/user/0
noaa:/pub/data/  1.0P     0  1.0P   0% /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa

Then look inside the path; everything is mounted:

cd /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa
ls
.......
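
For scripts, reading df output by eye is not practical. One possible check, sketched under the assumption that the kernel mount table is available at /proc/mounts (true on Linux); the helper name `is_mounted_at` is made up, and paths containing spaces are not handled:

```shell
# Hypothetical helper: succeed if mount point $1 appears in the mount table.
# $2 is the table to read; it defaults to /proc/mounts.
is_mounted_at() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_mounted_at /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa; then
    echo "mounted"
else
    echo "not mounted"
fi
```

The second field of each line in /proc/mounts is the mount point, which is why the comparison is against `$2` inside awk.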

One more thing: a normal unmount of a FUSE mount hangs every time, so unmount in lazy mode (umount -l XXX)
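
A lazy unmount can be wrapped in a small helper. This is only a sketch: `unmount_fuse` is an invented name, and it assumes either `fusermount` (whose `-z` flag requests a lazy unmount) or `umount -l` is available:

```shell
# Hypothetical helper: lazily unmount a FUSE mount point.
unmount_fuse() {
    target="${1%/}"   # drop a trailing slash so it matches /proc/mounts
    # Look the path up in the mount table instead of touching the mount
    # itself, which may be unresponsive.
    if ! grep -qsF " ${target} " /proc/mounts; then
        echo "not mounted: ${target}"
        return 0
    fi
    if command -v fusermount >/dev/null 2>&1; then
        fusermount -u -z "$target"
    else
        umount -l "$target"
    fi
}
```

Usage would be `unmount_fuse /www/wwwroot/noaa-mirror.cloud.ac.cn/noaa`.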

---- That completes mounting the FTP onto the local disk ----

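4. Publish the data to the web with Nginx
4.1 Create a new static website
This step is simple: yum works, a one-click installer works, and so does a panel such as AHM or BT Panel (宝塔), so I won't go into detail.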

4.2 Nginx开启autoindex
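The site is up, but opening https://noaa-mirror.cloud.ac.cn/noaa/ still returns 404. That is because autoindex is not enabled: Nginx looks for an index.html file under "/noaa/", fails to find it, and returns the 404 error code. The fix is to add the following to the server block in the corresponding conf file: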

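location / {
    autoindex on;
    autoindex_exact_size off; # Turn off exact sizes so they are shown in units such as MB; otherwise they are shown in bytes Orz
    autoindex_localtime on; # Show times in the server's local time zone; otherwise they are in GMT
}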

Once that is done, restart Nginx and you can see the result.
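
A typo in the conf file would take the site down on restart, so it can help to validate first. A minimal sketch, assuming the `nginx` binary is on the PATH and using a reload in place of a full restart; `reload_nginx` is a name invented here:

```shell
# Hypothetical helper: validate the configuration, then reload only if it parses.
reload_nginx() {
    if nginx -t; then
        nginx -s reload
    else
        echo "nginx configuration test failed; not reloading" >&2
        return 1
    fi
}
```

`nginx -t` checks the configuration files for syntax errors, and `nginx -s reload` asks the running server to pick up the new configuration.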

---- Appendix: rclone help output ----

[root@KVM-LA ~]# rclone --help

Rclone syncs files to and from cloud storage providers as well as
mounting them, listing them in lots of different ways.

See the home page (https://rclone.org/) for installation, usage,
documentation, changelog and configuration walkthroughs.

Usage:
  rclone [flags]
  rclone [command]

Available Commands:
  about           Get quota information from the remote.
  authorize       Remote authorization.
  cachestats      Print cache stats for a remote
  cat             Concatenates any files and sends them to stdout.
  check           Checks the files in the source and destination match.
  cleanup         Clean up the remote if possible
  config          Enter an interactive configuration session.
  copy            Copy files from source to dest, skipping already copied
  copyto          Copy files from source to dest, skipping already copied
  copyurl         Copy url content to dest.
  cryptcheck      Cryptcheck checks the integrity of a crypted remote.
  cryptdecode     Cryptdecode returns unencrypted file names.
  dbhashsum       Produces a Dropbox hash file for all the objects in the path.
  dedupe          Interactively find duplicate files and delete/rename them.
  delete          Remove the contents of path.
  deletefile      Remove a single file from remote.
  genautocomplete Output completion script for a given shell.
  gendocs         Output markdown docs for rclone to the directory supplied.
  hashsum         Produces an hashsum file for all the objects in the path.
  help            Show help for rclone commands, flags and backends.
  link            Generate public link to file/folder.
  listremotes     List all the remotes in the config file.
  ls              List the objects in the path with size and path.
  lsd             List all directories/containers/buckets in the path.
  lsf             List directories and objects in remote:path formatted for parsing
  lsjson          List directories and objects in the path in JSON format.
  lsl             List the objects in path with modification time, size and path.
  md5sum          Produces an md5sum file for all the objects in the path.
  mkdir           Make the path if it doesn't already exist.
  mount           Mount the remote as file system on a mountpoint.
  move            Move files from source to dest.
  moveto          Move file or directory from source to dest.
  ncdu            Explore a remote with a text based user interface.
  obscure         Obscure password for use in the rclone.conf
  purge           Remove the path and all of its contents.
  rc              Run a command against a running rclone.
  rcat            Copies standard input to file on remote.
  rcd             Run rclone listening to remote control commands only.
  rmdir           Remove the path if empty.
  rmdirs          Remove empty directories under the path.
  serve           Serve a remote over a protocol.
  settier         Changes storage class/tier of objects in remote.
  sha1sum         Produces an sha1sum file for all the objects in the path.
  size            Prints the total size and number of objects in remote:path.
  sync            Make source and dest identical, modifying destination only.
  touch           Create new file or change file modification time.
  tree            List the contents of the remote in a tree like fashion.
  version         Show the version number.

Use "rclone [command] --help" for more information about a command.
Use "rclone help flags" for to see the global flags.
Use "rclone help backends" for a list of supported services.

 
