add xdev/xvol indexing guards

ed 2022-08-03 22:20:28 +02:00
parent 90555a4cea
commit 680f8ae814
7 changed files with 77 additions and 31 deletions

1
.gitignore vendored

@ -13,6 +13,7 @@ copyparty.egg-info/
/py2/
/sfx/
/unt/
/log/
# ide
*.sublime-workspace


@ -56,10 +56,11 @@ try the **[read-only demo server](https://a.ocv.me/pub/demo/)** 👀 running fro
* [searching](#searching) - search by size, date, path/name, mp3-tags, ...
* [server config](#server-config) - using arguments or config files, or a mix of both
* [ftp-server](#ftp-server) - an FTP server can be started using `--ftp 3921`
* [file indexing](#file-indexing)
* [exclude-patterns](#exclude-patterns)
* [periodic rescan](#periodic-rescan) - filesystem monitoring;
* [upload rules](#upload-rules) - set upload rules using volume flags
* [file indexing](#file-indexing) - enables dedup and music search ++
* [exclude-patterns](#exclude-patterns) - to save some time
* [filesystem guards](#filesystem-guards) - avoid traversing into other filesystems
* [periodic rescan](#periodic-rescan) - filesystem monitoring
* [upload rules](#upload-rules) - set upload rules using volflags
* [compress uploads](#compress-uploads) - files can be autocompressed on upload
* [database location](#database-location) - in-volume (`.hist/up2k.db`, default) or somewhere else
* [metadata from audio files](#metadata-from-audio-files) - set `-e2t` to index tags on upload
@ -311,7 +312,7 @@ examples:
* `u1` can open the `inc` folder, but cannot see the contents, only upload new files to it
* `u2` can browse it and move files *from* `/inc` into any folder where `u2` has write-access
* make folder `/mnt/ss` available at `/i`, read-write for u1, get-only for everyone else, and enable accesskeys: `-v /mnt/ss:i:rw,u1:g:c,fk=4`
* `c,fk=4` sets the `fk` volume-flag to 4, meaning each file gets a 4-character accesskey
* `c,fk=4` sets the `fk` volflag to 4, meaning each file gets a 4-character accesskey
* `u1` can upload files, browse the folder, and see the generated accesskeys
* other users cannot browse the folder, but can access the files if they have the full file URL with the accesskey
@ -658,7 +659,9 @@ an FTP server can be started using `--ftp 3921`, and/or `--ftps` for explicit T
## file indexing
file indexing relies on two database tables, the up2k filetree (`-e2d`) and the metadata tags (`-e2t`), stored in `.hist/up2k.db`. Configuration can be done through arguments, volume flags, or a mix of both.
enables dedup and music search ++
file indexing relies on two database tables, the up2k filetree (`-e2d`) and the metadata tags (`-e2t`), stored in `.hist/up2k.db`. Configuration can be done through arguments, volflags, or a mix of both.
through arguments:
* `-e2d` enables file indexing on upload
@ -671,7 +674,7 @@ through arguments:
* `-e2vu` patches the database with the new hashes from the filesystem
* `-e2vp` panics and kills copyparty instead
the same arguments can be set as volume flags, in addition to `d2d`, `d2ds`, `d2t`, `d2ts`, `d2v` for disabling:
the same arguments can be set as volflags, in addition to `d2d`, `d2ds`, `d2t`, `d2ts`, `d2v` for disabling:
* `-v ~/music::r:c,e2dsa,e2tsr` does a full reindex of everything on startup
* `-v ~/music::r:c,d2d` disables **all** indexing, even if any `-e2*` are on
* `-v ~/music::r:c,d2t` disables all `-e2t*` (tags), does not affect `-e2d*`
@ -685,7 +688,7 @@ note:
### exclude-patterns
to save some time, you can provide a regex pattern for filepaths to only index by filename/path/size/last-modified (and not the hash of the file contents) by setting `--no-hash \.iso$` or the volume-flag `:c,nohash=\.iso$`, this has the following consequences:
to save some time, you can provide a regex pattern for filepaths to only index by filename/path/size/last-modified (and not the hash of the file contents) by setting `--no-hash \.iso$` or the volflag `:c,nohash=\.iso$`, this has the following consequences:
* initial indexing is way faster, especially when the volume is on a network disk
* makes it impossible to [file-search](#file-search)
* if someone uploads the same file contents, the upload will not be detected as a dupe, so it will not get symlinked or rejected
@ -694,6 +697,14 @@ similarly, you can fully ignore files/folders using `--no-idx [...]` and `:c,noi
if you set `--no-hash [...]` globally, you can enable hashing for specific volumes using flag `:c,nohash=`
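a rough sketch of how such a pattern could decide which paths skip hashing; the `should_hash` helper and the sample paths are made up for illustration and this is not copyparty's own code, only the `\.iso$` pattern comes from the text above (using `search()` is an assumption):

```python
import re
from typing import Optional, Pattern


def should_hash(path: str, nohash: Optional[Pattern[str]]) -> bool:
    """hypothetical helper: False when the path matches the no-hash pattern"""
    # no pattern configured means every file gets hashed
    if nohash is None:
        return True
    # search() instead of match(), so the pattern can hit anywhere in the path
    return nohash.search(path) is None


ptn = re.compile(r"\.iso$")
sample_paths = ["dist/debian.iso", "music/song.mp3", "iso/readme.txt"]
results = {p: should_hash(p, ptn) for p in sample_paths}
```

only `dist/debian.iso` ends in `.iso`, so it is the only sample path that would be indexed without a content hash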
### filesystem guards
avoid traversing into other filesystems using `--xdev` / volflag `:c,xdev`, skipping any symlinks or bind-mounts to another HDD for example
and/or you can use `--xvol` / volflag `:c,xvol` to ignore all symlinks leaving the volume's top directory, but still allow bind-mounts pointing elsewhere
**NB: only affects the indexer** -- users can still access anything inside a volume, unless shadowed by another volume
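a minimal standalone sketch of the two guards, to show the idea rather than the actual indexer; `walk_guarded` and the temp-folder layout are hypothetical, and the xvol check here uses `os.path.commonpath` on resolved paths, which is a stricter swapped-in variant and not necessarily what copyparty does:

```python
import os
import tempfile
from typing import List, Set


def walk_guarded(top: str, xdev: bool = False, xvol: bool = False) -> List[str]:
    """list the folders below top, applying xdev/xvol-style guards (illustrative)"""
    rtop = os.path.realpath(top)
    top_dev = os.stat(rtop).st_dev  # device id of the volume root
    found: List[str] = []
    seen: Set[str] = set()

    def visit(cdir: str) -> None:
        rcdir = os.path.realpath(cdir)
        if rcdir in seen:
            return  # already visited; protects against symlink loops
        seen.add(rcdir)
        found.append(cdir)
        for name in sorted(os.listdir(cdir)):
            path = os.path.join(cdir, name)
            if not os.path.isdir(path):
                continue
            # xvol: skip folders whose resolved path is outside the volume root
            if xvol and os.path.commonpath([rtop, os.path.realpath(path)]) != rtop:
                continue
            # xdev: skip folders that live on a different filesystem
            if xdev and os.stat(path).st_dev != top_dev:
                continue
            visit(path)

    visit(top)
    return found


# made-up layout: a volume with one regular folder and one symlink leaving it
base = tempfile.mkdtemp()
vol = os.path.join(base, "vol")
outside = os.path.join(base, "outside")
os.makedirs(os.path.join(vol, "inside"))
os.makedirs(outside)
os.symlink(outside, os.path.join(vol, "escape"))

names_all = [os.path.basename(p) for p in walk_guarded(vol)]
names_xvol = [os.path.basename(p) for p in walk_guarded(vol, xvol=True)]
```

without guards the `escape` symlink is followed; with `xvol=True` it is skipped because it resolves to a folder outside `vol`. the xdev branch only has an effect when a subfolder sits on another device, which this temp layout does not set up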
### periodic rescan
filesystem monitoring; if copyparty is not the only software doing stuff on your filesystem, you may want to enable periodic rescans to keep the index up to date
@ -705,7 +716,7 @@ uploads are disabled while a rescan is happening, so rescans will be delayed by
## upload rules
set upload rules using volume flags, some examples:
set upload rules using volflags, some examples:
* `:c,sz=1k-3m` sets allowed filesize between 1 KiB and 3 MiB inclusive (suffixes: `b`, `k`, `m`, `g`)
* `:c,df=4g` block uploads if there would be less than 4 GiB free disk space afterwards
@ -727,16 +738,16 @@ you can also set transaction limits which apply per-IP and per-volume, but these
files can be autocompressed on upload, either on user-request (if config allows) or forced by server-config
* volume flag `gz` allows gz compression
* volume flag `xz` allows lzma compression
* volume flag `pk` **forces** compression on all files
* volflag `gz` allows gz compression
* volflag `xz` allows lzma compression
* volflag `pk` **forces** compression on all files
* url parameter `pk` requests compression with server-default algorithm
* url parameter `gz` or `xz` requests compression with a specific algorithm
* url parameter `xz` requests xz compression
things to note,
* the `gz` and `xz` arguments take a single optional argument, the compression level (range 0 to 9)
* the `pk` volume flag takes the optional argument `ALGORITHM,LEVEL` which will then be forced for all uploads, for example `gz,9` or `xz,0`
* the `pk` volflag takes the optional argument `ALGORITHM,LEVEL` which will then be forced for all uploads, for example `gz,9` or `xz,0`
* default compression is gzip level 9
* all upload methods except up2k are supported
* the files will be indexed after compression, so dupe-detection and file-search will not work as expected
@ -756,7 +767,7 @@ in-volume (`.hist/up2k.db`, default) or somewhere else
copyparty creates a subfolder named `.hist` inside each volume where it stores the database, thumbnails, and some other stuff
this can instead be kept in a single place using the `--hist` argument, or the `hist=` volume flag, or a mix of both:
this can instead be kept in a single place using the `--hist` argument, or the `hist=` volflag, or a mix of both:
* `--hist ~/.cache/copyparty -v ~/music::r:c,hist=-` sets `~/.cache/copyparty` as the default place to put volume info, but `~/music` gets the regular `.hist` subfolder (`-` restores default behavior)
note:
@ -794,7 +805,7 @@ see the beautiful mess of a dictionary in [mtag.py](https://github.com/9001/copy
provide custom parsers to index additional tags, also see [./bin/mtag/README.md](./bin/mtag/README.md)
copyparty can invoke external programs to collect additional metadata for files using `mtp` (either as argument or volume flag), there is a default timeout of 30sec, and only files which contain audio get analyzed by default (see ay/an/ad below)
copyparty can invoke external programs to collect additional metadata for files using `mtp` (either as argument or volflag), there is a default timeout of 30sec, and only files which contain audio get analyzed by default (see ay/an/ad below)
* `-mtp .bpm=~/bin/audio-bpm.py` will execute `~/bin/audio-bpm.py` with the audio file as argument 1 to provide the `.bpm` tag, if that does not exist in the audio metadata
* `-mtp key=f,t5,~/bin/audio-key.py` uses `~/bin/audio-key.py` to get the `key` tag, replacing any existing metadata tag (`f,`), aborting if it takes longer than 5sec (`t5,`)
@ -835,8 +846,8 @@ if this becomes popular maybe there should be a less janky way to do it actually
tell search engines you dont wanna be indexed, either using the good old [robots.txt](https://www.robotstxt.org/robotstxt.html) or through copyparty settings:
* `--no-robots` adds HTTP (`X-Robots-Tag`) and HTML (`<meta>`) headers with `noindex, nofollow` globally
* volume-flag `[...]:c,norobots` does the same thing for that single volume
* volume-flag `[...]:c,robots` ALLOWS search-engine crawling for that volume, even if `--no-robots` is set globally
* volflag `[...]:c,norobots` does the same thing for that single volume
* volflag `[...]:c,robots` ALLOWS search-engine crawling for that volume, even if `--no-robots` is set globally
also, `--force-js` disables the plain HTML folder listing, making things harder to parse for search engines
@ -1059,7 +1070,7 @@ some notes on hardening
other misc notes:
* you can disable directory listings by giving permission `g` instead of `r`, only accepting direct URLs to files
* combine this with volume-flag `c,fk` to generate per-file accesskeys; users which have full read-access will then see URLs with `?k=...` appended to the end, and `g` users must provide that URL including the correct key to avoid a 404
* combine this with volflag `c,fk` to generate per-file accesskeys; users which have full read-access will then see URLs with `?k=...` appended to the end, and `g` users must provide that URL including the correct key to avoid a 404
## gotchas


@ -42,7 +42,7 @@ run [`install-deps.sh`](install-deps.sh) to build/install most dependencies requ
* `mtp` modules will not run if a file has existing tags in the db, so clear out the tags with `-e2tsr` the first time you launch with new `mtp` options
## usage with volume-flags
## usage with volflags
instead of affecting all volumes, you can set the options for just one volume like so:


@ -397,10 +397,12 @@ def run_argparse(argv: list[str], formatter: Any, retry: bool) -> argparse.Names
\033[36md2t\033[35m disables metadata collection, overrides -e2t*
\033[36md2v\033[35m disables file verification, overrides -e2v*
\033[36md2d\033[35m disables all database stuff, overrides -e2*
\033[36mnohash=\\.iso$\033[35m skips hashing file contents if path matches *.iso
\033[36mnoidx=\\.iso$\033[35m fully ignores the contents at paths matching *.iso
\033[36mhist=/tmp/cdb\033[35m puts thumbnails and indexes at that location
\033[36mscan=60\033[35m scan for new files every 60sec, same as --re-maxage
\033[36mnohash=\\.iso$\033[35m skips hashing file contents if path matches *.iso
\033[36mnoidx=\\.iso$\033[35m fully ignores the contents at paths matching *.iso
\033[36mxdev\033[35m do not descend into other filesystems
\033[36mxvol\033[35m skip symlinks leaving the volume root
\033[0mdatabase, audio tags:
"mte", "mth", "mtp", "mtm" all work the same as -mte, -mth, ...
@ -595,6 +597,8 @@ def run_argparse(argv: list[str], formatter: Any, retry: bool) -> argparse.Names
ap2.add_argument("--hist", metavar="PATH", type=u, help="where to store volume data (db, thumbs)")
ap2.add_argument("--no-hash", metavar="PTN", type=u, help="regex: disable hashing of matching paths during e2ds folder scans")
ap2.add_argument("--no-idx", metavar="PTN", type=u, help="regex: disable indexing of matching paths during e2ds folder scans")
ap2.add_argument("--xdev", action="store_true", help="do not descend into other filesystems (symlink or bind-mount to another HDD, ...)")
ap2.add_argument("--xvol", action="store_true", help="skip symlinks leaving the volume root")
ap2.add_argument("--re-maxage", metavar="SEC", type=int, default=0, help="disk rescan volume interval, 0=off, can be set per-volume with the 'scan' volflag")
ap2.add_argument("--db-act", metavar="SEC", type=float, default=10, help="defer any scheduled volume reindexing until SEC seconds after last db write (uploads, renames, ...)")
ap2.add_argument("--srch-time", metavar="SEC", type=int, default=30, help="search deadline -- terminate searches running for more than SEC seconds")
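
For reference, a small standalone sketch of how two `store_true` switches like the ones added in this hunk behave, and how truthy globals could then be copied into a per-volume flag dict. The plain `ap` parser and the `flags` dict are stand-ins chosen for illustration (assumptions), not copyparty's `ap2` group or `vol.flags` object:

```python
import argparse

# plain parser standing in for the real argument group
ap = argparse.ArgumentParser()
ap.add_argument("--xdev", action="store_true", help="do not descend into other filesystems")
ap.add_argument("--xvol", action="store_true", help="skip symlinks leaving the volume root")

# store_true switches default to False and become True when present
args = ap.parse_args(["--xdev"])

# copy the switches that were set globally into a per-volume flag dict
flags = {}
for k in ["xdev", "xvol"]:
    if getattr(args, k):
        flags[k] = True
```

With only `--xdev` on the command line, `args.xvol` stays `False`, so `flags` ends up holding just the `xdev` key.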


@ -728,12 +728,12 @@ class AuthSrv(object):
self, lvl: str, uname: str, axs: AXS, flags: dict[str, Any]
) -> None:
if lvl.strip("crwmdg"):
raise Exception("invalid volume flag: {},{}".format(lvl, uname))
raise Exception("invalid volflag: {},{}".format(lvl, uname))
if lvl == "c":
cval: Union[bool, str] = True
try:
# volume flag with arguments, possibly with a preceding list of bools
# volflag with arguments, possibly with a preceding list of bools
uname, cval = uname.split("=", 1)
except:
# just one or more bools
@ -1066,7 +1066,7 @@ class AuthSrv(object):
if ptn:
vol.flags[vf] = re.compile(ptn)
for k in ["e2t", "e2ts", "e2tsr", "e2v", "e2vu", "e2vp"]:
for k in ["e2t", "e2ts", "e2tsr", "e2v", "e2vu", "e2vp", "xdev", "xvol"]:
if getattr(self.args, k):
vol.flags[k] = True
@ -1084,7 +1084,7 @@ class AuthSrv(object):
if "mth" not in vol.flags:
vol.flags["mth"] = self.args.mth
# append parsers from argv to volume-flags
# append parsers from argv to volflags
self._read_volflag(vol.flags, "mtp", self.args.mtp, True)
# d2d drops all database features for a volume
@ -1147,7 +1147,7 @@ class AuthSrv(object):
for mtp in local_only_mtp:
if mtp not in local_mte:
t = 'volume "/{}" defines metadata tag "{}", but doesnt use it in "-mte" (or with "cmte" in its volume-flags)'
t = 'volume "/{}" defines metadata tag "{}", but doesnt use it in "-mte" (or with "cmte" in its volflags)'
self.log(t.format(vol.vpath, mtp), 1)
errors = True
@ -1156,7 +1156,7 @@ class AuthSrv(object):
tags = [y for x in tags for y in x.split(",")]
for mtp in tags:
if mtp not in all_mte:
t = 'metadata tag "{}" is defined by "-mtm" or "-mtp", but is not used by "-mte" (or by any "cmte" volume-flag)'
t = 'metadata tag "{}" is defined by "-mtm" or "-mtp", but is not used by "-mte" (or by any "cmte" volflag)'
self.log(t.format(mtp), 1)
errors = True


@ -672,6 +672,11 @@ class Up2k(object):
top = vol.realpath
rei = vol.flags.get("noidx")
reh = vol.flags.get("nohash")
dev = 0
if vol.flags.get("xdev"):
dev = bos.stat(top).st_dev
with self.mutex:
reg = self.register_vpath(top, vol.flags)
assert reg and self.pp
@ -689,11 +694,25 @@ class Up2k(object):
excl += list(self.asrv.vfs.histtab.values())
if WINDOWS:
excl = [x.replace("/", "\\") for x in excl]
else:
# ~/.wine/dosdevices/z:/ and such
excl += ["/dev", "/proc", "/run", "/sys"]
rtop = absreal(top)
n_add = n_rm = 0
try:
n_add = self._build_dir(db, top, set(excl), top, rtop, rei, reh, [])
n_add = self._build_dir(
db,
top,
set(excl),
top,
rtop,
rei,
reh,
[],
dev,
bool(vol.flags.get("xvol")),
)
n_rm = self._drop_lost(db.c, top, excl)
except Exception as ex:
db_ex_chk(self.log, ex, db_path)
@ -717,7 +736,13 @@ class Up2k(object):
rei: Optional[Pattern[str]],
reh: Optional[Pattern[str]],
seen: list[str],
dev: int,
xvol: bool,
) -> int:
if xvol and not rcdir.startswith(top):
self.log("skip xvol: [{}] -> [{}]".format(top, rcdir), 6)
return 0
if rcdir in seen:
t = "bailing from symlink loop,\n prev: {}\n curr: {}\n from: {}"
self.log(t.format(seen[-1], rcdir, cdir), 3)
@ -750,6 +775,9 @@ class Up2k(object):
sz = inf.st_size
if stat.S_ISDIR(inf.st_mode):
rap = absreal(abspath)
if dev and inf.st_dev != dev:
self.log("skip xdev {}->{}: {}".format(dev, inf.st_dev, abspath), 6)
continue
if abspath in excl or rap in excl:
unreg.append(rp)
continue
@ -758,7 +786,9 @@ class Up2k(object):
continue
# self.log(" dir: {}".format(abspath))
try:
ret += self._build_dir(db, top, excl, abspath, rap, rei, reh, seen)
ret += self._build_dir(
db, top, excl, abspath, rap, rei, reh, seen, dev, xvol
)
except:
t = "failed to index subdir [{}]:\n{}"
self.log(t.format(abspath, min_ex()), c=1)
@ -1109,7 +1139,7 @@ class Up2k(object):
with self.mutex:
cur.connection.commit()
# bail if a volume flag disables indexing
# bail if a volflag disables indexing
if "d2t" in flags or "d2d" in flags:
return 0, n_rm, True


@ -185,7 +185,7 @@ brew install python@2
pip install virtualenv
# readme toc
cat README.md | awk 'function pr() { if (!h) {return}; if (/^ *[*!#|]/||!s) {printf "%s\n",h;h=0;return}; if (/.../) {printf "%s - %s\n",h,$0;h=0}; }; /^#/{s=1;pr()} /^#* *(file indexing|exclude-patterns|install on android|dev env setup|just the sfx|complete release|optional gpl stuff)|`$/{s=0} /^#/{lv=length($1);sub(/[^ ]+ /,"");bab=$0;gsub(/ /,"-",bab); h=sprintf("%" ((lv-1)*4+1) "s [%s](#%s)", "*",$0,bab);next} !h{next} {sub(/ .*/,"");sub(/[:,]$/,"")} {pr()}' > toc; grep -E '^## readme toc' -B1000 -A2 <README.md >p1; grep -E '^## quickstart' -B2 -A999999 <README.md >p2; (cat p1; grep quickstart -A1000 <toc; cat p2) >README.md; rm p1 p2 toc
cat README.md | awk 'function pr() { if (!h) {return}; if (/^ *[*!#|]/||!s) {printf "%s\n",h;h=0;return}; if (/.../) {printf "%s - %s\n",h,$0;h=0}; }; /^#/{s=1;pr()} /^#* *(install on android|dev env setup|just the sfx|complete release|optional gpl stuff)|`$/{s=0} /^#/{lv=length($1);sub(/[^ ]+ /,"");bab=$0;gsub(/ /,"-",bab); h=sprintf("%" ((lv-1)*4+1) "s [%s](#%s)", "*",$0,bab);next} !h{next} {sub(/ .*/,"");sub(/[:;,]$/,"")} {pr()}' > toc; grep -E '^## readme toc' -B1000 -A2 <README.md >p1; grep -E '^## quickstart' -B2 -A999999 <README.md >p2; (cat p1; grep quickstart -A1000 <toc; cat p2) >README.md; rm p1 p2 toc
# fix firefox phantom breakpoints,
# suggestions from bugtracker, doesnt work (debugger is not attachable)