mirror of
https://github.com/9001/copyparty.git
synced 2025-08-17 00:52:16 -06:00
add xdev/xvol indexing guards
This commit is contained in:
parent
90555a4cea
commit
680f8ae814
1
.gitignore
vendored
1
.gitignore
vendored
|
@ -13,6 +13,7 @@ copyparty.egg-info/
|
|||
/py2/
|
||||
/sfx/
|
||||
/unt/
|
||||
/log/
|
||||
|
||||
# ide
|
||||
*.sublime-workspace
|
||||
|
|
47
README.md
47
README.md
|
@ -56,10 +56,11 @@ try the **[read-only demo server](https://a.ocv.me/pub/demo/)** 👀 running fro
|
|||
* [searching](#searching) - search by size, date, path/name, mp3-tags, ...
|
||||
* [server config](#server-config) - using arguments or config files, or a mix of both
|
||||
* [ftp-server](#ftp-server) - an FTP server can be started using `--ftp 3921`
|
||||
* [file indexing](#file-indexing)
|
||||
* [exclude-patterns](#exclude-patterns)
|
||||
* [periodic rescan](#periodic-rescan) - filesystem monitoring;
|
||||
* [upload rules](#upload-rules) - set upload rules using volume flags
|
||||
* [file indexing](#file-indexing) - enables dedup and music search ++
|
||||
* [exclude-patterns](#exclude-patterns) - to save some time
|
||||
* [filesystem guards](#filesystem-guards) - avoid traversing into other filesystems
|
||||
* [periodic rescan](#periodic-rescan) - filesystem monitoring
|
||||
* [upload rules](#upload-rules) - set upload rules using volflags
|
||||
* [compress uploads](#compress-uploads) - files can be autocompressed on upload
|
||||
* [database location](#database-location) - in-volume (`.hist/up2k.db`, default) or somewhere else
|
||||
* [metadata from audio files](#metadata-from-audio-files) - set `-e2t` to index tags on upload
|
||||
|
@ -311,7 +312,7 @@ examples:
|
|||
* `u1` can open the `inc` folder, but cannot see the contents, only upload new files to it
|
||||
* `u2` can browse it and move files *from* `/inc` into any folder where `u2` has write-access
|
||||
* make folder `/mnt/ss` available at `/i`, read-write for u1, get-only for everyone else, and enable accesskeys: `-v /mnt/ss:i:rw,u1:g:c,fk=4`
|
||||
* `c,fk=4` sets the `fk` volume-flag to 4, meaning each file gets a 4-character accesskey
|
||||
* `c,fk=4` sets the `fk` volflag to 4, meaning each file gets a 4-character accesskey
|
||||
* `u1` can upload files, browse the folder, and see the generated accesskeys
|
||||
* other users cannot browse the folder, but can access the files if they have the full file URL with the accesskey
|
||||
|
||||
|
@ -658,7 +659,9 @@ an FTP server can be started using `--ftp 3921`, and/or `--ftps` for explicit T
|
|||
|
||||
## file indexing
|
||||
|
||||
file indexing relies on two database tables, the up2k filetree (`-e2d`) and the metadata tags (`-e2t`), stored in `.hist/up2k.db`. Configuration can be done through arguments, volume flags, or a mix of both.
|
||||
enables dedup and music search ++
|
||||
|
||||
file indexing relies on two database tables, the up2k filetree (`-e2d`) and the metadata tags (`-e2t`), stored in `.hist/up2k.db`. Configuration can be done through arguments, volflags, or a mix of both.
|
||||
|
||||
through arguments:
|
||||
* `-e2d` enables file indexing on upload
|
||||
|
@ -671,7 +674,7 @@ through arguments:
|
|||
* `-e2vu` patches the database with the new hashes from the filesystem
|
||||
* `-e2vp` panics and kills copyparty instead
|
||||
|
||||
the same arguments can be set as volume flags, in addition to `d2d`, `d2ds`, `d2t`, `d2ts`, `d2v` for disabling:
|
||||
the same arguments can be set as volflags, in addition to `d2d`, `d2ds`, `d2t`, `d2ts`, `d2v` for disabling:
|
||||
* `-v ~/music::r:c,e2dsa,e2tsr` does a full reindex of everything on startup
|
||||
* `-v ~/music::r:c,d2d` disables **all** indexing, even if any `-e2*` are on
|
||||
* `-v ~/music::r:c,d2t` disables all `-e2t*` (tags), does not affect `-e2d*`
|
||||
|
@ -685,7 +688,7 @@ note:
|
|||
|
||||
### exclude-patterns
|
||||
|
||||
to save some time, you can provide a regex pattern for filepaths to only index by filename/path/size/last-modified (and not the hash of the file contents) by setting `--no-hash \.iso$` or the volume-flag `:c,nohash=\.iso$`, this has the following consequences:
|
||||
to save some time, you can provide a regex pattern for filepaths to only index by filename/path/size/last-modified (and not the hash of the file contents) by setting `--no-hash \.iso$` or the volflag `:c,nohash=\.iso$`, this has the following consequences:
|
||||
* initial indexing is way faster, especially when the volume is on a network disk
|
||||
* makes it impossible to [file-search](#file-search)
|
||||
* if someone uploads the same file contents, the upload will not be detected as a dupe, so it will not get symlinked or rejected
|
||||
|
@ -694,6 +697,14 @@ similarly, you can fully ignore files/folders using `--no-idx [...]` and `:c,noi
|
|||
|
||||
if you set `--no-hash [...]` globally, you can enable hashing for specific volumes using flag `:c,nohash=`
|
||||
|
||||
### filesystem guards
|
||||
|
||||
avoid traversing into other filesystems using `--xdev` / volflag `:c,xdev`, skipping any symlinks or bind-mounts to another HDD for example
|
||||
|
||||
and/or you can `--xvol` / `:c,xvol` to ignore all symlinks leaving the volume's top directory, but still allow bind-mounts pointing elsewhere
|
||||
|
||||
**NB: only affects the indexer** -- users can still access anything inside a volume, unless shadowed by another volume
|
||||
|
||||
### periodic rescan
|
||||
|
||||
filesystem monitoring; if copyparty is not the only software doing stuff on your filesystem, you may want to enable periodic rescans to keep the index up to date
|
||||
|
@ -705,7 +716,7 @@ uploads are disabled while a rescan is happening, so rescans will be delayed by
|
|||
|
||||
## upload rules
|
||||
|
||||
set upload rules using volume flags, some examples:
|
||||
set upload rules using volflags, some examples:
|
||||
|
||||
* `:c,sz=1k-3m` sets allowed filesize between 1 KiB and 3 MiB inclusive (suffixes: `b`, `k`, `m`, `g`)
|
||||
* `:c,df=4g` block uploads if there would be less than 4 GiB free disk space afterwards
|
||||
|
@ -727,16 +738,16 @@ you can also set transaction limits which apply per-IP and per-volume, but these
|
|||
|
||||
files can be autocompressed on upload, either on user-request (if config allows) or forced by server-config
|
||||
|
||||
* volume flag `gz` allows gz compression
|
||||
* volume flag `xz` allows lzma compression
|
||||
* volume flag `pk` **forces** compression on all files
|
||||
* volflag `gz` allows gz compression
|
||||
* volflag `xz` allows lzma compression
|
||||
* volflag `pk` **forces** compression on all files
|
||||
* url parameter `pk` requests compression with server-default algorithm
|
||||
* url parameter `gz` or `xz` requests compression with a specific algorithm
|
||||
* url parameter `xz` requests xz compression
|
||||
|
||||
things to note,
|
||||
* the `gz` and `xz` arguments take a single optional argument, the compression level (range 0 to 9)
|
||||
* the `pk` volume flag takes the optional argument `ALGORITHM,LEVEL` which will then be forced for all uploads, for example `gz,9` or `xz,0`
|
||||
* the `pk` volflag takes the optional argument `ALGORITHM,LEVEL` which will then be forced for all uploads, for example `gz,9` or `xz,0`
|
||||
* default compression is gzip level 9
|
||||
* all upload methods except up2k are supported
|
||||
* the files will be indexed after compression, so dupe-detection and file-search will not work as expected
|
||||
|
@ -756,7 +767,7 @@ in-volume (`.hist/up2k.db`, default) or somewhere else
|
|||
|
||||
copyparty creates a subfolder named `.hist` inside each volume where it stores the database, thumbnails, and some other stuff
|
||||
|
||||
this can instead be kept in a single place using the `--hist` argument, or the `hist=` volume flag, or a mix of both:
|
||||
this can instead be kept in a single place using the `--hist` argument, or the `hist=` volflag, or a mix of both:
|
||||
* `--hist ~/.cache/copyparty -v ~/music::r:c,hist=-` sets `~/.cache/copyparty` as the default place to put volume info, but `~/music` gets the regular `.hist` subfolder (`-` restores default behavior)
|
||||
|
||||
note:
|
||||
|
@ -794,7 +805,7 @@ see the beautiful mess of a dictionary in [mtag.py](https://github.com/9001/copy
|
|||
|
||||
provide custom parsers to index additional tags, also see [./bin/mtag/README.md](./bin/mtag/README.md)
|
||||
|
||||
copyparty can invoke external programs to collect additional metadata for files using `mtp` (either as argument or volume flag), there is a default timeout of 30sec, and only files which contain audio get analyzed by default (see ay/an/ad below)
|
||||
copyparty can invoke external programs to collect additional metadata for files using `mtp` (either as argument or volflag), there is a default timeout of 30sec, and only files which contain audio get analyzed by default (see ay/an/ad below)
|
||||
|
||||
* `-mtp .bpm=~/bin/audio-bpm.py` will execute `~/bin/audio-bpm.py` with the audio file as argument 1 to provide the `.bpm` tag, if that does not exist in the audio metadata
|
||||
* `-mtp key=f,t5,~/bin/audio-key.py` uses `~/bin/audio-key.py` to get the `key` tag, replacing any existing metadata tag (`f,`), aborting if it takes longer than 5sec (`t5,`)
|
||||
|
@ -835,8 +846,8 @@ if this becomes popular maybe there should be a less janky way to do it actually
|
|||
tell search engines you dont wanna be indexed, either using the good old [robots.txt](https://www.robotstxt.org/robotstxt.html) or through copyparty settings:
|
||||
|
||||
* `--no-robots` adds HTTP (`X-Robots-Tag`) and HTML (`<meta>`) headers with `noindex, nofollow` globally
|
||||
* volume-flag `[...]:c,norobots` does the same thing for that single volume
|
||||
* volume-flag `[...]:c,robots` ALLOWS search-engine crawling for that volume, even if `--no-robots` is set globally
|
||||
* volflag `[...]:c,norobots` does the same thing for that single volume
|
||||
* volflag `[...]:c,robots` ALLOWS search-engine crawling for that volume, even if `--no-robots` is set globally
|
||||
|
||||
also, `--force-js` disables the plain HTML folder listing, making things harder to parse for search engines
|
||||
|
||||
|
@ -1059,7 +1070,7 @@ some notes on hardening
|
|||
other misc notes:
|
||||
|
||||
* you can disable directory listings by giving permission `g` instead of `r`, only accepting direct URLs to files
|
||||
* combine this with volume-flag `c,fk` to generate per-file accesskeys; users which have full read-access will then see URLs with `?k=...` appended to the end, and `g` users must provide that URL including the correct key to avoid a 404
|
||||
* combine this with volflag `c,fk` to generate per-file accesskeys; users which have full read-access will then see URLs with `?k=...` appended to the end, and `g` users must provide that URL including the correct key to avoid a 404
|
||||
|
||||
|
||||
## gotchas
|
||||
|
|
|
@ -42,7 +42,7 @@ run [`install-deps.sh`](install-deps.sh) to build/install most dependencies requ
|
|||
* `mtp` modules will not run if a file has existing tags in the db, so clear out the tags with `-e2tsr` the first time you launch with new `mtp` options
|
||||
|
||||
|
||||
## usage with volume-flags
|
||||
## usage with volflags
|
||||
|
||||
instead of affecting all volumes, you can set the options for just one volume like so:
|
||||
|
||||
|
|
|
@ -397,10 +397,12 @@ def run_argparse(argv: list[str], formatter: Any, retry: bool) -> argparse.Names
|
|||
\033[36md2t\033[35m disables metadata collection, overrides -e2t*
|
||||
\033[36md2v\033[35m disables file verification, overrides -e2v*
|
||||
\033[36md2d\033[35m disables all database stuff, overrides -e2*
|
||||
\033[36mnohash=\\.iso$\033[35m skips hashing file contents if path matches *.iso
|
||||
\033[36mnoidx=\\.iso$\033[35m fully ignores the contents at paths matching *.iso
|
||||
\033[36mhist=/tmp/cdb\033[35m puts thumbnails and indexes at that location
|
||||
\033[36mscan=60\033[35m scan for new files every 60sec, same as --re-maxage
|
||||
\033[36mnohash=\\.iso$\033[35m skips hashing file contents if path matches *.iso
|
||||
\033[36mnoidx=\\.iso$\033[35m fully ignores the contents at paths matching *.iso
|
||||
\033[36mxdev\033[35m do not descend into other filesystems
|
||||
\033[36mxvol\033[35m skip symlinks leaving the volume root
|
||||
|
||||
\033[0mdatabase, audio tags:
|
||||
"mte", "mth", "mtp", "mtm" all work the same as -mte, -mth, ...
|
||||
|
@ -595,6 +597,8 @@ def run_argparse(argv: list[str], formatter: Any, retry: bool) -> argparse.Names
|
|||
ap2.add_argument("--hist", metavar="PATH", type=u, help="where to store volume data (db, thumbs)")
|
||||
ap2.add_argument("--no-hash", metavar="PTN", type=u, help="regex: disable hashing of matching paths during e2ds folder scans")
|
||||
ap2.add_argument("--no-idx", metavar="PTN", type=u, help="regex: disable indexing of matching paths during e2ds folder scans")
|
||||
ap2.add_argument("--xdev", action="store_true", help="do not descend into other filesystems (symlink or bind-mount to another HDD, ...)")
|
||||
ap2.add_argument("--xvol", action="store_true", help="skip symlinks leaving the volume root")
|
||||
ap2.add_argument("--re-maxage", metavar="SEC", type=int, default=0, help="disk rescan volume interval, 0=off, can be set per-volume with the 'scan' volflag")
|
||||
ap2.add_argument("--db-act", metavar="SEC", type=float, default=10, help="defer any scheduled volume reindexing until SEC seconds after last db write (uploads, renames, ...)")
|
||||
ap2.add_argument("--srch-time", metavar="SEC", type=int, default=30, help="search deadline -- terminate searches running for more than SEC seconds")
|
||||
|
|
|
@ -728,12 +728,12 @@ class AuthSrv(object):
|
|||
self, lvl: str, uname: str, axs: AXS, flags: dict[str, Any]
|
||||
) -> None:
|
||||
if lvl.strip("crwmdg"):
|
||||
raise Exception("invalid volume flag: {},{}".format(lvl, uname))
|
||||
raise Exception("invalid volflag: {},{}".format(lvl, uname))
|
||||
|
||||
if lvl == "c":
|
||||
cval: Union[bool, str] = True
|
||||
try:
|
||||
# volume flag with arguments, possibly with a preceding list of bools
|
||||
# volflag with arguments, possibly with a preceding list of bools
|
||||
uname, cval = uname.split("=", 1)
|
||||
except:
|
||||
# just one or more bools
|
||||
|
@ -1066,7 +1066,7 @@ class AuthSrv(object):
|
|||
if ptn:
|
||||
vol.flags[vf] = re.compile(ptn)
|
||||
|
||||
for k in ["e2t", "e2ts", "e2tsr", "e2v", "e2vu", "e2vp"]:
|
||||
for k in ["e2t", "e2ts", "e2tsr", "e2v", "e2vu", "e2vp", "xdev", "xvol"]:
|
||||
if getattr(self.args, k):
|
||||
vol.flags[k] = True
|
||||
|
||||
|
@ -1084,7 +1084,7 @@ class AuthSrv(object):
|
|||
if "mth" not in vol.flags:
|
||||
vol.flags["mth"] = self.args.mth
|
||||
|
||||
# append parsers from argv to volume-flags
|
||||
# append parsers from argv to volflags
|
||||
self._read_volflag(vol.flags, "mtp", self.args.mtp, True)
|
||||
|
||||
# d2d drops all database features for a volume
|
||||
|
@ -1147,7 +1147,7 @@ class AuthSrv(object):
|
|||
|
||||
for mtp in local_only_mtp:
|
||||
if mtp not in local_mte:
|
||||
t = 'volume "/{}" defines metadata tag "{}", but doesnt use it in "-mte" (or with "cmte" in its volume-flags)'
|
||||
t = 'volume "/{}" defines metadata tag "{}", but doesnt use it in "-mte" (or with "cmte" in its volflags)'
|
||||
self.log(t.format(vol.vpath, mtp), 1)
|
||||
errors = True
|
||||
|
||||
|
@ -1156,7 +1156,7 @@ class AuthSrv(object):
|
|||
tags = [y for x in tags for y in x.split(",")]
|
||||
for mtp in tags:
|
||||
if mtp not in all_mte:
|
||||
t = 'metadata tag "{}" is defined by "-mtm" or "-mtp", but is not used by "-mte" (or by any "cmte" volume-flag)'
|
||||
t = 'metadata tag "{}" is defined by "-mtm" or "-mtp", but is not used by "-mte" (or by any "cmte" volflag)'
|
||||
self.log(t.format(mtp), 1)
|
||||
errors = True
|
||||
|
||||
|
|
|
@ -672,6 +672,11 @@ class Up2k(object):
|
|||
top = vol.realpath
|
||||
rei = vol.flags.get("noidx")
|
||||
reh = vol.flags.get("nohash")
|
||||
|
||||
dev = 0
|
||||
if vol.flags.get("xdev"):
|
||||
dev = bos.stat(top).st_dev
|
||||
|
||||
with self.mutex:
|
||||
reg = self.register_vpath(top, vol.flags)
|
||||
assert reg and self.pp
|
||||
|
@ -689,11 +694,25 @@ class Up2k(object):
|
|||
excl += list(self.asrv.vfs.histtab.values())
|
||||
if WINDOWS:
|
||||
excl = [x.replace("/", "\\") for x in excl]
|
||||
else:
|
||||
# ~/.wine/dosdevices/z:/ and such
|
||||
excl += ["/dev", "/proc", "/run", "/sys"]
|
||||
|
||||
rtop = absreal(top)
|
||||
n_add = n_rm = 0
|
||||
try:
|
||||
n_add = self._build_dir(db, top, set(excl), top, rtop, rei, reh, [])
|
||||
n_add = self._build_dir(
|
||||
db,
|
||||
top,
|
||||
set(excl),
|
||||
top,
|
||||
rtop,
|
||||
rei,
|
||||
reh,
|
||||
[],
|
||||
dev,
|
||||
bool(vol.flags.get("xvol")),
|
||||
)
|
||||
n_rm = self._drop_lost(db.c, top, excl)
|
||||
except Exception as ex:
|
||||
db_ex_chk(self.log, ex, db_path)
|
||||
|
@ -717,7 +736,13 @@ class Up2k(object):
|
|||
rei: Optional[Pattern[str]],
|
||||
reh: Optional[Pattern[str]],
|
||||
seen: list[str],
|
||||
dev: int,
|
||||
xvol: bool,
|
||||
) -> int:
|
||||
if xvol and not rcdir.startswith(top):
|
||||
self.log("skip xvol: [{}] -> [{}]".format(top, rcdir), 6)
|
||||
return 0
|
||||
|
||||
if rcdir in seen:
|
||||
t = "bailing from symlink loop,\n prev: {}\n curr: {}\n from: {}"
|
||||
self.log(t.format(seen[-1], rcdir, cdir), 3)
|
||||
|
@ -750,6 +775,9 @@ class Up2k(object):
|
|||
sz = inf.st_size
|
||||
if stat.S_ISDIR(inf.st_mode):
|
||||
rap = absreal(abspath)
|
||||
if dev and inf.st_dev != dev:
|
||||
self.log("skip xdev {}->{}: {}".format(dev, inf.st_dev, abspath), 6)
|
||||
continue
|
||||
if abspath in excl or rap in excl:
|
||||
unreg.append(rp)
|
||||
continue
|
||||
|
@ -758,7 +786,9 @@ class Up2k(object):
|
|||
continue
|
||||
# self.log(" dir: {}".format(abspath))
|
||||
try:
|
||||
ret += self._build_dir(db, top, excl, abspath, rap, rei, reh, seen)
|
||||
ret += self._build_dir(
|
||||
db, top, excl, abspath, rap, rei, reh, seen, dev, xvol
|
||||
)
|
||||
except:
|
||||
t = "failed to index subdir [{}]:\n{}"
|
||||
self.log(t.format(abspath, min_ex()), c=1)
|
||||
|
@ -1109,7 +1139,7 @@ class Up2k(object):
|
|||
with self.mutex:
|
||||
cur.connection.commit()
|
||||
|
||||
# bail if a volume flag disables indexing
|
||||
# bail if a volflag disables indexing
|
||||
if "d2t" in flags or "d2d" in flags:
|
||||
return 0, n_rm, True
|
||||
|
||||
|
|
|
@ -185,7 +185,7 @@ brew install python@2
|
|||
pip install virtualenv
|
||||
|
||||
# readme toc
|
||||
cat README.md | awk 'function pr() { if (!h) {return}; if (/^ *[*!#|]/||!s) {printf "%s\n",h;h=0;return}; if (/.../) {printf "%s - %s\n",h,$0;h=0}; }; /^#/{s=1;pr()} /^#* *(file indexing|exclude-patterns|install on android|dev env setup|just the sfx|complete release|optional gpl stuff)|`$/{s=0} /^#/{lv=length($1);sub(/[^ ]+ /,"");bab=$0;gsub(/ /,"-",bab); h=sprintf("%" ((lv-1)*4+1) "s [%s](#%s)", "*",$0,bab);next} !h{next} {sub(/ .*/,"");sub(/[:,]$/,"")} {pr()}' > toc; grep -E '^## readme toc' -B1000 -A2 <README.md >p1; grep -E '^## quickstart' -B2 -A999999 <README.md >p2; (cat p1; grep quickstart -A1000 <toc; cat p2) >README.md; rm p1 p2 toc
|
||||
cat README.md | awk 'function pr() { if (!h) {return}; if (/^ *[*!#|]/||!s) {printf "%s\n",h;h=0;return}; if (/.../) {printf "%s - %s\n",h,$0;h=0}; }; /^#/{s=1;pr()} /^#* *(install on android|dev env setup|just the sfx|complete release|optional gpl stuff)|`$/{s=0} /^#/{lv=length($1);sub(/[^ ]+ /,"");bab=$0;gsub(/ /,"-",bab); h=sprintf("%" ((lv-1)*4+1) "s [%s](#%s)", "*",$0,bab);next} !h{next} {sub(/ .*/,"");sub(/[:;,]$/,"")} {pr()}' > toc; grep -E '^## readme toc' -B1000 -A2 <README.md >p1; grep -E '^## quickstart' -B2 -A999999 <README.md >p2; (cat p1; grep quickstart -A1000 <toc; cat p2) >README.md; rm p1 p2 toc
|
||||
|
||||
# fix firefox phantom breakpoints,
|
||||
# suggestions from bugtracker, doesnt work (debugger is not attachable)
|
||||
|
|
Loading…
Reference in a new issue