diff --git a/.gitignore b/.gitignore index 34b9433f..6227e9bf 100644 --- a/.gitignore +++ b/.gitignore @@ -13,6 +13,7 @@ copyparty.egg-info/ /py2/ /sfx/ /unt/ +/log/ # ide *.sublime-workspace diff --git a/README.md b/README.md index a0aea44a..3f352c9c 100644 --- a/README.md +++ b/README.md @@ -56,10 +56,11 @@ try the **[read-only demo server](https://a.ocv.me/pub/demo/)** 👀 running fro * [searching](#searching) - search by size, date, path/name, mp3-tags, ... * [server config](#server-config) - using arguments or config files, or a mix of both * [ftp-server](#ftp-server) - an FTP server can be started using `--ftp 3921` - * [file indexing](#file-indexing) - * [exclude-patterns](#exclude-patterns) - * [periodic rescan](#periodic-rescan) - filesystem monitoring; - * [upload rules](#upload-rules) - set upload rules using volume flags + * [file indexing](#file-indexing) - enables dedup and music search ++ + * [exclude-patterns](#exclude-patterns) - to save some time + * [filesystem guards](#filesystem-guards) - avoid traversing into other filesystems + * [periodic rescan](#periodic-rescan) - filesystem monitoring + * [upload rules](#upload-rules) - set upload rules using volflags * [compress uploads](#compress-uploads) - files can be autocompressed on upload * [database location](#database-location) - in-volume (`.hist/up2k.db`, default) or somewhere else * [metadata from audio files](#metadata-from-audio-files) - set `-e2t` to index tags on upload @@ -311,7 +312,7 @@ examples: * `u1` can open the `inc` folder, but cannot see the contents, only upload new files to it * `u2` can browse it and move files *from* `/inc` into any folder where `u2` has write-access * make folder `/mnt/ss` available at `/i`, read-write for u1, get-only for everyone else, and enable accesskeys: `-v /mnt/ss:i:rw,u1:g:c,fk=4` - * `c,fk=4` sets the `fk` volume-flag to 4, meaning each file gets a 4-character accesskey + * `c,fk=4` sets the `fk` volflag to 4, meaning each file gets a 4-character accesskey * `u1` can upload files, browse the folder, and see the generated accesskeys * other users cannot browse the folder, but can access the files if they have the full file URL with the accesskey @@ -658,7 +659,9 @@ an FTP server can be started using `--ftp 3921`, and/or `--ftps` for explicit T ## file indexing -file indexing relies on two database tables, the up2k filetree (`-e2d`) and the metadata tags (`-e2t`), stored in `.hist/up2k.db`. Configuration can be done through arguments, volume flags, or a mix of both. +enables dedup and music search ++ + +file indexing relies on two database tables, the up2k filetree (`-e2d`) and the metadata tags (`-e2t`), stored in `.hist/up2k.db`. Configuration can be done through arguments, volflags, or a mix of both. through arguments: * `-e2d` enables file indexing on upload @@ -671,7 +674,7 @@ through arguments: * `-e2vu` patches the database with the new hashes from the filesystem * `-e2vp` panics and kills copyparty instead -the same arguments can be set as volume flags, in addition to `d2d`, `d2ds`, `d2t`, `d2ts`, `d2v` for disabling: +the same arguments can be set as volflags, in addition to `d2d`, `d2ds`, `d2t`, `d2ts`, `d2v` for disabling: * `-v ~/music::r:c,e2dsa,e2tsr` does a full reindex of everything on startup * `-v ~/music::r:c,d2d` disables **all** indexing, even if any `-e2*` are on * `-v ~/music::r:c,d2t` disables all `-e2t*` (tags), does not affect `-e2d*` @@ -685,7 +688,7 @@ note: ### exclude-patterns -to save some time, you can provide a regex pattern for filepaths to only index by filename/path/size/last-modified (and not the hash of the file contents) by setting `--no-hash \.iso$` or the volume-flag `:c,nohash=\.iso$`, this has the following consequences: +to save some time, you can provide a regex pattern for filepaths to only index by filename/path/size/last-modified (and not the hash of the file contents) by setting `--no-hash \.iso$` or the volflag `:c,nohash=\.iso$`, this has the following consequences: * initial indexing is way faster, especially when the volume is on a network disk * makes it impossible to [file-search](#file-search) * if someone uploads the same file contents, the upload will not be detected as a dupe, so it will not get symlinked or rejected @@ -694,6 +697,14 @@ similarly, you can fully ignore files/folders using `--no-idx [...]` and `:c,noi if you set `--no-hash [...]` globally, you can enable hashing for specific volumes using flag `:c,nohash=` +### filesystem guards + +avoid traversing into other filesystems using `--xdev` / volflag `:c,xdev`, skipping any symlinks or bind-mounts to another HDD for example + +and/or you can `--xvol` / `:c,xvol` to ignore all symlinks leaving the volume's top directory, but still allow bind-mounts pointing elsewhere + +**NB: only affects the indexer** -- users can still access anything inside a volume, unless shadowed by another volume + ### periodic rescan filesystem monitoring; if copyparty is not the only software doing stuff on your filesystem, you may want to enable periodic rescans to keep the index up to date @@ -705,7 +716,7 @@ uploads are disabled while a rescan is happening, so rescans will be delayed by ## upload rules -set upload rules using volume flags, some examples: +set upload rules using volflags, some examples: * `:c,sz=1k-3m` sets allowed filesize between 1 KiB and 3 MiB inclusive (suffixes: `b`, `k`, `m`, `g`) * `:c,df=4g` block uploads if there would be less than 4 GiB free disk space afterwards @@ -727,16 +738,16 @@ you can also set transaction limits which apply per-IP and per-volume, but these files can be autocompressed on upload, either on user-request (if config allows) or forced by server-config -* volume flag `gz` allows gz compression -* volume flag `xz` allows lzma compression -* volume flag `pk` **forces** compression on all files +* volflag `gz` allows gz compression +* volflag `xz` allows lzma compression +* volflag `pk` **forces** compression on all files * url parameter `pk` requests compression with server-default algorithm * url parameter `gz` or `xz` requests compression with a specific algorithm * url parameter `xz` requests xz compression things to note, * the `gz` and `xz` arguments take a single optional argument, the compression level (range 0 to 9) -* the `pk` volume flag takes the optional argument `ALGORITHM,LEVEL` which will then be forced for all uploads, for example `gz,9` or `xz,0` +* the `pk` volflag takes the optional argument `ALGORITHM,LEVEL` which will then be forced for all uploads, for example `gz,9` or `xz,0` * default compression is gzip level 9 * all upload methods except up2k are supported * the files will be indexed after compression, so dupe-detection and file-search will not work as expected @@ -756,7 +767,7 @@ in-volume (`.hist/up2k.db`, default) or somewhere else copyparty creates a subfolder named `.hist` inside each volume where it stores the database, thumbnails, and some other stuff -this can instead be kept in a single place using the `--hist` argument, or the `hist=` volume flag, or a mix of both: +this can instead be kept in a single place using the `--hist` argument, or the `hist=` volflag, or a mix of both: * `--hist ~/.cache/copyparty -v ~/music::r:c,hist=-` sets `~/.cache/copyparty` as the default place to put volume info, but `~/music` gets the regular `.hist` subfolder (`-` restores default behavior) note: @@ -794,7 +805,7 @@ see the beautiful mess of a dictionary in [mtag.py](https://github.com/9001/copy provide custom parsers to index additional tags, also see [./bin/mtag/README.md](./bin/mtag/README.md) -copyparty can invoke external programs to collect additional metadata for files using `mtp` (either as argument or volume flag), there is a default timeout of 30sec, and only files which contain audio get analyzed by default (see ay/an/ad below) +copyparty can invoke external programs to collect additional metadata for files using `mtp` (either as argument or volflag), there is a default timeout of 30sec, and only files which contain audio get analyzed by default (see ay/an/ad below) * `-mtp .bpm=~/bin/audio-bpm.py` will execute `~/bin/audio-bpm.py` with the audio file as argument 1 to provide the `.bpm` tag, if that does not exist in the audio metadata * `-mtp key=f,t5,~/bin/audio-key.py` uses `~/bin/audio-key.py` to get the `key` tag, replacing any existing metadata tag (`f,`), aborting if it takes longer than 5sec (`t5,`) @@ -835,8 +846,8 @@ if this becomes popular maybe there should be a less janky way to do it actually tell search engines you dont wanna be indexed, either using the good old [robots.txt](https://www.robotstxt.org/robotstxt.html) or through copyparty settings: * `--no-robots` adds HTTP (`X-Robots-Tag`) and HTML (``) headers with `noindex, nofollow` globally -* volume-flag `[...]:c,norobots` does the same thing for that single volume -* volume-flag `[...]:c,robots` ALLOWS search-engine crawling for that volume, even if `--no-robots` is set globally +* volflag `[...]:c,norobots` does the same thing for that single volume +* volflag `[...]:c,robots` ALLOWS search-engine crawling for that volume, even if `--no-robots` is set globally also, `--force-js` disables the plain HTML folder listing, making things harder to parse for search engines @@ -1059,7 +1070,7 @@ some notes on hardening other misc notes: * you can disable directory listings by giving permission `g` instead of `r`, only accepting direct URLs to files - * combine this with volume-flag `c,fk` to generate per-file accesskeys; users which have full read-access will then see URLs with `?k=...` appended to the end, and `g` users must provide that URL including the correct key to avoid a 404 + * combine this with volflag `c,fk` to generate per-file accesskeys; users which have full read-access will then see URLs with `?k=...` appended to the end, and `g` users must provide that URL including the correct key to avoid a 404 ## gotchas diff --git a/bin/mtag/README.md b/bin/mtag/README.md index db9c9ca7..6361ae92 100644 --- a/bin/mtag/README.md +++ b/bin/mtag/README.md @@ -42,7 +42,7 @@ run [`install-deps.sh`](install-deps.sh) to build/install most dependencies requ * `mtp` modules will not run if a file has existing tags in the db, so clear out the tags with `-e2tsr` the first time you launch with new `mtp` options -## usage with volume-flags +## usage with volflags instead of affecting all volumes, you can set the options for just one volume like so: diff --git a/copyparty/__main__.py b/copyparty/__main__.py index f92542c5..56ffbf21 100644 --- a/copyparty/__main__.py +++ b/copyparty/__main__.py @@ -397,10 +397,12 @@ def run_argparse(argv: list[str], formatter: Any, retry: bool) -> argparse.Names \033[36md2t\033[35m disables metadata collection, overrides -e2t* \033[36md2v\033[35m disables file verification, overrides -e2v* \033[36md2d\033[35m disables all database stuff, overrides -e2* - \033[36mnohash=\\.iso$\033[35m skips hashing file contents if path matches *.iso - \033[36mnoidx=\\.iso$\033[35m fully ignores the contents at paths matching *.iso \033[36mhist=/tmp/cdb\033[35m puts thumbnails and indexes at that location \033[36mscan=60\033[35m scan for new files every 60sec, same as --re-maxage + \033[36mnohash=\\.iso$\033[35m skips hashing file contents if path matches *.iso + \033[36mnoidx=\\.iso$\033[35m fully ignores the contents at paths matching *.iso + \033[36mxdev\033[35m do not descend into other filesystems + \033[36mxvol\033[35m skip symlinks leaving the volume root \033[0mdatabase, audio tags: "mte", "mth", "mtp", "mtm" all work the same as -mte, -mth, ... @@ -595,6 +597,8 @@ def run_argparse(argv: list[str], formatter: Any, retry: bool) -> argparse.Names ap2.add_argument("--hist", metavar="PATH", type=u, help="where to store volume data (db, thumbs)") ap2.add_argument("--no-hash", metavar="PTN", type=u, help="regex: disable hashing of matching paths during e2ds folder scans") ap2.add_argument("--no-idx", metavar="PTN", type=u, help="regex: disable indexing of matching paths during e2ds folder scans") + ap2.add_argument("--xdev", action="store_true", help="do not descend into other filesystems (symlink or bind-mount to another HDD, ...)") + ap2.add_argument("--xvol", action="store_true", help="skip symlinks leaving the volume root") ap2.add_argument("--re-maxage", metavar="SEC", type=int, default=0, help="disk rescan volume interval, 0=off, can be set per-volume with the 'scan' volflag") ap2.add_argument("--db-act", metavar="SEC", type=float, default=10, help="defer any scheduled volume reindexing until SEC seconds after last db write (uploads, renames, ...)") ap2.add_argument("--srch-time", metavar="SEC", type=int, default=30, help="search deadline -- terminate searches running for more than SEC seconds") diff --git a/copyparty/authsrv.py b/copyparty/authsrv.py index 3a1d5eb0..d3740a41 100644 --- a/copyparty/authsrv.py +++ b/copyparty/authsrv.py @@ -728,12 +728,12 @@ class AuthSrv(object): self, lvl: str, uname: str, axs: AXS, flags: dict[str, Any] ) -> None: if lvl.strip("crwmdg"): - raise Exception("invalid volume flag: {},{}".format(lvl, uname)) + raise Exception("invalid volflag: {},{}".format(lvl, uname)) if lvl == "c": cval: Union[bool, str] = True try: - # volume flag with arguments, possibly with a preceding list of bools + # volflag with arguments, possibly with a preceding list of bools uname, cval = uname.split("=", 1) except: # just one or more bools @@ -1066,7 +1066,7 @@ class AuthSrv(object): if ptn: vol.flags[vf] = re.compile(ptn) - for k in ["e2t", "e2ts", "e2tsr", "e2v", "e2vu", "e2vp"]: + for k in ["e2t", "e2ts", "e2tsr", "e2v", "e2vu", "e2vp", "xdev", "xvol"]: if getattr(self.args, k): vol.flags[k] = True @@ -1084,7 +1084,7 @@ class AuthSrv(object): if "mth" not in vol.flags: vol.flags["mth"] = self.args.mth - # append parsers from argv to volume-flags + # append parsers from argv to volflags self._read_volflag(vol.flags, "mtp", self.args.mtp, True) # d2d drops all database features for a volume @@ -1147,7 +1147,7 @@ class AuthSrv(object): for mtp in local_only_mtp: if mtp not in local_mte: - t = 'volume "/{}" defines metadata tag "{}", but doesnt use it in "-mte" (or with "cmte" in its volume-flags)' + t = 'volume "/{}" defines metadata tag "{}", but doesnt use it in "-mte" (or with "cmte" in its volflags)' self.log(t.format(vol.vpath, mtp), 1) errors = True @@ -1156,7 +1156,7 @@ class AuthSrv(object): tags = [y for x in tags for y in x.split(",")] for mtp in tags: if mtp not in all_mte: - t = 'metadata tag "{}" is defined by "-mtm" or "-mtp", but is not used by "-mte" (or by any "cmte" volume-flag)' + t = 'metadata tag "{}" is defined by "-mtm" or "-mtp", but is not used by "-mte" (or by any "cmte" volflag)' self.log(t.format(mtp), 1) errors = True diff --git a/copyparty/up2k.py b/copyparty/up2k.py index eabbe526..01a32c43 100644 --- a/copyparty/up2k.py +++ b/copyparty/up2k.py @@ -672,6 +672,11 @@ class Up2k(object): top = vol.realpath rei = vol.flags.get("noidx") reh = vol.flags.get("nohash") + + dev = 0 + if vol.flags.get("xdev"): + dev = bos.stat(top).st_dev + with self.mutex: reg = self.register_vpath(top, vol.flags) assert reg and self.pp @@ -689,11 +694,25 @@ class Up2k(object): excl += list(self.asrv.vfs.histtab.values()) if WINDOWS: excl = [x.replace("/", "\\") for x in excl] + else: + # ~/.wine/dosdevices/z:/ and such + excl += ["/dev", "/proc", "/run", "/sys"] rtop = absreal(top) n_add = n_rm = 0 try: - n_add = self._build_dir(db, top, set(excl), top, rtop, rei, reh, []) + n_add = self._build_dir( + db, + top, + set(excl), + top, + rtop, + rei, + reh, + [], + dev, + bool(vol.flags.get("xvol")), + ) n_rm = self._drop_lost(db.c, top, excl) except Exception as ex: db_ex_chk(self.log, ex, db_path) @@ -717,7 +736,13 @@ class Up2k(object): rei: Optional[Pattern[str]], reh: Optional[Pattern[str]], seen: list[str], + dev: int, + xvol: bool, ) -> int: + if xvol and not rcdir.startswith(top): + self.log("skip xvol: [{}] -> [{}]".format(top, rcdir), 6) + return 0 + if rcdir in seen: t = "bailing from symlink loop,\n prev: {}\n curr: {}\n from: {}" self.log(t.format(seen[-1], rcdir, cdir), 3) @@ -750,6 +775,9 @@ class Up2k(object): sz = inf.st_size if stat.S_ISDIR(inf.st_mode): rap = absreal(abspath) + if dev and inf.st_dev != dev: + self.log("skip xdev {}->{}: {}".format(dev, inf.st_dev, abspath), 6) + continue if abspath in excl or rap in excl: unreg.append(rp) continue @@ -758,7 +786,9 @@ class Up2k(object): continue # self.log(" dir: {}".format(abspath)) try: - ret += self._build_dir(db, top, excl, abspath, rap, rei, reh, seen) + ret += self._build_dir( + db, top, excl, abspath, rap, rei, reh, seen, dev, xvol + ) except: t = "failed to index subdir [{}]:\n{}" self.log(t.format(abspath, min_ex()), c=1) @@ -1109,7 +1139,7 @@ class Up2k(object): with self.mutex: cur.connection.commit() - # bail if a volume flag disables indexing + # bail if a volflag disables indexing if "d2t" in flags or "d2d" in flags: return 0, n_rm, True diff --git a/docs/notes.sh b/docs/notes.sh index 54054214..27d0e3ee 100644 --- a/docs/notes.sh +++ b/docs/notes.sh @@ -185,7 +185,7 @@ brew install python@2 pip install virtualenv # readme toc -cat README.md | awk 'function pr() { if (!h) {return}; if (/^ *[*!#|]/||!s) {printf "%s\n",h;h=0;return}; if (/.../) {printf "%s - %s\n",h,$0;h=0}; }; /^#/{s=1;pr()} /^#* *(file indexing|exclude-patterns|install on android|dev env setup|just the sfx|complete release|optional gpl stuff)|`$/{s=0} /^#/{lv=length($1);sub(/[^ ]+ /,"");bab=$0;gsub(/ /,"-",bab); h=sprintf("%" ((lv-1)*4+1) "s [%s](#%s)", "*",$0,bab);next} !h{next} {sub(/ .*/,"");sub(/[:,]$/,"")} {pr()}' > toc; grep -E '^## readme toc' -B1000 -A2 p1; grep -E '^## quickstart' -B2 -A999999 p2; (cat p1; grep quickstart -A1000 README.md; rm p1 p2 toc +cat README.md | awk 'function pr() { if (!h) {return}; if (/^ *[*!#|]/||!s) {printf "%s\n",h;h=0;return}; if (/.../) {printf "%s - %s\n",h,$0;h=0}; }; /^#/{s=1;pr()} /^#* *(install on android|dev env setup|just the sfx|complete release|optional gpl stuff)|`$/{s=0} /^#/{lv=length($1);sub(/[^ ]+ /,"");bab=$0;gsub(/ /,"-",bab); h=sprintf("%" ((lv-1)*4+1) "s [%s](#%s)", "*",$0,bab);next} !h{next} {sub(/ .*/,"");sub(/[:;,]$/,"")} {pr()}' > toc; grep -E '^## readme toc' -B1000 -A2 p1; grep -E '^## quickstart' -B2 -A999999 p2; (cat p1; grep quickstart -A1000 README.md; rm p1 p2 toc # fix firefox phantom breakpoints, # suggestions from bugtracker, doesnt work (debugger is not attachable)