mirror of https://github.com/9001/copyparty.git (synced 2025-08-17 09:02:15 -06:00)
disable upload deduplication by default;
dedup is still encouraged and fully supported, but being default-enabled has caused too many surprises

enabling `--dedup` restores the previous default behavior

also renames `--never-symlink` to `--hardlink-only`
parent 1111153f06
commit a2e0f98693
39
README.md
@@ -65,7 +65,8 @@ turn almost any device into a file server with resumable uploads/downloads using
 * [smb server](#smb-server) - unsafe, slow, not recommended for wan
 * [browser ux](#browser-ux) - tweaking the ui
 * [opengraph](#opengraph) - discord and social-media embeds
-* [file indexing](#file-indexing) - enables dedup and music search ++
+* [file deduplication](#file-deduplication) - enable symlink-based upload deduplication
+* [file indexing](#file-indexing) - enable music search, upload-undo, and better dedup
 * [exclude-patterns](#exclude-patterns) - to save some time
 * [filesystem guards](#filesystem-guards) - avoid traversing into other filesystems
 * [periodic rescan](#periodic-rescan) - filesystem monitoring
@@ -1155,9 +1156,41 @@ NOTE: because discord (and maybe others) strip query args such as `?raw` in open
 if you want to entirely replace the copyparty response with your own jinja2 template, give the template filepath to `--og-tpl` or volflag `og_tpl` (all members of `HttpCli` are available through the `this` object)
 
 
+## file deduplication
+
+enable symlink-based upload deduplication globally with `--dedup` or per-volume with volflag `dedup`
+
+when someone tries to upload a file that already exists on the server, the upload will be politely declined and a symlink is created instead, pointing to the nearest copy on disk, thus reducing disk space usage
+
+**warning:** when enabling dedup, you should also:
+* enable indexing with `-e2dsa` or volflag `e2dsa` (see [file indexing](#file-indexing) section below); strongly recommended
+* ...and/or `--hardlink-only` to use hardlink-based deduplication instead of symlinks; see explanation below
+
+it will not be safe to rename/delete files if you only enable dedup and none of the above; if you enable indexing then it is not *necessary* to also do hardlinks (but you may still want to)
+
+by default, deduplication is done based on symlinks (symbolic links); these are tiny files which are pointers to the nearest full copy of the file
+
+you can choose to use hardlinks instead of softlinks, globally with `--hardlink-only` or volflag `hardlinkonly`;
+
+advantages of using hardlinks:
+* hardlinks are more compatible with other software; they behave entirely like regular files
+* you can safely move and rename files using other file managers
+* symlinks need to be managed by copyparty to ensure the destinations remain correct
+
+advantages of using symlinks (default):
+* each symlink can have its own last-modified timestamp, but a single timestamp is shared by all hardlinks
+* symlinks make it more obvious to other software that the file is not a regular file, so this can be less dangerous
+* hardlinks look like regular files, so other software may assume they are safe to edit without affecting the other copies
+
+**warning:** if you edit the contents of a deduplicated file, then you will also edit all other copies of that file! This is especially surprising with hardlinks, because they look like regular files, but that same file exists in multiple locations
+
+global-option `--xlink` / volflag `xlink` additionally enables deduplication across volumes, but this is probably buggy and not recommended
+
+
+
 ## file indexing
 
-enables dedup and music search ++
+enable music search, upload-undo, and better dedup
 
 file indexing relies on two database tables, the up2k filetree (`-e2d`) and the metadata tags (`-e2t`), stored in `.hist/up2k.db`. Configuration can be done through arguments, volflags, or a mix of both.
 
@@ -1171,7 +1204,6 @@ through arguments:
 * `-e2v` verifies file integrity at startup, comparing hashes from the db
 * `-e2vu` patches the database with the new hashes from the filesystem
 * `-e2vp` panics and kills copyparty instead
-* `--xlink` enables deduplication across volumes
 
 the same arguments can be set as volflags, in addition to `d2d`, `d2ds`, `d2t`, `d2ts`, `d2v` for disabling:
 * `-v ~/music::r:c,e2ds,e2tsr` does a full reindex of everything on startup
@@ -1184,7 +1216,6 @@ note:
 * upload-times can be displayed in the file listing by enabling the `.up_at` metadata key, either globally with `-e2d -mte +.up_at` or per-volume with volflags `e2d,mte=+.up_at` (will have a ~17% performance impact on directory listings)
 * `e2tsr` is probably always overkill, since `e2ds`/`e2dsa` would pick up any file modifications and `e2ts` would then reindex those, unless there is a new copyparty version with new parsers and the release note says otherwise
 * the rescan button in the admin panel has no effect unless the volume has `-e2ds` or higher
-* deduplication is possible on windows if you run copyparty as administrator (not saying you should!)
 
 ### exclude-patterns
 

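The README's comparison of hardlinks and symlinks can be checked with a short standalone script. This is a sketch, not copyparty code: `demo` is a hypothetical helper, and it assumes a filesystem where the current user may create both link types (on windows that usually means running as administrator, as the `win10 allows symlinks if admin` comment further down suggests).

```python
import os
import tempfile


def demo() -> dict:
    """Create one file plus a hardlinked and a symlinked duplicate, then compare them."""
    with tempfile.TemporaryDirectory() as d:
        orig = os.path.join(d, "orig.bin")
        hard = os.path.join(d, "dupe-hard.bin")
        soft = os.path.join(d, "dupe-soft.bin")

        with open(orig, "wb") as f:
            f.write(b"hello")

        os.link(orig, hard)  # a second name for the same inode
        os.symlink("orig.bin", soft)  # a tiny pointer, resolved relative to d

        result = {
            "hard_same_inode": os.stat(orig).st_ino == os.stat(hard).st_ino,
            "hard_looks_regular": not os.path.islink(hard),
            "soft_is_link": os.path.islink(soft),
            "link_count": os.stat(orig).st_nlink,
        }

        # editing any "copy" edits the shared data behind every name
        with open(hard, "ab") as f:
            f.write(b" world")
        with open(orig, "rb") as f:
            result["orig_after_edit"] = f.read()
        with open(soft, "rb") as f:
            result["soft_after_edit"] = f.read()

        # hardlinks share one mtime; the symlink itself keeps its own (see lstat)
        os.utime(hard, (1_000_000, 1_000_000))
        result["shared_mtime"] = os.stat(orig).st_mtime == 1_000_000
        result["soft_own_mtime"] = os.lstat(soft).st_mtime != 1_000_000
        return result


if __name__ == "__main__":
    print(demo())
```

On Linux it should report the same inode and a link count of 2 for the hardlink, the edited contents through all three names, and a timestamp change on the hardlink showing up on the original while the symlink's own `lstat` time stays put.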
@@ -992,10 +992,10 @@ def add_upload(ap):
 ap2.add_argument("--reg-cap", metavar="N", type=int, default=38400, help="max number of uploads to keep in memory when running without \033[33m-e2d\033[0m; roughly 1 MiB RAM per 600")
 ap2.add_argument("--no-fpool", action="store_true", help="disable file-handle pooling -- instead, repeatedly close and reopen files during upload (bad idea to enable this on windows and/or cow filesystems)")
 ap2.add_argument("--use-fpool", action="store_true", help="force file-handle pooling, even when it might be dangerous (multiprocessing, filesystems lacking sparse-files support, ...)")
+ap2.add_argument("--dedup", action="store_true", help="enable symlink-based upload deduplication (volflag=dedup)")
 ap2.add_argument("--safe-dedup", metavar="N", type=int, default=50, help="how careful to be when deduplicating files; [\033[32m1\033[0m] = just verify the filesize, [\033[32m50\033[0m] = verify file contents have not been altered (volflag=safededup)")
-ap2.add_argument("--hardlink", action="store_true", help="prefer hardlinks instead of symlinks when possible (within same filesystem) (volflag=hardlink)")
-ap2.add_argument("--never-symlink", action="store_true", help="do not fallback to symlinks when a hardlink cannot be made (volflag=neversymlink)")
-ap2.add_argument("--no-dedup", action="store_true", help="disable symlink/hardlink creation; copy file contents instead (volflag=copydupes)")
+ap2.add_argument("--hardlink", action="store_true", help="enable hardlink-based dedup; will fallback on symlinks when that is impossible (across filesystems) (volflag=hardlink)")
+ap2.add_argument("--hardlink-only", action="store_true", help="do not fallback to symlinks when a hardlink cannot be made (volflag=hardlinkonly)")
 ap2.add_argument("--no-dupe", action="store_true", help="reject duplicate files during upload; only matches within the same volume (volflag=nodupe)")
 ap2.add_argument("--no-snap", action="store_true", help="disable snapshots -- forget unfinished uploads on shutdown; don't create .hist/up2k.snap files -- abandoned/interrupted uploads must be cleaned up manually")
 ap2.add_argument("--snap-wri", metavar="SEC", type=int, default=300, help="write upload state to ./hist/up2k.snap every \033[33mSEC\033[0m seconds; allows resuming incomplete uploads after a server crash")
@@ -1345,7 +1345,7 @@ def add_transcoding(ap):
 def add_db_general(ap, hcores):
 noidx = APPLESAN_TXT if MACOS else ""
 ap2 = ap.add_argument_group('general db options')
-ap2.add_argument("-e2d", action="store_true", help="enable up2k database, making files searchable + enables upload deduplication")
+ap2.add_argument("-e2d", action="store_true", help="enable up2k database; this enables file search, upload-undo, improves deduplication")
 ap2.add_argument("-e2ds", action="store_true", help="scan writable folders for new files on startup; sets \033[33m-e2d\033[0m")
 ap2.add_argument("-e2dsa", action="store_true", help="scans all folders on startup; sets \033[33m-e2ds\033[0m")
 ap2.add_argument("-e2v", action="store_true", help="verify file integrity; rehash all files and compare with db")
@@ -1358,7 +1358,7 @@ def add_db_general(ap, hcores):
 ap2.add_argument("--re-dhash", action="store_true", help="force a cache rebuild on startup; enable this once if it gets out of sync (should never be necessary)")
 ap2.add_argument("--no-forget", action="store_true", help="never forget indexed files, even when deleted from disk -- makes it impossible to ever upload the same file twice -- only useful for offloading uploads to a cloud service or something (volflag=noforget)")
 ap2.add_argument("--dbd", metavar="PROFILE", default="wal", help="database durability profile; sets the tradeoff between robustness and speed, see \033[33m--help-dbd\033[0m (volflag=dbd)")
-ap2.add_argument("--xlink", action="store_true", help="on upload: check all volumes for dupes, not just the target volume (volflag=xlink)")
+ap2.add_argument("--xlink", action="store_true", help="on upload: check all volumes for dupes, not just the target volume (probably buggy, not recommended) (volflag=xlink)")
 ap2.add_argument("--hash-mt", metavar="CORES", type=int, default=hcores, help="num cpu cores to use for file hashing; set 0 or 1 for single-core hashing")
 ap2.add_argument("--re-maxage", metavar="SEC", type=int, default=0, help="rescan filesystem for changes every \033[33mSEC\033[0m seconds; 0=off (volflag=scan)")
 ap2.add_argument("--db-act", metavar="SEC", type=float, default=10.0, help="defer any scheduled volume reindexing until \033[33mSEC\033[0m seconds after last db write (uploads, renames, ...)")
@@ -1621,6 +1621,7 @@ def main(argv: Optional[list[str]] = None, rsrc: Optional[str] = None) -> None:
 ("--hdr-au-usr", "--idp-h-usr"),
 ("--idp-h-sep", "--idp-gsep"),
 ("--th-no-crop", "--th-crop=n"),
+("--never-symlink", "--hardlink-only"),
 ]
 for dk, nk in deprecated:
 idx = -1
@@ -1645,7 +1646,7 @@ def main(argv: Optional[list[str]] = None, rsrc: Optional[str] = None) -> None:
 argv.extend(["--qr"])
 if ANYWIN or not os.geteuid():
 # win10 allows symlinks if admin; can be unexpected
-argv.extend(["-p80,443,3923", "--ign-ebind", "--no-dedup"])
+argv.extend(["-p80,443,3923", "--ign-ebind"])
 except:
 pass
 

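The body of the `for dk, nk in deprecated:` loop is cut off in this view. A hypothetical sketch of what such a rename pass could look like, using a subset of the `(deprecated, replacement)` pairs from the hunk; `rewrite_deprecated` and the warning text are assumptions, not copyparty's implementation.

```python
import sys

# subset of the (deprecated, replacement) pairs from the `deprecated` list above
DEPRECATED = [
    ("--idp-h-sep", "--idp-gsep"),
    ("--th-no-crop", "--th-crop=n"),
    ("--never-symlink", "--hardlink-only"),
]


def rewrite_deprecated(argv: list[str]) -> list[str]:
    """Return a copy of argv where deprecated option names are swapped for their replacements."""
    out = []
    for arg in argv:
        name, sep, value = arg.partition("=")  # handles both --opt and --opt=value
        for old, new in DEPRECATED:
            if name == old:
                print(f"warning: {old} is deprecated; use {new} instead", file=sys.stderr)
                arg = new + sep + value  # sep and value are "" when there was no "="
                break
        out.append(arg)
    return out


if __name__ == "__main__":
    print(rewrite_deprecated(["-e2dsa", "--never-symlink"]))
```

Values passed as a separate argv element (`--idp-h-sep |`) are left untouched and still follow the renamed option, so only the option name itself needs rewriting.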
@@ -1891,6 +1891,11 @@ class AuthSrv(object):
 if len(zs) == 3: # fc5 => ffcc55
 vol.flags["tcolor"] = "".join([x * 2 for x in zs])
 
+if vol.flags.get("neversymlink"):
+vol.flags["hardlinkonly"] = True # was renamed
+if vol.flags.get("hardlinkonly"):
+vol.flags["hardlink"] = True
+
 for k1, k2 in IMPLICATIONS:
 if k1 in vol.flags:
 vol.flags[k2] = True
@@ -1995,9 +2000,6 @@ class AuthSrv(object):
 for x in drop:
 vol.flags.pop(x)
 
-if vol.flags.get("neversymlink") and not vol.flags.get("hardlink"):
-vol.flags["copydupes"] = True
-
 # verify tags mentioned by -mt[mp] are used by -mte
 local_mtp = {}
 local_only_mtp = {}
@@ -2076,6 +2078,8 @@ class AuthSrv(object):
 
 have_e2d = False
 have_e2t = False
+have_dedup = False
+unsafe_dedup = []
 t = "volumes and permissions:\n"
 for zv in vfs.all_vols.values():
 if not self.warn_anonwrite or verbosity < 5:
@@ -2108,6 +2112,11 @@ class AuthSrv(object):
 if "e2t" in zv.flags:
 have_e2t = True
 
+if "dedup" in zv.flags:
+have_dedup = True
+if "e2d" not in zv.flags and "hardlink" not in zv.flags:
+unsafe_dedup.append("/" + zv.vpath)
+
 t += "\n"
 
 if self.warn_anonwrite and verbosity > 4:
@@ -2120,10 +2129,17 @@ class AuthSrv(object):
 self.log("\n\033[{}\033[0m\n".format(t))
 
 if not have_e2t:
-t = "hint: argument -e2ts enables multimedia indexing (artist/title/...)"
+t = "hint: enable multimedia indexing (artist/title/...) with argument -e2ts"
 self.log(t, 6)
 else:
-t = "hint: argument -e2dsa enables searching, upload-undo, and better deduplication"
+t = "hint: enable searching and upload-undo with argument -e2dsa"
 self.log(t, 6)
+
+if unsafe_dedup:
+t = "WARNING: symlink-based deduplication is enabled for some volumes, but without indexing. Please enable -e2dsa and/or --hardlink to avoid problems when moving/renaming files. Affected volumes: %s"
+self.log(t % (", ".join(unsafe_dedup)), 3)
+elif not have_dedup:
+t = "hint: enable upload deduplication with --dedup (but see readme for consequences)"
+self.log(t, 6)
 
 zv, _ = vfs.get("/", "*", False, False)

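The AuthSrv hunks add a startup check: a volume with `dedup` but neither `e2d` nor `hardlink` is reported as unsafe, and otherwise a hint about `--dedup` is logged when no volume uses it. This view has lost the indentation, so the sketch assumes the `e2d`/`hardlink` test is nested under the `dedup` test (which matches the warning text). `dedup_warnings` is a hypothetical name, volumes are modelled as a plain dict of flag sets, and the messages are shortened.

```python
def dedup_warnings(vols: dict[str, set[str]]) -> list[str]:
    """Restate the have_dedup / unsafe_dedup checks as a pure function.

    vols maps a volume's vpath ("" for the root) to its enabled volflags.
    """
    have_dedup = False
    unsafe_dedup = []
    for vpath, flags in vols.items():
        if "dedup" in flags:
            have_dedup = True
            # symlink dedup without a db (e2d) or hardlinks breaks on rename/delete
            if "e2d" not in flags and "hardlink" not in flags:
                unsafe_dedup.append("/" + vpath)

    if unsafe_dedup:
        return ["WARNING: unindexed symlink-based dedup in: " + ", ".join(unsafe_dedup)]
    if not have_dedup:
        return ["hint: enable upload deduplication with --dedup"]
    return []


if __name__ == "__main__":
    print(dedup_warnings({"": {"dedup"}, "music": {"dedup", "e2d"}}))
```

Either `e2d` or `hardlink` is enough to silence the warning for a volume, matching the README advice that indexing alone makes hardlinks optional.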
@@ -12,8 +12,7 @@ def vf_bmap() -> dict[str, str]:
 "dav_auth": "davauth",
 "dav_rt": "davrt",
 "ed": "dots",
-"never_symlink": "neversymlink",
-"no_dedup": "copydupes",
+"hardlink_only": "hardlinkonly",
 "no_dupe": "nodupe",
 "no_forget": "noforget",
 "no_pipe": "nopipe",
@@ -24,6 +23,7 @@ def vf_bmap() -> dict[str, str]:
 "safe_dedup": "safededup",
 }
 for k in (
+"dedup",
 "dotsrch",
 "e2d",
 "e2ds",
@@ -130,11 +130,11 @@ permdescs = {
 
 flagcats = {
 "uploads, general": {
-"nodupe": "rejects existing files (instead of symlinking them)",
-"hardlink": "does dedup with hardlinks instead of symlinks",
-"neversymlink": "disables symlink fallback; full copy instead",
-"copydupes": "disables dedup, always saves full copies of dupes",
+"dedup": "enable symlink-based file deduplication",
+"hardlink": "enable hardlink-based file deduplication,\nwith fallback on symlinks when that is impossible",
+"hardlinkonly": "dedup with hardlink only, never symlink;\nmake a full copy if hardlink is impossible",
 "safededup": "verify on-disk data before using it for dedup",
+"nodupe": "rejects existing files (instead of symlinking them)",
 "sparse": "force use of sparse files, mainly for s3-backed storage",
 "daw": "enable full WebDAV write support (dangerous);\nPUT-operations will now \033[1;31mOVERWRITE\033[0;35m existing files",
 "nosub": "forces all uploads into the top folder of the vfs",
@@ -161,7 +161,7 @@ flagcats = {
 "lifetime=3600": "uploads are deleted after 1 hour",
 },
 "database, general": {
-"e2d": "enable database; makes files searchable + enables upload dedup",
+"e2d": "enable database; makes files searchable + enables upload-undo",
 "e2ds": "scan writable folders for new files on startup; also sets -e2d",
 "e2dsa": "scans all folders for new files on startup; also sets -e2d",
 "e2t": "enable multimedia indexing; makes it possible to search for tags",
@@ -179,7 +179,7 @@ flagcats = {
 "noforget": "don't forget files when deleted from disk",
 "fat32": "avoid excessive reindexing on android sdcardfs",
 "dbd=[acid|swal|wal|yolo]": "database speed-durability tradeoff",
-"xlink": "cross-volume dupe detection / linking",
+"xlink": "cross-volume dupe detection / linking (dangerous)",
 "xdev": "do not descend into other filesystems",
 "xvol": "do not follow symlinks leaving the volume root",
 "dotsrch": "show dotfiles in search results",

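`vf_bmap` appears to map argparse attribute names to volflag names, with the `for k in (...)` tuple listing options whose volflag keeps the same name (this commit adds `dedup` there). A hypothetical sketch of how such a table can turn enabled global switches into default volflags; `RENAMED`, `SAME_NAME` and `global_volflags` are illustration-only names holding a subset of the entries shown above.

```python
from argparse import Namespace

# subset of vf_bmap-style pairs: argparse attribute name -> volflag name
RENAMED = {"hardlink_only": "hardlinkonly", "no_dupe": "nodupe", "no_forget": "noforget"}
# subset of boolean options whose volflag has the same name as the argument
SAME_NAME = ("dedup", "dotsrch", "e2d", "e2ds")


def global_volflags(args: Namespace) -> dict[str, bool]:
    """Turn enabled global switches into default volflags (hypothetical helper)."""
    pairs = list(RENAMED.items()) + [(k, k) for k in SAME_NAME]
    # getattr with a default tolerates options missing from the namespace
    return {vf: True for arg, vf in pairs if getattr(args, arg, False)}


if __name__ == "__main__":
    print(global_volflags(Namespace(dedup=True, hardlink_only=True)))
```

Keeping the renames in one table is what lets the commit rename `--never-symlink` to `--hardlink-only` by swapping a single entry instead of touching every place that reads the flag.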
@@ -3143,7 +3143,7 @@ class Up2k(object):
 
 linked = False
 try:
-if "copydupes" in flags:
+if not flags.get("dedup"):
 raise Exception("dedup is disabled in config")
 
 lsrc = src
@@ -3181,7 +3181,7 @@ class Up2k(object):
 linked = True
 except Exception as ex:
 self.log("cannot hardlink: " + repr(ex))
-if "neversymlink" in flags:
+if "hardlinkonly" in flags:
 raise Exception("symlink-fallback disabled in cfg")
 
 if not linked:
@@ -4308,7 +4308,7 @@ class Up2k(object):
 # this creates a link pointing from dabs to alink; alink may
 # not exist yet, which becomes problematic if the symlinking
 # fails and it has to fall back on hardlinking/copying files
-# (for example --no-dedup in a volume with symlinked dupes);
+# (for example a volume with symlinked dupes but no --dedup);
 # fsrc=sabs is then a source that currently resolves to copy
 
 self._symlink(dabs, alink, flags, False, lmod=lmod or 0, fsrc=sabs)

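Read together, the Up2k hunks suggest an order of preference when placing a duplicate: nothing is linked unless `dedup` is set, a hardlink is tried first when `hardlink` is enabled, `hardlinkonly` forbids the symlink fallback, and a full copy is the last resort. The sketch restates that order as a standalone function; `place_dupe` and its return values are hypothetical, the flags are assumed to already have implications applied, and the real code handles things skipped here (`lmod` timestamps, `fsrc`, logging).

```python
import os
import shutil


def place_dupe(src: str, dst: str, flags: dict) -> str:
    """Create dst as a duplicate of the existing file src; return the method used."""
    if flags.get("dedup"):
        if flags.get("hardlink"):
            try:
                os.link(src, dst)
                return "hardlink"
            except OSError:
                pass  # e.g. src and dst are on different filesystems
        if not flags.get("hardlinkonly"):
            try:
                # absolute target keeps the sketch simple
                os.symlink(os.path.abspath(src), dst)
                return "symlink"
            except OSError:
                pass  # e.g. no permission to create symlinks on windows
    # dedup disabled, or every link attempt failed: write a full copy
    shutil.copy2(src, dst)
    return "copy"
```

Falling back to a full copy on any link failure means an upload never fails just because dedup could not be applied; the cost is disk space, which is the trade-off the new default makes for every volume without `dedup`.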
@@ -260,6 +260,8 @@ IMPLICATIONS = [
 ["e2vu", "e2v"],
 ["e2vp", "e2v"],
 ["e2v", "e2d"],
+["hardlink_only", "hardlink"],
+["hardlink", "dedup"],
 ["tftpvv", "tftpv"],
 ["smbw", "smb"],
 ["smb1", "smb"],

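A loop like the one in the AuthSrv hunk (`for k1, k2 in IMPLICATIONS`) makes a single pass, so a chain such as `hardlink_only` to `hardlink` to `dedup` only resolves fully when the pairs are listed in dependency order, as they are here. The sketch checks that with a subset of the list; `apply_implications` is a hypothetical helper operating on a plain set of flag names.

```python
# subset of the IMPLICATIONS pairs above: [flag, flag it implies]
IMPLICATIONS = [
    ["e2vu", "e2v"],
    ["e2vp", "e2v"],
    ["e2v", "e2d"],
    ["hardlink_only", "hardlink"],
    ["hardlink", "dedup"],
]


def apply_implications(flags: set[str], rules: list[list[str]] = IMPLICATIONS) -> set[str]:
    """Apply implications in one pass, in list order; the input set is not modified."""
    out = set(flags)
    for k1, k2 in rules:
        if k1 in out:
            out.add(k2)
    return out


if __name__ == "__main__":
    print(sorted(apply_implications({"hardlink_only"})))
```

With the list reversed, `{"hardlink_only"}` only gains `hardlink` and never `dedup`, which is why the order matters. Repeating the pass until no new flag is added would make the result order-independent at a small cost.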
@@ -16,7 +16,7 @@ open up notepad and save the following as `c:\users\you\documents\party.conf` (f
 ```yaml
 [global]
 lo: ~/logs/cpp-%Y-%m%d.xz # log to c:\users\you\logs\
-e2dsa, e2ts, no-dedup, z # sets 4 flags; see expl.
+e2dsa, e2ts, z # sets 3 flags; see explanation
 p: 80, 443 # listen on ports 80 and 443, not 3923
 theme: 2 # default theme: protonmail-monokai
 lang: nor # default language: viking
@@ -46,11 +46,10 @@ open up notepad and save the following as `c:\users\you\documents\party.conf` (f
 
 ### config explained: [global]
 
-the `[global]` section accepts any config parameters [listed here](https://ocv.me/copyparty/helptext.html), also viewable by running copyparty (either the exe or the sfx.py) with `--help`, so this is the same as running copyparty with arguments `--lo c:\users\you\logs\copyparty-%Y-%m%d.xz -e2dsa -e2ts --no-dedup -z -p 80,443 --theme 2 --lang nor`
+the `[global]` section accepts any config parameters [listed here](https://ocv.me/copyparty/helptext.html), also viewable by running copyparty (either the exe or the sfx.py) with `--help`, so this is the same as running copyparty with arguments `--lo c:\users\you\logs\copyparty-%Y-%m%d.xz -e2dsa -e2ts -z -p 80,443 --theme 2 --lang nor`
 * `lo: ~/logs/cpp-%Y-%m%d.xz` writes compressed logs (the compression will make them delayed)
-* `e2dsa` enables the upload deduplicator and file indexer, which enables searching
+* `e2dsa` enables the file indexer, which enables searching and upload-undo
 * `e2ts` enables music metadata indexing, making albums / titles etc. searchable too
-* `no-dedup` writes full dupes to disk instead of symlinking, since lots of windows software doesn't handle symlinks well
 * but the improved upload speed from `e2dsa` is not affected
 * `z` enables zeroconf, making the server available at `http://HOSTNAME.local/` from any other machine in the LAN
 * `p: 80,443` listens on the ports `80` and `443` instead of the default `3923`

@@ -20,7 +20,7 @@ cat $f | awk '
 o{next}
 /^#/{s=1;rs=0;pr()}
 /^#* *(nix package)/{rs=1}
-/^#* *(install on android|dev env setup|just the sfx|complete release|optional gpl stuff|nixos module)|`$/{s=rs}
+/^#* *(themes|install on android|dev env setup|just the sfx|complete release|optional gpl stuff|nixos module)|```/{s=rs}
 /^#/{
 lv=length($1);
 sub(/[^ ]+ /,"");

@@ -117,10 +117,10 @@ class Cfg(Namespace):
 def __init__(self, a=None, v=None, c=None, **ka0):
 ka = {}
 
-ex = "chpw daw dav_auth dav_inf dav_mac dav_rt e2d e2ds e2dsa e2t e2ts e2tsr e2v e2vu e2vp early_ban ed emp exp force_js getmod grid gsel hardlink ih ihead magic never_symlink nid nih no_acode no_athumb no_dav no_db_ip no_dedup no_del no_dupe no_lifetime no_logues no_mv no_pipe no_poll no_readme no_robots no_sb_md no_sb_lg no_scandir no_tarcmp no_thumb no_vthumb no_zip nrand nw og og_no_head og_s_title q rand smb srch_dbg stats uqe vague_403 vc ver write_uplog xdev xlink xvol zs"
+ex = "chpw daw dav_auth dav_inf dav_mac dav_rt e2d e2ds e2dsa e2t e2ts e2tsr e2v e2vu e2vp early_ban ed emp exp force_js getmod grid gsel hardlink ih ihead magic hardlink_only nid nih no_acode no_athumb no_dav no_db_ip no_del no_dupe no_lifetime no_logues no_mv no_pipe no_poll no_readme no_robots no_sb_md no_sb_lg no_scandir no_tarcmp no_thumb no_vthumb no_zip nrand nw og og_no_head og_s_title q rand smb srch_dbg stats uqe vague_403 vc ver write_uplog xdev xlink xvol zs"
 ka.update(**{k: False for k in ex.split()})
 
-ex = "dotpart dotsrch hook_v no_dhash no_fastboot no_fpool no_htp no_rescan no_sendfile no_snap no_voldump re_dhash plain_ip"
+ex = "dedup dotpart dotsrch hook_v no_dhash no_fastboot no_fpool no_htp no_rescan no_sendfile no_snap no_voldump re_dhash plain_ip"
 ka.update(**{k: True for k in ex.split()})
 
 ex = "ah_cli ah_gen css_browser hist js_browser js_other mime mimes no_forget no_hash no_idx nonsus_urls og_tpl og_ua"