global-option `--no-clone` / volflag `noclone` entirely disables
serverside deduplication; clients will then fully upload dupe files
this can be useful when `--safe-dedup=1` is not an option because other
software is tampering with the on-disk files, and your filesystem has
prohibitively slow or expensive reads
drop chunk-hashes in the up2k snap, plus other insignificant attribs
to reduce both the snapfile size and the ram usage by about 90%
reduces startup/shutdown time by a lot since there's less to serdes
(does not affect -e2d which was already optimal)
other changes:
* improve incoming-eta accuracy when the initial handshake
was made a long time before the upload actually started
* move the list of incoming files in the controlpanel to the top
* fix exponentially slow upload handshakes caused by a missing rd+fn
sqlite index; became apparent after a volume hit 200k files
* listing big folders 5% faster due to `_quotep3b`
* optimize `unquote`, 20% faster but only used rarely
* reindex on startup 150x faster in some rare cases
(same filename in MANY folders)
the database is now around 10% larger (likely worst-case)
dedup is still encouraged and fully supported, but
being default-enabled has caused too many surprises
enabling `--dedup` restores the previous default behavior
also renames `--never-symlink` to `--hardlink-only`
symlinks between volumes will only be created if xlink is
enabled, so such symlinks should be ignored if xlink is
disabled, as they might originate from other software
this prevents accidental rewriting of non-dedup symlinks
if --no-dedup was enabled in a volume which already contained
symlinked duplicate files, renaming/moving folders could fail
this is due to folder contents being moved one file at a time
(which is how symlink breakage is prevented) except the links
are moved assuming the final directory layout, meaning they
may be intermittently broken during the move
with no-dedup, the symlinks are converted into full files as
each symlink is encountered, but a temporarily broken symlink
would crash the procedure
fix this by giving `_symlink` a new parameter `fsrc`
which is a known valid inode for data copying purposes
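roughly speaking, the idea is to copy the file data from the known-good
inode instead of trusting the link target; an illustrative sketch (not
the actual copyparty code, and the helper name is made up):

```python
# illustrative sketch only -- when converting a symlink into a full
# file mid-move, the link target may be temporarily dangling, so read
# the data from a separately known-good inode (fsrc) instead
import os
import shutil

def materialize_link(lnk: str, fsrc: str) -> None:
    tmp = lnk + ".tmp"
    shutil.copy2(fsrc, tmp)   # copy data from the known-valid source
    os.unlink(lnk)            # drop the (possibly dangling) symlink
    os.rename(tmp, lnk)       # swap the real file into place
```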
previously, the assumption was made that the database and filesystem
would not desync, and that an upload could safely be substituted with
a symlink to an existing copy on-disk, assuming said copy still
existed on-disk at all
this is fine if copyparty is the only software that makes changes to
the filesystem, but that is a shitty assumption to make in hindsight
add `--safe-dedup` which takes a "safety level", and by default (50)
it will no longer blindly expect that the filesystem has not been
altered through other means; the file contents will now be hashed
and compared to the database
deduplication can be much slower as a result, but definitely worth it
as this avoids some potentially very unpleasant surprises
the previous behavior can be restored with `--safe-dedup 1`
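the verification is conceptually simple; a minimal sketch of the idea,
with made-up names and an arbitrary hash algorithm (the real check
compares against whatever fingerprint the database has on record):

```python
# minimal sketch; not the actual implementation
import hashlib

def existing_copy_is_intact(abspath: str, db_digest: str) -> bool:
    h = hashlib.sha512()
    try:
        with open(abspath, "rb", buffering=0) as f:
            for buf in iter(lambda: f.read(512 * 1024), b""):
                h.update(buf)
    except OSError:
        return False  # the copy was deleted/replaced by other software
    return h.hexdigest() == db_digest  # only then is dedup safe
```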
* v1.13.8 broke collision resolution for non-identical files;
the correct filename was reserved but not symlinked to
the original file, leaving a zerobyte file instead.
See v1.14.3 github release notes for remediation info
* add sanity checks for early detection of index/fs desync;
improves performance and gives less confusing logs
if files (one or more) are selected for sharing, then
a virtual folder is created to hold the selected files
if a single file is selected for sharing, then
the returned URL will point directly to that file
and fix some shares-related bugs:
* password coalescing
* log-spam on reload
* the wark landed in the wrong registry when an upload was moved to another volume
(harmless; upload would succeed on the next handshake)
* dedup did not apply correctly when a file was moved into another
volume, since all the checks were done based on the previous vol;
fix this by recursing the whole thing
also update the reloc example after some real-world experience
Reported-by: @daniiooo
hooks can now interrupt or redirect actions, and initiate
related actions, by printing json on stdout with commands
mainly to mitigate limitations such as sharex/sharex#3992
xbr/xau can redirect uploads to other destinations with `reloc`
and most hooks can initiate indexing or deletion of additional
files by giving a list of vpaths in json-keys `idx` or `del`
there are limitations;
* xbu/xau effects don't apply to ftp, tftp, smb
* xau will intentionally fail if a reloc destination exists
* xau effects do not apply to up2k
also provides more details for hooks:
* xbu/xau: basic-uploader vpath with filename
* xbr/xar: add client ip
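for a rough idea of the json mechanism, here is a hypothetical xau hook
which asks copyparty to index one extra file and delete another; the
`idx`/`del` keys are as described above, but the exact schema (and how
the upload info reaches the hook) should be checked against
`--help-hooks` and the bundled example hooks -- the vpaths below are
just placeholders:

```python
#!/usr/bin/env python3
# hypothetical xau hook; prints json on stdout to issue commands
import json
import sys

def main():
    # with the `j` hook-flag, upload details arrive as a json argument
    upload = json.loads(sys.argv[1]) if len(sys.argv) > 1 else {}
    print("got upload:", upload, file=sys.stderr)  # for debugging only

    # ask copyparty to index a sidecar we generated,
    # and to delete a temporary file
    print(json.dumps({
        "idx": ["/inc/some-generated-sidecar.txt"],
        "del": ["/inc/some-tempfile.bin"],
    }))

if __name__ == "__main__":
    main()
```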
v1.13.5 made some proxies angry with its massive chunklists
when stitching chunks, only list the first chunk hash in full,
and include a truncated hash for the consecutive chunks
should be enough for logfiles to make sense
and to smoketest that clients are behaving
compile to bytecode so cpython doesn't have to keep it in memory
ram usage reduced by:
* min: 5.4 MiB (32.6 to 27.2)
* ac/im: 5.2 MiB (39.0 to 33.8)
* dj/iv: 10.6 MiB (67.3 to 56.7)
startup time reduced from:
* min: 1.3s to 0.6s
* ac/im: 1.6s to 0.9s
* dj/iv: 2.0s to 1.1s
image size increased by 4 MiB (min), 6 MiB (ac/im/iv), 9 MiB (dj)
ram usage measured on idle with:
while true; do ps aux | grep -E 'R[S]S|no[-]crt'; read -n1; echo; done
startup time measured with:
time podman run --rm -it localhost/copyparty-min-amd64 --exit=idx
* wait until page (au) has loaded to register hotkeys
* hotkey `m` would grow sidebar if tree was minimized
* more exact warning about num.parallel uploads
* keep more console logs in memory
* message phrasing
* progress donuts should include inflight bytes
* changes to stitch-size in settings didn't apply until next refresh
* serverlog was too verbose; truncate chunk hashes
* mention absolute cloudflare limit in readme
rather than sending each file chunk as a separate HTTP request,
sibling chunks will now be fused together into larger HTTP POSTs
which results in unreasonably huge speed boosts on some routes
( `2.6x` from Norway to US-East, `1.6x` from US-West to Finland )
the `x-up2k-hash` request header now takes a comma-separated list
of chunk hashes, which must all be sibling chunks, resulting in
one large consecutive range of file data as the post body
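as a rough sketch of the on-the-wire shape (the url is a placeholder
and a real up2k client sends additional headers negotiated during the
handshake; only the comma-separated hash list and the contiguous body
are taken from the description above):

```python
import requests

chunks = {  # hash -> data for three consecutive sibling chunks
    "hashAAA": b"...chunk 0...",
    "hashBBB": b"...chunk 1...",
    "hashCCC": b"...chunk 2...",
}

r = requests.post(
    "https://example.com/placeholder-upload-url",
    headers={"X-Up2k-Hash": ",".join(chunks)},  # all sibling chunks
    data=b"".join(chunks.values()),             # one contiguous range
)
r.raise_for_status()
```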
a new global-option `--u2sz`, default `1,64,96` (min, target, max),
sets the target request size to 64 MiB, and allows the settings ui
to specify any value between 1 and 96 MiB (cloudflare's max)
this does not cause any issues for resumable uploads; thanks to the
streaming HTTP POST parser, each chunk is verified and written
to disk as it arrives, meaning only the untransmitted chunks will
have to be resent in the event of a connection drop -- of course
assuming there are no misconfigured WAFs or caching-proxies
the previous up2k approach of uploading each chunk in a separate HTTP
POST was inefficient in many real-world scenarios, mainly due to TCP
window-scaling behaving erratically in some IXPs / along some routes
a particular link from Norway to Virginia,US is unusably slow for
the first 4 MiB, only reaching optimal speeds after 100 MiB, and
then immediately resets the scale when the request has been sent;
connection reuse does not help in this case
on this route, the basic-uploader was somehow faster than up2k
with 6 parallel uploads; only time i've seen this
hooks can be restricted to users with certain permissions, for example
`--xm aw,notify-send` will only `notify-send` if user has write-access
the user's list of permissions is now also included in the json
that is passed to the hook if enabled, e.g. `--xm aw,j,notify-send`
flag parsing now also stops at the first blank value,
allowing initial arguments to be given to the command:
`--xm aw,j,,notify-send,hey` would run `notify-send` with `hey`
as its first argument, and the json would be the 2nd argument,
similarly `--xm ,notify-send,hey` when no flags specified
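a hypothetical receiving end of such a hook, with the json arriving as
a command-line argument; the key names read from the json below are
assumptions, so dump the whole payload to see what your copyparty
version actually provides:

```python
#!/usr/bin/env python3
# hypothetical receiving end of `--xm aw,j,,python3,msg-hook.py,hey`;
# "hey" arrives as the first argument, the json as the second
import json
import sys

def main():
    first_arg = sys.argv[1]        # "hey" in the example above
    msg = json.loads(sys.argv[2])  # the json payload
    print("initial arg:", first_arg)
    print("sender permissions:", msg.get("perms"))  # assumed key name
    print("full payload:", json.dumps(msg, indent=2))

if __name__ == "__main__":
    main()
```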
this is somewhat explained in `--help-hooks`, but
additional related features are planned in the near future
and will all be better documented when the dust settles
use sigmasks to block SIGINT, SIGTERM, SIGUSR1 from all other threads
and, in case this misses anything and shutdown is still unreliable,
also initiate shutdown by calling the signal handler directly
(discovered by `--exit=idx` being noop once in a blue moon)
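the general shape of the sigmask technique (unix-only, and not
copyparty's exact code) looks something like this:

```python
# block the signals before spawning workers so they inherit the mask,
# then unblock in the main thread so only it ever receives them
import signal
import threading
import time

STOP_SIGS = {signal.SIGINT, signal.SIGTERM, signal.SIGUSR1}

def worker():
    while True:
        time.sleep(1)  # never interrupted by the blocked signals

def shutdown(*_):
    print("shutting down")
    raise SystemExit(0)

def main():
    signal.pthread_sigmask(signal.SIG_BLOCK, STOP_SIGS)    # workers inherit
    threading.Thread(target=worker, daemon=True).start()
    signal.pthread_sigmask(signal.SIG_UNBLOCK, STOP_SIGS)  # main handles them
    signal.signal(signal.SIGTERM, shutdown)
    signal.signal(signal.SIGUSR1, shutdown)
    signal.pause()  # wait; SIGINT raises KeyboardInterrupt as usual

if __name__ == "__main__":
    main()
```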
if a given filesystem were to disappear (e.g. removable storage)
followed by another filesystem appearing at the same location,
this would not get noticed by up2k in a timely manner
fix this by discarding the mtab cache after `--mtab-age` seconds and
rebuilding it from scratch, unless the previous values are definitely
correct (as indicated by identical output from `/bin/mount`)
probably reduces windows performance by an acceptable amount
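the refresh rule described above, as an illustrative sketch (not the
actual code, and the mtab parsing itself is kept trivial):

```python
import subprocess
import time

class MtabCache:
    def __init__(self, max_age: float):
        self.max_age = max_age  # --mtab-age, in seconds
        self.ts = 0.0           # when the cache was last (re)built
        self.raw = b""          # `mount` output at build time
        self.mtab = None        # the (trivially) parsed table

    def get(self):
        now = time.time()
        if self.mtab is not None and now - self.ts < self.max_age:
            return self.mtab  # still fresh; trust it

        out = subprocess.check_output(["mount"])
        if self.mtab is not None and out == self.raw:
            self.ts = now     # definitely unchanged; keep the cache
            return self.mtab

        self.mtab = out.decode("utf-8", "replace").splitlines()
        self.raw = out
        self.ts = now
        return self.mtab
```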
* the hasher thread could die if a client rapidly
uploaded and deleted files (so very unlikely)
* two unprotected calls to register_vpath, which were
almost certainly safe because the volumes
already existed in the registry
the default (256 KiB) appears optimal in the most popular scenario
(linux host with storage on local physical disk, usually NVMe)
was previously a mix of 64 and 512 KiB;
now the same value is enforced everywhere
download-as-tar is now 20% faster with the default value
to abort an upload, refresh the page and access the unpost tab,
which now includes unfinished uploads (sorted before completed ones)
can be configured through u2abort (global or volflag);
by default it requires both the IP and account to match
https://a.ocv.me/pub/g/nerd-stuff/2024-0310-stoltzekleiven.jpg
this improves performance on s3-backed volumes
noktuas reported on discord that the upload performance was
unexpectedly poor when writing to an s3 bucket through a JuiceFS
fuse-mount, only getting 1.5 MiB/s with copyparty, meanwhile a
regular filecopy averaged 30 MiB/s plus
the issue was that s3 does not support sparse files, so copyparty
would fall back to sequential uploading, and also disable fpool,
causing JuiceFS to repeatedly commit the same 5 MiB range to
the storage provider as each chunk arrived from the client
by forcing use of sparse files, s3 adapters such as JuiceFS and
geesefs will "only" write the entire file to s3 *twice*: first the
full filesize of zero-bytes (depending on the adapter, hopefully
gzip-compressed to reduce the bandwidth needed), and then the
actual file data in an adapter-specific chunksize
with this volflag, copyparty appears to reach the full expected speed
nothing dangerous, just confusing log messages if an
admin hammers the reload button 100+ times per second,
or another linux process rapidly sends SIGUSR1
some cifs servers cause sqlite to fail in interesting ways; any attempt
to create a table can instantly throw an exception, which results in a
zerobyte database being created. During the next startup, the db would
be determined to be corrupted, and up2k would invoke _backup_db before
deleting and recreating it -- except that sqlite's connection.backup()
will hang indefinitely and deadlock up2k
add a watchdog which fires if it takes longer than 1 minute to open the
database, printing a big warning that the filesystem probably does not
support locking or is otherwise sqlite-incompatible, then writing a
stacktrace of all threads to a textfile in the config directory
(in case this deadlock is due to something completely different),
before finally crashing spectacularly
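such a watchdog could look roughly like this (illustrative only; names
and the probe query are made up):

```python
import faulthandler
import os
import sqlite3
import threading

def _watchdog_fired(stack_path: str):
    print("WARNING: opening the database took over a minute;")
    print("the filesystem probably lacks locking / is sqlite-incompatible")
    with open(stack_path, "w") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
    os._exit(1)  # crash spectacularly, as advertised

def open_db(db_path: str, cfg_dir: str):
    stack_path = os.path.join(cfg_dir, "deadlock-stacks.txt")
    wd = threading.Timer(60.0, _watchdog_fired, [stack_path])
    wd.daemon = True
    wd.start()
    try:
        db = sqlite3.connect(db_path)
        db.execute("create table if not exists kv (k text, v text)")
        db.commit()
        return db
    except Exception:
        # a half-created db would deadlock the backup on next startup
        try:
            os.unlink(db_path)
        except OSError:
            pass
        raise
    finally:
        wd.cancel()
```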
additionally, delete the database if its creation fails, which should
prevent the deadlock on the next startup, combined with a
message hinting at the filesystem incompatibility
the 1-minute limit may sound excessively gracious, but considering what
some of the copyparty instances out there are running on, it really isn't
this was reported when connecting to a cifs server running alpine
thx to abex on discord for the detailed bug report!
features which should be good to go:
* user groups
* assigning permissions by group
* dynamically created volumes based on username/groupname
* rebuild vfs when new users/groups appear
but several important features are still pending;
* detect dangerous configurations
* dynamic vol below readable path
* remember volumes created during previous runs
* helps prevent unintended access
* correct filesystem-scan on startup
some clients (clonezilla-webdav) rapidly create and delete files;
this fails if copyparty is still hashing the file (usually the case)
and the same thing can probably happen due to antivirus etc
add global-option `--rm-retry` (volflag `rm_retry`) specifying
for how long (and how quickly) to keep retrying the deletion
default: retry for 5sec on windows, 0sec (disabled) on everything else
because this is only a problem on windows
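the retry itself is just a bounded loop; an illustrative sketch with
the deadline and delay hardcoded (the real option also controls how
quickly to retry):

```python
import os
import time

def rm_with_retry(abspath: str, deadline: float = 5.0, delay: float = 0.1):
    t0 = time.time()
    while True:
        try:
            os.unlink(abspath)
            return
        except OSError:
            if time.time() - t0 >= deadline:
                raise  # give up; let the caller report the error
            time.sleep(delay)  # something still has the file open
```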
when a file was reindexed (due to a change in size or last-modified
timestamp) the uploader-IP would get removed, but the upload timestamp
was ported over. This was intentional so there was probably a reason...
new behavior is to keep both uploader-IP and upload timestamp if the
file contents are unchanged (determined by comparing warks), and to
discard both uploader-IP and upload timestamp if that is not the case
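the gist of the new rule, as a sketch with made-up record fields:

```python
def merge_reindexed(old: dict, new_wark: str) -> dict:
    if old.get("wark") == new_wark:
        return old  # contents unchanged; keep uploader-ip + timestamp
    return {"wark": new_wark, "ip": None, "at": None}  # discard both
```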
due to all upload APIs invoking up2k.hash_file to index uploads,
the uploads could block during a rescan for a crazy long time
(past most gateway timeouts); now this is mostly fire-and-forget
"mostly" because this also adds a conditional slowdown to
help the hasher churn through if the queue gets too big
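conceptually (illustrative sketch only; the threshold and the slowdown
curve are arbitrary):

```python
import queue
import time

hash_q: "queue.Queue[str]" = queue.Queue()

def enqueue_for_indexing(abspath: str) -> None:
    hash_q.put(abspath)  # fire-and-forget; a hasher thread drains this
    backlog = hash_q.qsize()
    if backlog > 1000:
        # conditional slowdown: let the hasher catch up a little
        # without stalling the upload outright
        time.sleep(min(0.1 * (backlog - 1000) / 1000, 2.0))
```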
worst case, if the server is restarted before the hasher catches up,
the remaining files would eventually be picked up by filesystem
reindexing (after a restart or on a schedule), meaning the uploader
info would be lost on shutdown; this is usually fine anyways
(and was also the case until now)
* make gen_tree 0.1% faster
* improve filekey warning message
* fix oversight in 0c50ea1757
* support `--xdev` on windows (the python docs mention that os.scandir
doesn't assign st_ino, st_dev and st_nlink on win but i can't read)
* permission `.` grants dotfile visibility if user has `r` too
* `-ed` will grant dotfiles to all `r` accounts (same as before)
* volflag `dots` likewise
also drops compatibility for pre-0.12.0 `-v` syntax
(`-v .::red` will no longer translate to `-v .::r,ed`)
when moving/deleting a file, all symlinked dupes are verified to ensure
the action does not break any symlinks; however, this was done by checking
the realpath of each link, which was not good enough, since the deleted
file may be part of a chain of nested symlinks
this situation occurs because the deduper tries to keep relative
symlinks as close as possible, only traversing into parent/sibling
folders as required, which can lead to several levels of nested links
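an illustrative chain-walking check (not the actual fix, just to show
why comparing realpaths alone was insufficient):

```python
# walk the symlink chain one hop at a time instead of just comparing
# realpaths, so a link that merely routes *through* the deleted file
# is also detected
import os

def chain_touches(link: str, victim: str, max_hops: int = 40) -> bool:
    victim = os.path.abspath(victim)
    cur = link
    for _ in range(max_hops):
        if not os.path.islink(cur):
            return False
        tgt = os.readlink(cur)
        if not os.path.isabs(tgt):
            tgt = os.path.join(os.path.dirname(cur), tgt)
        cur = os.path.abspath(tgt)  # normalize without resolving links
        if cur == victim:
            return True
    return False
```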
webdav clients tend to upload and then immediately delete
files to test for write-access and available disk space,
so don't crash and burn when that happens
* cpp_uptime is now a gauge
* cpp_bans is now cpp_active_bans (and also a gauge)
and other related fixes:
* stop emitting invalid cpp_disk_size/free for offline volumes
* support overriding the spec-mandatory mimetype with ?mime=foo