previous approach:
* cache 64K on first read
* cache 1M on subsequent intersecting reads
new approach:
* cache 64K on first read
* cache 1M on the next intersecting read
* cache 8M on subsequent intersecting reads
* cache 4M on standalone reads at offsets >1M
improves performance by 50% on windows
and should help on high-latency connections
drop chunk-hashes in the up2k snap, plus other insignificant attribs
to reduce both the snapfile size and the ram usage by about 90%
reduces startup/shutdown time by a lot since there's less to serdes
(does not affect -e2d which was already optimal)
other changes:
* improve incoming-eta accuracy when the initial handshake
was made a long time before the upload actually started
* move the list of incoming files in the controlpanel to the top
* do not absreal paths unless necessary
* do not determine username if no users configured
* impacket 0.12 fixed the foldersize limit, but now
you get extremely poor performance in large folders
so the previous workaround is still default-enabled
* pyz: yeet the resource tar which is now pointless thanks to pkgres
* cache impresource stuff because pyz lookups are Extremely slow
* prefer tx_file when possible for slightly better performance
* use hardcoded list of expected resources instead of dynamic
discovery at runtime; much simpler and probably safer
* fix some forgotten resources (copying.txt, insecure.pem)
* fix loading jinja templates on windows
add support for reading webdeps and jinja-templates using either
importlib_resources or pkg_resources, which removes the need for
extracting these to a temporary folder on the filesystem
* util: add helper functions to abstract embedded resource access
* http*: serve embedded resources through resource abstraction
* main: check webdeps through resource abstraction
* httpconn: remove unused method `respath(name)`
* use __package__ to find package resources
* util: use importlib_resources backport if available
* pass E.pkg as module object for importlib_resources compatibility
* util: add pkg_resources compatibility to resource abstraction
* show media tags in shares
* html hydrator assumed a folder named `foo.txt` was a doc
* due to sessions, use `pwd` as password placeholder on services
* exponentially slow upload handshakes caused by lack of rd+fn
sqlite index; became apparent after a volume hit 200k files
* listing big folders 5% faster due to `_quotep3b`
* optimize `unquote`, 20% faster but only used rarely
* reindex on startup 150x faster in some rare cases
(same filename in MANY folders)
the database is now around 10% larger (likely worst-case)
reduce the overhead of function-calls from the client thread
to the svchub singletons (up2k, thumbs, metrics) down to 14%
and optimize up2k chunk-receiver to spend 5x less time bookkeeping
which restores up2k performance to before introducing incoming-ETA
dedup is still encouraged and fully supported, but
being default-enabled has caused too many surprises
enabling `--dedup` restores the previous default behavior
also renames `--never-symlink` to `--hardlink-only`
symlinks between volumes will only be created if xlink is
enabled, so such symlinks should be ignored if xlink is
disabled, as they might originate from other software
this prevents accidental rewriting of non-dedup symlinks
if --no-dedup was enabled in a volume which already contained
symlinked duplicate files, renaming/moving folders could fail
this is due to folder contents being moved one file at a time
(which is how symlink breakage is prevented) except the links
are moved assuming the final directory layout, meaning they
may be intermittently broken during the movie
with no-dedup, the symlinks are converted into full files as
each symlink is encountered, but a temporarily broken symlink
would crash the procedure
fix this by giving `_symlink` a new parameter `fsrc`
which is a known valid inode for data copying purposes
previously, the assumption was made that the database and filesystem
would not desync, and that an upload could safely be substituted with
a symlink to an existing copy on-disk, assuming said copy still
existed on-disk at all
this is fine if copyparty is the only software that makes changes to
the filesystem, but that is a shitty assumption to make in hindsight
add `--safe-dedup` which takes a "safety level", and by default (50)
it will no longer blindly expect that the filesystem has not been
altered through other means; the file contents will now be hashed
and compared to the database
deduplication can be much slower as a result, but definitely worth it
as this avoids some potentially very unpleasant surprises
the previous behavior can be restored with `--safe-dedup 1`