Update the Poly1305 implementation so that its Fp arithmetic follows DJB's approach. This improves both performance and correctness. Also fix the incorrect output produced when the input is an empty string.
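
For reviewers who want a quick check of the empty-input case, here is a minimal big-integer reference sketch in Python. It is not the code in this change and does not use DJB's limb representation; it evaluates the polynomial modulo 2^130 - 5 directly, as described in RFC 8439. The name `poly1305_ref` is hypothetical. With an empty message the accumulator stays at zero, so the tag should equal the second half of the key (`s`).

```python
def poly1305_ref(msg: bytes, key: bytes) -> bytes:
    """Reference Poly1305 using Python big integers (illustrative only, not constant-time)."""
    if len(key) != 32:
        raise ValueError("Poly1305 key must be 32 bytes")

    # r is the first 16 key bytes, little-endian, with the standard clamp applied.
    r = int.from_bytes(key[:16], "little") & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF
    # s is the last 16 key bytes, little-endian; it is added at the very end.
    s = int.from_bytes(key[16:], "little")
    p = (1 << 130) - 5  # the prime defining Fp

    acc = 0
    for i in range(0, len(msg), 16):
        # Each block (possibly short) gets a 0x01 byte appended before conversion.
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = ((acc + n) * r) % p

    # The tag is (acc + s) mod 2^128, little-endian.
    tag = (acc + s) & ((1 << 128) - 1)
    return tag.to_bytes(16, "little")


if __name__ == "__main__":
    key = bytes(range(32))  # arbitrary demo key, an assumption for illustration
    # For an empty message the loop never runs, so the tag is just s.
    print(poly1305_ref(b"", key) == key[16:])
```

Comparing the optimized implementation against a reference like this on empty, short, and multi-block inputs is one way to exercise both parts of this change.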