Aspell: Invalid UTF-8 Sequence (Linux) | |
---|---|
Author/Date | Aspell: Invalid UTF-8 Sequence (Linux) |
RoDen 13/07/2022 12:15pm | The encoding is hard-coded in SpellCheckAspell.cpp:
s += "iso-8859-1.cmap";
It should be taken from the dictionary's charset instead (cp1251 in this case). Otherwise non-ISO dictionaries are processed incorrectly:
../Code/SpellCheckAspell.cpp:970 - Dictionary for 'bg' install attempt 1
../Code/SpellCheckAspell.cpp:770 - Downloading 'ftp://ftp.gnu.org/gnu/aspell/dict/bg/aspell6-bg-4.1-0.tar.bz2' to '/home/user/src/scribe/trunk/Linux/Aspell/dict/bg/aspell6-bg-4.1-0.tar.bz2'
../Code/SpellCheckAspell.cpp:868 - Decompressing '/home/user/src/scribe/trunk/Linux/Aspell/dict/bg/aspell6-bg-4.1-0.tar.bz2'
../Code/SpellCheckAspell.cpp:868 - Decompressing '/home/user/src/scribe/trunk/Linux/Aspell/dict/bg/aspell6-bg-4.1-0.tar'
../Code/SpellCheckAspell.cpp:868 - Decompressing '/home/user/src/scribe/trunk/Linux/Aspell/dict/bg/bg.cwl'
Warning: The string "��������" is invalid. Invalid UTF-8 sequence at position 1. Skipping string.
Warning: The string "���������" is invalid. Invalid UTF-8 sequence at position 1. Skipping string.
Warning: The string "�����" is invalid. Invalid UTF-8 sequence at position 1. Skipping string.
Warning: The string "�������" is invalid. Invalid UTF-8 sequence at position 1. Skipping string.
Warning: The string "�������" is invalid. Invalid UTF-8 sequence at position 1. Skipping string.
Warning: The string "��������" is invalid. Invalid UTF-8 sequence at position 1. Skipping string.
Also, most Aspell dictionaries (except Danish, English, German, Greek & Portuguese) are outdated. Where possible, it would be better to use the precompiled dictionaries from the local distribution, which are better maintained, or to advise users to do so. |
fret 13/07/2022 12:25pm | "Encoding is hard-coded in SpellCheckAspell.cpp" That line is in SetupPaths(), which only checks for the presence of the cmap files. I just chose a random one to test for (as opposed to all of them, I guess). That 'iso-8859-1' doesn't affect the encoding or decoding at all. The warnings you're seeing are more likely due to bad input than to issues with Aspell. Are they easily reproducible? Maybe with one particular email? What are you doing at the time they show up in the console? |
RoDen 13/07/2022 6:54pm | The warnings show up when I try to install any non-ISO dictionary. After installation, all such dictionaries are empty because every string containing non-ISO characters is skipped.
I've managed to install the Russian dictionary successfully with the following change:
s += "koi8-r.cmap"; // s += "iso-8859-1.cmap";
Also, spell checking doesn't work in the "Text" tab. |
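For illustration only, the suggestion in the thread (take the cmap name from the dictionary's charset rather than hard-coding it) might look roughly like the sketch below. It assumes the dictionary ships an Aspell language data file (for example `bg.dat`) containing a `charset <name>` line, and that the matching file is named `<name>.cmap`. `CharsetFromDat` and `CmapFileFor` are hypothetical helpers, not functions from SpellCheckAspell.cpp, and the thread does not settle whether this line is the actual cause of the warnings.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: scan the text of an Aspell language data file
// and return the value of its "charset" key, or "" if there is none.
std::string CharsetFromDat(const std::string &datText)
{
    std::istringstream in(datText);
    std::string line;
    while (std::getline(in, line))
    {
        // Each line is treated as "<key> <value>"; other keys are ignored.
        std::istringstream fields(line);
        std::string key, value;
        if ((fields >> key >> value) && key == "charset")
            return value;
    }
    return "";
}

// Hypothetical helper: build the cmap file name for a charset,
// falling back to the currently hard-coded name when none is given.
std::string CmapFileFor(const std::string &charset)
{
    const std::string name = charset.empty() ? "iso-8859-1" : charset;
    return name + ".cmap";
}
```

With this sketch, `CmapFileFor(CharsetFromDat(text))` would give `cp1251.cmap` for a data file that declares `charset cp1251`, and would fall back to `iso-8859-1.cmap` when no charset line is found.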