Ticket #1053 (new defect)
Encoding/Conversion fails when non-ASCII characters are used in any field of contacts, todos or events
|Reported by:||ThoMaus||Owned by:||dgollub|
|Component:||OpenSync: Format Conversion||Version:||0.22|
|Severity:||critical||Keywords:||encoding, conversion, synce|
|Cc:||ThoMaus, Graham, Cobb|
Environment in use
- OpenSync 0.22 (plugins: kdepim, synce-legacy)
- SynCE 0.12 and 0.13 (the issue occurs with both)
- Kontact 3.5.10
- OpenSuSE 11.1
- PDA with Windows CE (Windows Mobile 2003)
Observed behavior
- Syncing between KDE PIM and the PDA works fine as long as no characters outside the ASCII set are used on the PDA.
- As soon as the PDA data contains any non-ASCII characters (e.g. umlauts), the conversion engine fails with 'invalid utf8 passed to VFormat. Limbing along.' Each affected field in the resulting XML data is cut off at the position of the first non-ASCII character, while the XML structure itself remains intact.
- The entries that end up in Kontact's databases (std.ics and std.vcf) are UTF-8 encoded, and their data is cut off at the same position as the intermediary XML data.
- The msynctool trace output suggests that the intermediary vcal or vcard data is encoded in Windows code page 1250 (which uses the same byte values as ISO-8859-1 for the German umlauts).
- A pcap traffic capture on the ppp0 interface shows that the data from the PDA is encoded as in the msynctool trace, with one notable difference: all characters are transmitted as 2-byte units, except in the note or comment fields of contacts, todos and events, which use single-byte characters. Apart from the 1- vs. 2-byte width, umlauts have the identical binary representation in both cases.
- Non-ASCII characters traveling in the opposite direction, i.e. from KDE PIM to the PDA, are:
  - UTF-8 encoded in the PIM databases,
  - UTF-8 encoded in the intermediary vcal (for VEVENT or VTODO) or vcard (for contacts),
  - encoded as HTML-style character entities in the intermediary XML (e.g. an ä is written as an entity reference rather than as the literal character),
  - junk on the PDA (e.g. an ä is displayed as two characters: Ã, capital A with tilde, and ¤, the currency sign).
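The symptoms in both directions are consistent with text crossing between a single-byte (or UTF-16) encoding and UTF-8 without being converted. The following Python sketch reproduces them under stated assumptions: the note fields use a single-byte Windows code page (cp1252 below; cp1250, as suggested by the trace, gives the same bytes for German umlauts), the 2-byte fields are UTF-16LE (the native Windows CE encoding), and the hypothetical helper `truncate_at_invalid_utf8` only imitates the observed cut-off; it is not OpenSync code.

```python
# Hypothetical reproduction of the symptoms reported in this ticket.
# Assumptions: note fields use a single-byte Windows code page (cp1252 here;
# cp1250 uses the same bytes for German umlauts), other fields use UTF-16LE,
# and the desktop side expects UTF-8.

def truncate_at_invalid_utf8(data: bytes) -> str:
    """Imitate the observed behavior: keep only the valid UTF-8 prefix."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that is not valid UTF-8
        return data[:err.start].decode("utf-8")

# PDA -> PIM: a note field arrives as single-byte cp1252 ...
note_from_pda = "Müller".encode("cp1252")
# ... while other fields arrive as 2-byte UTF-16LE units (same value, wider)
name_from_pda = "Müller".encode("utf-16-le")

print(note_from_pda)                            # b'M\xfcller'
print(name_from_pda[:4])                        # b'M\x00\xfc\x00'
print(truncate_at_invalid_utf8(note_from_pda))  # M

# PIM -> PDA: the two UTF-8 bytes of "ä" shown as two cp1252 characters
utf8_bytes = "ä".encode("utf-8")                # b'\xc3\xa4'
print(utf8_bytes.decode("cp1252"))              # Ã¤
```

Note that showing byte 0xC3 as Ã matches cp1252 and ISO-8859-1; in cp1250 the same byte is Ă, which may hint that the PDA actually uses cp1252.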
This conversion problem renders OpenSync useless for users who rely on characters outside the ASCII set, which likely constitute a significant share of the potential user base. The problem exists at least for WM2003 devices, but since it appears to be located in the central conversion routines, it may also affect other sync plugins that do not natively use UTF-8.
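One possible direction for a fix is to convert at the plugin boundary, so that the core conversion routines only ever see UTF-8. The sketch below assumes a device that sends each field either as UTF-16LE or in a single-byte code page; `device_to_core`, `core_to_device`, the `wide` flag and `DEVICE_CODEPAGE` are hypothetical names, not part of the OpenSync or SynCE APIs, and a real plugin would presumably do the equivalent in C (e.g. with iconv).

```python
# Hypothetical boundary conversion for a plugin whose device does not use UTF-8.
# Assumptions (not from OpenSync): the device sends fields either as UTF-16LE
# ("wide") or in a single-byte code page, and the core expects UTF-8.

DEVICE_CODEPAGE = "cp1252"  # assumption; the trace in this ticket suggested cp1250

def device_to_core(raw: bytes, wide: bool) -> bytes:
    """Convert a field received from the device into UTF-8 for the core."""
    text = raw.decode("utf-16-le" if wide else DEVICE_CODEPAGE)
    return text.encode("utf-8")

def core_to_device(utf8: bytes, wide: bool) -> bytes:
    """Convert a UTF-8 field from the core into the device's encoding."""
    text = utf8.decode("utf-8")
    if wide:
        return text.encode("utf-16-le")
    # Characters the code page cannot represent become '?' instead of
    # raising, so a single unsupported character does not abort the sync.
    return text.encode(DEVICE_CODEPAGE, errors="replace")

field = device_to_core(b"M\xfcller", wide=False)
print(field)                              # b'M\xc3\xbcller'
print(core_to_device(field, wide=False))  # b'M\xfcller'
```

Using `errors="replace"` on the way to the device is a design choice: the alternative is to reject the whole entry when it contains a character the device's code page lacks.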
Therefore I consider this a critical defect.
Given some guidance, I am willing to investigate this further and try to provide (part of) a solution (see below).