mirror of
https://github.com/modoboa/modoboa.git
synced 2026-04-26 09:26:00 +03:00
[GH-ISSUE #782] Problem when converting from byte to UTF8 #703
Originally created by @carragom on GitHub (Nov 13, 2015).
Original GitHub issue: https://github.com/modoboa/modoboa/issues/782
Hi there,
I have some data in my amavis database that's rendering modoboa unusable. All parts of the system throw an internal server error. I have traced the issue to the file `modoboa/extensions/amavis/sql_conector.py`. The problem happens with all queries that use the function `convert_from(maddr.email, 'UTF8')`. It turns out that if I manually run the query directly in PostgreSQL, it also throws an error.
There seems to be data in there that's not UTF8 compliant. At first I saw #677 and upgraded to 1.2.2, thinking it was the same problem, but this did not fix the issue. In the end I replaced all `convert_from` functions in `modoboa/extensions/amavis/sql_conector.py` to use `LATIN1` instead of `UTF8` and the issue is now fixed.
In the end, the problematic character was a `ñ`. I know RFC6531 states that email addresses could be encoded in `UTF8`, but amavis does not seem to be playing by those rules yet. An immediate solution is to switch to `LATIN1` and hope it's the right encoding.
As an alternate solution, the data could be retrieved in binary form from the database and converted in Python, maybe even with a Django filter that would fall back to something like "invalid address" if the conversion fails. But this would force modoboa to retrieve all records from the database, and in my case that would be around 7K and counting.
Hope this helps.
Cheers.
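To illustrate the failure described above, here is a minimal Python sketch. It assumes the stored address contains `ñ` as the single Latin-1 byte `0xF1`; the address itself is a made-up example, not data from the report. The same bytes that decode cleanly as Latin-1 are rejected by a strict UTF-8 decoder, which is analogous to what `convert_from(..., 'UTF8')` does inside PostgreSQL.

```python
# Hypothetical address; only the ñ byte matters for the demonstration.
raw = "señor@example.com".encode("latin-1")  # ñ becomes the single byte 0xF1

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xF1 announces a 4-byte UTF-8 sequence, but an ASCII "o" follows.
    utf8_ok = False

latin1_text = raw.decode("latin-1")

print("valid UTF-8:", utf8_ok)
print("decoded as Latin-1:", latin1_text)
```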
@tonioo commented on GitHub (Nov 16, 2015):
Hi,
as for the Quarantine.mail_text field, we could try to use a BinaryField and to remove all calls to convert_from.
Do you think you could try it ?
@tonioo commented on GitHub (Nov 27, 2015):
@carragom ping
@carragom commented on GitHub (Dec 18, 2015):
Hi @tonioo sorry for the absence. At least in version `1.2.2`, `Quarantine.mail_text` already is a `BinaryField` as shown here, or am I looking at the wrong place? Maybe I did not understand what you meant?
@tonioo commented on GitHub (Dec 18, 2015):
Hi @carragom, your problem seems to be related to the email field:
https://github.com/modoboa/modoboa-amavis/blob/master/modoboa_amavis/models.py#L20
@carragom commented on GitHub (Dec 18, 2015):
@tonioo Yes that's the field causing the problems. Like I said, I see two options to fix this:
1- Keep the conversion in the database as it is now, but ask the database to convert to `LATIN1` instead of `UTF8` in all occurrences of the `convert_from` function here. I did this and it's working for me. I'm not sure if this is the right encoding, but it's certainly better than using `UTF8`, which we already know breaks.
2- Switch to using `BinaryField` for `Maddr.email` and handle the conversion in Python. Just keep in mind that the number of rows in the `Maddr` table grows fast: when I reported this a month ago the table was around 7K rows, and right now it's sitting at 12K rows. So doing this conversion outside the database could be a performance issue.
I have been looking around here to see if there is any indication of what encoding is actually used, without luck. But it does mention that the option `$sql_allow_8bit_address` needs to be set to use this field as `bytea`. So `LATIN1` sounds like a safe encoding to use.
If you ask me, I would just go with option number 1, which is simple to implement and has no performance issues.
I can provide a quick PR for option 1 if you decide to go with it.
Let me know.
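Option 1 could be sketched roughly as follows. This is not the actual modoboa code: `email_expr`, `ALLOWED_ENCODINGS`, and the default column name are hypothetical names for illustration. The idea is only that the charset passed to `convert_from` becomes a single, validated setting instead of a literal repeated in every query.

```python
# Hypothetical helper, not part of modoboa: builds the SQL expression
# used to read the address column as text with a configurable charset.
ALLOWED_ENCODINGS = {"UTF8", "LATIN1"}


def email_expr(column="maddr.email", encoding="LATIN1"):
    """Return a convert_from(...) expression for the given charset."""
    if encoding not in ALLOWED_ENCODINGS:
        # The value is interpolated into SQL, so only known names pass.
        raise ValueError("unsupported encoding: %r" % (encoding,))
    return "convert_from({}, '{}')".format(column, encoding)


print(email_expr())
print(email_expr(encoding="UTF8"))
```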
@tonioo commented on GitHub (Jan 27, 2016):
@carragom Using LATIN1 as encoding won't cover all cases. I guess we will encounter the same issues with another encoding soon. I still think a BinaryField is the right answer and I do hope Django uses the appropriate field when it generates queries. The manual conversion you see in the current code would also disappear.
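A small Python sketch of why Latin-1 alone does not cover all cases: every possible byte maps to a Latin-1 character, so the conversion never fails, but an address that really was stored as UTF-8 comes out garbled instead of raising an error. The sample character is chosen for illustration only.

```python
# Latin-1 accepts all 256 byte values, so decoding can never raise.
all_bytes = bytes(range(256))
decoded_everything = all_bytes.decode("latin-1")

# The same character stored as UTF-8 takes two bytes (0xC3 0xB1).
utf8_bytes = "ñ".encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")

print(len(decoded_everything))  # 256
print(mojibake)  # Ã±
```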
@tonioo commented on GitHub (Jan 27, 2016):
@carragom BTW, the right place for this issue is the https://github.com/modoboa/modoboa-amavis repository.
@carragom commented on GitHub (Jan 27, 2016):
@tonioo I agree that LATIN1 does not cover all cases and it's far from ideal. The one thing for sure is that this problem renders Modoboa unusable, all of it, not just the amavis module. So this should be fixed in any way necessary.
It might be possible that amavis does not intend for these fields to be used as text, from the amavis README.sql-pg.txt:
Thanks a lot for your time.
@tonioo commented on GitHub (Jan 28, 2016):
Please look at this thread (the end of the page is interesting):
https://code.djangoproject.com/ticket/2417
And this commit (django source code):
github.com/django/django@8ee1eddb7e
And tell me what you think :)
@carragom commented on GitHub (Jan 29, 2016):
Yes, using a `BinaryField` is definitely an option, see here. But switching to `BinaryField` alone is not enough. Every custom query using `convert_from` needs to be replaced with something that fetches the data from the table and filters it on the Python side. This probably means rewriting this entire class.
In any case, it does not matter what type of field is used or where the conversion happens (db or app): at some point those bytes in the database will have to be converted to text in order to be useful, and the conversion will require a character set. `UTF8` is not the right charset for that data and currently breaks the entire application. The main objective here is to find a way where the application does not break even if the conversion fails.
Again I see two options:
1- Find a way to handle the conversion gracefully at the database level (maybe a stored procedure would help here, or just use `LATIN1` as charset, which is working for me and seems to be what amavis is using).
2- Use a `BinaryField` and move the entire logic of converting/filtering the data to the web app, which is inefficient and a lot of work and will still break if we keep trying to use `UTF8` as charset.
Again, thanks for your time, I hope I was a bit more clear this time.
Cheers.
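If the conversion were moved to the application side, a tolerant decoder along these lines could keep a single bad row from breaking the page. This is a sketch under assumptions, not modoboa's implementation: `decode_address`, its default encoding order, and the placeholder text are all hypothetical, and it assumes the driver hands back `bytes` or a `memoryview` for a `bytea` column.

```python
def decode_address(raw, encodings=("utf-8", "latin-1"),
                   placeholder="invalid address"):
    """Decode a raw address, trying each encoding in order.

    Hypothetical helper: returns the placeholder instead of raising
    when none of the listed encodings can decode the bytes.
    """
    data = bytes(raw)  # also accepts a memoryview
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    return placeholder


# Latin-1 bytes fall through to the second encoding.
print(decode_address(b"se\xf1or@example.com"))
# With UTF-8 only, the bad value degrades to the placeholder.
print(decode_address(b"se\xf1or@example.com", encodings=("utf-8",)))
```

Because Latin-1 accepts every byte, the placeholder is only reached when Latin-1 is left out of `encodings`; whether a guessed decoding or an explicit "invalid address" is preferable is the design question debated in this thread.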
@tonioo commented on GitHub (Jun 9, 2016):
This issue was moved to modoboa/modoboa-amavis#35