Zine

slug url encoding problem for Chinese Characters

From:
wsh
Date:
2010-06-20 @ 03:11
Subject:
slug url encoding problem for Chinese Characters
Hi all,

I've successfully set up a Zine development instance on my machine. The only
problem is that if I leave a comment on a post whose title contains
Chinese characters, after I submit my comment Zine shows me a Page Not
Found error page. I think there must be some URL encoding problem, since
there are Chinese characters in the slug. I've checked the source code
but can't figure out which endpoint this URL is mapped to. Can anybody
give me some pointers?


Thanks

Re: slug url encoding problem for Chinese Characters

From:
Armin Ronacher
Date:
2010-06-20 @ 08:46
Subject:
Re: slug url encoding problem for Chinese Characters
Hi,

On 6/20/10 5:11 AM, wsh wrote:
> I've successfully set up a Zine development instance on my machine. The only
> problem is that if I leave a comment on a post whose title contains
> Chinese characters, after I submit my comment Zine shows me a Page Not
> Found error page.
Two questions. Are you using the development version? There, this
should not happen: it will fall back to numbers in that case.
Alternatively, you can disable ASCII-only slugs in the Admin panel.


Regards,
Armin
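To illustrate the "fall back to numbers" behaviour Armin mentions, here is a minimal sketch, assuming an ASCII-only slug is built by dropping every character that has no ASCII equivalent. `make_ascii_slug` and `fallback_id` are hypothetical names for illustration only; this is not Zine's actual slug code. A purely Chinese title leaves nothing behind, so the number is used instead.

```python
import re
import unicodedata


def make_ascii_slug(title, fallback_id):
    """Hypothetical helper: ASCII-only slug, or a number if nothing survives."""
    # NFKD splits accented letters into base letter + combining mark
    # (e.g. "é" -> "e" + U+0301); encoding with "ignore" then drops
    # every character that has no ASCII form, including all CJK characters.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse runs of non-alphanumerics into single hyphens.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_title).strip("-").lower()
    # Nothing left (e.g. a Chinese-only title): fall back to the number.
    return slug or str(fallback_id)


print(make_ascii_slug("汉", 42))          # 42
print(make_ascii_slug("Hello World", 7))  # hello-world
```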

Re: slug url encoding problem for Chinese Characters

From:
wsh
Date:
2010-06-20 @ 17:30
Subject:
Re: slug url encoding problem for Chinese Characters
Armin Ronacher wrote:
> Hi,
>
> On 6/20/10 5:11 AM, wsh wrote:
>> I've successfully set up a Zine development instance on my machine. The only
>> problem is that if I leave a comment on a post whose title contains
>> Chinese characters, after I submit my comment Zine shows me a Page Not
>> Found error page.
> Two questions. Are you using the development version? There, this
> should not happen: it will fall back to numbers in that case.
> Alternatively, you can disable ASCII-only slugs in the Admin panel.
>
> Regards,
> Armin
Yes. I got the latest code from the repository:

$ hg clone http://dev.pocoo.org/hg/zine-main zine

I've tried ASCII-only slugs too, but it doesn't seem to work. I checked the code and found a clue.
The problem seems to be caused by the "_redirect_target" hidden field in the form. I captured the HTTP request headers in Firefox as follows:

http://localhost:4000/2010/6/20/%E6%B1%89

GET /2010/6/20/%E6%B1%89 HTTP/1.1
Host: localhost:4000
User-Agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://localhost:4000/2010/6/20/%25E6%25B1%2589
Cookie: zine_session="ltMjGAFsomUqGyUae33eAa6W/Y4=?_expires=STEyNzk3MjI0ODkKLg==&lt=RjEyNzY5NTQ5ODguNjEwODY5OQou&pmt=STAxCi4=&uid=STEKLg=="
Cache-Control: max-age=0


The last segment of the request URL (http://localhost:4000/2010/6/20/%E6%B1%89) is the UTF-8 encoding of a Chinese character, which takes three bytes, and that part is fine.
But the Referer header (Referer: http://localhost:4000/2010/6/20/%25E6%25B1%2589) is incorrect: it has an additional "25" after each "%", i.e. every "%" has itself been encoded as "%25".
The function get_redirect_target in zine/utils/http.py uses this Referer header to calculate the value of the "_redirect_target" hidden field.
That's why the forms are redirected to the wrong location. But I can't figure out where the additional "25" (the hex value of the ASCII "%" character) comes from.


Regards,
Shuhao
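The extra "25" is the signature of double percent-encoding: an already-quoted path being quoted a second time. The snippet below, using only Python's standard `urllib.parse`, reproduces the symptom; it does not show where in Zine the second encoding happens. If the server decodes the request path only once, the double-encoded URL would turn into the literal text `%E6%B1%89`, which matches no slug and would be consistent with the Page Not Found error.

```python
from urllib.parse import quote, unquote

char = "汉"

# UTF-8 encodes this character as three bytes.
print(char.encode("utf-8"))  # b'\xe6\xb1\x89'

# Quoting once gives the correct path segment.
once = quote(char)
print(once)                  # %E6%B1%89

# Quoting an already-quoted string escapes every "%" again as "%25".
twice = quote(once)
print(twice)                 # %25E6%25B1%2589

# Decoding once only removes one layer, leaving "%E6%B1%89" as literal text.
print(unquote(twice))        # %E6%B1%89
```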

Re: slug url encoding problem for Chinese Characters

From:
Armin Ronacher
Date:
2010-06-20 @ 20:23
Subject:
Re: slug url encoding problem for Chinese Characters
Hi,

On 6/20/10 7:30 PM, wsh wrote:
> That's why the forms are redirected to the wrong location. But I can't
> figure out where the additional "25" (the hex value of the ASCII "%"
> character) comes from.
May I ask what browser you are using? I can try to debug that problem
next week, but I'm quite busy right now, so if you have any
experience debugging Python apps, any help would be greatly
appreciated. I suppose it is caused by improper
encoding/decoding somewhere.


Regards,
Armin

Re: slug url encoding problem for Chinese Characters

From:
Kiran Jonnalagadda
Date:
2010-06-26 @ 09:16
Subject:
Re: slug url encoding problem for Chinese Characters
On Sun, Jun 20, 2010 at 11:00 PM, wsh <shuhao.w@gmail.com> wrote:

> I've tried ASCII-only slugs too, but it doesn't seem to work. I checked the code and
> found a clue.
> The problem seems to be caused by the "_redirect_target" hidden field in the form.
> I captured the HTTP request headers in Firefox as follows:
>

This may be related to the _charset_ feature. See:

http://www.crazysquirrel.com/computing/general/form-encoding.jspx
https://bugzilla.mozilla.org/show_bug.cgi?id=18643

I had a quick look at the Zine tip source and didn't notice _charset_ being
used. Basically, the form needs to have a hidden field named _charset_, with
no value:

<input type="hidden" name="_charset_">

The browser will then fill this field with the name of the character
encoding it used to submit the form: ISO-8859-1 by default, or UTF-8 if
the page is in UTF-8. I'm not sure what happens if this field is missing,
but most likely the form is not being submitted in UTF-8. Please add the
input field to your template and see if that fixes it.
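On the server side, the value the browser puts into `_charset_` tells the application how to decode the percent-escaped form data. The following is a minimal sketch using only Python's standard `urllib.parse.parse_qs` (which accepts an `encoding` argument); `decode_form` is a hypothetical helper, and whether Zine or its form handling already does something equivalent is not established in this thread. Note that it concerns the submitted form body, not the Referer header discussed above.

```python
from urllib.parse import parse_qs


def decode_form(body):
    """Hypothetical helper: decode a urlencoded body using its _charset_ field."""
    # A urlencoded body is plain ASCII; non-ASCII data arrives as %XX escapes.
    text = body.decode("ascii")
    # Probe pass: latin-1 maps every byte to a character, so it cannot fail,
    # and the charset name itself is ASCII.
    probe = parse_qs(text, encoding="latin-1")
    # A missing or empty _charset_ falls back to UTF-8 (an assumption).
    charset = probe.get("_charset_", ["utf-8"])[0] or "utf-8"
    # Real pass: decode the %XX escapes with the encoding the browser reported.
    return parse_qs(text, encoding=charset)


print(decode_form(b"_charset_=UTF-8&title=%E6%B1%89")["title"])  # ['汉']
```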
