Welcome to TalkGraphics.com
  1. #1
    Join Date
    Jan 2007
    Posts
    14

    html entities decode

    Hi!
    I exported a website, opened the HTML file, and see: <title>Группа компаний АЛЛ</title> <-- Russian letters in code page win-1251
    But all the text after <body> is written as HTML entities: <span class="xr_tr" style="left: -188px; top: -11px; width: 188px;">&#1059;&#1051;&#1068;&#1058;&#1056;&#1040;</span>

    How can I export all of the website text in win-1251? I tried this tip: "9. Another request was to allow overwriting of the declared used char set. This is done by applying the name that begins with "charset=" to any object. For example: "charset=UTF-8". Affects all pages in the site." But this doesn't work either.
    Excuse my poor English.

  2. #2
    Join Date
    Aug 2004
    Location
    Ukraine
    Posts
    3,904

    Re: html entities decode

    The page content is always encoded in Unicode. You cannot change it.
    XML strings are always exported in your OS codepage, and this codepage is set in the content meta tag.
    Overriding the meta tag does not influence the way the page is exported. No matter what charset is declared, the page content will be rendered correctly, as it is encoded in Unicode. But you have to check the special strings manually to make sure they comply with the declared charset.
    John.
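
    To illustrate why the declared charset does not matter for the body text: numeric character references such as &#1059; are written in plain ASCII and name Unicode code points directly, so a browser decodes them the same way under any charset. A minimal Python sketch, assuming the entity string from the first post with the spaces removed and the trailing semicolon restored (the variable names are only for illustration):

```python
import html

# Entity string from the first post (spaces removed, final semicolon assumed).
entities = "&#1059;&#1051;&#1068;&#1058;&#1056;&#1040;"

# The references contain only ASCII characters, so the raw bytes are
# identical whether the file is saved as win-1251 or UTF-8.
same_bytes = entities.encode("cp1251") == entities.encode("utf-8")

# Decoding the references yields the actual Cyrillic text.
text = html.unescape(entities)

# These letters all exist in win-1251, one byte per letter.
cp1251_bytes = text.encode("cp1251")

print(text)
print(same_bytes, len(cp1251_bytes))
```

    So the page displays correctly as exported; the references only look unreadable in the HTML source.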

  3. #3
    Join Date
    Jan 2007
    Posts
    14

    Default Re: html entities decode

    Thanks. I'll look for a converter, then.
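
    For anyone with the same problem, such a converter can be sketched in a few lines of Python. This is only a rough sketch, not part of the export tool: the function name decode_refs_for_charset is hypothetical, and it assumes you only want to replace numeric references whose characters exist in the target charset, leaving ASCII references such as &#60; alone so the markup is not changed.

```python
import re

# Matches decimal (&#1059;) and hexadecimal (&#x423;) character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_refs_for_charset(markup: str, charset: str = "cp1251") -> str:
    """Replace numeric references with literal characters where the
    target charset can represent them (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep ASCII references (e.g. &#60; for "<") and invalid code points.
        if code_point < 128 or code_point > 0x10FFFF:
            return match.group(0)
        char = chr(code_point)
        try:
            char.encode(charset)
        except UnicodeEncodeError:
            # Not representable in the target charset: keep the reference.
            return match.group(0)
        return char

    return NUMERIC_REF.sub(replace, markup)


snippet = '<span class="xr_tr">&#1059;&#1051;&#1068;&#1058;&#1056;&#1040;</span>'
converted = decode_refs_for_charset(snippet)
data = converted.encode("cp1251")  # bytes ready to be written to the .htm file
print(converted)
```

    The result should be saved as bytes in win-1251, and the charset in the meta tag has to say windows-1251 as well, otherwise the literal letters will display as garbage.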

 

 
