1 (edited by jihem 2017-08-18 09:46:52)

Topic: [solved] LoadFromFile : Load a UTF8 file content as a string

Hi,

1°) The following two commands download the same file, but the saved content is not the same (a character encoding problem?):

HTTPGetFile('http://radiofrance-podcast.net/podcast09/rss_10175.xml','rss.xml',false);

WriteLnToFile('rss.xml',HTTPGet('http://radiofrance-podcast.net/podcast09/rss_10175.xml'));

2°) When I try to load the content of the file with:

sl:=TStringList.Create;
sl.LoadFromFile('rss.xml');

the UTF-8 content isn't decoded properly. Example:
'D├®bat de quarante minutes abordant les grandes questions actuelles'
instead of
'Débat de quarante minutes abordant les grandes questions actuelles'
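For reference, this is likely why 'é' turns into '├®': é is U+00E9, which UTF-8 stores as the two bytes C3 A9. If those bytes are shown one byte per character in the DOS code page 850 (an assumption about where the text was displayed), C3 appears as ├ and A9 as ®. The Free Pascal sketch below computes the UTF-8 bytes of a code point so the mapping can be checked; Utf8HexOf is a hypothetical helper, not a built-in of the scripting environment used in this thread.

```pascal
{$mode objfpc}{$H+}
uses SysUtils;

// Hypothetical helper: encodes one code point (assumed to be in
// U+0000..U+10FFFF and not a surrogate) as UTF-8 and returns the bytes
// as uppercase hex pairs separated by spaces, e.g. 'C3 A9'.
function Utf8HexOf(CodePoint: Cardinal): string;
var
  Bytes: array[0..3] of Byte;
  Count, I: Integer;
begin
  if CodePoint < $80 then
  begin
    // 0xxxxxxx
    Bytes[0] := Byte(CodePoint);
    Count := 1;
  end
  else if CodePoint < $800 then
  begin
    // 110xxxxx 10xxxxxx
    Bytes[0] := Byte($C0 or (CodePoint shr 6));
    Bytes[1] := Byte($80 or (CodePoint and $3F));
    Count := 2;
  end
  else if CodePoint < $10000 then
  begin
    // 1110xxxx 10xxxxxx 10xxxxxx
    Bytes[0] := Byte($E0 or (CodePoint shr 12));
    Bytes[1] := Byte($80 or ((CodePoint shr 6) and $3F));
    Bytes[2] := Byte($80 or (CodePoint and $3F));
    Count := 3;
  end
  else
  begin
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    Bytes[0] := Byte($F0 or (CodePoint shr 18));
    Bytes[1] := Byte($80 or ((CodePoint shr 12) and $3F));
    Bytes[2] := Byte($80 or ((CodePoint shr 6) and $3F));
    Bytes[3] := Byte($80 or (CodePoint and $3F));
    Count := 4;
  end;
  Result := '';
  for I := 0 to Count - 1 do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(Bytes[I], 2);
  end;
end;
```

Calling Utf8HexOf($E9) from a program's main block gives 'C3 A9', the two bytes that end up displayed as '├®'.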

How can I solve this problem? Is there a way to specify the encoding of the file when loading it (e.g. sl.LoadFromFile('rss.xml', 'utf-8'))?
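Outside the script engine, in Free Pascal, one way to do this is to read the raw bytes and convert them with the RTL's UTF8Decode. A minimal sketch, assuming Free Pascal 3.x with {$mode objfpc}{$H+}; LoadUtf8File is a hypothetical name, and whether the scripting environment in this thread exposes UTF8Decode is not confirmed.

```pascal
{$mode objfpc}{$H+}
uses SysUtils, Classes;

// Hypothetical helper: reads the whole file as raw bytes, drops a UTF-8
// byte order mark (EF BB BF) if present, and converts the bytes to a
// UTF-16 UnicodeString with the RTL's UTF8Decode.
function LoadUtf8File(const FileName: string): UnicodeString;
var
  Stream: TFileStream;
  Raw: RawByteString;
begin
  Raw := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Raw, Stream.Size);
    if Length(Raw) > 0 then
      Stream.ReadBuffer(Raw[1], Length(Raw));
  finally
    Stream.Free; // release the file handle even if reading fails
  end;
  // Skip the optional BOM so it does not show up as U+FEFF.
  if (Length(Raw) >= 3) and (Raw[1] = #$EF) and (Raw[2] = #$BB) and (Raw[3] = #$BF) then
    Raw := Copy(Raw, 4, MaxInt);
  Result := UTF8Decode(Raw);
end;
```

In Delphi 2009 and later, TStringList.LoadFromFile also has an overload taking an encoding, e.g. sl.LoadFromFile('rss.xml', TEncoding.UTF8); whether the script engine used here exposes that overload is unknown.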

Kind regards,
jihem

while(! success=retry());
https://jihem.itch.io

Re: [solved] LoadFromFile : Load a UTF8 file content as a string

Hello.

Please download the latest beta version:
https://www.dropbox.com/s/4rfukqr2r1awq … b.zip?dl=0

I have made some changes to the WriteLnToFile function.

Dmitry.

3 (edited by jihem 2017-08-18 14:11:40)

Re: [solved] LoadFromFile : Load a UTF8 file content as a string

Hi,

Now, with the update (36b), HTTPGetFile and WriteLnToFile give the same result, so the first point is solved.
Thanks :-)

I couldn't find a function to decode a UTF-8 file, so I wrote my own.
Example usage: ShowMessage(LoadFromFileUTF8('rss.xml'));

Regards,
jihem

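// Decodes UTF-8 bytes stored in s. Each Char of s is treated as holding
// two raw bytes (low byte first): the byte at position i is taken from
// Char p = i shr 1, low half if i is even, high half if i is odd.
function DecodeUTF8(const s:string):string;
var
  c0,c1,c2,c3,c,i,n,m,p:Integer;
  o0,o1,o2:Integer;
  b1,b2,b3:Boolean;
begin
  Result:='';

  // Choose the starting byte position. NOTE: p has not been assigned yet
  // at this point, so the result depends on how the script engine
  // initialises variables.
  if (Ord(s[p]) and 255)=0 then
    i:=2
  else
    i:=1;

  n:=Length(s);
  while i<=n do
  begin
    p:=i shr 1;
    m:=i and 1;

    // Extract the next four bytes c0..c3 from the packed Chars.
    o0:=Ord(s[p]);
    o1:=Ord(s[p+1]);
    if m=0 then
    begin
      c0:=o0 and 255;
      c1:=o0 shr 8;
      c2:=o1 and 255;
      c3:=o1 shr 8;
    end
    else
    begin
      o2:=Ord(s[p+2]);
      c0:=o0 shr 8;
      c1:=o1 and 255;
      c2:=o1 shr 8;
      c3:=o2 and 255;
    end;
    // Continuation bytes have the form 10xxxxxx.
    b1:=(c1 and 192)=128;
    b2:=(c2 and 192)=128;
    b3:=(c3 and 192)=128;

    if (c0>=240) and b1 and b2 and b3 then
    begin
      // 4-byte sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      c:=((c0 and 7) shl 18) or ((c1 and 63) shl 12) or ((c2 and 63) shl 6) or (c3 and 63);
      inc(i,4);
    end
    else if (c0>=224) and b1 and b2 then
    begin
      // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
      c:=((c0 and 15) shl 12) or ((c1 and 63) shl 6) or (c2 and 63);
      inc(i,3);
    end
    else if (c0>=192) and b1 then
    begin
      // 2-byte sequence: 110xxxxx 10xxxxxx
      c:=((c0 and 31) shl 6) or (c1 and 63);
      inc(i,2);
    end
    else
    begin
      // Single byte (ASCII)
      c:=c0 and 127;
      inc(i);
    end;

    // Code points above $FFFF do not fit in one UTF-16 Char:
    // emit them as a surrogate pair.
    if c>$FFFF then
    begin
      c:=c-$10000;
      Result:=Result+Chr($D800+(c shr 10))+Chr($DC00+(c and 1023));
    end
    else
      Result:=Result+Chr(c);
  end;
end;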

function LoadFromFileUTF8 (FileName: string):string;
var
  l:integer;
  s:string;
  fs:TFileStream;
begin
  Result:='';
  l:=GetFileSize(FileName); // file size in bytes
  if l>0 then
  begin
    fs:=TFileStream.Create(FileName,fmOpenRead);
    try
      fs.Position:=0;
      SetLength(s,l);
      fs.Read(s,l);         // raw file bytes, decoded by DecodeUTF8
      Result:=DecodeUTF8(s);
    finally
      fs.Free;              // release the file even if decoding fails
    end;
  end;
end;
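For comparison, a byte-oriented version of the same decoding in Free Pascal, taking the raw bytes directly instead of bytes packed into a string. It is a sketch under stated assumptions (DecodeUtf8Bytes is a hypothetical name): malformed or truncated sequences become U+FFFD, code points above $FFFF become surrogate pairs, and overlong forms and encoded surrogates are not rejected.

```pascal
{$mode objfpc}{$H+}
uses SysUtils;

// Hypothetical helper: decodes UTF-8 bytes into a UTF-16 UnicodeString.
// Invalid lead bytes, bad continuation bytes, truncated sequences and
// values above U+10FFFF each produce U+FFFD and resume at the next byte.
// Overlong forms and encoded surrogates are accepted as-is.
function DecodeUtf8Bytes(const Bytes: array of Byte): UnicodeString;
var
  I, N, Need, K: Integer;
  B: Byte;
  Cp: Cardinal;
  Ok: Boolean;
begin
  Result := '';
  N := Length(Bytes);
  I := 0;
  while I < N do
  begin
    B := Bytes[I];
    // The lead byte gives the sequence length and the top bits of the code point.
    if B < $80 then begin Cp := B; Need := 0; end
    else if (B and $E0) = $C0 then begin Cp := B and $1F; Need := 1; end
    else if (B and $F0) = $E0 then begin Cp := B and $0F; Need := 2; end
    else if (B and $F8) = $F0 then begin Cp := B and $07; Need := 3; end
    else begin Cp := 0; Need := -1; end; // not a valid lead byte

    Ok := Need >= 0;
    if Ok and (I + Need >= N) then
      Ok := False; // sequence runs past the end of the input
    // Each continuation byte (10xxxxxx) adds 6 more bits.
    K := 1;
    while Ok and (K <= Need) do
    begin
      if (Bytes[I + K] and $C0) <> $80 then
        Ok := False
      else
        Cp := (Cp shl 6) or (Bytes[I + K] and $3F);
      Inc(K);
    end;
    if Ok and (Cp > $10FFFF) then
      Ok := False;

    if not Ok then
    begin
      Result := Result + WideChar($FFFD);
      Inc(I); // resynchronise on the next byte
      Continue;
    end;

    Inc(I, Need + 1);
    if Cp > $FFFF then
    begin
      // Split into a UTF-16 surrogate pair.
      Dec(Cp, $10000);
      Result := Result + WideChar($D800 + (Cp shr 10)) + WideChar($DC00 + (Cp and $3FF));
    end
    else
      Result := Result + WideChar(Cp);
  end;
end;
```

The main difference from the script version is that the byte index maps directly to the input array, so there is no need to unpack two bytes per Char.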