1 (edited by jihem 2017-08-22 11:17:17)

Topic: [solved] Request for native UTF8 functions

Hi,
In the topic "[solved] LoadFromFile : Load a UTF8 file content as a string", I wrote a function that loads a UTF-8 file into a string.
It works but is very slow: 58 seconds for a 400 KB file.

I did the same in C# (and call it from MVD), and the job is done in less than 1 s (?!).

using System.IO;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the UTF-8 encoded feed into a .NET string.
            var text = File.ReadAllText("rss.xml", Encoding.UTF8);
            // Write it back as UTF-16 (Encoding.Unicode).
            using (var sw = new StreamWriter(File.Open("rss.out", FileMode.Create), Encoding.Unicode))
            {
                sw.WriteLine(text);
            }
        }
    }
}

Dmitry, could you please include fast native functions to encode/decode a UTF-8 string (previously loaded with TFileStream)?

Regards,
jihem

while(! success=retry());
https://jihem.itch.io

Re: [solved] Request for native UTF8 functions

Please attach your project where I can test it.

Dmitry.

3 (edited by jihem 2017-08-19 08:33:06)

Re: [solved] Request for native UTF8 functions

Hi Dmitry,
You can download the project here: http://codyssea.com/downloads/podcastor.zip


You can click the "import" button to test. The program downloads an rss.xml file (UTF-8 encoded). It loads the file, decodes it (from UTF-8 to Unicode), and stores the result in an MVD string. Then some data is extracted from the XML and stored in the SQLite file (import table). The log table stores the content of each call to the log function (so you can see the time each step requires).
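The steps described above (decode, extract, store, and log the time of each step) can be sketched outside MVD as follows. This is only an illustrative Python sketch, not the podcastor script: the column names, the `enclosure` lookup, and the function name `import_rss` are assumptions; only the `import` and `log` table names come from the post.

```python
import sqlite3
import time
import xml.etree.ElementTree as ET


def import_rss(xml_bytes: bytes, db: sqlite3.Connection) -> int:
    """Decode a UTF-8 RSS document, store its items, and log step timings."""
    # Hypothetical schemas: the thread only names the two tables.
    db.execute('CREATE TABLE IF NOT EXISTS "import" (title TEXT, url TEXT)')
    db.execute("CREATE TABLE IF NOT EXISTS log (step TEXT, seconds REAL)")

    def log(step: str, started: float) -> None:
        # One row per step, with the elapsed time in seconds.
        db.execute("INSERT INTO log VALUES (?, ?)",
                   (step, time.perf_counter() - started))

    # Step 1: decode the raw UTF-8 bytes into a Unicode string.
    started = time.perf_counter()
    text = xml_bytes.decode("utf-8")
    log("decode", started)

    # Step 2: extract a title and an enclosure URL from each <item>.
    started = time.perf_counter()
    rows = []
    for item in ET.fromstring(text).iter("item"):
        enclosure = item.find("enclosure")
        url = enclosure.get("url", "") if enclosure is not None else ""
        rows.append((item.findtext("title", default=""), url))
    log("extract", started)

    # Step 3: store the extracted rows in the import table.
    started = time.perf_counter()
    db.executemany('INSERT INTO "import" VALUES (?, ?)', rows)
    log("store", started)

    db.commit()
    return len(rows)
```

Reading the `log` table afterwards shows which step dominates the run time, which is how the slow decoding step was spotted in the first place.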


I have added a version that uses my .NET program to convert UTF-8 to Unicode: http://codyssea.com/downloads/podcastor-net.zip


I'm sad to have to use .NET when everything could be done with MVD, but it's really too slow without it.
I hope you can add fast native EncodeToUTF8 and DecodeFromUTF8 functions to MVD.


One of my wishes is to be able to call external DLLs from MVD, so we don't have to bother you each time we need something.


Thanks for your help.

Regards,
jihem


Re: [solved] Request for native UTF8 functions

Check it out

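function LoadFromFileUTF8 (FileName: string):string;
var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    // Load the whole file and return its content as one string.
    sl.LoadFromFile(FileName);
    result := sl.Text;
  finally
    // Release the list even if loading fails.
    sl.Free;
  end;
end;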
Dmitry.

5 (edited by jihem 2017-08-21 21:38:25)

Re: [solved] Request for native UTF8 functions

Hi,
It was my first attempt. Works like a charm for unicode files but doesn't decode UTF8 files properly...

"Qui veut vraiment rÃ©soudre la crise nord-corÃ©enne ?"
instead of
"Qui veut vraiment résoudre la crise nord-coréenne ?"
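
This kind of garbling is what typically appears when UTF-8 bytes are interpreted as a single-byte ANSI code page. A minimal Python sketch of the effect, assuming the code page is Windows-1252 (the thread does not say which one was in use):

```python
# Assumption: the loader treats the bytes as Windows-1252 (cp1252).
text = "Qui veut vraiment résoudre la crise nord-coréenne ?"

utf8_bytes = text.encode("utf-8")       # the bytes stored in a UTF-8 file
misread = utf8_bytes.decode("cp1252")   # what a loader assuming ANSI produces
decoded = utf8_bytes.decode("utf-8")    # what a UTF-8 aware loader produces

print(misread)  # each "é" shows up as two characters
print(decoded)  # the original sentence
```

Each "é" takes two bytes in UTF-8 (0xC3 0xA9), so a single-byte decoder turns it into two unrelated characters instead of one.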

Regards,
jihem


Re: [solved] Request for native UTF8 functions

jihem wrote:

Hi,
It was my first attempt. Works like a charm for unicode files but doesn't decode UTF8 files properly...

"Qui veut vraiment rÃ©soudre la crise nord-corÃ©enne ?"
instead of
"Qui veut vraiment résoudre la crise nord-coréenne ?"

Regards,
jihem

Please make a test project to reproduce this problem.

Dmitry.

7 (edited by jihem 2017-08-21 21:33:31)

Re: [solved] Request for native UTF8 functions

DriveSoft wrote:

Please make a test project to reproduce this problem.

http://codyssea.com/downloads/LoadFromFileUTF8.zip

http://codyssea.com/downloads/LoadFromFileUTF8.PNG

Your sample works well with Unicode files but not with UTF-8 files.

I think you should add two native conversion functions, EncodeUTF8 and DecodeUTF8 (like the DecodeUTF8 provided in the podcastor projects), to convert to and from UTF-8. It's not a bug; it's a lack of conversion functions. The Web usually uses the UTF-8 file format (especially for AJAX calls).

I found this page for Delphi: http://docwiki.embarcadero.com/RADStudi … sion_UTF-8
Adding DetectUTF8Encoding could be a good idea too.
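
To make the request concrete, here is a hedged Python sketch of what the three functions would do. The names `encode_utf8`, `decode_utf8`, and `detect_utf8` are hypothetical stand-ins for the requested EncodeUTF8, DecodeUTF8, and DetectUTF8Encoding; this is not how MVD or Delphi implement them.

```python
import codecs


def encode_utf8(text: str) -> bytes:
    """Convert a Unicode string to UTF-8 bytes."""
    return text.encode("utf-8")


def decode_utf8(data: bytes) -> str:
    """Convert UTF-8 bytes to a Unicode string, dropping a leading BOM."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


def detect_utf8(data: bytes) -> bool:
    """Return True if data starts with a UTF-8 BOM or is valid UTF-8.

    Pure ASCII data also returns True, since ASCII is a subset of UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        return True
    try:
        data.decode("utf-8")  # strict decoding rejects invalid sequences
        return True
    except UnicodeDecodeError:
        return False
```

A loader could call `detect_utf8` first and fall back to the ANSI code page only when it returns False.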

I don't want to be rude, and I am sorry to insist. I have run into this issue in several languages, and it was always solved by using or including these functions (as I did in the previous sample projects). The only problem is that native functions are much faster than the ones I wrote in MVD. So please, could you include them?

Rest assured, I'm very grateful for your help.
Regards,
jihem


Re: [solved] Request for native UTF8 functions

Please download the latest beta version 3.6b, I made some changes:
https://www.dropbox.com/s/4rfukqr2r1awq … b.zip?dl=0

Dmitry.

Re: [solved] Request for native UTF8 functions

Well done. It works.
Thanks a lot :-)

while(! success=retry());
https://jihem.itch.io

Re: [solved] Request for native UTF8 functions

Hello Jean-Marc,


I've been poking around your script and found two interesting details.


The first one is this statement at the beginning:

uses 'toolbox.pas';   

I found the file in the script folder but was wondering whether you really managed to use it by including it at the beginning of your script, or whether it was just leftover code you forgot to delete :)


The second is the 'wbGetFile.exe' application that you distribute with your compiled project.

Would you mind showing us how you managed to pass the download URL from MVD to that application?


Cheers


Mathias (from New Caledonia)

I'm a very good housekeeper !
Each time I get a divorce, I keep the house

Zaza Gabor

11 (edited by jihem 2017-09-01 18:13:20)

Re: [solved] Request for native UTF8 functions

Hi,

Toolbox.pas contains some things I wrote to manage:
- INI files (storing the window position...),
- SQL (escaping quotes...),
- strings (additional functions to extract text within an HTML/XML node...),
...
and which are very useful. I include it (or part of it) in all my projects (along with another file called drivesoft.pas, which contains useful things found on the forum).

wbGetFile.exe downloads a file: wbGetFile <url of the document to download> <filename>
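
For readers who do not have wbGetFile, a tool with the same command line can be sketched in a few lines of Python. This is a stand-in under assumptions, not the actual wbGetFile source (which was written in .NET and PureBasic); the names `get_file` and `main` are hypothetical.

```python
import sys
import urllib.request


def get_file(url: str, filename: str) -> None:
    """Download url and save the raw bytes as filename."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        while True:
            chunk = response.read(64 * 1024)  # copy in 64 KB chunks
            if not chunk:
                break
            out.write(chunk)


def main(argv: list) -> int:
    """Mimic the command line: <program> <url> <filename>."""
    if len(argv) != 3:
        print("usage: get_file <url> <filename>", file=sys.stderr)
        return 2
    get_file(argv[1], argv[2])
    return 0

# As a script entry point, one would call: sys.exit(main(sys.argv))
```

The bytes are written unchanged, so a UTF-8 feed stays UTF-8 on disk and still has to be decoded when it is loaded.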

The full project is an RSS feed downloader: http://codyssea.com/index.php/2017/08/20/podcastor
I made wbGetFile to download the files in the background (without blocking the app) because the content of an RSS feed can be really big...

I use a timer to manage the process in steps: check whether there is something to download, launch wbGetFile (with OpenFile), wait for completion, and loop.
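
The timer-driven loop can be pictured as a small state machine. The Python sketch below is an illustration only: the class `DownloadQueue` and its method names are hypothetical, and `subprocess.Popen` stands in for launching the external program from MVD.

```python
import subprocess
import sys
from collections import deque


class DownloadQueue:
    """Run one external download at a time without blocking the caller."""

    def __init__(self, command):
        self.command = list(command)  # e.g. ["wbGetFile.exe"]
        self.pending = deque()        # (url, filename) pairs still to fetch
        self.current = None           # (process, filename) while one is running
        self.finished = []            # (filename, exit code) of completed runs

    def add(self, url, filename):
        self.pending.append((url, filename))

    def on_timer(self):
        """Call on every timer tick; returns the step that was taken."""
        if self.current is None:
            if not self.pending:
                return "idle"         # nothing to download
            url, filename = self.pending.popleft()
            process = subprocess.Popen(self.command + [url, filename])
            self.current = (process, filename)
            return "started"          # launched in the background
        process, filename = self.current
        if process.poll() is None:
            return "waiting"          # still running; check on the next tick
        self.finished.append((filename, process.returncode))
        self.current = None
        return "done"                 # the next tick starts the next download
```

Because `poll()` returns immediately, each tick costs almost nothing and the UI thread stays responsive while the download runs in its own process.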

In fact, I use MVD to build the UI of my applications and process the data in the background with separate executables (made with MVD or other languages, because MVD isn't multithreaded and its I/O is blocking). The first version of wbGetFile was made with .NET (but I made another with PureBasic to have fewer dependencies).

And finally, I use InstallCreator to build the installer.

Regards,
jihem
