1 (edited by mathmathou 2016-10-23 07:12:51)

Topic: [Solved] - Comparing array values with...

Hello Dmitry and all MVD fans,


I feel a bit dumb with my question because I have a valid solution to my problem, but it is so long that I would like to speed up the process.


Here is the situation :

1- From a web site, I get a string, delimited by comas, containing a long list of reference numbers. When I say long, I'm talking about more than 25.000 items. They are handled as text
2- I split this string into an array
3 - for each value of the array, I query the database looking for a match.
It looks like this :

var
    i, known : Integer;
begin
    for i := 0 to Length(sku_array) - 1 do
        begin
            known := SQLExecute('SELECT COUNT(id) FROM asset WHERE asset_sku = "'+sku_array[i]+'"');
                if known = 0 then <some code> else <some other code>;
        end;
end;

This is my code, it works, I'm even proud of it smile, but it's sooooooooooo slow !!
On a core i7 machine, running the system on a SSD drive, it can take up to 10 minutes to loop through all the results. To long for checking this on a daily basis.


So, to speed up the process, I thought about :
- handling the comparison in memory
- avoiding 25.000 queries and using just one to create a Dataset;

This way, once all the data in memory (RAM), the comparison should be quicker, right ?

So I imagined something like :

var
    sku_array : Array of String;
    Results : TDataSet;
    i : Integer;
begin
    SQLQuery('SELECT asset_sku FROM asset',Results);
        While not Results.Eof do
            begin
                for i := 0 to Length(sku_array) - 1 do
                    begin
                        if sku_array[i] = Results.FieldByName('asset_sku').AsString then <some code> else <some other code>;
                    end;
                Results.Next;
            end;
    Results.Free
end;

And when I launch it, it loops and loops and loops.... I've killed the process each time before the end, but I suspect it's going to loop something like 25.000 x 25.000 times smile


So I made some research, and tried plenty of others things but failed miserably.... sad


What would be the best way to achieve this, keeping in mind that what I want to detect are array values (from the web page) that are not present in my database (the dataset) ?


I don't need a full example, pseudo code with the main logic could do the trick.


Thank you all for your assistance, I'll post processing time comparisons when I'll have the solution wink


Cheers


Mathias

I'm a very good housekeeper !
Each time I get a divorce, I keep the house

Zaza Gabor

Re: [Solved] - Comparing array values with...

Woooohooooooo !!


Found a solution to do the comparison in less than 30 seconds smile


I'll post it tonight once at home.


Cheers


Mathias

I'm a very good housekeeper !
Each time I get a divorce, I keep the house

Zaza Gabor

Re: [Solved] - Comparing array values with...

You can try use first method but you should create a index for the database field "asset_sku".

Dmitry.

Re: [Solved] - Comparing array values with...

Hello my friends,


As promised, I'll detail the solution I found, just in case someone else meets the same problem one day.


WHAT WAS THE PROBLEM

The problem was to compare a list of 25.000+ entries with another list of more or less 25.000 entries to find new ones.
One list comes from a web page, the other one from a local database, and the goal was to determine if the list from the web contained new elements.


FIRST APPROACH OF THE PROBLEM

The first approach (which worked) was the following :
1- retrieve the data from the web
2- split the huge coma separated string into an array of n elements
3- loop through the elements of the array with, for each one, a query to the database to see if the element was found or not

something like :

var
    i, known : Integer;
begin
    for i := 0 to Length(sku_array) - 1 do
        begin
            known := SQLExecute('SELECT COUNT(id) FROM asset WHERE asset_sku = "'+sku_array[i]+'"');
                if known = 0 then <some code> else <some other code>;
        end;
end;

This works fine, the only problem being that this process took a long time to complete.
On my machine (core i7, 16Go of Ram, SSD hard drive), it took this morning 852 seconds- almost 15 minutes.....


SECOND APPROACH : COMPARING DATA IN MEMORY

To speed up the process, I imagined comparing data in memory and, instead of launching 25.000+ queries to the database (one for each element of the array), use just one big query to get in memory all the elements of the database to compare with the elements of the array already in memory. That should be quicker right ?
So the idea was :
1- retrieve the data from the web and split the string to get an array of 25.000+elements
2- create a dataset of 25.000+ elements with just one query on the local database
3- while the query still add elements (not Eof) compare each element of the array with the result of the query
This gave me something like :

var
    sku_array : Array of String;
    Results : TDataSet;
    i : Integer;
begin
    SQLQuery('SELECT asset_sku FROM asset',Results);
        While not Results.Eof do
            begin
                for i := 0 to Length(sku_array) - 1 do
                    begin
                        if sku_array[i] = Results.FieldByName('asset_sku').AsString then <some code> else <some other code>;
                    end;
                Results.Next;
            end;
    Results.Free
end;

Result ? The solution was worth than the problem because I create a loop of 25.000 iterations checking 25.000 elements which gives me a total of 625.000.000 checks to perform !!! Each of the 25.000 elements of the array was checked 25.000 times against the local data and, even in memory, this takes a huge amount of time. Honestly I killed the process each time before the end, so I have no timing to give you as a comparison....
The main problem of this approach ? I should have written a function to stop the loop once a match have been found, to avoid checkinh useless results.


FINAL APPROACH : THE SOLUTION WAS...
Idealy, I wanted to use native functions that would do the comparison loops without my coding them. Something like
if element in array then...
But the IN array does not seem to be implemented yet (Dmitry will correct me if I'm wrong) so I used something else.
1- get the data from the web, which gives me a coma separated string
2- transform that simple string into a TStringList with : OnlineList.CommaText := string
3- create a dataset of local results with just one query
4- fill a second TStringList with the results of the query
5- now I have 2 TStringList, time to compare their elements but, instead of looping again through the results, use a native function : IndexOf().
This function of the TStringList class (which is a sub-class of the TSrings class) gives you the index in a list of a string (array and lists are numbered from 0 to n). If no index is found, meaning the string is not present, then the function returns -1
This gave me :

begin
    SQLQuery('SELECT asset_sku FROM asset',Results);
    localList := TStringList.Create;
        while not Results.Eof do
            begin
                localList.add(Results.FieldByName('asset_sku').AsString);
                Results.Next;
            end;
         Results.Free;

    onlineList := TStringList.Create;
    onlineList.CommaText := <some code to get the online string>;
    try
        try
            for j := 0 to onlineList.count - 1 do
                begin
                    if localList.IndexOf(onlineList.strings(j)) = -1 then <some code because this is a new element>);
                end;
        except
            ShowMessage(Exceptionmessage);
        end;
    finally
        localList.Free;
        onlineList.Free;
    end;
end;

Same amount of online data : 25.000+ elements
Same amount of local data : more or less 25.000 elements as well
Treatment and comparison time to find all the new elements of the online list : 27 seconds on the same machine.


I'm very pleased because this is nearly a 30 times decrease in treatment time and now a acceptable process to run and perform that check on a daily basis.


Hope this was interesting and clear enough to understand.
Dmitry, do not hesitate to comment or correct if you find any flows or better approach.


Cheers


Mathias

I'm a very good housekeeper !
Each time I get a divorce, I keep the house

Zaza Gabor

Re: [Solved] - Comparing array values with...

mathmathou
Thank you for the solution )

Dmitry.

Re: [Solved] - Comparing array values with...

Thanls Dmitry smile


Was there another solution according to you ?

I'm a very good housekeeper !
Each time I get a divorce, I keep the house

Zaza Gabor

Re: [Solved] - Comparing array values with...

i can see what you were talking about.
I want to use something similar for my project, where i have "pre-invoices" and as soon as a customer accepts one of those , it transfers all the info, to the "final invoice"
It should also compare the stock to see if anything falls below 0

Re: [Solved] - Comparing array values with...

Hi Vasco,
There may be some specific reasons why you can't do it this way - but rather than use 2 tables (pre-invoice and final-invoice - both presumably with a similar structure), could you not have just one table (invoice) and then use a status indicator to show if it is pre-invoice or final-invoice (or any other status you may want)?  The attachement is a very quick example of how it could work.
Just a thought.
Derek.

Post's attachments

Attachment icon vascostatus.zip 337.29 kb, 497 downloads since 2017-11-02 

Re: [Solved] - Comparing array values with...

Thanks Derek, let me take a look at it.

I Think that i know how to do this in my mind, but i need to take a look at examples.

I have a table with invoices, i have invoice nº1, invoice nº2, and invoice nº3 for THIS customer.
Customer accepts invoice nº3 , i then  use a combobox that allows me to tell that invoice nº3 is the way to go.
So..., i then have a script that automaticly checks if everything that i put on invoice nº3 is in stock, and retrieves those items from the stock, if something is 0 it then adds what parts i need to continue my order.

The problem is that i have several lines on invoices, otherwise it would be easy

I need somethign that can read all the id's , and do a batch job sql updating every single id to their new quantities...

would any of you guys help me put this into a few code lines so i can start messing arround?
Cheers

Re: [Solved] - Comparing array values with...

ah Derek, yes i saw your example, yes i will use just one sql table, but i am using two tablegrids , so that's the mix up smile