Topic: Parsing Web page

Hello Dmitry and all MVD fans,

The million-dollar question...

Is it possible, now or in the future, to parse a web page?

For example, I would enter the URL of a page and save information from that page into the database, based on the tags in the page's source code.

Oh God, I hope the answer is yes :)

Cheers

Math

I'm a very good housekeeper !
Each time I get a divorce, I keep the house

Zsa Zsa Gabor

Re: Parsing Web page

Hello,


I added an HTTPGet function; please download the beta version:
http://myvisualdatabase.com/forum/viewt … 497#p10497



For parsing, you can use string functions.

Dmitry.
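
A minimal sketch of what parsing with plain string functions might look like. It is written in Python for illustration, not in MVD's script language; `text_between` is a hypothetical helper, and the sample page is made up to stand in for whatever HTTPGet returns.

```python
def text_between(page, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Returns None when either marker is missing.
    """
    start = page.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = page.find(end_marker, start)
    if end == -1:
        return None
    return page[start:end].strip()


# Made-up page content (an assumption, not taken from any real site).
sample_page = (
    "<html><head><title> Product 42 </title></head>"
    "<body><h1>Hello</h1></body></html>"
)

title = text_between(sample_page, "<title>", "</title>")
heading = text_between(sample_page, "<h1>", "</h1>")
print(title, "|", heading)
```

This approach only works when the markers are stable and unique enough on the page; for irregular markup a real HTML parser is usually the safer choice.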

Re: Parsing Web page

Thank you Dmitry for this addition to your next release.

I had time to test it with a very small program: a text box where the user enters the URL and a button that calls HTTPGet.

It works fine with addresses like yours: http://myvisualdatabase.com/

But if you try something else, like http://docs.daz3d.com/doku.php/public/r … 8695/start

here is what you get:

http://i.imgur.com/vW4MA60.jpg

I checked the packets, and the main difference between accessing the same address with Firefox and with MVD is this:

Date: Tue, 10 Nov 2015 02:19:48 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=d14c57bebef44f07a5f47839e85244dd21447121988; expires=Wed, 09-Nov-16 02:19:48 GMT; path=/; domain=.daz3d.com; HttpOnly
Cache-Control: max-age=10
Expires: Tue, 10 Nov 2015 02:19:58 GMT
X-Frame-Options: SAMEORIGIN
Server: cloudflare-nginx
CF-RAY: 242e464b54a419ce-SYD

cd9
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Access denied | docs.daz3d.com used CloudFlare to restrict access</title>

I think the problem is here:
Set-Cookie: __cfduid=d14c57bebef44f07a5f47839e85244dd21447121988; expires=Wed, 09-Nov-16 02:19:48 GMT; path=/;

I think MVD does not handle cookies and Cloudflare somehow restricts access.
Also, MVD identifies itself as Mozilla/3.0. Could that also be a problem?

User-Agent: Mozilla/3.0 (compatible; Indy Library)

Thank you for your patience and for any help you can give.

Cheers

Math
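
For comparison, here is a rough sketch of the two things suspected above, a browser-like User-Agent and cookie handling, written in Python with the standard library rather than in MVD or Indy. The User-Agent string is an assumed example, and `build_opener_with_cookies` and `fetch` are hypothetical helper names.

```python
import http.cookiejar
import urllib.request

# Assumed browser-like value, chosen only for illustration.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
    "Gecko/20100101 Firefox/115.0"
)


def build_opener_with_cookies(user_agent=BROWSER_USER_AGENT):
    """Create an opener that stores cookies and sends a custom User-Agent."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # Replaces the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, cookie_jar


def fetch(url, opener, timeout=10):
    """Download a page and decode it with the charset the server declares."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


opener, jar = build_opener_with_cookies()
print(opener.addheaders)
```

Calling `fetch("http://myvisualdatabase.com/", opener)` would then send the custom header and keep any Set-Cookie values in `jar` for later requests. Whether that is enough to satisfy a particular site's access rules is not something this sketch can guarantee.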

Re: Parsing Web page

mathmathou
Thank you for the bug report. I changed the User-Agent and now it works.
Please re-download the beta version.

Dmitry.

Re: Parsing Web page

Hello Dmitry, and thank you so much for your responsiveness!

I downloaded the new beta and tried it with a few URLs; it seems to work fine.

I'll do some more extensive testing later today and get back to you.

As always, you've been fast and efficient. Thank you again.

Sincerely

Math

(edited by mathmathou 2015-11-15 02:10:06)

Re: Parsing Web page

Oops...

Hello Dmitry,

You have worked a miracle by getting the HTTPGet function up and running.

It works fine with http addresses.

But with https addresses, the response is:

http://i.imgur.com/SkurmRp.jpg

Since I assume you are using IdHTTP, do you think you could add IdSSLOpenSSL as well?

That'd be awesome!!

Have a good weekend

Cheers

Mathias

Re: Parsing Web page

I added HTTPS support, but to use the HTTPS protocol you need to place libeay32.dll and ssleay32.dll in your application folder or in the Windows system folder.


You can download libeay32.dll and ssleay32.dll here:
https://indy.fulgan.com/SSL/



Please download the latest beta version:
http://myvisualdatabase.com/forum/viewt … ?pid=10497

Dmitry.
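
A small sketch of how one might check that both DLLs are present before attempting an HTTPS request. It is written in Python for illustration only; `missing_openssl_dlls` is a hypothetical helper, and it looks at a single folder (the application folder), not the Windows system folder.

```python
import os
import tempfile

# The two OpenSSL libraries named in the post above.
REQUIRED_DLLS = ("libeay32.dll", "ssleay32.dll")


def missing_openssl_dlls(folder):
    """Return the required DLL names that are not present in the folder."""
    return [
        name
        for name in REQUIRED_DLLS
        if not os.path.isfile(os.path.join(folder, name))
    ]


# Demonstration with a temporary folder standing in for the application folder.
with tempfile.TemporaryDirectory() as app_folder:
    print(missing_openssl_dlls(app_folder))  # both names are reported missing

    # Create an empty placeholder file for one of the DLLs.
    open(os.path.join(app_folder, "libeay32.dll"), "wb").close()
    print(missing_openssl_dlls(app_folder))  # only ssleay32.dll remains
```

An empty list means both files were found in that folder; it says nothing about whether they are the right version or architecture for the application.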

Re: Parsing Web page

Hello Dmitry,

I had the opportunity to test what you added to the beta, and everything works fine.

You're a magician!!

Thanks again and cheers

Mathias
