How To Configure Scraper To Grab The Right Content, how to configure grabber
Web Design Seo
post Jun 29 2011, 08:22 AM
Post #1



Group: Root Admin
Posts: 3,326
Joined: 29-April 09
From: Sofia
Member No.: 1



If the scraper function is enabled, you must also configure it. Switching it on is not enough.

How to configure the scraper?

1. Configure the RSS feed.
2. Open the Scraper section and switch on the scraper part of the aggregator: Enable - YES.



3. Open one content item on the website that the RSS feed links point to. For example, take the RSS feed:
CODE
http://feeds.nytimes.com/nyt/rss/HomePage


An example content item in this RSS feed:
CODE
http://www.nytimes.com/2011/06/30/world/asia/30afghanistan.html


4. Right-click the page and choose View Source to see the HTML code of this page.


5. Find a string in the HTML code to use as the Starting string. For this page it is:
CODE
<nyt_correction_top>


This string must be unique: it must appear in the HTML code only once!
It does not have to be an HTML tag; you just need to configure an exact string. If you don't know HTML, don't worry! It only has to be a unique string, so the extension will also work with:
CODE
<nyt_correction_top

or with any other unclosed HTML tag.

6. Find a string in the HTML code to use as the End string. For this page it is:
CODE
<nyt_author_id>

or
CODE
<div id="pageLinks">

or
CODE
<div id="pageLinks


It does not have to be a valid HTML tag; you just need to configure an exact string. It is recommended that this string be unique!


Done. Test the scraper and grab some content. If you have configured the right strings, the scraper will work: all content between the Starting string and the End string will be imported.
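To make the rule concrete, here is a minimal Python sketch of "import everything between the Starting string and the End string", plus a check that a string occurs only once. This is an illustration only, not the extension's code: the function names are hypothetical, the page fragment is made up (it is not the real NYT markup), and dropping the marker strings themselves from the result is an assumption.

```python
from typing import Optional


def is_unique(html: str, marker: str) -> bool:
    """True if `marker` appears exactly once in the page source."""
    return html.count(marker) == 1


def scrape_between(html: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` and the next `end`.

    Hypothetical helper: the markers themselves are excluded (an assumption).
    Returns None if either string cannot be found.
    """
    begin = html.find(start)  # first occurrence only
    if begin == -1:
        return None
    begin += len(start)
    stop = html.find(end, begin)  # End string must come after the start
    if stop == -1:
        return None
    return html[begin:stop]


# A made-up page fragment, not the real NYT markup.
page = "<body><nyt_correction_top><p>Article text</p><nyt_author_id>By X</body>"
print(is_unique(page, "<nyt_correction_top"))  # True
print(scrape_between(page, "<nyt_correction_top>", "<nyt_author_id>"))  # <p>Article text</p>
```

Because `str.find` returns the first match, a non-unique Starting string makes the grab begin at its first occurrence, which is exactly why the string should appear only once.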

P.S. If you don't want to strip HTML, set the option "Strip html Tags" to: No.
Image Relative URL prefix is only for websites that use relative paths to pictures in their content. If the website uses relative paths to pictures, just enter in this field the domain you grab content from:
CODE
http://web-site.com/



7. Image Relative URL prefix
Use this option only if the image paths in the source code are not full URLs, for example:
CODE
<img src="directory/picture.jpg">


In this case you can add "Image Relative URL prefix":
CODE
http://www.bbc.co.uk/


and on your site the picture will be shown from:
CODE
http://www.bbc.co.uk/directory/picture.jpg
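A rough Python sketch of what the prefix does to the grabbed HTML, assuming the extension simply prepends the prefix to `src` values that are not already full URLs. The function and pattern names are hypothetical, only double-quoted `src` attributes are handled, and the regex is deliberately simple.

```python
import re

# Matches <img ... src="..."> and captures the src value (double quotes only).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)
# A src that already has a scheme (http:, https:, data:) or starts with //.
ABSOLUTE = re.compile(r"^(?:[a-z][a-z0-9+.-]*:|//)", re.IGNORECASE)


def apply_image_prefix(html: str, prefix: str) -> str:
    """Prepend `prefix` to relative image paths; leave full URLs alone."""
    def fix(m: "re.Match[str]") -> str:
        src = m.group(2)
        if ABSOLUTE.match(src):
            return m.group(0)  # already a full URL, keep as-is
        # Join with exactly one slash between prefix and path.
        return m.group(1) + prefix.rstrip("/") + "/" + src.lstrip("/") + m.group(3)
    return IMG_SRC.sub(fix, html)


print(apply_image_prefix('<img src="directory/picture.jpg">', "http://www.bbc.co.uk/"))
# <img src="http://www.bbc.co.uk/directory/picture.jpg">
```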



8. "Permalink - search for" and "Permalink - replace with"
Use this option only if the final URLs of the content articles differ from the URLs in the RSS feed.

Example:

If the URL in the feed is:
CODE
http://www.feeds.bbc.co.uk/news/uk-england-london-18781322

and the final URL is:
CODE
http://www.bbc.co.uk/news/uk-england-london-18781322


"Permalink - search for" must be:
CODE
http://www.feeds.bbc.co.uk


and "Permalink - replace with" must be:
CODE
http://www.bbc.co.uk
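The two Permalink fields amount to a search-and-replace on the feed URL before the page is fetched. The Python sketch below assumes a plain (non-regex) replacement of the first match; the function name is hypothetical.

```python
def rewrite_permalink(feed_url: str, search_for: str, replace_with: str) -> str:
    """Turn the URL found in the feed into the real article URL.

    Assumption: a plain substring replacement of the first match.
    An empty "search for" value leaves the URL unchanged.
    """
    if not search_for:
        return feed_url
    return feed_url.replace(search_for, replace_with, 1)


print(rewrite_permalink(
    "http://www.feeds.bbc.co.uk/news/uk-england-london-18781322",
    "http://www.feeds.bbc.co.uk",
    "http://www.bbc.co.uk",
))
# http://www.bbc.co.uk/news/uk-england-london-18781322
```

This only helps when the feed URL and the real URL differ by a fixed piece of text; feeds that go through redirects cannot be fixed this way.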



When will the Scraper not work?
- If you have not configured the Starting string and the End string.
- The Starting string and End string must be unique: you must find a string in the page's HTML code that appears only once. If the string is not unique, the scraper will grab content starting from the first place where the string appears.
- The Scraper will not work if the link to the full content item in the RSS feed is not the same as the real content item URL. The "Permalink - search for" and "Permalink - replace with" fields are the way to solve this, but they work only in some cases.


Quote
For some RSS feeds that use a system of redirects and special protection, it is not possible to grab the full content! There is no guarantee that the scraper will work with every feed!



--------------------
Forum Rules | How to receive support. 3D Web Design: web design, SEO optimization, Web Site Extensions, Oscommerce Addons, Wordpress plugins and Joomla Extensions. Website development, search engine optimization and SEO services.
ataman79
post Oct 3 2011, 02:27 PM
Post #2


Newbie

Group: Members
Posts: 21
Joined: 26-November 10
Member No.: 399



Hello,

I'm using the Scraper version, and I'd like to know: when I grab RSS (with or without the Scraper function turned on), how can I remove some tags from the grabbed news? Something like the <img> tag, or other tags?

Thank you in advance
Web Design Seo
post Oct 3 2011, 02:42 PM
Post #3





Hello. You can do this with the fields "Strip html tags" and "Allowed html tags". See the screenshot in the first post.
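For reference, here is a Python sketch of what "strip HTML tags except the allowed ones" typically means. Writing the allowed list as `<p><b>` follows the convention of PHP's `strip_tags()`; whether the "Allowed html tags" field expects exactly that format is an assumption, and the function name is hypothetical. The regex is naive: a `>` inside an attribute value would end a tag early.

```python
import re

# Opening or closing tag; group 1 is the tag name.
TAG = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def strip_tags(html: str, allowed: str = "") -> str:
    """Remove every HTML tag except those listed in `allowed`, e.g. "<p><b>".

    The text between tags is kept; only the tags themselves are dropped.
    """
    keep = {name.lower() for name in re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)>", allowed)}
    return TAG.sub(lambda m: m.group(0) if m.group(1).lower() in keep else "", html)


# Drop <img> (the case asked about above) but keep paragraphs.
print(strip_tags('<p><img src="x.jpg">Text</p>', "<p>"))  # <p>Text</p>
```

So to remove images while keeping the rest of the formatting, "Strip html tags" would be on and `<img>` simply left out of the allowed list.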


jono
post Dec 14 2011, 01:47 PM
Post #4


Newbie

Group: Members
Posts: 1
Joined: 14-December 11
Member No.: 999



Can you please give us a newer example? I am trying to follow this one, but things seem to be different, so I can't make my scraper work properly.

Thanks in advance

Jono
Web Design Seo
post Dec 14 2011, 02:44 PM
Post #5





Nothing has changed in these functions. What type of example do you need?

In the latest Scraper you can configure two or more start strings and end strings like this:
CODE
<div class="content">|<other_tag class="other_class">


In the "Starting string" field you can now enter several HTML tags, separating each one from the next with |
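One plausible reading of the `|` separator is "try each alternative in order and use the first one that appears on the page", which lets one configuration cover articles built from different templates. The Python sketch below implements that reading; the order-of-preference behaviour and the function name are assumptions, not the extension's documented logic.

```python
from typing import Optional


def scrape_multi(html: str, starts: str, ends: str) -> Optional[str]:
    """Try each "|"-separated Starting string in order and use the first one
    found, then the first "|"-separated End string found after it.

    Assumption: this order-of-preference behaviour is a guess at how the
    extension combines the alternatives.
    """
    for start in starts.split("|"):
        begin = html.find(start)
        if begin != -1:
            begin += len(start)
            break
    else:
        return None  # none of the Starting strings occur on the page
    for end in ends.split("|"):
        stop = html.find(end, begin)
        if stop != -1:
            return html[begin:stop]
    return None  # none of the End strings occur after the start


page = '<body><other_tag class="other_class">Body text<div id="pageLinks">'
print(scrape_multi(page,
                   '<div class="content">|<other_tag class="other_class">',
                   '<div id="footer">|<div id="pageLinks">'))  # Body text
```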


cromaplus
post May 17 2013, 07:45 AM
Post #6


Newbie

Group: Members
Posts: 5
Joined: 12-May 13
Member No.: 1,684



Good morning everyone. I am trying to import the news into the K2 component. In the preview I see the photos and the text correctly, and the news items are imported, but in K2 they look like the attached picture. I don't understand what I'm doing wrong; can someone tell me how to fix the problem?




 


