Web Design Seo Forum _ Joomla Scraper _ Joomla Scraper, Grabber For Joomla

Posted by: Web Design Seo May 18 2011, 04:09 PM

3D Web Design presents what we consider the best aggregator in the world, with an article spinner and grabber: Joomla Scraper, a scraper for Joomla developed by 3D Web Design.

http://3dwebdesign.org/joomla-scraper is a high-technology grabber and aggregator. It can aggregate many RSS feeds and import them into the Joomla database together with the FULL TEXT of each original content item (the page that the link in the RSS feed points to).

Quote
Joomla Scraper is the first 100% Google Panda/Penguin safe aggregator in the world!


Differences between Joomla Scraper and Aggregator Platinum:

First of all, Joomla Scraper has FULL K2 integration: it imports images (BMP files are NOT supported!) and tags into K2. Aggregator Platinum has only partial integration with K2.

Joomla Scraper has all the functions known from http://3dwebdesign.org/joomla-aggregator-platinum plus a few more:
- has ACL (Joomla 3 and Joomla 2.5 versions only)
- supports spin format
- has full integration with the K2 component
- integrates with Advanced Tags, the best tags component for Joomla - tags are posted directly into the tag component
- supports JomSocial (Joomla 2.5 version only, JomSocial 3 only)
- supports Kunena forums (all Kunena 2 and Kunena 3 versions)
- can work with big synonym databases to make content up to 100% unique
- can grab and insert into the database the full content instead of the short content in RSS feeds
- can strip JavaScript and selected HTML tags from imported content
- has a custom lightweight RSS parser developed by 3D Web Design (Joomla 2.5 version only)
- has a limit function (Joomla 2.5 and 3.0 versions only)
- has a Preview function (Joomla 3 and 2.5 versions only). Now you can preview items before importing them into the Joomla database.
- has a filter by keyword function (Joomla 2.5 and 3.0 versions only) - imports only items that contain at least one of the keywords in the list
- adds duplicate content protection (http://3dwebdesign.org/forum/index.php?showtopic=698&view=findpost&p=5801) based on the links of imported items (Joomla 2.5 and 3.0 versions only)
- RAM and "time used" debug functions (Joomla 2.5 and 3 versions only)
- new shuffler with an improved algorithm to shuffle content automatically


Quote
By default, the parser in "Joomla Scraper" is the latest SimplePie. Our tests show that RSS feeds work better in the aggregator with the old version of SimplePie. Use the new version of the SimplePie parser only if you get a "deprecated" or other error!


The scraper's algorithm is optimized, lightweight and robust.

http://3dwebdesign.org/joomla-scrapper


Latest version History:
14.09.2018: Joomla Scraper v.1.9.9.1 released. No new functions, only updated to work with PHP 7.2. Now you can use all PHP versions from PHP 5.5 to PHP 7.2.


12.01.2016: v.1.9.9 for Joomla 3. Changes:
1. New option: limits the length of imported content. Works with or without Scraper. The option is named "Limit content lenght".



If you enter 25000 in this field, imported content will be limited to 25000 characters. Keep in mind that the resulting clear text can vary if you use the strip HTML tags option. If you enter ZERO in "Limit content lenght", then no matter how much content is inside the imported RSS feed, you will import only a link to the original article.
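
The behaviour of this option can be sketched as follows. This is a minimal illustration in Python, not the component's PHP code: the function name `limit_content` and its parameters are hypothetical, and whether the component counts characters before or after stripping HTML tags is not stated above, so stripping first is an assumption.

```python
import re


def limit_content(content, limit, link, strip_html=False):
    """Return the text to import for one feed item (illustrative only).

    limit == 0 -> only a link to the original article is imported.
    limit > 0  -> content is cut to at most `limit` characters.
    """
    if limit == 0:
        return f'<a href="{link}">{link}</a>'
    if strip_html:
        # Crude tag removal for illustration; assumed to run before the cut.
        content = re.sub(r"<[^>]+>", "", content)
    return content[:limit]
```

Note that cutting raw HTML at a fixed character count can split a tag in half, which is one reason the visible text length varies when tags are kept.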


2. New function for image SEO: a relevant alt is added to all imported pictures. There is no new option in the feed configuration - the alt attribute is added automatically to images imported in content, to the Intro image and to the Full article image (see screen).




06.10.2014: Joomla Scraper v.1.9.8 for Joomla 3 is released. This is a bug fix version; only the Joomla 3 version is updated!
27.05.2014: http://3dwebdesign.org/forum/index.php?showtopic=698&view=findpost&p=7501 (only the Joomla 3 version is updated). Now you can use the aggregator on https-only sites with an SSL certificate.
You can also limit the pictures imported in articles by size (px) and by number of pictures. http://3dwebdesign.org/forum/full-text-import-with-scraper-as-clear-text-and-only-main-picture-t2113.
21.05.2014: v.1.9.5.1 for Joomla 3 - bug fixes. 1. Fixed problem with Cyrillic titles containing quotation marks. 2. Fixed problem with Cyrillic URL alias transliteration in K2.

20.05.2014: http://3dwebdesign.org/forum/index.php?showtopic=698&view=findpost&p=7471 (only the Joomla 3 version is updated). Fixed the "Class JFile Not Found" bug that occurred when the web site is in a subdirectory. New options for images in articles (com_content only): images in content articles are set automatically by rules. Now the default image or the image downloaded from the RSS feed can be set as the Intro Image and/or Full article image in imported content articles.



25 February 2014: Joomla Scraper v.1.9.4 for Joomla 3 is released. Joomla Scraper for Joomla 3.x now supports Kunena 3.
24 February 2014: Joomla Scraper v.1.9.6 for Joomla 2.5 is released. Kunena 3 support is added. Only the version for Joomla 2.5 is updated!
28 January 2014: Fixed bug (SQL error while sending Email Report) in both Joomla 3 and Joomla 2.5 Scraper versions.
20 January 2014: Joomla Scraper for Joomla 3 minor version update - improved compatibility with Joomla 3.2.
17 January 2014: Fixed bug in K2 importer (k2Table missing) in both Joomla 3 and Joomla 2.5 versions. Added links to a guide and to a free cron job service to use with Scraper.
17 December 2013: v.1.9.3 for Joomla 3: Bug fixes with manual import buttons.
09 December 2013: v.1.9.2 for Joomla 3 and v.1.9.5 for Joomla 2.5 released. New Scraper function "remove script tags".
25.06.2013: v.1.9.1 for Joomla 3.0. Minor bug fix in the "SimplePie" parser. Only the Joomla 3 version is updated!
10.06.2013: v.1.9 for Joomla 3.0 and 1.9.3 for Joomla 2.5: New function added: K2 thumbnails are now supported. If the function is switched on, pictures will be resized automatically and imported into K2. Attention: this costs much more execution time and memory!
23.04.2013: v.1.8.9 for Joomla 3.0, 1.9.2 for Joomla 2.5 and 1.7 for Joomla 1.5: Bug fix in cron job.
21.03.2013: http://3dwebdesign.org/forum/index.php?showtopic=698&view=findpost&p=5801: Internal "duplicate content protection" function inside Joomla Scraper, used instead of Joomla's built-in duplicate content protection based on aliases (in com_content). Now you can use synonyms in titles and import every feed as often as you want - no item will be imported twice.

Quote
Joomla has internal "duplicate content protection" based on the titles of content items. But if you use the spinner or synonym replacement in titles, the titles will be unique every time and Joomla's internal "duplicate content protection" will not work. Joomla Scraper now adds its own protection: http://3dwebdesign.org/forum/index.php?showtopic=698&view=findpost&p=5801. This way, with or without spinning in titles, each article will be imported only once.


11.03.2013: SimplePie bug fixed (new version of the parser only) that occurred with feeds in non-UTF-8 encoding.
08.03.2013: v.1.8.7 for Joomla 3.0: the latest updates (better shuffler, import time and memory usage debug) are now also available for the Joomla 3.0 version.
26.02.2013: v.1.8.9 for Joomla 2.5 and v.1.6.8 for Joomla 1.5. Added import time and memory usage functions in: manual import, the feed preview page, email notification and cron. With these improved statistics you can check and diagnose import problems and compare the performance of different feeds and different parsers.
25.02.2013: v.1.8.8 for Joomla 2.5 and v.1.6.7 for Joomla 1.5. New, improved shuffler in the spinner to produce more human-readable texts.

01.02.2013: v.1.8.6 for Joomla 3.0 is released. The version for Joomla 3 supports import only into content and K2!

01.02.2013: v.1.8.5. Filter by keyword function added. More info about this function is below.
29.01.2013: released new version 1.8.4. Added two new functions: 1. Preview a feed before import. If you select one or more feeds and click the "preview" button, you will see the resulting content that will be imported. 2. Limit the items imported from a feed. If you enter a number in the "feed limit" field, for example 5, only the first 5 items from this feed will be imported into the Joomla database.
15 January 2013: V.1.8.3: Pagination bug fix in Joomla 2.5 version.
09 January 2013: Joomla Scraper Version 1.8.1 for Joomla 2.5. It allows the latest SimplePie (ver. 1.3.1 - PHP 5.2) to be used in addition to the obsolete SimplePie 1.1.2. In every feed configuration you now have a choice between the old SimplePie parser, the new SimplePie and the custom RSS parser.
08 January 2013: V.1.8.0 for Joomla 2.5. It allows images without extension to be imported as well.
06 January 2013: http://3dwebdesign.org/forum/index.php?showtopic=1606&view=findpost&p=5353 in both versions.
19 December 2012: v.1.7.8 for Joomla 2.5 and v.1.6.5 for Joomla 1.5 - Import images in K2 tab "image" bug fix.
08 November 2012: v.1.7.7 for Joomla 2.5 - bug fix release. The fixed bugs are as follows:
1. The RSS parser didn't convert all the result fields in UTF-8;
2. The article tags were not correctly added using the Advanced Tags component.

07 November 2012: v.1.6.4 for Joomla 1.5. Better SEO functions. New configurable option: automated extraction of top key phrases from the content of an article. These phrases are counted, and the top key phrases are added as tags to the content item in the Advanced Tags component.
19 September 2012 - v.1.7.6: Added JomSocial and Kunena 2.0 support. The functions to import into JomSocial and into Kunena 2.xx are only in the Joomla 2.5 version of Joomla Scraper!
10 September 2012 - v.1.7.5: The Joomla 2.5 version adds a custom RSS parser developed by 3D Web Design - much faster and less memory consuming than SimplePie. ACL is also added. Updates are only for the Joomla 2.5 version.
16 July 2012 - v.1.7.3 and 1.7.4: Exclude content between HTML tags in scraper and full K2 integration
25 January to 24 February 2012 - v.1.6.9 to 1.7.2: Spin format and huge synonym database support, code optimization for speed.


Standard functions in Joomla Scraper (also presented in http://3dwebdesign.org/forum/joomla-aggregator-platinum-functions-t224):

Unlimited rss sources
Can post in K2
Can generate 100% unique content
Random html code replacement function
Tags generated from title
Insert tags in integrated tag component
Custom tags at end/start of tags
Random tags
Synonym replacement function
Content combine functions - Add and change html code before/after every content article
Random choice of html code to add
Download images and save them internally, BMP files are NOT supported!
Resize images
Send Email reports
Automated via cron jobs

New since 16.05.2011: Spin content automatically with the integrated article spinner. You can also test article spinning for free here: http://articlespinner.eu/.


Options added in grabber
1. On/Off switch for the scraper functions;
2. Starting HTML - an HTML portion which indicates the beginning of the full text article on the source's web page, e.g. <div id="article". If left empty, the grabber will take a portion of the RSS feed item and try to use it to determine the actual starting position of the article's full text.

3. Ending HTML - an HTML portion which indicates the end of the full text article, for example you can add:
Code
<div class="afterarticle"

The scraper will then grab the full content from the starting position up to this HTML tag in the page of the original content. This field must not be empty. If left empty, the grabber will exit.

4. Search text length - the number of characters taken from the beginning of the RSS feed item and used to detect the actual starting position of the article's full text. This field must be used only if Starting HTML is not supplied!

5. Strip tags - whether to strip the HTML tags from the article's full text or not.

6. Allowed tags - these tags will not be stripped from the article's full text. Usually you can allow any HTML tag, for example:
Code
<img>, <strong>, <p>, <br>, <br/>


7. Detect JS redirect - some RSS feed sources try to hide their full articles from grabbers like ours by supplying Read More URLs in their feeds that redirect the user to the page containing the article. If this option is enabled, the grabber will try to detect this situation and get the final link to the article instead of the redirection page.

8. Exclude content between HTML tags. You can use two or more HTML tags - content between these tags will be ignored.
9. Can import into Kunena and JomSocial.
10. Can preview an RSS feed before import.
11. Can limit the number of items imported from a feed.
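
How options 2 to 6 fit together can be sketched as follows. This is a rough Python illustration under stated assumptions, not the component's actual PHP code: the names `extract_full_text` and `strip_tags` are hypothetical, plain substring search stands in for whatever matching the grabber really uses, and the default of 50 characters is an invented value.

```python
import re


def extract_full_text(page_html, end_html, start_html="", feed_text="", search_length=50):
    """Cut the article out of the source page (illustrative only).

    Uses Starting HTML when given; otherwise falls back to the first
    `search_length` characters of the RSS item text.
    """
    if not end_html:
        return None  # Ending HTML must not be empty: the grabber exits
    needle = start_html or feed_text[:search_length]
    if not needle:
        return None
    start = page_html.find(needle)
    if start == -1:
        return None
    end = page_html.find(end_html, start)
    if end == -1:
        return None
    return page_html[start:end]


def strip_tags(html, allowed=()):
    """Remove every HTML tag whose name is not in `allowed`."""
    allowed = {name.lower() for name in allowed}

    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in allowed else ""

    return re.sub(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>", keep_or_drop, html)
```

For example, with Starting HTML `<div id="article"` and Ending HTML `<div class="afterarticle"`, the first function returns everything from the article container up to the marker, and `strip_tags(fragment, allowed=("p",))` then keeps only the paragraph tags.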


Functions of Joomla Scraper
All functions are described in the different settings tabs of the component. Some more information about these functions and examples of use are posted here.

Pictures of all Joomla Scraper functions are below

Custom Rss parser:


To import an RSS feed, you can choose between three parsers - the old version of SimplePie, the new SimplePie (v.1.3.1, released on 30 October 2012) and our "Custom Rss parser". "New Simplepie" is buggy; it is the old SimplePie with fixes for the "deprecated" errors that are shown on servers with the newest PHP versions.

Quote
Note that the SimplePie parser is not developed by us; it is a free, open source RSS parser. If you have questions about SimplePie or find an issue with this parser, post it here: https://github.com/simplepie/simplepie/issues


The Scraper version has a Custom Rss parser developed by us - much faster and less memory consuming than SimplePie. With our parser you can work on a shared host with less memory and import huge full feeds with 150-200 content items and many pictures inside.

Possibilities with custom Rss parser:
SimplePie can't recognize non-standard feed types; it works only with Atom, RSS 0.9, 1.0 and RSS 2.0. When you try to import a non-standard XML file into the Joomla database with the SimplePie parser, nothing happens.

With our custom feed parser you can insert only selected fields from the RSS feed into content, so you can use non-standard feeds, like XML files with products and others that SimplePie does not parse. But the Custom Rss parser needs configuration for every feed and some HTML knowledge.

Attention! Performance differences between parser versions!
The Scraper version has 3 different parsers - two versions of SimplePie (old and new) and one custom parser. The different SimplePie parsers also perform differently.

By default, the parser in "Joomla Scraper" is the latest SimplePie. Our tests show that feeds work better and faster with the old version of SimplePie. Use the "new version" of the SimplePie parser only if you get a "deprecated" or other error! We recommend using the old SimplePie or our Custom Rss parser (configuration needed).

Preview, copy, new and edit functions




General tab and K2 functions



Image import into K2 works with JPG, GIF and PNG only; BMP and other formats are not supported.

Limit number of imported articles functions:



You can limit number of imported items from every feed.

Filter by keyword.



In this field you can configure one or more keywords as a filter. Items from the feed will be imported only if at least one of the keywords is found in the title or in the item content. This option allows you to filter results by keyword and build better thematic websites. You can configure the same feed several times with different keyword filters and import the results into suitable categories.
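
The filter rule can be sketched like this. It is a minimal Python illustration, not the component's code: `matches_keywords` is a hypothetical name, and case-insensitive substring matching is an assumption, since the matching rules are not described here.

```python
def matches_keywords(title, content, keywords):
    """Return True if the item should be imported (illustrative only).

    An item passes when at least one keyword occurs in its title or
    content. An empty keyword list is assumed to mean "no filtering".
    """
    wanted = [k.strip().lower() for k in keywords if k.strip()]
    if not wanted:
        return True
    haystack = f"{title} {content}".lower()
    return any(keyword in haystack for keyword in wanted)
```

Configuring the same feed twice, once with football keywords and once with tennis keywords, would then route matching items into two different categories.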

Quote
The K2, Kunena and JomSocial tabs are shown only if you have these extensions installed in your Joomla!



Scraper functions in Joomla Scraper after v.1.5.5:

"Permalink - search for" and "Permalink - replace with" fields. These are needed for RSS feeds that use a redirect - the link to the content item in the feed is different from the real content URL. With these fields the scraper knows the right link.

Example Feed: BBC
Code
http://newsrss.bbc.co.uk/rss/sportonline_uk_edition/football/rss.xml


- Improved grabber functions.
- The scraper cleans JavaScript automatically.
- The scraper can look for multiple HTML markup sets in the same RSS feed! Some feed sources tend to use several sets of HTML markup in their articles. Now you can grab all their content - just add a start string - end string pair for each HTML markup set in the "Starting string" and "Ending string" scraper settings. Separate each pair with the '|' character.


Quote
You can configure two or more parameters for the start and end strings in the scraper. Example:


start string:
Code
<div class="start">|<div class="other">


end string:
Code
<div class="end">|<div class="otherend">


Real example for feed from bbc:
Starting string:
Code
<h1 >|<div class="start">


Ending string:
Code
<div class="bookmark-list">|<div class="end">
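
A rough Python sketch of how '|'-separated start and end strings might be applied is shown below. It is not the component's code: `extract_with_pairs` is a hypothetical name, and it assumes that the first start string belongs to the first end string, the second to the second, and that the first pair found on the page wins.

```python
def extract_with_pairs(page_html, start_setting, end_setting):
    """Try each start/end pair in order; return the first fragment found."""
    starts = start_setting.split("|")
    ends = end_setting.split("|")
    for start_html, end_html in zip(starts, ends):
        start = page_html.find(start_html)
        if start == -1:
            continue  # this markup set is not used on this page
        end = page_html.find(end_html, start + len(start_html))
        if end == -1:
            continue
        return page_html[start:end]
    return None  # no configured markup set matched
```

With the BBC settings above, a page containing `<h1 >` is cut at `<div class="bookmark-list">`, while a page built from the other markup set is cut between `<div class="start">` and `<div class="end">`.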


Screen from configuration:



Guide: http://3dwebdesign.org/forum/index.php?showtopic=736


Trackback and publish functions:

Simple article - seo content structure (when content is imported from aggregator):


Split introtext option: This function inserts the "Read more" tag at different places in the content item. Options are:
- No intro text ("Read more" tag is inserted at start of content item)
- After "before item" ( "Read more" tag is inserted after content item and before content that is inserted from "before item" field of "Content seo")
- Before "After item" - After content but before "after item" content.
- Only introtext (all text inserted from feed is inside fulltext)



The protect from duplicates function looks at the URL of the full publication in the RSS feed and stores all already imported URLs in the database. The option "automatically delete protection info older then" (in minutes) clears this database table from time to time - huge tables are slow.
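
The idea can be sketched as follows. This is a minimal Python illustration that keeps the URLs in memory; the component stores them in a database table, and the class and method names here are hypothetical.

```python
import time


class DuplicateGuard:
    """Remember imported URLs and refuse to import the same URL twice."""

    def __init__(self):
        self._seen = {}  # url -> import timestamp in seconds

    def should_import(self, url, now=None):
        now = time.time() if now is None else now
        if url in self._seen:
            return False  # already imported, skip
        self._seen[url] = now
        return True

    def purge_older_than(self, minutes, now=None):
        """Drop protection info older than `minutes` to keep the store small."""
        now = time.time() if now is None else now
        cutoff = now - minutes * 60
        self._seen = {u: t for u, t in self._seen.items() if t >= cutoff}
```

The trade-off is visible in the sketch: once an entry is purged, the same URL can be imported again if it is still present in the feed, so the purge interval should be longer than the time items stay in the feed.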

Attention: the "Split introtext: After N chars" function works well only with clear text!
The "After N chars" function can't recognize whether the content is inside some container (there can be many HTML tags nested in each other) or which container it is, and it can't close all HTML tags automatically. So if you import content with HTML, it is better to use the other Split introtext options, not "After N chars".


Content Seo functions:



Fields "before item" and "after item" in the "Content seo" tab support HTML and text in spin syntax. You can add your own HTML code, or random HTML if you use spin format - both forms below are supported:
Code
{keyword|keyword2|keyword3}
[keyword|keyword2|keyword3]


Fields "before title" and "after title" support only clear text and spin syntax.
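
How spin syntax expands can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the component's spinner: `spin` is a hypothetical name, one option per group is picked uniformly at random, and nested groups are not handled.

```python
import random
import re

# Matches {a|b|c} or [a|b|c] without nesting.
SPIN_GROUP = re.compile(r"\{([^{}\[\]]*)\}|\[([^{}\[\]]*)\]")


def spin(text, rng=None):
    """Replace every spin group with one randomly chosen option."""
    rng = rng or random.Random()

    def pick(match):
        body = match.group(1) if match.group(1) is not None else match.group(2)
        return rng.choice(body.split("|"))

    return SPIN_GROUP.sub(pick, text)
```

One caveat of the square-bracket form in this sketch: ordinary bracketed text such as a footnote marker is also treated as a spin group and loses its brackets.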


Synonym replacement functions:



The component can work with huge synonym databases; it works well with databases of 200,000 synonyms and more. Synonyms can be inserted into the Joomla database, and every feed can be configured to work with a different synonym database (example: feed one can work with an English synonym database, feed two with a German synonym database, feed three with a Russian synonym database).

Note that you need to create your own synonym database to use the "synonym substitution" function. A full synonym database is not included in Joomla Scraper.

If you want, you can purchase an English synonym database (price 25 USD); more info here: http://3dwebdesign.org/forum/english-synonym-database-price-25-usd-t735

If you want to create it yourself: an example synonym database is published here: http://3dwebdesign.org/forum/index.php?showtopic=1171 - this example is imported into Joomla's database when the component is installed. You can add more lines to this database and use it.
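
Synonym substitution itself can be sketched as follows. This Python illustration is an assumption-laden sketch, not the component's code: it uses the `word=synonym;` text-field format that a user posts later in this thread, the function names are hypothetical, and whole-word, case-insensitive matching in a single pass is assumed.

```python
import re


def parse_synonyms(definition):
    """Parse 'word=synonym;word2=synonym2;' into a dictionary."""
    pairs = {}
    for entry in definition.split(";"):
        if "=" in entry:
            word, synonym = entry.split("=", 1)
            if word.strip():
                pairs[word.strip().lower()] = synonym.strip()
    return pairs


def replace_synonyms(text, pairs):
    """Replace whole words or phrases with their synonyms in one pass."""
    if not pairs:
        return text
    # Longest keys first, so "selected" is tried before "select".
    keys = sorted(pairs, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: pairs[m.group(0).lower()], text)
```

Doing the replacement in a single pass avoids chains such as selected -> chose -> ... when one synonym is itself a key in the list. The sketch does not preserve the capitalization of the replaced word.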

RSS feed translation via Google Translate functions:

Quote
We don't guarantee that the Google Translate functions work! The Google Translate API has been a paid service since 12.2011: http://3dwebdesign.org/forum/google-translate-api-is-paid-service-from-12-2011-t886




Download images functions:



You can download images locally to your website. Attention: you may need 60 seconds or more in PHP's max_execution_time setting to use this function! Image download works with JPG, GIF, PNG and files without an extension.

Meta and tags functions:



Meta tags must be switched ON for the integration with the Advanced Tags component to work. Keywords imported as meta tags are also imported as tags in Advanced Tags.

Cron functions:



Email report functions:



Cache functions:



Ignore list functions:



You can add a list of words to ignore. These words will not be inserted in imported content.

Kunena and Jomsocial integration functions:



When importing into JomSocial, the following settings are available:
- Profile owner;
- Post where: "As status update" or "On wall";
- Allow comments.

When importing into Kunena 2.0, the following settings are available:
- Forum
- Author ID: the user ID of the posts' author
- Thread ID (optional): If set, all the articles will be imported as new posts in this thread. If 0, each article will be posted as a separate thread.


ACL functions:



ACL is available only in Joomla 3.0 and Joomla! 2.5 version!

Possibilities for Joomla Scraper

1. With these options activated in our Scraper you can grab the full content from any website. You can scrape content from a Facebook page, from Twitter or from any other website.
2. You can grab only the content that you want and remove HTML tags that you don't want (links, for example).
3. You can grab not only the content of the current article from the scraped website; you can also grab, for example, the article with its comments.
4. You can scrape a lot of content from many websites, and all of this will run automatically on a schedule for every website.
5. You can change content to make it unique, so that content imported into your website ranks higher in Google (Google will not recognize your content as duplicate).


Quote
Note that 3D Web Design does not encourage stealing content and recommends using Joomla Scraper to grab full content only with the permission of the owner of that content.



Requirements:
Recommended PHP requirements for the Scraper version are:
- access to php.ini settings
- max_execution_time = 300 (at least 60 seconds are needed to use the scraper and download images functions)
- memory_limit = 128M or more (it can work with 32 MB or 64 MB, but 128 MB or more is recommended to use all scraper functions and to import multiple feeds with cron jobs)

The extension will work on every shared host with access to php.ini settings and a memory limit of 64 MB or more, but we recommend using a VPS server.

Automatic import with cronjobs - steps:



You must configure cron in two different places:
- in every RSS feed in the Joomla Scraper configuration (different settings for every feed)
- in the cron job configuration in your hosting control panel (set to run every 2, every 3 or every 5 minutes)

1. Create a file named mycron.php. Code for this PHP file:

Code
<?php
// Request the Joomla Scraper cron script over HTTP; replace yoursite.com with your own domain.
$a = file_get_contents('http://yoursite.com/administrator/components/com_aggregator/cron.aggregator.php');
?>


2. Upload the file mycron.php to your public_html. Create a cron job in your cPanel with the command:
Code
/usr/local/bin/php -q /relative path to public_html/mycron.php >> /dev/null


3. In your control panel (cPanel or other), set the cron to run every 5 or 3 minutes (recommended). In Unix-style view this should be:

Code
*/3 * * * *


4. Configure your feeds.
5. Test every feed manually first!
6. Try to import every feed without extras like the scraper and image download first! More functions switched on in a feed = more load for your server!
7. When you are sure that the configuration of all feeds is OK, configure the cron part. Before using cron, please read this thread: http://3dwebdesign.org/forum/server-load-and-cron-in-joomla-scraper-t1771.

Prices:
Joomla Scraper with three upgrades subscription - 49 USD. Download and Online purchase: http://3dwebdesign.org/joomla-scraper.


Update from previous version: How to update the component?

Update from Aggregator Platinum:
Joomla Aggregator Platinum and Joomla Scraper are fully compatible. To upgrade, just upload the new files over FTP or install the new extension with the Joomla installer. Your configured RSS feeds will stay untouched.

Update from a previous version of Joomla Scraper:
Just upload the new version over the old one.


Price of Joomla Scraper is only $49!

http://3dwebdesign.org/joomla-scrapper

Important: http://3dwebdesign.org/forum/joomla-scraper-affiliate-sensation-t2058
http://3dwebdesign.org/forum/joomla-scraper-licenses-and-prices-t2057.

Posted by: Web Design Seo Dec 6 2011, 02:15 PM

http://3dwebdesign.org/advanced-tags-joomla 1.7 is released today. I recommend downloading the update and using the tag component with Joomla Scraper for Joomla 2.5 and J1.5.

Posted by: Web Design Seo Dec 13 2011, 01:00 PM

Today two more functions are added to Joomla Scraper. The options are for integration with K2 and work only when the aggregator posts in K2!

1. Automatic import of images into the K2 tab "Images".
When this is switched on and the HTML code contains an img tag, the first image is imported into the "Images" tab in K2.

2. Automatic import of keywords into K2 "Tags".
The aggregator automatically extracts keywords from the title of the current content item. When this function is switched on, the keywords are imported automatically as tags in K2.

http://3dwebdesign.org/forum/index.php?showtopic=867

Posted by: Web Design Seo Jan 10 2012, 11:36 AM

Latest version of Joomla Scraper for Joomla 1.7 is 1.6.9, released on 5 January 2012.
Latest version of Joomla Scraper for Joomla 1.5 is 1.5.8, released on 5 January 2012.

Posted by: Web Design Seo Feb 3 2012, 09:16 AM

Latest changes in http://3dwebdesign.org/joomla-scrapper:

1. spin format support in "before content" and "after content" fields
2. spin format support in "before title" and "after title" fields
3. The component can now work with big synonym databases; it works well with databases of up to 100,000 synonyms.

Pictures are updated and included in the first post.

http://3dwebdesign.org/forum/index.php?showtopic=922

Posted by: Web Design Seo Feb 28 2012, 04:05 PM

Joomla Scraper was updated again yesterday. Both versions are updated - for Joomla! 2.5/1.7 and for Joomla! 1.5.

Added are better readmore tag placement and a function to strip selected HTML tags outside the scraper. Result - better stripping of unneeded HTML tags. For example, you can now strip only links, with or without using the scraper.

See here: http://3dwebdesign.org/forum/index.php?showtopic=958.

Posted by: Nikos Apr 10 2012, 02:53 PM

Hello,
does Joomla Scraper import embedded videos (FLVs) from RSS feeds, and can I test it on a demo?

Thank you

Posted by: Web Design Seo Apr 10 2012, 03:13 PM

Yes, embedded videos and FLV.

Nikos, at http://3dwebdesign.org/forum/index.php?showtopic=395 very old versions of the aggregators are uploaded (more than 8-9 months old), without the latest 5-6 changes. Yes, you can test this function in the demo: flash, movies and YouTube video import will work.

Just keep in mind that the latest version works 3-4 times faster with the article spinner and with synonyms - it works with big synonym databases of up to 100,000 words instead of the 800-1000 words in the demo.

Posted by: Nikos Apr 10 2012, 05:37 PM

Thank you for your reply, we need to import from Blogger to Joomla, which is the right component to buy, I suppose the Joomla Scraper, can you advise please? I need also to have the correct configuration for the embed videos in articles, which component to try on your demo? Can I send you a sample rss feed to import in a test feed to add?

My only concern is the right configuration.

Thank you

Posted by: Web Design Seo Apr 11 2012, 06:09 AM

Import alone can be done with any of the extensions. Platinum and Scraper can also import flash and videos. To make a choice, please see http://3dwebdesign.org/en/rss-feed-aggregators-comparison.html.

Posted by: Nikos Apr 11 2012, 09:28 AM

QUOTE (Web Design Seo @ Apr 11 2012, 07:09 AM) *
Import alone can be done with any of the extensions. Platinum and Scraper can also import flash and videos. To make a choice, please see http://3dwebdesign.org/en/rss-feed-aggregators-comparison.html.

Hello,

would you help me to configure it on our server (if I need help)? We will buy the Joomla Scraper for Joomla 1.5 version.


Thank you

Posted by: Web Design Seo Apr 11 2012, 09:45 AM

Yes, of course, we will help if you need it. It is pretty simple - there are only two settings in php.ini that may need to be changed in the typical configuration of most servers: max_execution_time and memory_limit.

Just be sure that you have access to the php.ini settings of the server.


P.S. When you buy Joomla Scraper you receive both versions of the component - for Joomla 1.5 and 2.5. When you update your Joomla to 2.5, everything will keep working.

Posted by: cardin Jun 12 2012, 07:30 AM

I am unable to make the synonyms feature work. The words are not replaced. Here's a sample of my synonyms.

new=recent;prevent=avoid;according to=as per;evaluated=assessed;select=choose;selected=chose;chosen=selected;excellent=finest;do=perform;exact=accurate;explain=clarify;important=vital;work out=train;

Please help.

Thanks

Edited: Sorry, posted this under the wrong topic; it should be under Joomla

http://3dwebdesign.org/forum/index.php?showtopic=698&st=0&gopid=3683&#entry3683

Posted by: Web Design Seo Jun 12 2012, 07:35 AM

Did you enter these synonyms in the text field or in the database? If they are in the text field, "As defined" must be selected in the feed configuration.



And what is your version?

Posted by: cardin Jun 12 2012, 07:55 AM

I am using Aggregator Scraper 1.7.0 in Joomla 2.5.

I am unable to make the synonyms feature work. The words are not replaced. Here's a sample of my synonyms.

new=recent;prevent=avoid;according to=as per;evaluated=assessed;select=choose;selected=chose;chosen=selected;excellent=finest;do=perform;exact=accurate;explain=clarify;important=vital;work out=train;

I am using the synonyms in the text field and have selected 'As Defined' synonyms in content.

Please help.

Thanks.

Posted by: Web Design Seo Jun 13 2012, 06:18 AM

Please check that the requirements of the component are met - memory limit, execution time. If yes, open the link "How to receive support" from the signature in my posts and follow it step by step. If there are no errors, send me login data for your site so I can test.

Posted by: Ivan Stamenov Jun 14 2012, 08:51 AM

Quote (cardin @ Jun 12 2012, 08:55 AM) *
I am using Aggregator Scraper 1.7.0 in Joomla 2.5.

I am unable to make the synonyms feature work. The words are not replaced. Here's a sample of my synonyms.

new=recent;prevent=avoid;according to=as per;evaluated=assessed;select=choose;selected=chose;chosen=selected;excellent=finest;do=perform;exact=accurate;explain=clarify;important=vital;work out=train;

I am using the synonyms in the text field and have selected 'As Defined' synonyms in content.

Please help.

Thanks.


Hi, cardin. There was a bug in the controller causing this behaviour. The fixed version will be available later today or tomorrow.

Posted by: Web Design Seo Jun 14 2012, 09:25 AM

cardin, I will send you the next version within the hour. The problem is solved and the new version is tested on two separate installs - one test and one live. If you find another bug in the component, please inform us.

Thank you for your help to make our component better!

Posted by: Web Design Seo Jul 16 2012, 10:46 AM

Joomla Scraper is updated again. Latest version for Joomla 2.5 is 1.7.4, released on 16 July 2012. Latest version for Joomla 1.5 is 1.6.3 (16 July 2012).

An option is added (http://3dwebdesign.org/forum/index.php?showtopic=1082), and bugs in the sentence shuffler when using the intro text/full text option are fixed.

Posted by: Web Design Seo Aug 27 2012, 01:02 PM

From today, an example synonym database is installed with every new copy of Joomla Scraper. If you upgrade the component from a previous version, use the SQL query published here - open phpMyAdmin and run this query: http://3dwebdesign.org/forum/example-sql-and-example-synonyms-database-t1171

And here you can purchase http://3dwebdesign.org/forum/english-synonym-database-price-25-usd-t735.

Posted by: Ivan Stamenov Sep 10 2012, 10:21 AM

A new custom RSS parser is added to the Joomla 2.5 version of the component. This RSS parser is our work and is much faster and less memory consuming than SimplePie and is suitable for parsing large feeds.




With our custom feed parser you can insert only selected fields from the RSS feed into content, so you can use non-standard feeds, like XML files with products and others that SimplePie does not parse by default.


There are some settings that need to be set for every feed, though. The parser needs to be told where (in which XML tag) to find the relevant content. The following notation must be used:

tag[.required_property_name1:required_property_value1]...[.wanted_property_name]

E.g.:

CODE
<link>http://the.link.we.want/</link>
=> set the link tag setting to: link, "get the content of the link tag"

CODE
<link rel='alternate'>http://the.link.we.want/</link>
=> set the link tag setting to: link.rel:alternate, "get the content of the link tag with rel property = alternate"

CODE
<link rel='alternate' alt='permalink' href='http://the.link.we.want/' />
=> set the link tag setting to: link.rel:alternate.alt:permalink.href, "get the content of the href property of the link tag with rel property = alternate and alt property = permalink"
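To make the notation easier to follow, here is a minimal PHP sketch of how such a setting could be split and applied to one feed item. It is not the component's code: the function names parseTagSetting() and extractField() are hypothetical, SimpleXML is used only for illustration, and the sketch assumes that property values contain no dots.

```php
<?php
// Hypothetical helper: split "tag.name:value.wanted" into its parts.
function parseTagSetting(string $setting): array
{
    $parts = explode('.', $setting);
    $spec = ['tag' => array_shift($parts), 'required' => [], 'wanted' => null];
    foreach ($parts as $part) {
        if (strpos($part, ':') !== false) {
            // "name:value" is a required property of the tag
            list($name, $value) = explode(':', $part, 2);
            $spec['required'][$name] = $value;
        } else {
            // a bare name is the property whose value we want
            $spec['wanted'] = $part;
        }
    }
    return $spec;
}

// Hypothetical helper: return the text (or wanted property) of the first
// child of $item that matches the setting, or null if nothing matches.
function extractField(SimpleXMLElement $item, string $setting): ?string
{
    $spec = parseTagSetting($setting);
    foreach ($item->children() as $child) {
        if ($child->getName() !== $spec['tag']) {
            continue;
        }
        $matches = true;
        foreach ($spec['required'] as $name => $value) {
            if ((string) $child[$name] !== $value) {
                $matches = false;
                break;
            }
        }
        if (!$matches) {
            continue;
        }
        return $spec['wanted'] === null
            ? trim((string) $child)
            : (string) $child[$spec['wanted']];
    }
    return null;
}
```

With an item that contains the third example tag, extractField($item, 'link.rel:alternate.alt:permalink.href') would return the value of the href property.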


The new version is already uploaded to our file directory.

Posted by: Ivan Stamenov Sep 19 2012, 06:47 AM

Version 1.7.6 of the Scraper is available. Now JomSocial and Kunena 2.0 forums are supported.



When importing into JomSocial, the following settings are available:
- Profile owner;
- Post where: "As status update" or "On wall";
- Allow comments.

When importing into Kunena 2.0, the following settings are available:
- Forum;
- Author ID: the user ID of the posts' author;
- Thread ID (optional): if set, all articles are imported as new posts in this thread. If 0, each article is posted as a separate thread.

Posted by: Web Design Seo Sep 19 2012, 07:10 AM

ACL and import into JomSocial and Kunena are available only in the Joomla 2.5 version. The Kunena import is developed to work only with Kunena 2.0 and later. It may also work with older Kunena versions, but this is not tested and we don't guarantee it.

The new version of Joomla Scraper is already available for purchase.

Posted by: Web Design Seo Nov 7 2012, 11:02 AM

A new version for Joomla 1.5 is released today: Joomla Scraper 1.6.4. Only the Joomla 1.5 version is updated!

The new version takes the automated SEO functions to the next level!

QUOTE
A new function is added, integrated with the component http://3dwebdesign.org/advanced-tags-joomla (also updated today). Until now, the component added as tags only single-word keywords extracted from the title.

Now, when articles are added automatically, the tag component counts phrases in the title and body of the article and adds as tags the number of top key phrases that you configure. This way you can cover more long-tail keywords and key phrases.


The version for Joomla 2.5 will be updated in the same way. This update is expected in the next 3-4 days.

Posted by: Ivan Stamenov Nov 8 2012, 01:37 PM

New version 1.7.7 is now here. v.1.7.7 of Joomla Scraper is bug fix release. The fixed bugs are as follows:

1. The RSS parser didn't convert all the result fields in UTF-8;
2. The article tags were not correctly added using the Advanced Tags component.


Posted by: SilverOne Dec 5 2012, 03:43 PM

QUOTE (Web Design Seo @ May 18 2011, 04:09 PM) *
Pictures from all functions are below


Custom Rss parser:



In the picture above, according to the example given, is it the word 'link' that we have to fill in the field, or the actual link (=http://the.link.we.want/)?

Posted by: Web Design Seo Dec 5 2012, 04:05 PM

You must enter the name of the field there.

Posted by: SilverOne Dec 5 2012, 09:48 PM

QUOTE (Web Design Seo @ Dec 5 2012, 04:05 PM) *
the field


You have not answered my question, so I repeat it:

In the picture above, according to the example given, is it the word 'link' that we have to fill in the field, or the actual link (=http://the.link.we.want/)?

Posted by: Web Design Seo Dec 6 2012, 07:05 AM

You must enter the name of the field there.

Open the RSS feed, then open the XML source of the feed (press Ctrl+U in the browser) and tell the parser: the content of the field "link" in the feed code must go to the "link" field in the Joomla Scraper parser options.

Is it clear now? :)

Posted by: Web Design Seo Dec 19 2012, 09:52 AM

We found a bug and the Scraper will be updated in the next few hours: if the "download images" function is switched on, Joomla Scraper does not import images into the K2 "Image" tab.

A fix for this case is released today and the Scraper is updated to versions:
- 1.7.8 for Joomla 2.5
- 1.6.5 for Joomla 1.5

Posted by: Web Design Seo Dec 21 2012, 12:02 PM

v.1.7.9 for Joomla 2.5 and 1.6.6 for Joomla 1.5 are released today. This release contains only bug fixes, for some cases where pictures were not imported into K2.

Posted by: Web Design Seo Jan 7 2013, 10:19 AM

Today Joomla Scraper is updated with http://3dwebdesign.org/forum/index.php?showtopic=1606&view=findpost&p=5353.

Posted by: Ivan Stamenov Jan 8 2013, 10:25 AM

Joomla Scraper Version 1.8.0 for Joomla 2.5 is here. It allows images without extension to be imported as well.
It does not implement the latest SimplePie parser yet.

Posted by: Ivan Stamenov Jan 9 2013, 01:16 PM

Joomla Scraper Version 1.8.1 for Joomla 2.5 is now available.

It allows the latest SimplePie (ver. 1.3.1 - PHP 5.2) to be used in addition to the obsolete 1.1.2 (PHP 4).
One may choose which SimplePie version to use on a per RSS feed basis.
Important note: As the classes in both SimplePie versions have the same names, though, when using the "Import All" button, the first loaded SimplePie version will serve all the remaining feeds as well (regardless of their SimplePie version), because PHP does not allow a class (with the same name) to be re-declared.

Finally xAjax is gone as well.

If we keep the current pace, Scraper ver. 12574545.0.1 will be coming soon... :)

Posted by: Ivan Stamenov Jan 9 2013, 01:24 PM

For those of you willing to update your SimplePie version to 1.3.1, please follow these steps:

1. Download the latest SimplePie: simplepie.131.inc.zip;
2. Extract the archive and rename the contained file to simplepie.inc;
3. Upload it to /administrator/components/com_aggregator/inc/simplepie/ and overwrite the existing file.

If you are using PHP 5.2+, you are strongly encouraged to do so, as this will rid you of all the SimplePie "Deprecated: ..." warnings.


There can be differences in performance between parser versions!
The Scraper has 3 different parsers: 2 SimplePie versions (old and new) and one custom parser. The two SimplePie parsers also differ in performance.

By default the parser in "Joomla Scraper" is the latest SimplePie. We have not checked and measured the performance of the old and new SimplePie; some feeds may work better with the old version.

If you get a "deprecated" or other error, use the new version of the SimplePie parser.

Posted by: Web Design Seo Jan 15 2013, 12:40 PM

A bug fix for pagination in the list of all feeds in the component is released today. Many files are changed, so we can't post the fix here.

To receive the update, please send us an email from the address used in your order. The three users who bought the extension yesterday and today have already received the update.

Posted by: pavelKukov Jan 29 2013, 02:55 PM

Today, 2013-01-29, the new version 1.8.4 of Aggregator-Scraper for Joomla is released. The new version supports some great new features.

1. It is now possible to preview imported items. Screenshot:



If you select one or more feeds and click the "Preview" button, you will see the resulting content that will be imported. This function is for testing and fine-tuning the RSS feed configuration.

2. You can limit the number of imported articles. Screenshot:



If you enter a number in the "feed limit" field, for example 5, only the first 5 items from this feed will be imported into the Joomla database.

Posted by: Web Design Seo Feb 1 2013, 07:00 AM

Version 1.8.5 is released today. One new function is added: the keywords filter.



You now have a filter in which to configure one or more keywords. An item from the feed is imported only if at least one of the keywords is found in the title or in the item body. This option allows you to filter results by keyword and to build better thematic websites. You can configure one feed several times with different keyword filters and import the results into suitable categories.
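The filter rule can be sketched in a few lines of PHP. This is only an illustration of the rule as described in this post, not the component's code: the function name matchesKeywordFilter() is hypothetical, an empty keyword list is assumed to let every item through, and stripos() is case-insensitive only for ASCII letters.

```php
<?php
// Hypothetical helper: true if at least one keyword occurs in the title or body.
function matchesKeywordFilter(string $title, string $body, array $keywords): bool
{
    if (count($keywords) === 0) {
        return true; // assumption: no keywords configured means no filtering
    }
    // Search in the title and in the body text without HTML tags.
    $haystack = $title . ' ' . strip_tags($body);
    foreach ($keywords as $keyword) {
        $keyword = trim($keyword);
        if ($keyword !== '' && stripos($haystack, $keyword) !== false) {
            return true;
        }
    }
    return false;
}
```

Configuring the same feed twice, once with the keyword list for one category and once with the list for another, would then send each item to the category whose keywords it matches.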

Posted by: Web Design Seo Feb 7 2013, 01:49 PM

Joomla Scraper v.1.8.6 for Joomla 3.0 is released.

Screenshots from Joomla3 version:






QUOTE
The version for Joomla 3.0 supports import only into content and K2!

Posted by: pavelKukov Feb 25 2013, 10:55 AM

Joomla Scraper for Joomla 1.5 and 2.5 now has an improved content shuffling algorithm and new options for better control and automated manipulation. The new algorithm is more precise and the texts it produces are more readable. The new options and settings are:

Shuffle sentences positions - Shuffles the positions of the sentences in the article.
Shuffle sentences - Shuffles compound sentences, using the given characters as delimiters.
Delimiters for sentences - Delimiters used for splitting text into sentences.
Delimiters for sentence parts - Delimiters used for splitting a sentence into parts.
Punctuation characters - List of characters considered punctuation and not allowed at the start or end of a sentence.
Minimum sentence length - Sentences shorter than n characters are removed. Use this option to clean up small errors such as sentences consisting only of names. For example, "Mr. Paul Kalkbrenner" is split into two sentences because of the dot after "Mr".
Fragment shuffling - If allowed, a sentence (or part of one) that contains more than n words and no delimiters is split into phrases, which are shuffled randomly.
Shuffling long phrases - Long phrase prevention, used when fragment shuffling is allowed. A text fragment built from fewer than the required number of words is not affected by the long phrase detection algorithm (in other words, if you want to shuffle short sentences you must set a number here).
Protected HTML tags - Comma-separated list of HTML tags to protect when removing HTML. This lets you keep some formatting.
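As a rough illustration of how "Delimiters for sentences", "Minimum sentence length" and "Shuffle sentences positions" interact, here is a simplified PHP sketch. It is not the component's algorithm: the function names and default values are assumptions, lengths are counted in bytes, and fragment shuffling and HTML protection are left out.

```php
<?php
// Hypothetical helper: split text after any delimiter character and drop
// sentences shorter than $minLength characters (bytes in this sketch).
function splitSentences(string $text, string $delimiters = '.!?', int $minLength = 10): array
{
    $pattern = '/(?<=[' . preg_quote($delimiters, '/') . '])\s+/';
    $sentences = preg_split($pattern, trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_filter($sentences, function ($sentence) use ($minLength) {
        return strlen(trim($sentence)) >= $minLength;
    }));
}

// Hypothetical helper: put the remaining sentences in random order.
function shuffleSentencePositions(string $text, string $delimiters = '.!?', int $minLength = 10): string
{
    $sentences = splitSentences($text, $delimiters, $minLength);
    shuffle($sentences);
    return implode(' ', $sentences);
}
```

In this sketch the text "Mr. Paul Kalkbrenner played in Berlin. It was loud!" is first split after "Mr.", and the three-character fragment "Mr." is then dropped by the minimum length check, which is the clean-up case the option description mentions.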

Latest updated versions are: v.1.8.8 for Joomla 2.5 and v.1.6.7 for joomla 1.5.

Posted by: pavelKukov Feb 26 2013, 11:23 AM

New versions of Joomla Scraper are released today.

The new version (Joomla Scraper 1.6.6 for Joomla 1.5 and Joomla Scraper 1.8.9 for Joomla 2.5) now reports import time and memory usage. With these improved statistics you can make a more accurate assessment and disallow feeds that are too large or too slow.

As you can see from the screenshots, time and memory usage depend on the number of items, use of the scraper, synonym replacement and content shuffle, image download, and feed and site response speed.

These cost more memory (you need to increase PHP's memory_limit to 64 MB or more; over 128 MB is recommended):
- large feeds with many items
- usage of scraper, synonyms replacement and content shuffle
- import of many feeds at once

These cost more time (you need to increase PHP's max_execution_time to 120 seconds or more):
- large feeds with many items
- download of images
- slow websites (response speed). A website served from your own country opens fast; websites on other continents are slower.
- scraper and synonyms replacement


A feed preview without scraper and content shuffle took around one second for a feed with 40 items from Yahoo.



Feed import without scraper and content shuffle took around 6 seconds for feed with 40 items from yahoo.



Feed import with scraper and content shuffle and image download took around 30 seconds for feed with 10 items from iTunes.



Import time for 93 items from feeds with different configurations was around 1:30 to 2 minutes. On most hosting accounts the normal PHP configuration is "max_execution_time = 30" to 60 seconds and "memory_limit = 16M" to 32M.



Import from crontab with scraper, content shuffle and image download from site with slow response time took around one minute for only 14 items.



QUOTE
Check your PHP settings and make the needed adjustments in a custom php.ini file, or just open a support ticket with your host and ask them to change these PHP settings!
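A small script can compare the current PHP limits with the values recommended above. This is a sketch only: the function names are hypothetical, the thresholds (128M and 120 seconds) are taken from this post, and the values -1 (memory) and 0 (time) are treated as "no limit", which is how PHP defines them.

```php
<?php
// Convert php.ini shorthand such as "128M" to bytes; "-1" stays -1 (unlimited).
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    $number = (int) $value; // the integer cast reads the leading digits
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            $number *= 1024; // intentional fall-through
        case 'M':
            $number *= 1024; // intentional fall-through
        case 'K':
            $number *= 1024;
    }
    return $number;
}

// Hypothetical helper: return a list of warnings for limits below the recommendation.
function checkImportLimits(string $memoryLimit, string $maxExecutionTime): array
{
    $warnings = [];
    $bytes = iniSizeToBytes($memoryLimit);
    if ($bytes !== -1 && $bytes < 128 * 1024 * 1024) {
        $warnings[] = "memory_limit is $memoryLimit; 128M or more is recommended";
    }
    $seconds = (int) $maxExecutionTime;
    if ($seconds !== 0 && $seconds < 120) {
        $warnings[] = "max_execution_time is {$seconds}s; 120 or more is recommended";
    }
    return $warnings;
}
```

On a live site the function could be called as checkImportLimits(ini_get('memory_limit'), ini_get('max_execution_time')). Note that the command line and the web server often use different php.ini files, so the check should run in the same environment as the import.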

Posted by: Web Design Seo Mar 8 2013, 08:51 AM

The latest updates (better shuffler, import time and memory usage) are now also available for the Joomla 3.0 version.

Posted by: Web Design Seo Mar 11 2013, 09:34 AM

Joomla Scraper is updated again. The latest version of Joomla Scraper for Joomla 3.0 is v.1.8.7 (11 March 2013), for Joomla 2.5 it is 1.9.0 (11 March 2013), and for Joomla 1.5 it is 1.6.9 (11 March 2013).

The latest releases fix only one bug: when the new SimplePie parser was used with feeds in a non-UTF-8 encoding, some broken symbols were inserted into the content.

Posted by: pavelKukov Mar 19 2013, 01:31 PM

QUOTE
Joomla has internal "duplicate content protection" based on the titles of content items. But if you use the spinner or synonym replacement in titles, the titles are unique every time and Joomla's internal "duplicate content protection" does not work. Joomla Scraper now adds its own duplicate content protection based on the links of imported items. This way, with or without spinning in titles, an article is imported only once.


Today a new version of Joomla Scraper for Joomla 2.5+ and 3.0+ was released, which adds protection against importing duplicate content. You can import one RSS feed as many times as you want, and no matter how often you try to import it, the same post will never be imported into your site twice. This is very helpful when you use the options that make content unique.

The function works as follows: duplicate content protection is enabled by default and is based on the URL of the publication. The link is saved in the database, and from then on the aggregator knows that this link is already imported. When the same RSS feed is imported again, news items that were already imported are skipped.

To prevent flooding the database with too many records (each imported link is recorded in a separate row in the table), you have the option to activate automatic deletion after a certain period of time.
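The idea can be sketched with an in-memory stand-in for the table of imported links. The class and method names below are hypothetical and the real component stores the links in a database table; the sketch only shows the check, the record, and the automatic deletion of old records.

```php
<?php
// In-memory stand-in for the table of imported links (hypothetical structure).
class ImportedLinkRegistry
{
    private $rows = []; // md5(link) => unix timestamp of the import

    // Returns true and records the link if it is new; false if already imported.
    public function registerIfNew(string $link, int $now): bool
    {
        $key = md5(trim($link));
        if (isset($this->rows[$key])) {
            return false;
        }
        $this->rows[$key] = $now;
        return true;
    }

    // Deletes records older than $maxAgeSeconds and returns how many were removed.
    public function purgeOlderThan(int $maxAgeSeconds, int $now): int
    {
        $before = count($this->rows);
        $this->rows = array_filter($this->rows, function ($importedAt) use ($maxAgeSeconds, $now) {
            return ($now - $importedAt) <= $maxAgeSeconds;
        });
        return $before - count($this->rows);
    }
}
```

One consequence visible in this sketch: once a record is purged, the same link counts as new again, so the deletion period should be longer than the time an item normally stays in the feed.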

More extras: Ability to control the maximum execution time directly from the administration. This option will work for sure in joomla 3+ because joomla 3+ requires PHP 5.3+ . For Joomla 2.5 this option depends on server configuration.








Posted by: pavelKukov Apr 23 2013, 12:06 PM

There was a bug in the cron import function. It was found in Joomla Scraper for both Joomla 2.5 and 3.0. New versions are now available for download.

New versions are as follows:

for Joomla 2.5
com_aggregator_scraper-J25-1.9.2

for Joomla 3.0
com_aggregator_scraper-J30-1.8.9

for Joomla 1.5 (the component works, but the new version has some code improvements)
com_aggregator_scraper-J15-1.7

Aggregator Platinum for Joomla 1.5 and 2.5 (the component works, but the new version has some code improvements)

Posted by: pavelKukov Apr 24 2013, 09:33 AM

Yesterday's bug fix exposed a new bug today. The error is in "/administrator/components/com_aggregator/helpers/cron.php" around line 65. It occurs when a feed is imported through cron and that feed is already imported. To fix it manually, do the following:

Find in "/administrator/components/com_aggregator/helpers/cron.php" around line 20

CODE
function lTrimZeros($number) {
    while ($number[0]=='0') {
        $number = substr($number,1);
    }
    return $number;
}


And Replace It With:

CODE
function lTrimZeros($number) {
    // Cast first so that integers and empty values can be indexed safely.
    $number = (string)$number;
    while (!empty($number) && $number[0]=='0') {
        $number = substr($number,1);
    }
    return $number;
}


Find in "/administrator/components/com_aggregator/helpers/cron.php" around line 65

CODE
if(!empty($matches))
{
                if (isset($matches[1]) && $matches[1]=="*") {
                    $matches[2] = 0;        // from
                    $matches[4] = $numberOfElements;        //to
                } elseif (isset($matches[4]) && isset($matches[2]) && $matches[4]=="") {
                    $matches[4] = $matches[2];
                }
                if (isset($matches[5]) && isset($matches[5][0]) && $matches[5][0]!="/") {
                    $matches[6] = 1;        // step
                }
                $matches[2] = (isset($matches[2]))?$matches[2]:"0";
                $matches[4] = (isset($matches[4]))?$matches[4]:"0";
                $matches[6] = (isset($matches[6]))?$matches[6]:"0";
                                $max_loops = 50;
                                $j = (int)((isset($matches[2]) && !empty($matches[2]))?lTrimZeros($matches[2]):0);
                                $max = (int)((isset($matches[4]) && !empty($matches[4]))?lTrimZeros($matches[4]):0);
                                $incr = (int)((isset($matches[6]) && !empty($matches[6]))?lTrimZeros($matches[6]):0);
                for ($j=$j;$j<=$max && $max_loops;$j+=$incr) {
                    $targetArray[$j] = TRUE;
                                        $max_loops--;
                }
            }


OR

CODE
if (isset($matches[1]) && $matches[1]=="*") {
                    $matches[2] = 0;        // from
                    $matches[4] = $numberOfElements;        //to
                } elseif (isset($matches[4]) && isset($matches[2]) && $matches[4]=="") {
                    $matches[4] = $matches[2];
                }
                if (isset($matches[5]) && isset($matches[5][0]) && $matches[5][0]!="/") {
                    $matches[6] = 1;        // step
                }
    for ($j=lTrimZeros($matches[2]);$j<=lTrimZeros($matches[4]);$j+=lTrimZeros($matches[6])) {
        $targetArray[$j] = TRUE;
    }
}


OR

CODE
if(!empty($matches))
            {
                if (isset($matches[1]) && $matches[1]=="*") {
                    $matches[2] = 0;        // from
                    $matches[4] = $numberOfElements;        //to
                } elseif (isset($matches[4]) && isset($matches[2]) && $matches[4]=="") {
                    $matches[4] = $matches[2];
                }
                if (isset($matches[5]) && isset($matches[5][0]) && $matches[5][0]!="/") {
                    $matches[6] = 1;        // step
                }
                $matches[2] = (isset($matches[2]))?$matches[2]:"0";
                $matches[4] = (isset($matches[4]))?$matches[4]:"0";
                $matches[6] = (isset($matches[6]))?$matches[6]:"0";
    for ($j=lTrimZeros($matches[2]);$j<=lTrimZeros($matches[4]);$j+=lTrimZeros($matches[6])) {
        $targetArray[$j] = TRUE;
    }
}


And Replace It With:

CODE
if(!empty($matches))
            {
                if (isset($matches[1]) && $matches[1]=="*") {
                    $matches[2] = 0;        // from
                    $matches[4] = $numberOfElements;        //to
                } elseif (isset($matches[4]) && isset($matches[2]) && $matches[4]=="") {
                    $matches[4] = $matches[2];
                }
                if (isset($matches[5]) && isset($matches[5][0]) && $matches[5][0]!="/") {
                    $matches[6] = 1;        // step
                }
                $matches[2] = (isset($matches[2]))?$matches[2]:0;
                $matches[4] = (isset($matches[4]))?$matches[4]:0;
                $increment = (isset($matches[6]))?(int)lTrimZeros($matches[6]):1;
                $increment = max($increment,1);
                for ($j=(int)lTrimZeros($matches[2]);$j<=(int)lTrimZeros($matches[4]);$j+=$increment) {
                    $targetArray[$j] = TRUE;
                }
            }


Downloadable versions that include these bug fixes will be available soon!
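The core of the replacement code is that the loop bounds are cast to integers and the step is never allowed to be smaller than 1, so the loop always advances. A standalone sketch of that part is below; the function name expandRange() is hypothetical, and the regular expression that fills $matches in cron.php is not shown in this thread, so it is left out.

```php
<?php
// Same behaviour as the patched lTrimZeros() above.
function lTrimZeros($number)
{
    $number = (string) $number;
    while (!empty($number) && $number[0] == '0') {
        $number = substr($number, 1);
    }
    return $number;
}

// Hypothetical helper: expand "from", "to" and "step" into the set of matching values.
function expandRange($from, $to, $step): array
{
    // A step of 0 (or an empty step) would never advance the loop, so clamp it to 1.
    $increment = max((int) lTrimZeros($step), 1);
    $targetArray = [];
    for ($j = (int) lTrimZeros($from); $j <= (int) lTrimZeros($to); $j += $increment) {
        $targetArray[$j] = true;
    }
    return $targetArray;
}
```

For example, expandRange('00', '10', '05') marks the values 0, 5 and 10, and a step of '0' is treated as a step of 1.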

Posted by: cromaplus May 12 2013, 05:01 PM

QUOTE (Web Design Seo @ Feb 8 2012, 03:30 PM) *
The latest version of Aggregator Platinum works perfectly with Joomla 2.5. It is tested with Joomla 2.5.1.

QUOTE (Web Design Seo @ May 12 2013, 03:46 PM) *
Aggregator Platinum works perfectly with K2. A problem is possible only if the K2 team changed something fundamental in the latest K2 version.

Please post your versions here (Joomla and K2 versions) and on Monday we will check your case.

Hello, thank you for answering. I have Joomla 2.5.11 and K2 v2.6.6. I hope you can work out why it does not work. If I had seen the other component, Scraper, earlier, I would have bought that one instead, because it has full support for K2.

Posted by: cromaplus Jun 10 2013, 01:48 PM

Hello, I also can't really get it to work fully with K2. I don't understand how to import images automatically and have not yet found any help for it.

Posted by: cromaplus Jun 10 2013, 03:21 PM

I bought Joomla Scraper but I cannot import pictures and don't understand why. I see the pictures in the preview, but after the import they are not in K2.

Posted by: Web Design Seo Jun 11 2013, 06:22 AM

New versions of Aggregator Scraper for Joomla 2.5 and Joomla! 3.x are now available.

The improvement in this version is that K2 image sizes are now supported. Until now K2 images were just copied under different names; from now on they are resized as follows:

Resizing is based on the larger side of the picture. Images smaller than the target size are just renamed. All thumbnail sizes (S, M, L, XL and so on) are made using the sizes configured in the K2 global configuration.
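The resize rule can be expressed as a small calculation. The PHP sketch below is an illustration only, not the component's code: resizedDimensions() is a hypothetical name, and a return value of null stands for "do not resize, just rename".

```php
<?php
// Hypothetical helper: scale an image so that its larger side equals $targetSize.
// Returns [width, height], or null when the image is not larger than the target.
function resizedDimensions(int $width, int $height, int $targetSize): ?array
{
    $larger = max($width, $height);
    if ($larger <= $targetSize) {
        return null; // small image: keep its size, only rename the copy
    }
    $scale = $targetSize / $larger;
    return [
        max(1, (int) round($width * $scale)),
        max(1, (int) round($height * $scale)),
    ];
}
```

For an 800x600 original and a target size of 400, this gives 400x300. The same call would be repeated for each size configured in K2.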

Only the Joomla 2.5 and Joomla 3.0 versions are updated!

Important Note:

QUOTE
This new functionality increases resource consumption (RAM and time). The increase depends on the original image size. Six new resized images are generated for each image. It is recommended to increase your max execution time and memory limits. You can disable resizing in the component options.


We recommend using this new function only if you are on a powerful server, such as a VPS.

Posted by: pavelKukov Jun 25 2013, 09:54 AM

A new downloadable version of "aggregator_scraper" for Joomla 3.+ will be available soon.

The version number for Joomla 3 is now 1.9.1. Only the Joomla 3 version is updated!

We have found a minor bug in previous versions of "aggregator_scraper" for Joomla 3.+. The bug affects only some feeds, when the old or the new "SimplePie parser" is used for parsing. The affected feeds were skipped and their content looked blank, although it was not.

This bug is fixed in version 1.9.1.

It is recommended to upgrade your version of "aggregator_scraper" for Joomla 3.+. The latest version is tested and working with both of the latest Joomla 3 versions, 3.0.3 and 3.1.1.

Posted by: pavelKukov Dec 9 2013, 02:53 PM

Update for joomla 2.5 and 3.x users!

New versions are named:
For joomla 2.5 - 1.9.5
For joomla 3.x - 1.9.2

What's new?

A new option named "Remove Script Tags". With this option you can decide whether to remove script tags from the scraped HTML or keep them. This is useful when you are trying to import content that relies on JavaScript.
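What such an option does can be sketched with a regular expression. This is an assumption about the approach, not the component's code; the function name cleanScrapedHtml() is hypothetical, and a regular expression is a simplification that can fail on unusual markup.

```php
<?php
// Hypothetical helper: optionally remove <script>...</script> blocks from scraped HTML.
function cleanScrapedHtml(string $html, bool $removeScriptTags): string
{
    if (!$removeScriptTags) {
        return $html; // keep scripts for content that relies on JavaScript
    }
    // "s" lets "." match newlines, "i" ignores case; fall back to the input on error.
    return preg_replace('#<script\b[^>]*>.*?</script\s*>#is', '', $html) ?? $html;
}
```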

Fixed a minor error with image URLs. In rare cases, when one image was repeated multiple times in the code, its URL became invalid in previous versions of aggregator_scraper.

Screenshot of new option:



This update is highly recommended for users that are trying to import content which relies on javascript!

Posted by: Web Design Seo Jan 17 2014, 09:52 AM

Joomla Scraper is updated again today: a bug is fixed in both the Joomla 3 and Joomla 2.5 versions. The bug was in the K2 importer with the latest K2 version (k2Table missing).

Posted by: Web Design Seo Jan 20 2014, 12:47 PM

Joomla Scraper for Joomla 3 is updated again today: compatibility with Joomla 3.2 is improved.

Posted by: pavelKukov Jan 28 2014, 09:56 AM

A bug (SQL error while sending the email report) is fixed today in both Scraper versions.
The update is available for both the Joomla 2.5.x and the Joomla 3.x version.

Posted by: Web Design Seo Feb 24 2014, 01:44 PM

A new version for Joomla 2.5 is released: Joomla Scraper v.1.9.6. Support for the newest Kunena versions is added; Joomla Scraper for Joomla 2.5 now supports Kunena 3.

Only the version for Joomla 2.5 is updated!

Posted by: pavelKukov Feb 25 2014, 10:20 AM

A new version for Joomla 3.x is released: Joomla Scraper v.1.9.4.
Support for the newest Kunena versions is added. Joomla Scraper for Joomla 3.x now supports Kunena 3.


Posted by: ataman79 Mar 20 2014, 10:13 AM

Is it possible to grab a certain number of characters from a news item?

For example, I set my scraper to grab the full image and the whole text of the news item. But I don't want it to grab the full text, only a certain number of characters from it.
For example, the starting string is <div class="round"> and the ending string (including the text) is <div class="extra">. The only tags inside the text area are <p> and </p>.

So is this possible?

Thanks in advance

Posted by: Web Design Seo Mar 20 2014, 10:49 AM

It is possible only by combining two functions: Strip HTML and Introtext length. But the content will be without HTML, as plain text.


Posted by: ataman79 Mar 31 2014, 01:10 PM

QUOTE (Web Design Seo @ Mar 20 2014, 10:49 AM) *
It is possible only by combining two functions: Strip HTML and Introtext length. But the content will be without HTML, as plain text.


Hi again,
OK, in the Scraper tab I set the starting string and the ending string.

After that, following your answer and the attached picture, I set these options in the Publish tab:
Strip content HTML tags - Yes
Strip title HTML tags - Yes
Strip special chars in title - Yes

The option Allowed HTML tags <img><strong><p><br/><br>: should I clear it or leave it at the default?

Actually I don't want to grab the whole news item, only a part of it (the normal feed is enough for me, but in that case the picture is small). That's why I want to grab the news item with the original image, but with less text.

Thanks in advance


Posted by: Web Design Seo Mar 31 2014, 01:20 PM

I recommend putting the default string in Allowed HTML tags:

CODE
<img><strong><p><br/><br>


If the Allowed HTML tags field is empty, the scraper removes ALL HTML tags and you import only plain text into the site.
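The effect of the allowed list can be tried with PHP's built-in strip_tags() function, whose second argument is a string of tags to keep. Whether the component uses strip_tags() internally is an assumption; the wrapper name stripToAllowed() and the sample HTML are made up for the illustration.

```php
<?php
// Hypothetical wrapper: keep only the tags named in $allowedTags.
// An empty list removes every tag and leaves plain text.
function stripToAllowed(string $html, string $allowedTags): string
{
    return strip_tags($html, $allowedTags);
}

$html = '<div class="round"><p>Hello <strong>world</strong><br><a href="#">more</a></p></div>';

// With the default string, <p>, <strong> and <br> survive; <div> and <a> are removed.
echo stripToAllowed($html, '<img><strong><p><br/><br>'), "\n";

// With an empty field, only the text remains.
echo stripToAllowed($html, ''), "\n";
```

In strip_tags() the entry <br> already covers the self-closing form, so the extra <br/> entry in the default string is harmless.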

Posted by: ataman79 Apr 1 2014, 02:28 PM

QUOTE (Web Design Seo @ Mar 31 2014, 02:20 PM) *
I recommend you to put in Allowed HTML tags default string:
CODE
<img><strong><p><br/><br>


If Allowed HTML tags field is clear, scraper will clear ALL html tags - you will import in site only clear text.



OK, I left it at the default, but even so I again grab the whole news item, not a certain number of characters.

So where is my mistake? I filled in exactly the same options in the Publish tab as in your picture, and in the Scraper tab I set:
Starting string length: 30
Starting string : <div id="news_text_inner">
Ending string: <div class="news_ad_footer">

Thank you in advance

Posted by: ataman79 Apr 9 2014, 09:47 AM

Can someone help me?
Why can't I get only a certain number of characters from the whole news item?

Posted by: Web Design Seo Apr 9 2014, 02:57 PM

Check this first: as written in http://3dwebdesign.org/forum/how-to-configure-scraper-to-grab-the-right-content-t736, the strings (start tag and end tag) must be unique.

Posted by: ataman79 Apr 9 2014, 03:18 PM

QUOTE (Web Design Seo @ Apr 9 2014, 03:57 PM) *
Check this first: As is written in http://3dwebdesign.org/forum/how-to-configure-scraper-to-grab-the-right-content-t736, strings (start tag and end tag) must be unique.


The strings I'm using are unique, and they work: the scraper grabs the whole news item. But as I asked earlier, I want to grab only a certain number of characters from the whole text.
You sent me a screenshot of how to set the options, and I set them accordingly, but even so I am still getting the whole text, not only the number of characters I set in Introtext length [chars] - 100

My other settings in the Publish tab are:
Split introtext - "After Before Item"
Introtext length [chars] - 100
Strip content HTML tags - Yes
Strip title HTML tags - Yes
Strip special chars in title - No
Allowed HTML tags - <img><strong><p><br/><br>

And even so I'm getting the whole text, so what else should I do?

Posted by: Web Design Seo Apr 9 2014, 06:22 PM

QUOTE (Web Design Seo @ Mar 20 2014, 01:49 PM) *
It is possible only by combining two functions: Strip HTML and Introtext length. But the content will be without HTML, as plain text.



As I said last time, if you use Introtext length, the content will probably be broken. Our extension does not work well with this combination, and we recommend using this function ONLY together with stripping all HTML tags (plain text will be imported).

Why does Introtext length not work well with HTML tags?
Any PHP programmer with some experience in grabber development will tell you: it is hard to develop a scraper that recognizes all HTML tags and automatically closes them in the content. There can be many paragraphs, divs and many more nested HTML tags.

Will this function be included in the next 1-2 Scraper component updates?
This function is not planned for the next 2 updates.
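The problem with cutting HTML at a fixed length is easy to reproduce. The PHP sketch below is an illustration under assumptions, not the component's code: plainIntrotext() is a hypothetical name, lengths are counted in bytes (for Cyrillic or other multibyte text the mbstring functions would be needed), and the text is cut back to the last full word.

```php
<?php
// Hypothetical helper: strip all tags first, then cut the plain text to $length.
function plainIntrotext(string $html, int $length): string
{
    // Remove tags and collapse runs of whitespace into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));
    if (strlen($text) <= $length) {
        return $text;
    }
    $cut = substr($text, 0, $length);
    $lastSpace = strrpos($cut, ' ');
    return $lastSpace === false ? $cut : substr($cut, 0, $lastSpace);
}

$html = '<div><p>One two three four</p> <p>five six</p></div>';

// Cutting the raw HTML leaves <div> and <p> open, which can break the page layout.
$broken = substr($html, 0, 10);

// Cutting after the tags are stripped gives clean plain text.
$intro = plainIntrotext($html, 10);
```

Here $broken holds the first ten characters of the markup with both tags still open, while $intro holds only whole words of plain text.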

Posted by: ataman79 Apr 11 2014, 03:00 PM

I understood what you mean and I did exactly what you said. Please have a look at the pictures with the settings.
But even so I cannot get a certain number of characters of the article :(






Any idea ?

Posted by: ataman79 May 10 2014, 08:53 AM

Hi ,
I still have no answer to my question

can you please check my posts above ?

Thanks

Posted by: Web Design Seo May 10 2014, 09:30 AM

The "Starting string length" function in the scraper works only if the start tag and end tag are not configured. We don't recommend using the "Starting string length" function in the scraper at all.

Posted by: ataman79 May 10 2014, 09:55 AM

QUOTE (Web Design Seo @ May 10 2014, 10:30 AM) *
Starting string lenght function in scraper work only if start tag and end tag are not configured. We don't recommend you to use "Starting string lenght" function in scraper at all - clear this field in options.


I am clearing it, but it always puts back the default value 30.
The only other thing I can set is 0; is this OK?

And is this the problem that prevents me from getting only a certain number of characters from the full article?

Posted by: Web Design Seo May 10 2014, 11:23 AM

Is saving not working? The programmer will check it on Monday.

Posted by: ataman79 May 10 2014, 11:36 AM

QUOTE (Web Design Seo @ May 10 2014, 12:23 PM) *
Save not working? Will be checked in monday from programmer.



I mean that when I clear the field "Starting string length" - without any value in the field, and after that when I press Save & Close, it puts automatically the value 30. I mean it always wants to have some value - that's why I wrote that I can put value 0

OK I will wait for your answer.

Posted by: Web Design Seo May 10 2014, 11:44 AM

30 is default value.

Posted by: ataman79 May 10 2014, 11:51 AM

QUOTE (Web Design Seo @ May 10 2014, 10:30 AM) *
Starting string lenght function in scraper work only if start tag and end tag are not configured. We don't recommend you to use "Starting string lenght" function in scraper at all.



Yes , correct it's the default value , that's why I wrote that it always puts this value. And you wrote me that I must clear the Starting string length

-----------------------

Exactly, the default value is 30 and it always puts it back, but you wrote that the field had to be cleared ...

Could this be the problem that prevents me from getting a certain length of the full text?

---------------


Posted by: Web Design Seo May 20 2014, 09:44 AM

New version of Joomla Scraper for Joomla 3.x is released - v.1.9.5 with a lot of improvements (only Joomla 3 version is updated).


Changes in Joomla Scraper v.1.9.5:



1. Set Content Images (Yes/No) - The script tries to find the most appropriate image in the content and sets it as "Intro Image" and/or "Full article image". If this fails, the default image below is used. If the default image is empty, no image is set. Attention: this option requires image download to be enabled!
2. Default Image (file chooser) - Sets the selected image as "Intro Image" and/or "Full article image" when no image is found in the scraped content.
3. Minimum Image Side (integer) - Limits the automatic choice to images whose smaller side is greater than or equal to the selected value. Images with a smaller side than this are considered inappropriate.
4. Set Intro Image (Yes/No) - The automatically selected image is set as "Intro Image" for the newly created article.
5. Set Full Article Image (Yes/No) - The automatically selected image is set as "Full article image" for the newly created article.

Screenshot from new functions:




How it works?
1. All images found in the feed are downloaded locally (the download images function must be switched on).
2. The image list is filtered down to images whose sides are at least the "Minimum Image Side" setting.
3. All images are sorted by proportions. For example, an image with sides 400x300 pixels has proportions of 4/3, or about 1.333.
4. All images are sorted by how close their proportions are to 1.333.
5. If multiple images share the same proportion and this proportion is the closest possible to 1.333, then the biggest image (by pixels, not file size) is selected.
6. This image is set as "Intro Image" and/or "Full article image" - there are settings for both.
7. The article alias (commonly the article title) is set as the alt text - this is very good for SEO.
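
The selection steps above can be sketched in a few lines of Python. This is only an illustration of the described logic, not the component's actual code: the `Image` class and the `pick_article_image` function are hypothetical names, and the handling of images with a zero side is an assumption.

```python
from dataclasses import dataclass
from fractions import Fraction

TARGET_RATIO = Fraction(4, 3)  # the 1.333 proportion named in the steps above


@dataclass
class Image:
    path: str
    width: int   # pixels
    height: int  # pixels


def pick_article_image(images, min_side, default=None):
    """Return the image chosen by steps 2-5, or `default` if none qualifies."""
    # Step 2: keep images whose smaller side is at least min_side
    # (assumption: images with a zero side are always rejected).
    candidates = [im for im in images
                  if min(im.width, im.height) >= max(min_side, 1)]
    if not candidates:
        return default
    # Steps 3-5: the proportion closest to 4/3 wins;
    # ties go to the image with the larger pixel area.
    return min(
        candidates,
        key=lambda im: (abs(Fraction(im.width, im.height) - TARGET_RATIO),
                        -(im.width * im.height)),
    )
```

With images of 400x300 and 800x600 both available, the two tie on proportion and the 800x600 one is returned because it has more pixels.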

Posted by: Web Design Seo May 21 2014, 10:33 AM

Aggregator Scraper for Joomla 3 is updated again with some bug fixes. The version number is now v.1.9.5.1.

Bugs fixed:

Quote
1. Fixed problem with Cyrillic titles containing quotation marks.
2. Fixed problem with Cyrillic URL alias transliteration in K2.


Posted by: Web Design Seo May 27 2014, 11:01 AM

New version 1.9.6 of Aggregator Scraper for Joomla 3 is available (only the Joomla 3 version is updated).

What's new in v.1.9.6?
This Scraper version adds four new functions.


In "Publish" section:



1. Force protocol (Leave Intact, HTTP, HTTPS, Not Specified)
Sets the protocol in src and href attributes, changing it to secure (https) or not secure (http). When your site runs under https, some resources that use http can be blocked by browsers. Leave Intact will not change anything. HTTP or HTTPS will force the respective protocol. Not Specified will remove the protocol information; for example, http://site.com will become //site.com. In this case the protocol is determined automatically by the client browser. This is the best option if the content is provided over both http and https. This also applies to YouTube videos.

2. Skip when forcing protocol
The protocol (http or https) will be forced in "src" and "href" attributes for all tags except the listed ones. The default value is "a". This keeps links from being changed.
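
As a rough illustration of how the two "Publish" options combine, here is a minimal Python sketch. It is not the component's code: `force_protocol`, the mode names `"intact"`, `"http"`, `"https"`, `"none"` and the `skip_tags` parameter are assumptions made for the example, and the regex-based tag matching is a simplification that ignores unusual HTML (for instance attribute values containing ">").

```python
import re

# An opening tag and its attributes, e.g. <img src="...">.
TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")
# A src/href attribute whose value starts with http:// or https://.
ATTR = re.compile(r"""((?<![\w-])(?:src|href)\s*=\s*["']?)https?:(?=//)""",
                  re.IGNORECASE)


def force_protocol(html, mode="intact", skip_tags=("a",)):
    """Rewrite the protocol of src/href URLs, leaving tags in skip_tags alone."""
    if mode == "intact":
        return html  # Leave Intact: change nothing
    # "none" drops the protocol, so http://site.com becomes //site.com.
    prefix = {"http": "http:", "https": "https:", "none": ""}[mode]
    skip = {t.lower() for t in skip_tags}

    def fix_tag(match):
        if match.group(1).lower() in skip:
            return match.group(0)  # skipped tag, e.g. <a>, stays untouched
        return ATTR.sub(lambda a: a.group(1) + prefix, match.group(0))

    return TAG.sub(fix_tag, html)
```

With the default skip list, `force_protocol('<img src="http://site.com/a.png">', "none")` rewrites the image URL to `//site.com/a.png`, while an `<a href="http://...">` link in the same content is left as it was.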


In "Image Download" section:

1. Skip smaller than(px)
Images smaller than the selected size will be skipped and removed from the content. The smaller side of the image is measured. For example, if an image has dimensions 240x180, then 180 is used for the comparison.



2. Maximum Images
The maximum number of images to leave in the content. All images after the selected number will be removed from the file system and from the content.
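
The two "Image Download" options can be pictured with the following Python sketch. It is a hypothetical illustration only: `filter_images` is an invented name, and the order in which the two limits are applied (size check first, then the count limit) is an assumption, not something stated by the component.

```python
def filter_images(images, skip_smaller_than, maximum_images):
    """Split images into (kept, removed) lists of names.

    images: list of (name, width, height) tuples in content order.
    """
    kept, removed = [], []
    for name, width, height in images:
        # "Skip smaller than": compare the smaller side, e.g. 180 for 240x180.
        too_small = min(width, height) < skip_smaller_than
        # "Maximum Images": everything after the allowed number is removed.
        over_limit = len(kept) >= maximum_images
        (removed if too_small or over_limit else kept).append(name)
    return kept, removed
```

For example, with a limit of 180 px and a maximum of 2 images, a 240x180 image is kept, a 100x50 image is removed as too small, and any qualifying image after the second kept one is removed as well.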


With these FOUR new options:


Posted by: pavelKukov Sep 15 2014, 01:16 PM

New versions of Aggregator Scraper are available for both Joomla 2.5 and 3.x.
A bug affecting the integration with Kunena is fixed.
The new version number is 1.9.7.

Posted by: Web Design Seo Oct 6 2014, 09:52 AM

Joomla Scraper v.1.9.8 for Joomla 3 is released. This is a bug fix release; only the Joomla 3 version is updated!

Quote
Fixed: the option "Strip content in multiple places. Separate markers by new line" was not working.

Posted by: Web Design Seo Jan 12 2016, 03:12 PM

12.01.2016: A new version of Joomla Scraper is released - v.1.9.9 for Joomla 3. Changes:

1. Added a new option to limit the length of imported content. It works with or without Scraper.



2. New function for image SEO - adds a relevant alt text to imported pictures. There is no new option in the feed configuration - the alt attribute is added automatically to images imported in the content, to the Intro image and to the Full article image (see screen).


Posted by: Web Design Seo Feb 7 2018, 11:07 AM

The price of https://3dwebdesign.org/joomla-scraper was changed today, from 69 to 49 USD. In addition, the component is Open Source: pay once, use on an unlimited number of websites.

Posted by: Web Design Seo Sep 14 2018, 08:35 AM

14.09.2018: Joomla Scraper v.1.9.9.1 released. Changes: no new functions, only updated to work with PHP 7.2.

Posted by: Web Design Seo Oct 9 2019, 08:53 AM

If you have trouble with PHP 7.2 or PHP 7.3, you can try changing the PHP version only for the Joomla Scraper folder: create a new .htaccess file and place it in this directory:

Code
public_html/administrator/components/com_aggregator/


Code for this .htaccess file to switch to PHP 7.2:
Code
AddHandler x-httpd-php72 .php


If this code doesn't work, replace it with:
Code
AddHandler application/x-httpd-php72 .php


or try:
Code
AddType application/x-httpd-alt-php72 .php .php7 .phtml
