Web Design Seo Forum _ Joomla Scraper _ Server Load And Cron In Joomla Scraper

Posted by: smok3r Feb 11 2013, 04:57 PM

Aggregator Scraper is only reading the last feed. I have 3 feeds in it and only the last one will work. If I turn off the last feed, it will start working with the second feed. Turn the second one off, and now only the first will work. Even with the 1st and 3rd feeds enabled, only the last one will work.

I saw others have had this problem. Is there a fix? I really need to get this working.

Posted by: Web Design Seo Feb 11 2013, 05:06 PM

There is no way this should happen; the only possibility is some bug in the latest version. We run our own 3 aggregator news websites on the previous version - Scraper v1.8.6 (the latest is v1.8.7 from today; it will be public in the next three days).

Maybe you have some huge feed, or some extras switched on that load the server too much, like image download, scraper, or synonym replacement from the synonyms database.

Please read how to activate the cron job first:

Quote
You must configure cron in two different places:
- in every RSS feed in the Joomla Scraper configuration (different settings for every feed)
- in the cron job configuration in your hosting control panel (set to run every 2, 3, or 5 minutes)

Create a file named mycron.php. The code for this PHP file:

Code
<?php
// Trigger the aggregator cron endpoint; replace yoursite.com with your own domain.
$a = file_get_contents('http://yoursite.com/administrator/components/com_aggregator/cron.aggregator.php');
?>



Upload the file mycron.php to your public_html. Create a cron job in your cPanel with the command:
Code
/usr/local/bin/php -q /relative path to public_html/mycron.php >> /dev/null


Set the cron to run every 5 or 3 minutes (recommended) in your control panel (cPanel or other). In Unix-style view this should be:

Code
*/3 * * * *
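To see how the two cron levels interact, here is a small illustrative sketch (not part of the component, and an assumption about its behavior): the cPanel entry `*/3 * * * *` fires mycron.php at every minute divisible by 3, and on each of those runs the component checks the per-feed cron settings to decide which feeds are due.

```python
# Illustrative sketch of the two-level cron setup described above.
# Assumption: Joomla Scraper evaluates its per-feed schedules on each
# cPanel-triggered run; this helper only models the cPanel side.

def cpanel_run_minutes(step):
    """Minutes within an hour at which a '*/step * * * *' crontab entry fires."""
    return [m for m in range(60) if m % step == 0]

# With */3, mycron.php runs 20 times per hour: 0, 3, 6, ..., 57.
print(cpanel_run_minutes(3))
```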



How to diagnose cron - Short Guide:

1. First check the component requirements: memory limit (128 MB or more) and max execution time (60 seconds or more).
2. Then import the feeds manually, one by one, without extras like image download, synonym replacement, and scraper.
3. After this, configure the feeds one by one with the extras you want (like image download and scraper) and test, again one by one.
4. Finally, try automatic import with cron, once you have seen that every feed works fine manually.

Quote
Notice: keep in mind that after every test of a feed, you must delete the imported items. If you use Joomla 2.5/1.7/3.0, you must also delete the items from the trash! If you use the SimplePie parser, you must wait at least 15 minutes (the cron cache is set to 15 minutes; it is not normal to import one feed every minute).



Short cron configuration guide:

Quote
The recommended cron configuration is to import each RSS feed once every 2 hours, or at most once per hour.


Configure the feeds to be imported at different times, not all at once! If the cPanel cron is activated every 3rd minute and feed one is imported at 01:04, feed two is safe to import after at least 5 minutes, at 01:09.

If I had 7 feeds, I would configure them at these minutes (once per hour or once every 2 hours):
:5
:12
:19
:25
:35
:45
:55

------------End of Short cron configuration guide-----------
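Following the guide above, a tiny helper (hypothetical, not part of the component) can spread N feeds across the hour so no two imports coincide:

```python
# Spread n_feeds evenly between minute `start` and minute `end` of the hour,
# per the guide's advice to stagger feed import times. Purely illustrative.

def stagger_minutes(n_feeds, start=5, end=55):
    if n_feeds == 1:
        return [start]
    step = (end - start) // (n_feeds - 1)
    return [start + i * step for i in range(n_feeds)]

# Seven feeds, roughly matching the hand-picked list above.
print(stagger_minutes(7))
```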

So, what are your memory limit and max execution time settings?

Posted by: smok3r Feb 11 2013, 05:34 PM

Well, there is a way.

It's a new site pulling only 3 articles per feed. Not a huge feed, lol, and no extras switched on. Memory limit 64M... max execution time 300.

So it seems you have no idea. If that's the case, then I'd like to get a refund, and you can revoke the licenses on both sites I'm licensed for.

QUOTE (Web Design Seo @ Feb 11 2013, 12:06 PM) *
There is no way this should happen; the only possibility is some bug in the latest version. We run our own 3 aggregator news websites on the previous version - Scraper v1.8.6 (the latest is v1.8.7 from today; it will be public in the next three days).

Maybe you have some huge feed, or some extras switched on that load the server too much, like image download, scraper, or synonym replacement from the synonyms database.

What are your memory limit and max execution time settings?


Posted by: smok3r Feb 11 2013, 05:38 PM

Oh, and it does work great manually... but that's not how I want to use it.

Do you have any suggestions? I'll see if the next version works. This kinda sucks.

Posted by: Web Design Seo Feb 12 2013, 07:10 AM

1. Increase the memory limit to 128 MB or more. This is the reason it stops working with many feeds.
2. If you use the scraper version, try the old version of SimplePie - it is better than the new SimplePie. Use the new version of SimplePie only if you get PHP errors like "deprecated".

Posted by: smok3r Feb 12 2013, 01:17 PM

Increased to 128 and will change to the old version of SimplePie. I'll see if this works when I get home.

I noticed something else. When I choose "After N Chars" and the intro text length is set to any number (using 250 at the moment), the intro text is fine, but the full article picks up where the intro text left off. The full text is missing the intro-text part. It was fine in the first version I was sent, but I noticed this in v1.8.6.

Posted by: Web Design Seo Feb 12 2013, 01:43 PM

There is no guarantee that the introtext separator will land after the closing of some HTML tag, so the resulting HTML can be:

Code
<div>some content, text....
<hr id="system-readmore" />
...and text....</div>

and this will break your website's look. Use the "limit chars" option carefully.

Posted by: smok3r Feb 13 2013, 01:21 AM

Well, I added another feed and now it only pulls that one and ignores the first 3.

Two of the feeds pull with no HTML content... one pulls only the title and linkback... one is full text. Not overtaxing at all. I tried all 3 parsers and still only the last feed is pulled.

Posted by: Web Design Seo Feb 13 2013, 09:46 AM

If you want us to help you, please send login data for an admin account on your website to our email.

Posted by: smok3r Feb 14 2013, 12:29 AM

I emailed you

Posted by: smok3r Feb 14 2013, 10:29 PM

Does Aggregator Platinum still have xajax? Because it worked perfectly. It pulled all 4 feeds.

I was reading about Platinum and it looks like it does have a feed limit like Scraper. It also has the scraper function as an add-on.

Could I trade the Aggregator Scraper for Platinum plus the scraper add-on, and license only the rageandwar.com site? And remove the Scraper licenses from both sites? That would be the same cost. (I'm not using th3gate.com anymore.)

Scraper was $29.99 and the extra domain was $10.00. Platinum and the scraper add-on would be the same amount: $19.99 and $19.99. It would be an even exchange.

Posted by: Web Design Seo Feb 15 2013, 07:24 AM

Aggregator Platinum is without many extras - it loads the server many times less. There is no option to upgrade it with a scraper plugin. That price is only for the update to the scraper version.

Posted by: Web Design Seo Feb 18 2013, 03:28 PM

See this scheme to understand how cron works in Joomla Scraper:



Please send us the login data again (all login data sent to us is deleted immediately once I see that the username and password do not work).

Posted by: smok3r Feb 18 2013, 06:09 PM

I'll send it this afternoon when I get off work.

So you're saying that using curl and executing cron.aggregator.php directly won't work... like below?
*/3 * * * * curl -sS www.my_website.com/administrator/components/com_aggregator/cron.aggregator.php >>/dev/null

Shouldn't this work the same way? It's doing the same thing, just skipping mycron.php (which executes cron.aggregator.php) and calling it directly.

Posted by: smok3r Feb 18 2013, 09:33 PM

I got this message twice...
Your message did not reach some or all of the intended recipients.

Subject: RE: Heres login to site to look at aggregator scraper

I don't know if you got it or not. Let me know. I replied to your earlier email, so it must be the correct address.

Posted by: Web Design Seo Feb 19 2013, 04:34 PM

1. You have configured functions that use a lot of server resources - local image download, combined with every feed having the exact same cron configuration. This means all the feeds are trying to import at once.
No server in the world can do this.

The import times of different feeds must be spaced apart by at least the minimum cron interval from the cPanel config plus 1 minute!

If the cPanel cron is executed every third minute (*/3 * * * *), the feed configs must be something like:

feed one:

Code
5 0-23 * * *


feed two:
Code
9 0-23 * * *


feed three:
Code
13 0-23 * * *


feed four:
Code
17 0-23 * * *


feed five:
Code
21 0-23 * * *


In this way, feed one will be executed at minute 05 of every hour, feed two at minute 09 of every hour, and so on...
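The spacing rule above can be checked mechanically; this sketch verifies that the example schedule keeps consecutive feeds at least cPanel interval + 1 minutes apart:

```python
# Verify the example schedule: with cPanel cron every 3 minutes, consecutive
# feed minutes must be at least 3 + 1 = 4 apart (per the rule stated above).
CPANEL_STEP = 3
feed_minutes = [5, 9, 13, 17, 21]  # feeds one through five from the examples

gaps = [b - a for a, b in zip(feed_minutes, feed_minutes[1:])]
assert all(g >= CPANEL_STEP + 1 for g in gaps), gaps
print("minimum gap between feeds:", min(gaps))  # 4
```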

Posted by: smok3r Feb 19 2013, 08:26 PM

I had them all staggered except the first 2. One of those was just pulling the title and a link to it... no other scraper setting ran at the same time.

I'll try what you have, though, with the times staggered in feed order.

Posted by: smok3r Feb 19 2013, 09:36 PM

OK... I set it up the way you showed, and it went the whole hour pulling no other feeds until the last feed.

Again, only the last feed was pulled. This is BS.

Posted by: smok3r Feb 22 2013, 08:45 PM

Maybe put xajax in as another option? The trial Platinum worked.

You know, I set this up and it does everything you could want, except for not auto-pulling feeds through cron... only the last one. I set it up the way you said and still only the last one. Pulling the feeds manually is for the birds.

Do you have any other suggestions? Or are you gonna let me hang with it the way it is? Is the next release done, so I can see if it works? It's kinda frustrating not getting a response or an email.

Posted by: Web Design Seo Feb 23 2013, 05:04 PM

xajax is not supported on PHP 5.3, and Joomla 2.5 requires PHP 5.3. Besides, xajax has nothing to do with server limits and importing many feeds at once (which needs huge server limits).

Posted by: smok3r Feb 23 2013, 05:37 PM

I use Joomla 2.5.9 on PHP 5.2.17. It works fine. Could the PHP version be the problem with Scraper?

All I was saying is that the trial Platinum with xajax pulled all my feeds like it should. Just trying to figure out a way to get this to pull feeds using cron.

Everything kicks ass with Scraper except this one kinda major problem.

Posted by: smok3r Feb 23 2013, 06:25 PM

Nope. Updated to PHP version 5.3.20. Still no go. Only the last feed.

Posted by: Web Design Seo Feb 26 2013, 11:50 AM

@smok3r: Especially for your case and similar cases, we developed this:

Quote
26.02.2013: v1.8.9 for Joomla 2.5 and v1.6.8 for Joomla 1.5. Added import time and memory usage reporting in: manual import, preview, email notification, and cron. With these improved statistics you can check and diagnose import problems and measure the performance of different feeds between different parsers.


With this update you can see where the problem is and why (I think your problem is max_execution_time; when using image download, max_execution_time must be 60 seconds or more). Send me an email from the address used in your order to receive the latest version.

Posted by: smok3r Feb 26 2013, 09:00 PM

Fingers crossed. Email sent

Posted by: smok3r Feb 26 2013, 10:43 PM

Got this message again when I emailed you...

Your message did not reach some or all of the intended recipients.

Subject: email used in your order to receive last version.
Sent: 2/26/2013 3:59 PM

The following recipient(s) cannot be reached:

'3D Уеб дизайн' on 2/26/2013 3:59 PM
503 Valid RCPT command must precede DATA

Posted by: Web Design Seo Feb 27 2013, 09:25 AM

I sent you a PM. Please read this post carefully: below are examples, stats, and screenshots from importing one feed and many RSS feeds, with and without extras switched on (like scraper and image download).

Quote
Now the new version (Joomla Scraper 1.6.6 for Joomla 1.5 and Joomla Scraper 1.8.9 for Joomla 2.5) measures import time and memory usage. With these improved statistics you can make a more accurate assessment and avoid feeds that are too large or too slow.

As you can see from the screenshots, time and memory usage depend on the number of items, use of scraper, synonym replacement and content shuffle, image download, and the response speed of the feed and site.

These cost more memory (you need to increase PHP's memory_limit to 64 MB or more, recommended over 128):
- large feeds with many items
- use of scraper, synonym replacement, and content shuffle
- importing many feeds at once

These cost more time (you need to increase PHP's max_execution_time to 120 seconds or more):
- large feeds with many items
- image download
- slow websites (response speed): a website served from a server in your own country is fast, websites from other continents are slower
- scraper and synonym replacement

Feed preview without scraper and content shuffle took around a second for a feed with 40 items from Yahoo.



Feed import without scraper and content shuffle took around 6 seconds for a feed with 40 items from Yahoo.



Feed import with scraper, content shuffle, and image download took around 30 seconds for a feed with 10 items from iTunes.



Import time for 93 items from feeds with different configurations took around 1:30 - 2 minutes. On most hosting accounts the default PHP configuration is max_execution_time = 30-60 seconds and memory_limit = 16M-32M.



Import from crontab with scraper, content shuffle, and image download, from a site with a slow response time, took around one minute for only 14 items.
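A quick back-of-the-envelope check with the figures above shows why default hosting limits become a problem (linear scaling per item is an assumption):

```python
# Rough per-item import costs taken from the measurements above.
light = 6 / 40   # s/item without scraper/shuffle (6 s for 40 Yahoo items)
heavy = 30 / 10  # s/item with scraper, shuffle, and image download (30 s for 10 items)

DEFAULT_LIMIT = 30  # typical shared-hosting max_execution_time, in seconds

# Only ~10 "heavy" items fit inside the default limit before PHP kills the run,
# while 93 "light" items need only about 14 seconds.
print(int(DEFAULT_LIMIT // heavy))  # 10
print(round(93 * light))            # 14
```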



Quote
Check your PHP settings and make the needed adjustments via a custom php.ini file, or just open a support ticket with your host and ask them to change these PHP settings!

Posted by: smok3r Feb 27 2013, 10:50 PM

1st feed: feed processed in 5 seconds 887 milliseconds with RAM usage 12.32 MB - 3 new items.

2nd feed: feed processed in 7 seconds 958 milliseconds with RAM usage 3.15 MB - 3 new items.

It pulled the 1st feed when only it was enabled. When both were enabled, it pulled the second feed and never pulled the first. It's still only pulling the last feed that's enabled.

I'm only pulling 3 items per feed, the same as when I first purchased it. So it's not server load. I've tried setting them at the same time and at separate times... still the same. Tried all 3 different parsers and still the same. Only the last enabled feed gets pulled. If I do it manually, all feeds are processed; only cron does this.

It uses its own php.ini with these settings:
allow_url_fopen = On
max_execution_time = 300
memory_limit = 128M
max_input_time = 300
post_max_size = 50M
upload_max_filesize = 20M

Oh well... I don't know what else to do. Thanks for the help. Hopefully I'll stumble on the reason for this.

Posted by: Web Design Seo Feb 28 2013, 07:38 AM

Your case is just different - you use a custom PHP script on your server that parses the RSS feed and serves it to the aggregator.

Possibility 1: this PHP script also uses time and memory. Maybe the script stops working.
Possibility 2: if you use cron and SimplePie, one feed will be re-imported only after at least 15 minutes! Maybe SimplePie simply thinks that URLs like these are the same, single RSS feed:

Code
mysite.com/script.php?url-of-some-rss-feed&result=rss
mysite.com/script.php?url-of-rss-feed-two&result=rss
mysite.com/script.php?url-of-rss-feed-three&result=rss


There is an option to use the SimplePie cache or not, but I don't think SimplePie works well with this cache.

Ways to solve the problem:
1. I recommend you try our custom RSS parser (you may need to configure it manually) - it will work faster and definitely without cache.
2. In the aggregator, use a direct link to the RSS feed, not your custom PHP script.
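Possibility 2 can be illustrated with a sketch. This is an assumption about the failure mode, not SimplePie's actual code: if a cache key were derived from only the host and path of the feed URL, dropping the query string, the three proxied URLs above would all map to one cache entry and look like a single feed.

```python
# Hypothetical cache keying to illustrate "Possibility 2" above.
import hashlib
from urllib.parse import urlsplit

urls = [
    "http://mysite.com/script.php?url-of-some-rss-feed&result=rss",
    "http://mysite.com/script.php?url-of-rss-feed-two&result=rss",
    "http://mysite.com/script.php?url-of-rss-feed-three&result=rss",
]

def naive_key(url):
    """Key on host + path only: the query string (the real feed) is lost."""
    parts = urlsplit(url)
    return hashlib.md5((parts.netloc + parts.path).encode()).hexdigest()

def full_key(url):
    """Key on the complete URL: each proxied feed stays distinct."""
    return hashlib.md5(url.encode()).hexdigest()

print(len({naive_key(u) for u in urls}))  # 1 -> all three collide
print(len({full_key(u) for u in urls}))   # 3 -> three distinct cache entries
```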

Posted by: smok3r Feb 28 2013, 06:53 PM

I tried the SimplePie cache.

In cron I've been using:
php -q /relative path to host directory/public_html/mycron.php >> /dev/null

I also tried it with its own php.ini:
php -c /relative path to host directory/public_html/administrator/php.ini /relative path to host directory/public_html/mycron.php >> /dev/null


In mycron.php:
<?php
$a = file_get_contents('http://yoursite.com/administrator/components/com_aggregator/cron.aggregator.php');
?>

I'll look into the custom RSS parser.
Oh, and I don't use any scripts... just a direct link to the feed in the aggregator, with the cron setting above and the mycron.php above.

Posted by: smok3r Mar 5 2013, 03:10 PM

Still a no-go. It still only pulls the last feed.

Too bad Scraper can't be set to auto-update per feed, without a cron job in the control panel, just by setting the seconds (like 3600) for each feed, like this other grabber I found.
Or by setting multiple cron jobs in the control panel and adding the feed ID numbers to the cron command, like...
5 * * * * <command>/aggregator.php 1,4,6 to run them at separate times, like one I used in the past in PHP.

Oh well. I'll update manually until I can find something else. I'm looking at another grabber whose auto-update works. Scraper is much easier to use and has more features, but like I said, the manual updating blows, lol.
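The per-feed cron idea above could look something like this wrapper. It is purely hypothetical - Joomla Scraper does not expose a feed-ID argument - and only sketches the command-line side:

```python
# Hypothetical dispatcher for the "feed IDs in the cron command" idea above.
# A crontab line like "5 * * * * python feedcron.py 1,4,6" would import only
# those feeds. The import itself is stubbed out with a print.
import sys

def parse_feed_ids(arg):
    """Turn a comma-separated argument like '1,4,6' into [1, 4, 6]."""
    return [int(x) for x in arg.split(",") if x.strip()]

if __name__ == "__main__":
    for fid in parse_feed_ids(sys.argv[1] if len(sys.argv) > 1 else ""):
        print(f"would import feed {fid}")
```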

Posted by: smok3r Mar 29 2013, 07:55 PM

Sent you a PM about the new update.

Maybe it will get Scraper to auto-update somehow. I'm still updating manually and using another aggregator, which isn't as good, to auto-update.

Posted by: Web Design Seo Apr 1 2013, 06:16 AM

This extension is updated only manually, due to license generation. When you want the latest version, send me an email from the address used in your order and we will send you the latest version.

P.S. I see your email and will send you the update in the next hour.

Posted by: smok3r Apr 1 2013, 07:30 AM

Thank you. My email still gets returned when emailing you and anyone else... It's OK at receiving. I'm working on getting it fixed.

Posted by: Web Design Seo Apr 23 2013, 12:56 PM

Quote (smok3r @ Mar 5 2013, 06:10 PM) *
Still a no-go. It still only pulls the last feed.


Finally we found the bug (http://3dwebdesign.org/forum/index.php?showtopic=698&view=findpost&p=5992), and all versions of Com_Aggregator are fixed today - the scraper and platinum versions too.

The bug was hard to find - cron misbehaved only with certain combinations of functions switched on in a feed, and only on some PHP versions. But now it is fixed.

Request the update via our email to receive it.


Attention: if you already have problems, check your file /administrator/components/com_aggregator/helpers/cron.php against this fix (http://3dwebdesign.org/forum/index.php?showtopic=698&view=findpost&p=5997).

Powered by Invision Power Board (http://www.invisionboard.com)
© Invision Power Services (http://www.invisionpower.com)