Removing HTML or JavaScript in Scraped Content: Tips on how to parse a page
Jeff Honeyager
post Apr 9 2014, 04:56 PM
Post #1


Group: Members
Posts: 2
Joined: 7-October 13
Member No.: 1,842

The only "unique" tag before the article is:
< li class="print-icon"> (space added after < so that the tag displays in this post)

The scraper detects the full article, but the article also contains fragments of HTML and JavaScript:

alt="Print" />

onclick="window.open(this.href,'win2','width=400,height=350,menubar=yes,resizable=yes'); return false;">

How can I remove these?

I successfully removed similar fragments from another feed using Synonyms, but this JavaScript contains semicolons, so that approach will not work here.

Any ideas?

Web Design Seo
post Apr 9 2014, 06:26 PM
Post #2

Group: Root Admin
Posts: 4,161
Joined: 29-April 09
From: Sofia
Member No.: 1

You can't remove only this fragment:
onclick="window.open(this.href,'win2','width=400,height=350,menubar=yes,resizable=yes'); return false;">

The start and end tags do not have to be complete HTML tags; they can be any strings, so you can use any of these:
< li class="print-icon">

< li class="print-icon"

< li class="print

The only requirement is that this string appears exactly once on the page. If it appears two or more times, the scraper uses the first occurrence.
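
To illustrate the idea of string-based start and end markers, here is a minimal Python sketch. It is not the aggregator's actual code; the function name `extract_between`, the sample page, and the `<!-- end -->` end marker are all made up for the example, and the tag is written without the extra space after `<`. It only shows how a plain substring search behaves when the marker is a partial tag and when it appears more than once.

```python
from typing import Optional


def extract_between(page: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` string and the next `end` string.

    Hypothetical helper: markers are treated as plain strings, not parsed HTML.
    Returns None when either marker is missing.
    """
    begin = page.find(start)  # str.find returns the FIRST occurrence, or -1
    if begin == -1:
        return None
    begin += len(start)  # content starts right after the start marker
    stop = page.find(end, begin)  # look for the end marker after the start
    if stop == -1:
        return None
    return page[begin:stop]


# Made-up page in which the start marker appears twice.
page = (
    '<li class="print-icon">first article<!-- end -->'
    '<li class="print-icon">second article<!-- end -->'
)

# Full tag as the start marker: only the first occurrence is used.
full = extract_between(page, '<li class="print-icon">', '<!-- end -->')

# Partial tag as the start marker: it still matches, but in this sketch
# the rest of the tag stays at the front of the extracted text.
partial = extract_between(page, '<li class="print', '<!-- end -->')

# A marker that is not on the page yields None.
missing = extract_between(page, '<nav>', '<!-- end -->')
```

In this sketch a shorter marker matches the same position but leaves the unmatched tail of the tag in the result, so the longest marker that is still unique on the page tends to give the cleanest output. Whether the real scraper behaves the same way with partial markers is an assumption.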
