Removing Html Or Javascript In Scraped Content, Tips on how to parse page
Jeff Honeyager
post Apr 9 2014, 04:56 PM
Post #1


The only "unique" tag before the article is:
< li class="print-icon"> (space added after <)

The scraper detects the full article, but the article also contains fragments of HTML and JavaScript:

alt="Print" />


onclick="window.open(this.href,'win2','width=400,height=350,menubar=yes,resizable=yes'); return false;">


How can I remove these?

I successfully removed similar fragments from another feed using Synonyms, but this JavaScript contains semicolons, so that approach will not work.

Any ideas?


Web Design Seo
post Apr 9 2014, 06:26 PM
Post #2




You can't remove only this fragment:
Code
onclick="window.open(this.href,'win2','width=400,height=350,menubar=yes,resizable=yes'); return false;">


The start and end tags do not have to be complete HTML tags; they can be plain strings, so you can use any of these:
Code
< li class="print-icon">

or
Code
< li class="print-icon"

or
Code
< li class="print


The only requirement is that this string appears just once on the page. If it appears two or more times, the scraper will use the first occurrence.
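
To illustrate the idea of plain-string start and end markers, here is a minimal Python sketch. It is not the aggregator's actual code: the function name `extract_between`, the sample page, and the end marker `<div id="footer">` are all made up for this example, and the markers are written without the extra space after `<`. It only shows how "first occurrence wins" and partial-string markers could behave under those assumptions.

```python
from typing import Optional


def extract_between(page: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` and the next `end`.

    Hypothetical helper for illustration; returns None if either
    marker is missing.
    """
    begin = page.find(start)  # str.find returns the FIRST occurrence
    if begin == -1:
        return None
    begin += len(start)  # content starts right after the marker string
    stop = page.find(end, begin)
    if stop == -1:
        return None
    return page[begin:stop]


# Made-up page: the start marker appears twice on purpose.
page = (
    '<li class="print-icon"><a href="#">Print</a></li>'
    "<p>First paragraph.</p>"
    '<li class="print-icon">repeated marker</li>'
    '<div id="footer">'
)

# Full tag as the start marker: extraction begins after the first match.
full = extract_between(page, '<li class="print-icon">', '<div id="footer">')

# Partial string as the start marker: it matches at the same place, but the
# rest of the tag (-icon">) is left at the beginning of the extracted text.
partial = extract_between(page, '<li class="print', '<div id="footer">')

print(full)
print(partial)
```

In this sketch a shorter start marker still finds the same position, but whatever follows the marker, including the remainder of a tag, ends up in the result. That may be one way fragments of HTML get into the scraped text, so it can help to choose a start string that ends after the unwanted markup.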

