What is Full-Text RSS?

News enthusiasts
Full-Text RSS can transform partial web feeds (often summary-only feeds which expect you to visit a cluttered, ad-ridden site to read the full story) into feeds that deliver the full content, stripped of clutter and ads. Read articles in full, in peace, in your favourite news reading application.

Developers
Full-Text RSS is a free software PHP application to help you extract article content from web pages. Extract from a standard HTML page or transform partial feeds to full text. Designed to be run as a web service, but one which you control.


Features

Speedy article extraction

Extraction rules ensure accurate results for popular sites and blog platforms.

Multi-page support

Articles split across a number of pages can be joined back together.

Autodetection

Where extraction rules do not exist, Full-Text RSS relies on heuristics to detect content automatically.

Customisable

Add custom extraction rules for fine-grained extraction.

Language detection

Full-Text RSS can figure out the language of the article being processed.

Multiple formats

Extract articles from HTML pages and partial web feeds, and get the result as RSS, JSON, or JSONP for easy parsing.

Easy hosting

Host on your own servers or deploy to the cloud. Pre-configured. No database required. See our hosting suggestions.

Freedom and transparency

Full-Text RSS is free software — no restrictive corporate APIs, no secret back doors.


Pricing

Basic

Free
  • We host it
  • Unlimited feeds
  • Language detection
  • 1-3 items per feed
  • Caching: 20 min
  • Links preserved
  • Link to FiveFilters.org
  • No JSON output

Premium

From 5€ per month
  • We host it
  • Unlimited feeds
  • Language detection
  • 1-10 items per feed
  • Caching: 10 min
  • Links preserved or removed
  • No link to FiveFilters.org
  • No JSON output

Developer

Pay as you go
  • We host it
  • Unlimited feeds
  • Language detection
  • 1-10 items per feed
  • Caching: 10 min
  • Links preserved or removed
  • No link to FiveFilters.org
  • JSON output

Download

Full-Text RSS 3.3

Released 13 May 2014 | What's new? | Changelog

We offer two purchase options. They come with the same license, but if you intend to use Full-Text RSS as part of a commercial project, please purchase the one for business use.

Full-Text RSS v3.3

for personal or student use

zip package — 20 €

Full-Text RSS v3.3

for business use

zip package — 40 €

What you get

Full-Text RSS 3.3 from FiveFilters.org includes:

  • Easy installation (no database setup required)
  • Technical support
  • Free updates for 1 year (half price after that)
  • Custom site pattern for a site of your choice *
  • Full source code

* If extraction does not work well on a particular site, contact us with details of what you're trying to extract and we'll send you a custom site config file (for one site only).

After paying, you will automatically receive an email with a download link to the zip package. The zip package contains a readme file with instructions for uploading the code to your web host via FTP.

Older versions

Older versions of Full-Text RSS — without site-specific extraction rules — can be downloaded free of charge from our code repository.

Note: we do not offer any support for these and for best extraction results we recommend buying the latest version.


More information

Documentation and support

Our help site covers most of what you'll need to know to get Full-Text RSS up and running and customised to work the way you want.

Our public forum is the place to ask questions and browse previous answers.

Hosted or self-hosted?

We want our users to be free to examine and run the code behind FiveFilters.org however they like. So rather than simply invite you to sign up for our premium hosted plan, we've gone to great effort to make the software easy to use and install on your own hosting account.

Using our hosted service (Free, Premium) is the easiest option as we manage everything. You do not have to worry about staying up to date because we maintain the code and any changes we make will automatically be made available to you.

If, however, you have your own hosting account or manage your own server, the self-hosted option gives you the freedom to run the code and manage things yourself — including writing custom extraction rules. We also have a help page on hosting options which should help you get started.

Note: We monitor our hosted service to prevent abuse. For developers needing to process very large amounts of data, we highly recommend downloading the self-hosted version.

API

The details here are mainly intended for developers using our self-hosted copy of Full-Text RSS for article extraction and feed conversion. News enthusiasts who simply want to subscribe to a full-text feed in their news reading application can safely ignore the details here and use the form above.

Full-Text RSS offers two endpoints: Article Extraction and Feed Conversion. If you've restricted access to Full-Text RSS, the final section on API keys will tell you how to pass your key along in the request.

1. Article Extraction

To extract article content from a web page and get a simple JSON response, use the following endpoint:

  • /extract.php?url=[url]

Request Parameters

When making HTTP requests, you can pass the following parameters to extract.php in a GET or POST request.

Note: for many of these parameters, the configuration file will ultimately determine if and how they can be used.

Parameter: url
Value: string (URL)
Description: This is the only required parameter. It should be the URL to a standard HTML page. You can omit the 'http://' prefix if you like.

Parameter: inputhtml
Value: string (HTML)
Description: If you already have the HTML, you can pass it here. We will not make any HTTP requests for the content if this parameter is used. Note: The input HTML should be UTF-8 encoded, and you will still need to give us the URL associated with the content (the URL may determine how the content is extracted, if we have extraction rules associated with it).

Parameter: content
Value: 0, 1 (default)
Description: If set to 0, the extracted content will not be included in the output.

Parameter: links
Value: preserve (default), footnotes, remove
Description: Links can either be preserved, made into footnotes, or removed. None of these options affect the link text, only the hyperlink itself.

Parameter: xss
Value: 0, 1 (default)
Description: Use this to enable/disable XSS filtering. It is enabled by default, but if your application/framework/CMS already filters HTML for XSS vulnerabilities, you can disable XSS filtering here.

If enabled, we'll pass retrieved HTML content through htmLawed (safe flag on and style attributes denied). Note: when enabled this will remove certain elements you may want to preserve, such as iframes.

Parameter: lang
Value: 0, 1 (default), 2, 3
Description: Language detection. If you'd like Full-Text RSS to find the language of the articles it processes, you can use one of the following values:

  • 0: Ignore language.
  • 1: Use article metadata (e.g. HTML lang attribute). This is the default value.
  • 2: As above, but guess the language if it's not specified.
  • 3: Always guess the language, whether it's specified or not.

Parameter: debug
Value: [no value], rawhtml, parsedhtml
Description: If this parameter is present, Full-Text RSS will output the steps it is taking behind the scenes to help you debug problems.

If the parameter value is rawhtml, Full-Text RSS will output the HTTP response (headers and body) of the first response after redirects.

If the parameter value is parsedhtml, Full-Text RSS will output the reconstructed HTML (after its own parsing). This version is what the extraction rules are applied to, and it may differ from the original (rawhtml) output. If your extraction rules are not picking out any elements, this will likely help identify the problem.

Note: Full-Text RSS will stop execution after HTML output if one of the last two parameter values (rawhtml or parsedhtml) is passed. Otherwise it will continue showing debug output until the end.

Parameter: parser
Value: html5php, libxml
Description: The default parser is libxml as it's the fastest. HTML5-PHP is an HTML5 parser implemented in PHP. It's slower than libxml, but can often produce better results. You can request HTML5-PHP be used as the parser in a site-specific config file (to ensure it gets used for all URLs for that site), or explicitly via this request parameter.

Parameter: proxy
Value: 0, 1, string (proxy name)
Description: This parameter has no effect if proxy servers have not been entered in the config file. If they have been entered and enabled, you can pass the following values: 0 to disable proxy use (uses a direct connection), 1 for default proxy behaviour (whatever is set in the config), or a string to identify a specific proxy server (it has to match the name given to the proxy in the config file).
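
As a rough illustration of how the parameters above combine into a request, here is a minimal Python sketch. The base address http://example.org/full-text-rss and the helper name build_extract_url are assumptions made for this example; substitute the location of your own self-hosted copy.

```python
from urllib.parse import urlencode

# Hypothetical location of a self-hosted copy; replace with your own.
BASE_URL = "http://example.org/full-text-rss"


def build_extract_url(page_url, **params):
    """Return an extract.php request URL for page_url plus optional parameters."""
    query = {"url": page_url}
    query.update(params)
    # urlencode percent-encodes every value, so the page URL
    # arrives as a single, intact "url" parameter.
    return BASE_URL + "/extract.php?" + urlencode(query)


request_url = build_extract_url(
    "http://chomsky.info/articles/20131105.htm",
    links="remove",  # drop hyperlinks but keep the link text
    lang=2,          # use metadata, guess the language if it is missing
)
print(request_url)
```

Whether a given parameter is honoured still depends on the configuration file of the copy you are calling.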

Response (example)

Simple JSON output containing extracted article title, content, and more. It was produced from the following input URL: http://chomsky.info/articles/20131105.htm

{
    "title": "De-Americanizing the World",
    "excerpt": "During the latest episode of the Washington farce that has astonish…",
    "date": null,
    "author": "Noam Chomsky",
    "language": "en",
    "url": "http://chomsky.info/articles/20131105.htm",
    "effective_url": "http://chomsky.info/articles/20131105.htm",
    "content": "<p>During the latest episode of the Washington farce that has aston…"
}

Note: For brevity the output above is truncated.
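
A response like the one above can be read with any JSON parser. The following Python sketch works on a shortened copy of the example. The helper name summarise_article is ours, and treating effective_url as the address reached after redirects is an assumption based on the field name only.

```python
import json

# Shortened copy of the example response (excerpt and content left out).
sample_response = """
{
    "title": "De-Americanizing the World",
    "date": null,
    "author": "Noam Chomsky",
    "language": "en",
    "url": "http://chomsky.info/articles/20131105.htm",
    "effective_url": "http://chomsky.info/articles/20131105.htm"
}
"""


def summarise_article(raw_json):
    """Pick out a few fields, tolerating null values such as "date"."""
    article = json.loads(raw_json)
    return {
        "title": article.get("title"),
        "author": article.get("author") or "unknown",
        "date": article.get("date") or "undated",
        # Assumption: effective_url differs from url only after a redirect.
        "redirected": article.get("url") != article.get("effective_url"),
    }


summary = summarise_article(sample_response)
print(summary)
```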


2. Feed Conversion

To transform a partial feed to a full-text feed, pass the URL (encoded) in the querystring to the following URL:

  • /makefulltextfeed.php?url=[url]

All the parameters in the form at the top of this page can be passed in this way. Examine the URL in the address bar after you click 'Create Feed' to see the values.
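
The reason for encoding the feed URL is easiest to see when the feed address has a query string of its own. In the Python sketch below, the base address and the feed address http://example.com/feed?cat=news&page=2 are both made up for illustration.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical location of a self-hosted copy; replace with your own.
BASE_URL = "http://example.org/full-text-rss"


def build_feed_url(feed_url):
    """Return a makefulltextfeed.php URL with feed_url percent-encoded."""
    # safe="" makes quote() encode ':', '/', '?', '&' and '=' as well,
    # so the feed's own query string is not mistaken for extra parameters.
    return BASE_URL + "/makefulltextfeed.php?url=" + quote(feed_url, safe="")


original = "http://example.com/feed?cat=news&page=2"
feed_request = build_feed_url(original)
print(feed_request)

# Decoding the query string gives back the feed URL in one piece.
decoded = parse_qs(urlsplit(feed_request).query)["url"][0]
print(decoded == original)
```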

Request Parameters

When making HTTP requests, you can pass the following parameters to makefulltextfeed.php in a GET request. Most of these parameters have default values suitable for news enthusiasts who simply want to subscribe to a full-text feed in their news reading application. If that's what you're doing, you can safely ignore the details here. For developers, or others who need more control over the output produced by Full-Text RSS, this section should give you an idea of what you can do.

We do not provide form fields for all of these parameters, but you can modify the URL in your browser after clicking 'Create Feed' to use them.

Note: for many of these parameters, the configuration file will ultimately determine if and how they can be used.

Parameter: url
Value: string (URL)
Description: This is the only required parameter. It should be the URL to a partial feed or a standard HTML page. You can omit the 'http://' prefix if you like.

Parameter: format
Value: rss (default), json
Description: The default Full-Text RSS output is RSS. The only other valid output format is JSON. To get JSON output, pass format=json in the querystring. Exclude it from the URL (or set it to ‘rss’) if you’d like RSS.

Parameter: summary
Value: 0 (default), 1
Description: If set to 1, an excerpt will be included for each item in the output.

Parameter: content
Value: 0, 1 (default)
Description: If set to 0, the extracted content will not be included in the output.

Parameter: links
Value: preserve (default), footnotes, remove
Description: Links can either be preserved, made into footnotes, or removed. None of these options affect the link text, only the hyperlink itself.

Parameter: exc
Value: 0 (default), 1
Description: If Full-Text RSS fails to extract the article body, the generated feed item will include a message saying extraction failed, followed by the original item description (if present in the original feed). You can ask Full-Text RSS to remove such items from the generated feed completely by passing 1 in this parameter.

Parameter: html
Value: 0 (default), 1
Description: Treat input source as HTML (or parse-as-html-first mode). To enable, pass html=1 in the querystring. If enabled, Full-Text RSS will not attempt to parse the response as a feed. This increases performance slightly and should be used if you know that the URL is not a feed.

Note: If excluded, or set to 0, Full-Text RSS first tries to parse the server's response as a feed, and only if it fails to parse as a feed will it revert to HTML parsing. In the default parse-as-feed-first mode, Full-Text RSS will identify itself as PHP first and only if a valid feed is returned will it identify itself as a browser in subsequent requests to fetch the feed items. In parse-as-html-first mode, Full-Text RSS will identify itself as a browser from the very first request.

Parameter: xss
Value: 0 (default), 1
Description: Use this to enable XSS filtering. We have not enabled this by default because we assume the majority of our users do not display the HTML retrieved by Full-Text RSS in a web page without further processing. If you subscribe to our generated feeds in your news reader application, it should, if it's good software, already filter the resulting HTML for XSS attacks, making it redundant for Full-Text RSS to do the same. The same goes for frameworks/CMSs which display feed content: the content should be treated like any other user-submitted content.

If you are writing an application yourself which is processing feeds generated by Full-Text RSS, you can either filter the HTML yourself to remove potential XSS attacks or enable this option. This might be useful if you are processing our generated feeds with JavaScript on the client side, although client-side XSS filtering is available too.

If enabled, we'll pass retrieved HTML content through htmLawed (safe flag on and style attributes denied). Note: if enabled this will also remove certain elements you may want to preserve, such as iframes.

Parameter: callback
Value: string
Description: This is for JSONP use. If you're requesting JSON output, you can also specify a callback function (JavaScript client-side function) to receive the Full-Text RSS JSON output.

Parameter: lang
Value: 0, 1 (default), 2, 3
Description: Language detection. If you'd like Full-Text RSS to find the language of the articles it processes, you can use one of the following values:

  • 0: Ignore language.
  • 1: Use article metadata (e.g. HTML lang attribute) or feed metadata. This is the default value.
  • 2: As above, but guess the language if it's not specified.
  • 3: Always guess the language, whether it's specified or not.

If language detection is enabled and a match is found, the language code will be returned in the <dc:language> element inside the <item> element.

Parameter: debug
Value: [no value], rawhtml, parsedhtml
Description: If this parameter is present, Full-Text RSS will output the steps it is taking behind the scenes to help you debug problems.

If the parameter value is rawhtml, Full-Text RSS will output the HTTP response (headers and body) of the first response after redirects.

If the parameter value is parsedhtml, Full-Text RSS will output the reconstructed HTML (after its own parsing). This version is what the extraction rules are applied to, and it may differ from the original (rawhtml) output. If your extraction rules are not picking out any elements, this will likely help identify the problem.

Note: Full-Text RSS will stop execution after HTML output if one of the last two parameter values (rawhtml or parsedhtml) is passed. Otherwise it will continue showing debug output until the end.

Parameter: parser
Value: html5php, libxml
Description: The default parser is libxml as it's the fastest. HTML5-PHP is an HTML5 parser implemented in PHP. It's slower than libxml, but can often produce better results. You can request HTML5-PHP be used as the parser in a site-specific config file (to ensure it gets used for all URLs for that site), or explicitly via this request parameter.

Parameter: proxy
Value: 0, 1, string (proxy name)
Description: This parameter has no effect if proxy servers have not been entered in the config file. If they have been entered and enabled, you can pass the following values: 0 to disable proxy use (uses a direct connection), 1 for default proxy behaviour (whatever is set in the config), or a string to identify a specific proxy server (it has to match the name given to the proxy in the config file).
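
When RSS output is requested with language detection on, the detected language can be read from each item's <dc:language> element. The Python sketch below parses a small hand-written feed, not real Full-Text RSS output. It assumes the dc prefix is bound to the usual Dublin Core namespace, which you should confirm against a feed generated by your own copy.

```python
import xml.etree.ElementTree as ET

# Assumption: the generated feed binds "dc" to the standard Dublin Core namespace.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hand-written sample for illustration only.
sample_rss = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First item</title>
      <dc:language>en-gb</dc:language>
    </item>
    <item>
      <title>Second item</title>
    </item>
  </channel>
</rss>"""


def item_languages(rss_text):
    """Return (title, language) pairs; language is None when no match was found."""
    root = ET.fromstring(rss_text)
    pairs = []
    for item in root.iter("item"):
        title = item.findtext("title")
        language = item.findtext("dc:language", default=None, namespaces=DC_NS)
        pairs.append((title, language))
    return pairs


languages = item_languages(sample_rss)
print(languages)
```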

Feed-only parameters — These parameters only apply to web feeds. They have no effect when the input URL points to a web page.

Parameter: use_extracted_title
Value: [no value]
Description: By default, if the input URL points to a feed, item titles in the generated feed will not be changed, as we assume item titles in feeds are not truncated. If you'd like them to be replaced with titles Full-Text RSS extracts, use this parameter in the request (the value does not matter). To enable/disable this for all feeds, see the config file, specifically $options->favour_feed_titles.

Parameter: max
Value: number
Description: The maximum number of feed items to process. (The default and upper limit will be found in the configuration file.)
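
The sketch below shows one way to add the two feed-only parameters to a request in Python. Since use_extracted_title needs no value, it is appended as a bare name. The base address and the helper name build_feed_request are assumptions for the example.

```python
from urllib.parse import urlencode

# Hypothetical location of a self-hosted copy; replace with your own.
BASE_URL = "http://example.org/full-text-rss"


def build_feed_request(feed_url, max_items=None, use_extracted_title=False, **params):
    """Return a makefulltextfeed.php URL including the feed-only parameters."""
    query = {"url": feed_url}
    if max_items is not None:
        query["max"] = max_items
    query.update(params)
    request = BASE_URL + "/makefulltextfeed.php?" + urlencode(query)
    if use_extracted_title:
        # The value does not matter, so only the parameter name is added.
        request += "&use_extracted_title"
    return request


bbc_request = build_feed_request(
    "http://feeds.bbci.co.uk/news/rss.xml",
    max_items=3,
    use_extracted_title=True,
    format="json",
)
print(bbc_request)
```

The configuration file sets the upper limit for max, so a larger value in the request will not necessarily be honoured.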

Response (example)

JSON output produced for the BBC feed http://feeds.bbci.co.uk/news/rss.xml. You can also request regular RSS.

{
    "rss": {
        "@attributes": {
            "version": "2.0"
        },
        "channel": {
            "title": "BBC News - Home",
            "link": "http://www.bbc.co.uk/news/#sa-ns_mchannel=rss&amp;ns_source=PublicR…",
            "description": "The latest stories from the Home section of the BBC News web site.",
            "ttl": 15,
            "image": {
                "title": "BBC News - Home",
                "link": "http://www.bbc.co.uk/news/#sa-ns_mchannel=rss&amp;ns_source=PublicR…",
                "url": "http://news.bbcimg.co.uk/nol/shared/img/bbc_news_120x60.gif"
            },
            "item": [
                {
                    "title": "Russia's Putin visits annexed Crimea",
                    "link": "http://www.bbc.co.uk/news/world-europe-27344029#sa-ns_mchannel=rss&…",
                    "guid": "http://www.bbc.co.uk/news/world-europe-27344029#sa-ns_mchannel=rss&…",
                    "description": "President Putin: \"[Crimeans have] proved their loyalty to a histor…",
                    "content_encoded": "<!-- Adding hypertab -->&#13;\n&#13;\n&#13;\n<!-- end of hypertab -…",
                    "pubDate": "Fri, 09 May 2014 15:02:04 +0000",
                    "dc_language": "en-gb",
                    "dc_format": "text/html",
                    "dc_identifier": "http://www.bbc.co.uk/news/world-europe-27344029",
                    "media_thumbnail": [
                        {
                            "@attributes": {
                                "url": "http://news.bbcimg.co.uk/media/images/74751000/jpg/_74751301_ycst2i…"
                            }
                        },
                        {
                            "@attributes": {
                                "url": "http://news.bbcimg.co.uk/media/images/74751000/jpg/_74751302_ycst2i…"
                            }
                        }
                    ]
                },
                {
                    "title": "Harris 'assaulted daughter's friend'",
                    "link": "http://www.bbc.co.uk/news/uk-27340134#sa-ns_mchannel=rss&ns_source=…",
                    "guid": "http://www.bbc.co.uk/news/uk-27340134#sa-ns_mchannel=rss&amp;ns_sou…",
                    "description": "Rolf Harris arrives at court flanked by his wife and daughter Rolf …",
                    "content_encoded": "<!-- Embedding the video player -->&#13;\n<!-- This is the embedd…",
                    "pubDate": "Fri, 09 May 2014 15:21:52 +0000",
                    "dc_language": "en-gb",
                    "dc_format": "text/html",
                    "dc_identifier": "http://www.bbc.co.uk/news/uk-27340134",
                    "media_thumbnail": [
                        {
                            "@attributes": {
                                "url": "http://news.bbcimg.co.uk/media/images/74740000/jpg/_74740642_hi0221…"
                            }
                        },
                        {
                            "@attributes": {
                                "url": "http://news.bbcimg.co.uk/media/images/74740000/jpg/_74740643_hi0221…"
                            }
                        }
                    ]
                },
                {
                    "title": "Nigeria 'ignored' school warning",
                    "link": "http://www.bbc.co.uk/news/world-africa-27344863#sa-ns_mchannel=rss&…",
                    "guid": "http://www.bbc.co.uk/news/world-africa-27344863#sa-ns_mchannel=rss&…",
                    "description": "Nigeria's military had advance warning of the attack on a school at…",
                    "content_encoded": "<div class=\"caption full-width\">&#13;\n <img src=\"http://news.b…",
                    "pubDate": "Fri, 09 May 2014 15:48:34 +0000",
                    "dc_language": "en-gb",
                    "dc_format": "text/html",
                    "dc_identifier": "http://www.bbc.co.uk/news/world-africa-27344863",
                    "media_thumbnail": [
                        {
                            "@attributes": {
                                "url": "http://news.bbcimg.co.uk/media/images/74749000/jpg/_74749855_747495…"
                            }
                        },
                        {
                            "@attributes": {
                                "url": "http://news.bbcimg.co.uk/media/images/74749000/jpg/_74749856_747495…"
                            }
                        }
                    ]
                }
            ]
        }
    }
}

Note: For brevity the output above is truncated.
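
To give an idea of how the JSON feed can be consumed, here is a minimal Python sketch that walks rss, channel and item in a trimmed copy of the example. The helper name list_items is ours. Wrapping a lone item object in a list is a defensive assumption on our part, not documented behaviour.

```python
import json
from email.utils import parsedate_to_datetime

# Trimmed copy of the example output, keeping only the fields used below.
sample_feed = """
{
    "rss": {
        "@attributes": {"version": "2.0"},
        "channel": {
            "title": "BBC News - Home",
            "item": [
                {
                    "title": "Russia's Putin visits annexed Crimea",
                    "pubDate": "Fri, 09 May 2014 15:02:04 +0000",
                    "dc_language": "en-gb",
                    "dc_identifier": "http://www.bbc.co.uk/news/world-europe-27344029"
                },
                {
                    "title": "Nigeria 'ignored' school warning",
                    "pubDate": "Fri, 09 May 2014 15:48:34 +0000",
                    "dc_language": "en-gb",
                    "dc_identifier": "http://www.bbc.co.uk/news/world-africa-27344863"
                }
            ]
        }
    }
}
"""


def list_items(raw_json):
    """Return (title, identifier, ISO 8601 date) for each item in the feed."""
    channel = json.loads(raw_json)["rss"]["channel"]
    items = channel.get("item", [])
    if isinstance(items, dict):
        # Defensive assumption: a single item might arrive as an object.
        items = [items]
    rows = []
    for item in items:
        published = parsedate_to_datetime(item["pubDate"])
        rows.append((item["title"], item.get("dc_identifier"), published.isoformat()))
    return rows


rows = list_items(sample_feed)
for title, identifier, published in rows:
    print(published, title, identifier)
```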


API Keys

To restrict access to your copy of Full-Text RSS, you can specify API keys in the config file.

Note: Full-text feeds produced by Full-Text RSS are intended to be publicly accessible to work with feed readers. As such, the API key should not appear in the final URL for feeds.

Parameter: key
Value: string or number
Description: This parameter has two functions.

If you're calling Full-Text RSS programmatically, it's better to use this parameter to provide the API key index number together with the hash parameter (see below) so that the actual API key does not get sent in the HTTP request.

If you pass the actual API key in this parameter, the hash parameter is not required. If you pass the actual API key to makefulltextfeed.php, Full-Text RSS will find the index number, generate the hash value automatically, and redirect to a new URL to hide the API key. If you'd like to link to a generated feed publicly while protecting your API key, make sure you copy and paste the URL that results after the redirect.

If you've configured Full-Text RSS to require a key, an invalid key will result in an error message.

Parameter: hash
Value: string
Description: A SHA-1 hash value of the API key (actual key, not index number) and the URL supplied in the url parameter, concatenated. This parameter must be passed along with the API key's index number using the key parameter (see above). In PHP, for example: $hash = sha1($api_key.$url);
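
For callers not using PHP, the same hash can be computed in other languages. The Python sketch below mirrors sha1($api_key.$url). The key "secret", the index 1, the base address and the helper names are made up for illustration, and we assume the hash is taken over the URL exactly as supplied, before percent-encoding.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical location of a self-hosted copy; replace with your own.
BASE_URL = "http://example.org/full-text-rss"


def request_hash(api_key, url):
    """SHA-1 hex digest of the API key and the URL, concatenated."""
    return hashlib.sha1((api_key + url).encode("utf-8")).hexdigest()


def signed_feed_url(key_index, api_key, feed_url):
    """Build a request that sends the key's index and a hash, never the key itself."""
    query = urlencode({
        "url": feed_url,
        "key": key_index,
        "hash": request_hash(api_key, feed_url),
    })
    return BASE_URL + "/makefulltextfeed.php?" + query


# "secret" and index 1 are placeholder values for this example.
signed = signed_feed_url(1, "secret", "http://feeds.bbci.co.uk/news/rss.xml")
print(signed)
```

Because the hash depends on the URL, a new hash is needed for each feed or page you request.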

System requirements

PHP 5.2 or above is required. The code has been tested on local, shared hosting and cloud environments. We recommend you download and run our simple compatibility test before purchasing. It's a single (zipped) PHP file you can upload to your server and access through your browser. It will tell you whether your server is capable of running Full-Text RSS.

On our help site, we have a list of recommended hosts.

Software Components

Full-Text RSS is written in PHP and relies on the following primary components:

Depending on your configuration, these secondary components may also be used:

License

This web application is licensed under the AGPL version 3. (More on why this is important.)

The software components in this application are licensed as follows...


Support

Frequently Asked Questions

What is this? How does it work? How can I use it? Why is my content appearing on other sites? See our Frequently Asked Questions page for answers.

Help

Our help site contains articles to get you started, and a forum where you can ask questions.

Email

Direct your questions to help@fivefilters.org.

Twitter

Direct your questions to @fivefilters. Why not follow us too?
