The new version of our Full-Text RSS application is out now. Full-Text RSS is used by news enthusiasts and software developers to extract article content from news sites and blogs, and to convert news feeds that contain only extracts to full-text feeds. This release contains a number of fixes and improvements. Here’s what’s new.
Improved HTML5 support
Previously you could request HTML5 parsing using HTML5-PHP. In this version we’ve added support for the Gumbo PHP extension, which uses Google’s fast HTML5 parser. If you set up your server using our VPS setup script (more on that below), Gumbo and Gumbo PHP will get installed automatically. If Full-Text RSS detects Gumbo, it will be used whether you request HTML5 parsing or not.
We now also allow you to request HTML5 output, either by using the &content=html5 request parameter or by editing the config file and setting $options->html5_output = true;
This shouldn’t produce anything dramatically different (and in many cases will produce output identical to what you had before), but in some cases you might find it useful. This could become the default in future versions of Full-Text RSS.
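If you're calling Full-Text RSS from your own code, the request parameter can be added when you build the URL. Here's a minimal Python sketch. The endpoint (http://example.com/ftr/extract.php) is the placeholder used elsewhere in this post, and the function name build_extract_url is made up for illustration; only the url and content=html5 parameters come from Full-Text RSS itself.

```python
from urllib.parse import urlencode


def build_extract_url(endpoint, article_url, html5_output=False):
    """Build a request URL for extract.php (hypothetical helper, not part of Full-Text RSS)."""
    params = {"url": article_url}
    if html5_output:
        # Request parameter described above: &content=html5
        params["content"] = "html5"
    # urlencode percent-encodes the article URL so it is safe inside a query string
    return endpoint + "?" + urlencode(params)


request_url = build_extract_url(
    "http://example.com/ftr/extract.php",  # placeholder installation URL
    "www.truthdig.com/report/item/make_america_ungovernable_20170205",
    html5_output=True,
)
print(request_url)
```

Leaving html5_output at its default simply omits the parameter, so the server falls back to whatever is set in the config file.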
More developer-friendly output
We now include additional metadata when you call our extract.php endpoint:
curl "http://example.com/ftr/extract.php?url=www.truthdig.com/report/item/make_america_ungovernable_20170205"
Response
{
    "title": "Make America Ungovernable",
    "excerpt": "By Chris Hedges Mr. Fish / Truthdig Donald Trump’s regime…",
    "date": "2017-02-05T23:34:57+00:00",
    "author": null,
    "language": "en",
    "url": "http://www.truthdig.com/report/item/make_america_ungovernable_20170205",
    "effective_url": "http://www.truthdig.com/report/print/make_america_ungovernable_20170205",
    "domain": "truthdig.com",
    "word_count": 2346,
    "og_url": "http://www.truthdig.com/report/print/make_america_ungovernable_20170205",
    "og_title": "Make America Ungovernable: Chris Hedges",
    "og_description": "The window to overthrow the Trump regime is rapidly closing…",
    "og_image": null,
    "og_type": "article",
    "twitter_card": null,
    "twitter_site": "@truthdig",
    "twitter_creator": "@truthdig",
    "twitter_image": null,
    "twitter_title": "Make America Ungovernable | Truthdig: Drilling Beneath the Headlines",
    "twitter_description": "The window to overthrow the Trump regime is rapidly closing…",
    "content": "<h4 class=\"date\">Posted on Feb 5, 2017</h4>…"
}
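If you're consuming this JSON in your own project, here's a rough Python sketch of how you might read it. The field names (title, og_title, domain, word_count, content) are taken from the response above. The function name summarise_extract, the fallback from title to og_title, and the trimmed sample string are our own illustration, not behaviour built into Full-Text RSS.

```python
import json


def summarise_extract(response_text):
    """Pick a few fields out of an extract.php JSON response (hypothetical helper)."""
    data = json.loads(response_text)
    # Assumption: if "title" is missing or null, the Open Graph title is a reasonable stand-in
    title = data.get("title") or data.get("og_title")
    return {
        "title": title,
        "domain": data.get("domain"),
        "word_count": data.get("word_count"),
        # True only when the response carries non-empty extracted content
        "has_content": bool(data.get("content")),
    }


# Hand-trimmed sample for illustration, not a real server response
sample = json.dumps({
    "title": None,
    "og_title": "Make America Ungovernable: Chris Hedges",
    "domain": "truthdig.com",
    "word_count": 2346,
    "content": "",
})
summary = summarise_extract(sample)
print(summary["title"], summary["domain"], summary["word_count"])
```

Using .get() throughout means a missing field comes back as None rather than raising an error, which is handy when a response carries metadata but no extracted content.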
If you’re a developer interested in self-hosting the latest version of Full-Text RSS for your project, we’d love to send you a temporary URL for testing. It will contain a regular installation of Full-Text RSS running on a VPS. Email us at [email protected] with the subject ‘Try out Full-Text RSS’.
If you’d rather use our hosted service, we offer developer access through Mashape.
Improved VPS setup script
Full-Text RSS works on many hosting environments, but it works best when the server is running the components it’s been optimised for. We offer a server-initialisation script that will install all the necessary software for you on a new Ubuntu 16.04 server. For this release we’ve updated it to install PHP 7 and the latest version of the extensions we make use of. Gumbo PHP is now also installed for fast HTML5 parsing.
For instructions, please visit our hosting help page.
Let us install it
As of this release we’re also offering an installation service. If you’re not comfortable setting up a new server and installing our software, or if you’d just rather we do it, this is for you. How does it work?
- Create an account on Linode.
- Email us your username.
- We’ll send you a payment link. It currently costs 100 Euro, or 70 Euro if you’ve already purchased Full-Text RSS.
- Once payment is complete, we’ll open a ticket with Linode to transfer a working instance of Full-Text RSS to your account.
- We’ll then send you a URL where you can access it.
- From this point on, the server is managed by you in your Linode account. Linode will charge you $10 a month for running the server.
Full changelog
- Request HTML5 output using HTML5-PHP – new config option $options->html5_output and new request parameter &content=html5
- Improve support for lazy-loading images
- Feed preview now displays RTL content correctly (added dir='auto' to feed.xsl)
- New request parameter &images=0 to remove all images from extracted content
- Open Graph and Twitter card metadata now returned in JSON output (no longer in RSS output)
- Metadata now returned in extract.php even if article extraction fails
- Additional data returned in extract.php for developers: ‘domain’, ‘word_count’
- HTML5-PHP library updated
- SimplePie library updated (fixes PHP 7.1 issue)
- New VPS Puppet script (ubuntu-16.04.pp) – installs PHP 7 and Gumbo PHP extension for faster HTML5 parsing
- Bug fix: Language detection now works correctly with PHP 7
- Bug fix: Take base href URL into account when following next_page/single_page links (thanks Lukas!)
- Bug fix: VPS Puppet script installs new version of PECL HTTP extension that fixes problem when requesting punycode encoded domains
- Site config files updated for better extraction
- Compatibility test file updated (will tell you if Gumbo PHP will be used)
- Tidy won’t be used to repair HTML if using an HTML5 parser (unless explicitly requested in site config file – tidy: yes)
- New config option $options->blocked_message – set what a user will see when a URL is blocked by Full-Text RSS
- Other fixes/improvements
Available to try and buy
Full-Text RSS 3.7 is now available to buy. If you’re an existing customer, you can download the latest version from our member page or upgrade at a discount.