April 2021 Update: Reuters have redesigned their homepage so if you’d like to follow the steps in this guide where we examine the HTML in the browser, you should load an older version via the Internet Archive. Older versions will have the same HTML structure shown in this guide.
Our Feed Creator application can turn a webpage into an RSS feed. It’s useful for sites that don’t offer their own feeds. In the last post we noted that Reuters killed off their official RSS feeds last year, and we provided alternative feeds made with our Feed Creator application.
What’s an RSS feed?
RSS feeds typically contain the most recent items associated with a resource. An RSS feed provided by a news site will contain its most recent news items. RSS feeds conform to a standardised, machine-readable XML-based format, allowing them to be read by and integrated into many different systems.
Feed Creator works by converting a set of items on a webpage into a standard RSS feed. The items don’t have to be news stories: they can be search results, job listings, blog posts, podcast episodes, anything really. At a minimum, a feed item should contain either a title or a description.
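To make the format concrete, here is a minimal sketch in Python of how a list of items could be serialised as RSS 2.0. It is not Feed Creator's code: the function name `build_rss` and the example.com URLs are made up for illustration, and a real feed would normally carry more fields (dates, GUIDs and so on).

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, items):
    """Build a minimal RSS 2.0 document from a list of item dicts (hypothetical helper)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 expects a channel to have a title, a link and a description.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Feed generated from " + channel_link
    for item in items:
        # A feed item should contain at least a title or a description.
        if not item.get("title") and not item.get("description"):
            continue
        node = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            if item.get(field):
                ET.SubElement(node, field).text = item[field]
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_rss(
    "Example feed",
    "https://example.com/",
    [
        {"title": "First story", "link": "https://example.com/first"},
        {"link": "https://example.com/no-title"},  # skipped: no title or description
    ],
)
print(feed_xml)
```

The second item is dropped because it has neither a title nor a description, which mirrors the minimum requirement described above.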
Use the generated RSS feed to monitor the page for new items, and integrate it with other applications and services that read feeds.
Some examples of what you can do once you’ve generated a feed with Feed Creator:
- Expand the feed using our Full-Text RSS service to include the full article content
- Subscribe to it in Feedly to stay up to date with new entries
- Post new items automatically to Facebook, Twitter or LinkedIn via Zapier, IFTTT or Integromat
- Share new items automatically with teammates by email or Slack
- Add new items automatically to a spreadsheet in Excel, Google Sheets, or Airtable
- Receive webhooks when new items are detected with our Feed Control application
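Most of the integrations listed above boil down to polling the feed and reacting to items that were not there last time. A rough sketch of that idea in Python, assuming a hypothetical helper `new_items` and an in-memory set of seen links (a real service would persist this state and might key on GUIDs rather than links):

```python
import xml.etree.ElementTree as ET


def new_items(feed_xml, seen_links):
    """Return items whose link has not been seen before, and remember them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link not in seen_links:
            fresh.append({"title": item.findtext("title", default=""), "link": link})
            seen_links.add(link)
    return fresh


# Two snapshots of the same (made-up) feed, as if fetched a few minutes apart.
FEED_V1 = """<rss version="2.0"><channel><title>Example</title>
<item><title>Story one</title><link>https://example.com/1</link></item>
<item><title>Story two</title><link>https://example.com/2</link></item>
</channel></rss>"""
FEED_V2 = FEED_V1.replace(
    "</channel>",
    "<item><title>Story three</title><link>https://example.com/3</link></item></channel>",
)

seen = set()
first_poll = new_items(FEED_V1, seen)   # everything is new on the first poll
second_poll = new_items(FEED_V2, seen)  # only the item added since then
print([item["title"] for item in second_poll])
```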
Generating a feed from a webpage
In this post we’re going to show you how to create a feed, step by step. We’ll use Reuters as our source page, but the technique can be applied to any site.
Short on time?
If you’d rather have us create a feed for you, please submit a custom feed request.
What you’ll need
- Some basic knowledge of HTML
- The webpage address (URL) of the source page you want to create a feed from
- Our Feed Creator application (we offer a free, hosted service to get started, no signup required)
- Your browser’s developer tools to inspect the source page’s HTML (we’ll use Firefox’s Developer tools in this guide, but Chrome will be very similar)
Step 1: Load the source page and Feed Creator in two separate tabs
We’ll be switching between the source page and Feed Creator in the steps below, so we recommend you open them in two tabs (or have the windows side-by-side).
Tab 1: Reuters home page – reuters.com
Tab 2: Feed Creator – createfeed.fivefilters.org
Step 2: [Source page] Identify the items that should be used in the feed
In this example we’re using the Reuters front page, and the areas we’ve marked in red rectangles contain the items of interest.
Step 3: [Feed Creator] Enter the source page URL
Now switch to the Feed Creator tab and enter the Reuters URL in the field labeled ‘Enter web page URL’.
At this point if you click ‘Preview’, Feed Creator will fetch the page and extract the first set of links it finds. We don’t want these, so let’s instruct Feed Creator to use the links we’re interested in.
Feed Creator lets you use simple selectors or more flexible CSS selectors. We cover the simple mode in this post and in the next post we’ll cover advanced selectors.
Step 4: [Source page] Inspect desired item elements to identify attributes
Now let’s jump back to our source page and examine the HTML markup of our desired elements.
Right-click one of the elements and choose ‘Inspect Element’ in Firefox (‘Inspect’ in Chrome). You’ll now see the item’s underlying HTML markup.
Feed Creator in simple selector mode uses link elements to construct feed items. The link URL becomes the feed item URL, and the link title becomes the item title. In HTML, these are marked up as follows:
<a href="[Link URL]">[Link title]</a>
But web pages typically contain many such links, for example in navigation menus, sidebars and footers. We don’t want all of these links to end up in the feed, so we examine the HTML of our desired items to find an attribute value shared only by those items. Feed Creator can use this attribute value to extract links only from the desired items.
HTML documents often use class attributes to tag elements of the same type. In the screenshot above the class attribute value “story-content” serves that purpose.
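The idea of restricting link extraction to elements that share a class value can be sketched with Python's standard `html.parser` module. This is only an illustration of the technique, not how Feed Creator is implemented; the class `ClassLinkExtractor` and the sample markup are invented for the example, and the depth counting assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassLinkExtractor(HTMLParser):
    """Collect <a> links found inside elements carrying a given class value."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0       # greater than 0 while inside a matching element
        self.current = None  # link currently being collected
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.depth > 0:
            self.depth += 1
        elif self.class_name in classes:
            self.depth = 1
        if tag == "a" and self.depth > 0 and attrs.get("href"):
            self.current = {"url": attrs["href"], "title": ""}

    def handle_data(self, data):
        if self.current is not None:
            self.current["title"] += data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "a" and self.current is not None:
            self.current["title"] = self.current["title"].strip()
            self.links.append(self.current)
            self.current = None
        if self.depth > 0:
            self.depth -= 1


SAMPLE_HTML = """
<nav><a href="/about">About</a></nav>
<div class="story-content"><a href="/article/1"><h3>Story one</h3></a></div>
<div class="story-content featured"><a href="/article/2">Story two</a><img src="x.png"></div>
<footer><a href="/contact">Contact</a></footer>
"""

extractor = ClassLinkExtractor("story-content")
extractor.feed(SAMPLE_HTML)
for link in extractor.links:
    print(link["url"], "-", link["title"])
```

The navigation and footer links are ignored because they sit outside any element with the `story-content` class, while both story links are kept even though one of them carries an extra class.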
Step 5: [Source page] Ensure chosen attribute value is common to all desired elements
We want to make sure that the other news items we want to include in the feed also have this attribute value.
Repeat step 4 (right-click and then ‘Inspect element’) on the other news items, and you’ll see that they have the same “story-content” class attribute value. Great so far.
Does “story-content” also appear on the top story item, which is presented differently to the other items?
It does. We’ve found a class attribute value that’s common to the links we’re interested in, so now let’s give it to Feed Creator.
JavaScript-generated elements
At the moment Feed Creator only works with HTML elements that are returned by the server in its initial response. Some sites rely on JavaScript to construct elements, and sometimes pull in the desired items via additional requests after the page has loaded in your browser. When you inspect elements using your browser’s developer tools, you’re seeing the final result after JavaScript execution. This might not be what Feed Creator sees when it processes the page.
The easiest way to make sure you’re not using attributes that Feed Creator cannot see is to disable JavaScript in your browser temporarily, reload the source page, and then inspect elements using your browser’s developer tools.
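Another way to approximate the same check is to look at the HTML exactly as the server returns it, without running any scripts. The Python sketch below assumes two hypothetical helpers, `fetch_raw_html` and `class_in_html`; the demo runs on inline strings rather than a live site, so `fetch_raw_html` is defined but not called, and some sites may respond differently to a script than to a browser.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ClassFinder(HTMLParser):
    """Record whether any element carries the given class value."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = True


def class_in_html(html, class_name):
    """Return True if class_name appears on some element in the given HTML."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.found


def fetch_raw_html(url):
    """Fetch a page without executing JavaScript (what a server-side tool sees)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Made-up markup: what the server sends vs. what the browser shows after scripts run.
server_html = '<div id="app"></div><script src="/bundle.js"></script>'
rendered_html = '<div id="app"><div class="story-content"><a href="/a">A</a></div></div>'

print(class_in_html(server_html, "story-content"))
print(class_in_html(rendered_html, "story-content"))
```

If the class only shows up in the rendered version, the items are being built by JavaScript and a selector based on that class will not match in the server's response.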
Step 6: [Feed Creator] Extract links from elements with a particular class attribute value
In Feed Creator, find the field labeled “Get links inside HTML elements with this id or class value” and enter “story-content”. Click the Preview button to see the results.
You can see that with only two pieces of input (source page URL and ‘story-content’), Feed Creator is able to produce a usable feed for the Reuters site.
Step 7: Removing elements
Notice, however, that in the image above, two items related to the top story also appear:
- U.S. says had serious talks despite ‘theatrics’
- U.S., China spar over racism at U.N. meeting
We hadn’t marked these as items of interest in Step 2, so let’s tell Feed Creator to exclude them from the feed output.
Using the method described in step 4, inspect the HTML elements for these two items.
You’ll notice that they appear inside the element marked “story-content”, presented as a bulleted list (<ul> element in HTML).
Using Feed Creator’s cleanup feature, we can tell it to remove all <ul> elements from the page, to ensure the links inside these elements aren’t extracted.
To do this, toggle the ‘Enable cleanup’ switch and enter ‘ul’ in the field labeled ‘Source HTML: Remove elements (CSS)’.
That’s it. Click Preview again and you should see those elements now no longer appear in the results list.
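The effect of the cleanup step can be sketched in Python by ignoring everything inside the removed elements while collecting links. As before, this is an illustration under assumptions rather than Feed Creator's implementation: `LinkExtractor` and its `remove_tags` parameter are hypothetical, the sketch matches plain tag names only (the real cleanup field accepts CSS selectors), and it assumes well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LinkExtractor(HTMLParser):
    """Collect links inside elements with a class, skipping 'removed' tags."""

    def __init__(self, class_name, remove_tags=()):
        super().__init__()
        self.class_name = class_name
        self.remove_tags = set(remove_tags)
        self.target_depth = 0   # greater than 0 inside a matching element
        self.removed_depth = 0  # greater than 0 inside a removed element
        self.current = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.removed_depth > 0 or tag in self.remove_tags:
            self.removed_depth += 1  # ignore this element and its children
            return
        classes = (attrs.get("class") or "").split()
        if self.target_depth > 0 or self.class_name in classes:
            self.target_depth += 1
        if tag == "a" and self.target_depth > 0 and attrs.get("href"):
            self.current = {"url": attrs["href"], "title": ""}

    def handle_data(self, data):
        if self.current is not None and self.removed_depth == 0:
            self.current["title"] += data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.removed_depth > 0:
            self.removed_depth -= 1
            return
        if tag == "a" and self.current is not None:
            self.current["title"] = self.current["title"].strip()
            self.links.append(self.current)
            self.current = None
        if self.target_depth > 0:
            self.target_depth -= 1


def extract(html, class_name, remove_tags=()):
    parser = LinkExtractor(class_name, remove_tags)
    parser.feed(html)
    return [link["url"] for link in parser.links]


SAMPLE_HTML = """
<div class="story-content">
  <a href="/top">Top story</a>
  <ul>
    <li><a href="/related-1">Related one</a></li>
    <li><a href="/related-2">Related two</a></li>
  </ul>
</div>
<div class="story-content"><a href="/second">Second story</a></div>
"""

without_cleanup = extract(SAMPLE_HTML, "story-content")
with_cleanup = extract(SAMPLE_HTML, "story-content", remove_tags=("ul",))
print(without_cleanup)
print(with_cleanup)
```

Without cleanup the related links inside the list are picked up alongside the stories; with `ul` removed, only the two story links remain.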
Done!
You can now use the buttons Feed Creator provides in the Result column to use your generated RSS feed in other applications.
The RSS feed button will load the feed in your browser or prompt you to open it in a supporting application (if you have one installed). You can copy the generated feed URL by right-clicking this button and choosing ‘Copy link location’.
The Subscribe button will open a panel with a list of feed readers. If you see one you use, click its name and we’ll pass the generated feed into the feed reader so you can subscribe to it and be notified of new items.
The Service shortcuts button opens a panel with shortcuts to some of our other applications that can take a feed as input. You can choose ‘RSS with full text’, for example, to have the generated feed passed to our Full-Text RSS application which will expand the feed by pulling in the article content for each item.
That’s it for now. To recap, we used Feed Creator to turn a webpage into an RSS feed by extracting elements from the source page (Reuters in this example).
In Part 2 we cover advanced selectors, where you’ll see how we can be much more specific in selecting items from the source page. We’ll also cover how to include item dates, summaries and images in the feed output.
Discuss
Please share any feedback on our forum.