Term Extraction
About
This is a free software project to enable easy term extraction through a web service. Given some text, it returns a list of terms, with (hopefully) the most relevant first. The list is returned in JSON format. It is a free alternative to Yahoo's Term Extraction service. It is being developed as part of the Five Filters project to promote alternative, non-corporate media.
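As a rough illustration of consuming the service from a client, the sketch below parses a response body into a Python list. It assumes the body is a plain JSON array of term strings ordered most relevant first; the exact response shape, the sample body and the helper name parse_terms are assumptions for illustration, not taken from the project.

```python
import json


def parse_terms(response_body, limit=None):
    """Decode a JSON list of terms and optionally keep only the first few.

    Assumes the service returns a JSON array with the most relevant
    terms first, so truncating keeps the best candidates.
    """
    terms = json.loads(response_body)
    if not isinstance(terms, list):
        raise ValueError("expected a JSON list of terms")
    return terms if limit is None else terms[:limit]


# Hypothetical response body; a real one would come from an HTTP request.
body = '["term extraction", "web service", "JSON"]'
print(parse_terms(body, limit=2))
```

Because the list is already ordered by relevance, taking a prefix is enough to get the top terms without any re-sorting on the client.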
Similar Software
Free software for term extraction:
- Maui — try online (Java, GPL)
- Kea (Java, GPL)
- Topia's Term Extraction (Python, ZPL)
- XML-RPC service using Kea (Java, New BSD)
Non-free web services for term extraction:
Olena Medelyan has more information and resources on her topic indexing blog. She is involved with the Maui and Kea projects. Joseph Turian gives a background to term extraction and links to related tools and research.
Source Code and Technologies
Source code available.
The application uses Python, Topia's Term Extraction and simplejson.
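To give a feel for how these pieces could fit together, here is a minimal sketch of the step between extraction and output: ranking extracted terms and serialising them as a JSON list. It is not the project's actual code. It assumes the extractor yields (term, occurrences, strength) tuples, which is how Topia's extractor is understood to report results, and it uses the standard library json module in place of simplejson. The ranking rule, the helper names and the sample data are all hypothetical.

```python
import json


def rank_terms(extracted):
    """Order (term, occurrences, strength) tuples, most relevant first.

    The relevance rule is a stand-in: more occurrences first, then
    higher strength, then alphabetical order for a stable result.
    """
    ordered = sorted(extracted, key=lambda t: (-t[1], -t[2], t[0]))
    return [term for term, _occurrences, _strength in ordered]


def terms_to_json(extracted):
    """Serialise the ranked terms as a JSON array of strings."""
    return json.dumps(rank_terms(extracted))


# Hypothetical extractor output, not produced by a real run.
sample = [
    ("media", 1, 1),
    ("term extraction", 3, 2),
    ("web service", 2, 2),
    ("Five Filters", 2, 2),
]
print(terms_to_json(sample))
```

Keeping the ranking separate from the serialisation makes it easy to swap in a different relevance rule without touching the output format.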
Installation and System Requirements
This code should run on most hosts with Python support. You should be able to test it on your own machine using web.py. I had it running on NearlyFreeSpeech.NET, but they only offer Python access through CGI, which is quite slow. The version here is running on Google App Engine.
I'm not a Python expert, so the instructions below only show you how to download the files. Inside the term-extraction directory you'll find two sub-directories: web.py and google_app_engine. If you'd like to test on your own machine, try installing web.py and running python code.py. If you'd like to host the code on Google App Engine, use the code inside google_app_engine. To download the code, make sure you have the Bazaar client on your computer.
- Change to the directory where you want to place the application files
- Enter bzr co http://bazaar.launchpad.net/~keyvan/fivefilters/term-extraction term-extraction
- Look inside term-extraction and, depending on where you want to host the files, use the files in either web.py or google_app_engine
License

This web application is licensed under the AGPL version 3 (find out why). The bulk of the work, however, is carried out by libraries which are licensed as follows...
Author
Created by Keyvan Minoukadeh for the Five Filters project.
Email: fivefilters (at) fivefilters.org