Automatic Classification

Automatic Classifier IPTC

The Automatic Classifier IPTC classifies documents using the standard taxonomy of the IPTC (International Press Telecommunications Council, the global standards body of the news media). Classifiers based on custom taxonomies (patents, cyber security, military intelligence, or others) can be created on demand.


The out-of-the-box IPTC classifier is trained to assign documents to the following top-level IPTC classes:

ID | Label | IPTC Link
01000000 | arts, culture and entertainment | Matters pertaining to the advancement and refinement of the human mind, of interests, skills, tastes and emotions.
02000000 | crime, law and justice | Establishment and/or statement of the rules of behaviour in society, the enforcement of these rules, breaches of the rules and the punishment of offenders. Organizations and bodies involved in these activities.
03000000 | disaster and accident | Man made and natural events resulting in loss of life or injury to living creatures and/or damage to inanimate objects or property.
04000000 | economy, business and finance | All matters concerning the planning, production and exchange of wealth.
05000000 | education | All aspects of furthering knowledge of human individuals from birth to death.
06000000 | environmental issue | All aspects of protection, damage, and condition of the ecosystem of the planet earth and its surroundings.
07000000 | health | All aspects pertaining to the physical and mental welfare of human beings.
08000000 | human interest | Lighter items about individuals, groups, animals or objects.
09000000 | labour | Social aspects, organizations, rules and conditions affecting the employment of human effort for the generation of wealth or provision of services and the economic support of the unemployed.
10000000 | lifestyle and leisure | Activities undertaken for pleasure, relaxation or recreation outside paid employment, including eating and travel.
11000000 | politics | Local, regional, national and international exercise of power, or struggle for power, and the relationships between governing bodies and states.
12000000 | religion and belief | All aspects of human existence involving theology, philosophy, ethics and spirituality.
13000000 | science and technology | All aspects pertaining to human understanding of nature and the physical world and the development and application of this knowledge.
14000000 | social issue | Aspects of the behaviour of humans affecting the quality of life.
15000000 | sport | Competitive exercise involving physical effort. Organizations and bodies involved in these activities.
16000000 | unrest, conflicts and war | Acts of socially or politically motivated protest and/or violence.
17000000 | weather | The study, reporting and prediction of meteorological phenomena.
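
When post-processing classifier output, it can be handy to keep the top-level classes above in a lookup table, for example to turn a returned label back into its IPTC ID. The following is a minimal Python sketch; the names IPTC_TOP_LEVEL and iptc_id_for_label are illustrative and not part of the product.

```python
from typing import Optional

# Top-level IPTC classes, copied from the table above (ID -> label).
IPTC_TOP_LEVEL = {
    "01000000": "arts, culture and entertainment",
    "02000000": "crime, law and justice",
    "03000000": "disaster and accident",
    "04000000": "economy, business and finance",
    "05000000": "education",
    "06000000": "environmental issue",
    "07000000": "health",
    "08000000": "human interest",
    "09000000": "labour",
    "10000000": "lifestyle and leisure",
    "11000000": "politics",
    "12000000": "religion and belief",
    "13000000": "science and technology",
    "14000000": "social issue",
    "15000000": "sport",
    "16000000": "unrest, conflicts and war",
    "17000000": "weather",
}

# Reverse index (label -> ID); a hypothetical helper structure.
_LABEL_TO_ID = {label: code for code, label in IPTC_TOP_LEVEL.items()}


def iptc_id_for_label(label: str) -> Optional[str]:
    """Return the 8-digit IPTC ID for a top-level label, or None if unknown."""
    return _LABEL_TO_ID.get(label.strip().lower())
```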

Supported Languages

The classifier currently supports the following 16 languages: Arabic, Chinese, English, Farsi, French, German, Hebrew, Hungarian, Italian, Polish, Portuguese, Romanian, Russian, Spanish, Ukrainian, Turkish.
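
A client may want to validate the language before sending a request. The sketch below maps the languages listed above to three-letter ISO 639-3 codes. Only "eng" is confirmed by the usage example in this post; that the service expects ISO 639-3 codes for the other languages is an assumption, and the names LANGUAGE_CODES and language_code are hypothetical.

```python
# Language name -> ISO 639-3 code.
# Only "eng" appears in the usage example; whether the service accepts
# the other codes in this form is an assumption.
LANGUAGE_CODES = {
    "Arabic": "ara",
    "Chinese": "zho",
    "English": "eng",
    "Farsi": "fas",
    "French": "fra",
    "German": "deu",
    "Hebrew": "heb",
    "Hungarian": "hun",
    "Italian": "ita",
    "Polish": "pol",
    "Portuguese": "por",
    "Romanian": "ron",
    "Russian": "rus",
    "Spanish": "spa",
    "Ukrainian": "ukr",
    "Turkish": "tur",
}


def language_code(name: str) -> str:
    """Look up the code for a listed language; raise ValueError otherwise."""
    try:
        return LANGUAGE_CODES[name.strip().title()]
    except KeyError:
        raise ValueError(f"unsupported language: {name!r}") from None
```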

Usage Example

This is an example of calling the IPTC Classifier on an English text using curl:

curl -X POST "http://localhost:8989/rest/process" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"content\": \"Simona Halep qualified for the Australian Open quarterfinals on Sunday, February 14, after a three-set win against Iga Swiatek. In the quarterfinals the Romanian will play against Serena Williams. It will be their first meeting since Halep defeated Williams in the 2019 Wimbledon final.\", \"language\": \"eng\"}"
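
The same request can be issued from a script. Below is a minimal Python sketch using only the standard library; the endpoint, headers, and the "content" and "language" fields are taken from the curl call above, while the function names build_request and classify and the timeout value are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port for your deployment.
ENDPOINT = "http://localhost:8989/rest/process"


def build_request(content: str, language: str = "eng") -> urllib.request.Request:
    """Build a POST request with the same headers and JSON body as the curl call."""
    body = json.dumps({"content": content, "language": language}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def classify(content: str, language: str = "eng", timeout: float = 30.0) -> dict:
    """Send the text to the classifier and return the decoded JSON response."""
    request = build_request(content, language)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the body with json.dumps avoids the manual quote escaping that the shell command needs. Calling classify("some text") requires a running service at the endpoint.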

Calling the Automatic Classifier IPTC as above will generate the simple JSON response below:

{
  "categories": [
    {
      "label": "sport",
      "score": 0.9999985694885254
    }
  ]
}
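
Once the response is decoded, a caller typically wants the best-scoring class. The sketch below assumes the response is a JSON object with a "categories" list whose entries each carry a "label" and a "score", as in the example above; the helper name top_category and the 0.5 default threshold are hypothetical choices, not documented behaviour.

```python
from typing import Optional, Tuple


def top_category(response: dict, min_score: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (label, score) of the best category, or None if none reaches min_score."""
    categories = response.get("categories", [])
    if not categories:
        return None
    # Pick the entry with the highest score.
    best = max(categories, key=lambda c: c["score"])
    if best["score"] < min_score:
        return None
    return best["label"], best["score"]


# A response shaped like the example above.
sample = {"categories": [{"label": "sport", "score": 0.9999985694885254}]}
print(top_category(sample))
```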
