Automatic Classification

Automatic Classifier IPTC

Automatic classification of documents using the standard taxonomy of the IPTC (International Press Telecommunications Council, the global standards body of the news media). Classifiers based on custom taxonomies (patents, cyber security, military intelligence or others) can be created on demand.

Taxonomy

The out-of-the-box IPTC classifier is trained to assign documents to the following 17 top-level IPTC classes.

ID | Label | Description | IPTC Link
01000000 | arts, culture and entertainment | Matters pertaining to the advancement and refinement of the human mind, of interests, skills, tastes and emotions. | http://cv.iptc.org/newscodes/subjectcode/01000000
02000000 | crime, law and justice | Establishment and/or statement of the rules of behaviour in society, the enforcement of these rules, breaches of the rules and the punishment of offenders. Organizations and bodies involved in these activities. | http://cv.iptc.org/newscodes/subjectcode/02000000
03000000 | disaster and accident | Man made and natural events resulting in loss of life or injury to living creatures and/or damage to inanimate objects or property. | http://cv.iptc.org/newscodes/subjectcode/03000000
04000000 | economy, business and finance | All matters concerning the planning, production and exchange of wealth. | http://cv.iptc.org/newscodes/subjectcode/04000000
05000000 | education | All aspects of furthering knowledge of human individuals from birth to death. | http://cv.iptc.org/newscodes/subjectcode/05000000
06000000 | environmental issue | All aspects of protection, damage, and condition of the ecosystem of the planet earth and its surroundings. | http://cv.iptc.org/newscodes/subjectcode/06000000
07000000 | health | All aspects pertaining to the physical and mental welfare of human beings. | http://cv.iptc.org/newscodes/subjectcode/07000000
08000000 | human interest | Lighter items about individuals, groups, animals or objects. | http://cv.iptc.org/newscodes/subjectcode/08000000
09000000 | labour | Social aspects, organizations, rules and conditions affecting the employment of human effort for the generation of wealth or provision of services and the economic support of the unemployed. | http://cv.iptc.org/newscodes/subjectcode/09000000
10000000 | lifestyle and leisure | Activities undertaken for pleasure, relaxation or recreation outside paid employment, including eating and travel. | http://cv.iptc.org/newscodes/subjectcode/10000000
11000000 | politics | Local, regional, national and international exercise of power, or struggle for power, and the relationships between governing bodies and states. | http://cv.iptc.org/newscodes/subjectcode/11000000
12000000 | religion and belief | All aspects of human existence involving theology, philosophy, ethics and spirituality. | http://cv.iptc.org/newscodes/subjectcode/12000000
13000000 | science and technology | All aspects pertaining to human understanding of nature and the physical world and the development and application of this knowledge. | http://cv.iptc.org/newscodes/subjectcode/13000000
14000000 | social issue | Aspects of the behaviour of humans affecting the quality of life. | http://cv.iptc.org/newscodes/subjectcode/14000000
15000000 | sport | Competitive exercise involving physical effort. Organizations and bodies involved in these activities. | http://cv.iptc.org/newscodes/subjectcode/15000000
16000000 | unrest, conflicts and war | Acts of socially or politically motivated protest and/or violence. | http://cv.iptc.org/newscodes/subjectcode/16000000
17000000 | weather | The study, reporting and prediction of meteorological phenomena. | http://cv.iptc.org/newscodes/subjectcode/17000000
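
Each IPTC link in the list above is the same base URL followed by the eight-digit class ID. Since the sample response in the usage example reports a label rather than an ID, a small lookup from label to ID and link can be useful on the client side. The following is a minimal sketch; the names IPTC_BASE_URL, IPTC_TOP_LEVEL and iptc_link are illustrative and not part of the classifier itself.

```python
# Hypothetical client-side helper: these names are illustrative only
# and are not provided by the classifier.
IPTC_BASE_URL = "http://cv.iptc.org/newscodes/subjectcode/"

# Top-level IPTC classes from the taxonomy above, keyed by label.
IPTC_TOP_LEVEL = {
    "arts, culture and entertainment": "01000000",
    "crime, law and justice": "02000000",
    "disaster and accident": "03000000",
    "economy, business and finance": "04000000",
    "education": "05000000",
    "environmental issue": "06000000",
    "health": "07000000",
    "human interest": "08000000",
    "labour": "09000000",
    "lifestyle and leisure": "10000000",
    "politics": "11000000",
    "religion and belief": "12000000",
    "science and technology": "13000000",
    "social issue": "14000000",
    "sport": "15000000",
    "unrest, conflicts and war": "16000000",
    "weather": "17000000",
}


def iptc_link(label: str) -> str:
    """Return the IPTC subject-code URL for a top-level label."""
    return IPTC_BASE_URL + IPTC_TOP_LEVEL[label]


print(iptc_link("sport"))
# prints http://cv.iptc.org/newscodes/subjectcode/15000000
```

An unknown label raises a KeyError here, which is deliberate in this sketch: a custom taxonomy would need its own mapping.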

Supported Languages

The classifier currently supports the following 16 languages: Arabic, Chinese, English, Farsi, French, German, Hebrew, Hungarian, Italian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian.

Usage Example

This is an example of calling the IPTC Classifier on an English text using curl:

curl -X POST "http://localhost:8989/rest/process" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d "{ \"content\": \"Simona Halep qualified for the Australian Open quarterfinals on Sunday, February 14, after a three-set win against Iga Swiatek. In the quarterfinal the Romanian will play against Serena Williams. It will be their first meeting since Halep defeated Williams in the 2019 Wimbledon final.\", \"language\": \"eng\"}"

Calling the Automatic Classifier IPTC as above returns the following simple JSON response:

{
  "categories": [
    {
      "label": "sport",
      "score": 0.9999985694885254
    }
  ]
}
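
The same call can be made from a script. The sketch below uses only the Python standard library and assumes the endpoint, headers and request fields shown in the curl example, and the response shape shown above; the function names build_request, classify and top_category are illustrative, not part of the product.

```python
import json
import urllib.request

# Assumption: the same local endpoint as in the curl example.
ENDPOINT = "http://localhost:8989/rest/process"


def build_request(text: str, language: str = "eng") -> urllib.request.Request:
    """Build a POST request with the same headers and body as the curl call."""
    body = json.dumps({"content": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def classify(text: str, language: str = "eng") -> dict:
    """Send the text to the classifier and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, language)) as resp:
        return json.load(resp)


def top_category(response: dict):
    """Return (label, score) of the highest-scoring category, or None if empty."""
    categories = response.get("categories", [])
    if not categories:
        return None
    best = max(categories, key=lambda c: c["score"])
    return best["label"], best["score"]


# Parsing the sample response shown above; this part needs no running server.
sample = json.loads(
    '{"categories": [{"label": "sport", "score": 0.9999985694885254}]}'
)
print(top_category(sample))
```

With a classifier running locally, top_category(classify("some text")) would combine the two steps. Whether a response can contain more than one category is not stated here, so top_category simply picks the highest score if several are present.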
