InfoChem offers additional modules for ICANNOTATOR, enabling extraction of different entities in multiple languages.


We offer two language extension packs. The first supports German, French and Russian texts; the second supports Chinese, Japanese and Korean texts. Like the core module, these modules combine an algorithmic approach with a dictionary approach. Native speakers have checked the quality of these language packs, which achieved F-scores between 0.8 and 0.9.
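As a reminder of what an F-score in that range means, the sketch below computes it from entity-level counts. It assumes the balanced F1 variant (the text does not say which variant was used), and the function name and the counts are illustrative only, not InfoChem's evaluation data.

```python
def f_score(true_positives: int, false_positives: int,
            false_negatives: int, beta: float = 1.0) -> float:
    """Return the F-beta score for entity extraction counts.

    beta = 1.0 gives the balanced F1 score (an assumption here;
    the source only says "F-score").
    """
    if true_positives == 0:
        # No correct extractions: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 85 correct entities, 15 spurious, 15 missed.
score = f_score(true_positives=85, false_positives=15, false_negatives=15)
print(round(score, 2))
```

With these made-up counts, precision and recall are both 0.85, so the score falls inside the 0.8 to 0.9 range quoted above.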


With this extension pack, you can extract inorganics, metal organics (formulas and names) and polymers. It is often not possible to derive a single well-defined structure for these substances. In these cases we offer alternative output formats, e.g. the structures of the monomers for polymers, and molecular formulas or element systems for inorganic substances.


This module extracts gene, protein and disease names. It uses large dictionaries that are mainly sourced from GenBank, UniProt and MeSH. For ambiguous protein names, we use a machine learning approach to detect false positives.