Journal of Data and Information Science, 2022, Vol. 7, Issue (3): 49-70. doi: 10.2478/jdis-2022-0015
• Research Paper •
Roberto Henriques, Adria Ferreira, Mauro Castelli
Received: 2022-03-12
Revised: 2022-06-09
Accepted: 2022-07-04
Online: 2022-07-20
Published: 2022-07-29
Contact: Roberto Henriques
E-mail: roberto@novaims.unl.pt
URL: http://manu47.magtech.com.cn/Jwk3_jdis/EN/10.2478/jdis-2022-0015
http://manu47.magtech.com.cn/Jwk3_jdis/EN/Y2022/V7/I3/49
This work is licensed under the Creative Commons Attribution 4.0 International License.
Table 3.
Patent classification related studies.
Authors | Feature engineering | Algorithm | Document section | Language | Dataset size | Number of classes |
---|---|---|---|---|---|---|
(Trappey et al., 2006) | Key-phrase frequency based on TF-IDF | Neural network | full document | English | 300 training, 124 test | 9 |
(Derieux et al., 2010) | Term extraction and semantic relations | SVM | full document | English, German, French | 985 training, 2,000 test | 630 |
(Trappey et al., 2013) | Key-phrase frequency based on TF-IDF | Ontology-based neural network | full document | English | 333 training, 160 test | 23 |
(Zhang, 2014) | - | SVM | - | English | 5,000 | 5 |
(Wu et al., 2016) | SOM, KPCA | SVM | full document | English | 60,000 | 7 |
(Li et al., 2018) | Skip-gram | CNN | title and abstract | English | 742,097 training, 1,350 test | 637 |
(Risch & Krestel, 2019) | Domain-specific FastText word embeddings | Bi-directional GRU | title and abstract | English | ~1.7M training, ~300,000 test | 637 |
(Abdelgawad et al., 2020) | GloVe, Word2Vec, FastText | Hierarchical SVM and CNN with BOHB (Bayesian Optimization and HyperBand) | title, abstract, description, and claims | English | 75,000 training, 28,926 test | 451 |
(Lee & Hsiang, 2020) | - | BERT-Base | claims | English | 1,950,247 training, 150,000 test | 632 |
Table 4.
Features used in the analysis.
Feature | Description |
---|---|
id | Patent internal identification |
Title | Descriptive name of the patent |
Claims | The legal scope of the invention, including delimitations and application field |
Abstract | A brief description of the invention presented in the patent |
Section | IPC 1st level classification code |
Class | IPC 2nd level classification code |
Subclass | IPC 3rd level classification code |
Main group | IPC 4th level classification code |
Subgroup | IPC 5th level classification code |
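The five IPC levels in Table 4 are nested: each level extends the code of the level above it. As a minimal sketch of that hierarchy, the snippet below splits a full IPC symbol into the five labels. It assumes the standard IPC symbol layout (one section letter, two class digits, one subclass letter, then main group and subgroup separated by a slash). The helper name `split_ipc`, the `IpcLevels` container, and the label format chosen for the main group are illustrative assumptions; they are not taken from the dataset described here.

```python
import re
from typing import NamedTuple

# Assumed layout of a full IPC symbol, e.g. "H04L 9/32":
# section letter, two class digits, subclass letter, main group "/" subgroup.
_IPC_PATTERN = re.compile(
    r"^(?P<section>[A-H])(?P<cls>\d{2})(?P<subclass>[A-Z])"
    r"\s*(?P<group>\d{1,4})/(?P<subgroup>\d{2,6})$"
)


class IpcLevels(NamedTuple):
    """Hypothetical container mirroring the five label columns of Table 4."""
    section: str
    ipc_class: str
    subclass: str
    main_group: str
    subgroup: str


def split_ipc(symbol: str) -> IpcLevels:
    """Split a full IPC symbol into cumulative labels, one per level."""
    match = _IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section = match["section"]
    ipc_class = section + match["cls"]
    subclass = ipc_class + match["subclass"]
    # Label format for the main group is an illustrative choice.
    main_group = f"{subclass} {match['group']}"
    subgroup = f"{main_group}/{match['subgroup']}"
    return IpcLevels(section, ipc_class, subclass, main_group, subgroup)


levels = split_ipc("H04L 9/32")
print(levels)
```

Because every label contains its parent as a prefix, a classifier trained at one level can be evaluated at any coarser level by truncating the predicted label.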
[1] | Abdelgawad L., Kluegl P., Genc E., Falkner S., & Hutter F. (2020). Optimizing Neural Networks for Patent Classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). volume 11908 LNAI. doi:10.1007/978-3-030-46133-1_41. |
[2] | Aristodemou L., & Tietze F. (2018). The state-of-the-art on Intellectual Property Analytics (IPA): A literature review on artificial intelligence, machine learning and deep learning methods for analysing intellectual property (IP) data. World Patent Information, 55, 37-51. doi:10.1016/J.WPI.2018.07.002. |
[3] | Bispo T.D., Macedo H.T., Santos F.D.O., Da Silva R.P., Matos L.N., Prado B.O., Da Silva G.J., & Guimarães A. (2019). Long short-term memory model for classification of English-PtBR cross-lingual hate speech. Journal of Computer Science, 15. doi:10.3844/jcssp.2019.1546.1571. |
[4] | Quinta de Castro P.V., Félix Felipe da Silva N., & da Silva Soares A. (2018). Portuguese Named Entity Recognition Using LSTM-CRF. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). volume 11122 LNAI. doi:10.1007/978-3-319-99722-3_9. |
[5] | De Castro P.V.Q., Da Silva N.F.F., & Da Silva Soares A. (2019). Contextual representations and semi-supervised named entity recognition for Portuguese language. In CEUR Workshop Proceedings. volume 2421. |
[6] | Derieux F., Bobeica M., Pois D., & Raysz J.P. (2010). Combining semantics and statistics for patent classification. In CEUR Workshop Proceedings. volume 1176. |
[7] | Devlin J., Chang M.W., Lee K., & Toutanova K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL HLT 2019: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference. volume 1. |
[8] | Espacenet (2021). Espacenet Patent search. URL: https://lp.espacenet.com/?locale=pt_LP. |
[9] | Feldman R., & Sanger J. (2006). The Text Mining Handbook. doi:10.1017/cbo9780511546914. |
[10] | Gomez J.C., & Moens M.F. (2014). A survey of automated hierarchical classification of patents. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8830. doi:10.1007/978-3-319-12511-4. |
[11] | Gonçalves T., Silva C., Quaresma P., & Vieira R. (2006). Analysing part-of-speech for Portuguese text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). volume 3878 LNCS. |
[12] | Hu J., Li S.B., Hu J.J., & Yang G.C. (2018). A hierarchical feature extraction model for multi-label mechanical patent classification. Sustainability (Switzerland), 10. doi:10.3390/su10010219. |
[13] | Instituto Nacional da Propriedade Intelectual (2018). Código da Propriedade Industrial. URL: https://inpi.justica.gov.pt/Portals/6/PDF%20INPI/Legisla%C3%A7%C3%A3o%20e%20outros%20documentos/CPI%20-%202018.pdf?ver=2019-06-28-153157-733. |
[14] | IP5 (2019). IP5 Statistics Report 2018 Edition. URL: https://www.fiveipoffices.org/statistics/statisticsreports/2019edition. |
[15] | Kowsari K., Meimandi K.J., Heidarysafa M., Mendu S., Barnes L., & Brown D. (2019). Text classification algorithms: A survey. doi:10.3390/info10040150. |
[16] | Krestel R., Chikkamath R., Hewel C., & Risch J. (2021). A survey on deep learning for patent analysis. World Patent Information, 65, 102035. |
[17] | Lai K., & Wu S.J. (2005). Using the patent co-citation approach to establish a new patent classification system. Information Processing and Management, 41(2), 313-330. |
[18] | Lee J.S., & Hsiang J. (2020). Patent classification by fine-tuning BERT language model. World Patent Information, 61. doi:10.1016/j.wpi.2020.101965. |
[19] | Li S.B., Hu J., Cui Y.X., & Hu J.J. (2018). DeepPatent: patent classification with convolutional neural networks and word embedding. Scientometrics, 117. doi:10.1007/s11192-018-2905-5. |
[20] | Liddy E.D. (2001). Natural Language Processing. In Encyclopedia of Library and Information Science. |
[21] | Manning C.D., Raghavan P., & Schütze H. (2008). Introduction to Information Retrieval. doi:10.1017/cbo9780511809071. |
[22] | Pan S.J., & Yang Q. (2010). A survey on transfer learning. doi:10.1109/TKDE.2009.191. |
[23] | Peters M.E., Neumann M., Iyyer M., Gardner M., Clark C., Lee K., & Zettlemoyer L. (2018). Deep contextualized word representations. In NAACL HLT 2018—2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies—Proceedings of the Conference. volume 1. doi:10.18653/v1/n18-1202. |
[24] | Risch J., & Krestel R. (2019). Domain-specific word embeddings for patent classification. Data Technologies and Applications, 53. doi:10.1108/DTA-01-2019-0002. |
[25] | Rodrigues R.C., Rodrigues J., de Castro P.V.Q., da Silva N.F.F., & Soares A. (2020). Portuguese language models and word embeddings: Evaluating on semantic similarity tasks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). volume 12037 LNAI. doi:10.1007/978-3-030-41505-1_23. |
[26] | dos Santos C., & Guimarães V. (2015). Boosting Named Entity Recognition with Neural Character Embeddings. doi:10.18653/v1/w15-3904. |
[27] | Silva C., & Ribeiro B. (2010). Inductive Inference for Large Scale Text Classification: Kernel Approaches and Techniques. volume 255. doi:10.1007/978-3-642-04533-2. |
[28] | Souza F., Nogueira R., & Lotufo R. (2019). Portuguese Named Entity Recognition using BERT-CRF. arXiv. URL: https://arxiv.org/abs/1909.10649v2. |
[29] | Trappey A.J., Hsu F.C., Trappey C.V., & Lin C.I. (2006). Development of a patent document classification and search platform using a back-propagation network. Expert Systems with Applications, 31. doi:10.1016/j.eswa.2006.01.013. |
[30] | Trappey A.J., Trappey C.V., Chiang T.A., & Huang Y.H. (2013). Ontology-based neural network for patent knowledge management in design collaboration. International Journal of Production Research, 51. doi:10.1080/00207543.2012.701775. |
[31] | Trappey A.J.C., Trappey C.V., Wu C.-Y., & Lin C.-W. (2012). A patent quality analysis for innovative technology and product development. Advanced Engineering Informatics, 26, 26-34. doi:10.1016/j.aei.2011.06.005. |
[32] | Wagner Filho J.A., Wilkens R., Idiart M., & Villavicencio A. (2019). The BRWAC corpus: A new open resource for Brazilian Portuguese. In LREC 2018—11th International Conference on Language Resources and Evaluation. |
[33] | World Intellectual Property Organization (2008). WIPO Intellectual Property Handbook: Policy, Law and Use. https://www.wipo.int/edocs/pubdocs/en/intproperty/489/wipo_pub_489.pdf. |
[34] | Wu J.L., Chang P.C., Tsao C.C., & Fan C.Y. (2016). A patent quality analysis and classification system using self-organizing maps with support vector machine. Applied Soft Computing Journal, 41. doi:10.1016/j.asoc.2016.01.020. |
[35] | Zhang X.Y. (2014). Interactive patent classification based on multi-classifier fusion and active learning. Neurocomputing, 127. doi:10.1016/j.neucom.2013.08.013. |
[36] | Zhuang F.Z., Qi Z.Y., Duan K.Y., Xi D.B., Zhu Y.C., Zhu H.S., Xiong H., & He Q. (2021). A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1). doi:10.1109/JPROC.2020.3004555. |