I have a dataset of (token, part-of-speech tag, OBI label) triples. What are the options for training a multiclass neural-network classifier for OBI labeling (Named Entity Recognition)? It's a sequence labeling task, so context must be taken into account. Is there a faster alternative to BERT in common practice?
# OBI Format
DATA = [
    [('I', 'PRP', 'O'), ('had', 'VBD', 'O'), ('pills', 'NNS', 'O')],
    [('Microsoft', 'NNP', 'B-COMPANY'), ('and', 'CC', 'O'), ('BillGates', 'NNP', 'B-PERSON')],
    [('Cortisone', 'NNP', 'B-DRUG'), ('shot', 'NN', 'I-DRUG'), ('hurts', 'NNS', 'O')],
]
BTW, what if the data is in HTML format?
# HTML Format
I had pills. <COMPANY>Microsoft</COMPANY> and <PERSON>BillGates</PERSON>. <DRUG>Cortisone shot</DRUG> hurts.
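To make the relation between the two formats concrete, here is a minimal sketch of how the tagged text might be turned into (token, OBI label) pairs. It assumes entity tags are never nested, that a naive regex tokenizer is good enough, and that punctuation is kept as separate `O` tokens (the sample data above leaves punctuation out). The function name `tagged_text_to_obi` is made up for illustration, and POS tags are not produced here; they would have to come from a separate tagger.

```python
import re

# Matches either an inline entity span such as <DRUG>Cortisone shot</DRUG>
# or a run of untagged text up to the next "<". Assumes tags are not nested.
SPAN_RE = re.compile(r"<(?P<tag>\w+)>(?P<entity>.*?)</(?P=tag)>|(?P<plain>[^<]+)")

# Words and standalone punctuation marks (a naive stand-in for a real tokenizer).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tagged_text_to_obi(text):
    """Convert inline-tagged text to a flat list of (token, label) pairs."""
    pairs = []
    for match in SPAN_RE.finditer(text):
        tag = match.group("tag")
        if tag:
            # First token of an entity gets B-<TAG>, the rest get I-<TAG>.
            tokens = TOKEN_RE.findall(match.group("entity"))
            for i, token in enumerate(tokens):
                prefix = "B" if i == 0 else "I"
                pairs.append((token, f"{prefix}-{tag}"))
        else:
            # Everything outside a tag is labeled O.
            pairs.extend((token, "O") for token in TOKEN_RE.findall(match.group("plain")))
    return pairs


text = (
    "I had pills. <COMPANY>Microsoft</COMPANY> and <PERSON>BillGates</PERSON>. "
    "<DRUG>Cortisone shot</DRUG> hurts."
)
for token, label in tagged_text_to_obi(text):
    print(token, label)
```

The output is one flat sequence; splitting it into sentences (for example at the `.` tokens) would give the nested list layout of the first format.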