Update README.md
README.md
@@ -28,7 +28,7 @@ OneKE is a new bilingual knowledge extraction large model developed jointly by Z
 
 
 ## How is OneKE trained?
-OneKE mainly focuses on schema-generalizable information extraction. Due to issues such as non-standard formats, noisy data, and lack of diversity in existing extraction instruction data, OneKE adopted techniques such as normalization and cleaning of extraction instructions, difficult negative sample collection, and schema-based
+OneKE mainly focuses on schema-generalizable information extraction. Due to issues such as non-standard formats, noisy data, and a lack of diversity in existing extraction instruction data, OneKE adopts techniques such as normalization and cleaning of extraction instructions, hard negative sample collection, and schema-based batched instruction construction, as shown in the illustration. For more details, refer to the paper "[IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus](https://arxiv.org/abs/2402.14710) [[Github](https://github.com/zjunlp/IEPile)]".
 
 The zero-shot generalization comparison results of OneKE with other large models are as follows:
 * `NER-en`: CrossNER_AI, CrossNER_literature, CrossNER_music, CrossNER_politics, CrossNER_science
@@ -268,7 +268,7 @@ split_num_mapper = {
 ```
 
 
-Since predicting all schemas in the label set at once is too challenging and not easily scalable, OneKE uses a
+Since predicting all schemas in the label set at once is too challenging and does not scale easily, OneKE batches the queries during training: each instruction asks about a fixed number of schemas at a time. Hence, if the label set of a piece of data is too long, it is split into multiple instructions that the model addresses in turn.
 
 **Schema format**:
 
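As an aside, the batching described in the added line can be sketched in a few lines of Python. This is a minimal illustration, not OneKE's actual code: `split_num_mapper` appears in the README snippet above, but the batch sizes and the `batch_schemas` helper here are assumptions made for the example.

```python
# Minimal sketch of schema batching. The batch sizes below are assumed
# example values, not the official split_num_mapper contents.
split_num_mapper = {"NER": 6, "RE": 4, "EE": 4}

def batch_schemas(schema_list, task):
    """Split a long label set into fixed-size chunks, one chunk per instruction."""
    n = split_num_mapper[task]
    return [schema_list[i:i + n] for i in range(0, len(schema_list), n)]

labels = ["person", "organization", "location", "event", "award",
          "product", "law", "album", "band", "country"]
for chunk in batch_schemas(labels, "NER"):
    print(chunk)  # each chunk becomes the schema field of one instruction
```

With ten labels and a batch size of six, the model is queried twice: once with six schemas and once with the remaining four.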
@@ -281,7 +281,7 @@ EEA: [{"event_type": "Finance/Trading - Interest Rate Hike", "arguments": ["Time
 ```
 
 
-Below is a simple
+Below is a simple batched instruction generation script:
 
 ```python
 def get_instruction(language, task, schema, input):
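For illustration only, here is a hedged usage sketch of the `get_instruction` script above, assuming it returns one JSON instruction string per schema batch; the label list and input sentence are made-up examples.

```python
# Hypothetical usage; assumes get_instruction returns a list of JSON
# instruction strings, one per schema batch.
schema = ["person", "organization", "location", "award", "event"]
text = "In 1921, Albert Einstein received the Nobel Prize in Physics."
for instruction in get_instruction("en", "NER", schema, text):
    print(instruction)  # each batched instruction is posed to the model in turn
```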
@@ -359,7 +359,7 @@ for split_schema in split_schemas:
 
 
 <details>
-<summary><b>Event Extraction (EE)
+<summary><b>Event Extraction (EE) Description Instructions</b></summary>
 
 ```json
 {
@@ -407,7 +407,7 @@ for split_schema in split_schemas:
 
 
 <details>
-<summary><b>Knowledge Graph Construction (KGC)
+<summary><b>Knowledge Graph Construction (KGC) Description Instructions</b></summary>
 
 ```json
 {