Update README.md
README.md
@@ -28,7 +28,7 @@ OneKE is a new bilingual knowledge extraction large model developed jointly by Z
 
 
 ## How is OneKE trained?
-OneKE mainly focuses on schema-generalizable information extraction. Due to issues such as non-standard formats, noisy data, and lack of diversity in existing extraction instruction data, OneKE adopted techniques such as normalization and cleaning of extraction instructions, difficult negative sample collection, and schema-based
+OneKE mainly focuses on schema-generalizable information extraction. Due to issues such as non-standard formats, noisy data, and a lack of diversity in existing extraction instruction data, OneKE adopts techniques such as normalization and cleaning of extraction instructions, hard negative sample collection, and schema-based batched instruction construction, as shown in the illustration. For more details, refer to the paper "[IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus](https://arxiv.org/abs/2402.14710) [[Github](https://github.com/zjunlp/IEPile)]".
 
 The zero-shot generalization comparison results of OneKE with other large models are as follows:
 * `NER-en`: CrossNER_AI, CrossNER_literature, CrossNER_music, CrossNER_politics, CrossNER_science
@@ -268,7 +268,7 @@ split_num_mapper = {
 ```
 
 
-Since predicting all schemas in the label set at once is too challenging and not easily scalable, OneKE uses a
+Since predicting all schemas in the label set at once is too challenging and does not scale easily, OneKE batches the queries during training: each instruction asks about a fixed number of schemas at a time. Hence, if the label set of a piece of data is too long, it is split into multiple instructions that the model addresses in turn.
 
 **Schema format**:
 
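As an aside, the batching described in the added line can be sketched in a few lines of Python. This is a minimal illustration, not OneKE's actual code: `split_num_mapper` appears in the README snippet above, but the batch sizes and the `batch_schemas` helper here are assumptions made for the example.

```python
# Minimal sketch of schema batching. The batch sizes below are assumed
# example values, not the official split_num_mapper contents.
split_num_mapper = {"NER": 6, "RE": 4, "EE": 4}

def batch_schemas(schema_list, task):
    """Split a long label set into fixed-size chunks, one chunk per instruction."""
    n = split_num_mapper[task]
    return [schema_list[i:i + n] for i in range(0, len(schema_list), n)]

labels = ["person", "organization", "location", "event", "award",
          "product", "law", "album", "band", "country"]
for chunk in batch_schemas(labels, "NER"):
    print(chunk)  # each chunk becomes the schema field of one instruction
```

With ten labels and a batch size of six, the model is queried twice: once with six schemas and once with the remaining four.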
@@ -281,7 +281,7 @@ EEA: [{"event_type": "Finance/Trading - Interest Rate Hike", "arguments": ["Time
 ```
 
 
-Below is a simple
+Below is a simple batched instruction generation script:
 
 ```python
 def get_instruction(language, task, schema, input):
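For illustration only, here is a hedged usage sketch of the `get_instruction` script above, assuming it returns one JSON instruction string per schema batch; the label list and input sentence are made-up examples.

```python
# Hypothetical usage; assumes get_instruction returns a list of JSON
# instruction strings, one per schema batch.
schema = ["person", "organization", "location", "award", "event"]
text = "In 1921, Albert Einstein received the Nobel Prize in Physics."
for instruction in get_instruction("en", "NER", schema, text):
    print(instruction)  # each batched instruction is posed to the model in turn
```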
@@ -359,7 +359,7 @@ for split_schema in split_schemas:
 
 
 <details>
-<summary><b>Event Extraction (EE)
+<summary><b>Event Extraction (EE) Description Instructions</b></summary>
 
 ```json
 {
@@ -407,7 +407,7 @@ for split_schema in split_schemas:
 
 
 <details>
-<summary><b>Knowledge Graph Construction (KGC)
+<summary><b>Knowledge Graph Construction (KGC) Description Instructions</b></summary>
 
 ```json
 {