license: apache-2.0
datasets:
- synapsecai/synthetic-sensitive-information
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
Model Information
model_name = "NousResearch/Llama-2-7b-chat-hf"
dataset_name = "synapsecai/synthetic-sensitive-information"
QLoRA parameters
lora_r = 32
lora_alpha = 8
lora_dropout = 0.1
BitsAndBytes parameters
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
Training Arguments parameters
num_train_epochs = 1
fp16 = False
bf16 = False
per_device_train_batch_size = 32
per_device_eval_batch_size = 8
gradient_accumulation_steps = 4
gradient_checkpointing = True
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "cosine"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
save_steps = 0
logging_steps = 25
SFT parameters
max_seq_length = None
packing = False This model is an ethically fine-tuned version of Llama 2, specifically trained to detect and flag private or sensitive information within natural text. It serves as a powerful tool for data privacy and security, capable of identifying potentially vulnerable data such as:
API keys Personally Identifiable Information (PII) Financial data Confidential business information Login credentials
Key Features:
Analyzes natural language input to identify sensitive content Provides explanations for detected sensitive information Helps prevent accidental exposure of private data Supports responsible data handling practices
Use Cases:
Content moderation Data loss prevention Compliance checks for GDPR, HIPAA, etc. Security audits of text-based communications
This model aims to enhance data protection measures and promote ethical handling of sensitive information in various applications and industries.