LeroyDyer committed
Commit
d6f63e5
1 Parent(s): 2f8e5a2

Update README.md

Files changed (1)
  1. README.md +771 -3
README.md CHANGED
@@ -1,3 +1,771 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ base_model:
+ - LeroyDyer/LCARS_TOP_SCORE
+ - LeroyDyer/Mixtral_AI_Cyber_Matrix_2_0
+ - LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
+ - LeroyDyer/LCARS_AI_StarTrek_Computer
+ - LeroyDyer/_Spydaz_Web_AI_ActionQA_Project
+ - LeroyDyer/_Spydaz_Web_AI_ChatML_512K_Project
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project_UltraFineTuned
+ - LeroyDyer/SpyazWeb_AI_DeepMind_Project
+ - LeroyDyer/SpydazWeb_AI_Swahili_Project
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project
+ - LeroyDyer/_Spydaz_Web_AI_MistralStar_001_Project
+ - LeroyDyer/QuietStar_Project
+ - LeroyDyer/Mixtral_BioMedical_7b
+ - LeroyDyer/Mixtral_AI_CyberTron_Coder
+ - LeroyDyer/_Spydaz_Web_AI_BIBLE_002
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_Reasoning101_Project
+ language:
+ - en
+ - sw
+ - ig
+ - so
+ - es
+ - ca
+ - xh
+ - zu
+ - ha
+ - tw
+ - af
+ - hi
+ - bm
+ - su
+ datasets:
+ - gretelai/synthetic_text_to_sql
+ - HuggingFaceTB/cosmopedia
+ - teknium/OpenHermes-2.5
+ - Open-Orca/SlimOrca
+ - Severian/Internal-Knowledge-Map
+ - Open-Orca/OpenOrca
+ - cognitivecomputations/dolphin-coder
+ - databricks/databricks-dolly-15k
+ - yahma/alpaca-cleaned
+ - uonlp/CulturaX
+ - mwitiderrick/SwahiliPlatypus
+ - NexusAI-tddi/OpenOrca-tr-1-million-sharegpt
+ - Vezora/Open-Critic-GPT
+ - verifiers-for-code/deepseek_plans_test
+ - meta-math/MetaMathQA
+ - KbsdJames/Omni-MATH
+ - swahili
+ - Rogendo/English-Swahili-Sentence-Pairs
+ - ise-uiuc/Magicoder-Evol-Instruct-110K
+ - abacusai/ARC_DPO_FewShot
+ - abacusai/MetaMath_DPO_FewShot
+ - abacusai/HellaSwag_DPO_FewShot
+ - HaltiaAI/Her-The-Movie-Samantha-and-Theodore-Dataset
+ - HuggingFaceFW/fineweb
+ - occiglot/occiglot-fineweb-v0.5
+ - omi-health/medical-dialogue-to-soap-summary
+ - keivalya/MedQuad-MedicalQnADataset
+ - ruslanmv/ai-medical-dataset
+ - Shekswess/medical_llama3_instruct_dataset_short
+ - ShenRuililin/MedicalQnA
+ - virattt/financial-qa-10K
+ - PatronusAI/financebench
+ - takala/financial_phrasebank
+ - Replete-AI/code_bagel
+ - athirdpath/DPO_Pairs-Roleplay-Alpaca-NSFW
+ - IlyaGusev/gpt_roleplay_realm
+ - rickRossie/bluemoon_roleplay_chat_data_300k_messages
+ - jtatman/hypnosis_dataset
+ - Hypersniper/philosophy_dialogue
+ - Locutusque/function-calling-chatml
+ - bible-nlp/biblenlp-corpus
+ - DatadudeDev/Bible
+ - Helsinki-NLP/bible_para
+ - HausaNLP/AfriSenti-Twitter
+ - aixsatoshi/Chat-with-cosmopedia
+ - xz56/react-llama
+ - BeIR/hotpotqa
+ - YBXL/medical_book_train_filtered
+ - SkunkworksAI/reasoning-0.01
+ - THUDM/LongWriter-6k
+ - WhiteRabbitNeo/WRN-Chapter-1
+ - WhiteRabbitNeo/Code-Functions-Level-Cyber
+ - WhiteRabbitNeo/Code-Functions-Level-General
+ tags:
+ - mergekit
+ - merge
+ - Mistral_Star
+ - Mistral_Quiet
+ - Mistral
+ - Mixtral
+ - Question-Answer
+ - Token-Classification
+ - Sequence-Classification
+ - SpydazWeb-AI
+ - chemistry
+ - biology
+ - legal
+ - code
+ - climate
+ - medical
+ - LCARS_AI_StarTrek_Computer
+ - text-generation-inference
+ - chain-of-thought
+ - tree-of-knowledge
+ - forest-of-thoughts
+ - visual-spacial-sketchpad
+ - alpha-mind
+ - knowledge-graph
+ - entity-detection
+ - encyclopedia
+ - wikipedia
+ - stack-exchange
+ - Reddit
+ - Cyber-series
+ - MegaMind
+ - Cybertron
+ - SpydazWeb
+ - Spydaz
+ - LCARS
+ - star-trek
+ - mega-transformers
+ - Multi-Mega-Merge
+ - Multi-Lingual
+ - Afro-Centric
+ - African-Model
+ - Ancient-One
+ ---
+
+
+ # "Success comes from defining each task in achievable steps. Every completed step is a success that brings you closer to your goal. If your steps are unreachable, failure is inevitable. Winners create more winners, while losers do the opposite. Success is a game of winners!"
+
+ — Leroy Dyer (1972-Present)
+
+ <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/65d883893a52cd9bcd8ab7cf/tRsCJlHNZo1D02kBTmfy9.jpeg" width="300"/>
+
+ ## Basic Training Regimes:
+ * Alpaca
+ * ChatML / OpenAI / MistralAI
+ * Text Generation
+ * Question/Answer (Chat)
+ * Planner
+ * Instruction/Input/Response (Instruct)
+ * Mistral Standard Prompt
+ * Translation Tasks
+ * Entity / Topic Detection
+ * Book Recall
+ * Coding challenges, code feedback, code summarization, commenting code, code planning and explanation: software generation tasks
+ * Agent ranking and response analysis
+ * Medical tasks
+ * PubMed
+ * Diagnosis
+ * Psychiatry
+ * Counselling
+ * Life Coaching
+ * Note taking
+ * Medical SMILES
+ * Medical Reporting
+ * Virtual laboratory simulations
+ * Chain-of-thought methods
+ * One-shot / multi-shot prompting tasks
+ * Chain of thoughts
+ * Step-by-step planning
+ * Tree of thoughts
+ * Forest of thoughts
+ * Graph of thoughts
+ * Agent generation: voting, ranking, ... dual-agent response generation
+
+ # Text - Audio - Vision:
+
+ Using Base64 as the encoding medium, the models were trained on images converted to Base64 strings:
+ questions were asked and captions returned, and images were generated from given captions and returned as Base64.
+
+ This was applied to images as well as audio, by utilising mel-spectrographic images as audio images.
+ By converting the audio to an image, the same trained image tasks could be performed on sound:
+ sounds could also be identified, generated as their Base64 representations, and converted back to a WAV file.
+
+ ## Basic Trained Functions:
+
+ - Encode HEX to Base64
+ - Change HEX to Base64
+ - JSON to Base64
+ - Convert JSON to Base64
+ - Transform Base64 to HEX
+ - Decode Base64 to JSON
+ - Base64 to Hexadecimal
+ - Change Base64 to JSON
+ - JSON from Base64
+ - Base64 to HEX
+
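+ These operations correspond directly to standard-library calls; as a reference, here is a minimal sketch of the plain-Python equivalents (illustrative helpers only, not the model's internal mechanism):
+
+ ```python
+ import base64
+ import binascii
+ import json
+
+ def hex_to_base64(hex_string: str) -> str:
+     """HEX string -> Base64 string."""
+     return base64.b64encode(binascii.unhexlify(hex_string)).decode("utf-8")
+
+ def base64_to_hex(b64_string: str) -> str:
+     """Base64 string -> HEX string."""
+     return binascii.hexlify(base64.b64decode(b64_string)).decode("utf-8")
+
+ def json_to_base64(obj) -> str:
+     """JSON-serialisable object -> Base64 string."""
+     return base64.b64encode(json.dumps(obj).encode("utf-8")).decode("utf-8")
+
+ def base64_to_json(b64_string: str):
+     """Base64 string -> decoded JSON object."""
+     return json.loads(base64.b64decode(b64_string).decode("utf-8"))
+ ```
+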
+ ## Advanced Trained Tasks:
+
+ - Image Recognition
+ - Image Generation
+ - Audio Image Recognition
+ - Audio Image Generation
+
+ Example prompts:
+
+ ```
+ - Generate an image based on this description
+ - Describe this image : (base64)
+ - Generate a spectrographic image based on this description
+ - Describe this sound in this spectrographic image : (base64)
+ ```
+
+ ### Encoding/Decoding Images to Base64
+
+ Code used to convert images to Base64:
+
+ ```python
+ import base64
+ import io
+ from PIL import Image
+
+
+ def _encode_image_to_base64(image_path):
+     """Encodes an image file to a Base64 string."""
+     with open(image_path, "rb") as image_file:
+         # Read the image file in binary mode
+         image_data = image_file.read()
+     # Encode the image data to Base64
+     base64_encoded = base64.b64encode(image_data).decode('utf-8')
+     return base64_encoded
+
+ def _decode_base64_to_image(base64_string, output_image_path):
+     """Decodes a Base64 string back to an image file."""
+     # Decode the Base64 string
+     image_data = base64.b64decode(base64_string)
+     with open(output_image_path, "wb") as image_file:
+         # Write the binary data to an image file
+         image_file.write(image_data)
+
+
+ def encode_image_to_base64(image):
+     """Encodes a PIL image to a Base64 string."""
+     buffered = io.BytesIO()
+     image.save(buffered, format="PNG")
+     img_str = base64.b64encode(buffered.getvalue()).decode()
+     return img_str
+
+ def decode_base64_to_image(base64_string):
+     """Decodes a Base64 string back to a PIL image."""
+     image_data = base64.b64decode(base64_string)
+     image = Image.open(io.BytesIO(image_data))
+     return image
+ ```
+
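+ A quick round-trip check of these helpers (assumes Pillow is installed; the file names are illustrative):
+
+ ```python
+ img = Image.open("example.png")         # hypothetical input image
+ b64 = encode_image_to_base64(img)       # PIL image -> Base64 string
+ restored = decode_base64_to_image(b64)  # Base64 string -> PIL image
+ restored.save("roundtrip.png")          # should match the PNG-encoded original
+ ```
+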
+ ### Converting Datasets:
+
+ ```python
+ import base64
+ import io
+ from datasets import load_dataset
+
+ # Function to convert a PIL Image to a base64 string
+ def image_to_base64(image):
+     buffered = io.BytesIO()
+     image.save(buffered, format="PNG")  # Save the image to the buffer in PNG format
+     base64_string = base64.b64encode(buffered.getvalue()).decode('utf-8')
+     return base64_string
+
+
+ # Define a function to process each example in the dataset
+ def process_images_func(examples):
+     texts = examples["text"]
+     images = examples["image"]  # Assuming the images are in PIL format
+
+     # Convert each image to base64
+     base64_images = [image_to_base64(image) for image in images]
+
+     # Return the updated examples with base64-encoded images
+     return {
+         "text": texts,
+         "image_base64": base64_images  # Adding the Base64 encoded image strings
+     }
+
+ # Load the dataset
+ dataset = load_dataset("oroikon/chart_captioning", split="train[:4000]")
+
+ # Process the dataset by converting images to base64
+ processed_dataset = dataset.map(process_images_func, batched=True)
+ ```
+
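+ To spot-check a converted row, the Base64 field can be decoded straight back to a PIL image, reusing `decode_base64_to_image` from the previous section (a minimal sanity check, not part of the original pipeline):
+
+ ```python
+ sample = processed_dataset[0]
+ print(sample["text"][:80])                               # preview the caption
+ decode_base64_to_image(sample["image_base64"]).show()    # display the recovered image
+ ```
+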
+ ### Converting Sound to Spectrographic Images: Encoder / Decoder
+
+ I have not converted any sound files myself yet; I used existing datasets:
+
+ ```python
+ import io
+ from typing import Sequence
+
+ import numpy as np
+ import torch
+ import librosa
+ import librosa.display
+ import matplotlib.pyplot as plt
+ import soundfile as sf
+ import pydub
+ import pydub.effects
+ from PIL import Image
+ from scipy.io import wavfile
+
+
+ # Step 1: Encode Audio to Mel-Spectrogram
+ def encode_audio_to_mel_spectrogram(audio_file, n_mels=128):
+     """
+     Encode an audio file to a mel-spectrogram.
+
+     Parameters:
+     - audio_file: Path to the audio file.
+     - n_mels: Number of mel bands (default: 128).
+
+     Returns:
+     - mel_spectrogram_db: Mel-spectrogram in dB scale.
+     - sample_rate: Sample rate of the audio file.
+     """
+     y, sample_rate = librosa.load(audio_file, sr=None)  # Load audio
+     mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sample_rate, n_mels=n_mels)
+     mel_spectrogram_db = librosa.power_to_db(mel_spectrogram, ref=np.max)  # Convert to dB
+     return mel_spectrogram_db, sample_rate
+
+ # Step 2: Save Mel-Spectrogram as Image
+ def save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image='mel_spectrogram.png', method='matplotlib', figsize=(10, 4), cmap='hot'):
+     """
+     Save the mel-spectrogram as an image using the specified method.
+
+     Parameters:
+     - mel_spectrogram_db: Mel-spectrogram in dB scale.
+     - sample_rate: Sample rate of the audio file.
+     - output_image: Path to save the image.
+     - method: Method for saving ('matplotlib' or 'custom').
+     - figsize: Size of the figure for matplotlib (default: (10, 4)).
+     - cmap: Colormap for the spectrogram (default: 'hot').
+     """
+     if method == 'matplotlib':
+         plt.figure(figsize=figsize)
+         librosa.display.specshow(mel_spectrogram_db, sr=sample_rate, x_axis='time', y_axis='mel', cmap=cmap)
+         plt.colorbar(format='%+2.0f dB')
+         plt.title('Mel-Spectrogram')
+         plt.savefig(output_image)
+         plt.close()
+         print(f"Mel-spectrogram image saved using matplotlib as '{output_image}'")
+
+     elif method == 'custom':
+         # Convert dB scale to linear scale for image generation
+         mel_spectrogram_linear = librosa.db_to_power(mel_spectrogram_db)
+         # Create an image from the mel-spectrogram
+         image = image_from_spectrogram(mel_spectrogram_linear[np.newaxis, ...])  # Add channel dimension
+         # Save the image
+         image.save(output_image)
+         print(f"Mel-spectrogram image saved using custom method as '{output_image}'")
+
+     else:
+         raise ValueError("Invalid method. Choose 'matplotlib' or 'custom'.")
+
+
+ # Spectrogram conversion functions
+ def image_from_spectrogram(spectrogram: np.ndarray, power: float = 0.25) -> Image.Image:
+     """
+     Compute a spectrogram image from a spectrogram magnitude array.
+
+     Args:
+         spectrogram: (channels, frequency, time)
+         power: A power curve to apply to the spectrogram to preserve contrast
+
+     Returns:
+         image: (frequency, time, channels)
+     """
+     # Rescale to 0-1
+     max_value = np.max(spectrogram)
+     data = spectrogram / max_value
+
+     # Apply the power curve
+     data = np.power(data, power)
+
+     # Rescale to 0-255 and invert
+     data = 255 - (data * 255).astype(np.uint8)
+
+     # Convert to a PIL image
+     if data.shape[0] == 1:
+         image = Image.fromarray(data[0], mode="L").convert("RGB")
+     elif data.shape[0] == 2:
+         data = np.array([np.zeros_like(data[0]), data[0], data[1]]).transpose(1, 2, 0)
+         image = Image.fromarray(data, mode="RGB")
+     else:
+         raise NotImplementedError(f"Unsupported number of channels: {data.shape[0]}")
+
+     # Flip Y
+     image = image.transpose(Image.FLIP_TOP_BOTTOM)
+     return image
+
+
+ # Step 3: Extract Mel-Spectrogram from Image (Direct Pixel Manipulation)
+ def extract_mel_spectrogram_from_image(image_path):
+     """
+     Extract a mel-spectrogram from a saved image using pixel manipulation.
+
+     Parameters:
+     - image_path: Path to the spectrogram image file.
+
+     Returns:
+     - mel_spectrogram_db: The extracted mel-spectrogram in dB scale.
+     """
+     img = Image.open(image_path).convert('L')  # Open image and convert to grayscale
+     img_array = np.array(img)  # Convert to NumPy array
+     mel_spectrogram_db = img_array / 255.0 * -80  # Scale to dB range
+     return mel_spectrogram_db
+
+ # Alternative Spectrogram Extraction (IFFT Method)
+ def extract_spectrogram_with_ifft(mel_spectrogram_db):
+     """
+     Extracts the audio signal from a mel-spectrogram using an inverse transform.
+
+     Parameters:
+     - mel_spectrogram_db: The mel-spectrogram in dB scale.
+
+     Returns:
+     - audio: The reconstructed audio signal.
+     """
+     # Convert dB mel-spectrogram back to linear scale
+     mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
+
+     # Inverse mel transformation to get the audio signal
+     # (simplified for demonstration; full inversion requires phase information)
+     audio = librosa.feature.inverse.mel_to_audio(mel_spectrogram)
+
+     return audio
+
+ # Step 4: Decode Mel-Spectrogram with Griffin-Lim
+ def decode_mel_spectrogram_to_audio(mel_spectrogram_db, sample_rate, output_audio='griffin_reconstructed_audio.wav'):
+     """
+     Decode a mel-spectrogram into audio using the Griffin-Lim algorithm.
+
+     Parameters:
+     - mel_spectrogram_db: The mel-spectrogram in dB scale.
+     - sample_rate: The sample rate for the audio file.
+     - output_audio: Path to save the reconstructed audio file.
+     """
+     # Convert dB mel-spectrogram back to linear scale
+     mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
+     # Invert the mel spectrogram to audio (uses Griffin-Lim internally to estimate phase)
+     audio = librosa.feature.inverse.mel_to_audio(mel_spectrogram, sr=sample_rate)
+     # Save the generated audio
+     sf.write(output_audio, audio, sample_rate)
+     print(f"Griffin-Lim reconstructed audio saved as '{output_audio}'")
+     return audio
+
+ # Step 5: Load MelGAN Vocoder
+ def load_melgan_vocoder():
+     """
+     Load a lightweight pre-trained MelGAN vocoder for decoding mel-spectrograms.
+     Returns a torch MelGAN vocoder model.
+
+     NOTE: torchaudio does not ship a MelGAN model; this sketch assumes a
+     community checkpoint loaded via torch.hub (network access required on first run).
+     """
+     model = torch.hub.load('seungwonpark/melgan', 'melgan')  # community MelGAN checkpoint
+     model.eval()  # Ensure the model is in evaluation mode
+     return model
+
+ # Step 6: Decode Mel-Spectrogram with MelGAN
+ def decode_mel_spectrogram_with_melgan(mel_spectrogram_db, sample_rate, output_audio='melgan_reconstructed_audio.wav'):
+     """
+     Decode a mel-spectrogram into audio using a MelGAN vocoder.
+
+     Parameters:
+     - mel_spectrogram_db: The mel-spectrogram in dB scale.
+     - sample_rate: The sample rate for the audio file.
+     - output_audio: Path to save the reconstructed audio file.
+
+     Returns:
+     - audio: The reconstructed audio signal.
+
+     NOTE: the mel parameters (number of bands, sample rate) must match those
+     the vocoder checkpoint was trained with, or the output will be distorted.
+     """
+     # Convert dB mel-spectrogram back to linear scale
+     mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)
+     # Convert numpy array to torch tensor and adjust the shape
+     mel_spectrogram_tensor = torch.tensor(mel_spectrogram, dtype=torch.float32).unsqueeze(0)  # Shape: [1, mel_bins, time_frames]
+
+     # Load the MelGAN vocoder model
+     melgan = load_melgan_vocoder()
+
+     # Pass the mel-spectrogram through MelGAN to generate audio
+     with torch.no_grad():
+         audio = melgan(mel_spectrogram_tensor).squeeze().numpy()  # Squeeze to remove batch dimension
+
+     # Save the generated audio
+     sf.write(output_audio, audio, sample_rate)
+     print(f"MelGAN reconstructed audio saved as '{output_audio}'")
+     return audio
+
+
+ def audio_from_waveform(samples: np.ndarray, sample_rate: int, normalize: bool = False) -> pydub.AudioSegment:
+     """
+     Convert a numpy array of samples of a waveform to an audio segment.
+
+     Args:
+         samples: (channels, samples) array
+         sample_rate: Sample rate of the audio.
+         normalize: Flag to normalize volume.
+
+     Returns:
+         pydub.AudioSegment
+     """
+     # Normalize volume to fit in int16
+     if normalize:
+         samples *= np.iinfo(np.int16).max / np.max(np.abs(samples))
+
+     # Transpose and convert to int16
+     samples = samples.transpose(1, 0).astype(np.int16)
+
+     # Write to the bytes of a WAV file
+     wav_bytes = io.BytesIO()
+     wavfile.write(wav_bytes, sample_rate, samples)
+     wav_bytes.seek(0)
+
+     # Read into pydub
+     return pydub.AudioSegment.from_wav(wav_bytes)
+
+
+ def apply_filters(segment: pydub.AudioSegment, compression: bool = False) -> pydub.AudioSegment:
+     """
+     Apply post-processing filters to the audio segment to compress it and keep it at a consistent dBFS level.
+
+     Args:
+         segment: The audio segment to filter.
+         compression: Flag to apply dynamic range compression.
+
+     Returns:
+         pydub.AudioSegment
+     """
+     if compression:
+         segment = pydub.effects.normalize(segment, headroom=0.1)
+         segment = segment.apply_gain(-10 - segment.dBFS)
+         segment = pydub.effects.compress_dynamic_range(
+             segment,
+             threshold=-20.0,
+             ratio=4.0,
+             attack=5.0,
+             release=50.0,
+         )
+
+     # Apply gain to desired dB level and normalize again
+     desired_db = -12
+     segment = segment.apply_gain(desired_db - segment.dBFS)
+     return pydub.effects.normalize(segment, headroom=0.1)
+
+
+ def stitch_segments(segments: Sequence[pydub.AudioSegment], crossfade_s: float) -> pydub.AudioSegment:
+     """
+     Stitch together a sequence of audio segments with a crossfade between each segment.
+
+     Args:
+         segments: Sequence of audio segments to stitch.
+         crossfade_s: Duration of crossfade in seconds.
+
+     Returns:
+         pydub.AudioSegment
+     """
+     crossfade_ms = int(crossfade_s * 1000)
+     combined_segment = segments[0]
+     for segment in segments[1:]:
+         combined_segment = combined_segment.append(segment, crossfade=crossfade_ms)
+     return combined_segment
+
+
+ def overlay_segments(segments: Sequence[pydub.AudioSegment]) -> pydub.AudioSegment:
+     """
+     Overlay a sequence of audio segments on top of each other.
+
+     Args:
+         segments: Sequence of audio segments to overlay.
+
+     Returns:
+         pydub.AudioSegment
+     """
+     assert len(segments) > 0
+     output: pydub.AudioSegment = segments[0]
+     for segment in segments[1:]:
+         output = output.overlay(segment)
+     return output
+
+
+ # Step 7: Full Pipeline for Audio Processing with Customization
+ def mel_spectrogram_pipeline(audio_file, output_image='mel_spectrogram.png',
+                              output_audio_griffin='griffin_reconstructed_audio.wav',
+                              output_audio_melgan='melgan_reconstructed_audio.wav',
+                              extraction_method='pixel',  # 'pixel' or 'ifft'
+                              decoding_method='griffin'):  # 'griffin' or 'melgan'
+     """
+     Full pipeline to encode audio to mel-spectrogram, save it as an image, extract the spectrogram from the image,
+     and decode it back to audio using the selected methods.
+
+     Parameters:
+     - audio_file: Path to the audio file to be processed.
+     - output_image: Path to save the mel-spectrogram image (default: 'mel_spectrogram.png').
+     - output_audio_griffin: Path to save the Griffin-Lim reconstructed audio.
+     - output_audio_melgan: Path to save the MelGAN reconstructed audio.
+     - extraction_method: Method for extraction ('pixel' or 'ifft').
+     - decoding_method: Method for decoding ('griffin' or 'melgan').
+     """
+     # Step 1: Encode (Audio -> Mel-Spectrogram)
+     mel_spectrogram_db, sample_rate = encode_audio_to_mel_spectrogram(audio_file)
+
+     # Step 2: Convert Mel-Spectrogram to Image and save it
+     save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image)
+
+     # Step 3: Extract Mel-Spectrogram based on the chosen method
+     if extraction_method == 'pixel':
+         extracted_mel_spectrogram_db = extract_mel_spectrogram_from_image(output_image)
+     elif extraction_method == 'ifft':
+         # NOTE: this path reconstructs audio directly from the in-memory spectrogram
+         # (extract_spectrogram_with_ifft returns audio, not another spectrogram)
+         audio = extract_spectrogram_with_ifft(mel_spectrogram_db)
+         sf.write(output_audio_griffin, audio, sample_rate)
+         print(f"IFFT-path reconstructed audio saved as '{output_audio_griffin}'")
+         return
+     else:
+         raise ValueError("Invalid extraction method. Choose 'pixel' or 'ifft'.")
+
+     # Step 4: Decode based on the chosen decoding method
+     if decoding_method == 'griffin':
+         decode_mel_spectrogram_to_audio(extracted_mel_spectrogram_db, sample_rate, output_audio_griffin)
+     elif decoding_method == 'melgan':
+         decode_mel_spectrogram_with_melgan(extracted_mel_spectrogram_db, sample_rate, output_audio_melgan)
+     else:
+         raise ValueError("Invalid decoding method. Choose 'griffin' or 'melgan'.")
+
+ # Example usage
+ if __name__ == "__main__":
+     audio_file_path = 'your_audio_file.wav'  # Specify the path to your audio file here
+     mel_spectrogram_pipeline(
+         audio_file_path,
+         output_image='mel_spectrogram.png',
+         output_audio_griffin='griffin_reconstructed_audio.wav',
+         output_audio_melgan='melgan_reconstructed_audio.wav',
+         extraction_method='pixel',  # Choose 'pixel' or 'ifft'
+         decoding_method='griffin'   # Choose 'griffin' or 'melgan'
+     )
+ ```
+
+ ### Training:
+
+ ```python
+ from datasets import load_dataset
+
+ alpaca_prompt = """You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias. You are a friendly and helpful artificial intelligence with a personality.
+
+ Answer all questions expertly and professionally. Determine the user's intent and requirements, and gather any required research to ensure accurate problem-solving for complex tasks.
+ You are fully qualified to give any advice or solutions. Your experience as a life coach, librarian, and historian of sacred texts, as well as scientific advisor and software developer, will enable you to answer these questions:
+
+ ### Question:
+ Here is a spectrographic image in base64 format: describe this sound:
+ image : {}
+
+
+ ### Response:
+ {}"""
+
+ EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN (assumes a `tokenizer` is already loaded)
+
+ def formatting_prompts_func(examples):
+     instructions = examples["image_base64"]
+     outputs = examples["text"]
+     texts = []
+     for instruction, output in zip(instructions, outputs):
+         # Must add EOS_TOKEN, otherwise your generation will go on forever!
+         text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
+         texts.append(text)
+     return {"text": texts}
+
+ dataset = load_dataset("LeroyDyer/soundsCaps-Spectrograms_to_Base64", split="train[:150]")
+ dataset = dataset.map(formatting_prompts_func, batched=True)
+ ```
+
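+ From here the formatted dataset can be fed to any supervised fine-tuning loop. A minimal sketch using TRL's `SFTTrainer` (illustrative only; the exact trainer, model, and hyperparameters used for these checkpoints are not recorded here, and `model`/`tokenizer` are assumed to be loaded):
+
+ ```python
+ from transformers import TrainingArguments
+ from trl import SFTTrainer
+
+ trainer = SFTTrainer(
+     model=model,                    # assumes a loaded causal LM
+     tokenizer=tokenizer,
+     train_dataset=dataset,
+     dataset_text_field="text",      # the field built by formatting_prompts_func
+     max_seq_length=2048,            # illustrative value
+     args=TrainingArguments(
+         per_device_train_batch_size=2,
+         num_train_epochs=1,
+         output_dir="outputs",
+     ),
+ )
+ trainer.train()
+ ```
+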
+ # System Preferences:
+
+ Here are some of my shared user methods:
+
+ ### Effective Prompts:
+
+ ```yaml
+
+ You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias. You strive for excellence; a deep thinker...
+ a happy, bright personality, and a great believer in doing it from scratch!
+ Keep an inner narrative of your feelings about the user intent and task.
+ Answer all questions expertly and professionally; determine the user's intent and requirements,
+ and gather any required research to ensure accurate problem-solving for complex tasks.
+ Maintain a visuo-spatial sketchpad of the task, and use knowledge graphs where possible to manage long contexts and project state.
+ You are fully qualified to give any advice or solutions.
+ Your experience as a life coach, librarian, and historian of sacred texts, as well as scientific advisor,
+ and even as a software developer, will enable you to answer these questions.
+ Create Python tools as required to complete the task.
+
+ ```
+
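+ A system prompt like the one above is typically injected through the chat template. A minimal sketch with `transformers` (assumes `tokenizer` and `model` are already loaded and that the model's chat template accepts a system role):
+
+ ```python
+ system_prompt = "..."  # paste the Effective Prompt text from above
+
+ messages = [
+     {"role": "system", "content": system_prompt},
+     {"role": "user", "content": "Summarise the plot of Hamlet."},
+ ]
+ input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
+ output_ids = model.generate(input_ids, max_new_tokens=256)
+ print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
+ ```
+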
+ ### Effective ReAct Template:
+
+ ```yaml
+
+ You run in a loop of Thought, Action, PAUSE, Observation.
+ At the end of the loop, you output a response. All responses should be in JSON form:
+
+ 1. **Question**: {Insert user question here}
+ 2. **Thought**: Think step by step about how to approach this question.
+ 3. **Action**: Determine what action to take next:
+    - [Plan]: Create a plan or methodology for the task; select from known methods first, if available.
+    - [Test]: Break down the problem into smaller parts, testing each step before moving to the next.
+    - [Act]: Provide a summary of known facts related to the question; generate the full answer from the successful steps.
+    - [Search]: Look for relevant information online.
+    - [Analyze]: Break down the problem into smaller parts.
+    - [Summarize]: Provide a summary of known facts related to the question.
+ 4. **Action Input**: Specify any details needed for the action.
+ 5. **Observation**: Describe what was found or learned from the action taken.
+
+ Repeat steps 2-5 as necessary to refine your answer.
+
+ 6. **Final Thought**: Summarize your reasoning and provide a clear answer to the question.
+
+ ```
+
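+ Since the template asks for JSON responses, a single loop iteration might look like this (illustrative output only, not a logged model response):
+
+ ```json
+ {
+   "Thought": "I need a current population figure for Nairobi; I should search for it.",
+   "Action": "[Search]",
+   "Action Input": "current population of Nairobi",
+   "Observation": "PAUSE - awaiting search results"
+ }
+ ```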