Aug. 2, 2023, 6:20 a.m. | 1littlecoder

1littlecoder www.youtube.com

FalconLite is a quantized version of the Falcon 40B SFT OASST-TOP1 model, capable of processing long (i.e. 11K tokens) input sequences while consuming 4x less GPU memory. By utilizing 4-bit GPTQ quantization and adapted dynamic NTK RotaryEmbedding, FalconLite achieves a balance between latency, accuracy, and memory efficiency. With the ability to process 5x longer contexts than the original model, FalconLite is useful for applications such as topic retrieval, summarization, and question-answering.

FalconLite - https://huggingface.co/amazon/FalconLite

AutoGPT Q https://github.com/PanQiWei/AutoGPTQ

Fine-tuned Falcon model …

accuracy amazon context dynamic efficiency falcon falcon 40b gpu latency llm memory process processing quantization tokens

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US