What's Changed
* Load finetuned weights by aniketmaurya in https://github.com/aniketmaurya/LLaMA-Inference-API/pull/2
* Refactor serve by aniketmaurya in https://github.com/aniketmaurya/LLaMA-Inference-API/pull/3

For inference:

```python
import os

from llama_inference import LLaMAInference

# Root directory of the converted weights, read from the WEIGHTS environment variable
WEIGHTS_PATH = os.environ["WEIGHTS"]

checkpoint_path = f"{WEIGHTS_PATH}/lit-llama/7B/state_dict.pth"
tokenizer_path = f"{WEIGHTS_PATH}/lit-llama/tokenizer.model"

model = LLaMAInference(checkpoint_path=checkpoint_path, tokenizer_path=tokenizer_path, dtype="bfloat16")
print(model("New York is located in"))
```
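
The inference snippet raises a bare `KeyError` when `WEIGHTS` is unset. A minimal sketch of a more defensive path lookup, assuming the same `lit-llama` directory layout as above. `resolve_paths` and its `model_size` parameter are hypothetical and not part of `llama_inference`, and the helper does not check that the files exist:

```python
import os
from pathlib import Path


def resolve_paths(weights_root=None, model_size="7B"):
    """Hypothetical helper: build the checkpoint and tokenizer paths used above."""
    # Fall back to the WEIGHTS environment variable when no root is passed in.
    root = weights_root or os.environ.get("WEIGHTS")
    if not root:
        raise RuntimeError("Set the WEIGHTS environment variable or pass weights_root")
    base = Path(root) / "lit-llama"
    # Assumed layout: <root>/lit-llama/<size>/state_dict.pth and <root>/lit-llama/tokenizer.model
    return base / model_size / "state_dict.pth", base / "tokenizer.model"


checkpoint_path, tokenizer_path = resolve_paths("/tmp/weights")
print(checkpoint_path)
print(tokenizer_path)
```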

For serving a REST API:

```python
# app.py
import lightning as L

# PromptRequest is used below, so it must be imported alongside Response
from llama_inference.serve import PromptRequest, Response, ServeLLaMA

component = ServeLLaMA(input_type=PromptRequest, output_type=Response)
app = L.LightningApp(component)
```
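
Once the app is running, a client can send it a prompt over HTTP. The sketch below only builds the request and does not send it. The URL (`http://127.0.0.1:7501/predict`) and the `prompt` JSON field are assumptions that are not confirmed by these notes, and `build_request` is a hypothetical helper, so check the running app for the actual endpoint and schema:

```python
import json
from urllib import request


def build_request(prompt, url="http://127.0.0.1:7501/predict"):
    """Hypothetical helper: build a JSON POST request for the served model."""
    # Assumed payload shape: a single "prompt" field.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("New York is located in")
print(req.get_method(), req.full_url)
# To send it once the server is up: request.urlopen(req).read()
```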
**Full Changelog**: https://github.com/aniketmaurya/LLaMA-Inference-API/compare/v0.0.1...v0.0.2