Releases: HKUDS/LightRAG
v1.4.14
What's New
- feat(setup): support Atlas Local Docker for Mongo vector storage by @danielaskdd in #2925
- perf: batch graph operations in ainsert_custom_kg for large-scale imports by @nszhsl in #2910
- examples: add AG2 multi-agent demo with LightRAG retrieval by @faridun-ag2 in #2867
What's Fixed
- fix: remove redundant file_path_placeholder lookup in _merge_edges_then_upsert by @jwchmodx in #2877
- chore: remove dead config.ini / configparser code by @jwchmodx in #2887
- chore: remove dead OLLAMA_NUM_CTX / args.ollama_num_ctx assignment by @jwchmodx in #2888
- fix(webui): resolve all bun run lint errors in lightrag_webui by @danielaskdd in #2891
- chore(webui): migrate ESLint stylistic plugin by @danielaskdd in #2893
- docs(readme): restructure documentation and consolidate core api by @danielaskdd in #2896
- fix pipeline status history trimming by @danielaskdd in #2897
- fix: Corrected exception handling for LLM API timeouts where the subclass incorrectly passes keyword args by @hillct in #2902
- Improve LLM API failure diagnostics by @danielaskdd in #2903
- docs: deprecate config.ini in documentation by @danielaskdd in #2905
- fix(auth): prevent JWT algorithm confusion attack (GHSA-8ffj-4hx4-9pgf) by @danielaskdd in #2907
- fix(api): add missing metadata field in /query/data error response by @lawrence3699 in #2923
- fix(utils): prevent remove_think_tags from truncating responses with embedded tags by @sjhddh in #2900
- fix(opensearch): ensure consistency by lazy index refresh and real-time edge lookups by @danielaskdd in #2926
- fix(kg): correct omission of isolated nodes in get_knowledge_graph during full graph retrieval by @danielaskdd in #2928
New Contributors
- @jwchmodx made their first contribution in #2877
- @hillct made their first contribution in #2902
- @faridun-ag2 made their first contribution in #2867
- @lawrence3699 made their first contribution in #2923
- @sjhddh made their first contribution in #2900
- @nszhsl made their first contribution in #2910
Full Changelog: v1.4.13...v1.4.14
v1.4.13
What's New
- Feat: add make dev bootstrap target by @danielaskdd in #2870
- Feat: Add PostgreSQL performance timing instrumentation by @danielaskdd in #2855
What's Changed
- perf(storage): add cooperative yielding to prevent event loop blocking by @danielaskdd in #2847
- fix: sanitize entity_type in Memgraph upsert_node to prevent Cypher injection (CWE-89) by @sebastiondev in #2849
- perf(doc-status): add get_docs_by_statuses to all backends and fix PG pool/pagination bugs by @danielaskdd in #2853
- Fix setup .env regeneration for preserved custom variables by @danielaskdd in #2854
- Fix missing file_path in entity merge upserts by @danielaskdd in #2857
- fix(memgraph): preserve start node in knowledge graph query by @danielaskdd in #2868
- fix(auth): reject default JWT secret when AUTH_ACCOUNTS is configured by @danielaskdd in #2869
- fix(postgres): handle quoted AGE entity ids in edge retrieval by @danielaskdd in #2872
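As background for the cooperative yielding change in #2847: a long synchronous loop inside a coroutine blocks every other task until it finishes. A common remedy is to hand control back to the event loop every N iterations. The sketch below shows that general pattern under assumed names (`process_items`, `yield_every`); it is not taken from the LightRAG storage code.

```python
import asyncio


async def process_items(items, yield_every: int = 100):
    """Process items in a long loop, yielding to the event loop periodically."""
    results = []
    for index, item in enumerate(items, start=1):
        results.append(item * 2)  # stand-in for per-item work
        if index % yield_every == 0:
            # sleep(0) suspends this coroutine so other tasks can run.
            await asyncio.sleep(0)
    return results


async def main():
    ticks = []

    async def heartbeat():
        # A second task that only makes progress when the loop is free.
        for _ in range(3):
            ticks.append("tick")
            await asyncio.sleep(0)

    results, _ = await asyncio.gather(process_items(range(1000)), heartbeat())
    return results, ticks
```

Running `asyncio.run(main())` lets the heartbeat task interleave with the processing loop instead of waiting for it to finish.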
New Contributors
- @sebastiondev made their first contribution in #2849
Full Changelog: v1.4.12...v1.4.13
v1.4.12
Hot Fixes
- Optimize Postgres Vector DB upsert performance by increasing the batch size to 200 records per operation.
- Resolved issue with opening PDF files protected by permission-only restrictions
- Fixed pipeline cancellation causing filenames to be incorrectly changed to 'unknown_source'
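The batch size of 200 mentioned above can be pictured with a generic batching helper. This is only an illustrative sketch: `iter_batches` is a hypothetical name, and the actual Postgres upsert code is not shown in these notes.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(records: Sequence[T], batch_size: int = 200) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start : start + batch_size]
```

Each yielded slice would then be sent to the database as one upsert operation.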
What's Changed
- feat(auth): add bcrypt-prefixed password hashing support by @danielaskdd in #2813
- ⚡ Bolt: Optimize text sanitization in utils.py by @danielaskdd in #2814
- Fix loginToServer content type and missing grant_type by @danielaskdd in #2821
- Postgres/pgvector backend now allows indexing large embeddings with HALFVEC by @Daggle24 in #2663
- fix: make document deletion retry-safe by @danielaskdd in #2826
- fix(pdf): handle permission-only encrypted PDFs without password by @danielaskdd in #2827
- Add driver info to AsyncMongoClient instantiation by @alexbevi in #2834
New Contributors
Full Changelog: v1.4.11...v1.4.12rc1
v1.4.11
Important Notes
- Integrated OpenSearch as a unified storage backend, providing comprehensive support for all four LightRAG storage types: KV, Vector, Graph, and DocStatus.
- Introduced an interactive setup wizard to streamline configuration, replacing manual .env file editing. Support for local deployment of embedding, reranking, and storage backends via Docker Compose is now available. For further details, please refer to the Interactive Setup Guide.
What's New
- Add OpenSearch as unified storage backend by @LantaoJin in #2739
- Add OpenSearchKVStorage support to LLM cache tools by @danielaskdd in #2790
- Add Makefile for quick deployment by @mlimarenko in #2548
- Refactor(Makefile): split monolithic wizard into modular env-base/storage/server targets by @danielaskdd in #2763
- Add OpenSearch storage configuration support to the interactive setup wizard by @LantaoJin in #2797
- perf(postgres): optimize KV storage upsert using executemany by @wkpark in #2742
What's Fixed
- Fix Qdrant large upsert payload failures with bounded batching by @danielaskdd in #2740
- build(deps): bump the github-actions group with 2 updates by @dependabot[bot] in #2737
- perf: use deque for BFS queue in get_knowledge_subgraph() by @giulio-leone in #2725
- perf: batch pre-compute query embeddings to eliminate sequential API round-trips by @errajibadr in #2729
- fix: reduce FaissVectorDBStorage meta.json file size by excluding vectors by @Br1an67 in #2733
- Enhance current MilvusVectorDBStorage with parameterized configuration by @hanlianlu in #2672
- fix: preserve failed-doc chunk metadata for reliable deletion cleanup by @danielaskdd in #2749
- fix: align zhipu adapter with official thinking and dimensions api by @ChenJiahao1 in #2775
- fix(operate,utils): correct typos in log messages and remove dead code by @lailoo in #2781
- perf(opensearch): Remove refresh="wait_for" from OpenSearch storage backends by @LantaoJin in #2786
- fix(api): sanitize workspace from CLI args and HTTP headers to prevent injection by @danielaskdd in #2792
- fix(api): normalize missing document file paths by @danielaskdd in #2793
- fix: prevent None file_path from propagating as unknown_source by @he-yufeng in #2796
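On the deque change in #2725: popping from the front of a Python list is O(n), while `collections.deque.popleft()` is O(1), which matters for breadth-first traversal of large graphs. The following is a minimal depth-limited BFS sketch under assumed names (`bfs_subgraph`, an adjacency dict); it is not the body of get_knowledge_subgraph().

```python
from collections import deque


def bfs_subgraph(adjacency: dict[str, list[str]], start: str, max_depth: int) -> list[str]:
    """Return nodes reachable from start within max_depth hops, in BFS order."""
    visited = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()  # O(1), unlike list.pop(0)
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order
```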
New Contributors
- @giulio-leone made their first contribution in #2725
- @errajibadr made their first contribution in #2729
- @Br1an67 made their first contribution in #2733
- @hanlianlu made their first contribution in #2672
- @LantaoJin made their first contribution in #2739
- @ChenJiahao1 made their first contribution in #2775
- @lailoo made their first contribution in #2781
- @he-yufeng made their first contribution in #2796
Full Changelog: v1.4.10...v1.4.11
v1.4.11rc2
What's New
- Add Makefile for quick deployment by @mlimarenko in #2548
- Refactor(Makefile): split monolithic wizard into modular env-base/storage/server targets by @danielaskdd in #2763
For detailed information about the setup wizard, please refer to InteractiveSetup.md.
What's Changed
- Fix Qdrant large upsert payload failures with bounded batching by @danielaskdd in #2740
- perf: use deque for BFS queue in get_knowledge_subgraph() by @giulio-leone in #2725
- perf: batch pre-compute query embeddings to eliminate sequential API round-trips by @errajibadr in #2729
- fix: reduce FaissVectorDBStorage meta.json file size by excluding vectors by @Br1an67 in #2733
- Enhance current MilvusVectorDBStorage with parameterized configuration by @hanlianlu in #2672
- fix: preserve failed-doc chunk metadata for reliable deletion cleanup by @danielaskdd in #2749
- build(deps-dev): bump the ui-components group in /lightrag_webui with 2 updates by @dependabot[bot] in #2750
- build(deps): bump the frontend-minor-patch group in /lightrag_webui with 3 updates by @dependabot[bot] in #2751
- build(deps): bump the github-actions group with 4 updates by @dependabot[bot] in #2759
New Contributors
- @giulio-leone made their first contribution in #2725
- @errajibadr made their first contribution in #2729
- @Br1an67 made their first contribution in #2733
- @hanlianlu made their first contribution in #2672
Full Changelog: v1.4.10...v1.4.11rc2
v1.4.10
What's New
- feat: Add POSTGRES_ENABLE_VECTOR option to conditionally disable pgvector extension by @StoreksFeed in #2683
- Add i18n support for Vietnamese by @zAcherttp in #2708
What's Changed
- Fix: Content Duplicate Detection for Document Upload Now Trackable by @danielaskdd in #2591
- Add Claude Code GitHub Workflow by @danielaskdd in #2601
- Fix/anthropic api compatibility by @skogsbaeck in #2603
- add support for C/C++ header files by @Mjemec in #2614
- Add LightRAG workspace management demo script by @vishvaRam in #2615
- Enhance README with usage example for workspaces by @vishvaRam in #2618
- feat(api): Add async streaming file upload with configurable size limit by @danielaskdd in #2622
- Update installation instructions in README by @Krytos in #2624
- Update Litewrite Link by @LarFii in #2628
- fix: Add default value for max_file_paths to prevent TypeError by @danielaskdd in #2641
- Fix: Add MAX_EXTRACT_INPUT_TOKENS to prevent gleaning context overflow (#2472) by @Odin233 in #2630
- fix: correct typos 'seperator', 'descpriton', and 'seperate' by @thecaptain789 in #2685
- Validate description fields in graph CRUD paths by @danielaskdd in #2706
- fix: pass embedding_dim to Azure OpenAI embedding API by @danielaskdd in #2721
- fix: use WindowsSelectorEventLoopPolicy on Windows to fix server port… by @Sampriti2803 in #2704
- fix: sanitize comma-separated entity types to prevent Neo4j CypherSyntaxError by @danielaskdd in #2722
- fix(postgres): make PGVectorStorage table/index creation idempotent (fixes #2702) by @danielaskdd in #2723
- fix(webui): make build runtime-agnostic by fixing Bun-only imports in… by @Pranavh-2004 in #2703
- fix(webui): wrap ReactMarkdown with div to fix className prop crash in dev mode by @danielaskdd in #2724
New Contributors
- @skogsbaeck made their first contribution in #2603
- @Mjemec made their first contribution in #2614
- @Krytos made their first contribution in #2624
- @Odin233 made their first contribution in #2630
- @thecaptain789 made their first contribution in #2685
- @StoreksFeed made their first contribution in #2683
- @Sampriti2803 made their first contribution in #2704
- @zAcherttp made their first contribution in #2708
- @Pranavh-2004 made their first contribution in #2703
Full Changelog: v1.4.9.11...v1.4.10
v1.4.9.11
Hot Fixes
- Fix OpenAI LLM binding options not loaded from environment variables by @danielaskdd in #2585
What's New
- feat(gemini): Add Vertex AI support for Gemini LLM binding by @danielaskdd in #2529
- refact(gemini): Migrate Gemini LLM to native async Google GenAI client by @danielaskdd in #2531
- Refact: Change DOCX extraction to use HTML tags for whitespace by @danielaskdd in #2550
- feat: add Korean localization by @jhchoi1182 in #2571
- Add support for mdx file type by @coldfire-x in #2566
- Add i18n support for German, Ukrainian, Russian, and Japanese languages by @mlimarenko in #2547
What's Fixed
- docs: fix the simple program rag init function return value in README.md by @Peefy in #2532
- docs: fix the simple program rag init function return value in README-zh.md by @Peefy in #2534
- feat: Implement WebUI Token Auto-Renewal (Sliding Window Expiration) by @danielaskdd in #2543
- Fixes the Gemini integration example in the README by @vishvaRam in #2537
- Add Gemini demo for LightRAG by @vishvaRam in #2538
- Add LightRAG demo with PostgreSQL and Gemini integration by @vishvaRam in #2556
- Update PostgreSQL demo script reference in README.md by @vishvaRam in #2557
- Fix: Enhance PostgreSQL Reconnection Tolerance for HA Deployments by @danielaskdd in #2562
- Add NEO4J_DATABASE variable to README by @vishvaRam in #2578
- Bump the frontend-minor-patch group in /lightrag_webui with 2 updates by @dependabot[bot] in #2577
- Add LightRAG demo script with vLLM integration by @vishvaRam in #2582
New Contributors
- @Peefy made their first contribution in #2532
- @vishvaRam made their first contribution in #2537
- @mlimarenko made their first contribution in #2547
- @jhchoi1182 made their first contribution in #2571
- @coldfire-x made their first contribution in #2566
Full Changelog: v1.4.9.10...v1.4.9.11
v1.4.9.10
What's Changed
- Hot Fix AttributeError in Neo4JStorage and MemgraphStorage when using storage specified workspace env var by @danielaskdd in #2526
Full Changelog: v1.4.9.9...v1.4.9.10
v1.4.9.9
Release Note V1.4.9.9
Important Notes
- Add Workspace Isolation for Pipeline Status and In-memory Storage: Multiple LightRAG instances with distinct workspaces can be created simultaneously, marking a significant advancement toward seamless workspace switching within a single LightRAG server.
- Add Workspace Vector Data Isolation by Model Name and Dimension for PostgreSQL and Qdrant: Previously, LightRAG used a single collection/table for different embedding models and dimensions, which caused dimension mismatch crashes or data pollution in multi-workspace deployments.
- Dimension Selection is Supported for OpenAI and Gemini Embedding Models via a new env var: EMBEDDING_SEND_DIM
- Add LLM Cache Migration and LLM Query Cache Cleanup Tools Between Different KV Storage: these enable users to switch storage backends without losing cached extraction and summary data.
- Enhanced DOCX Extraction with Table Content Support.
- Enhanced XLSX extraction with proper handling of tab and newline characters within cells.
- Fix Critical Security Vulnerability in React Server Components: #2494
- Add Automatic Text Truncation Support for Embedding Functions: The OpenAI embedding function now respects the max_token_size value in EmbeddingFunc and automatically truncates input text to prevent API errors caused by texts exceeding model token limits.
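The truncation behavior described in the last note can be pictured as follows. This is a rough sketch only: it counts whitespace-separated words as tokens, whereas a real embedding function would use the model's tokenizer, and `truncate_to_token_limit` is a hypothetical name rather than a LightRAG function.

```python
def truncate_to_token_limit(text: str, max_token_size: int) -> str:
    """Cut text down to at most max_token_size tokens.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    tokens = text.split()
    if len(tokens) <= max_token_size:
        return text  # already within the limit, leave untouched
    return " ".join(tokens[:max_token_size])


def truncate_batch(texts: list[str], max_token_size: int) -> list[str]:
    """Apply the limit to every input before it is sent for embedding."""
    return [truncate_to_token_limit(t, max_token_size) for t in texts]
```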
What's Breaking (for LightRAG Core integration only)
- Rename params of chunking function: If you incorporate the chunking function into LightRAG and pass parameters by name, corresponding code updates are required.
def chunking_by_token_size(
    tokenizer: Tokenizer,
    content: str,
    split_by_character: str | None = None,
    split_by_character_only: bool = False,
    chunk_overlap_token_size: int = 100,
    chunk_token_size: int = 1200,
) -> list[dict[str, Any]]:
- Inject an embedding_func with model_name to LightRAG using wrap_embedding_func_with_attrs:
@wrap_embedding_func_with_attrs(
    embedding_dim=1536, max_token_size=8192, model_name="text-embedding-3-small"
)
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=60),
    retry=(
        retry_if_exception_type(RateLimitError)
        | retry_if_exception_type(APIConnectionError)
        | retry_if_exception_type(APITimeoutError)
    ),
)
async def embedding_func(texts: list[str]) -> np.ndarray:
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_API_KEY,
        api_version=AZURE_EMBEDDING_API_VERSION,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
    )
    embedding = client.embeddings.create(model=AZURE_EMBEDDING_DEPLOYMENT, input=texts)
    embeddings = [item.embedding for item in embedding.data]
    return np.array(embeddings)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=llm_model_func,
    embedding_func=embedding_func,
)
To ensure seamless transition, legacy code injecting embedding_func without model_name will continue to interface with the original non-suffixed vector tables.
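To make the renamed chunking parameters concrete, here is a simplified sliding-window chunker that reuses the keyword names chunk_token_size and chunk_overlap_token_size from the signature above. It is an illustrative sketch, not LightRAG's implementation: it takes a pre-split token list instead of a Tokenizer, and the returned dictionary keys are assumptions.

```python
def chunk_by_token_size(
    tokens: list[str],
    chunk_overlap_token_size: int = 100,
    chunk_token_size: int = 1200,
) -> list[dict]:
    """Split tokens into overlapping windows of at most chunk_token_size."""
    if chunk_overlap_token_size >= chunk_token_size:
        raise ValueError("overlap must be smaller than chunk size")
    # Each new window starts this many tokens after the previous one.
    step = chunk_token_size - chunk_overlap_token_size
    chunks = []
    for index, start in enumerate(range(0, len(tokens), step)):
        window = tokens[start : start + chunk_token_size]
        chunks.append(
            {
                "tokens": len(window),
                "content": " ".join(window),
                "chunk_order_index": index,
            }
        )
    return chunks
```

Callers that pass these arguments by keyword, for example `chunk_by_token_size(tokens, chunk_overlap_token_size=2, chunk_token_size=4)`, are the ones affected when parameter names change.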
What's New
- Feat: Add Chain of Thought Support for Gemini LLM by @danielaskdd in #2326
- Feat: Add Optional Embedding Dimension Control with OpenAI API by @danielaskdd in #2328
- Feat: Add Gemini Embedding Support to LightRAG by @danielaskdd in #2329
- Feat: Add LLM Cache Migration Tool by @danielaskdd in #2330
- Feat: Add LLM Query Cache Cleanup Tool by @danielaskdd in #2335
- Support async chunking func to improve processing performance when a heavy chunking_func is passed in by the user by @tongda in #2336
- Add ollama cloud support by @LacombeLouis in #2348
- Feat: Add Workspace Isolation for Pipeline Status and In-memory Storage by @danielaskdd in #2369
- feat: add vchordrq vector index support for PostgreSQL by @wmsnp in #2378
- Feat: Enhanced DOCX Extraction with Table Content Support by @danielaskdd in #2383
- Feat: Enhance XLSX Extraction by Adding Separators and Escape Special Characters by @danielaskdd in #2386
- Optimize for OpenAI Prompt Caching: Restructure entity extraction pro… by @Ghazi-raad in #2426
- feat: Vector Storage Model Isolation with Automatic Migration by @BukeLy in #2391
- feat: Implement Vector Database Model Isolation and Auto-Migration by @danielaskdd in #2513
- feat: Add Automatic Text Truncation Support for Embedding Functions by @danielaskdd in #2523
What's Changed
- Fix: Remove Duplicate Entity/Relation Tracking Deletion in adelete_by_doc_id by @danielaskdd in #2322
- Fix spelling errors in the "使用PostgreSQL存储" section of README-zh.md by @huangbhan in #2327
- Add dimensions parameter support to openai_embed() by @yrangana in #2323
- Fix Gemini driver retry mechanism by @danielaskdd in #2331
- HotFix: Restore OpenAI Streaming Response & Refactor keyword_extraction Parameter by @danielaskdd in #2334
- Refactor: Migrate PDF processing dependency from pypdf2 to the actively maintained pypdf by @danielaskdd in #2338
- Fix: Prevent UnicodeEncodeError in JSON storage operations by @danielaskdd in #2344
- Remove deprecated response_type parameter from query settings UI by @danielaskdd in #2345
- Refactor: Optimize write_json for Memory Efficiency and Performance by @danielaskdd in #2346
- Refact: Remove blocking dependency installation from document upload handlers by @danielaskdd in #2350
- Refact: Implement Lazy Configuration Initialization for API Server by @danielaskdd in #2351
- Refact: Enhance DOCLING integration with lazy loading and macOS safeguards by @danielaskdd in #2352
- Fix: Robust error handling for async database operations in graph storage by @danielaskdd in #2356
- Update the value corresponding to the extracted entity relationship keywords by @sleeepyin in #2358
- Add macOS fork safety check for Gunicorn multi-worker mode by @danielaskdd in #2360
- Refact: Add Embedding Token Limit Configuration and Improve Error Handling by @danielaskdd in #2359
- Refact: Add Embedding Dimension Validation in EmbeddingFunc by @danielaskdd in #2368
- test: Convert test_workspace_isolation.py to pytest style by @BukeLy in #2371
- refactor(chunking): rename params and improve docstring for chunking by @EightyOliveira in #2379
- Fix: Add chunk token limit validation with detailed error reporting by @danielaskdd in #2389
- Fix: Remove redundant exception logging to eliminate pytest shutdown errors by @danielaskdd in #2390
- issue-2394: use deployment variable instead of model for embeddings API call by @Amrit75 in #2395
- Refactor: Centralize keyword_extraction parameter handling in OpenAI LLM implementations by @danielaskdd in #2401
- Refact: Consolidate Azure OpenAI and OpenAI implementations by @danielaskdd in #2403
- Update README.md by @chaohuang-ai in #2408
- Update README.md by @chaohuang-ai in #2409
- Fix: Add Comprehensive Retry Mechanism for Neo4j Storage Operations by @danielaskdd in #2417
- Refact: Allow API Server to Start Without Built WebUI Assets by @danielaskdd in #2418
- fix: exception handling order error by @EightyOliveira in #2421
- Doc: Update README examples to prevent double-wrapping of embedding functions by @danielaskdd in #2432
- Fix: Add configurable model support for Jina embedding by @danielaskdd in #2433
- Update README.md by @chaohuang-ai in #2439
- Fix KaTeX chemistry formula rendering (\ce command) not working by @danielaskdd in #2443
- fix(postgres): Add CASCADE to AGE extension creation for automatic dependency resolution by @danielaskdd in #2446
- Add Python 3.13 and 3.14 to the testing by @cclauss in #2436
- Keep GitHub Actions up to date with GitHub's Dependabot by @cclauss in #2435
- chore: optimize Dependabot configuration with dependency grouping and PR limits by @danielaskdd in #2447
...
v1.4.9.8
What's New
- Feat: Add PDF Decryption Support for Password-Protected Files by @danielaskdd in #2296
- Feat: Add optional Langfuse observability integration by @anouar-bm in #2298
- Feat: Add RAGAS evaluation framework for RAG quality assessment by @anouar-bm in #2297
- Feat: Add native gemini LLM support by @Humphryshikunzi in #2305
What's Changed
- Refact: Auto-refresh of Popular Labels When Pipeline Completes by @danielaskdd in #2291
- Fix empty context validation bug and improve naming consistency in query context building by @danielaskdd in #2295
- Refact: Enhanced RAG Evaluation CLI with Two-Stage Pipeline and Improved UX by @danielaskdd in #2311
- Refact: Separate Configuration of RAGAS for LLM and Embeddings by @danielaskdd in #2314
- Refactor: Remove Deprecated Chunk-Based Query Methods and Improve Graph Unit Test by @danielaskdd in #2319
- Fix node retrieval fail with special characters in IDs for Postgres AGE GraphStorage by @danielaskdd in #2320
- Fix performance bottleneck in document deletion by @danielaskdd in #2321
New Contributors
- @anouar-bm made their first contribution in #2298
- @Humphryshikunzi made their first contribution in #2305
Full Changelog: v1.4.9.7...v1.4.9.8