medium severity

LlamaIndex VectorStoreIndex

`refresh_ref_docs` always inserts documents as new, so repeated calls create duplicates. The docstore is empty (`_kvstore.get_all()` returns `{}`), so `get_document_hash()` returns null and no duplicate check takes place. This affects third-party vector stores such as PGVectorStore and Chroma.

Root cause

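The docstore is disabled by default when a third-party vector store (e.g., PGVectorStore) is used. This simplifies storage, because everything lives in the vector DB. `refresh_ref_docs`, however, relies on the docstore to track which documents have already been inserted and to detect changes and duplicates via `get_document_hash()`. Without a populated docstore, it treats every document as new and inserts duplicates. Vector DBs lack APIs for mapping nodes to their parent documents or for diffing them, so the check cannot fall back to the vector store.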
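The hash check described above can be illustrated with a small standard-library model. This is a simplified sketch, not LlamaIndex's implementation: `ToyDocstore`, `ToyIndex`, `content_hash`, and the `enabled` flag are hypothetical names introduced here, and only the method name `get_document_hash` mirrors the one mentioned above.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Doc = Tuple[str, str]  # (doc_id, text); hypothetical stand-in for a document


def content_hash(text: str) -> str:
    """Hash of the document content, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ToyDocstore:
    """Hypothetical docstore model: maps doc_id -> content hash."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self._hashes: Dict[str, str] = {}

    def get_document_hash(self, doc_id: str) -> Optional[str]:
        return self._hashes.get(doc_id)

    def set_document_hash(self, doc_id: str, doc_hash: str) -> None:
        # Assumption: a disabled docstore never records anything.
        if self.enabled:
            self._hashes[doc_id] = doc_hash


class ToyIndex:
    """Hypothetical index model; vector_rows stands in for vector DB rows."""

    def __init__(self, docstore: ToyDocstore) -> None:
        self.docstore = docstore
        self.vector_rows: List[Doc] = []

    def refresh_ref_docs(self, docs: List[Doc]) -> List[bool]:
        refreshed = []
        for doc_id, text in docs:
            new_hash = content_hash(text)
            old_hash = self.docstore.get_document_hash(doc_id)
            if old_hash is None:
                # No stored hash: the document looks new, so insert it.
                self.vector_rows.append((doc_id, text))
                self.docstore.set_document_hash(doc_id, new_hash)
                refreshed.append(True)
            elif old_hash != new_hash:
                # Stored hash differs: replace the old rows.
                self.vector_rows = [r for r in self.vector_rows if r[0] != doc_id]
                self.vector_rows.append((doc_id, text))
                self.docstore.set_document_hash(doc_id, new_hash)
                refreshed.append(True)
            else:
                # Same hash: nothing to do.
                refreshed.append(False)
        return refreshed


docs = [("a", "alpha"), ("b", "beta")]

tracked = ToyIndex(ToyDocstore(enabled=True))
tracked.refresh_ref_docs(docs)
tracked.refresh_ref_docs(docs)  # second call is a no-op

untracked = ToyIndex(ToyDocstore(enabled=False))
untracked.refresh_ref_docs(docs)
untracked.refresh_ref_docs(docs)  # second call inserts everything again

print(len(tracked.vector_rows))    # 2
print(len(untracked.vector_rows))  # 4
```

With tracking enabled, the second refresh finds matching hashes and skips both documents. With the docstore empty, every lookup returns `None`, so each call takes the insert branch and the row count grows with every refresh.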

LlamaIndex, VectorStoreIndex, refresh_ref_docs, PGVectorStore, docstore, duplicates, ingestion-pipeline

Citations