Commit 13b909e

feat: added example for document saver (#164)
1 parent 2de3cba commit 13b909e

2 files changed

Lines changed: 140 additions & 5 deletions

docs/document_loader.ipynb

Lines changed: 131 additions & 5 deletions
@@ -150,7 +150,7 @@
  "id": "f8f2830ee9ca1e01"
  },
  "source": [
- "## Basic Usage"
+ "## Create PostgresLoader"
  ]
  },
  {
@@ -187,7 +187,7 @@
  "id": "QuQigs4UoFQ2"
  },
  "source": [
- "### Cloud SQL Engine\n",
+ "### Cloud SQL Engine Connection Pool\n",
  "\n",
  "One of the requirements and arguments to establish PostgreSQL as a document loader is a `PostgresEngine` object. The `PostgresEngine` configures a connection pool to your Cloud SQL for PostgreSQL database, enabling successful connections from your application and following industry best practices.\n",
  "\n",
@@ -229,6 +229,34 @@
  ")"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Create a table (if not already exists)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain_google_cloud_sql_pg import Column\n",
+ "\n",
+ "await engine.ainit_document_table(\n",
+ "    table_name=TABLE_NAME,\n",
+ "    content_column=\"product_name\",\n",
+ "    metadata_columns=[\n",
+ "        Column(\"id\", \"SERIAL\", nullable=False),\n",
+ "        Column(\"content\", \"VARCHAR\", nullable=False),\n",
+ "        Column(\"description\", \"VARCHAR\", nullable=False),\n",
+ "    ],\n",
+ "    metadata_json_column=\"metadata\",\n",
+ "    store_metadata=True,\n",
+ ")"
+ ]
+ },
  {
  "cell_type": "markdown",
  "metadata": {
@@ -260,7 +288,9 @@
  "source": [
  "### Load Documents via default table\n",
  "The loader returns a list of Documents from the table using the first column as page_content and all other columns as metadata. The default table will have the first column as\n",
- "page_content and the second column as metadata (JSON). Each row becomes a document. Please note that if you want your documents to have ids you will need to add them in."
+ "page_content and the second column as metadata (JSON). Each row becomes a document. \n",
+ "\n",
+ "Please note that if you want your documents to have ids you will need to add them in.\n"
  ]
  },
  {
@@ -325,13 +355,109 @@
  "source": [
  "loader = await PostgresLoader.create(\n",
  "    engine,\n",
- "    table_name=\"products\",\n",
+ "    table_name=TABLE_NAME,\n",
  "    content_columns=[\"product_name\", \"description\"],\n",
  "    format=\"YAML\",\n",
  ")\n",
  "docs = await loader.aload()\n",
  "print(docs)"
  ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Create PostgresSaver\n",
+ "The `PostgresSaver` allows for saving of pre-processed documents to the table using the first column as page_content and all other columns as metadata. This table can easily be loaded via a Document Loader or updated to be a VectorStore. The default table will have the first column as page_content and the second column as metadata (JSON)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain_google_cloud_sql_pg import PostgresDocumentSaver\n",
+ "\n",
+ "# Creating a basic PostgresDocumentSaver object\n",
+ "saver = await PostgresDocumentSaver.create(\n",
+ "    engine,\n",
+ "    table_name=TABLE_NAME,\n",
+ "    content_column=\"product_name\",\n",
+ "    metadata_columns=[\"description\", \"content\"],\n",
+ "    metadata_json_column=\"metadata\",\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Save Documents to default table\n",
+ "Each document becomes a row in the table."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain_core.documents import Document\n",
+ "\n",
+ "test_docs = [\n",
+ "    Document(\n",
+ "        page_content=\"Red Apple\",\n",
+ "        metadata={\"description\": \"red\", \"content\": \"1\", \"category\": \"fruit\"},\n",
+ "    ),\n",
+ "    Document(\n",
+ "        page_content=\"Banana Cavendish\",\n",
+ "        metadata={\"description\": \"yellow\", \"content\": \"2\", \"category\": \"fruit\"},\n",
+ "    ),\n",
+ "    Document(\n",
+ "        page_content=\"Orange Navel\",\n",
+ "        metadata={\"description\": \"orange\", \"content\": \"3\", \"category\": \"fruit\"},\n",
+ "    ),\n",
+ "]\n",
+ "await saver.aadd_documents(test_docs)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Load the documents with PostgresLoader\n",
+ "PostgresLoader can be used with `TABLE_NAME` to query and load the whole table."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "loader = await PostgresLoader.create(engine, table_name=TABLE_NAME)\n",
+ "docs = await loader.aload()\n",
+ "\n",
+ "print(docs)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Delete Documents from default table\n",
+ "The saver deletes a list of Documents, one document at a time internally."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "await saver.adelete(test_docs)"
+ ]
  }
  ],
  "metadata": {
@@ -359,4 +485,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
- }
+ }
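
Note (not part of the commit): the saver cells in this notebook configure one content column, a list of named metadata columns, and a JSON metadata column. The sketch below illustrates how a single document could map onto such a row. It is plain Python written for illustration only: `document_to_row` is a hypothetical helper, not a function of `langchain_google_cloud_sql_pg`, and the library's real implementation may differ.

```python
import json
from typing import Any, Dict, List, Optional


def document_to_row(
    page_content: str,
    metadata: Dict[str, Any],
    content_column: str,
    metadata_columns: List[str],
    metadata_json_column: Optional[str],
) -> Dict[str, Any]:
    """Hypothetical helper: split one document into column values.

    Keys listed in metadata_columns get their own column; everything
    else is serialized into the JSON metadata column (if one is given).
    """
    row: Dict[str, Any] = {content_column: page_content}
    extra = dict(metadata)  # copy so the caller's dict is not mutated
    for column in metadata_columns:
        row[column] = extra.pop(column, None)
    if metadata_json_column is not None:
        row[metadata_json_column] = json.dumps(extra)
    return row


# Same shape as the first entry of test_docs in the notebook.
row = document_to_row(
    page_content="Red Apple",
    metadata={"description": "red", "content": "1", "category": "fruit"},
    content_column="product_name",
    metadata_columns=["description", "content"],
    metadata_json_column="metadata",
)
print(row)
```

Under these assumptions, `category` is the only key without a dedicated column, so it is the only one that ends up in the JSON column.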

src/langchain_google_cloud_sql_pg/engine.py

Lines changed: 9 additions & 0 deletions
@@ -371,6 +371,9 @@ async def ainit_vectorstore_table(
  overwrite_existing (bool): Whether to drop existing table. Default: False.
  store_metadata (bool): Whether to store metadata in the table.
  Default: True.
+
+ Raises:
+ :class:`DuplicateTableError <asyncpg.exceptions.DuplicateTableError>`: if table already exists and overwrite flag is not set.
  """
  await self._aexecute("CREATE EXTENSION IF NOT EXISTS vector")

@@ -482,10 +485,16 @@ async def ainit_document_table(
  Args:
  table_name (str): The PgSQL database table name.
  content_column (str): Name of the column to store document content.
+ Default: "page_content".
  metadata_columns (List[sqlalchemy.Column]): A list of SQLAlchemy Columns
  to create for custom metadata. Optional.
+ metadata_json_column (str): The column to store extra metadata in JSON format.
+ Default: "langchain_metadata". Optional.
  store_metadata (bool): Whether to store extra metadata in a metadata column
  if not described in 'metadata' field list (Default: True).
+
+ Raises:
+ :class:`DuplicateTableError <asyncpg.exceptions.DuplicateTableError>`: if table already exists.
  """

  query = f"""CREATE TABLE "{table_name}"(
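
Note (not part of the commit): the new `Raises:` entries say that `ainit_document_table` fails with `DuplicateTableError` when the table already exists, so a notebook cell titled "Create a table (if not already exists)" may need to tolerate that error on re-runs. The sketch below shows one way a caller could do so. To stay runnable without a database, it uses stand-ins: the local `DuplicateTableError` class substitutes for `asyncpg.exceptions.DuplicateTableError`, and `FakeEngine` and `ensure_document_table` are hypothetical names, not part of the library.

```python
import asyncio


class DuplicateTableError(Exception):
    """Stand-in for asyncpg.exceptions.DuplicateTableError (assumption)."""


class FakeEngine:
    """Hypothetical in-memory stand-in for PostgresEngine."""

    def __init__(self):
        self.tables = set()

    async def ainit_document_table(self, table_name, **kwargs):
        # Mirror the documented behavior: fail if the table already exists.
        if table_name in self.tables:
            raise DuplicateTableError(f'relation "{table_name}" already exists')
        self.tables.add(table_name)


async def ensure_document_table(engine, table_name, **kwargs):
    """Create the table; return False instead of raising if it exists."""
    try:
        await engine.ainit_document_table(table_name=table_name, **kwargs)
        return True
    except DuplicateTableError:
        return False


async def main():
    engine = FakeEngine()
    first = await ensure_document_table(
        engine, "products", content_column="product_name"
    )
    second = await ensure_document_table(
        engine, "products", content_column="product_name"
    )
    return first, second


created_first, created_second = asyncio.run(main())
print(created_first, created_second)
```

With a real engine, the `except` clause would catch the asyncpg exception named in the docstring instead of the local stand-in.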
