# How To

This section provides clear, actionable guidance on using the Enterprise Knowledge Garden (EKG) tool effectively.

Refer to the following sections to learn more:

  • Ingest Data into an EKG
  • Manage Data Chunks
  • Perform Retrieval Tests
  • Manage Data Library

# Ingest Data into an EKG

You can import unstructured data into an EKG using the Manage Data section inside your knowledge garden. There are three primary methods for ingesting data.

## From Local Files

This method allows you to upload files directly from your local system.

  1. Navigate to the Manage Data section and select the Import tab
  2. You can either drag and drop your files directly onto the upload area or click to browse and select files from your local system
  3. Once selected, the files will automatically be uploaded, chunked, and embedded according to the pre-configured chunking and embedding strategies for the EKG
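Conceptually, a Block chunking strategy splits a document into fixed-size segments, often with a small overlap so that context is not lost at segment boundaries. The sketch below is a generic illustration of that idea, not the EKG's actual implementation; the `block_size` and `overlap` parameters are assumptions chosen for demonstration.

```python
def block_chunks(text: str, block_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size blocks with a small overlap (conceptual sketch)."""
    chunks = []
    step = block_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + block_size])
        # Stop once the current block reaches the end of the text
        if start + block_size >= len(text):
            break
    return chunks

doc = "A" * 1200
print([len(c) for c in block_chunks(doc)])  # → [500, 500, 300]
```

Each block after the first repeats the last 50 characters of its predecessor, which is one common way chunkers keep sentences that straddle a boundary retrievable from either chunk.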

## From the Data Library

This method allows you to import documents that are already stored within the Data Library.

  1. Navigate to the Manage Data section and select the Import tab
  2. Below the local file upload section, click the Data Library icon
  3. Select the required document set, and then choose the specific files you wish to import from that set
  4. Click Import. The files will then be automatically processed, chunked, and embedded

## From Google Drive

This method allows you to import documents directly from a linked Google Drive account into the system.

  1. Navigate to the Manage Data section and select the Import tab
  2. Below the local file upload section, click the Google Drive icon. Follow the on-screen instructions to connect your account and grant permissions
  3. Once connected, browse the folders and files in your Google Drive and select the specific files you wish to import
  4. Click Import. The selected data will then be automatically processed, chunked, and embedded

## Using the Flow Designer

For more advanced or customized data ingestion pipelines, you can use the Flow Designer. This allows you to build a custom flow that pulls data from a wide variety of sources, such as Amazon S3 and Google Drive, and ingests it directly into the EKG.

For detailed guidelines on building an ingestion pipeline, please refer to the Automated Workflow documentation.

# Manage Data Chunks

After your data has been ingested and processed, you can review and refine the resulting data chunks using the Chunk Editor tab within your Enterprise Knowledge Garden (EKG). This interface provides powerful tools to ensure the quality and accuracy of the data before it is used by AI agents or other applications.

## Chunk Editor Interface

The main view of the Chunk Editor displays all the chunks generated from your source documents.

  • Chunk Overview: At the top, you can see the total number of chunks, the currently applied Chunk strategy (e.g., Block), and a Refresh option for loading newly generated chunks
  • Document Grouping: Chunks are grouped by their source document (e.g., Invoice test.pdf), with a total count of chunks for each file
  • Chunk Cards: Each individual chunk is presented on a card, showing a preview of its text content, its "Embedded" status, and icons for management actions
  • Chunk Search: Chunk search relies on keyword matching to find information within predetermined text segments
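The keyword matching that Chunk Search relies on can be illustrated with a minimal sketch: a chunk matches when it contains every keyword in the query. This is a generic illustration of the technique, not the EKG's internal search code.

```python
def keyword_search(query: str, chunks: list[str]) -> list[str]:
    """Return chunks containing every query keyword (case-insensitive)."""
    keywords = query.lower().split()
    return [c for c in chunks if all(k in c.lower() for k in keywords)]

chunks = [
    "Invoice total due in 30 days",
    "Payment terms and conditions",
    "Shipping address details",
]
print(keyword_search("payment terms", chunks))  # → ['Payment terms and conditions']
```

Note the contrast with the Retrieval Testing tab, where search is semantic: keyword matching only finds chunks that contain the literal words, not chunks that express the same meaning differently.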

## Editing and Managing Individual Chunks

You have granular control over each chunk:

  • Edit a Chunk: Click the edit icon on a chunk card to modify its text content directly
  • Add Metadata and Tags: Use the more options (three-dot) menu to add specific metadata or tags to a chunk, enhancing its searchability and providing additional context

## Verifying Chunk Source

To ensure full traceability and understand the context of a chunk, you can view its exact origin within the source document.

  1. Click on any chunk card in the Chunk Editor
    A detailed split-screen view appears
  2. This view displays the full source document on the left and the content of the selected chunk on the right
  3. A highlighted box is overlaid on the source document, visually pinpointing the precise section from which the chunk was extracted. This enables you to easily verify the context and accuracy of every chunk

# Perform Retrieval Tests

You can evaluate how effectively your Enterprise Knowledge Garden (EKG) retrieves relevant information by using the Retrieval Testing tab. This feature allows you to simulate user queries and analyze the returned chunks for relevance and accuracy.

## How to Perform a Retrieval Test

  1. From the main interface of your EKG, navigate to the Retrieval Testing tab
  2. In the search bar provided, enter the query you want to test and press Enter. The search function utilizes semantic understanding to find relevant information based on the meaning and context of a query

## Analyzing the Results

The system will display a list of Retrieved Chunks that it considers relevant to your query. Each chunk is presented on a card, giving you the following information:

  • Content: A preview of the text within the chunk
  • Similarity Score: A score (e.g., Similarity: 88) that indicates how closely the chunk's content matches your query. A higher score means a stronger match
  • Management Options: You have the same options to edit or add metadata as in the Chunk Editor. Refer to Editing and Managing Individual Chunks for more details
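Similarity scores of this kind are typically derived from the cosine similarity between the query's embedding vector and each chunk's embedding vector, scaled to a 0–100 range. The sketch below is a generic illustration with made-up three-dimensional vectors, not the product's actual scoring code (real embeddings have hundreds or thousands of dimensions).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = [0.2, 0.8, 0.1]   # illustrative query embedding
chunk_vec = [0.25, 0.7, 0.05]  # illustrative chunk embedding
score = round(cosine_similarity(query_vec, chunk_vec) * 100)
print(score)  # → 99
```

A higher score means the chunk's embedding points in nearly the same direction as the query's, i.e., a stronger semantic match.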

## Refining Your Test

To refine the displayed chunks and test different retrieval scenarios, you can use the available options in the Retrieve Settings. For a detailed explanation of these concepts, please refer to the Core Concepts documentation.

Click the Options button (or an equivalent control) to open the refinement settings. From here, you can:

  • Adjust the Similarity Score: Control how closely a query must match the stored information
  • Set the Top K: Define the initial number of content chunks retrieved for your query
  • Enable the Reranker: When enabled, chunks are reordered for better relevance. You can also select a specific Rerank Model
  • Define the Top N: Specify the number of chunks the reranker will display
  • Filter by Metadata and Tags: Select specific metadata and tags to narrow down the results to only those chunks that are associated with them

Click Apply to view the newly filtered chunks based on your selections.
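Taken together, these settings form a simple pipeline: filter out chunks below the similarity threshold, keep the Top K by similarity, optionally rerank, then keep the Top N. The sketch below illustrates that pipeline under assumed data; the function name, parameters, and the stand-in `rerank` callable are illustrative, not the product's API.

```python
def retrieve(scored_chunks, threshold=0.5, top_k=5, rerank=None, top_n=3):
    """Conceptual pipeline: similarity filter -> Top K -> optional rerank -> Top N."""
    # Keep only chunks whose similarity meets the threshold
    candidates = [(c, s) for c, s in scored_chunks if s >= threshold]
    # Take the Top K by similarity score
    candidates = sorted(candidates, key=lambda cs: cs[1], reverse=True)[:top_k]
    # Optionally reorder with a reranker and keep the Top N
    if rerank is not None:
        candidates = sorted(candidates, key=rerank, reverse=True)[:top_n]
    return [c for c, _ in candidates]

scored = [("chunk A", 0.91), ("chunk B", 0.42), ("chunk C", 0.77), ("chunk D", 0.63)]
print(retrieve(scored, threshold=0.5, top_k=3))  # → ['chunk A', 'chunk C', 'chunk D']
```

Raising the threshold to 0.8 in this example would leave only "chunk A", which mirrors how tightening the Similarity Score setting narrows the displayed results.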

# Manage Data Library

The Purple Fabric Data Library is designed to serve as a centralized, secure, and organized repository for all essential files. It empowers organizations to streamline information access, enhance collaboration, and strengthen compliance across all teams.

## Key Features

### Centralized Access

The Purple Fabric Data Library eliminates information silos by consolidating data from various departments and projects into a single, unified platform. This ensures that users can easily locate and retrieve the files they need, when they need them.

### Document Upload & Storage

Our platform offers a seamless and user-friendly experience for uploading and storing files. Key capabilities include:

  • Multiple File Type Support: Upload and manage a wide range of file formats, including PDF, PNG, JPEG, TIFF, DOC and DOCX
  • Convenient Uploading: Utilize the intuitive drag-and-drop interface or perform bulk uploads to efficiently add multiple files at once

### Metadata & Tagging

Enhance the organization and discoverability of your data with our advanced metadata and tagging features. Users can assign critical metadata, such as author, creation date, and project name, as well as custom tags, to classify and categorize files for easier searchability.
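Tag-based filtering of this kind boils down to a subset check on each file's tags: a file matches when it carries every required tag. A minimal sketch with hypothetical file records (the field names are assumptions, not the product's data model):

```python
def filter_by_tags(files: list[dict], required_tags: set[str]) -> list[dict]:
    """Return files whose tag set includes every required tag."""
    return [f for f in files if required_tags <= f["tags"]]

files = [
    {"name": "q1_report.pdf", "tags": {"finance", "2024"}},
    {"name": "logo.png", "tags": {"design"}},
]
print([f["name"] for f in filter_by_tags(files, {"finance"})])  # → ['q1_report.pdf']
```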

### Collaboration & Integration

Facilitate seamless teamwork and enhance your existing workflows with our collaboration and integration features.

  • Shared Access: Enable shared access to files and real-time updates to foster effective team collaboration
  • Third-Party Integration: The Purple Fabric Data Library is designed to integrate smoothly with other essential business tools, including Amazon S3 and other API-based platforms

## Creating a Document Set

This guide will walk you through the process of setting up your first document set.

### Step 1: Review Document Requirements

Before uploading files, ensure they meet the following requirements:

  • File Integrity: Files must not be password-protected, zipped, or protected against page extraction
  • Image Quality: Images and files should have a minimum resolution of 200 DPI (300 DPI is recommended for optimal quality)
  • Content: Files should not contain watermarks

### Step 2: Create a New Document Set

  1. Navigate to the Data Library module
  2. Click the Create document set button
  3. In the Create document set window that appears, provide a unique Name and a brief Description for your document set
  4. Click Create to finalize the new document set

## Upload Files to the Document Set

You can add files to your set using one of the following methods:

  1. Manual Upload: Upload files directly from your local system
  2. Amazon S3 Upload: Import files from a configured AWS S3 connector
  3. Web Crawler Upload: Fetch and store web pages and files from specified websites

### Manual Upload Procedure

  1. From the main Data Library page, select the desired Document Set
  2. On the Document Set page, click Import
  3. Select the Files option to upload files from your local system

### Amazon S3 Upload Procedure

  1. From the main Data Library page, select the desired Document Set
  2. On the Document Set page, click Import and select the Amazon S3 option
  3. In the Amazon S3 connector window, configure the following:
    • Select the desired connection from the Choose your connection dropdown
    • Enter the specific Bucket Name
    • Provide the Folder or File path for the files
    • Use the Test connection feature to verify the configuration
  4. Add any necessary metadata to categorize the imported files
  5. Click Start import to begin the process

### Web Crawler Upload Procedure

  1. From the main Data Library page, select the desired Document Set
  2. On the Document Set page, click Import and select the Web Crawler option
  3. In the Name field, provide a unique and descriptive name for your crawler configuration
  4. Provide Source Content: You can specify the content to be crawled in two ways: by providing direct URLs or by using a search query
    • To use direct URLs:

      1. Select the URL tab
      2. Input the full web address of the site you wish to crawl
      3. To add more websites, click the + URL button
    • To use a search query:

      1. Select the Search tab
      2. In the Search Query box, enter your query
      3. Click Run to use the search results as the source for the crawl
  5. Configure Crawl Options (Optional): Click the Crawl options dropdown to refine the crawler's behavior with the following settings:
    • Include paths: Restrict the crawler to only the specified URL paths (e.g., about/)
    • Has content: Define the URL paths that contain the primary content to be extracted (e.g., article/)
    • Links: Enter the URL paths where the crawler should look for more links to follow (e.g., more/)
    • Scrap level: Set the crawl depth. A level of 0 crawls only the starting URL, while 1 includes pages linked from the start page, and so on
    • Maximum Urls: Set a firm limit on the total number of unique pages the crawler will process during its run
  6. Add Metadata (Optional): Click the + Metadata button to add custom tags or metadata fields to help in organizing and searching for the crawled content later
  7. Initiate the Crawl: Once all required information is entered, click the Start Crawl button to begin the process. To discard the configuration, click Cancel
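The Scrap level and Maximum Urls settings above correspond to the depth limit and page budget of a breadth-first crawl: level 0 visits only the starting URL, level 1 adds pages linked from it, and the budget caps the total pages processed. A minimal sketch over a hypothetical in-memory link graph (the URLs are illustrative, and real crawlers fetch links over HTTP):

```python
from collections import deque

def crawl(start_url: str, links: dict, max_level: int = 1, max_urls: int = 10) -> list[str]:
    """Breadth-first crawl sketch with a depth limit and a total-page budget."""
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, level) pairs
    visited = []
    while queue and len(visited) < max_urls:
        url, level = queue.popleft()
        visited.append(url)
        # Only follow outgoing links while under the depth limit
        if level < max_level:
            for nxt in links.get(url, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, level + 1))
    return visited

site = {"home": ["about/", "article/1"], "about/": ["team/"]}
print(crawl("home", site, max_level=1))  # → ['home', 'about/', 'article/1']
```

With `max_level=0` only `"home"` is visited, matching the documented behavior that a level of 0 crawls only the starting URL.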