PUBLIC - Liferay Portal Community Edition / LPS-163688

Semantic Search Platform Capabilities Using txtai and Hugging Face's Inference API

Details

    Description

      Highlights

      This initial phase introduces the foundations for extending Liferay Enterprise Search Experiences with semantic search capabilities, giving administrators new tools powered by Machine Learning (ML) to improve the search experience of their visitors.*

      To build a semantic search experience that leverages sentence embeddings and vector search, Liferay DXP ships with a new out-of-the-box element called "Rescore by Text Embedding" that can be used in search blueprints. This element improves the results by re-scoring, and thereby reordering, the top matching documents using a cosine similarity or dot product function over a vector field populated at indexing time, together with other configurable factors. With this element and the visual query builder, users can configure the different aspects of the search query and test how it performs, to build the right solution for their data and use cases.
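      The re-scoring idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the element's actual implementation: the `rescore` function, the `window_size`, `query_weight` and `rescore_weight` parameters, and the way the keyword score and the similarity are summed are all assumptions made for the example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def rescore(hits, query_vector, window_size=2, similarity=cosine_similarity,
            query_weight=1.0, rescore_weight=1.0):
    """Re-score only the top `window_size` hits; the rest keep their order.

    Hypothetical combination: weighted sum of the original keyword score
    and the vector similarity between the query and the document.
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = []
    for hit in window:
        new_score = (query_weight * hit["score"]
                     + rescore_weight * similarity(query_vector, hit["vector"]))
        rescored.append({**hit, "score": new_score})
    rescored.sort(key=lambda h: h["score"], reverse=True)
    return rescored + rest


# Hits as returned by keyword matching, best first (made-up data).
hits = [
    {"id": "a", "score": 1.2, "vector": [1.0, 0.0]},
    {"id": "b", "score": 1.0, "vector": [0.0, 1.0]},
    {"id": "c", "score": 0.5, "vector": [0.0, 1.0]},
]
reordered = rescore(hits, query_vector=[0.0, 1.0])
print([h["id"] for h in reordered])  # ['b', 'a', 'c']
```

      Document "b" overtakes "a" because its vector points in the same direction as the query vector, while "c" lies outside the re-score window and keeps its position. Passing `similarity=dot` switches the sketch to the dot product function.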

      How it Works

      Watch our Liferay /dev/24 session at https://youtu.be/sr7_aNWNzPY?t=4041

      When this feature is enabled and configured, the platform produces a numeric (vector) representation of the input text for supported content types. The output of this transformation process is called an embedding (a.k.a. vector embedding or sentence embedding), and it is stored in the Elasticsearch index document of each supported and configured type.

      Embeddings are meant to capture the meaning and the context of the input they are generated from (say, the title and the first few sentences of the content of a Web Content Article), and they can provide better results for user searches than traditional keyword matching. To achieve this, the keywords entered by users go through the same transformation process at search time, which makes it possible for Liferay DXP to perform a similarity search and return semantically more relevant results.
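      The index-time and search-time flow can be illustrated with a toy example. Everything here is an assumption made for the sketch: the hand-made two-dimensional word vectors stand in for a real ML model, and the `text_embedding` field name, `index_document` and `search` are hypothetical names, not Liferay DXP APIs.

```python
import math
import re

# Hand-made 2-d "word vectors" standing in for a real sentence transformer.
TOY_VECTORS = {
    "password": [1.0, 0.0], "login": [0.9, 0.1], "credentials": [0.95, 0.05],
    "holiday": [0.0, 1.0], "vacation": [0.1, 0.9], "calendar": [0.2, 0.8],
}


def embed(text):
    """Average the toy vectors of known words; unknown words are ignored."""
    words = re.findall(r"[a-z]+", text.lower())
    vectors = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def index_document(title, content, max_sentences=2):
    """Index time: embed the title plus the first few sentences."""
    first_sentences = " ".join(content.split(". ")[:max_sentences])
    return {"title": title,
            "text_embedding": embed(f"{title} {first_sentences}")}


def search(query, index):
    """Search time: embed the query the same way, return the closest doc."""
    query_vector = embed(query)
    return max(index, key=lambda d: cosine(query_vector, d["text_embedding"]))


index = [
    index_document("Reset your password",
                   "Use the login page. Enter your credentials."),
    index_document("Holiday calendar",
                   "Offices close on public holidays. Plan ahead."),
]
print(search("vacation", index)["title"])  # Holiday calendar
```

      The query "vacation" shares no keyword with the "Holiday calendar" document, yet it is returned because the two embeddings are close. With a real model, the same principle applies to far richer vectors.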

      Liferay DXP supports txtai (self-hosted / self-managed) and Hugging Face's Inference API as sentence transformers. At the heart of the transformation process there is always an ML model doing the heavy lifting. Administrators can choose** from a wide range of pre-trained models from Hugging Face's Models Hub and configure the properties of the transformer connection through the System / Instance Settings of Liferay DXP.
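      To get a feel for what a self-hosted sentence transformer returns, a txtai instance can be queried directly. The following is a rough sketch under several assumptions: a txtai API is running at the hypothetical address http://localhost:8000 with an embeddings model configured, and it exposes txtai's `/transform` endpoint. Whether Liferay DXP calls this exact endpoint is not stated in this ticket; `build_transform_request` and `fetch_embedding` are illustrative helper names.

```python
import json
import urllib.parse
import urllib.request


def build_transform_request(base_url, text):
    """Build a GET request for a txtai API `/transform` call."""
    query = urllib.parse.urlencode({"text": text})
    url = f"{base_url.rstrip('/')}/transform?{query}"
    return urllib.request.Request(url, method="GET")


def fetch_embedding(base_url, text, timeout=10):
    """Send the request and parse the JSON response.

    Assumes the response body is a JSON array of floats (the embedding).
    Requires a reachable txtai instance.
    """
    request = build_transform_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only builds the request; no network call is made here.
request = build_transform_request("http://localhost:8000", "reset my password")
print(request.full_url)
# http://localhost:8000/transform?text=reset+my+password
```

      With a running instance, `fetch_embedding("http://localhost:8000", "reset my password")` would return the vector for that text; its length depends on the model configured in txtai.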

      Future development phases are planned to add support for more content types (like Documents and Media and Commerce Products), custom Web Content Structures and Objects, Hugging Face's Inference Endpoints, and much more.

      ** As models are capable of doing more than just sentence transformation, refer to our documentation for further details on finding a suitable model.

      Documentation

      https://learn.liferay.com/dxp/latest/en/using-search/liferay-enterprise-search/search-experiences/semantic-search.html (Will be published in a later phase.)

      Feature Flag

      • The feature has to be enabled by adding the following line to portal-ext.properties:
        feature.flag.LPS-163688=true

      Background

      https://liferay.atlassian.net/l/cp/T0ZT0YE0

      Dev Channel

      #t-search-semantic-search

            People

              team-search Product Team Search
              tibor.lipusz Tibor Lipusz
              Tibor Lipusz Tibor Lipusz

                Packages

                  Version Package
                  7.4.13 DXP U47
                  7.4.13 DXP U53