PUBLIC - Liferay Portal Community Edition / LPS-100272

Reindexing search indices deletes all Synonym Sets

    Details

      Description

      Solution notes

      When upgrading to a patch level where this issue is already fixed, a new index called liferay-search-tuning-synonyms-liferay-<companyId> is created on portal startup for each Virtual Instance (company), and all existing Synonym Sets are copied into it. Subsequent updates and deletions are saved both to the index settings of that Virtual Instance's company index and to the corresponding synonyms index. As a result, Synonym Sets are preserved across full reindex operations.
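      To check after the upgrade that the Synonym Sets were carried over, you can query the new synonyms index directly. The Python sketch below is a minimal illustration, not part of Liferay: it assumes Elasticsearch is reachable at http://localhost:9200 without authentication, it uses the standard `_search` endpoint, and the helper names are hypothetical. Because the layout of the documents stored in the synonyms index is not described here, it simply returns the raw hits.

```python
import json
import urllib.request

# Assumption: local, unauthenticated Elasticsearch; adjust for your cluster.
ES_BASE_URL = "http://localhost:9200"


def synonyms_index_name(company_id: int) -> str:
    """Name of the per-company synonyms index created after the fix."""
    return f"liferay-search-tuning-synonyms-liferay-{company_id}"


def fetch_synonym_documents(company_id: int, base_url: str = ES_BASE_URL) -> list:
    """Return the raw hits stored in one company's synonyms index.

    Note: _search returns at most 10 hits unless a size is requested.
    """
    url = f"{base_url}/{synonyms_index_name(company_id)}/_search"
    with urllib.request.urlopen(url) as response:
        body = json.load(response)
    return body["hits"]["hits"]
```

      For example, `fetch_synonym_documents(20096)` returns the hits for company 20096 (a hypothetical ID; use the one shown under "Virtual Instances").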

      Note: if you have applied the workaround described below in your environment, please remove the related settings from the "Additional Index Configurations" before the next full reindex.

       ----

      Steps to reproduce

      1. Navigate to Control Panel > Search Tuning > Synonyms
      2. Create a Synonym Set with a single entry containing the words "car" and "automobile"
      3. Create a blog post with title "automobiles are useful" and content "i like automobiles"
      4. Navigate to Control Panel > Configuration > Search
      5. Reindex all search indices
      6. Navigate to Control Panel > Search Tuning > Synonyms
      7. Search for "car"

      Expected Result

      • The Synonym Set created in step 2 is still present in step 6.
      • The blog entry appears in the search results from step 7.

      Actual Result

      • The Synonym Set created in step 2 is not present anymore.
      • The blog entry does not appear in the search results from step 7.

      Explanation

      We store synonym sets directly in the index settings themselves instead of in a database or a separate index. When reindexing, we delete the old index and regenerate everything, so the synonym sets stored in its settings are lost along with it.
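      Because the synonyms live in the index settings, you can also confirm at the Elasticsearch level whether they survived a reindex by running the company index's English analyzer over the word "car" with the `_analyze` API. The Python sketch below assumes an unauthenticated cluster at http://localhost:9200, the `liferay-<companyId>` index name and `liferay_analyzer_en` analyzer shown in the settings response later in this issue, and a hypothetical company ID; the helper names are illustrative only.

```python
import json
import urllib.request

# Assumption: local, unauthenticated Elasticsearch; adjust for your cluster.
ES_BASE_URL = "http://localhost:9200"


def analyze_request(company_id: int, text: str,
                    analyzer: str = "liferay_analyzer_en") -> tuple:
    """Build the URL and JSON body of an _analyze call on a company index."""
    url = f"{ES_BASE_URL}/liferay-{company_id}/_analyze"
    body = {"analyzer": analyzer, "text": text}
    return url, body


def analyzed_tokens(company_id: int, text: str,
                    analyzer: str = "liferay_analyzer_en") -> list:
    """Return the tokens the company index's analyzer produces for `text`."""
    url, body = analyze_request(company_id, text, analyzer)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return [token["token"] for token in json.load(response)["tokens"]]
```

      If the synonym filter is still in place, `analyzed_tokens(20096, "car")` should include a form of "automobile" (possibly shortened by `english_stemmer`) alongside "car"; if the Synonym Set was dropped, that expansion is missing.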


      Workaround: Back up and preserve Synonym Sets through the "Additional Index Configurations" setting of the Elasticsearch connector

      Back up and migrate existing Synonym Sets - One company (Virtual Instance)

      After you add, update, or remove a Synonym Set, you can get the index settings for the given company through Elasticsearch's REST API. The response also contains the "analysis" settings with your Synonym Sets. Once you have this JSON, you can put it into the "Additional Index Configurations" of the "Elasticsearch 6" connector in System Settings, so the next time you execute a reindex, your Synonym Sets are re-added.
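      The workaround can also be scripted. The Python sketch below fetches GET /liferay-<companyId>/_settings and wraps the "analysis" object in the JSON shape used for "Additional Index Configurations", matching the sample in the steps that follow. It assumes an unauthenticated cluster at http://localhost:9200; the function names are hypothetical, and the output still has to be pasted into System Settings by hand.

```python
import json
import urllib.request

# Assumption: local, unauthenticated Elasticsearch; adjust for your cluster.
ES_BASE_URL = "http://localhost:9200"


def company_index_name(company_id: int) -> str:
    """Liferay company index name, e.g. liferay-20096."""
    return f"liferay-{company_id}"


def fetch_index_settings(company_id: int, base_url: str = ES_BASE_URL) -> dict:
    """GET /liferay-<companyId>/_settings and return the parsed response."""
    url = f"{base_url}/{company_index_name(company_id)}/_settings"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def additional_index_configuration(settings_response: dict, company_id: int) -> str:
    """Extract the "analysis" object and wrap it as {"analysis": ...}."""
    index_name = company_index_name(company_id)
    analysis = settings_response[index_name]["settings"]["index"]["analysis"]
    # Values are copied verbatim (e.g. "lenient": "true" stays a string).
    return json.dumps({"analysis": analysis}, indent=4, sort_keys=True)
```

      For example: `print(additional_index_configuration(fetch_index_settings(20096), 20096))`. Elasticsearch returns setting values such as "lenient" as strings; the sketch keeps them that way rather than converting them to booleans.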

      1. Assume you have added two Synonym Sets: "journal,article" and "liferay,company".
      2. Assume your company ID is "20096" (you can find it under "Virtual Instances" in the Control Panel).
      3. Get the Index Settings through the REST API:
        Request
        GET /liferay-20096/_settings
        

        For example http://localhost:9200/liferay-20096/_settings

        Response
        {
            "liferay-20096":{
                "settings":{
                    "index":{
                        "mapping":{
                            "total_fields":{
                                "limit":"7500"
                            }
                        },
                        "number_of_shards":"1",
                        "provided_name":"liferay-20096",
                        "creation_date":"1568629797046",
                        "analysis":{
                            "filter":{
                                "english_stemmer":{
                                    "type":"stemmer",
                                    "language":"english"
                                },
                                "english_stop":{
                                    "type":"stop",
                                    "stopwords":"_english_"
                                },
                                "liferay_filter_synonym_es":{
                                    "type":"synonym_graph",
                                    "lenient":"true",
                                    "synonyms":[
                                        "journal,article",
                                        "liferay, company"
                                    ]
                                },
                                "spanish_stemmer":{
                                    "type":"stemmer",
                                    "language":"light_spanish"
                                },
                                "spanish_stop":{
                                    "type":"stop",
                                    "stopwords":"_spanish_"
                                },
                                "english_possessive_stemmer":{
                                    "type":"stemmer",
                                    "language":"possessive_english"
                                },
                                "liferay_filter_synonym_en":{
                                    "type":"synonym_graph",
                                    "lenient":"true",
                                    "synonyms":[
                                        "journal,article",
                                        "liferay, company"
                                    ]
                                }
                            },
                            "analyzer":{
                                "liferay_analyzer_en":{
                                    "filter":[
                                        "english_possessive_stemmer",
                                        "lowercase",
                                        "liferay_filter_synonym_en",
                                        "english_stop",
                                        "english_stemmer"
                                    ],
                                    "tokenizer":"standard"
                                },
                                "keyword_lowercase":{
                                    "filter":"lowercase",
                                    "tokenizer":"keyword"
                                },
                                "liferay_analyzer_es":{
                                    "filter":[
                                        "lowercase",
                                        "spanish_stop",
                                        "liferay_filter_synonym_es",
                                        "spanish_stemmer"
                                    ],
                                    "tokenizer":"standard"
                                }
                            }
                        },
                        "number_of_replicas":"0",
                        "uuid":"6G-We50XRoqqOFBQBwnIPA",
                        "version":{
                            "created":"6050099"
                        }
                    }
                }
            }
        }
        
      4. Copy the "analysis" object from the JSON.
      5. Go to Control Panel > System Settings > Search > Elasticsearch 6/7.
      6. Add the JSON to "Additional Index Configurations". It looks something like this:
        "Elasticsearch 6/7 - Additional Index Configurations"
        {
        	"analysis": {
        		"analyzer": {
        			"keyword_lowercase": {
        				"filter": "lowercase",
        				"tokenizer": "keyword"
        			},
        			"liferay_analyzer_en": {
        				"filter": [
        					"english_possessive_stemmer",
        					"lowercase",
        					"liferay_filter_synonym_en",
        					"english_stop",
        					"english_stemmer"
        				],
        				"tokenizer": "standard"
        			},
        			"liferay_analyzer_es": {
        				"filter": [
        					"lowercase",
        					"spanish_stop",
        					"liferay_filter_synonym_es",
        					"spanish_stemmer"
        				],
        				"tokenizer": "standard"
        			}
        		},
        		"filter": {
        			"english_possessive_stemmer": {
        				"language": "possessive_english",
        				"type": "stemmer"
        			},
        			"english_stemmer": {
        				"language": "english",
        				"type": "stemmer"
        			},
        			"english_stop": {
        				"stopwords": "_english_",
        				"type": "stop"
        			},
        			"liferay_filter_synonym_en": {
        				"lenient": true,
        				"synonyms": ["journal,article","liferay, company"],
        				"type": "synonym_graph"
        			},
        			"liferay_filter_synonym_es": {
        				"lenient": true,
        				"synonyms": ["journal,article","liferay, company"],
        				"type": "synonym_graph"
        			},
        			"spanish_stemmer": {
        				"language": "light_spanish",
        				"type": "stemmer"
        			},
        			"spanish_stop": {
        				"stopwords": "_spanish_",
        				"type": "stop"
        			}
        		}
        	}
        }
        

        Important: You need to update this JSON each time you change the Synonym Sets; otherwise, the next reindex will restore an older configuration.

      7. Save.
      8. Reindex.
        Actual Result: Previously added Synonym Sets are preserved, and you can see them in the Synonym Sets UI.
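      Because the saved JSON silently goes stale whenever a Synonym Set changes, it can help to compare it against the live index settings before each reindex. The Python sketch below compares only the `synonym_graph` filters (the type used by `liferay_filter_synonym_en` and `liferay_filter_synonym_es` in the sample response). The function names are hypothetical, and the comparison is order-sensitive, so reordering rules also counts as a change.

```python
import json


def synonyms_by_filter(analysis: dict) -> dict:
    """Map each synonym_graph filter name to its list of synonym rules."""
    return {
        name: list(definition.get("synonyms", []))
        for name, definition in analysis.get("filter", {}).items()
        if definition.get("type") == "synonym_graph"
    }


def is_backup_stale(saved_config_json: str, current_analysis: dict) -> bool:
    """True if the saved Additional Index Configurations JSON no longer
    matches the synonym rules currently stored in the index settings."""
    saved_analysis = json.loads(saved_config_json).get("analysis", {})
    return synonyms_by_filter(saved_analysis) != synonyms_by_filter(current_analysis)
```

      Pass the JSON saved in System Settings and `response["liferay-20096"]["settings"]["index"]["analysis"]` from a fresh GET /liferay-20096/_settings; a `True` result means the configuration should be updated before reindexing.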

      Back up and migrate existing Synonym Sets - Multiple companies (Virtual Instances)

      1. Get the "analysis" settings from the index settings of each of your companies on the Elasticsearch server by following step 3 of the previous section.
      2. Perform a full reindex from the Search admin (Control Panel > Configuration > Search).
      3. Reindex the spell check dictionaries as well.
      4. Restore the analysis settings you obtained in step 1 to each company index by using the Update index settings API:
        PUT /liferay-20123/_settings
        {
            "analysis": {
                "analyzer": {
                    // ...
                },
                "filter": {
                    // ...
                }
            }
        }
        
      5. Go to Control Panel > Search Tuning > Synonyms and verify that the Synonym Sets are there.
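      For many Virtual Instances, the backup and restore steps can be scripted. The Python sketch below collects each company's "analysis" object before the reindex, then builds and sends the restore calls. One assumption goes beyond the steps above: Elasticsearch generally rejects changes to analysis settings on an open index, so the sketch closes each index, applies PUT /<index>/_settings, and reopens it. The index cannot be searched while closed, so verify this against your Elasticsearch version first. The cluster URL and all function names are hypothetical.

```python
import json
import urllib.request

# Assumption: local, unauthenticated Elasticsearch; adjust for your cluster.
ES_BASE_URL = "http://localhost:9200"


def fetch_index_settings(company_id: int, base_url: str = ES_BASE_URL) -> dict:
    """GET /liferay-<companyId>/_settings and return the parsed response."""
    url = f"{base_url}/liferay-{company_id}/_settings"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def backup_analysis(company_ids, fetch=fetch_index_settings) -> dict:
    """Step 1: collect the "analysis" object of every company index."""
    backups = {}
    for company_id in company_ids:
        index_name = f"liferay-{company_id}"
        settings = fetch(company_id)[index_name]["settings"]["index"]
        backups[index_name] = settings["analysis"]
    return backups


def restore_plan(backups: dict) -> list:
    """Step 4: list the (method, path, body) calls that restore each backup.

    Assumption: analysis settings can only be changed on a closed index,
    so each update is wrapped in _close / _open calls.
    """
    plan = []
    for index_name, analysis in backups.items():
        plan.append(("POST", f"/{index_name}/_close", None))
        plan.append(("PUT", f"/{index_name}/_settings", {"analysis": analysis}))
        plan.append(("POST", f"/{index_name}/_open", None))
    return plan


def execute(plan, base_url: str = ES_BASE_URL) -> None:
    """Send each call in order; urlopen raises HTTPError on a 4xx/5xx reply."""
    for method, path, body in plan:
        data = None if body is None else json.dumps(body).encode("utf-8")
        request = urllib.request.Request(
            base_url + path,
            data=data,
            headers={"Content-Type": "application/json"},
            method=method,
        )
        with urllib.request.urlopen(request):
            pass  # Response bodies are not needed here.
```

      For example, run `backups = backup_analysis([20096, 20123])` before the full reindex and `execute(restore_plan(backups))` after it. Keeping the plan as plain data makes it easy to print and review before anything is sent.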

      Tested on:
      Tomcat 9.0.17 + MySQL 5.7
      Portal master-private SHA: f7dd6670d1fb4f84519e9c1fd0e768348fffb990
      Portal 7.2.x-private SHA: df31a3432ceeace3ca7f43189e72b5fc45625fd2

              People

              Assignee: Joshua Chong (joshua.chong)
              Reporter: Brian Lee (brian.lee)
              Participants: Brian Wulbern
              Engineering Assignee: Adam Brandizzi
              Votes: 1
              Watchers: 4

                  Packages

                  Version Package
                  7.2.10 DXP FP5
                  7.2.10.2 DXP SP2
                  7.2.X
                  Master