- Type: Bug
- Status: Closed
- Resolution: Fixed
- Affects Version/s: 6.2.10 EE GA1
- Fix Version/s: 6.2.X EE, 7.0.0 DXP FP8, 7.0.3 CE GA4
- Component/s: Search Infrastructure, Search Infrastructure > Solr Connector
- Branch Version/s: 6.2.x
- Backported to Branch: Committed
- Story Points: 1.5
- Fix Priority: 3
Solution & QA Notes
The following portal property has been added to 6.2 besides the code changes:
    #
    # Set this to true to enable
    # com.liferay.portal.search.lucene.PerFieldAnalyzer to analyze index fields
    # both at index and query time using the mappings defined in
    # META-INF/search-spring.xml.
    #
    # Setting this to false is recommended when using a pluggable enterprise
    # search engine with its own mappings (e.g. as Solr) to avoid indexing
    # terms twice which may result in incorrect tokenization.
    #
    # Deprecated as of 7.0.0.
    #
    index.portal.field.analyzer.enabled=true
Liferay Solr 4 Search Engine 2.1.1 (https://web.liferay.com/marketplace/-/mp/application/30365680) sets this property to "false" by default.
Steps to reproduce (with groovy script)
- Configure Liferay with Solr 4
Reproduced with Liferay Solr 4 Search Engine 2.1.0
- Create a new site
- Configure the site with only Spanish language (es_ES)
- Go to webcontent and create two basic webcontents:
- First WC: Title=>"plantea" Content=> empty
- Second WC: Title=>"planteo" Content=> empty
- Execute attached groovy script from control panel: queryIndexLPS-67124.groovy
(the script executes search title_es_ES:planteo)
- Expected behavior: Both webcontents with "plantea" and "planteo" titles are found
- Wrong behavior: No webcontent is found
- Configure Liferay with Lucene and repeat steps 1 to 5: the behaviour with Lucene should be correct
Steps to reproduce (with search portlet)
- Configure Liferay with Solr 4
Reproduced with Liferay Solr 4 Search Engine 2.1.0
- Create a new site
- Configure the site with only Spanish language (es_ES)
- Go to webcontent and create two basic webcontents:
- First WC: Title=>"plantea" Content=> empty
- Second WC: Title=>"planteo" Content=> empty
- Add a page with search portlet
- Execute a search with keyword: "plantea"
- Expected behavior: Both webcontents with "plantea" and "planteo" titles are found
(the es_ES tokenizer removes the final 'a' / 'o' because it is the word's gender ending)
- Wrong behavior: Only the webcontent with "plantea" title is found
- Check query at Solr server:
- Expected behavior: Query is executed with 'plante' word
(the es_ES tokenizer removes the final 'a' / 'o' because it is the word's gender ending)
- Wrong behavior: Query is executed with 'plant' word
- Configure Liferay with Lucene and repeat steps 1 to 6: the behaviour with Lucene should be correct
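The "Check query at Solr server" step can also be approximated by sending a term to Solr directly with the standard `debugQuery=true` parameter and reading the parsed query in the debug section of the response. The sketch below only builds such a URL; the host, port and core name (`http://localhost:8983/solr/liferay`) and the helper name `solr_debug_url` are assumptions for illustration, not values taken from this ticket.

```python
from urllib.parse import urlencode


def solr_debug_url(base_url, field, term):
    """Build a Solr select URL that asks Solr to report how it parsed the query.

    base_url is a hypothetical core URL; adjust it to the actual installation.
    """
    params = {
        "q": f"{field}:{term}",   # field-qualified term, as in the groovy script
        "debugQuery": "true",     # standard Solr parameter: adds a debug section
        "wt": "json",             # ask for a JSON response
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local Solr core; "plante" is the term Liferay sends after its own analysis.
url = solr_debug_url("http://localhost:8983/solr/liferay", "title_es_ES", "plante")
print(url)
```

If Solr analyzes the already analyzed term again, the parsed query reported in the debug output should contain 'plant' rather than 'plante'.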
Technical Background
Search terms are not analyzed consistently at index time and at query time:
- During indexing: content is analyzed by the search engine (Lucene or Solr)
- During search: the query is analyzed in Liferay code before it is sent to the search engine, and Solr then analyzes the query again
Analyzing a term more than once is usually harmless, but it fails for some words. In Spanish, for example, it fails for words ending in more than one vowel, such as plantea or planteo:
- Lucene:
  - Indexing with _es_ES tokenizer: plantea ==> (lucene) plante
  - Search query with _es_ES tokenizer: plantea ==> (liferay) plante
  - plante == plante
- Solr:
  - Indexing with _es_ES tokenizer: plantea ==> (solr) plante
  - Search query with _es_ES tokenizer: plantea ==> (liferay) plante ==> (solr) plant
  - plante != plant
As a solution, Liferay should not analyze the query itself when using Solr.
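The mismatch can be illustrated with a minimal sketch. It assumes a toy stemming rule (strip one trailing 'a', 'e' or 'o'), which is NOT the real es_ES analysis chain used by Lucene or Solr; it only reproduces the plantea ==> plante ==> plant behaviour described here. All function names are hypothetical.

```python
def toy_stem(term):
    """Toy stand-in for the es_ES analysis: strip one trailing 'a', 'e' or 'o'.

    This is an illustrative rule only, not the real Spanish stemmer.
    """
    if len(term) > 3 and term[-1] in "aeo":
        return term[:-1]
    return term


def index_term(title):
    # At index time the search engine analyzes the content once.
    return toy_stem(title)


def query_term(keyword, liferay_analyzes):
    # At query time Liferay may analyze the keyword first ...
    term = toy_stem(keyword) if liferay_analyzes else keyword
    # ... and Solr always analyzes whatever query it receives.
    return toy_stem(term)


indexed = {index_term(title) for title in ("plantea", "planteo")}
broken = query_term("plantea", liferay_analyzes=True)   # analyzed twice
fixed = query_term("plantea", liferay_analyzes=False)   # analyzed once, by Solr only

print(indexed, broken, fixed)
```

With the toy rule, both titles are indexed as 'plante'. The twice-analyzed query becomes 'plant' and matches nothing, while the query analyzed only by Solr stays 'plante' and matches both documents. Skipping the Liferay-side analysis corresponds to setting index.portal.field.analyzer.enabled=false.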
master and 7.0.x are already fixed: Lucene was removed from the core and the search functionality was rewritten as modules that take advantage of the query dialect of the underlying search engine (Elasticsearch, Solr).