SOLR search seemingly highlights pseudo-random results



      I spent some time troubleshooting this issue with packet captures and finally found the solution, which I describe below. Here is what I observed:

      === how to reproduce ===
      1) Add several PDF documents that contain the word "test" to the document library.
      2) SOLR indexes the documents without problems.
      3) Search for "test" in the search portlet.
      4) The results list the PDF documents and highlight "test" by wrapping it in <em> </em> tags (as expected).
      5) However, the results also highlight "20" and "2.0", e.g. <em>20</em> <em>2.0</em>, even though "20" was not in the search query.

      === why 20 and 2.0 were getting highlighted ===

      • It turns out that the portletId is also passed in the SOLR search query, and my search portlet's ID happened to be "20". A companyID parameter gets passed as well; I didn't see any false highlights from it because its value was 10365. However, if a document had 10365 in it, that would get highlighted too.
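
To make the effect concrete, here is a small stand-alone sketch of the behaviour described above. It is not Solr's highlighter, only a simulation under stated assumptions: the query is modelled as field/term pairs, the field names `content`, `portletId` and `companyId` and the sample text are illustrative, and matching is done on whitespace-separated tokens.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class HighlightFieldMatchSketch {

    /** Query clauses as field -> term. Field names and values are illustrative assumptions. */
    static Map<String, String> sampleQuery() {
        Map<String, String> clauses = new LinkedHashMap<>();
        clauses.put("content", "test");     // what the user typed
        clauses.put("portletId", "20");     // added by the portal
        clauses.put("companyId", "10365");  // added by the portal
        return clauses;
    }

    /**
     * Wraps matching tokens of {@code text} in em tags.
     * With requireFieldMatch == false every query term is a highlight candidate;
     * with true, only terms that were queried against {@code highlightedField}.
     */
    static String highlight(String text, String highlightedField,
                            Map<String, String> clauses, boolean requireFieldMatch) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : text.split(" ")) {
            boolean hit = false;
            for (Map.Entry<String, String> clause : clauses.entrySet()) {
                boolean fieldAllowed =
                        !requireFieldMatch || clause.getKey().equals(highlightedField);
                if (fieldAllowed && clause.getValue().equals(token)) {
                    hit = true;
                }
            }
            out.add(hit ? "<em>" + token + "</em>" : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "test report 20 pages";
        // Without field matching, the portlet ID leaks into the highlights.
        System.out.println(highlight(text, "content", sampleQuery(), false));
        // With field matching, only the term queried against "content" is wrapped.
        System.out.println(highlight(text, "content", sampleQuery(), true));
    }
}
```

The first call wraps both "test" and "20"; the second wraps only "test", which is the behaviour one would expect from the search portlet.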

      === how to fix it ===

      • The hl.requireFieldMatch search parameter must be set to true, so that a term is highlighted only in the field it was queried against.
      • Modify: src/com/liferay/portal/search/solr/SolrIndexSearcherImpl.java (the translateQuery() function, around line 238)
      • Add: solrQuery.setRequireFieldMatch(true);
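
For reference, the fix boils down to one extra request parameter. The sketch below builds such a parameter string with plain JDK classes rather than SolrJ, so it runs without the Liferay or Solr jars. `hl`, `hl.fl` and `hl.requireFieldMatch` are standard Solr highlighting parameters; the query text, the field name `content` and the class name are assumptions made for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class HighlightRequestSketch {

    /** Highlighting parameters for a query; "content" is an assumed field name. */
    static Map<String, String> highlightParams(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("hl", "true");                    // turn highlighting on
        params.put("hl.fl", "content");              // field to highlight (assumption)
        params.put("hl.requireFieldMatch", "true");  // the actual fix
        return params;
    }

    /** URL-encodes the parameters into a query string. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Assumed shape of the query the portal sends; the real one may differ.
        String query = "content:test AND portletId:20 AND companyId:10365";
        System.out.println(toQueryString(highlightParams(query)));
    }
}
```

Comparing this against a packet capture is a quick way to confirm that the parameter actually reaches SOLR. If the setter named above is not available in your SolrJ version, the SolrQuery class in the releases I know of exposes it as setHighlightRequireFieldMatch(boolean), and setting the raw "hl.requireFieldMatch" parameter should have the same effect.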

      === other notes ===

      • This can be a particularly annoying bug because many documents contain "20" or "%20".
      • It would be nice to be able to set all of the solrQuery.setBlah() parameters in a .properties file somewhere; all of them are hard-coded right now.
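
One way the .properties idea could look is sketched below. None of this exists in the plugin as far as I know: the `solr.param.` key prefix, the class name and the sample keys are hypothetical. The sketch reads properties text and keeps only the prefixed entries, stripped down to the bare Solr parameter names, which could then be copied onto the query object.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class SolrParamsFromProperties {

    /** Hypothetical prefix marking entries that should be passed to SOLR. */
    static final String PREFIX = "solr.param.";

    /** Parses properties text and returns the prefixed entries with the prefix removed. */
    static Map<String, String> load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> params = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                params.put(name.substring(PREFIX.length()), props.getProperty(name));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a .properties file.
        String text = String.join("\n",
                "solr.param.hl=true",
                "solr.param.hl.requireFieldMatch=true",
                "solr.param.hl.fragsize=80",
                "some.other.setting=ignored");
        // Each remaining entry is a SOLR parameter name and its value.
        load(text).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

In a real deployment the text would come from a file on the classpath instead of a string, and each entry would be applied to the query through its generic parameter setter.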


