Details

    • Type: Bug
    • Status: Reopened
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 4.4.2, 5.0.1, 5.1.0
    • Fix Version/s: 5.1.0
    • Component/s: None
    • Labels:
      None
    • Environment:
      All

      Description

      Lucene is holding up all requests in which some indexing needs to take place...
      It boils down to LuceneUtil.getWriter(companyId) never coming back, waiting forever.
      The issue is caused by a stale write.lock file in the ~/liferay/lucene/COMPANYID/ folder.

      Here is what I'm doing:
      1 - I simulate the lock by just adding it manually. This behaves just as if the lock had been left behind by a process that failed to release it.
      2 - I restart the app server and attempt to save something that needs a Lucene writer.
      3 - The very first time, there is a lock timeout exception in IndexWriterFactory, around line 180 in 4.3.1, 188 on trunk:

      IndexWriter writer = new IndexWriter(
          LuceneUtil.getLuceneDir(companyId),
          LuceneUtil.getAnalyzer(), create);

      Which makes sense...
      4 - The main problem now is that, around line 204, in the finally clause, the releaseLock method is never called: hasError == true, but newWriter == false... So there is a process which now has a lock on the writer, that didn't get the writer and will never get the writer... It is just a roadblock for anyone requesting the writer in the future...

      finally {
          if (hasError && newWriter) {
              try {
                  releaseLock(companyId);
              }
              catch (Exception e) {
              }
          }
      }

      5 - So... we are left with what seems to be a deadlock... Removing the stale lock doesn't do anything, because now it is the semaphore holding the queue, not the lock itself...

      I kind of wish there were a maximum time a process could wait in the semaphore queue, but it doesn't look like the semaphore being used has that capability...
      Another option would be to identify a stale lock and actually erase it, but that seems pointless without making sure that the locks Liferay creates get released properly...
      It seems simple enough to modify the code to release the lock on any error, regardless of whether it is a new writer or not, but I just don't know; Liferay must have put that condition there for some reason!
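
      A minimal sketch of that change, based only on the finally clause quoted above (this is a suggestion, not the fix that was eventually committed, and whether it is safe depends on why the original condition checks newWriter), would simply drop the newWriter check:

      finally {

          // Release the lock on any error, even if this thread was not the
          // one creating a new writer, so later callers are not blocked
          // forever on the semaphore

          if (hasError) {
              try {
                  releaseLock(companyId);
              }
              catch (Exception e) {
              }
          }
      }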

        Activity

        Alexander Wallace added a comment -

        The issue described above, which I was experiencing, and which Raju kindly filed a ticket for, happens in LR 4.3.1 ...

        There are 2 issues here...

        1 - The semaphore lock is never released if there is an error while trying to get the writer in IndexWriterFactory... This issue is resolved in a later version of Liferay... I didn't check all versions, but 5.0.x doesn't have this issue.

        2 - The stale lock will remain there forever until manually removed... I made a couple of simple enhancements that I will just paste here in case they are of use to others...

        In IndexWriterFactory.java I added a method:
        [code]
        /**
         * Enhancement to remove stale locks
         */
        private void deleteLuceneWriteLock(Exception e) {

            // Added a property to disable this feature

            if (!"true".equals(PropsUtil.get("lucene.stale.lock.delete"))) {
                return;
            }

            // We only attempt this in known cases

            if (e.getClass().equals(
                    org.apache.lucene.store.LockObtainFailedException.class)) {

                _log.warn(
                    "deleteLuceneWriteLock: Will attempt to delete write " +
                        "lock due to " + e.getClass());

                String lock = e.toString().substring(
                    e.toString().lastIndexOf("@/") + 1);

                File f = new File(lock);

                f.delete();

                _log.warn(
                    "deleteLuceneWriteLock: lock file " + lock +
                        " delete called");
            }
        }
        [/code]

        I realized that in my case, the exception being thrown hinted at a stale lock...

        I call this method from inside the getWriter(long, boolean) when an exception is caught:
        [code]

        }
        catch (Exception e) {
            hasError = true;

            _log.error(e, e);

            // Enhancement: attempt to delete the Lucene lock if appropriate

            _log.warn(
                "Error while obtaining Lucene writer, may attempt to " +
                    "delete lock if enabled");

            try {
                deleteLuceneWriteLock(e);
            }
            catch (Exception ee) {
                _log.error("Error attempting to delete Lucene write lock", ee);
            }

            return null;
        }

        [/code]

        That gets rid of the stale locks... However, the first thread that encountered the problem will get a null writer... To counteract this, I just added a simple retry process to LuceneUtil.java in the getWriter(long, boolean) method:

        [code]

        public static IndexWriter getWriter(long companyId, boolean create)
            throws IOException {

            // Enhancement to retry getting a writer, specifically for when
            // stale locks are found and deleted after the first attempt

            // Default to a single attempt so the writer is still requested
            // when the property is not set

            int retryCount = 1;

            try {
                retryCount = Integer.parseInt(
                    PropsUtil.get("lucene.writer.retry"));
            }
            catch (NumberFormatException e) {

                // Keep the default; not worth doing much

            }

            int retryAttempts = 0;

            IndexWriter writer = null;

            while (retryAttempts < retryCount) {
                writer = _instance._sharedWriter.getWriter(companyId, create);

                if (writer != null) {
                    break;
                }

                retryAttempts++;
            }

            return writer;
        }

        [/code]
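
        For reference, both enhancements rely on custom properties that are not part of the stock portal configuration. The keys below are taken from the code above; the values, and the idea of setting them in the portal override properties file (typically portal-ext.properties), are assumptions rather than documented Liferay settings:

        [code]
        # Hypothetical settings for the enhancements above

        # Allow deletion of a stale Lucene write lock when a
        # LockObtainFailedException is caught
        lucene.stale.lock.delete=true

        # Number of attempts LuceneUtil.getWriter makes before returning null
        lucene.writer.retry=2
        [/code]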

        I hope this can help somebody...

        Brian Chan added a comment -

        Found a cleaner fix via LuceneUtil.java.

        Under checkLuceneDir, I added:

        [code]
        if (luceneDir.fileExists("write.lock")) {
            luceneDir.deleteFile("write.lock");
        }
        [/code]

        See commit logs.
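
        For context, here is a self-contained, hypothetical sketch of where such a guard could live. The method name checkLuceneDir and the two calls on luceneDir come from the comment above; the surrounding body, the way the Directory is obtained, and the error handling are illustrative assumptions, not the actual committed change:

        [code]
        // Illustrative sketch only; see the commit logs for the real change.
        // Assumes org.apache.lucene.store.Directory and java.io.IOException.
        private static void checkLuceneDir(long companyId) {
            try {
                Directory luceneDir = LuceneUtil.getLuceneDir(companyId);

                // Remove a write.lock left behind by a crashed or killed JVM
                // before any writer is requested for this company

                if (luceneDir.fileExists("write.lock")) {
                    luceneDir.deleteFile("write.lock");
                }
            }
            catch (IOException ioe) {
                _log.error(ioe, ioe);
            }
        }
        [/code]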

        Alexander Wallace added a comment -

        I don't see the fix (either mine or Brian's) in LuceneUtil.java. Was this taken care of some other way?

        Alexander Wallace added a comment -

        Never mind my last comment... I was looking at the wrong file... Brian's fix is in...

        Roberto Tellado added a comment -

        Where do I put the lucene.writer.retry property value that you mention?

        I have version 5.2.3, but this problem is not resolved, at least for JDBC.

        Thanks.

        Roberto Tellado added a comment -

        I have version 5.2.3, but this problem is not resolved, at least for JDBC.


          People

          • Votes: 0
          • Watchers: 3
