  1. Infinispan
  2. ISPN-5452

Query Execution using Hibernate Search slow for large volume data

Details

    • Bug
    • Resolution: Obsolete
    • Major
    • None
    • 7.2.1.Final, 8.0.0.Final
    • None
      1. Create a batch job that loads 240 million entries into Infinispan, with all entries indexed.

      2. Execute the following query through the Hot Rod remote client:

      String queryString= "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI from com.subex.spark.common.distributedcaching.data.Subscriber where PHONE_NUMBER='" +prefix+ i+"' AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0";  
      QueryFactory qf = Search.getQueryFactory(distcacheclient.getRemoteCache());
      RemoteQuery remoteQuery = new RemoteQuery(qf,(RemoteCacheImpl)distcacheclient.getRemoteCache(), distcacheclient.getSerializationContext(), queryString, 0, (int)numOfRecords);
      List<Subscriber> list = remoteQuery.list();

      3. Execute the following query in embedded mode:

      String queryString= "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI  from com.subex.spark.common.distributedcaching.data.Subscriber where " + "PHONE_NUMBER='" +prefix+ i+"' AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0"; 
      QueryFactory qf = org.infinispan.query.Search.getQueryFactory(ispnCacheServer.getCache("SUBSCRIBER"));
      Query query = qf.from("com.subex.spark.common.distributedcaching.data.Subscriber")
            .setProjection("ACCOUNT_ID","ID","PRODUCT_TYPE","FIRST_CALL","ID","IMEI")
            .having("PHONE_NUMBER").eq(prefix+ i)
            .and().having("STATUS").in(1,2)
            .and().having("SUBSCRIBER_TYPE").eq(0)
            .toBuilder().build();
      List<Subscriber> list = query.list();
      rs = query.list().iterator();

      4. Run the following Hibernate Search query for comparison:

      QueryContextBuilder queryBuilder = searchSession.getSearchFactory().buildQueryBuilder();
      Query query = queryBuilder.forEntity(Subscriber.class).get().keyword().onField("ID").matching(Long.parseLong(prefix+i)).createQuery();
      
      org.hibernate.Query hibernateQuery=searchSession.createFullTextQuery(query, Subscriber.class).setProjection("ACCOUNT_ID","ACCOUNT_NAME","CONNECTION_TYPE","CONTACT_PHONE_NUMBER","CURRENT_BALANCE","CUST_ALERT_CONTACT_NUMBER","CUST_ALERT_EMAIL_ID","DATE_OF_BIRTH","DEALER_NAME","DS_NAME","FILE_NAME","FIRST_CALL","GROUPS","PVN","HOME_PHONE_NUMBER","ID","ID_NUMBER","IMEI","IMSI","IS_UPDATE","MCN1","MCN2","MODIFIED_DATE","NETWORK_ID","NOTIFICATION_GROUPS","OFFICE_PHONE_NUMBER","OPTIONAL_FIELD_1","OPTIONAL_FIELD_10","OPTIONAL_FIELD_11","OPTIONAL_FIELD_12","OPTIONAL_FIELD_13","OPTIONAL_FIELD_14","OPTIONAL_FIELD_15","OPTIONAL_FIELD_2","OPTIONAL_FIELD_3","OPTIONAL_FIELD_4","OPTIONAL_FIELD_5","OPTIONAL_FIELD_6","OPTIONAL_FIELD_7","OPTIONAL_FIELD_8","OPTIONAL_FIELD_9","PHONE_NUMBER","PRODUCT_TYPE","QOS","SERVICES","SERVICE_NUMBER_TYPE","SSID","STATUS","SUBSCRIBER_DOA","SUBSCRIBER_TYPE","SUBSCRIBER_UID","SUBSCRIBER_UID_DOA"); 

      The projection above lists every field in the cache, all of which are indexed.
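The snippets in steps 2 to 4 reference a `prefix` and a loop variable `i`, which suggests each query ran inside a loop, but the driving code is not shown. The following is a minimal sketch of how such a loop could be driven from several client threads and turned into a req/sec figure. Everything in it (`QueryThroughputHarness`, `run`, `requestsPerSecond`, the placeholder workload) is hypothetical and uses only the JDK; it is not the reporter's benchmark.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical harness for illustration; not the code used in this report.
final class QueryThroughputHarness {

    // Runs `requestsPerThread` calls of `query` on each of `threads` client
    // threads and returns how many calls completed.
    static long run(int threads, long requestsPerThread, LongConsumer query) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicLong completed = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            final long firstIndex = t * requestsPerThread; // distinct i range per thread
            pool.submit(() -> {
                try {
                    startGate.await(); // release all client threads together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (long i = firstIndex; i < firstIndex + requestsPerThread; i++) {
                    query.accept(i); // one request per loop iteration
                    completed.incrementAndGet();
                }
            });
        }
        startGate.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("benchmark did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return completed.get();
    }

    // Converts a request count and an elapsed time in nanoseconds to req/sec.
    static double requestsPerSecond(long requests, long elapsedNanos) {
        return requests / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): swap in remoteQuery.list(),
        // query.list(), or the Hibernate Search call for index i.
        LongConsumer placeholderQuery = i -> Long.toString(i).hashCode();
        long begin = System.nanoTime();
        long done = run(4, 100_000, placeholderQuery);
        long elapsed = System.nanoTime() - begin;
        System.out.printf("%d requests, %.0f req/sec%n", done, requestsPerSecond(done, elapsed));
    }
}
```

The elapsed time here includes thread start-up, so short runs understate throughput; a real comparison would also need a warm-up phase before timing.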

    Description

      While benchmarking Infinispan, we found that querying is very slow compared with Hibernate Search used in isolation.
      Single Infinispan node.
      Memory allocated: 230 GB. No GC was observed during the query run.
      Memory required after a full GC: 122 GB.
      Data set: 240 million records, average size 330 bytes each.
      The system has 16 cores; 40 worker threads were allocated on the server side.
      With a single client thread, throughput was 900 req/sec in remote mode and 3k req/sec in embedded mode; the same request with Hibernate Search in isolation reaches 14,000 req/sec.
      With 50 client threads, throughput was capped at 15k req/sec, while Hibernate Search reaches 80k req/sec with 10 threads.
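
To put the reported figures side by side, the arithmetic can be sketched as follows. The inputs are the numbers quoted in the description; the class and method names are hypothetical, and the size calculation assumes decimal gigabytes (1 GB = 10^9 bytes). With a single client thread, Hibernate Search in isolation comes out roughly 15.6 times faster than remote mode and roughly 4.7 times faster than embedded mode. The raw data set is about 79.2 GB, so the 122 GB measured after a full GC is about 1.54 times the raw payload.

```java
// Hypothetical helper for illustration; the inputs are the figures quoted in the report.
final class BenchmarkFigures {

    // How many times larger `a` is than `b`.
    static double ratio(double a, double b) {
        return a / b;
    }

    // Raw payload size in decimal gigabytes (assumption: 1 GB = 1e9 bytes).
    static double rawDataGb(long records, long avgBytesPerRecord) {
        return records * avgBytesPerRecord / 1e9;
    }

    public static void main(String[] args) {
        double rawGb = rawDataGb(240_000_000L, 330);
        System.out.printf("raw data: %.1f GB%n", rawGb);
        System.out.printf("memory after full GC vs raw data: %.2fx%n", ratio(122, rawGb));

        // Single client thread.
        System.out.printf("Hibernate Search vs remote: %.1fx%n", ratio(14_000, 900));
        System.out.printf("Hibernate Search vs embedded: %.1fx%n", ratio(14_000, 3_000));

        // Multi-threaded runs, normalized per client thread.
        System.out.printf("Infinispan, 50 threads: %.0f req/sec per thread%n", ratio(15_000, 50));
        System.out.printf("Hibernate Search, 10 threads: %.0f req/sec per thread%n", ratio(80_000, 10));
    }
}
```

Normalizing per client thread (300 req/sec versus 8,000 req/sec) is only a rough comparison, since the two runs used different thread counts.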

          People

            Unassigned
            prashant.thakur_jira Prashant Thakur (Inactive)