Infinispan / ISPN-2980

sqlite support

This issue belongs to an archived project. You can view it, but you can't modify it.

      It would be very nice if we had SQLite support for Infinispan. SQLite is a powerful database that supports terabyte-sized databases in a single file with competitive performance.

      I tried to use it as a JDBC store, but the best driver I could find on the internet (the xerial sqlite-jdbc driver) does not implement the full JDBC specification, and trying to use it results in exceptions.

      I think that perhaps using the non-JDBC wrapper sqlite4java may make sense for Infinispan because:
      1. it promises better performance
      2. it allows using the SQLite library from the OS (the xerial driver uses a customized build of SQLite)
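
      Whether a driver links the OS SQLite library or a bundled build can be checked by asking the library for its version. A small illustration using Python's built-in sqlite3 module (which links against the SQLite library shipped with the interpreter/OS); the same `SELECT sqlite_version()` query also works over JDBC:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# sqlite_version() reports the version of the SQLite C library actually
# linked in -- with the xerial driver this would report the version of its
# bundled, customized build rather than the OS-provided library.
(version,) = conn.execute("SELECT sqlite_version()").fetchone()
print(version)  # depends on the linked library, e.g. "3.40.1"
conn.close()
```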

      FYI, here is how I set up SQLite for Infinispan (unsuccessfully):

      JBoss AS CLI commands:
      /subsystem=datasources/jdbc-driver=sqlite:add(driver-name="sqlite",driver-module-name="org.xerial",driver-class-name=org.sqlite.JDBC)
      data-source add --name=SQLiteDS --connection-url="jdbc:sqlite:${sqlite.database.string}" --jndi-name=java:jboss/datasources/SQLiteDS --driver-name="sqlite"
      /subsystem=datasources/data-source=SQLiteDS/connection-properties=journal_mode:add(value="WAL")
      /subsystem=datasources/data-source=SQLiteDS:enable
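
      For reference, the effect of the journal_mode=WAL connection property can be observed directly on the database side. A minimal sketch using Python's built-in sqlite3 module (the file name is illustrative; WAL requires a file-backed database, not an in-memory one):

```python
import os
import sqlite3
import tempfile

# WAL can only be enabled on a file-backed database; in-memory databases
# always report journal_mode "memory".
path = os.path.join(tempfile.mkdtemp(), "ispn.db")
conn = sqlite3.connect(path)

# This is the statement the journal_mode=WAL connection property asks the
# driver to run on each new connection. SQLite echoes back the mode in use.
(mode,) = conn.execute("PRAGMA journal_mode=WAL").fetchone()
print(mode)  # "wal"
conn.close()
```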
      
      JBoss AS module definition (modules/org/xerial/main/module.xml):
      <?xml version="1.0" encoding="UTF-8"?>
      <module xmlns="urn:jboss:module:1.0" name="org.xerial">
      	<resources>
      		<resource-root path="sqlite-jdbc.jar" />
      	</resources>
      	<dependencies>
      		<module name="javax.api" />
      		<module name="javax.transaction.api"/>
      	</dependencies>
      </module>
      
      cache store/loader configuration snippet:
               <stringKeyedJdbcStore xmlns="urn:infinispan:config:jdbc:5.2" fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false" key2StringMapper="com.jboss.datagrid.chunchun.util.TwoWayKey2StringChunchunMapper">
                  <dataSource jndiUrl="java:jboss/datasources/SQLiteDS" />
                  <stringKeyedTable dropOnExit="false" createOnStart="true" prefix="ispn">
                     <idColumn name="ID_COLUMN" type="VARCHAR(255)" />
                     <dataColumn name="DATA_COLUMN" type="BLOB" />
                     <timestampColumn name="TIMESTAMP_COLUMN" type="BIGINT" />
                  </stringKeyedTable>
               </stringKeyedJdbcStore>
            </loaders>
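
      The stringKeyedTable above maps to a simple key/value schema. A hypothetical sketch of the equivalent table and the kind of write the string-based store issues, using Python's built-in sqlite3 module (column names and types are copied from the snippet; the table name suffix and the exact statement shape are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema equivalent to the stringKeyedTable definition above. Infinispan
# builds the real table name from the configured prefix ("ispn") plus the
# cache name; "ispn_mycache" here is illustrative.
conn.execute("""
    CREATE TABLE ispn_mycache (
        ID_COLUMN VARCHAR(255) PRIMARY KEY,
        DATA_COLUMN BLOB,
        TIMESTAMP_COLUMN BIGINT
    )
""")

# Store a serialized entry under a string key, replacing any previous value
# (key and sizes borrowed from the exception below).
key, value, ts = "user41", b"\x00" * 4918, 1364460790683
conn.execute(
    "INSERT OR REPLACE INTO ispn_mycache VALUES (?, ?, ?)",
    (key, value, ts),
)

row = conn.execute(
    "SELECT DATA_COLUMN FROM ispn_mycache WHERE ID_COLUMN = ?", (key,)
).fetchone()
print(len(row[0]))  # 4918
conn.close()
```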
      

      The SQLite JDBC driver JAR needs to be copied into the same directory as module.xml.

      UPDATE: the exception is fixed in the latest dev code of the xerial JDBC driver; please see the comments for the remaining problems.

      The exception I'm getting is:

      12:53:10,683 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (MSC service thread 1-3) ISPN000136: Execution error: org.infinispan.loaders.CacheLoaderException: Error while storing string key to database; key: 'user41', buffer size of value: 4918 bytes
      	at org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore.storeLockSafe(JdbcStringBasedCacheStore.java:253) [infinispan-cachestore-jdbc-5.2.5.Final.jar:5.2.5.Final]
      ...
      Caused by: java.sql.SQLException: not implemented by SQLite JDBC driver
      	at org.sqlite.Unused.unused(Unused.java:29) [sqlite-jdbc-3.7.2.jar:]
      	at org.sqlite.Unused.setBinaryStream(Unused.java:60) [sqlite-jdbc-3.7.2.jar:]
      	at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.setBinaryStream(WrappedPreparedStatement.java:871)
      	at org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore.storeLockSafe(JdbcStringBasedCacheStore.java:247) [infinispan-cachestore-jdbc-5.2.5.Final.jar:5.2.5.Final]
      	... 73 more

      The driver does not support setBinaryStream(), only setBytes(). I'm not sure whether there are any other methods required by Infinispan that are not implemented.
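
      The gap is only in the streaming setter: a driver that supports setBytes() can emulate setBinaryStream() by draining the stream into a byte array first and binding that. The same idea, sketched with Python's built-in sqlite3 module and a file-like object standing in for the JDBC InputStream:

```python
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v BLOB)")

# The JDBC store calls setBinaryStream() with the serialized entry; a driver
# that only implements setBytes() can drain the stream first and bind the
# resulting bytes as the BLOB value.
stream = io.BytesIO(b"serialized-entry" * 300)
conn.execute("INSERT INTO t VALUES (?, ?)", ("user41", stream.read()))

(n,) = conn.execute("SELECT length(v) FROM t WHERE k = 'user41'").fetchone()
print(n)  # 4800
conn.close()
```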

      As a simple comparison between JDBC and direct storage, I tried an app that caches 3000 records of around 5 KiB and 60000 records of around 0.5 KiB (less than 60 MiB in total). The bdbje store operation completes in less than a minute. With a local MySQL server it takes 10 minutes. And this is on a machine with plenty of CPU and memory, on an SSD. Unfortunately, bdbje does not work clustered for me (ISPN-2968).
      So my point is that a fast, reliable, transactional, local disk-based engine is badly needed.
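
      Part of why a local SQLite file can beat a networked server on this workload is that every write avoids a network round trip, and the per-transaction fsync cost can be amortized by batching writes into one transaction. A rough, self-contained illustration with Python's built-in sqlite3 module (row count shrunk from the 60000 above to keep the run quick; the file path is illustrative):

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")  # the mode suggested above
conn.execute("CREATE TABLE entries (k TEXT PRIMARY KEY, v BLOB)")

value = b"x" * 512  # ~0.5 KiB values, as in the comparison above
rows = [(f"key{i}", value) for i in range(5000)]

start = time.monotonic()
with conn:  # a single transaction for the whole batch
    conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)
elapsed = time.monotonic() - start

(count,) = conn.execute("SELECT count(*) FROM entries").fetchone()
print(count, f"rows inserted in {elapsed:.3f}s")
conn.close()
```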


            Aleksandar Kostadinov added a comment (edited)

            Haha, nice that they picked up my bug report so fast! BTW, I had to compile the driver because that snapshot is from before the patch. There is another small patch for compiling with JDK 7 that I submitted, if you can run some perf tests with it. (Attaching the one I compiled for Fedora 18 - sqlite-jdbc-3.7.15-SNAPSHOT-f18.jar.)
            UPDATE: the driver above is updated with this patch, which is also needed. Updated the description with the option to run SQLite with WAL journal mode for better performance.

            Anyway, it seems to be working now, and my quick test shows that the same workload that takes over 7 minutes with PostgreSQL and just over 10 with MySQL takes 20-30 seconds with SQLite. That is about the time it takes with MySQL using the MEMORY engine.

            There are still significant drawbacks of this solution though:

            • the xerial JDBC driver does not support running on top of the OS-bundled SQLite library, so SQLite will not be supported by Red Hat
            • going through JDBC and a connection pool still adds overhead (for configuration and performance), which is evident from the even better bdbje performance
            • XA transactions are not supported by the xerial JDBC driver


            Martin Gencur added a comment

            The support for setBinaryStream was already implemented in sqlite-jdbc and is available in https://bitbucket.org/xerial/sqlite-jdbc/downloads/sqlite-jdbc-3.7.15-SNAPSHOT-2.jar. See https://bitbucket.org/xerial/sqlite-jdbc/commits/343c646

            Aleksandar Kostadinov added a comment

            FYI, the workload I'm talking about is not only writes but also a lot of read accesses. Running as an async loader does not make a big difference.
            Replicating the same data to another server (all servers running on localhost), though, takes only 20-30 seconds. I guess the absence of that many read accesses is the main reason for the difference.

            I think that a local, efficient, transactional storage would be unbeatable. For example, a single EC2 large instance has 850 GB of ephemeral storage, so one could easily build a reasonably sized data grid on a couple of these if there were a way to efficiently access that amount of disk space.

              Assignee: Unassigned
              Reporter: Aleksandar Kostadinov (akostadi1@redhat.com)
              Archiver: Amol Dongare (rhn-support-adongare)