Uploaded image for project: 'Teiid'
  1. Teiid
  2. TEIID-3725

In the JDG translators, enable named cache swapping so that materialization can be supported

    • Icon: Feature Request Feature Request
    • Resolution: Done
    • Icon: Major Major
    • 9.0, 8.12.5
    • 8.12
    • Misc. Connectors
    • None

      The JDG translators, that in order to support materialization, will need to enable the named cache that's referenced by the connection, to be swapped. This is due to JDG doesn't currently support renaming a cache (i.e., like table rename in JDBC). And because of that, it limits how the cache can be refreshed (don't want to clear it before re-loading).

      Ideas are:
      1. configure translator with the 2 cache names to use (a) initial cache to read from and (b) the staging cache to use
      perform materialize load
      call SYSADMIN.setProperty to trigger the swapping of the cache names
      2 ???

      Note: because there's no persistence in Teiid so that any cache name changes will outlive a server restart, when a restart occurs, the translator will read from the cache identified as the initial cache to read from.

            [TEIID-3725] In the JDG translators, enable named cache swapping so that materialization can be supported

            added PR: https://github.com/teiid/teiid-documents/pull/52 for document update.

            Van Halbert (Inactive) added a comment - added PR: https://github.com/teiid/teiid-documents/pull/52 for document update.

            Assigning back to Van. There probably needs to be a release note and documentation updates for this to be resolved.

            Steven Hawkins added a comment - Assigning back to Van. There probably needs to be a release note and documentation updates for this to be resolved.

            Will there be a separate issue for the doc changes, or do you want to work that under this one?

            Steven Hawkins added a comment - Will there be a separate issue for the doc changes, or do you want to work that under this one?

            I've need to add the documentation for this.

            Van Halbert (Inactive) added a comment - I've need to add the documentation for this.

            Should this be marked as resolved based upon the pull request or is there still more to do/document?

            Steven Hawkins added a comment - Should this be marked as resolved based upon the pull request or is there still more to do/document?

            Just completed the changes and testing with supporting materializing to a remote cache. These changes will include the following:

            • added materialize configuration examples to the docs/teiid/datasources/infinispan
            • updated the jdg-remote-cache example.
            • eliminated the use of the JDG remote-query quick start, by coping that code into the teiid jdg-remote-cache quick start because there are changes to the configuration of the materialize caches that were needed.
            • this also includes the refactoring of the infinispan-dsl resource adapter (TEIID-3559).

            Van Halbert (Inactive) added a comment - Just completed the changes and testing with supporting materializing to a remote cache. These changes will include the following: added materialize configuration examples to the docs/teiid/datasources/infinispan updated the jdg-remote-cache example. eliminated the use of the JDG remote-query quick start, by coping that code into the teiid jdg-remote-cache quick start because there are changes to the configuration of the materialize caches that were needed. this also includes the refactoring of the infinispan-dsl resource adapter ( TEIID-3559 ).

            Also, the user can configure a RemoteCacheManager and bind it to JNDI. For which each resource adapter can share. That is also an option.

            Van Halbert (Inactive) added a comment - Also, the user can configure a RemoteCacheManager and bind it to JNDI. For which each resource adapter can share. That is also an option.

            Each connection shares the same Cache Container via the reference back to the resource-adapter config factory. The connectionImpl is very light weight.

            Van Halbert (Inactive) added a comment - Each connection shares the same Cache Container via the reference back to the resource-adapter config factory. The connectionImpl is very light weight.

            It matters most how heavy weight each connection is. If they hold a significant amount of resources per connection / resource adapter, then it would definitely make sense to to move toward a single resource adapter - as we are likely to promote more general usage of JDG for materialization.

            Steven Hawkins added a comment - It matters most how heavy weight each connection is. If they hold a significant amount of resources per connection / resource adapter, then it would definitely make sense to to move toward a single resource adapter - as we are likely to promote more general usage of JDG for materialization.

            I don't disagree that its not optimal. However, until the latest releases of JDG, are there options to do all the configuration from the client side (i.e., configure protobuf on the server from the client). Which would now enable the resource adapter to be more flexible. But that would be a major change in configuration/migration for what's already available. So I've resisted making sweeping changes that would cause compatibility issues. Unless we want to create a brand new resource-adapater/translator for users to transition to.

            Van Halbert (Inactive) added a comment - I don't disagree that its not optimal. However, until the latest releases of JDG, are there options to do all the configuration from the client side (i.e., configure protobuf on the server from the client). Which would now enable the resource adapter to be more flexible. But that would be a major change in configuration/migration for what's already available. So I've resisted making sweeping changes that would cause compatibility issues. Unless we want to create a brand new resource-adapater/translator for users to transition to.

            The concern would be around how many resources would be consumed by each connection pool. If you wanted to have 10 things materialized you would have 10 pools each with x connections.

            Steven Hawkins added a comment - The concern would be around how many resources would be consumed by each connection pool. If you wanted to have 10 things materialized you would have 10 pools each with x connections.

            Van Halbert (Inactive) added a comment - - edited

            Currently yes, because the configuration indicates what class, cache name, marshaller, protobuf file, etc. per cache. It would get complicated to configure multiple caches per resource adapter. Unless those configuration options were moved to the translator, and just let the resource adapter deal with the cache container.

            (removed the last sentence, its not necessarily true)

            Van Halbert (Inactive) added a comment - - edited Currently yes, because the configuration indicates what class, cache name, marshaller, protobuf file, etc. per cache. It would get complicated to configure multiple caches per resource adapter. Unless those configuration options were moved to the translator, and just let the resource adapter deal with the cache container. (removed the last sentence, its not necessarily true)

            Just wanted to check what you mean by the resource adapter will know the name of what is being truncated/swapped. Can there only be a single materialization per resource adapter?

            Steven Hawkins added a comment - Just wanted to check what you mean by the resource adapter will know the name of what is being truncated/swapped. Can there only be a single materialization per resource adapter?

            Example dynamic vdb section to define the materialization:

            	CREATE view stockPricesMatView
            		(
                        productId integer,
                        symbol string,
                        price bigdecimal,
                        companyName   varchar(256)
                        
             		) OPTIONS (MATERIALIZED 'TRUE', UPDATABLE 'TRUE',         
                           MATERIALIZED_TABLE 'StockMatCache.Stock', 
                           "teiid_rel:MATVIEW_TTL" 60000,
                           "teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT" 'execute StockMatCache.native(''truncate cache'');',
               		       "teiid_rel:MATVIEW_LOAD_SCRIPT" 'insert into StockMatCache.Stock (productId, symbol, price, companyName) SELECT  A.ID, S.symbol, S.price, A.COMPANY_NAME FROM Stocks.StockPrices AS S, Accounts.PRODUCT AS A WHERE S.symbol = A.SYMBOL',
                           "teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT"  'execute StockMatCache.native(''swap cache names'');', 
                           "teiid_rel:MATVIEW_SHARE_SCOPE" 'NONE',
                           "teiid_rel:ALLOW_MATVIEW_MANAGEMENT" 'true', 
                           "teiid_rel:MATVIEW_STATUS_TABLE" 'Accounts.status',                
                           "teiid_rel:MATVIEW_ONERROR_ACTION" 'THROW_EXCEPTION')
             		AS SELECT  A.ID, S.symbol, S.price, A.COMPANY_NAME
                                FROM Stocks.StockPrices AS S, Accounts.PRODUCT AS A
                                WHERE S.symbol = A.SYMBOL;
            

            Van Halbert (Inactive) added a comment - Example dynamic vdb section to define the materialization: CREATE view stockPricesMatView ( productId integer, symbol string, price bigdecimal, companyName varchar(256) ) OPTIONS (MATERIALIZED 'TRUE' , UPDATABLE 'TRUE' , MATERIALIZED_TABLE 'StockMatCache.Stock' , "teiid_rel:MATVIEW_TTL" 60000, "teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT" 'execute StockMatCache. native (' 'truncate cache' ');' , "teiid_rel:MATVIEW_LOAD_SCRIPT" 'insert into StockMatCache.Stock (productId, symbol, price, companyName) SELECT A.ID, S.symbol, S.price, A.COMPANY_NAME FROM Stocks.StockPrices AS S, Accounts.PRODUCT AS A WHERE S.symbol = A.SYMBOL' , "teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT" 'execute StockMatCache. native (' 'swap cache names' ');' , "teiid_rel:MATVIEW_SHARE_SCOPE" 'NONE' , "teiid_rel:ALLOW_MATVIEW_MANAGEMENT" ' true ' , "teiid_rel:MATVIEW_STATUS_TABLE" 'Accounts.status' , "teiid_rel:MATVIEW_ONERROR_ACTION" 'THROW_EXCEPTION' ) AS SELECT A.ID, S.symbol, S.price, A.COMPANY_NAME FROM Stocks.StockPrices AS S, Accounts.PRODUCT AS A WHERE S.symbol = A.SYMBOL;

            Comment from Adrian regarding improving usability of materialization:

            So by "DV will reverse engineer from a table into the pojo to be used"
            you mean DV will generate the pojo source code based on the columns DV
            discovers in the db table? In that case it could also annotate its
            java-bean like properties with our protobuf annotations (see Infinispan
            Query Guide JDG 6.5, section 7.3.7). The protobuf schema would be
            internally generated by JDG by inspecting those annotations, it will
            also generate the entity marshallers so it would save DV a lot of work.
            The process is described in [1].

            Adrian

            [1]
            https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/6.5/html/Infinispan_Query_Guide/sect-Protobuf_Encoding.html#Defining_Protocol_Buffers_Schemas_With_Java_Annotations

            Status:

            The ability to perform materialization when JDG is in library mode has been done and tested. The jdg-local-cache quick start needs cleanup, but is essentially done.

            The next step is to make the changes to the connector-infinispan-dsl resource adapter so that it handles the cache aliasing as the connector-infinispan.6 resource adapter (library mode). Also, based on Adrian's suggestion, I'll go ahead and add the logic to check the class for protobuf annotations so that the cache can be configured accordingly.

            Van Halbert (Inactive) added a comment - Comment from Adrian regarding improving usability of materialization: So by "DV will reverse engineer from a table into the pojo to be used" you mean DV will generate the pojo source code based on the columns DV discovers in the db table? In that case it could also annotate its java-bean like properties with our protobuf annotations (see Infinispan Query Guide JDG 6.5, section 7.3.7). The protobuf schema would be internally generated by JDG by inspecting those annotations, it will also generate the entity marshallers so it would save DV a lot of work. The process is described in [1] . Adrian [1] https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/6.5/html/Infinispan_Query_Guide/sect-Protobuf_Encoding.html#Defining_Protocol_Buffers_Schemas_With_Java_Annotations Status: The ability to perform materialization when JDG is in library mode has been done and tested. The jdg-local-cache quick start needs cleanup, but is essentially done. The next step is to make the changes to the connector-infinispan-dsl resource adapter so that it handles the cache aliasing as the connector-infinispan.6 resource adapter (library mode). Also, based on Adrian's suggestion, I'll go ahead and add the logic to check the class for protobuf annotations so that the cache can be configured accordingly.

            When the resource-adapter is configured for materialization, the following assumptions are made:

            • any insert/update/delete will be performed against the staging cache.
            • any read will be performed against the primary cache.
            • as part of the materialization process, the updates will be performed as specified in the MATVIEW_LOAD_SCRIPT option.
              example:
                		       "teiid_rel:MATVIEW_LOAD_SCRIPT" 'insert into StockMatCache.Stock (productId, symbol, price, companyName) SELECT  A.ID, S.symbol, S.price, A.COMPANY_NAME FROM Stocks.StockPrices AS S, Accounts.PRODUCT AS A WHERE S.symbol = A.SYMBOL',
              

            Van Halbert (Inactive) added a comment - When the resource-adapter is configured for materialization, the following assumptions are made: any insert/update/delete will be performed against the staging cache. any read will be performed against the primary cache. as part of the materialization process, the updates will be performed as specified in the MATVIEW_LOAD_SCRIPT option. example: "teiid_rel:MATVIEW_LOAD_SCRIPT" 'insert into StockMatCache.Stock (productId, symbol, price, companyName) SELECT A.ID, S.symbol, S.price, A.COMPANY_NAME FROM Stocks.StockPrices AS S, Accounts.PRODUCT AS A WHERE S.symbol = A.SYMBOL' ,

            Added the following 2 property options to the resource-adapters:

            • StagingCacheName : Cache name for the staging cache used in materialization
            • AliasCacheName : Cache name for the alias cache used to manage the aliasing used in materialization

            Van Halbert (Inactive) added a comment - Added the following 2 property options to the resource-adapters: StagingCacheName : Cache name for the staging cache used in materialization AliasCacheName : Cache name for the alias cache used to manage the aliasing used in materialization

            Van Halbert (Inactive) added a comment - - edited

            The native queries are currently looking like:

            Before native query format:

            truncate cache 
            

            After native query format:

             swap cache names 
            

            changed to remove the cache names on the native query as the resource adapter will have this configured.

            Van Halbert (Inactive) added a comment - - edited The native queries are currently looking like: Before native query format: truncate cache After native query format: swap cache names changed to remove the cache names on the native query as the resource adapter will have this configured.

            Van Halbert (Inactive) added a comment - - edited

            There will need to be 3 caches in order to do materialization:

            • the primary cache used for reading
            • the staging cache to load using materialization
            • a cache used for tracking the cache name aliasing that will be used. This will enable the underlying translator logic to swap out the cache (i.e, primary versus staging) after the cache that was used for staging is loaded and provide a persistence of the aliasing when the server is restarted.

            Another reason for needing the 3rd cache is that the other caches will be completely cleared before each materialization. So combining different object types into a cache wouldn't work at this time.

            The configuring of the caches to be used will be done on the connector, along with configuring the other JDG connection information. I contemplated using translator overrides for this additional settings, but decided on the connector because the connector settings are specific and the staging cache settings would be specifically related to that same cache.

            Van Halbert (Inactive) added a comment - - edited There will need to be 3 caches in order to do materialization: the primary cache used for reading the staging cache to load using materialization a cache used for tracking the cache name aliasing that will be used. This will enable the underlying translator logic to swap out the cache (i.e, primary versus staging) after the cache that was used for staging is loaded and provide a persistence of the aliasing when the server is restarted. Another reason for needing the 3rd cache is that the other caches will be completely cleared before each materialization. So combining different object types into a cache wouldn't work at this time. The configuring of the caches to be used will be done on the connector, along with configuring the other JDG connection information. I contemplated using translator overrides for this additional settings, but decided on the connector because the connector settings are specific and the staging cache settings would be specifically related to that same cache.

            Based on the latest conference call, it was decided that the teiid translator will be changed to use a second JDG cache for managing the cache names. The key:value pairs will be used to refer to: cache alias name: cache name
            The cache alias names will be assigned accordingly:
            modelName.tableName : jdgCacheNameA
            modelName.tableName.stage : jdgCacheNameB
            The translator will be changed to support native query, so that a specific native query can be issued as part of the materialization process to handle pre-processing and post-processing (swapping jdgCacheNameA/B).
            There is more usability work to be done to assist with reverse engineering the view table into a pojo object. Will start looking at that once the translator logic is done and in the product build.

            Van Halbert (Inactive) added a comment - Based on the latest conference call, it was decided that the teiid translator will be changed to use a second JDG cache for managing the cache names. The key:value pairs will be used to refer to: cache alias name: cache name The cache alias names will be assigned accordingly: modelName.tableName : jdgCacheNameA modelName.tableName.stage : jdgCacheNameB The translator will be changed to support native query, so that a specific native query can be issued as part of the materialization process to handle pre-processing and post-processing (swapping jdgCacheNameA/B). There is more usability work to be done to assist with reverse engineering the view table into a pojo object. Will start looking at that once the translator logic is done and in the product build.

            posting the comments from the email thread:

            This definitely is better approach, much more akin to staging concept in the RDBMS.

            Ramesh

            ----- Original Message -----
            > We don't have cache delegates at the moment, but we would implement them
            > for your needs, including a clustered swap.
            >
            > Tristan
            >
            > On 25/11/2015 18:00, Van Halbert wrote:
            > > Maybe there's a version of cache delegates I missed. One of the options
            > > was for Teiid to use/manage the aliases that mapped to the 2 caches. And
            > > would be swapped when materialization process was performed. But an issue
            > > with this is when there's a cluster of DV nodes, the name swapping would
            > > have to be replicated across all the nodes. And DV doesn't have the means
            > > to guarantee the name is changed on all nodes after the process is
            > > performed.
            > >
            > > Van Halbert
            > >

            Van Halbert (Inactive) added a comment - posting the comments from the email thread: This definitely is better approach, much more akin to staging concept in the RDBMS. Ramesh ----- Original Message ----- > We don't have cache delegates at the moment, but we would implement them > for your needs, including a clustered swap. > > Tristan > > On 25/11/2015 18:00, Van Halbert wrote: > > Maybe there's a version of cache delegates I missed. One of the options > > was for Teiid to use/manage the aliases that mapped to the 2 caches. And > > would be swapped when materialization process was performed. But an issue > > with this is when there's a cluster of DV nodes, the name swapping would > > have to be replicated across all the nodes. And DV doesn't have the means > > to guarantee the name is changed on all nodes after the process is > > performed. > > > > Van Halbert > >

            Van Halbert (Inactive) added a comment - - edited

            Looking at an example of dynamic vdb and materialization:

            ...
            MATERIALIZED_TABLE 'Accounts.h2_stock_mat',
            "teiid_rel:MATVIEW_TTL" 60000,
            "teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT" 'execute accounts.native(''truncate table mat_stock_staging'');',
            "teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT" 'execute accounts.native(''ALTER TABLE h2_stock_mat RENAME TO h2_stock_mat_temp'');execute accounts.native(''ALTER TABLE mat_stock_staging RENAME TO h2_stock_mat'');execute accounts.native(''ALTER TABLE h2_stock_mat_temp RENAME TO mat_stock_staging'');',
            "teiid_rel:ON_VDB_DROP_SCRIPT" 'DELETE FROM Accounts.status WHERE Name=''stockPricesMatView'' AND schemaname = ''StocksMatModel''',
            "teiid_rel:MATERIALIZED_STAGE_TABLE" 'Accounts.mat_stock_staging',
            "teiid_rel:ALLOW_MATVIEW_MANAGEMENT" 'true',
            "teiid_rel:MATVIEW_STATUS_TABLE" 'status',
            "teiid_rel:MATVIEW_SHARE_SCOPE" 'NONE',
            "teiid_rel:MATVIEW_ONERROR_ACTION" 'THROW_EXCEPTION')
            ...

            What about supporting native calls to be caught by the translator to swap the names? Or maybe a pushed down function call that the translator caught to swap the cache?

            By using similar options related to using the above properties would keep it consistent, regardless type of data source used.

            Van Halbert (Inactive) added a comment - - edited Looking at an example of dynamic vdb and materialization: ... MATERIALIZED_TABLE 'Accounts.h2_stock_mat', "teiid_rel:MATVIEW_TTL" 60000, "teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT" 'execute accounts.native(''truncate table mat_stock_staging'');', "teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT" 'execute accounts.native(''ALTER TABLE h2_stock_mat RENAME TO h2_stock_mat_temp'');execute accounts.native(''ALTER TABLE mat_stock_staging RENAME TO h2_stock_mat'');execute accounts.native(''ALTER TABLE h2_stock_mat_temp RENAME TO mat_stock_staging'');', "teiid_rel:ON_VDB_DROP_SCRIPT" 'DELETE FROM Accounts.status WHERE Name=''stockPricesMatView'' AND schemaname = ''StocksMatModel''', "teiid_rel:MATERIALIZED_STAGE_TABLE" 'Accounts.mat_stock_staging', "teiid_rel:ALLOW_MATVIEW_MANAGEMENT" 'true', "teiid_rel:MATVIEW_STATUS_TABLE" 'status', "teiid_rel:MATVIEW_SHARE_SCOPE" 'NONE', "teiid_rel:MATVIEW_ONERROR_ACTION" 'THROW_EXCEPTION') ... What about supporting native calls to be caught by the translator to swap the names? Or maybe a pushed down function call that the translator caught to swap the cache? By using similar options related to using the above properties would keep it consistent, regardless type of data source used.

              van.halbert Van Halbert (Inactive)
              van.halbert Van Halbert (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated:
                Resolved: