Enhancement
Resolution: Obsolete
Minor
We need a way to list annotation-less indexed entities in the infinispan XML.
The <indexed-entities> element schema will need to be extended as follows:
<indexed-entities>
    <!-- annotated entity -->
    <indexed-entity>org.infinispan.query.queries.faceting.Car</indexed-entity>
    <!-- non-annotated entity -->
    <indexed-entity-mapping>
        <!-- the FQN of the class to index -->
        <class>my.domain.model.Author</class>
        <!-- optional -->
        <spatial name="place" mode="HASH"/>
        <!-- list of indexed properties -->
        <property name="name" type="method">
            <field store="true" index="true" analyze="true" norms="true" term-vector="yes" boost="0.5"/>
        </property>
        <property name="title" type="method">
            <field store="true" index="true" analyze="true" norms="true" term-vector="yes" boost="0.5" analyzer="titleanalyzer"/>
        </property>
        <property type="method" name="birthdate">
            <field store="true" index="true" analyze="true" norms="true" term-vector="yes" boost="0.5"/>
            <date-bridge resolution="DAY"/>
        </property>
        <property type="method" name="city">
            <spatial name="name" store="true" boost="0.5" spatial-mode="RANGE" />
        </property>
    </indexed-entity-mapping>
</indexed-entities>
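To make the proposal a bit more concrete, here is a minimal sketch of how such an element could be read with the JDK's DOM parser. It assumes Java 17 and uses no Infinispan or Hibernate Search APIs; `IndexedEntitiesReader`, `EntityMapping` and `IndexedEntities` are hypothetical names, and only the class name and property names are extracted (spatial, field and bridge attributes are ignored).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class IndexedEntitiesReader {

    /** Hypothetical in-memory form of one <indexed-entity-mapping> element. */
    record EntityMapping(String className, List<String> propertyNames) {}

    /** Hypothetical result: annotated class names plus declaratively mapped entities. */
    record IndexedEntities(List<String> annotated, List<EntityMapping> mapped) {}

    static IndexedEntities read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed <indexed-entities> XML", e);
        }
        Element root = doc.getDocumentElement(); // <indexed-entities>

        List<String> annotated = new ArrayList<>();
        List<EntityMapping> mapped = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (!(children.item(i) instanceof Element child)) {
                continue; // skip whitespace text nodes and comments
            }
            switch (child.getTagName()) {
                case "indexed-entity" -> annotated.add(child.getTextContent().trim());
                case "indexed-entity-mapping" -> mapped.add(readMapping(child));
                default -> throw new IllegalArgumentException("Unexpected element: " + child.getTagName());
            }
        }
        return new IndexedEntities(annotated, mapped);
    }

    // Collects the <class> name and the name attribute of every <property> in one mapping.
    private static EntityMapping readMapping(Element mapping) {
        String className = mapping.getElementsByTagName("class").item(0).getTextContent().trim();
        List<String> properties = new ArrayList<>();
        NodeList props = mapping.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            properties.add(((Element) props.item(i)).getAttribute("name"));
        }
        return new EntityMapping(className, properties);
    }
}
```

Keeping annotated and mapped entities in separate lists makes it easy to validate later that a class does not appear in both.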
[ISPN-7842] Declarative indexed entity mapping
Just a heads-up: the programmatic mapping API is different in Hibernate Search 6, down to some concepts that are quite different (e.g. there's not just one type of field but multiple types: FullTextField, KeywordField, ScaledNumberField; FieldBridge was split into multiple different classes; the concept of Binder was introduced, etc.).
If we introduce an XML schema for declarative mapping, we might want to model it after the Search 6 concepts rather than the Search 5 concepts currently exposed in Infinispan. Or better, model it after Infinispan-specific concepts that are a subset of Search 6 concepts (e.g. annotations in infinispan packages); but those don't exist yet.
Since I work on Hibernate Search, I feel the urge to give my two cents:
Missing features
- The index name seems to be missing; but maybe it's on purpose (you don't want the index name to be customized, nor two entities to share the same index)?
- I think support for class bridges is missing (I guess <property type="class">, without a property name, would work, but it's hardly elegant). See org.hibernate.search.annotations.ClassBridge and org.hibernate.search.cfg.EntityMapping.classBridge(Class<?>) in Hibernate Search.
- You need some way to assign analyzers to fields (e.g. property name="foo" analyzer="myAnalyzer"). See org.hibernate.search.annotations.Field.analyzer() and org.hibernate.search.cfg.FieldMapping.analyzer(Class<?>) in Hibernate Search.
- As Sanne mentioned, analyzer definitions are absolutely necessary. You may think they don't belong in entity mappings (and I could not agree more), but they should be somewhere, so that you can reference the definitions when defining index fields (see above).
- As Sanne mentioned, there should probably be some support for embedding other objects in an entity's index, though you could probably add it later. And yes, embedded objects may be mapped to index fields without having their own index (therefore the "indexed-entity-mapping" tag probably wouldn't be a good fit for those). See org.hibernate.search.annotations.IndexedEmbedded and org.hibernate.search.cfg.PropertyMapping.indexEmbedded() as well as org.hibernate.search.annotations.ContainedIn and org.hibernate.search.cfg.PropertyMapping.containedIn() in Hibernate Search.
- As Sanne mentioned, custom field bridges (and custom class bridges, but that's about the same) are an important feature and should probably be supported from the start. See org.hibernate.search.annotations.Field.bridge() and org.hibernate.search.cfg.PropertyMapping.bridge(Class<? extends FieldBridge>) in Hibernate Search.
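As a rough illustration of the analyzer points above (definitions living outside entity mappings, fields referencing them by name), here is a minimal sketch of a name-based registry that fails fast on dangling references. Everything here is hypothetical: the record shape (a tokenizer plus filter names) is an assumption for illustration, not Hibernate Search's analyzer definition model.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AnalyzerDefinitions {

    /** Hypothetical analyzer definition: a tokenizer name plus an ordered list of filter names. */
    record AnalyzerDefinition(String name, String tokenizer, List<String> filters) {}

    private final Map<String, AnalyzerDefinition> byName = new HashMap<>();

    /** Registers a definition; names must be unique so that field references are unambiguous. */
    void define(AnalyzerDefinition definition) {
        if (byName.putIfAbsent(definition.name(), definition) != null) {
            throw new IllegalArgumentException("Duplicate analyzer definition: " + definition.name());
        }
    }

    /** Resolves a field's analyzer="..." attribute, failing at bootstrap instead of at query time. */
    AnalyzerDefinition resolve(String fieldName, String analyzerName) {
        AnalyzerDefinition definition = byName.get(analyzerName);
        if (definition == null) {
            throw new IllegalArgumentException(
                    "Field '" + fieldName + "' references undefined analyzer '" + analyzerName + "'");
        }
        return definition;
    }
}
```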
Unadvisable features
- As Sanne mentioned, it would be best not to include index-time boost, since it will probably disappear soon. On top of that, I seem to remember it's buggy: if you contribute to the same index field from multiple properties, each time with the same boost, I believe the boost will grow exponentially with the number of contributing properties. So... you've been warned.
Cosmetics
- <property name="name" type="method"> "type" may not be a very explicit name... maybe "access-mode" would be better?
- Unless there's a specific reason for that, the "mode" attribute on the "spatial" tag should probably accept lowercase values instead of HASH and such, so it'll be consistent with other attributes. Also, it's called "spatial-mode" instead of just "mode" when used at a property level; you may want to use the same attribute name in both cases.
- The values for the "term-vector" attribute in the "field" tag should probably be "true/false/with-offsets/with-positions/with-position-offsets" to be consistent with "norms", "store", etc.
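A minimal sketch of the lowercase-value convention suggested above, which also rejects arbitrary strings instead of accepting them. It assumes Java 17; the `TermVector` and `SpatialMode` enums are local, hypothetical stand-ins whose constants follow the value lists in this comment, and `MappingAttributes.parse` is a hypothetical helper.

```java
import java.util.Locale;

final class MappingAttributes {

    /** Hypothetical enum for term-vector="true|false|with-offsets|with-positions|with-position-offsets". */
    enum TermVector { TRUE, FALSE, WITH_OFFSETS, WITH_POSITIONS, WITH_POSITION_OFFSETS }

    /** Hypothetical enum for mode="hash|range" on the spatial element. */
    enum SpatialMode { HASH, RANGE }

    /**
     * Converts a lowercase, hyphenated XML value such as "with-offsets" to the constant
     * WITH_OFFSETS. Uppercase values (e.g. "HASH") and unknown values are rejected.
     */
    static <E extends Enum<E>> E parse(Class<E> type, String attribute, String xmlValue) {
        if (!xmlValue.matches("[a-z]+(-[a-z]+)*")) {
            throw new IllegalArgumentException(attribute + " must be lowercase and hyphenated: " + xmlValue);
        }
        String constant = xmlValue.replace('-', '_').toUpperCase(Locale.ROOT);
        for (E candidate : type.getEnumConstants()) {
            if (candidate.name().equals(constant)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException("Unsupported " + attribute + " value: " + xmlValue);
    }
}
```

In practice an XSD enumeration would catch most of this at schema-validation time; the helper just shows where the string-to-enum mapping would live.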
Index-driven mapping
Also, I'd like to point out that while annotation-based mapping is necessarily entity-driven, XML-based mapping could be index-driven, which may arguably be clearer when contributing to the same index field from multiple entities:
<indexed-entities>
    <!-- annotated entity -->
    <indexed-entity>org.infinispan.query.queries.faceting.Car</indexed-entity>
    <!-- non-annotated entity -->
    <index-entity-mapping index-name="people">
        <!-- the FQN of the class to index -->
        <indexed-entity id="author">my.domain.model.Author</indexed-entity>
        <indexed-entity id="fan">my.domain.model.Fan</indexed-entity>
        <!-- optional -->
        <spatial name="place" mode="HASH">
            <bridge entity="author"/> <!-- class bridge -->
        </spatial>
        <!-- list of index fields -->
        <field name="name" store="true" index="true" analyze="true" norms="true" term-vector="true">
            <bridge entity="author" property="name" access-mode="method" impl="my.package.MyBridge"/>
            <bridge entity="fan" property="fullName" access-mode="method"/>
        </field>
    </index-entity-mapping>
</indexed-entities>
Of course, it would be a bit confusing for users switching over from annotation-based mapping, and it also may be a bit more challenging to implement. You may not want to do this for various reasons, but I wanted to mention it, since that's probably not something you'll be able to change later. At least now you can knowingly dismiss the idea!
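For comparison with the entity-driven model, here is a minimal sketch of the in-memory shape an index-driven mapping might parse into: the index owns its fields, and each field lists per-entity bridges. The record names are hypothetical, it assumes Java 17, and it assumes at most one bridge per entity per field (`Collectors.toMap` throws on duplicates).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class IndexDrivenMapping {

    /** One contribution to an index field: which entity, which property, which access mode. */
    record Bridge(String entityId, String property, String accessMode) {}

    /** An index field fed by one or more entities, like the <field> element above. */
    record IndexField(String name, List<Bridge> bridges) {}

    /** An index shared by several entity types, e.g. index-name="people". */
    record Index(String name, Map<String, String> entityClassesById, List<IndexField> fields) {

        /** For one entity id, returns index field name -> entity property that feeds it. */
        Map<String, String> fieldsFedBy(String entityId) {
            return fields.stream()
                    .flatMap(f -> f.bridges().stream()
                            .filter(b -> b.entityId().equals(entityId))
                            .map(b -> Map.entry(f.name(), b.property())))
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        }
    }
}
```

The entity-driven view that indexing needs at runtime can be derived from this structure, whereas going the other way requires grouping every entity mapping by index name first.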
I agree on the server vs domain metadata conflict raised by anistor.
For embedded mode, personally I hope all of the Cache configurations will be separated from the root CacheManager one day: as discussed in the past the "application level CacheManager" should be able to be automatically started (stopped) from a JVM scoped root via something like the FORK channel, or the equivalent at Infinispan level.
The same should happen for server: separation of configuration and lifecycle of caches needed for a specific application; in the server mode this could be similar to a multitenancy feature, to allow rolling upgrades / data migration of a specific set of caches belonging to one user/application.
My hope is that Cache configuration and its content metadata would still belong together, which would make the above design sensible. Pretty much like tables and the table content in an RDBMS; what we're missing is the grouping of several tables in one logical database.
In short I agree this separation should happen but I don't think it's necessarily evil to have this specific metadata tied to a Cache, if we can agree that a "cut" of separation needs to be done somewhere between the Cache definition and say the JGroups configuration (or somewhere in between).
Putting domain-related data or metadata in a server config XML is going to upset many developers. Having to list the class names that need to be indexed is mildly bad; I accepted it when we agreed to introduce it, especially because entity autodetection was kind enough to let you develop your domain model and then put the whole list of types there when you were done with it. But exposing the inner structure of your domain model adds a new dimension to the problem, and it makes me want to reconsider. Especially since we have a completely different mechanism to handle this for protobuf entities.
Current state: the implementation of <indexed-entities> is just a list of type names, which are currently only classes, no protobuf types (yet). This works in embedded mode only. In server mode it has little chance of working due to various deployment dependency issues, so this XML element is present in the server's XSD but is ignored as of 9.0.0.Final.
What if my POJO is not a POJO, but a protobuf entity? It'd be nice, as a non-Java client, to be able to define my mapping in a declarative XML way. This means keeping the schema generic enough to support both embedded and client/server cases.
We'd also need a section for the Analyzers, which are not necessarily tied to a certain entity, but can be referenced by name.
High level questions
I assume the intent is that, for a given class type A, it's going to be mapped either via annotations or via explicit mapping? (I would suggest expecting that, at least for the first iterations, as we don't support integrating the two at the moment.)
(follow up on previous question) I hope you'll enforce (validate) against listing the same class in both ways
We will need to require the actual class definition to be "on classpath" during bootstrap of the Cache. I hope that's fine for the use case?
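The two bootstrap checks asked about above (no class listed both ways, every mapped class loadable) are cheap to enforce. A minimal sketch, assuming Java 17; `MappingValidator` and its methods are hypothetical names, not existing Infinispan API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class MappingValidator {

    /** Rejects classes listed both as an annotated <indexed-entity> and as an explicit mapping. */
    static void rejectDoubleMapping(List<String> annotated, List<String> mapped) {
        Set<String> overlap = new LinkedHashSet<>(annotated);
        overlap.retainAll(Set.copyOf(mapped));
        if (!overlap.isEmpty()) {
            throw new IllegalStateException("Classes mapped both via annotations and XML: " + overlap);
        }
    }

    /** Checks during Cache bootstrap that every listed class is visible to the given class loader. */
    static void requireOnClasspath(List<String> classNames, ClassLoader loader) {
        for (String name : classNames) {
            try {
                Class.forName(name, false, loader); // load without running static initializers
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException("Indexed class not on classpath: " + name, e);
            }
        }
    }
}
```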
Property mapping
Let's not expose index-time boost. We'll deprecate / remove this soon.
I'm confused about the Spatial mapping example on city. Could you add the entity sources to clarify? I.e. you'll need matching @Latitude and @Longitude for each set of spatial coordinates, unless these are embeddables, in which case you'll need nested mapping.
norms, term-vectors, etc.: in the annotations case we use enums. Let's implement the XML schema to match, i.e. don't accept arbitrary strings.
- Being able to support @FieldBridge is typically quite important.
Entity Properties vs Entity Fields
Hibernate Search can map properties and/or fields. We can read/write Java fields, and this is supported by Infinispan Query as well.
When having
<property name="name" type="method">
<field ...
I guess type refers to a getter? A "method" should typically match the full method name. Yet "getter" is effectively slang for property, so the enclosing <property> block already hints at it.
Should <property> be named differently, and have two different identifiers to get rid of the type?
Should the inner field be named documentField or indexField?
The @Field annotation is not ambiguous as one can see it's not a POJO field, but in this context it reads ambiguously.
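To illustrate the distinction between the two access modes discussed above, here is a minimal JDK reflection sketch: FIELD reads the Java field directly, GETTER calls the JavaBeans accessor derived from the property name. `PropertyAccess` and `AccessMode` are hypothetical names; the sketch only searches fields declared on the entity class itself, not superclasses.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

final class PropertyAccess {

    /** Hypothetical access modes: read the Java field directly, or call its getter. */
    enum AccessMode { FIELD, GETTER }

    /** Reads the value that <property name="..."> would index, using the given access mode. */
    static Object read(Object entity, String name, AccessMode mode) {
        try {
            return mode == AccessMode.FIELD ? readField(entity, name) : readGetter(entity, name);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot read '" + name + "' from " + entity.getClass().getName(), e);
        }
    }

    // FIELD: the Java field called `name`, declared on the entity class itself.
    private static Object readField(Object entity, String name) throws ReflectiveOperationException {
        Field field = entity.getClass().getDeclaredField(name);
        field.setAccessible(true);
        return field.get(entity);
    }

    // GETTER: the JavaBeans-style accessor, getName() or isName() for property "name".
    private static Object readGetter(Object entity, String name) throws ReflectiveOperationException {
        String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        Method getter;
        try {
            getter = entity.getClass().getMethod("get" + suffix);
        } catch (NoSuchMethodException e) {
            getter = entity.getClass().getMethod("is" + suffix);
        }
        getter.setAccessible(true);
        return getter.invoke(entity);
    }
}
```

Since a getter may compute a value that differs from the backing field, the two modes can index different things for the same property name, which is why the mapping needs to say which one it means.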
Nested structures?
What if my POJO contains a map/set of many (one/none/null) objects which need to be mapped ?
In Hibernate Search we support the recursion into the related types, but the child element doesn't have to be @Indexed on its own.
Careful with polymorphic collections.
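A minimal sketch of the recursion into related types described above, using nested `Map`s in place of POJOs for brevity (an assumption; a real implementation would walk the mapped properties). Embedded values become dotted field names on the parent's document without being indexed on their own, collections become multi-valued fields, nulls are skipped, and `maxDepth` acts as a depth limit. `EmbeddedFlattener` is a hypothetical name. Note that elements of a polymorphic collection are flattened purely by shape here; a typed mapping would need per-subtype rules.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EmbeddedFlattener {

    /** Flattens {"authors": [{"name": "Ann"}]} into authors.name = [Ann]. */
    static Map<String, List<Object>> flatten(Map<String, ?> root, int maxDepth) {
        Map<String, List<Object>> fields = new LinkedHashMap<>();
        visit("", root, maxDepth, fields);
        return fields;
    }

    private static void visit(String path, Object value, int depthLeft, Map<String, List<Object>> out) {
        if (value == null) {
            return; // null and missing associations produce no field value
        }
        if (value instanceof Map<?, ?> map) {
            if (depthLeft == 0) {
                return; // depth limit reached: stop recursing into embedded objects
            }
            for (Map.Entry<?, ?> e : map.entrySet()) {
                String child = path.isEmpty() ? e.getKey().toString() : path + "." + e.getKey();
                visit(child, e.getValue(), depthLeft - 1, out);
            }
        } else if (value instanceof Collection<?> items) {
            for (Object item : items) {
                visit(path, item, depthLeft, out); // same field name, one value per element
            }
        } else {
            out.computeIfAbsent(path, k -> new ArrayList<>()).add(value);
        }
    }
}
```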
Analyzer definitions?
I don't think we need to expose all of Hibernate Search features, but defining Analyzers is essential.
Mapping / conversion of Infinispan keys
The concept of a separate key is not native in Hibernate Search, yet you need a way to define how key types being used are two-way converted to the index format. This is represented as a @ProvidedId in the Hibernate Search internals but in this case you might want to expose them in a simpler form.
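A minimal sketch of the "simpler form" for key conversion: a two-way converter between a cache key type and the string identifier stored in the index. `KeyConverter`, `CustomerKey` and the `tenant:uuid` encoding are all hypothetical, chosen only to show that the conversion must round-trip losslessly.

```java
import java.util.UUID;

final class KeyConversion {

    /** Hypothetical two-way converter between a cache key type and its index-side string form. */
    interface KeyConverter<K> {
        String toIndexId(K key);
        K fromIndexId(String id);
    }

    /** Example composite key: a tenant name plus a UUID. */
    record CustomerKey(String tenant, UUID id) {}

    /** Encodes CustomerKey as "tenant:uuid"; the tenant must not contain ':' to stay reversible. */
    static final KeyConverter<CustomerKey> CUSTOMER_KEYS = new KeyConverter<>() {
        @Override
        public String toIndexId(CustomerKey key) {
            if (key.tenant().contains(":")) {
                throw new IllegalArgumentException("tenant must not contain ':'");
            }
            return key.tenant() + ":" + key.id();
        }

        @Override
        public CustomerKey fromIndexId(String id) {
            int sep = id.indexOf(':');
            if (sep < 0) {
                throw new IllegalArgumentException("Not a CustomerKey id: " + id);
            }
            return new CustomerKey(id.substring(0, sep), UUID.fromString(id.substring(sep + 1)));
        }
    };
}
```

Query results come back as index identifiers, so `fromIndexId(toIndexId(key))` must equal `key` for every key the cache can hold; otherwise hits cannot be mapped back to entries.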
anistor Would it work to have the indexing config for remote in the XML as well, like described here?
anistor Is <indexed-entity> being used for protobuf entities as well?
Infinispan issue tracking has been migrated to GitHub issues: https://github.com/infinispan/infinispan/issues
If you still want this issue to be worked on, create a new issue on GitHub and link this issue.