Indexing in Zimbra

For information about Lucene files:

For Zimbra's Lucene code, you can inspect the class at the link below:

The IndexField enum class on line 199 is important. The store and analyzed parameters of the indexed fields are configured here. Among the classes in this folder, you can see the custom analyzer, character filter, tokenizer and token filter classes. Here the analysis is configured within the code instead of the “settings.json” file we use to implement ElasticSearch config in Spring Boot.
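To make the idea of per-field store and analyzed flags concrete, here is a minimal sketch in plain Java. It is not Zimbra's IndexField enum: the constant names, the "example.id" field and the flag values are assumptions chosen only for illustration, and only the "l.content" field name comes from this post.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical sketch: NOT Zimbra's IndexField enum.
// Constant names, "example.id" and all flag values are illustrative assumptions.
enum IndexFieldSketch {
    CONTENT("l.content", false, true),          // searchable text, not stored
    HYPOTHETICAL_ID("example.id", true, false); // stored as-is, not analyzed

    private final String fieldName;
    private final boolean stored;
    private final boolean analyzed;

    IndexFieldSketch(String fieldName, boolean stored, boolean analyzed) {
        this.fieldName = fieldName;
        this.stored = stored;
        this.analyzed = analyzed;
    }

    String fieldName() { return fieldName; }
    boolean isStored() { return stored; }
    boolean isAnalyzed() { return analyzed; }

    // Look up a constant by the field name used in the index.
    static Optional<IndexFieldSketch> byFieldName(String name) {
        return Arrays.stream(values())
                .filter(f -> f.fieldName.equals(name))
                .findFirst();
    }

    public static void main(String[] args) {
        for (IndexFieldSketch f : values()) {
            System.out.println(f.fieldName() + " stored=" + f.isStored()
                    + " analyzed=" + f.isAnalyzed());
        }
    }
}
```

Keeping the flags next to the field name in one enum means the indexing code can ask the field itself how it should be handled, which is roughly what a settings file would otherwise describe.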

In Zimbra, an IndexDocument object is created first. The store and analyzed parameters of the fields are declared here:

Another important class is this:

The “tokenstream” method on line 142 is especially important. Depending on the field, it directs to the classes that exist under the path in the link provided above. Once the fields are set in IndexDocument, the instructions in “analysis” are applied to them.
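The per-field dispatch described above can be sketched roughly as follows. This is a simplified illustration and not Zimbra's code: Lucene's Analyzer.tokenStream returns a TokenStream, and the real method presumably does too, while this sketch returns plain lists of strings. The class name, the "example.exact" field and both tokenizers are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of choosing an analysis chain by field name.
final class FieldAnalyzerDispatch {
    // Field name -> tokenizer; "example.exact" is an invented field.
    private static final Map<String, Function<String, List<String>>> BY_FIELD = new HashMap<>();

    static {
        BY_FIELD.put("example.exact", FieldAnalyzerDispatch::whitespaceTokens);
    }

    // Fields without a dedicated entry fall back to the word tokenizer.
    static List<String> tokenize(String field, String value) {
        return BY_FIELD.getOrDefault(field, FieldAnalyzerDispatch::wordTokens).apply(value);
    }

    // Lowercase and split on whitespace only, so "foo-bar" stays one token.
    static List<String> whitespaceTokens(String value) {
        return split(value, "\\s+");
    }

    // Lowercase and split on anything that is not a letter or a digit.
    static List<String> wordTokens(String value) {
        return split(value, "[^\\p{L}\\p{N}]+");
    }

    private static List<String> split(String value, String regex) {
        List<String> out = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split(regex)) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("example.exact", "Foo-Bar baz"));
        System.out.println(tokenize("l.content", "Foo-Bar baz"));
    }
}
```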

Zimbra indexes using Lucene, but it can be configured through parameters to use ElasticSearch instead. For indexing in ElasticSearch, a dedicated class is used.

In ElasticSearch, extra analyzers are added in code on top of the ones in the “analysis” folder (zimbrastandard, whitespace tokenizer, emailaddress, contactdata). To give an example, let’s say the from address is “Senorita Developer <senoritadev@zimbrathree.nils.local>”. While creating the IndexDocument, Lucene does the analysis and converts it to “senorita developer senoritadev@zimbrathree.nils.local senoritadev @zimbrathree.nils.local zimbrathree.nils.local” (for ElasticSearch, the IndexDocument is prepared by converting it to a JSON-formatted string). On top of this, the ElasticSearch analysis configurations are applied, or simply passed over if not required.
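The address expansion in the example above can be reproduced with a few lines of plain Java. This is only a sketch of the observable result, not Zimbra's analyzer: the class and method names are hypothetical, and the rule set (name words, full address, local part, "@" plus domain, domain) is inferred from the single example in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: expands "Name <local@domain>" into searchable tokens.
final class EmailAddressTokens {
    static List<String> tokens(String raw) {
        String s = raw.trim().toLowerCase(Locale.ROOT);
        String personal = "";
        String address = s;

        // Split "personal name <address>" when angle brackets are present.
        int lt = s.indexOf('<');
        int gt = s.lastIndexOf('>');
        if (lt >= 0 && gt > lt) {
            personal = s.substring(0, lt).trim();
            address = s.substring(lt + 1, gt).trim();
        }

        List<String> out = new ArrayList<>();
        for (String word : personal.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(word);                        // each word of the display name
            }
        }
        if (address.isEmpty()) {
            return out;
        }
        out.add(address);                             // full address
        int at = address.lastIndexOf('@');
        if (at > 0 && at < address.length() - 1) {
            out.add(address.substring(0, at));        // local part
            out.add(address.substring(at));           // "@" + domain
            out.add(address.substring(at + 1));       // domain only
        }
        return out;
    }

    public static void main(String[] args) {
        String from = "Senorita Developer <senoritadev@zimbrathree.nils.local>";
        System.out.println(String.join(" ", tokens(from)));
    }
}
```

Indexing the local part and the domain as separate tokens is what lets a search for just “senoritadev” or just the domain match the message.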

topLevel.put("mappings", mappings);
mappings.put(indexType, zimbra);
zimbra.put("_source", source);
source.put("enabled", false); // save space
source.put("_all", false); // save space
zimbra.put("properties", properties);

As you can see from the piece of code above, the “enabled” and “_all” settings are defined under “_source”, and both are set to false to save space.
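To see the nesting these calls produce, here is a self-contained sketch that rebuilds the same structure with plain Java maps. The original code may well use a JSON object type instead of Map, and the class name, the method name and the "zimbra" value passed as indexType are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: rebuilds the mapping structure from the snippet above
// with plain maps. The real code may use a JSON object type instead.
final class MappingSketch {
    static Map<String, Object> buildMapping(String indexType) {
        Map<String, Object> topLevel = new LinkedHashMap<>();
        Map<String, Object> mappings = new LinkedHashMap<>();
        Map<String, Object> zimbra = new LinkedHashMap<>();
        Map<String, Object> source = new LinkedHashMap<>();
        Map<String, Object> properties = new LinkedHashMap<>(); // per-field definitions would go here

        // Same calls, in the same order, as in the snippet above.
        topLevel.put("mappings", mappings);
        mappings.put(indexType, zimbra);
        zimbra.put("_source", source);
        source.put("enabled", false); // save space
        source.put("_all", false);    // save space
        zimbra.put("properties", properties);
        return topLevel;
    }

    public static void main(String[] args) {
        // "zimbra" as the index type is an assumed value for this demo.
        System.out.println(buildMapping("zimbra"));
    }
}
```

The result is a three-level structure: the index type sits under “mappings”, and “_source” and “properties” sit side by side under the index type.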

For the indexed fields in Zimbra, I prepared an Excel sheet, shown below (zimbra_index_fields):

Mail content is kept in “l.content”. During the indexing code flow, analyzed e-mail addresses and similar values are concatenated to it.

From Zimbra, we can also inspect the Lucene document contents I mentioned at the top of the post. For example, on my local Zimbra, my account with mboxId=3 is indexed under “/opt/zimbra/index/0/3/index/0”. Here is a screenshot showing part of its content:

There is some information in the “.tis” files, but it is not easy to map and extract the real content (it looks as if the contents of meeting requests are visible, but it is actually not the whole original content).

.tis file sample content from my local Zimbra:

For information about Zimbra Queries: “Fields — CONTENT field” title and its content can be examined.

Happy Coding!

I would love to change the world, but they won’t give me the source code | coding 👩🏼‍💻 | coffee ☕️ | jazz 🎷 | anime 🐲 | books 📚 | drawing 🎨

Nil Seri
