
Elasticsearch for the Coldbox Framework

v1.1.5+140

Elasticsearch for the Coldbox Platform

LICENSE

Apache License, Version 2.0.

Installation

Via CommandBox: install cbelasticsearch

Instructions

The elasticsearch module for the Coldbox Platform provides you with a fluent search interface for Elasticsearch, in addition to a CacheBox cache provider and a LogBox appender. Both the cache provider and the LogBox appender rely on WireBox DSL mappings to the Elasticsearch client. As such, additional WireBox configuration is necessary to use them outside of the Coldbox context.

Requirements

  • Coldbox >= v4.5
  • Elasticsearch >= v5.0
  • Lucee >= v4.5 or Adobe ColdFusion >= v11

Note: While only Elasticsearch 5.0 and above is supported, most of the REST-based methods will work on previous versions. A notable exception is the multi-delete methods, which use the delete by query API of Elasticsearch 5. As such, CacheBox and LogBox functionality will be limited on earlier versions.

Configuration

Once you have installed the module, you may add a custom configuration, specific to your environment, by adding a cbElasticsearch configuration object to your moduleSettings inside your Coldbox.cfc configuration file.

By default, the following settings are in place without additional configuration:

moduleSettings = {
    "cbElasticsearch" = {
        // The native client Wirebox DSL for the transport client
        client = "JestClient@cbElasticsearch",
        // The default hosts - an array of host connections
        //  - REST-based clients (e.g. JEST):  round robin connections will be used
        //  - Socket-based clients (e.g. Transport):  cluster-aware routing used
        hosts = [
            // The default connection is made to http://127.0.0.1:9200
            {
                serverProtocol = "http",
                serverName = "127.0.0.1",
                // Socket-based connections will use 9300
                serverPort = "9200"
            }
        ],
        // The default index
        defaultIndex = "cbElasticsearch",
        // The default number of shards to use when creating an index
        defaultIndexShards = 3,
        // The default number of index replicas to create
        defaultIndexReplicas = 0,
        // Whether to use separate threads for client transactions
        multiThreaded = true,
        // The maximum number of connections allowed per route ( e.g. search URI endpoint )
        maxConnectionsPerRoute = 10,
        // The maximum number of connections, in total, for all Elasticsearch requests
        maxConnections = 100
    }
};

At the current time, only the REST-based [JEST] native client is available. Support for a socket-based client is in development. For most applications, however, the REST-based native client will be a good fit.
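As a sketch of an environment-specific override, the following `moduleSettings` excerpt uses only the keys listed above. The two host names are hypothetical placeholders, and the excerpt assumes, as is usual for ColdBox module settings, that any key you omit keeps its default value.

```cfml
// config/Coldbox.cfc (excerpt): a sketch, the host names are hypothetical
moduleSettings = {
    "cbElasticsearch" = {
        // two nodes; the REST-based client round-robins requests between them
        hosts = [
            {
                serverProtocol = "http",
                serverName = "es-node-1.example.com",
                serverPort = "9200"
            },
            {
                serverProtocol = "http",
                serverName = "es-node-2.example.com",
                serverPort = "9200"
            }
        ],
        // use the application's own index when none is specified
        defaultIndex = "bookshop",
        // a small cluster: fewer shards, one replica of each shard
        defaultIndexShards = 2,
        defaultIndexReplicas = 1
    }
};
```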

Creating Indexes

Indexing Basics

Elasticsearch documents are stored in "indexes", each of which contains one or more "types".

It's easy to think of Elasticsearch indexes as the RDBMS equivalent of a database, and types as the equivalent of a table; however, there are notable differences in the underlying architecture.

Elasticsearch engineer Adrien Grand on the comparisons:

the way data is stored is so different that any comparisons can hardly make sense, and this ultimately led to an overuse of types in cases where they were more harmful than helpful.

An index is stored in a set of shards, which are themselves Lucene indices. This already gives you a glimpse of the limits of using a new index all the time: Lucene indices have a small yet fixed overhead in terms of disk space, memory usage and file descriptors used. For that reason, a single large index is more efficient than several small indices: the fixed cost of the Lucene index is better amortized across many documents.

On types:

types are a convenient way to store several types of data in the same index, in order to keep the total number of indices low for the reasons exposed above... One nice property of types is that searching across several types of the same index comes with no overhead compared to searching a single type: it does not change how many shard results need to be merged

In short, indexes carry a higher overhead, and aggregating search results across separate indexes is more expensive than across types within a single index. If your application's search interfaces need to return multiple entity or domain types, those should be represented as distinct types within a single index, allowing them to be aggregated, sorted, and ordered together in search results.

Creating and Mapping an Index

The IndexBuilder model assists with the creation and mapping of indexes. Mappings define the allowable data types within your documents and allow for better and more accurate search aggregations. Let's say we have a book model that we intend to make searchable. We are storing it in our "bookshop" index, under the type "books". Let's create the index (if it doesn't exist) and map the "books" type:

var indexBuilder = getInstance( "IndexBuilder@cbElasticsearch" ).new(
    "bookshop",
    {
        "books" = {
            "_all" = { "enabled" = false },
            "properties" = {
                "title" = { "type" = "string" },
                "summary" = { "type" = "string" },
                "description" = { "type" = "string" },
                // denotes a nested struct with additional keys
                "author" = { "type" = "object" },
                // date with specific format type
                "publishDate" = {
                    "type" = "date",
                    // our format will be = yyyy-mm-dd
                    "format" = "strict_date"
                },
                "edition" = { "type" = "integer" },
                // ISBN values exceed the integer range, so they are mapped as long
                "ISBN" = { "type" = "long" }
            }
        }
    }
).save();

If you use Elasticsearch 6 or later, replace "type":"string" above with "type":"text" (the string type was deprecated in Elasticsearch 5.0 and removed in 6.0, replaced by text and keyword).

We can also add mappings after the new() method is called:

// instantiate the index builder
var indexBuilder = getInstance( "IndexBuilder@cbElasticsearch" ).new( "bookshop" );
// our mapping struct
var booksMapping = {
    "_all" = { "enabled" = false },
    "properties" = {
        "title" = { "type" = "string" },
        "summary" = { "type" = "string" },
        "description" = { "type" = "string" },
        // denotes a nested struct with additional keys
        "author" = { "type" = "object" },
        // date with specific format type
        "publishDate" = {
            "type" = "date",
            // our format will be = yyyy-mm-dd
            "format" = "strict_date"
        },
        "edition" = { "type" = "integer" },
        // ISBN values exceed the integer range, so they are mapped as long
        "ISBN" = { "type" = "long" }
    }
};

// add the mapping and save
_Deprecation notice:  The index "type" ( e.g. "books" ) [has now been deprecated](https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html) in recent versions of Elasticsearch, and should no longer be used. Only a single type will be accepted in future releases._

Note that, in the above examples, we are applying the index and mappings directly from the object itself, which is intuitive and fast. We could also pass the `IndexBuilder` object to the `Client@cbElasticsearch` instance's `applyIndex( required IndexBuilder indexBuilder )` method, if we wished.
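For comparison, a minimal sketch of that alternative, assuming the `booksMapping` struct from the previous example is in scope: the builder is populated but never saved directly, and the `Client@cbElasticsearch` instance applies it instead.

```cfml
// build the index definition, but do not call save() on it
var indexBuilder = getInstance( "IndexBuilder@cbElasticsearch" ).new(
    "bookshop",
    { "books" = booksMapping }
);

// let the client create the index and apply its mappings
getInstance( "Client@cbElasticsearch" ).applyIndex( indexBuilder );
```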

If an explicit mapping is not specified when the index is created, Elasticsearch will assign types when the first document is saved.

We've also passed a simple struct into the index properties. If we wanted to add additional settings or configure replicas and shards, we could pass a more comprehensive struct, including a [range of settings](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/index-modules.html), to the `new()` method:

indexBuilder.new(
    "bookshop",
    {
        "settings" = {
            "number_of_shards" = 10,
            "number_of_replicas" = 2,
            // a "min-max" range (or false); expands replicas based on the number of data nodes
            "auto_expand_replicas" = "0-2",
            "shard.check_on_startup" = "checksum"
        },
        "mappings" = {
            "books" = {
                "_all" = { "enabled" = false },
                "properties" = {
                    "title" = { "type" = "string" },
                    "summary" = { "type" = "string" },
                    "description" = { "type" = "string" },
                    // denotes a nested struct with additional keys
                    "author" = { "type" = "object" },
                    // date with specific format type
                    "publishDate" = {
                        "type" = "date",
                        // our format will be = yyyy-mm-dd
                        "format" = "strict_date"
                    },
                    "edition" = { "type" = "integer" },
                    "ISBN" = { "type" = "long" }
                }
            }
        }
    }
);


*Additional Reading:*

* [Elasticsearch Mapping Guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html)
* [Index Settings Reference](https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_settings.html)



## Mapping Builder

Introduced in `v1.0.0`, the `MappingBuilder` model provides a fluent, closure-based syntax for defining and mapping indexes.
This builder can be accessed by injecting it into your components:

component {
    property name="builder" inject="MappingBuilder@cbElasticsearch";
}


The `new` method of the `IndexBuilder` also accepts a closure as the second (`properties`) argument.  If a closure is passed, a `MappingBuilder` instance is passed as an argument to the closure:

indexBuilder.new(
    "elasticsearch",
    function( builder ){
        return {
            "_doc" = builder.create( function( mapping ){
                mapping.text( "title" );
                mapping.date( "createdTime" ).format( "date_time_no_millis" );
            } )
        };
    }
);


The `MappingBuilder` has one primary method: `create`.  `create` takes a callback with a `MappingBlueprint` object, usually aliased as `mapping`.

## Mapping Blueprint

The `MappingBlueprint` provides a fluent API for defining a mapping. It has methods for all the Elasticsearch mapping types:

builder.create( function( mapping ){
    mapping.text( "title" );
    mapping.date( "createdTime" ).format( "date_time_no_millis" );
    mapping.object( "user", function( mapping ){
        mapping.keyword( "gender" );
        mapping.integer( "age" );
        mapping.object( "name", function( mapping ){
            mapping.text( "first" );
            mapping.text( "last" );
        } );
    } );
} )


As seen above, `object` expects a closure which will be provided another `MappingBlueprint`.  The results will be set as the `properties` of the `object` call.

## Parameters

Parameters can be chained onto a mapping type. Parameters are set using `onMissingMethod`, which uses the method name (converted to snake case) as the parameter name and the first argument passed as the parameter value.

builder.create( function( mapping ){
    mapping.text( "title" ).fielddata( true );
    mapping.date( "createdTime" ).format( "date_time_no_millis" );
} )


> You can also add parameters using the `addParameter( string name, any value )` or `setParameters( struct map )` methods.

The only exception among the parameter methods is `fields`, which expects a closure argument and allows you to create multiple field definitions for a mapping.

builder.create( function( mapping ){
    mapping.text( "city" ).fields( function( mapping ){
        mapping.keyword( "raw" );
    } );
} );
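To compare the parameter styles side by side, here is a short sketch that uses only the methods named above: a parameter set through `onMissingMethod`, one set with `addParameter()`, and one set with `setParameters()`. The field names are illustrative; `fielddata` and `ignore_above` are standard Elasticsearch mapping parameters. Whether `setParameters()` merges with or replaces parameters already set is not stated here, so it is used on a fresh field.

```cfml
builder.create( function( mapping ){
    // onMissingMethod: fielddata( true ) becomes "fielddata": true
    mapping.text( "title" ).fielddata( true );

    // the same kind of parameter, set explicitly by name
    mapping.text( "summary" ).addParameter( "fielddata", true );

    // one or more parameters at once from a struct of snake_case keys
    mapping.keyword( "sku" ).setParameters( { "ignore_above" = 256 } );
} );
```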


## Partials

The Mapping Blueprint also provides a way to reuse mappings. Say, for instance, you have a `user` mapping that gets repeated for managers as well.

The partial method accepts three different kinds of arguments:
1. A closure
1. A component with a `getPartial` method
1. A WireBox mapping to a component with a `getPartial` method

var partialFn = function( mapping ){
    return mapping.object( "user", function( mapping ){
        mapping.integer( "age" );
        mapping.object( "name", function( mapping ){
            mapping.text( "first" );
            mapping.text( "last" );
        } );
    } );
};

builder.create( function( mapping ){
    mapping.partial( "manager", partialFn );
    // uses the partial's defined name, user in this case
    mapping.partial( definition = partialFn );
} );


The first approach is great for partials that are reused in the same index.
The last two approaches work better for partials that are reused across indexes.


Managing Documents
==================

Documents are the searchable, serialized objects within your indexes. As noted above, documents may be assigned a type, allowing separation of schema while still maintaining searchability across all documents in the index. Within an index, each document is referenced by an `_id` value. This `_id` may be set manually ( `document.setId()` ) or, if not provided, will be auto-generated when the record is persisted. Note that if you use numeric primary keys for your `_id` values, they will be cast to strings on serialization.

#### Creating a Document

The `Document` model is the primary object for creating and working with documents. Let's say, again, that we are going to create a `books`-typed document in our index. We would do so by first creating a `Document` object.

var book = getInstance( "Document@cbElasticsearch" ).new(
    index = "bookshop",
    type = "books",
    properties = {
        "title" = "Elasticsearch for Coldbox",
        "summary" = "A great book on using Elasticsearch with the Coldbox framework",
        "description" = "A long description with examples on why this book is great",
        "author" = {
            "id" = 1,
            "firstName" = "Jon",
            "lastName" = "Clausen"
        },
        // matches the strict_date ( yyyy-MM-dd ) format defined in the mapping
        "publishDate" = dateFormat( now(), "yyyy-mm-dd" ),
        "edition" = 1,
        "ISBN" = 123456789054321
    }
);

book.save();


In addition to populating the document via the `new()` method, we could also populate the document schema using other methods:

document.populate( myBookStruct )


or by individual setters:

document.setValue( "author", { "firstName" = "Jon", "lastName" = "Clausen" } );


If we want to manually assign the `_id` value, we would need to explicitly call `setId( myCustomId )` to do so, or would need to provide an `_id` key in the struct provided to the `new()` or `populate()` methods.
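A minimal sketch of assigning the `_id` yourself, assuming a hypothetical `bookRecord` struct from your own data layer whose `id` key holds the primary key:

```cfml
var book = getInstance( "Document@cbElasticsearch" ).new(
    index = "bookshop",
    type = "books",
    properties = {
        "title" = bookRecord.title,
        "edition" = bookRecord.edition
    }
);

// reuse our own primary key as the Elasticsearch _id
// ( numeric keys are cast to strings on serialization )
book.setId( bookRecord.id );

book.save();
```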

#### Retrieving documents

To retrieve an existing document, we must first know its `_id` value. We can retrieve it either through the `Document` object or by interfacing with the `Client` object directly. In either case, the result returned is a `Document` object if found, or null if not found.

Using the `Document` object's accessors:

var existingDocument = getInstance( "Document@cbElasticsearch" )
    .setIndex( "bookshop" )
    .setType( "books" )
    .setId( bookId )
    .get();


Calling the `get()` method with explicit arguments:

var existingDocument = getInstance( "Document@cbElasticsearch" )
    .get( id = bookId, index = "bookshop", type = "books" );


Calling directly, using the same arguments, from the client:

var existingDocument = getInstance( "Client@cbElasticsearch" )
    .get( id = bookId, index = "bookshop", type = "books" );


#### Updating a Document

Once we've retrieved an existing document, we can update its values through the `Document` instance and re-save it.

existingDocument.populate( properties = myUpdatedBookStruct ).save();


You can also pass Document objects to the `Client`'s `save()` method:

getInstance( "Client@cbElasticsearch" ).save( existingDocument );
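The retrieval and update steps combine naturally. This sketch guards against the null returned for a missing `_id` before saving; `bookId` is assumed to hold an existing identifier, and setting `edition` to 2 is purely illustrative. The variable is named `esClient` because `client` is a reserved scope name in CFML.

```cfml
var esClient = getInstance( "Client@cbElasticsearch" );

var existingDocument = esClient.get( id = bookId, index = "bookshop", type = "books" );

// get() returns null when no document has this _id
if( !isNull( existingDocument ) ){
    // change a single value; the other loaded values are saved unchanged
    existingDocument.setValue( "edition", 2 );
    esClient.save( existingDocument );
}
```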


#### Bulk Inserts and Updates

Bulk inserts and updates can be performed by passing an array of `Document` objects to the Client's `saveAll()` method:

var documents = [];

for( var myStruct in myArray ){
    var document = getInstance( "Document@cbElasticsearch" ).new(
        index = myIndex,
        type = myType,
        properties = myStruct
    );
    arrayAppend( documents, document );
}

getInstance( "Client@cbElasticsearch" ).saveAll( documents );


#### Deleting a Document

Deleting documents is similar to the process of saving.  The `Document` object may be used to delete a single item.

var document = getInstance( "Document@cbElasticsearch" )
    .get( id = documentId, index = "bookshop", type = "books" );

if( !isNull( document ) ){
    document.delete();
}


Documents may also be deleted by passing a `Document` instance to the client:

getInstance( "Client@cbElasticsearch" ).delete( myDocument );


Finally, documents may also be deleted by query, using the `SearchBuilder` ( more below ):

getInstance( "SearchBuilder@cbElasticsearch" )
    .new( index = "bookshop", type = "books" )
    .match( "name", "Elasticsearch for Coldbox" )
    .deleteAll();



Searching Documents
===================

The `SearchBuilder` object offers an expressive syntax for crafting detailed searches with ranked results. To perform a simple search for matching documents, using Elasticsearch's automatic scoring, we would use the `SearchBuilder` like so:

var searchResults = getInstance( "SearchBuilder@cbElasticsearch" )
    .new( index = "bookshop", type = "books" )
    .match( "name", "Elasticsearch" )
    .execute();


By default, this search returns a result whose `hits` array contains `Document` objects ( or is empty if no results are found ), sorted by descending match score.

To output the results of our search, we would use a loop, accessing the `Document` methods:

for( var resultDocument in searchResults.hits ){
    var resultScore = resultDocument.getScore();
    var documentMemento = resultDocument.getMemento();
    var bookName = documentMemento.name;
    var bookDescription = documentMemento.description;
}


The "memento" is the struct representation of the document. We can also use the `Document` object's built-in `getValue()` method:

for( var resultDocument in searchResults.hits ){
    var resultScore = resultDocument.getScore();
    var bookName = resultDocument.getValue( "name" );
    var bookDescription = resultDocument.getValue( "description" );
}


### Search matching


#### Exact matching

The `term()` method restricts the search results to documents that exactly match a given value. An example use case might be to search only for active documents:

searchBuilder.term( "isActive", 1 );


Or a date:

searchBuilder.term( "publishDate", "2017-05-13" );


#### Boosting individual matches

The `match()` method of the `SearchBuilder` also allows for a `boost` argument.  When provided, results which match the term will be ranked higher in the results:

searchBuilder
    .match( "shortDescription", "Elasticsearch" )
    .match( "description", "Elasticsearch" )
    // a boost above 1 raises the score; values between 0 and 1 would lower it
    .match( name = "name", value = "Elasticsearch", boost = 2 );


In the above example, documents with a `name` field containing "Elasticsearch" would be boosted in score higher than those which only find the value in the short or long description.

#### Advanced Query DSL

The SearchBuilder also allows full use of the [Elasticsearch query language](https://www.elastic.co/guide/en/elasticsearch/reference/current/_introducing_the_query_language.html), allowing detailed configuration of queries if the basic `match()`, `sort()`, and `aggregation()` methods are not enough to meet your needs. There are several ways to provide the raw query language to the SearchBuilder. One is during instantiation.

In the following we are looking for matches of active records with "Elasticsearch" in the `name`, `description`, or `shortDescription` fields. We are also looking for a phrase match of "is awesome" and are boosting the score of the applicable document, if found.

var search = getInstance( "SearchBuilder@cbElasticsearch" )
    .new(
        index = "bookshop",
        type = "books",
        properties = {
            "query" = {
                "term" = { "isActive" = 1 },
                "match" = {
                    "name" = "Elasticsearch",
                    "description" = "Elasticsearch",
                    "shortDescription" = "Elasticsearch"
                },
                "match_phrase" = {
                    "description" = {
                        "query" = "is awesome",
                        "boost" = 2
                    }
                }
            }
        }
    )
    .execute();
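Elasticsearch itself expects a single clause per `query` object and a single field per `match` query. If you would rather pass a struct that is valid as-is, the same intent can be sketched as a `bool` query. This is standard Elasticsearch DSL, not an API specific to this module: `must` requires the term in one of the three fields, `should` only raises the score of phrase matches, and `filter` restricts to active records without affecting the score.

```cfml
var search = getInstance( "SearchBuilder@cbElasticsearch" )
    .new(
        index = "bookshop",
        type = "books",
        properties = {
            "query" = {
                "bool" = {
                    // required and scored: "Elasticsearch" in any of the three fields
                    "must" = [
                        {
                            "multi_match" = {
                                "query" = "Elasticsearch",
                                "fields" = [ "name", "description", "shortDescription" ]
                            }
                        }
                    ],
                    // optional: a phrase match only raises the score
                    "should" = [
                        {
                            "match_phrase" = {
                                "description" = { "query" = "is awesome", "boost" = 2 }
                            }
                        }
                    ],
                    // exact restriction to active records, not scored
                    "filter" = [
                        { "term" = { "isActive" = 1 } }
                    ]
                }
            }
        }
    )
    .execute();
```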


For more information on Elasticsearch query DSL, the [Search in Depth Documentation](https://www.elastic.co/guide/en/elasticsearch/guide/current/search-in-depth.html) is an excellent starting point.


#### Sorting Results

The `sort()` method also allows you to specify custom sort options.  To sort by author last name, instead of score, we would simply use:

searchBuilder.sort( "author.lastName", "asc" );


While our documents would still be scored, the results would be ordered by the specified sort rather than by score.
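Combining the pieces above, here is a sketch of a search that scores on `name`, restricts to active books, and sorts by author. Each call is made on the builder as a separate statement, since chaining is only shown above for `new()` and `match()`.

```cfml
var searchBuilder = getInstance( "SearchBuilder@cbElasticsearch" )
    .new( index = "bookshop", type = "books" );

// scored full-text match
searchBuilder.match( "name", "Elasticsearch" );
// exact restriction: active documents only
searchBuilder.term( "isActive", 1 );
// order by author last name instead of score
searchBuilder.sort( "author.lastName", "asc" );

var searchResults = searchBuilder.execute();

for( var resultDocument in searchResults.hits ){
    var bookName = resultDocument.getValue( "name" );
}
```

If `author.lastName` is mapped as `text`, Elasticsearch will refuse to sort on it unless `fielddata` is enabled; sorting on a `keyword` field (for example, a sub-field created with `fields()` as in the Mapping Builder section) is the usual alternative.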

#### Search Builder Function Reference:

* `new([string index], [string type], [struct properties])` - Populates a new SearchBuilder object.
* `reset()` - Clears the SearchBuilder and resets the DSL
* `deleteAll()` - Deletes all documents matching the currently built search query.
* `execute()` - Executes the built search
* `getDSL()` - Returns a struct containing the assembled Elasticsearch query DSL
* `match(string name, any value, [numeric boost], [struct options], [string matchType='any'])` - Applies a match requirement to the search builder query.
* `mustMatch(string name, any value, [numeric boost])` - `must` query alias for match().
* `mustNotMatch(string name, any value, [numeric boost])` - `must_not` query alias for match().
* `shouldMatch(string name, any value, [numeric boost])` - `should` query alias for match().
* `sort(any sort, [any sortConfig])` - Applies a custom sort to the search query.
* `term(string name, any value, [numeric boost])` - Adds an exact value restriction ( elasticsearch: term ) to the query.
* `aggregation(string name, struct options)`  - Adds an aggregation directive to the search parameters.
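A sketch using the boolean aliases from the list above, assuming, as their names suggest, that they map onto the `must`, `must_not`, and `should` clauses of one `bool` query. The field values are illustrative, and `getDSL()` is called before `execute()` so the assembled query can be inspected while debugging.

```cfml
var searchBuilder = getInstance( "SearchBuilder@cbElasticsearch" )
    .new( index = "bookshop", type = "books" );

// must: every result has to match
searchBuilder.mustMatch( "name", "Elasticsearch" );
// must_not: exclude matching documents
searchBuilder.mustNotMatch( "description", "deprecated" );
// should: optional here, boosts the score of matching documents
searchBuilder.shouldMatch( "shortDescription", "Coldbox", 2 );

// inspect the query DSL that will be sent to Elasticsearch
writeDump( searchBuilder.getDSL() );

var searchResults = searchBuilder.execute();
```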

Counting Documents
===================

Sometimes you only need a count of matching documents, rather than the results of the query.  When this is the case, you can call the `count()` method from the search builder ( or using the client ) to only return the number of matched documents and omit the result set and metadata:

var docCount = getInstance( "SearchBuilder@cbElasticsearch" )
    .new(
        index = "bookshop",
        type = "books",
        properties = {
            "query" = {
                "term" = { "isActive" = 1 },
                "match" = {
                    "name" = "Elasticsearch",
                    "description" = "Elasticsearch",
                    "shortDescription" = "Elasticsearch"
                },
                "match_phrase" = {
                    "description" = {
                        "query" = "is awesome",
                        "boost" = 2
                    }
                }
            }
        }
    )
    .count();
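When the restriction is simple, `count()` (added to the SearchBuilder in v1.1.5, per the changelog) can also follow the fluent methods instead of a raw query. A minimal sketch counting active books only:

```cfml
var searchBuilder = getInstance( "SearchBuilder@cbElasticsearch" )
    .new( index = "bookshop", type = "books" );

// restrict to active documents, then ask only for the number of matches
searchBuilder.term( "isActive", 1 );

var activeCount = searchBuilder.count();
```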


## Tests

To run the test suite, you need a running instance of Elasticsearch. We have provided a `docker-compose.yml` file in
the root of the repo to make this as easy as possible. Run `docker-compose up` in the root of the project and open
`http://localhost:8080/tests/runner.cfm` to run the tests.

If you would prefer to set this up yourself, make sure you start this app with the correct environment variables set:

```ini
ELASTICSEARCH_PROTOCOL=http
ELASTICSEARCH_HOST=127.0.0.1
ELASTICSEARCH_PORT=9200
```

Copyright Since 2005 ColdBox Framework by Luis Majano and Ortus Solutions, Corp www.coldbox.org | www.luismajano.com | www.ortussolutions.com


HONOR GOES TO GOD ABOVE ALL

Because of His grace, this project exists. If you don't like this, then don't read it, it's not for you.

"...but we glory in tribulations also: knowing that tribulation worketh patience; And patience, experience; and experience, hope: And hope maketh not ashamed; because the love of God is shed abroad in our hearts by the Holy Ghost which is given unto us." Romans 5:5

THE DAILY BREAD

"I am the way, and the truth, and the life; no one comes to the Father, but by me (JESUS)" Jn 14:1-12

CHANGELOG

1.1.6

1.1.5

  • Updates Apache HTTP Client to v4.5.9
  • Adds count() methods to the SearchBuilder and Client

1.1.4

  • Implements url encoding for identifiers, to allow for spaces and special characters in identifiers

1.1.3

  • Implements update by query API and interface

1.1.2

  • Adds compatibility when Secure JSON prefix setting is enabled

1.1.1

  • Updates Java Dependencies, including JEST client, to latest versions
  • Implements search term highlighting capabilities

1.1.0

  • Updates to term and filterTerms SearchBuilder methods to allow for more precise filtering
  • Adds filterTerm method which allows restriction of the search context
  • Adds type and minimum_should_match parameters to multiMatch method in SearchBuilder

1.0.0

  • Adds support for Elasticsearch v6.0+
  • Adds a new MappingBuilder
  • Updates to SearchBuilder to allow for more complex queries with fewer syntax errors
  • Refactor filterTerms to allow other should or filter clauses
  • Add ability to specify _source excludes and includes in a query
  • ACF Compatibility Updates

0.3.0

  • Adds readTimeout and connectionTimeout settings
  • Adds defaultCredentials setting
  • Adds default preflight of query to fix common assembly syntax issues

0.2.1

  • Adds filterTerms() method to allow an array of term restrictions to the result set

0.2.0

  • Fixes pagination and offset handling
  • Adds support for terms filters in match()

0.1.0

  • Initial Release

Here are all the versions for this package. Please note that you can leverage CommandBox package versioning to install any package you like. Please refer to our managing package version guide for more information.

Version Created Last Update
Current
1.1.6-snapshot Jun 26 2019 03:29 PM Jun 26 2019 03:29 PM
Version History
1.1.5+140 Jun 21 2019 07:22 PM Jun 21 2019 07:22 PM
1.1.5-snapshot Jun 03 2019 09:40 AM Jun 03 2019 09:40 AM
1.1.4+132 Jun 03 2019 09:27 AM Jun 03 2019 09:27 AM
1.1.4+131 Jun 03 2019 08:43 AM Jun 03 2019 08:43 AM
1.1.4-snapshot May 16 2019 06:02 AM May 16 2019 06:02 AM
1.1.3+117 May 11 2019 08:23 AM May 11 2019 08:23 AM
1.1.3-snapshot May 11 2019 08:07 AM May 11 2019 08:07 AM
1.1.2+110 Apr 24 2019 08:46 PM Apr 24 2019 08:46 PM
1.1.2-snapshot Apr 24 2019 08:30 PM Apr 24 2019 08:30 PM
1.1.1+100 Apr 17 2019 03:11 PM Apr 17 2019 03:11 PM
1.1.1-snapshot Apr 17 2019 03:14 PM Apr 17 2019 03:14 PM
1.1.0+92 Feb 27 2019 04:02 PM Mar 01 2019 03:44 PM
1.1.0-snapshot Feb 27 2019 03:57 PM Mar 01 2019 03:58 PM
1.0.0+84 Nov 29 2018 08:07 PM Nov 29 2018 08:07 PM
1.0.0-snapshot Feb 22 2019 11:17 AM Feb 22 2019 11:17 AM
0.3.0+61 Apr 27 2018 11:47 AM Apr 27 2018 11:47 AM
0.3.0+56 Dec 20 2017 06:33 AM Dec 20 2017 06:33 AM
0.3.0+53 Nov 23 2017 09:17 AM Nov 23 2017 09:17 AM
0.3.0+50 Nov 17 2017 07:07 AM Nov 17 2017 07:07 AM
0.3.0+48 Nov 16 2017 09:14 PM Nov 16 2017 09:14 PM
0.3.0-snapshot Nov 16 2017 08:53 PM Nov 29 2018 05:34 PM
0.2.1+42 Oct 01 2017 10:45 AM Oct 01 2017 10:45 AM
0.2.0+38 Sep 23 2017 09:16 AM Sep 23 2017 09:16 AM
0.2.0-snapshot Sep 23 2017 08:50 AM Sep 23 2017 08:50 AM
0.1.0+29 Aug 03 2017 12:15 PM Aug 03 2017 12:15 PM
0.1.0+27 Jun 07 2017 11:00 AM Jun 07 2017 11:00 AM
0.1.0+25 Jun 01 2017 04:47 PM Jun 01 2017 04:47 PM
0.1.0+22 Jun 01 2017 08:34 AM Jun 01 2017 08:34 AM
0.1.0+20 May 30 2017 05:24 PM May 30 2017 05:24 PM
0.1.0+17 May 22 2017 08:28 AM May 22 2017 08:28 AM
0.1.0+13 May 17 2017 02:12 PM May 17 2017 02:12 PM
0.1.0-snapshot May 15 2017 07:15 AM Aug 03 2017 12:15 PM

 
