Wednesday, May 27, 2015

Cloudant DB and Java on Bluemix

Dependencies


I've added this dependency to my Maven POM file:
<dependency>
 <groupId>com.cloudant</groupId>
 <artifactId>cloudant-client</artifactId>
 <version>1.0.1</version>
</dependency>



Java Client


Assuming a POJO named MyDoc with two fields, a String _id and a java.util.Collection<String> of lines (a minimal sketch of the class follows the listing below), this client will add and retrieve an instance of the class:
public class CloudantTest {

 public static final String DBNAME = "my-db";

 public static LogManager logger = new LogManager(CloudantTest.class);

 public static void main(String... args) throws Throwable {

  String url = ... ;
  String username = ... ;
  String password = ... ;

  CloudantClient client = new CloudantClient(url, username, password);
  logger.debug("Connected to Cloudant:\n\turl = %s\n\tserver-version = %s", url, client.serverVersion());

  List<String> databases = client.getAllDbs();

  /* drop the database if it exists */
  for (String db : databases)
   if (DBNAME.equals(db)) client.deleteDB(DBNAME, "delete database");

  /* create the db */
  client.createDB(DBNAME);
  Database db = client.database(DBNAME, true);

  /* the document does not exist yet, so the lookup will throw an exception */
  try {
   db.find(MyDoc.class, "100");
  } catch (NoDocumentException e) {
   logger.debug("Document not found (id = %s)", "100");
  }

  /* build and save the document */
  MyDoc doc = getDoc();
  Response response = db.save(doc);
  logger.debug("Saved Document (id = %s)", response.getId());

  /* retrieve it again by id */
  doc = db.find(MyDoc.class, "100");
  logger.debug("Found Document (id = %s)", doc.get_id());

  client.deleteDB(DBNAME, "delete database");
 }
 
 private static MyDoc getDoc() {
  List<String> lines = new ArrayList<String>();
  lines.add("transcript line 1");
  lines.add("transcript line 2");
  lines.add("transcript line 3");

  MyDoc doc = new MyDoc();
  doc.set_id("100");
  doc.setLines(lines);
  
  return doc;
 }
}
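
For reference, here is a minimal sketch of the MyDoc POJO assumed by the client above (the field and accessor names are inferred from the calls in the listing):
import java.util.Collection;

public class MyDoc {

 /* maps to the Cloudant document id */
 private String _id;

 /* the document body */
 private Collection<String> lines;

 public String get_id() {
  return _id;
 }

 public void set_id(String _id) {
  this._id = _id;
 }

 public Collection<String> getLines() {
  return lines;
 }

 public void setLines(Collection<String> lines) {
  this.lines = lines;
 }
}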



References

  1. [GitHub] A Java client for Cloudant

Wednesday, May 13, 2015

Generating an Automated Tokenization Test for Solr

Introduction


There are a variety of ways to configure the tokenizer within Solr.

The configuration of the tokenizer can be automated for the purpose of testing how each variation performs.  In summary, the schema.xml file is incrementally adjusted, a Docker/Solr container is launched, a Java analysis query is issued against Docker/Solr, and the analysis result is written to file.

An activity diagram depicting this flow is shown:

Fig 1: Activity Diagram


The Query Analyzer


The Query Analyzer depicted below is incrementally modified using the sed command within a shell script:
<!-- Query Analyzer -->
<analyzer type="query">
 <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])\+1" replacement="$1$1" />
 <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])(/)([a-zA-Z])" replacement="$1 or $3" />
 <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\()(.)+(\))" replacement="$2" />
 <tokenizer class="solr.WhitespaceTokenizerFactory" />
 <filter class="solr.WordDelimiterFilterFactory"
  generateWordParts="#1"
  splitOnCaseChange="#2"
  splitOnNumerics="#3"
  stemEnglishPossessive="#4"
  preserveOriginal="#5"
  catenateWords="#6"
  generateNumberParts="#7"
  catenateNumbers="#8"
  catenateAll="#9"
  types="wdfftypes.txt" />
 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
 <filter class="solr.LowerCaseFilterFactory" />
 <filter class="solr.ASCIIFoldingFilterFactory" />
 <filter class="solr.KStemFilterFactory" />
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
</analyzer>

The shell script that performs this incremental modification is invoked like this:
$ ./modify-schema.sh 0 1 0 0 0 0 1 1 0
The trailing numbers are placed into the corresponding attributes of the WordDelimiterFilterFactory (WDFF) above.


modify-schema.sh


This script likely isn't going to win any points for style, but it takes the user params and seds them into the schema.xml file:
clear

# be root
echo root | sudo -S echo 'done'

# echo params
echo 'params: '1=$1, 2=$2, 3=$3, 4=$4, 5=$5, 6=$6, 7=$7, 8=$8, 9=$9

# place the shell args into the schema file
cat solr_data/transcripts/conf/schema.xml.bak       > solr_data/transcripts/conf/schema.xml
cat solr_data/transcripts/conf/schema.xml  | sed 's/#1/'$1'/' > solr_data/transcripts/conf/temp.xml
cat solr_data/transcripts/conf/temp.xml  | sed 's/#2/'$2'/' > solr_data/transcripts/conf/schema.xml
cat solr_data/transcripts/conf/schema.xml  | sed 's/#3/'$3'/' > solr_data/transcripts/conf/temp.xml
cat solr_data/transcripts/conf/temp.xml  | sed 's/#4/'$4'/' > solr_data/transcripts/conf/schema.xml
cat solr_data/transcripts/conf/schema.xml  | sed 's/#5/'$5'/' > solr_data/transcripts/conf/temp.xml
cat solr_data/transcripts/conf/temp.xml  | sed 's/#6/'$6'/' > solr_data/transcripts/conf/schema.xml
cat solr_data/transcripts/conf/schema.xml  | sed 's/#7/'$7'/' > solr_data/transcripts/conf/temp.xml
cat solr_data/transcripts/conf/temp.xml  | sed 's/#8/'$8'/' > solr_data/transcripts/conf/schema.xml
cat solr_data/transcripts/conf/schema.xml  | sed 's/#9/'$9'/' > solr_data/transcripts/conf/temp.xml
cat solr_data/transcripts/conf/temp.xml         > solr_data/transcripts/conf/schema.xml

# launch the docker container in a new instance
gnome-terminal -e ./reset.sh

# sleep for 20 seconds (give solr time to instantiate)
sleep 20

# invoke the JAR file that runs the analysis
java -cp \
  uber-ebear-scripts-testing-1.0.0.jar \
  com.ibm.ted.ebear.scripts.testing.SolrjAnalysis \
  $1 $2 $3 $4 $5 $6 $7 $8 $9

# stop the solr instance
./stop.sh



reset.sh


Resets the Solr Container:
clear
echo root | sudo -S echo 'done'
./stop.sh
sudo rm -rf /home/craig/ebear/solr
./run.sh



stop.sh


Stops the Solr Container:
# stop all docker containers
# <https://coderwall.com/p/ewk0mq/stop-remove-all-docker-containers>
sudo docker stop $(sudo docker ps -a -q)
sudo docker rm $(sudo docker ps -a -q)
# <http://jimhoskins.com/2013/07/27/remove-untagged-docker-images.html>
sudo docker rmi $(sudo docker images | grep "^<none>" | awk '{print $3}')



run.sh


Launches a new Solr Container:
create_dirs() {
 if [ ! -d "$2" ]; then
  echo "creating data volume ..."
  mkdir -p $2/conf
  cp $1/core.properties $2
  cp $1/conf/* $2/conf
  mkdir -p $2/data
  chmod -R 777 $2
 else
  echo "data volume already exists"
 fi
}

# copy SOLR core for 'transcripts'
create_dirs \
  solr_data/transcripts \
  ~/ebear/solr/transcripts

sudo docker-compose up
    



    Solrj Analysis


    This class accesses the Solr analyzer and writes the results to file.

    The SolrjAnalysis class has a list of terms to test the tokenizer against. These terms are URL encoded and inserted into a hard-coded query string against the Solr server (known LAN IP). An XML/XPath analysis is performed on the XML returned from the query, and the results are formatted as TSV for appending to file. There is a fair bit of hard-coding in here at the moment, but that could easily be moved into external properties files or into parameters fed to the main method at runtime.

    package com.mycompany.testing;
    
    import java.net.*;
    import java.util.*;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import com.mycompany.utils.*;
    
    public class SolrjAnalysis {
    
     public static LogManager logger = new LogManager(SolrjAnalysis.class);
    
     public static void main(String... args) throws Throwable {
    
      List<String> terms = ListUtils.toList("WiFi", "WiFi's", "Wi-Fi", "O'Reilly's", "U.S.A", "can't", "what're", "afford.", "that!", "where?", "well,well", "now...", "craigtrim@gmail.com", "http://www.ibm.com", "@cmtrm", "#theartoftokenization");
      String encodedUrl = URLEncoder.encode(StringUtils.toString(terms, " "), Codepage.UTF_8.toString());
      logger.debug("Created Encoded URL String: %s", encodedUrl);
    
      String urlString = "http://192.168.1.73:8983/solr/transcripts/analysis/field?wt=xml&analysis.fieldvalue=" + encodedUrl + "&analysis.fieldtype=text_transcript";
    
      URL url = new URL(urlString);
      URLConnection conn = url.openConnection();
    
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      Document dom = builder.parse(conn.getInputStream());
    
      Map<String, Map<String, Set<String>>> totalManipulationsByType = new HashMap<String, Map<String, Set<String>>>();
    
      Collection<String> types = getTypes(dom);
    
      List<String> orderedTypes = new ArrayList<String>();
      for (String type : types) {
    
       Map<String, Set<String>> innerMap = totalManipulationsByType.containsKey(type) ? totalManipulationsByType.get(type) : new HashMap<String, Set<String>>();
    
       Collection<Element> elements = XpathUtils.evaluateElements(dom, String.format("descendant-or-self::arr[@name='%s']/lst", type));
       for (Element element : elements) {
    
        String position = XpathUtils.evaluateText(element, "descendant-or-self::arr[@name='positionHistory']/int/text()");
        String text = XpathUtils.evaluateText(element, "descendant-or-self::str[@name='text']/text()");
    
        logger.debug("Extracted Variation (position = %s, text = %s, type = %s, )", position, text, type);
        if (!orderedTypes.contains(type)) orderedTypes.add(type);
    
        Set<String> innerSet = (innerMap.containsKey(position)) ? innerMap.get(position) : new HashSet<String>();
        innerSet.add(text);
        innerMap.put(position, innerSet);
       }
    
       totalManipulationsByType.put(type, innerMap);
      }
    
      StringBuilder sb = new StringBuilder();
    
      sb.append(getHeader(orderedTypes));
      
      String _params = StringUtils.toString(args, "\t");
    
      for (int i = 1; i < terms.size() + 1; i++) {
       StringBuilder sbBody = new StringBuilder();
       sbBody.append(_params + "\t");
    
       for (String key : orderedTypes) {
        Set<String> values = totalManipulationsByType.get(key).get(String.valueOf(i));
        if (null == values) sbBody.append("\t");
        else sbBody.append(StringUtils.toString(values, ",") + "\t");
       }
    
       sb.append(sbBody.toString() + "\n");
      }
    
      System.err.println(sb.toString());
      FileUtils.toFile(sb, "/home/craig/ebear/analysis.dat", true, Codepage.UTF_8);
     }
    
     private static String getHeader(List<String> orderedTypes) {
    
      StringBuilder sbH1 = new StringBuilder();
      sbH1.append("generateWordParts\tsplitOnCaseChange\tsplitOnNumerics\tstemEnglishPossessive\tpreserveOriginal\tcatenateWords\tgenerateNumberParts\tcatenateNumbers\tcatenateAll");
    
      StringBuilder sbH2 = new StringBuilder();
      for (String key : orderedTypes) {
    
       String _key = StringUtils.substringAfterLast(key, ".");
       sbH2.append(_key + "\t");
      }
    
      return sbH1 + "\t" + sbH2.toString() + "\n";
     }
    
     private static Collection<String> getTypes(Document dom) throws Throwable {
      List<String> list = new ArrayList<String>();
    
      Collection<Element> elements = XpathUtils.evaluateElements(dom, "descendant-or-self::lst[@name='index']/arr");
      for (Element element : elements)
       list.add(element.getAttribute("name"));
    
      logger.debug("Extracted Types (total = %s):\n\t%s", list.size(), StringUtils.toString(list, "\n\t"));
      return list;
     }
    }
    



    References

    1. [Blogger] Docker and Solr
    SolrJ: Java API for Solr

    Introduction


    SolrJ is a Java client for accessing Solr. It offers a Java interface to add, update, and query the Solr index.
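
    Before getting into configuration, here is a minimal end-to-end sketch (the URL, document id, and field values are illustrative; the properties-driven setup used on the project follows below):
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;
    
    public class SolrjQuickStart {
    
     public static void main(String... args) throws Throwable {
    
      /* illustrative URL; host, port and core normally come from the properties file below */
      HttpSolrServer server = new HttpSolrServer("http://127.0.0.1:8983/solr/documents");
    
      /* add (update) a document */
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("text", "the greenhouse gas effect");
      server.add(doc);
      server.commit();
    
      /* query the index */
      QueryResponse response = server.query(new SolrQuery("text:greenhouse"));
      System.out.println("found = " + response.getResults().getNumFound());
     }
    }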


    HttpSolrServer


    I use Spring to load this properties file:
    solr.host   = 127.0.0.1
    solr.port   = 8983
    solr.core   = documents
    
    # defaults to 0.  > 1 not recommended
    solr.maxretries   = 1
    
    # 5 seconds to establish TCP
    solr.connectiontimeout  = 5000
    
    # socket read timeout
    solr.socketreadtimeout  = 50000
    
    # max connections per host
    solr.maxconnectionsperhost = 250
    
    # max total connections
    solr.maxtotalconnections = 100
    
    # defaults to false
    solr.followredirects  = false
    
    # defaults to false
    # Server side must support gzip or deflate for this to have any effect.
    solr.allowcompression  = true
    

    and to populate this method:
    public static HttpSolrServer transform(String url, boolean allowcompression, Integer connectiontimeout, String core, boolean followredirects, Integer maxconnectionsperhost, Integer maxretries, Integer maxtotalconnections, Integer socketreadtimeout) throws AdapterValidationException {
     HttpSolrServer server = new HttpSolrServer(url);
    
     server.setMaxRetries(maxretries); 
     server.setConnectionTimeout(connectiontimeout); 
     
     /* Setting the XML response parser is only required for cross
        version compatibility and only when one side is 1.4.1 or
        earlier and the other side is 3.1 or later. */
     server.setParser(new XMLResponseParser()); 
     
     /* The following settings are provided here for completeness.
        They will not normally be required, and should only be used 
        after consulting javadocs to know whether they are truly required. */
     server.setSoTimeout(socketreadtimeout); 
     server.setDefaultMaxConnectionsPerHost(maxconnectionsperhost);
     server.setMaxTotalConnections(maxtotalconnections);
     server.setFollowRedirects(followredirects);
     server.setAllowCompression(allowcompression);
    
     return server;
    }
    

    By populating the method above with the values from the properties file, the system is able to work with a configured instance of the HttpSolrServer.
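
    In my case that wiring is done with Spring, but a plain-Java sketch of the same idea looks like this (the property names match the file above; the loading mechanism and factory class are assumptions, not project code):
    import java.io.FileInputStream;
    import java.util.Properties;
    
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    
    public class HttpSolrServerFactory {
    
     /* assumed plain-Java equivalent of the Spring wiring */
     public static HttpSolrServer fromProperties(String path) throws Exception {
    
      Properties p = new Properties();
      p.load(new FileInputStream(path));
    
      String url = String.format("http://%s:%s/solr/%s",
       p.getProperty("solr.host"), p.getProperty("solr.port"), p.getProperty("solr.core"));
    
      return HttpSolrServerAdapter.transform(
       url,
       Boolean.parseBoolean(p.getProperty("solr.allowcompression")),
       Integer.valueOf(p.getProperty("solr.connectiontimeout")),
       p.getProperty("solr.core"),
       Boolean.parseBoolean(p.getProperty("solr.followredirects")),
       Integer.valueOf(p.getProperty("solr.maxconnectionsperhost")),
       Integer.valueOf(p.getProperty("solr.maxretries")),
       Integer.valueOf(p.getProperty("solr.maxtotalconnections")),
       Integer.valueOf(p.getProperty("solr.socketreadtimeout")));
     }
    }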


    Query Snippets


    This code will find the total records that mention the word "climate":
    public static void main(String... args) throws Throwable {
    
     HttpSolrServer server = HttpSolrServerAdapter.transform();
     assertNotNull(server);
    
     SolrQuery q = new SolrQuery("text:climate");
     q.setRows(0); // don't actually request any data
    
     long total = server.query(q).getResults().getNumFound();
     logger.info("Total Records (total = %s)", total);
    
    }
    


    A reusable method for executing a query:
    public static QueryResponse execute(
     String queryName, 
     HttpSolrServer server, 
     SolrQuery solrQuery) 
     throws BusinessException {
     try {
    
      QueryResponse queryResponse = server.query(solrQuery);
      logger.debug("Query Statistics " +
       "(query-name = %s, elapsed-time = %s, query-time = %s, status = %s, request-url = %s)",
        queryName, 
        queryResponse.getElapsedTime(), 
        queryResponse.getQTime(), 
        queryResponse.getStatus(), 
        queryResponse.getRequestUrl());
    
      SolrDocumentList solrDocumentList = queryResponse.getResults();
      logger.debug("Total Records " +
       "(query-name = %s, total = %s, query = %s)", 
        queryName, 
        StringUtils.format(solrDocumentList.size()), 
        solrQuery.toString());
    
      return queryResponse;
    
     } catch (SolrServerException e) {
      logger.error(e);
      throw new BusinessException("Unable to Query Server (query-name = %s, message = %s)", queryName, e.getMessage());
     }
    }
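
    Calling the helper is then a one-liner. A usage sketch (the query name is illustrative, and the no-argument transform() overload is the one used in the counting snippet above):
    public static void main(String... args) throws Throwable {
    
     HttpSolrServer server = HttpSolrServerAdapter.transform();
    
     SolrQuery q = new SolrQuery("text:climate");
     q.setRows(10);
    
     /* delegate to the reusable execute() helper above */
     QueryResponse response = execute("climate-query", server, q);
    
     for (SolrDocument doc : response.getResults())
      logger.debug("Matched Document (id = %s)", doc.getFieldValue("id"));
    }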
    



    References

    1. [Apache] SolrJ Wiki
    2. [Apache] SolrJ Reference Guide

    Thursday, April 23, 2015

    The Apache Solr Query Architecture

    Overview

    Fig 1: Solr Query Architecture

    The diagram above depicts a document that contains the text: "the greenhouse gas effect".

    A user query for "climate change" will result in a scored relevancy match against this document.

    The user query against the Solr server is very simple:
    q=climate change
    Solr will augment this query in two stages:

    1. The Request Handler will add additional metadata about how the query should be executed.  Notions of relevance, the number of rows returned, whether highlighting should be used, and the fields to query and return are all specified here.
      1. The query then becomes:
        q=climate change&defType=xml&wt=xml&fl=id title text&qf=title^2 text&rows=10&pf=title^2 text&ps=5&echoParams=all&hl=true&hl.fl=title text&debug=true
    2. The Query Analyzer will perform a linguistic analysis of the user query.  Tokenization, pattern filtering, stemming, synonyms, etc. are all specified here.
      1. The query then becomes:
        q=(+((DisjunctionMaxQuery((text:climate | title:climate^2.0 | speaker:climate)) DisjunctionMaxQuery((text:change | title:change^2.0 | speaker:change)))~2) DisjunctionMaxQuery((title:"(greenhouse ghg climate climate deforestation pollution greenhouse carbon co2 methane nitrous n2o hydroflurocarbons hfcs perfluorocarbons pfcs sulfur sf6) (gas change shift gasses dioxide oxide hexafluoride)"~5^2.0 | text:"(greenhouse ghg climate climate deforestation pollution greenhouse carbon co2 methane nitrous n2o hydroflurocarbons hfcs perfluorocarbons pfcs sulfur sf6) (gas change shift gasses dioxide oxide hexafluoride)"~5)))/no_coord&defType=xml&wt=xml&fl=id title text&qf=title^2 text&rows=10&pf=title^2 text&ps=5&echoParams=all&hl=true&hl.fl=title text&debug=true

    This is a powerful design technique for abstracting complexity away from the user query while creating very complex and specific queries to find relevant documents.


    Request Handler


    The first augmentation stage is controlled by the request handler.

    Request handlers are defined within solrconfig.xml:
    <requestHandler name="/docQuery" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="wt">xml</str>
        <str name="fl">id author abstract heading text</str>
        <str name="qf">title^4 abstract^2 text</str>
        <str name="rows">10</str>
        <str name="pf">title^4 abstract^2 text</str>
        <str name="ps">5</str>
        <str name="echoParams">all</str>
        <str name="mm">3&lt;-1 5&lt;-2 6&lt;-40%</str> 
        <str name="hl">true</str>
        <str name="hl.fl">title abstract text</str>
        <str name="debug">true</str>
        <str name="explain">true</str>
      </lst>
    </requestHandler> 
    

    This request handler creates the node entitled "Augmented User Query 1".  So here's a major benefit of the configuration files already: the user (or the application) didn't have to append all this information to the query string; it's appended by default to each query.
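
    From SolrJ, for example, the client only needs to name the handler and pass the raw user terms; everything in the defaults block above is applied on the server side. A sketch (the host and core name are assumptions):
    public static void main(String... args) throws Throwable {
    
     HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/documents");
    
     /* only the user terms and the handler name travel over the wire;
        defType, fl, qf, pf, mm, highlighting, etc. come from the defaults above */
     SolrQuery q = new SolrQuery("climate change");
     q.setRequestHandler("/docQuery");
    
     QueryResponse response = server.query(q);
     System.out.println("total = " + response.getResults().getNumFound());
    }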


    Query Analyzer


    The query is next augmented by each field that is being searched.

    Within the schema.xml file, I have a field defined for text:
        <fieldType name="text_doc" class="solr.TextField" positionIncrementGap="100">
    
          <!-- Indexer -->
          <analyzer type="index">
            <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])\1+" replacement="$1$1" />
            <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])(/)([a-zA-Z])" replacement="$1 or $3" />
            <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\()(.)+(\))" replacement="" />
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1"
              splitOnCaseChange="1"
              splitOnNumerics="1"
              stemEnglishPossessive="1"
              preserveOriginal="1"
              catenateWords="1"
              generateNumberParts="1"
              catenateNumbers="1"
              catenateAll="1"
              types="wdfftypes.txt" />
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <filter class="solr.EnglishPossessiveFilterFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.ASCIIFoldingFilterFactory" />
            <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" />
            <filter class="solr.KStemFilterFactory" />
          </analyzer>
    
          <!-- Query Analyzer -->
          <analyzer type="query">
            <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])\+1" replacement="$1$1" />
            <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])(/)([a-zA-Z])" replacement="$1 or $3" />
            <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\()(.)+(\))" replacement="$2" />
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1"
              splitOnCaseChange="1"
              splitOnNumerics="1"
              stemEnglishPossessive="1"
              preserveOriginal="1"
              catenateWords="1"
              generateNumberParts="1"
              catenateNumbers="1"
              catenateAll="1"
              types="wdfftypes.txt" />
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <filter class="solr.EnglishPossessiveFilterFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.ASCIIFoldingFilterFactory" />
           <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" />
            <filter class="solr.KStemFilterFactory" />
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
          </analyzer>
    
        </fieldType>
    


    Note that in the configuration file above, there are two analyzers specified for the field called "text_doc".  One analyzer is for the document text being indexed during the ingestion phase.  The other analyzer is for the user query text that triggers the search (for the indexed text).  In both cases, the configuration is largely identical, except for the use of synonyms.

    This is an important concept to grasp.  If the indexed content and the user query are both treated by (nearly) identical analyzers, it's going to be a lot easier to find relevant text.  As a counter example, imagine having to design a tokenization pipeline for user queries against content indexed by multiple, unknown configurations.  If you use aggressive stemming and wildcards to boost recall, this will come at the expense of precision.
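
    One quick way to see both chains side by side is the /analysis/field request handler used in the tokenization-test post above: the same text can be run through the index analyzer and the query analyzer of text_doc in a single call. A sketch (host and core name are assumptions):
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLEncoder;
    
    public class AnalysisCheck {
    
     public static void main(String... args) throws Throwable {
    
      String text = URLEncoder.encode("Greenhouse Gasses", "UTF-8");
    
      /* the FieldAnalysisRequestHandler reports the output of every
         charFilter/tokenizer/filter stage for both analyzers of text_doc */
      URL url = new URL("http://localhost:8983/solr/documents/analysis/field?wt=xml"
       + "&analysis.fieldtype=text_doc"
       + "&analysis.fieldvalue=" + text    /* index-time analysis */
       + "&analysis.query=" + text);       /* query-time analysis */
    
      BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
      for (String line = reader.readLine(); line != null; line = reader.readLine())
       System.out.println(line);
      reader.close();
     }
    }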


    References

    1. [YouTube, 5:54] Apache Solr: Complex Query Format

    Tuesday, April 14, 2015

    Apache Solr and Docker (for Beginners)

    Introduction


    It is possible to find instructions for installing Solr directly on your OS, but these days there's almost no reason not to use Docker. There are plenty of Docker containers for Solr, and the operation (as I will demonstrate in this article) is almost trivial. Launching a Docker container from a trusted image not only saves time and effort, but leverages best-practice installation and configuration techniques, at least where official or highly trusted Dockerfiles are concerned.

    In this tutorial, I'll walk through launching a docker container for Solr, attaching an external data volume, and demonstrating the successful GET and POST of data to Solr between container lifecycles.

    Using the docker search command, I find that these images have already been built:
    craig@devenv:~$ sudo docker search solr
    NAME                              DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
    makuk66/docker-solr               Solr is the popular, blazing-fast, open so...   37                   [OK]
    guywithnose/solr                                                                  6                    [OK]
    raycoding/piggybank-solr-tomcat   SolrCloud on Docker                             4                    [OK]
    cygri/solr                        A custom build of Solr for use with CKAN        3                    [OK]
    pointslope/solr                   This is a lightweight Apache Solr installa...   2                    [OK]
    yoshz/solr                        A docker image running SOLR  on top of Ubu...   1                    [OK]
    geoblacklight/solr                Solr for GeoBlacklight                          0                    [OK]
    infotechsoft/solr                 SOLR installed on CentOS using openjdk7         0                    [OK]
    reinblau/solr3                    Apache Solr , prepared for Search Api Solr      0                    [OK]
    kyberna/solr                                                                      0                    [OK]
    encoflife/solr                                                                    0                    [OK]
    lphoward/vehicleforge-solr        Dockerfile and resources for Apache Solr o...   0                    [OK]
    cpilsworth/solr                   Solr 4.10.2 on Jetty 9.2.5 using Oracle Ja...   0                    [OK]
    holmes/solr                                                                       0                    [OK]
    writl/solr-typo3                  Apache Solr configured for Typo3 solr exte...   0                    [OK]
    hiroara/solr                                                                      0                    [OK]
    blinkreaction/drupal-solr                                                         0                    [OK]
    quirky/solr                                                                       0                    [OK]
    eccenca/ckan-solr                                                                 0                    [OK]
    obi12341/solr-typo3               Please use writl/solr-typo3. This repo wil...   0                    [OK]
    pmoust/solr                       Solr container image - Ubuntu Trusty (LTS)...   0                    [OK]
    pataquets/solr                                                                    0                    [OK]
    anapsix/docker-solr               SOLR Java8 / Ubuntu 14.04. Includes JDBC f...   0                    [OK]
    manycore/solr                     Solr is the popular open source enterprise...   0                    [OK]
    dhorbach/solr                                                                     0                    [OK]
    craig@devenv:~$
    


    At the time of this article, the first docker image is the most popular, and has a Dockerfile with a configuration that I agree with. The image is built on top of java:8, which in turn uses dockerfile/ubuntu as the base image. I like this design approach rather than pulling the OS directly. If I ever want to extend this Dockerfile, I'm generally happier to work with an underlying Debian image as well, although this isn't generally much of an issue with Docker.

    The image is also well supported, with an updated configuration for the Solr 5.1 release landing just today.


    Building the Image


    This is an optional section, but as I've mentioned here, I like to piggyback on Dockerfiles that I'm planning to use on a project.

    Given a local docker registry and multiple team members, it becomes easier to point members to the registry and a standard naming scheme for images. In this case, I'm not adding any extensions to the Dockerfile at the moment, but that could always come later.

    My copy looks like:
    FROM    makuk66/docker-solr
    MAINTAINER  Craig Trim "craigtrim@gmail.com"
    

    and I build this using
    $ sudo docker build -t craig/solr .
    

    from within the directory containing my Dockerfile.


    Launching a Container


    I can launch a container from the image directly by typing
    $ sudo docker run -d -p 8983:8983 craig/solr
    691f2adee14563904996cbc465c513861860dd654a011063dc31f89660def543
    

    This launches the container and exposes the Solr installation's port 8983 on Host port 8983.  I could use any Host port I want, and if I launch multiple containers from this image, I will indeed need to select something different.


    The installation can be verified via a web browser:
    Fig 1: http://192.168.x.y:8983/solr

    A slight problem however: if I click on "Core Admin" in the dashboard and try to add a core, I get an error:

    Fig 2: SolrCore Initialization Error
    And at any rate, if I were able to add a core, what would this prove?  Given the ephemeral nature of Docker containers, any data or configuration changes I make will be lost when the container is closed.  Granted, I could always commit my changes to the local image, but then I seem to lose much of the value of Docker.  My Dockerfile is no longer quite as relevant, and it becomes necessary to share a local image with the rest of my team (and even between my own devices), rather than simply pointing to a script.


    Attaching a Volume

    If you know fundamental Java concepts (particularly within the Spring Framework), the concept of "dependency injection" may be familiar.  Logic is crafted with reliance on a component, but without the requirement of managing that component's lifecycle.  And if you're not familiar with this design pattern, that's fine.  The Docker concepts are simple enough as they stand.

    I'm tempted to call the attachment of data volumes to a Docker container "directory injection".  The pattern is clear enough: the container relies on a given directory, or many given directories.  Now, whether these directories exist (and are configured) within the container, or on the host that the container is running on, isn't that important.

    Again, given the ephemeral nature of a docker container, it is often necessary to use a data volume on the host, and make the container dependent on this.

    In the case of Solr, we'll want to inject a directory into our container that holds a configuration for a Solr core.

    We do that like this:
    $ sudo docker run -p 8983:8983 -v /home/craig/solr_data/:/opt/solr/server/solr/books craig/solr
    

    In the command above, the host path (/home/craig/solr_data/) comes before the colon and the container path (/opt/solr/server/solr/books) after it.

    In some cases (particularly when debugging, or just getting started), I avoid running a container in detached mode, and focus on the immediate log output.

    As in this case:
    $ sudo docker run -p 8983:8983 -v /home/craig/solr_data/:/opt/solr/server/solr/books craig/solr
    
    Starting Solr on port 8983 from /opt/solr/server
    
    0    [main] INFO  org.eclipse.jetty.server.Server  ? jetty-8.1.10.v20130312
    48   [main] INFO  org.eclipse.jetty.deploy.providers.ScanningAppProvider  ? Deployment monitor /opt/solr-5.0.0/server/contexts at interval 0
    55   [main] INFO  org.eclipse.jetty.deploy.DeploymentManager  ? Deployable added: /opt/solr-5.0.0/server/contexts/solr-jetty-context.xml
    122  [main] INFO  org.eclipse.jetty.webapp.WebInfConfiguration  ? Extract jar:file:/opt/solr-5.0.0/server/webapps/solr.war!/ to /opt/solr-5.0.0/server/solr-webapp/webapp
    1541 [main] INFO  org.eclipse.jetty.webapp.StandardDescriptorProcessor  ? NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet
    1598 [main] INFO  org.apache.solr.servlet.SolrDispatchFilter  ? SolrDispatchFilter.init()WebAppClassLoader=2028371466@78e67e0a
    1622 [main] INFO  org.apache.solr.core.SolrResourceLoader  ? JNDI not configured for solr (NoInitialContextEx)
    1623 [main] INFO  org.apache.solr.core.SolrResourceLoader  ? using system property solr.solr.home: /opt/solr/server/solr
    1624 [main] INFO  org.apache.solr.core.SolrResourceLoader  ? new SolrResourceLoader for directory: '/opt/solr/server/solr/'
    1776 [main] INFO  org.apache.solr.core.ConfigSolr  ? Loading container configuration from /opt/solr/server/solr/solr.xml
    1869 [main] INFO  org.apache.solr.core.CoresLocator  ? Config-defined core root directory: /opt/solr/server/solr
    1878 [main] INFO  org.apache.solr.core.CoreContainer  ? New CoreContainer 1048855692
    1879 [main] INFO  org.apache.solr.core.CoreContainer  ? Loading cores into CoreContainer [instanceDir=/opt/solr/server/solr/]
    1892 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting socketTimeout to: 600000
    1892 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting urlScheme to: null
    1896 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting connTimeout to: 60000
    1899 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting maxConnectionsPerHost to: 20
    1900 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting maxConnections to: 10000
    1900 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting corePoolSize to: 0
    1900 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting maximumPoolSize to: 2147483647
    1900 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting maxThreadIdleTime to: 5
    1901 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting sizeOfQueue to: -1
    1901 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting fairnessPolicy to: false
    1906 [main] INFO  org.apache.solr.handler.component.HttpShardHandlerFactory  ? Setting useRetries to: false
    2037 [main] INFO  org.apache.solr.update.UpdateShardHandler  ? Creating UpdateShardHandler HTTP client with params: socketTimeout=600000&connTimeout=60000&retry=true
    2040 [main] INFO  org.apache.solr.logging.LogWatcher  ? SLF4J impl is org.slf4j.impl.Log4jLoggerFactory
    2041 [main] INFO  org.apache.solr.logging.LogWatcher  ? Registering Log Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
    2044 [main] INFO  org.apache.solr.core.CoreContainer  ? Host Name:
    2079 [main] INFO  org.apache.solr.core.CoresLocator  ? Looking for core definitions underneath /opt/solr/server/solr
    2093 [main] INFO  org.apache.solr.core.CoresLocator  ? Found core books in /opt/solr/server/solr/books/
    2101 [main] INFO  org.apache.solr.core.CoresLocator  ? Found 1 core definitions
    2104 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? new SolrResourceLoader for directory: '/opt/solr/server/solr/books/'
    2188 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrConfig  ? current version of requestparams : -1
    2194 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrConfig  ? Adding specified lib dirs to ClassLoader
    2196 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/icu4j-54.1.jar' to classloader
    2196 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/poi-3.11.jar' to classloader
    2197 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/aspectjrt-1.8.0.jar' to classloader
    2197 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/xmpcore-5.1.2.jar' to classloader
    2197 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/poi-ooxml-3.11.jar' to classloader
    2198 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/xz-1.5.jar' to classloader
    2198 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/apache-mime4j-dom-0.7.2.jar' to classloader
    2200 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/boilerpipe-1.1.0.jar' to classloader
    2200 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/xmlbeans-2.6.0.jar' to classloader
    2200 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/pdfbox-1.8.8.jar' to classloader
    2200 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/isoparser-1.0.2.jar' to classloader
    2200 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/vorbis-java-tika-0.6.jar' to classloader
    2201 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/juniversalchardet-1.0.3.jar' to classloader
    2201 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/tika-xmp-1.7.jar' to classloader
    2201 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/tika-core-1.7.jar' to classloader
    2202 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
    2202 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/rome-1.0.jar' to classloader
    2203 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/tika-java7-1.7.jar' to classloader
    2203 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/tika-parsers-1.7.jar' to classloader
    2203 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/vorbis-java-core-0.6.jar' to classloader
    2203 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/poi-scratchpad-3.11.jar' to classloader
    2203 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/metadata-extractor-2.6.2.jar' to classloader
    2204 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar' to classloader
    2204 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/jhighlight-1.0.jar' to classloader
    2205 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/poi-ooxml-schemas-3.11.jar' to classloader
    2207 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/java-libpst-0.8.1.jar' to classloader
    2207 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/jdom-1.0.jar' to classloader
    2207 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/jmatio-1.0.jar' to classloader
    2208 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/commons-compress-1.8.1.jar' to classloader
    2208 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/xercesImpl-2.9.1.jar' to classloader
    2208 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to classloader
    2208 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/jempbox-1.8.8.jar' to classloader
    2209 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/tagsoup-1.2.1.jar' to classloader
    2211 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/extraction/lib/fontbox-1.8.8.jar' to classloader
    2212 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/dist/solr-cell-5.0.0.jar' to classloader
    2213 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/clustering/lib/mahout-collections-1.0.jar' to classloader
    2213 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/clustering/lib/mahout-math-0.6.jar' to classloader
    2213 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/clustering/lib/carrot2-mini-3.9.0.jar' to classloader
    2216 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/clustering/lib/attributes-binder-1.2.1.jar' to classloader
    2216 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/clustering/lib/jackson-mapper-asl-1.9.13.jar' to classloader
    2217 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/clustering/lib/simple-xml-2.7.jar' to classloader
    2219 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/clustering/lib/jackson-core-asl-1.9.13.jar' to classloader
    2219 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/clustering/lib/hppc-0.5.2.jar' to classloader
    2219 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/dist/solr-clustering-5.0.0.jar' to classloader
    2220 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/langid/lib/jsonic-1.2.7.jar' to classloader
    2220 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/langid/lib/langdetect-1.1-20120112.jar' to classloader
    2221 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/dist/solr-langid-5.0.0.jar' to classloader
    2224 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/velocity/lib/velocity-1.7.jar' to classloader
    2224 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/velocity/lib/commons-beanutils-1.8.3.jar' to classloader
    2224 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/velocity/lib/velocity-tools-2.0.jar' to classloader
    2225 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/contrib/velocity/lib/commons-collections-3.2.1.jar' to classloader
    2225 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrResourceLoader  ? Adding 'file:/opt/solr/dist/solr-velocity-5.0.0.jar' to classloader
    2290 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.update.SolrIndexConfig  ? IndexWriter infoStream solr logging is enabled
    2297 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrConfig  ? Using Lucene MatchVersion: 4.7.0
    2401 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.Config  ? Loaded SolrConfig: solrconfig.xml
    2410 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.schema.IndexSchema  ? Reading Solr Schema from /opt/solr/server/solr/books/conf/schema.xml
    2417 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.schema.IndexSchema  ? [books] Schema name=books
    2473 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.schema.IndexSchema  ? unique key field: id
    2567 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.CoreContainer  ? Creating SolrCore 'books' using configuration from instancedir /opt/solr/server/solr/books/
    2586 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? solr.NRTCachingDirectoryFactory
    2593 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? [books] Opening new SolrCore at /opt/solr/server/solr/books/, dataDir=/opt/solr-5.0.0/server/solr/books/data/
    2598 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.JmxMonitoredMap  ? No JMX servers found, not exposing Solr information with JMX.
    2603 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? [books] Added SolrEventListener for newSearcher: org.apache.solr.core.QuerySenderListener{queries=[]}
    2603 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? [books] Added SolrEventListener for firstSearcher: org.apache.solr.core.QuerySenderListener{queries=[{q=static firstSearcher warming in solrconfig.xml}]}
    2624 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.CachingDirectoryFactory  ? return new directory for /opt/solr-5.0.0/server/solr/books/data
    2627 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? New index directory detected: old=null new=/opt/solr-5.0.0/server/solr/books/data/index/
    2631 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.CachingDirectoryFactory  ? return new directory for /opt/solr-5.0.0/server/solr/books/data/index
    2653 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? created json: solr.JSONResponseWriter
    2654 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? adding lazy queryResponseWriter: solr.VelocityResponseWriter
    2655 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? created velocity: solr.VelocityResponseWriter
    2662 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? created xslt: solr.XSLTResponseWriter
    2662 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.response.XSLTResponseWriter  ? xsltCacheLifetimeSeconds=5
    2773 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? no updateRequestProcessorChain defined as default, creating implicit default
    2785 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /update/json/docs: org.apache.solr.handler.UpdateRequestHandler
    2786 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /config: org.apache.solr.handler.SolrConfigHandler
    2788 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /schema: org.apache.solr.handler.SchemaHandler
    2790 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /admin/luke: org.apache.solr.handler.admin.LukeRequestHandler
    2793 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /admin/system: org.apache.solr.handler.admin.SystemInfoHandler
    2794 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /admin/mbeans: org.apache.solr.handler.admin.SolrInfoMBeanHandler
    2796 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /admin/plugins: org.apache.solr.handler.admin.PluginInfoHandler
    2796 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /admin/threads: org.apache.solr.handler.admin.ThreadDumpHandler
    2796 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /admin/properties: org.apache.solr.handler.admin.PropertiesRequestHandler
    2796 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /admin/logging: org.apache.solr.handler.admin.LoggingHandler
    2799 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /admin/file: org.apache.solr.handler.admin.ShowFileRequestHandler
    2804 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /select: solr.SearchHandler
    2804 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /query: solr.SearchHandler
    2807 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /get: solr.RealTimeGetHandler
    2808 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /browse: solr.SearchHandler
    2809 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /update: solr.UpdateRequestHandler
    2809 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /update/json: solr.UpdateRequestHandler
    2809 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /update/csv: solr.UpdateRequestHandler
    2811 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? adding lazy requestHandler: solr.extraction.ExtractingRequestHandler
    2813 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /update/extract: solr.extraction.ExtractingRequestHandler
    2815 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? adding lazy requestHandler: solr.FieldAnalysisRequestHandler
    2815 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /analysis/field: solr.FieldAnalysisRequestHandler
    2816 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? adding lazy requestHandler: solr.DocumentAnalysisRequestHandler
    2816 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /analysis/document: solr.DocumentAnalysisRequestHandler
    2831 [coreLoadExecutor-5-thread-1] WARN  org.apache.solr.core.SolrResourceLoader  ? Solr loaded a deprecated plugin/analysis class [solr.admin.AdminHandlers]. Please consult documentation how to replace it accordingly.
    2832 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /admin/: solr.admin.AdminHandlers
    2835 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /admin/ping: solr.PingRequestHandler
    2837 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /debug/dump: solr.DumpRequestHandler
    2853 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /replication: solr.ReplicationHandler
    2853 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? adding lazy requestHandler: solr.SearchHandler
    2854 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /spell: solr.SearchHandler
    2854 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? adding lazy requestHandler: solr.SearchHandler
    2855 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /suggest: solr.SearchHandler
    2855 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? adding lazy requestHandler: solr.SearchHandler
    2855 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /tvrh: solr.SearchHandler
    2855 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? adding lazy requestHandler: solr.SearchHandler
    2855 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.RequestHandlers  ? created /terms: solr.SearchHandler
    2880 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.handler.loader.XMLLoader  ? xsltCacheLifetimeSeconds=60
    2884 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.handler.loader.XMLLoader  ? xsltCacheLifetimeSeconds=60
    2887 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.handler.loader.XMLLoader  ? xsltCacheLifetimeSeconds=60
    2888 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.handler.loader.XMLLoader  ? xsltCacheLifetimeSeconds=60
    2891 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? Using default statsCache cache: org.apache.solr.search.stats.LocalStatsCache
    2924 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? Hard AutoCommit: if uncommited for 15000ms;
    2925 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? Soft AutoCommit: disabled
    2976 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? SolrDeletionPolicy.onInit: commits: num=1
            commit{dir=NRTCachingDirectory(MMapDirectory@/opt/solr-5.0.0/server/solr/books/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@362b8706; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_2,generation=2}
    2977 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.SolrCore  ? newest commit generation = 2
    3018 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.search.SolrIndexSearcher  ? Opening Searcher@5e80903c[books] main
    3027 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.rest.ManagedResourceStorage  ? File-based storage initialized to use dir: /opt/solr/server/solr/books/conf
    3028 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.rest.RestManager  ? Initializing RestManager with initArgs: {storageDir=/opt/solr/server/solr/books/conf}
    3038 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.rest.ManagedResourceStorage  ? Reading _rest_managed.json using file:dir=/opt/solr/server/solr/books/conf
    3039 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.rest.RestManager  ? Initializing 0 registered ManagedResources
    3039 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.handler.component.SpellCheckComponent  ? Initializing spell checkers
    3049 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.spelling.DirectSolrSpellChecker  ? init: {name=default,field=text,classname=solr.DirectSolrSpellChecker,distanceMeasure=internal,accuracy=0.5,maxEdits=2,minPrefix=1,maxInspections=5,minQueryLength=4,maxQueryFrequency=0.01}
    3056 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.handler.component.SpellCheckComponent  ? No queryConverter defined, using default converter
    3058 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.handler.component.SuggestComponent  ? Initializing SuggestComponent
    3060 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.spelling.suggest.SolrSuggester  ? init: {name=mySuggester,lookupImpl=FuzzyLookupFactory,dictionaryImpl=DocumentDictionaryFactory,field=cat,weightField=price,suggestAnalyzerFieldType=string}
    3078 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.spelling.suggest.SolrSuggester  ? Dictionary loaded with params: {name=mySuggester,lookupImpl=FuzzyLookupFactory,dictionaryImpl=DocumentDictionaryFactory,field=cat,weightField=price,suggestAnalyzerFieldType=string}
    3094 [coreLoadExecutor-5-thread-1] WARN  org.apache.solr.handler.admin.AdminHandlers  ? <requestHandler name="/admin/"
     class="solr.admin.AdminHandlers" /> is deprecated . It is not required anymore
    3094 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.handler.ReplicationHandler  ? Commits will be reserved for  10000
    3096 [coreLoadExecutor-5-thread-1] INFO  org.apache.solr.core.CoreContainer  ? registering core: books
    3096 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore  ? QuerySenderListener sending requests to Searcher@5e80903c[books] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(5.0.0):C1)))}
    3105 [main] INFO  org.apache.solr.servlet.SolrDispatchFilter  ? user.dir=/opt/solr-5.0.0/server
    3105 [main] INFO  org.apache.solr.servlet.SolrDispatchFilter  ? SolrDispatchFilter.init() done
    3140 [main] INFO  org.eclipse.jetty.server.AbstractConnector  ? Started SocketConnector@0.0.0.0:8983
    3146 [searcherExecutor-6-thread-1] ERROR org.apache.solr.core.SolrCore  ? org.apache.solr.common.SolrException: undefined field text
            at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1291)
            at org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getWrappedAnalyzer(IndexSchema.java:444)
            at org.apache.lucene.analysis.DelegatingAnalyzerWrapper$DelegatingReuseStrategy.getReusableComponents(DelegatingAnalyzerWrapper.java:74)
            at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:172)
            at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:205)
            at org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:373)
            at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:741)
            at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:540)
            at org.apache.solr.parser.QueryParser.Term(QueryParser.java:299)
            at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)
            at org.apache.solr.parser.QueryParser.Query(QueryParser.java:107)
            at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)
            at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:150)
            at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
            at org.apache.solr.search.QParser.getQuery(QParser.java:141)
            at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:156)
            at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:201)
            at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
            at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
            at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
            at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1778)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    
    3149 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore  ? [books] webapp=null path=null params={q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false&event=firstSearcher} status=400 QTime=48
    3151 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore  ? QuerySenderListener done.
    3151 [searcherExecutor-6-thread-1] INFO  org.apache.solr.handler.component.SpellCheckComponent  ? Loading spell index for spellchecker: default
    3152 [searcherExecutor-6-thread-1] INFO  org.apache.solr.handler.component.SpellCheckComponent  ? Loading spell index for spellchecker: wordbreak
    3152 [searcherExecutor-6-thread-1] INFO  org.apache.solr.handler.component.SuggestComponent  ? Loading suggester index for: mySuggester
    3152 [searcherExecutor-6-thread-1] INFO  org.apache.solr.spelling.suggest.SolrSuggester  ? reload()
    3152 [searcherExecutor-6-thread-1] INFO  org.apache.solr.spelling.suggest.SolrSuggester  ? build()
    3169 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore  ? [books] Registered new searcher Searcher@5e80903c[books] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(5.0.0):C1)))}
    



    The Solr Core


    During startup, Solr scans the sub-directories of its home directory looking for files named core.properties; each one it finds defines a core.

    The solr_data directory that I injected into my container in the last section contains these contents (at a minimum):
    $ tree
    .
    └── solr_data
        ├── conf
        │   ├── schema.xml
        │   └── solrconfig.xml
        └── core.properties

    2 directories, 3 files
    


    The contents of each file are hyperlinked in place above.
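    The core.properties file itself can be tiny.  The real file is linked above; as a sketch, core discovery needs little more than the core's name (and will even fall back to the enclosing directory's name if the file is empty):
    name=books
    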

    When we launch the docker container now, with this directory and its contents injected, both the log file and dashboard show us a more successful configuration.
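    That launch is presumably the same docker run command shown in the Restarting section below, just without the -d flag, so the startup log streams to the console:
    $ sudo docker run -p 8983:8983 -v /home/craig/solr_data/:/opt/solr/server/solr/books craig/solr
    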

    Last line of the log file:
    3169 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore  ? [books] Registered new searcher Searcher@5e80903c[books] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(5.0.0):C1)))}
    


    The dashboard:
    Fig 3: Core Configuration working properly

    Clicking the "Core Selector" drop-down list (at the bottom of the left-hand nav pane) brings up an overview pane, from which I can select "Query".  At the bottom of the query pane is an "execute" button.

    Clicking this button will perform a default query against the configured Solr core.

    Fig 4: Query returns no results
    As expected, when we run this query, there is no data.
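    The same default query can also be issued from the command line; a sketch, assuming the container's published port 8983 is reachable on localhost:
    $ curl "http://localhost:8983/solr/books/select?q=*:*&wt=json&indent=true"
    
    At this stage the response reports a numFound of 0.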


    Java Client


    I'm going to use a Java client to add data:
    package com.trimc.blogger.solr.minimalist;
    
    import static org.junit.Assert.assertNotNull;
    
    import java.util.ArrayList;
    import java.util.Collection;
    
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.impl.XMLResponseParser;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.client.solrj.response.UpdateResponse;
    import org.apache.solr.common.SolrInputDocument;
    
    @SuppressWarnings("deprecation")
    public final class PostData {
    
     public static void main(String... args) throws Throwable {
    
      String url = "http://192.168.1.34:8983/solr/books";
    
      HttpSolrServer server = new HttpSolrServer(url);
      assertNotNull(server);
    
      server.setMaxRetries(1); // defaults to 0.  > 1 not recommended.
      server.setConnectionTimeout(5000); // 5 seconds to establish TCP
      // Setting the XML response parser is only required for cross
      // version compatibility and only when one side is 1.4.1 or
      // earlier and the other side is 3.1 or later.
      server.setParser(new XMLResponseParser()); // binary parser is used by default
      // The following settings are provided here for completeness.
      // They will not normally be required, and should only be used 
      // after consulting javadocs to know whether they are truly required.
      server.setSoTimeout(1000); // socket read timeout
      server.setDefaultMaxConnectionsPerHost(100);
      server.setMaxTotalConnections(100);
      server.setFollowRedirects(false); // defaults to false
      // allowCompression defaults to false.
      // Server side must support gzip or deflate for this to have any effect.
      server.setAllowCompression(true);
    
      SolrInputDocument doc1 = new SolrInputDocument();
      doc1.addField("id", "id2", 1.0f);
      doc1.addField("title", "The Yellow Admiral", 1.0f);
      doc1.addField("speaker", "Patrick O'Brian", 1.0f);
      doc1.addField("page", 56, 1.0f);
      doc1.addField("url", "http://en.wikipedia.org/wiki/The_Yellow_Admiral", 1.0f);
      doc1.addField("line", "the dark of the moon", 1.0f);
    
      Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
      docs.add(doc1);
    
      UpdateRequest req = new UpdateRequest();
      req.setAction(UpdateRequest.ACTION.COMMIT, false, false);
      req.add(docs);
    
      UpdateResponse rsp = req.process(server);
      assertNotNull(rsp);
     }
    }
    


    I run the client locally, and then query the Solr core via the dashboard:
    Fig 5: Query returns results
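    For completeness, the same check can be made programmatically.  This sketch (a hypothetical QueryData class in the same package, reusing the url from the client above) issues a match-all query via SolrJ and prints the documents it finds:
    package com.trimc.blogger.solr.minimalist;
    
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    
    @SuppressWarnings("deprecation")
    public final class QueryData {
    
     public static void main(String... args) throws Throwable {
    
      String url = "http://192.168.1.34:8983/solr/books";
    
      HttpSolrServer server = new HttpSolrServer(url);
    
      /* match-all query, the same q=*:* the dashboard runs by default */
      SolrQuery query = new SolrQuery("*:*");
    
      QueryResponse rsp = server.query(query);
    
      /* print each document stored in the core */
      for (SolrDocument doc : rsp.getResults())
       System.out.println(doc);
     }
    }
    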



    Restarting the Container


    To verify that the data is persisted between container lifecycles, first stop the running instance.
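    A sketch of doing so (the container ID below is a placeholder; docker ps lists the real one):
    $ sudo docker ps
    $ sudo docker stop <container-id>
    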

    If we examine the solr_data directory, we find this:
    $ tree
    .
    ├── conf
    │   ├── schema.xml
    │   └── solrconfig.xml
    ├── core.properties
    └── data
        ├── index
        │   ├── _2.fdt
        │   ├── _2.fdx
        │   ├── _2.fnm
        │   ├── _2_Lucene50_0.doc
        │   ├── _2_Lucene50_0.pos
        │   ├── _2_Lucene50_0.tim
        │   ├── _2_Lucene50_0.tip
        │   ├── _2.si
        │   ├── segments_4
        │   └── write.lock
        └── tlog
            ├── tlog.0000000000000000000
            ├── tlog.0000000000000000001
            └── tlog.0000000000000000002

    4 directories, 16 files
    


    Note the creation of the sub-directory named data.  If you've worked directly with Lucene before, you'll recognize the contents of this directory.

    I'm going to go ahead and start my instance again, this time in detached mode:
    $ sudo docker run -d -p 8983:8983 -v /home/craig/solr_data/:/opt/solr/server/solr/books craig/solr
    ad59d3fbd9d8b13cde5906f6192397c5a4556077957ef57d17f8a50bcb67f9c9
    

    and if I revisit the dashboard via my web browser, the data is still present.  Since we were able to view the Lucene index on our host system, this is no surprise.
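    Because the container is now detached, its startup log no longer streams to the console; if needed, it can still be followed using the container ID printed above:
    $ sudo docker logs -f ad59d3fbd9d8b13cde5906f6192397c5a4556077957ef57d17f8a50bcb67f9c9
    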


    References

    1. Resources:
      1. [GitHub] solr-minimalist
        1. Java/Maven project containing the client code to post data to Solr.
      2. [GitHub] Solr Dockerfile
    2. [StackOverflow] Solr Collections vs Cores
    3. [DigitalOcean] Manual installation of Solr 4.7.x on Ubuntu 14.04
      1. Leaves out a couple of key steps (for example, it doesn't explain why the Jetty install is necessary), and some links have changed over time.  Be sure to read the comments section thoroughly.