SuttaCentral

Upgrade to Elasticsearch 1.5

search

#1

I have decided to upgrade Elasticsearch from 1.4 to 1.5.

While we are actively developing search, it’s a good idea to keep up with the most recent release of Elasticsearch, as it is constantly being improved.

Beyond that, 1.5 introduces a potentially useful feature, “inner hits”. This means that if you have a thread object containing post children, you can search the threads and get back not only the most relevant threads, but also the most relevant posts within each thread. This may well prove applicable to my plans to index the Discourse posts in Elasticsearch.
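
As a rough illustration of the idea, here is a minimal sketch of what such a query body could look like when built in Python. The parent/child mapping is an assumption for the example only: a hypothetical 'thread' parent type with 'post' children that have a 'body' field. It is not the mapping SuttaCentral actually uses.

```python
def thread_search_body(text, max_posts=3):
    """Build a search body that matches threads via their child posts.

    Assumes a hypothetical parent/child mapping in which 'post'
    documents are children of 'thread' documents and each post has
    a 'body' field.
    """
    return {
        'query': {
            'has_child': {
                'type': 'post',
                'query': {'match': {'body': text}},
                # Ask Elasticsearch to also return the matching child
                # posts alongside each thread that is found.
                'inner_hits': {'size': max_posts},
            }
        }
    }


if __name__ == '__main__':
    import json
    print(json.dumps(thread_search_body('mindfulness'), indent=2))
```

The resulting dict would be passed as the body of a search against the index holding the threads. The matching posts should then come back under an inner_hits key on each thread hit.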

If you have Elasticsearch installed, please refer to the Elasticsearch installation document on the GitHub wiki for instructions on upgrading from 1.4 to 1.5.


#2

Upgrade worked fine. One minor detail: elasticsearch repositories were disabled on upgrade to Vivid, but that’s an easy fix.

Everything then went swimmingly, except that elasticsearch doesn’t start. I’ve upgraded both my computers and get the same result on each, with similar errors:

  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/cherrypy/_cprequest.py", line 670, in respond
    response.body = self.handler()
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/cherrypy/lib/encoding.py", line 212, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/cherrypy/lib/jsontools.py", line 63, in json_handler
    value = cherrypy.serving.request._json_inner_handler(*args, **kwargs)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/cherrypy/_cpdispatch.py", line 61, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/home/sujato/suttacentral/sc/root.py", line 75, in data
    return show.data(**kwargs)
  File "/home/sujato/suttacentral/sc/show.py", line 201, in data
    out[name] = getattr(sc.data.data, name)(**kwargs)
  File "/home/sujato/suttacentral/sc/data.py", line 8, in translation_count
    return sc.search.query.div_translation_count(lang)
  File "/home/sujato/suttacentral/sc/search/query.py", line 28, in div_translation_count
    result = es.search(index=lang, doc_type='text', search_type='count', body=body)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/elasticsearch/client/utils.py", line 68, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/elasticsearch/client/__init__.py", line 497, in search
    params=params, body=body)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/elasticsearch/connection/http_urllib3.py", line 82, in perform_request
    raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))) caused by: ProtocolError(('Connection aborted.', ConnectionRefusedError(111, 'Connection refused')))

Another seeming glitch, unrelated to elasticsearch: a couple of days ago, my local server decided to rebuild the TIM. Fine, except it just kept going, rebuilding apparently indefinitely. After stopping and restarting it seems fine. But now I’m updating SC on my laptop, and it’s doing the same thing, just churning over the TIM, pausing for a few minutes when it’s finished, then starting again.


#3

Elasticsearch:
What do you get when you navigate in your browser to http://localhost:9200 ?

With the TIM, try deleting from the suttacentral/db folder anything which looks remotely like text-info-model*, restart and see if the problem goes away.

Also, a change I made to the TIM a while ago is that it completely rebuilds whenever a file changes. The rebuild does not take long on an SSD. The criterion it uses to detect file changes is not terribly sophisticated: it just finds the most recent modification time of any .html file within data/text, and if that has changed, it rebuilds. It does not matter whether the changed file is actually valid or included. For example, you can run touch data/text/foo.html to force a rebuild; foo.html is not indexed, but that will not cause a rebuild loop. Hypothetically, though, if some badly behaved text editor or something kept ‘touching’ an open HTML file (updating its mtime), you’d see constant rebuilds.
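
To make the check concrete, here is a minimal sketch of that kind of mtime test. It is not the actual code in sc.textdata; the function names and the idea of comparing against a stored value from the last build are assumptions for illustration.

```python
from pathlib import Path


def latest_html_mtime(text_dir):
    """Return the newest modification time of any .html file under text_dir.

    Returns 0.0 when the directory contains no .html files.
    """
    return max((p.stat().st_mtime for p in Path(text_dir).rglob('*.html')),
               default=0.0)


def needs_rebuild(text_dir, last_built_mtime):
    """True if the newest .html mtime differs from the one seen at the last build."""
    return latest_html_mtime(text_dir) != last_built_mtime
```

With a check like this, anything that keeps bumping the mtime of an .html file under data/text would make needs_rebuild return True every time it is polled, which matches the rebuild loop described above.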

Finally you can do this:

grep -Fe 'sc.textdata.build' log/app.log

Post the last dozen or so lines of output.


#4

nothing.

As for other suggestions, I’ll try them and see.


#5

Nothing at localhost:9200 means the elasticsearch server is simply not running.
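
If you would rather check from a script than from the browser, something along these lines should do. It is only a sketch using the standard library; the function name and the two-second timeout are my own choices, not anything in the SuttaCentral code.

```python
import urllib.request


def elasticsearch_is_up(url='http://localhost:9200', timeout=2):
    """Return True if an HTTP server answers with 200 at url, else False.

    A refused connection, a timeout or an HTTP error status all count
    as "not up".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib.error.URLError (including HTTPError) is a subclass of OSError.
        return False


if __name__ == '__main__':
    print('running' if elasticsearch_is_up() else 'not running')
```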

I’m upgrading to Vivid now, with any luck I’ll run into the same problem.


#6

Okay, after upgrading I also had the elasticsearch service failing to start.

It turned out to be a folder ownership problem. This fixed it for me:

sudo chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data/elasticsearch

Something, either Vivid or 1.5 (I think Vivid), seems to have changed the data path used. I had set a custom data path (to an HDD instead of the SSD) in /etc/elasticsearch/elasticsearch.yml, and after the upgrade this setting was being ignored.
I set that path in /etc/default/elasticsearch instead, and it once again used that folder for data.

I mention this because it’s quite possible you’ll end up with two copies of the indexes. They are nearly 1GB, which might be enough to care about on an SSD. You can find where elasticsearch has been storing stuff by running:
locate data/elasticsearch/nodes

If that data folder isn’t /usr/share/elasticsearch/data you might want to delete it to reclaim the space.


#7

None of this works for me. I don’t have a /usr/share/elasticsearch/data, and locate data/elasticsearch/nodes gives me nothing, as do both locate data/elasticsearch and locate elasticsearch/data

locate elasticsearch/nodes takes me to /var/lib/elasticsearch/elasticsearch/nodes

In case it’s useful, /etc/default/elasticsearch has:

#  Run Elasticsearch as this user ID and group ID
# ES_USER=elasticsearch
# ES_GROUP=elasticsearch

#  Heap Size (defaults to 256m min, 1g max)
# ES_HEAP_SIZE=2g

#  Heap new generation
# ES_HEAP_NEWSIZE=

#  max direct memory
# ES_DIRECT_SIZE=

#  Maximum number of open files, defaults to 65535.
# MAX_OPEN_FILES=65535

#  Maximum locked memory size. Set to "unlimited" if you use the
#  bootstrap.mlockall option in elasticsearch.yml. You must also set
#  ES_HEAP_SIZE.
# MAX_LOCKED_MEMORY=unlimited

#  Maximum number of VMA (Virtual Memory Areas) a process can own
# MAX_MAP_COUNT=262144

#  Elasticsearch log directory
# LOG_DIR=/var/log/elasticsearch

#  Elasticsearch data directory
# DATA_DIR=/var/lib/elasticsearch

#  Elasticsearch work directory
# WORK_DIR=/tmp/elasticsearch

#  Elasticsearch configuration directory
# CONF_DIR=/etc/elasticsearch

#  Elasticsearch configuration file (elasticsearch.yml)
# CONF_FILE=/etc/elasticsearch/elasticsearch.yml

#  Additional Java OPTS
# ES_JAVA_OPTS=

#  Configure restart on package upgrade (true, every other setting will lead to not restarting)
# RESTART_ON_UPGRADE=true

And the path info in elasticsearch.yml has:

#  Path to directory containing configuration (this file and logging.yml):
# 
# path.conf: /path/to/conf

#  Path to directory where to store index data allocated for this node.
# 
# path.data: /path/to/data
# 
#  Can optionally include more than one location, causing data to be striped across
#  the locations (a la RAID 0) on a file level, favouring locations with most free
#  space on creation. For example:
# 
# path.data: /path/to/data1,/path/to/data2

#  Path to temporary files:
# 
# path.work: /path/to/work

#  Path to log files:
# 
# path.logs: /path/to/logs

#  Path to where plugins are installed:
# 
# path.plugins: /path/to/plugins

#8

Okay, please post the output of:

tail /var/log/elasticsearch/elasticsearch.log -n 20

#9

sujato@sujato-ThinkCentre-M93p:~$ tail /var/log/elasticsearch/elasticsearch.log -n 20
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: org.elasticsearch.ElasticsearchException: Plugin is incompatible with the current node
	at org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:525)
	... 7 more
[2015-05-14 07:11:18,358][ERROR][plugins                  ] [Silver Samurai] cannot start plugin due to incorrect Lucene version: plugin [4.10.3], node [4.10.4].
[2015-05-14 07:11:18,358][WARN ][plugins                  ] [Silver Samurai] failed to load plugin from [jar:file:/usr/share/elasticsearch/plugins/analysis-stempel/elasticsearch-analysis-stempel-2.4.2.jar!/es-plugin.properties]
org.elasticsearch.ElasticsearchException: Failed to load plugin class [org.elasticsearch.plugin.analysis.stempel.AnalysisStempelPlugin]
	at org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:532)
	at org.elasticsearch.plugins.PluginsService.loadPluginsFromClasspath(PluginsService.java:407)
	at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:116)
	at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:151)
	at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:70)
	at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:207)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: org.elasticsearch.ElasticsearchException: Plugin is incompatible with the current node
	at org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:525)
	... 7 more
[2015-05-14 07:11:18,362][ERROR][bootstrap                ] {1.4.5}: Initialization Failed ...
- ElasticsearchException[Missing mandatory plugins [analysis-icu]]

#10

It’s still trying to load the outdated (2.4) plugins. That should have been fixed by these commands in the wiki section:

cd /usr/share/elasticsearch/
# Clear the old plugin versions
sudo rm -r plugins/*
# Install new versions
sudo bin/plugin install elasticsearch/elasticsearch-analysis-stempel/2.5.0
sudo bin/plugin install elasticsearch/elasticsearch-analysis-icu/2.5.0
sudo bin/plugin install elasticsearch/marvel/latest
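
If you want to confirm from the log whether old plugins are still being picked up after doing this, a small script can pull out the version mismatches. This is only a sketch; the regular expression is based on the single log line format pasted above and may not cover other variants.

```python
import re

# Matches lines such as:
#   ... incorrect Lucene version: plugin [4.10.3], node [4.10.4].
MISMATCH = re.compile(
    r'incorrect Lucene version: plugin \[([\d.]+)\], node \[([\d.]+)\]')


def lucene_mismatches(log_lines):
    """Return (plugin_version, node_version) pairs found in the log lines."""
    pairs = []
    for line in log_lines:
        match = MISMATCH.search(line)
        if match:
            pairs.append(match.groups())
    return pairs


if __name__ == '__main__':
    with open('/var/log/elasticsearch/elasticsearch.log') as log:
        for plugin, node in lucene_mismatches(log):
            print('plugin built for Lucene', plugin, 'but node has', node)
```

An empty result after a restart would suggest the old plugin jars are gone.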

#11

I did that stuff before; just did it again, still the same error.


#12

Okay, wipe out and re-install elasticsearch:

sudo apt-get purge elasticsearch
sudo apt-get install elasticsearch
cd /usr/share/elasticsearch/
sudo bin/plugin install elasticsearch/elasticsearch-analysis-stempel/2.5.0
sudo bin/plugin install elasticsearch/elasticsearch-analysis-icu/2.5.0
sudo bin/plugin install elasticsearch/marvel/latest
sudo chown -R elasticsearch:elasticsearch /usr/share/elasticsearch /var/log/elasticsearch
sudo service elasticsearch start

Give it a minute to start up, and then check http://localhost:9200

If it still doesn’t start up, paste the output of:
grep -F elasticsearch /var/log/syslog | grep Exception | tail -n 20


#13

Finally! Thanks so much, it works perfectly on my desktop now. I will do the same for my laptop.


#14

Now trying to get my laptop working, also getting problems. After doing what your last post said, now the local server is not serving any text files, while the rest is okay. No search either.

I’m getting the following errors:

Traceback (most recent call last):
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/cherrypy/_cprequest.py", line 670, in respond
    response.body = self.handler()
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/cherrypy/lib/encoding.py", line 212, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/cherrypy/_cpdispatch.py", line 61, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/home/sujato/suttacentral/sc/root.py", line 61, in default
    return show.default(*args, **kwargs)
  File "/home/sujato/suttacentral/sc/show.py", line 149, in default
    return SuttaView(sutta, lang, canonical).render()
  File "/home/sujato/suttacentral/sc/views.py", line 208, in render
    self.setup_context(context)
  File "/home/sujato/suttacentral/sc/views.py", line 558, in setup_context
    super().setup_context(context)
  File "/home/sujato/suttacentral/sc/views.py", line 373, in setup_context
    context.discourse_results = sc.search.discourse.search(self.uid)
  File "/home/sujato/suttacentral/sc/search/discourse.py", line 321, in search
    result = es.search(discourse_index, doc_type='topic', body=body)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/elasticsearch/client/utils.py", line 68, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/elasticsearch/client/__init__.py", line 497, in search
    params=params, body=body)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/elasticsearch/connection/http_urllib3.py", line 86, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/sujato/.pyenv/versions/suttacentral/lib/python3.4/site-packages/elasticsearch/connection/base.py", line 102, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: TransportError(404, 'IndexMissingException[[discourse] missing]')

And under “Nitty Gritty Details”:

{'Accept-Language': 'en-US,en;q=0.8,ko;q=0.6,ms;q=0.4,zh-TW;q=0.2,zh;q=0.2,id;q=0.2,pt;q=0.2,ru;q=0.2', 'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate, sdch', 'Host': 'localhost:8800', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'Referer': 'http://localhost:8800/mn', 'Dnt': '1', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/41.0.2272.76 Chrome/41.0.2272.76 Safari/537.36', 'Remote-Addr': '127.0.0.1'}
Response Headers
{'Server': 'CherryPy/3.3.0', 'Date': 'Tue, 19 May 2015 05:51:45 GMT', 'Content-Type': 'text/html'}
Cherrypy Config
{'server.socket_port': 8800, 'log.error_file': '', 'tools.staticdir.root': '/home/sujato/suttacentral/static', 'tools.log_tracebacks.on': True, 'tools.remove_trailing_slash.on': True, 'tools.set_offline.on': True, 'error_page.default': , 'log.screen': False, 'tools.trailing_slash.on': False, 'server.socket_host': '127.0.0.1', 'log.access_file': '', 'engine.autoreload.on': True, 'tools.encode.on': True, 'tools.log_headers.on': True}

And for good measure:

grep -F elasticsearch /var/log/syslog | grep Exception | tail -n 20
May 19 15:42:12 sujato-UX31A elasticsearch[7640]: org.elasticsearch.ElasticsearchException: Failed to load logging configuration
May 19 15:42:12 sujato-UX31A elasticsearch[7640]: Caused by: java.nio.file.NoSuchFileException: /usr/share/elasticsearch/config
May 19 15:42:12 sujato-UX31A elasticsearch[7640]: at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
May 19 15:42:12 sujato-UX31A elasticsearch[7640]: at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
May 19 15:42:12 sujato-UX31A elasticsearch[7640]: at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)

#15

Well, elasticsearch is actually fine. The main exception is raised because the server can’t connect to a Discourse instance to download the posts (yes, it really should produce an error instead of an exception, but oh well). If you want it to download the posts from discourse.suttacentral.net, so that you can see Discourse search results on your development machine, you need to do a little bit of setup:

Edit ‘local.conf’ and add a section which looks like this:

[discourse]
    username: 'system'
    forum_url: 'https://discourse.suttacentral.net/'    
    api_key: '????????????????????????????????????????????'
    sync_period: 300

For the api_key, in place of the question marks you need to use the ‘All Users’ api key from this admin page:
https://discourse.suttacentral.net/admin/api

Note that the API key is top secret! If a malicious person got their hands on it, they could cause all sorts of trouble. That’s why we don’t put it in any file which is committed to GitHub, nor paste it anywhere public or private for that matter. The only people who should have access to it are those who can already see it on the /admin/api page.


#16

Okay, but if I don’t want to see the Discourse results? Normally I won’t need to, although just now I was tweaking the CSS so it would have been handy. But anyway, that should hopefully be done, so best to just not pull discourse results.


#17

There is little harm in having the results, but if you don’t want them, just set the discourse user or api key to an empty string '' and I’ll fix it so that it just puts nothing there instead of raising an exception.
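
Roughly, the fix would be a guard like the one below. This is a sketch of the idea only, not the code that will end up in sc/search/discourse.py; the function names and the plain dict standing in for the [discourse] config section are hypothetical.

```python
def discourse_enabled(settings):
    """Return True only if both a username and an API key are configured."""
    return bool(settings.get('username')) and bool(settings.get('api_key'))


def discourse_results(settings, search):
    """Return search() results, or an empty list when Discourse is disabled.

    `settings` stands in for the [discourse] section of local.conf and
    `search` for whatever callable performs the real query.
    """
    if not discourse_enabled(settings):
        return []
    return search()
```

With either value set to '', the page would simply get an empty list of Discourse results rather than an exception.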


#18

I’ve done these things, but there are still no texts showing up in the local server.


#19

It should work better in the latest push, so git pull and try again.


#20

Thanks, it works fine now. Texts are working, and Discourse is too. Do I need that setting in local.conf? I set the Discourse API key to null, but it still shows results.

Also, with the popup, that’s great, but we don’t need the box-shadow now, just a simple border like the epigraph will be fine. And given that we’re not competing with the epigraph, we don’t need to increase font size to 1.2, just leave it as normal.

We can also view it in Incognito, right?