Diorisis Search 3.3
[ made since 2020 by Alessandro Vatri ]
Diorisis Search is an application designed to build and run complex linguistic queries on the Diorisis Ancient Greek Corpus (Vatri and McGillivray 2018) through an intuitive graphical interface.
Download Mac (OSX 10.10 or higher)
103Mb      706 downloads
Download Windows (Installer)
98Mb      870 downloads
Download Windows (Portable)
98Mb      160 downloads
Download Linux (Debian 64bit)
147Mb      168 downloads
Select and search any of 820 lemmatized Ancient Greek texts
Read and navigate Greek texts
Build complex search patterns through an intuitive interface
Search for forms according to their morphological features
Parse any Greek text and load it into the search engine
Save and reload your queries
Calculate basic frequency data for your results
View occurrences in context sentences
Highlight and navigate occurrences in the texts
Rank lemmas or word forms matching search patterns by frequency
View the analysis of each word and report errors
Export the results as Excel spreadsheets

Share your results with other users
Create a collection of sentences from your search results
Export sentence collections as Word files
Search and export author/work abbreviations from the Diccionario Griego-Español list
Create parsing exercises
Mark parsing exercises automatically
Export marked scripts and return them to students

Functions

File and corpora

Diorisis Search is capable of:

Online services

Corpus management

Text Reader

Query builder

Searchable elements

Users can include the following types of linguistic items in their queries:

the exact phrase
Searches for the exact sequence of word nodes whose respective @form attributes correspond exactly (i.e. including grave vs acute accents) to the user input, unless the option ignore diacritics is selected (this option may also be activated globally). Forms may be input either in Unicode (UTF-8) Polytonic Greek or in BetaCode. Punctuation signs may not be included in the search and will be ignored by the search engine (e.g. the sequence λέγω ὅτι will return instances of both "λέγω ὅτι" and "λέγω, ὅτι").
the exact form
A word whose form corresponds exactly (i.e. including the grave vs acute accents) to the user input, unless the option ignore diacritics is selected (this option may also be activated globally). Forms may be input either in Unicode (UTF-8) Polytonic Greek or in BetaCode.
a form containing the sequence
A word node whose form contains the sequence (the string) input by the user.
For instance, πι selects all forms that contain πι (e.g. πιστεύεις, Ἀσκληπιῷ, etc., but not ἐλπίζων or πίπτει, unless the option ignore diacritics is selected). Strings may be input either in Unicode (UTF-8) Polytonic Greek or in BetaCode.
a form of the lemma
A word whose lemma corresponds exactly to the one selected by the user from the list of lemmas occurring in the Diorisis Corpus (and not all lemmas in e.g. LSJ, some of which would not be found anyway!). Lemmas may be input either in Unicode (UTF-8) Polytonic Greek or in BetaCode with all diacritics (even if the option ignoring diacritics is active).
a form of the lemma that contains the sequence
A word whose lemma contains the sequence (the string) input by the user.
For instance, πι selects all forms of lemmas that contain πι (e.g. πιστεύω, ἐπιλαμβάνω, etc., but not ἐλπίς, unless the option ignore diacritics is selected). Strings may be input either in Unicode (UTF-8) Polytonic Greek or in BetaCode.
a word with the following morphological features
A word of which at least one possible morphological analysis corresponds to the combination of values input by the user.
the punctuation mark
A punctuation mark corresponding to the user input.
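The difference between diacritic-sensitive and diacritic-insensitive matching described above can be illustrated with a minimal Python sketch. It is not the application's actual implementation: the function names are hypothetical, and it assumes that forms are compared after Unicode normalization (NFC for exact comparison, NFD with combining marks removed when diacritics are ignored). BetaCode input is not covered.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop accents, breathings and iota subscript (all combining marks in NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prepare(text: str, ignore_diacritics: bool) -> str:
    # Assumption: NFC is applied before exact comparison so that precomposed
    # and decomposed spellings of the same form compare equal.
    if ignore_diacritics:
        return strip_diacritics(text)
    return unicodedata.normalize("NFC", text)


def exact_form(form: str, query: str, ignore_diacritics: bool = False) -> bool:
    """'the exact form': the whole form must equal the query."""
    return prepare(form, ignore_diacritics) == prepare(query, ignore_diacritics)


def form_contains(form: str, query: str, ignore_diacritics: bool = False) -> bool:
    """'a form containing the sequence': the query must occur inside the form."""
    return prepare(query, ignore_diacritics) in prepare(form, ignore_diacritics)


forms = ["πιστεύεις", "Ἀσκληπιῷ", "ἐλπίζων", "πίπτει"]
strict = [f for f in forms if form_contains(f, "πι")]
loose = [f for f in forms if form_contains(f, "πι", ignore_diacritics=True)]
```

Under these assumptions, `strict` keeps only πιστεύεις and Ἀσκληπιῷ, while `loose` keeps all four forms, which mirrors the πι example given for partial form searches.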

If words are selected according to their form or lemma, users have the option to specify which morphological analyses should be possible for the word, if required.
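The rule that a word matches when at least one of its possible morphological analyses fits the requested combination of values can be sketched as follows. This is an illustration only: the dictionary layout and the feature names (`pos`, `case`, `number`, `gender`) are assumptions made for the example, not the corpus's actual annotation scheme, and the list of analyses is not meant to be exhaustive.

```python
from typing import Dict, List

Analysis = Dict[str, str]


def matches_morphology(analyses: List[Analysis], wanted: Analysis) -> bool:
    """True if at least one single analysis carries every requested value."""
    return any(
        all(analysis.get(feature) == value for feature, value in wanted.items())
        for analysis in analyses
    )


# Hypothetical analyses of an ambiguous form such as ἀγαθά.
agatha = [
    {"pos": "adjective", "case": "nominative", "number": "plural", "gender": "neuter"},
    {"pos": "adjective", "case": "accusative", "number": "plural", "gender": "neuter"},
]

is_accusative_plural = matches_morphology(agatha, {"case": "accusative", "number": "plural"})
is_genitive = matches_morphology(agatha, {"case": "genitive"})
```

Here `is_accusative_plural` is true because one of the two analyses satisfies both requested values, whereas `is_genitive` is false because no analysis does.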

Search commands

The relationship between linguistic items in the linear order of the sentence can be specified in the following ways:

followed by
Requires that the first word or punctuation mark should be followed by a word or punctuation mark whose features the user will be prompted to specify.
If the first element in the query is a strong punctuation mark (full stop, middle dot, or question mark), the search will be extended to the immediately following sentence (e.g. queries can capture words that follow a question mark within the following sentence).
followed or preceded by
Requires that the first word or punctuation mark should be followed or preceded by a word or punctuation mark whose features the user will be prompted to specify.
This command is not available if the first element in the query is a strong punctuation mark (full stop, middle dot, or question mark).
followed by (ignore punctuation)
Requires that the first word or punctuation mark should be followed by a word whose features the user will be prompted to specify.
If the first element in the query is a strong punctuation mark (full stop, middle dot, or question mark), the search will be extended to the immediately following sentence (e.g. queries can capture words that follow a question mark within the following sentence).
NB: The scope of the search is defined by counting only the number of word nodes.
For instance, in the sequence [ ὦ ἄνδρες, ἐγὼ ], ἐγὼ counts as immediately following ἄνδρες (scope = 1).
If used to add more than one element to the query after the first, this command appears in the drop-down menu preceded by the word "and".
followed or preceded by (ignore punctuation)
Requires that the first word or punctuation mark should be followed or preceded by a word whose features the user will be prompted to specify.
NB: The scope of the search is defined by counting only the number of word nodes.
For instance, in the sequence [ ὦ ἄνδρες, ἐγὼ ], ἐγὼ counts as immediately following ἄνδρες (scope = 1).
When any of these commands is selected, the user will also be prompted to indicate the scope of the search (i.e. the required distance or range of distances of the target from the first element).
or
Specifies that the first element may be defined by an alternative set of features. This command is only available while specifying the first element of the search.
ignoring diacritics
Requests that all diacritic signs be ignored in all form- and lemma-based searches. This option may be activated selectively for individual elements.

All elements may be searched for negatively, that is, it is possible to search for elements that match any feature but those specified (e.g. anything but the exact form instead of the exact form).

The maximum scope of a search is one sentence. In the Diorisis Corpus, sentences are defined as sequences of words and punctuation marks delimited by a strong punctuation mark (full stop, middle dot, or question mark).
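The sentence definition above can be illustrated with a short Python sketch that splits a flat token list at strong punctuation marks. It makes several assumptions that are not taken from the application itself: the input is a plain list of strings, the closing mark is kept with the sentence it ends, and the question mark and middle dot are encoded as `;` and `·` (or as their Greek-block equivalents U+037E and U+0387, which NFC normalization folds into those characters). The function name is hypothetical.

```python
import unicodedata
from typing import List

# Full stop, middle dot (U+00B7) and Greek question mark (";" after NFC).
STRONG_PUNCTUATION = {".", "\u00b7", ";"}


def split_sentences(tokens: List[str]) -> List[List[str]]:
    """Group tokens into sentences, each ending at a strong punctuation mark."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        current.append(token)
        if unicodedata.normalize("NFC", token) in STRONG_PUNCTUATION:
            sentences.append(current)
            current = []
    if current:  # trailing tokens with no closing mark
        sentences.append(current)
    return sentences


tokens = ["τί", "λέγεις", "\u037e", "οὐδέν", "."]
sentences = split_sentences(tokens)
```

With this input, `sentences` contains two lists: the question up to and including the question mark, and the answer up to and including the full stop.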

Within the sentence, searches for individual elements to follow or precede the first one may be restricted to a specific range (scope) for each element. The following options are available:

within
The search engine will look for the specified element at a distance of between one and the specified number of elements from the first element.
For instance, a search for the form ἀνὴρ within 3 words after the form ὁ will capture sequences like ὁ ἀνὴρ, ὁ δ’ ἀνὴρ, ὁ αὐτὸς ἀνὴρ, or ὁ δ’ αὐτὸς ἀνὴρ.
With the commands followed by or followed or preceded by, the range is calculated counting the number of word and punct nodes.
between
The search engine will look for the specified element at a distance from the first element that falls between the specified lower and upper bounds (inclusive).
For instance, a search for the form ἀνὴρ between 2 and 3 words after the form ὁ will capture sequences like ὁ δ’ ἀνὴρ, ὁ αὐτὸς ἀνὴρ, or ὁ δ’ αὐτὸς ἀνὴρ, but not ὁ ἀνὴρ.
With the commands followed by or followed or preceded by, the range is calculated counting the number of word and punct nodes.
exactly
The search engine will look for the specified element at exactly the specified distance from the first element.
For instance, a search for the form ἀνὴρ exactly 2 words after the form ὁ will capture sequences like ὁ δ’ ἀνὴρ or ὁ αὐτὸς ἀνὴρ, but not ὁ ἀνὴρ or ὁ δ’ αὐτὸς ἀνὴρ.
With the commands followed by or followed or preceded by, the range is calculated counting the number of word and punct nodes.
With the commands followed by (ignore punctuation) or followed or preceded by (ignore punctuation), the range is calculated counting the number of word nodes only (and, as a consequence, it is wider).
in the same sentence
The search engine will look for the specified element anywhere in the same sentence as the first element.
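The scope options and the two ways of counting distance can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions rather than the search engine's own code: nodes are represented as hypothetical `(kind, text)` pairs, only targets that follow the first element are handled, and the function names are invented for the example.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, str]  # (kind, text), where kind is "word" or "punct"


def distance(nodes: List[Node], start: int, target: int, ignore_punctuation: bool) -> int:
    """Number of nodes counted when moving from index start to a later index target."""
    span = nodes[start + 1 : target + 1]
    if ignore_punctuation:
        # The "(ignore punctuation)" commands count word nodes only.
        span = [node for node in span if node[0] == "word"]
    return len(span)


def in_scope(dist: int, mode: str, n: Optional[int] = None, m: Optional[int] = None) -> bool:
    """Check a distance against the 'within', 'between' and 'exactly' options."""
    if mode == "within":
        return 1 <= dist <= n
    if mode == "between":
        return n <= dist <= m
    if mode == "exactly":
        return dist == n
    raise ValueError(f"unknown scope option: {mode}")


address = [("word", "ὦ"), ("word", "ἄνδρες"), ("punct", ","), ("word", "ἐγὼ")]
counting_punct = distance(address, 1, 3, ignore_punctuation=False)
words_only = distance(address, 1, 3, ignore_punctuation=True)

phrase = [("word", "ὁ"), ("word", "δ’"), ("word", "αὐτὸς"), ("word", "ἀνὴρ")]
d = distance(phrase, 0, 3, ignore_punctuation=False)
```

In the first sequence ἐγὼ is two nodes away from ἄνδρες when the comma is counted and one node away when punctuation is ignored. In the second, ἀνὴρ is three nodes after ὁ, so it satisfies "within 3" and "between 2 and 3" but not "exactly 2", in line with the examples above.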

Results

Diorisis Search returns the following data:

Result sentences can be saved in the Saved Sentences workbook. Saved sentences are temporarily stored in memory, along with their exact reference, and may be viewed and exported as a Microsoft Word 2010+ docx document for use e.g. in handouts, exercises, or other teaching materials.

Teaching Tools

Diorisis Search can be used to create and mark parsing exercises.

User Reviews

Caio Borges Geraldes
App version: 1.0
Researcher (Linguistics)
Reviewed in Brasil on 11/05/20 19:33:32
The search engine is very good and has great flexibility, surely something I will be using in my research from now on. I have two suggestions that would make the engine more useful for my own research and maybe for other colleagues: 1. A minor improvement would be to include a native Linux version. It is possible to run it with Wine on Linux machines, but the loading time with Wine is a bit too long and might be affecting the query times. It might be quite simple to do so, but I am not so sure. 2. The major improvement I would like to see is to include not only the frequency counts of the query's result, but also the sentences themselves in the export result file. It would make the application way more effective for building research databases for further annotation. It seems to me that this addition can be easily implemented since the engine is already returning this data. Thank you very much for the project!
Mar A Rodda
App version: 3.2
Researcher (Classics)
Reviewed in UK on 24/06/24 16:50:28
Extremely useful tool, ensuring that one of the best available processed corpora (Diorisis) is accessible to the wider researcher community without the need for a sophisticated linguistics/coding background. I have used Diorisis for my research (with my own scripts), and have both recommended Diorisis Search to colleagues who used it in research papers and used it in workshops for students. I have not tried it as a teaching tool, but the ability to set and mark exercises is very promising!

Changelog

Version 3.3 Nov 2022
  • Added function: add text/passage reference when copying selections from reader (not only when displaying search results).
  • Added function: start/end-with flags for partial form/lemma searches.
  • Added function: add selections/sentences to Saved Sentences from reader.
  • Interface update: removed 'Minimal view'.
  • Interface update: texts with no hits are hidden, can be toggled.
  • Interface update: view hits for one text at a time in Result Visualizer.
  • Interface update: separated Export Results from Save Results.
  • Bug fix: export Excel.
  • Bug fix: copy text from reader.
  • Bug fix: italicised speech numbers in references.
Version 3.21 Nov 2021
  • Bug fixes.
Version 3.2 Oct 2021
  • Export text to UTF-8 txt files tool added.
  • Reader: added support for line breaks in poetry.
  • Reader: added support for edition information display.
  • Fixed bugs in corpus updater.
Version 3.11 May 2021
  • Bug fixes.
  • Support for 64bit Linux.
Version 3.1 Apr 2021
  • Lemma suggestions in query builder.
  • Navigate results in text reader.
  • Quick search for phrases in text reader.
  • Updater for single Diorisis Corpus files.
  • New error reporting system.
  • Read text of imported file.
  • Performance improvement loading dictionary.
  • Windows installer.
  • Cosmetic fixes.
  • Bug fixes.
Version 3.01 Mar 2021
  • Fixed bug with Text Reader and XML version of the Diorisis Corpus.
Version 3.0 Feb 2021
  • Added Text Reader.
  • New result visualizer.
  • Context menus to copy text and references.
  • Ability to restrict searches to parts of texts.
  • Save results to local files and reload them into the app.
  • Improvements in corpus selection interface.
  • Bug fixes.
Version 2.12 Jan 2021
  • Integration of DGE (Diccionario Griego-Español) author/work abbreviations.
  • Search and export DGE abbreviations.
  • Automatically view DGE abbreviations of corpus works.
  • Use DGE abbreviations as references when exporting sentences into Word documents.
Version 2.11 Dec 2020
  • Bug fixes.
  • New 'Search exact phrase' function.
Version 2.10 Nov 2020
  • Spanish localization for the parsing exercises (courtesy of Alberto Pardal Padín).
  • A new, streamlined in-app update system.
Version 2.0 Oct 2020
  • Bug fixes and cosmetic improvements.
Version 2.0β Sep 2020
  • Fixed bug in visualization of shared results.
  • Monitor progress and cancel query.
  • Performance boost of search engine.
  • Optimized memory handling in visualization of results.
  • JSON conversion bug fix.
  • Search engine bug fixes.
  • Added support for non-lemmatized texts.
  • Added possibility to combine multiple user-defined corpora.
  • Possibility to search for negative patterns.
  • Parser for non-Diorisis texts typed in by users.
  • Disambiguator for user-input parsed texts.
  • Possibility to search user-input parsed texts.
  • Ability to save lists of sentences.
  • Create, solve, and mark parsing exercises, with multilingual support (English/Italian).
Version 1.02 May 2020
  • Fixed bugs in morphology search.
  • Fixed bugs in XML to JSON converter.
  • Fixed bugs in morphological analysis visualizer.
Version 1.01 Apr 2020
  • Fixed bug: saving queries and corpora.
Version 1.0 Apr 2020
  • Possibility to upload searches to online archive (w/online API) added.
  • Bug fixes in error reporting system.
  • Possibility to save and reload queries.
  • Cancel buttons on all dialogs.
  • Support for BetaCode in lemma input.
  • Palette restyling.
  • Bug fixes in search engine for context words defined morphologically.
  • User review system added.
  • Outputs count of lemmas/forms occurring as the first element of a query (seed).
  • Restyling of result window, with collapsible sections.