Michael Peter Christen
f8cd57c92f
new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can now retrieve ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
13 years ago
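The commit message above says the search process must filter results by document type, since non-parseable links are now indexed as well. A minimal sketch of that idea follows, assuming the type can be guessed from the URL's file extension. The class `DocTypeFilterSketch`, the enum `DocType`, and the extension table are hypothetical names chosen for illustration; they are not taken from this repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DocTypeFilterSketch {

    /** Hypothetical document types, mirroring the four named in the commit message. */
    enum DocType { TEXT, IMAGE, VIDEO, APP }

    // Illustrative extension table (an assumption); a real index would need many more entries.
    private static final Map<String, DocType> BY_EXTENSION = Map.of(
        "png", DocType.IMAGE,
        "jpg", DocType.IMAGE,
        "gif", DocType.IMAGE,
        "mp4", DocType.VIDEO,
        "avi", DocType.VIDEO,
        "exe", DocType.APP,
        "jar", DocType.APP);

    /** Guesses the document type from the file extension; unknown or missing extensions count as TEXT. */
    static DocType classify(String url) {
        String path = url;
        int q = path.indexOf('?');
        if (q >= 0) path = path.substring(0, q);   // drop the query string
        int h = path.indexOf('#');
        if (h >= 0) path = path.substring(0, h);   // drop the fragment
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot < slash) return DocType.TEXT; // no extension in the last path segment
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, DocType.TEXT);
    }

    /** Keeps only the URLs whose guessed type matches the type the query asked for. */
    static List<String> filter(List<String> urls, DocType wanted) {
        List<String> out = new ArrayList<>();
        for (String u : urls) {
            if (classify(u) == wanted) out.add(u);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> hits = List.of(
            "http://example.org/index.html",
            "http://example.org/logo.PNG",
            "http://example.org/clip.mp4?t=10",
            "http://example.org/tool.jar");
        System.out.println(filter(hits, DocType.TEXT));
        System.out.println(filter(hits, DocType.IMAGE));
    }
}
```

With such a filter, a text search would drop the image, video and application links that the new strategy places into the index, while an image search would keep only the image links.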
..
content
added changes from copperdust (submitted by email):
13 years ago
geolocalization
added autotagging stub .. only reading and parsing of vocabularies at
13 years ago
importer
!Important: move from Hashtable to HashMap
13 years ago
language
added changes from copperdust (submitted by email):
13 years ago
parser
fix for parser problem if an <a>-tag is 'within' html tags with unclosed
13 years ago
AbstractParser.java
added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is the default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a Solr index, if this is also enabled.
14 years ago
Autotagging.java
fix for single-word vocabulary lines
13 years ago
Condenser.java
new indexing strategy: ALL links that appear anywhere are indexed, not
13 years ago
Document.java
new indexing strategy: ALL links that appear anywhere are indexed, not
13 years ago
ImageParser.java
- enhanced description on search front page
13 years ago
LargeNumberCache.java
more performance hacks
15 years ago
LibraryProvider.java
added autotagging to document condenser:
13 years ago
Parser.java
*) added SID file (Commodore 64) sound file parser
14 years ago
Phrase.java
more performance hacks
15 years ago
SentenceReader.java
Initial performance improvements
13 years ago
SnippetExtractor.java
performance hack
13 years ago
StringBuilderComparator.java
replaced String with StringBuilder in suggestion process
13 years ago
TextParser.java
refactoring: moved document Classification to cora package
13 years ago
WordCache.java
vocabularies are now also used as source for a did-you-mean computation
13 years ago
WordTokenizer.java
performance hack
13 years ago