sixcooler
41c9215174
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
7a64bebb86
init Recrawl job chunk size to max crawl loader during job start, to use some system preferences
and allow injection of recrawl urls before queue is empty
During recrawl the balancer often hangs on the very last urls, typically on hosts with a huge delay time;
allowing injection earlier keeps progress more balanced. The max number of crawl urls injected by the recrawl job is 2 * max loader.
9 years ago
sixcooler
e7dab60ebd
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
Michael Peter Christen
9244694e64
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
9 years ago
Michael Peter Christen
151ccd50a9
fix for image size field values (must be multi-valued)
9 years ago
reger
c9937973e3
unescape MultiProtocolURL getAttributes() return values.
use getAttributes() to get query parameters as clear text (w/o url encoding)
use getSearchpartMap() to get in internal format (url encoded)
fix for http://mantis.tokeek.de/view.php?id=606
9 years ago
sixcooler
6695e5cdd3
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
10b0eb106f
fix link target on iframe list in CrawlProfileEditor
9 years ago
reger
78e8c6f3e5
refactor special handling (static override) of SUPPORTED_EXTENSIONS/MIME_TYPES
not used for genericImageParser
9 years ago
reger
d54c5d310a
do not automatically add links with an image extension to image links.
With the widespread use of e.g. Wikimedia, links whose url has an image file extension often point to html.
9 years ago
sixcooler
0431be8d6c
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
5744342fec
handle image preview for url w empty file extension
fix of commit 688f7b2a5c
9 years ago
reger
851e8f6c8a
check jpeg file signature in genericImageParser
to fail early without further object allocation if source is not a jpeg.
9 years ago
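As a rough illustration of the early signature check described in 851e8f6c8a, the sketch below tests whether a buffer starts with the JPEG start-of-image marker (bytes 0xFF 0xD8) before any further work is done. The class and method names are hypothetical and are not taken from YaCy's genericImageParser.

```java
class JpegSignatureSketch {

    /**
     * Returns true if the given bytes start with the JPEG SOI marker 0xFF 0xD8.
     * Hypothetical helper for illustration only, not YaCy's actual code.
     */
    static boolean startsWithJpegSignature(byte[] head) {
        if (head == null || head.length < 2) {
            return false; // too short to carry a signature
        }
        return (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xD8;
    }

    public static void main(String[] args) {
        byte[] jpegHead = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] pngHead = {(byte) 0x89, 'P', 'N', 'G'};
        System.out.println(startsWithJpegSignature(jpegHead));
        System.out.println(startsWithJpegSignature(pngHead));
    }
}
```

A parser could call such a check on the first two bytes and return early when it fails, so that no image objects are allocated for non-JPEG input.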
reger
fb75fea446
use recrawljob w/o sort results by date
This is a workaround for existing index (not fully reindexed) since intro of schema with docvalues
to prevent solr exception causing recrawljob to fail with
org.apache.solr.core.SolrCore java.lang.IllegalStateException: unexpected docvalues type NONE for field 'load_date_dt' (expected=NUMERIC). Use UninvertingReader or index with docvalues.
9 years ago
reger
43c27aa550
upd to solr/lucene 5.3.1
9 years ago
reger
fd5a1dc297
upd to poi-3.13
9 years ago
sixcooler
839d710105
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
688f7b2a5c
allow/display svg images in image results previews
svg is not supported by awt but is supported by most browsers. Image content is delivered as received (without size adjustment)
9 years ago
reger
d5330391de
remove some unused var allocation in parser
9 years ago
Michael Peter Christen
3d7dd9d3aa
follow-up to latest commit: also flush the search cache if all crawls
had been terminated.
9 years ago
Michael Peter Christen
225200194a
every time a crawl is started, the user expects a different search
result behaviour. This requires that the search cache is flushed for
each crawl start. TODO: this should also be done if a crawl is
terminated.
9 years ago
Michael Peter Christen
c737ff235d
in case that the include_string contains several entries including
1-char tokens and also more-than-1-char tokens, then remove the 1-char
tokens to prevent that we are too strict. This will make it possible to
be a bit more fuzzy in the search where it is appropriate.
9 years ago
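The rule described in c737ff235d can be sketched as follows: if the token list mixes 1-char tokens with longer ones, the 1-char tokens are dropped; if it holds only 1-char tokens, it is left unchanged. This is a minimal sketch under that reading of the commit message; the class and method names are hypothetical and do not come from the YaCy sources.

```java
import java.util.ArrayList;
import java.util.List;

class IncludeTokenSketch {

    /**
     * Drops 1-char tokens only when at least one longer token is present.
     * Hypothetical helper for illustration only.
     */
    static List<String> dropSingleCharTokens(List<String> tokens) {
        boolean hasLongToken = false;
        for (String t : tokens) {
            if (t.length() > 1) {
                hasLongToken = true;
                break;
            }
        }
        if (!hasLongToken) {
            return new ArrayList<>(tokens); // only 1-char tokens (or none): keep all
        }
        List<String> filtered = new ArrayList<>();
        for (String t : tokens) {
            if (t.length() > 1) {
                filtered.add(t);
            }
        }
        return filtered;
    }
}
```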
Michael Peter Christen
8e555d79a3
add also 1-character tokens to the token list because that could be also
searched for. A full-string search for a filename may fail if those
1-char tokens are omitted
9 years ago
sixcooler
1091e25f4c
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
7c82cd4415
add an end condition to svgParser for wrong content
(if parser chosen just by file extension)
9 years ago
sixcooler
9c2cd7e87b
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
b92d81b073
remove double caching of inputstream in ViewImage
9 years ago
reger
c7c5e2dff9
fix old/obsolete solr dependency to stax
delete obsolete jar
9 years ago
reger
beed1c417e
Add report profile with OWASP Dependency-Check to maven pom
9 years ago
reger
356d4d1301
remove rdfParser from init (current function identical with genericParser)
9 years ago
reger
c647d899e3
add svgParser to parse metadata from svg images
Reads document level included title and description and skips the graphic content to save bandwidth.
svg metadata element is not interpreted
- remove rdfParser from init (current function identical with genericParser)
9 years ago
reger
bad34804fe
optimize parseInt for <img> tag attribute parsing
Performs better than using NumberFormat.parse or parseInt(substring())
9 years ago
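One way to avoid `substring()` allocations when reading numeric `<img>` attributes such as `width="100px"` is to accumulate the leading digits directly. The sketch below shows that general technique only; it is not the code from bad34804fe, and the class name, method name and fallback parameter are assumptions made for illustration.

```java
class ImgAttrIntSketch {

    /**
     * Parses the run of ASCII digits that follows any leading spaces in s.
     * Returns fallback if there is no digit or if the run exceeds 9 digits
     * (a conservative guard against int overflow).
     * Hypothetical helper for illustration only.
     */
    static int parseLeadingDigits(CharSequence s, int fallback) {
        final int len = s.length();
        int i = 0;
        while (i < len && s.charAt(i) == ' ') {
            i++; // skip leading spaces
        }
        int value = 0;
        int digits = 0;
        while (i < len) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                break; // stop at the first non-digit, e.g. "px" or "%"
            }
            if (++digits > 9) {
                return fallback;
            }
            value = value * 10 + (c - '0');
            i++;
        }
        return digits == 0 ? fallback : value;
    }
}
```

With this sketch, `parseLeadingDigits("100px", -1)` yields 100 and `parseLeadingDigits("auto", -1)` yields the fallback, without creating intermediate strings.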
sixcooler
68c6d6ca7a
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
Michael Peter Christen
3c31bf845f
fix for latest merge
9 years ago
Michael Peter Christen
6ebc2451a9
Merge pull request #14 from luccioman/master
Translator refactoring : no more regular expression processing
9 years ago
reger
2f51baff4f
check for loading error (includes unsupported formats)
to prevent blank thumbnail display in image search because of unhandled sources which don't load on click.
Now the cross icon indicates the problem (including unsupported formats)
9 years ago
luc
5578886f6f
Merge branch 'master' of https://github.com/luccioman/yacy_search_server.git
9 years ago
luc
c38d6c1f37
Correction for mantis 535: inurl: parameter doesn't work on URLs with
upper-case letters
9 years ago
reger
52e3eb4ce8
harmonize/correct assignment to Ymarkmeta.mime
replace use of deprecated
9 years ago
Michael Peter Christen
87f358058e
Fix for index entries which have ids not computed as hash from the url.
This makes it possible to operate with outside-computed url hashes in
enterprise environments not using the built-in crawler from YaCy.
9 years ago
reger
2951c9fc40
remove unused check for known file extension in searchtrailer
(check is done on add to filetype-nav)
9 years ago
reger
3f2b8ab5e5
optionally include mime in p2p url exchange string
if doctype decodes to ambiguous mime and default conversion is not equal to original
9 years ago
sixcooler
de01b25805
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
a3195d78ae
add Portuguese month names to date recognition
9 years ago
reger
d2cc11ea8f
fix html parser taking <style> content as text.
Noticed that some result descriptions contain css content from the style tag.
Added <style> to tag list so that its content is not scraped as text
+ test case included
9 years ago
sixcooler
9ace7876ef
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
Michael Peter Christen
5f706797cb
patch for a bug inside of solr since solr 5.0 when using a boost
function with a numeric date field:
"unexpected docvalues type NUMERIC for field 'last_modified' (expected
one of [SORTED, SORTED_SET]). Use UninvertingReader or index with
docvalues."
This is a well-known bug inside solr which currently prevents the 'sort
by date' in the YaCy search interface from being used. Without this patch no
results at all are displayed (since the exception prevents that). Now
there is at least a result but it is not ordered properly.
9 years ago
sixcooler
c9da652249
Merge branch 'master' of https://github.com/yacy/yacy_search_server
9 years ago
reger
733d725dec
limit css scrolling to result/content window x
from pull request #10
9 years ago
Burkhard
4c38083a11
Merge pull request #10 from Raegdan/raegdan-css-layout-fix
Fixed CSS scrolling
9 years ago