yacy_search_server

Commit Graph

Author	SHA1	Message	Date
sixcooler	bfccb8db1c	Merge branch 'master' of https://github.com/yacy/yacy_search_server	10 years ago
reger	826f14f37f	fix unnecessary set null of peer flags, causing reread; remove obsolete version flags	10 years ago
sixcooler	cdbafe340e	Merge branch 'master' of https://github.com/yacy/yacy_search_server	10 years ago
reger	571609c208	upd javascript img viewer to highslide 4.1.13	10 years ago
reger	f0b5bc93a3	remove obsolete yacy.init entry "secureHttps" not used anywhere	10 years ago
reger	c4fa6d7bf5	upd to icu4j-56_1	10 years ago
reger	5445f38070	upd to jetty 9.2.13.v20150730	10 years ago
reger	6ca02ad577	upd httpclient-4.5.1, httpmime-4.5.1, httpcore-4.4.3, commons-compress-1.10	10 years ago
reger	c6495a5b62	add a log entry on parsing ajax crawling scheme snapshot (prev. commit `9252e36aeb`)	10 years ago
reger	9252e36aeb	implement ajax crawling scheme for ajax sites which adhere to the proposed use of hash-bangs to provide html content; see the freshly deprecated proposal at https://developers.google.com/webmasters/ajax-crawling/ for details. The implementation improves parsing of the homepage (ajax page) which uses the metatag "fragment" in its header, and parses the supplied html snapshot instead of the mostly empty ajax/scripted page. The implementation also supports hash-bang urls (url with anchor starting with ! like ...path#!hashfragment), but our crawler filters them (use of hash-bang is controversially discussed and the proposal is deprecated, so it makes no sense to adjust the crawler, but as long as it is used by some sites the minor change/improvement in htmlparser is good for some time). Quick summary of how it works: if a metatag fragment with content "!" is found, htmlparser tries to get the content of the html snapshot (using a different url); htmlparser returns 2 documents (original url and snapshot content, but using the same original url); after parsing, the result documents are joined and stored to the index, which then also contains content from the snapshot page (the original ajax page typically contains no parseable html content).	10 years ago
Michael Peter Christen	d1ae999ef9	replaced HashMap with LinkedHashMap to preserve the object order	10 years ago
Michael Peter Christen	7d075a1d76	added log lines	10 years ago
Michael Peter Christen	092dac086e	Merge branch 'master' of https://github.com/luccioman/yacy_search_server	10 years ago
Michael Peter Christen	a44cc774d0	Merge branch 'master' of github.com:yacy/yacy_search_server	10 years ago
sixcooler	41c9215174	Merge branch 'master' of https://github.com/yacy/yacy_search_server	10 years ago
reger	7a64bebb86	init Recrawl job chunk size to max crawl loader during job start, to use some system preferences and allow injection of recrawl urls before the queue is empty. During recrawl the balancer often hangs on the very last urls on hosts with huge delay time; by allowing injection earlier, progress is more balanced. Max number of crawl urls injected by the recrawl job is 2 * max loader.	10 years ago
sixcooler	e7dab60ebd	Merge branch 'master' of https://github.com/yacy/yacy_search_server	10 years ago
luc	d6522fa4a2	Integrated haraldk/TwelveMonkeys library to first add TIF image format support.	10 years ago
luc	e093fb228d	Created a generic ViewImage performance render test.	10 years ago
Michael Peter Christen	9244694e64	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	10 years ago
Michael Peter Christen	151ccd50a9	fix for image size field values (must be multi-valued)	10 years ago
luc	3ad564e2e4	Created a ViewImage rendering performance measurement test.	10 years ago
luc	62e07a26a0	Refactoring: split into sub-functions to make understanding and performance measurement easier.	10 years ago
luc	b3f044072e	Updated table headers and SVG file url for case sensitive OS.	10 years ago
luc	ff963cbe23	Merge branch 'master' of https://github.com/yacy/yacy_search_server	10 years ago
reger	c9937973e3	unescape MultiProtocolURL getAttributes() return values. Use getAttributes() to get query parameters as clear text (w/o url encoding); use getSearchpartMap() to get them in internal format (url encoded). Fix for http://mantis.tokeek.de/view.php?id=606	10 years ago
sixcooler	6695e5cdd3	Merge branch 'master' of https://github.com/yacy/yacy_search_server	10 years ago
reger	10b0eb106f	fix link target on iframe list in CrawlProfileEditor	10 years ago
reger	78e8c6f3e5	refactor special handling (static override) of SUPPORTED_EXTENSIONS/MIME_TYPES not used for genericImageParser	10 years ago
reger	d54c5d310a	do not automatically add links with image extension to image links. With the widespread use of e.g. Wikimedia, links with an image file extension often point to html.	10 years ago
luc	f5746b5490	Added ico and bmp sample pictures	10 years ago
luc	baede48161	Added JPEG 2000 and FITS samples	10 years ago
luc	7c9d80c5d0	Added image formats and information for each format.	10 years ago
sixcooler	0431be8d6c	Merge branch 'master' of https://github.com/yacy/yacy_search_server	10 years ago
luc	073ef730af	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	10 years ago
reger	5744342fec	handle image preview for url w empty file extension (fix of commit `688f7b2a5c`)	10 years ago
luc	82dd004260	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	10 years ago
reger	851e8f6c8a	check jpeg file signature in genericImageParser to fail early without further object allocation if source is not a jpeg.	10 years ago
reger	fb75fea446	use recrawljob w/o sort results by date. This is a workaround for existing index (not fully reindexed) since intro of schema with docvalues, to prevent a solr exception causing recrawljob to fail with: org.apache.solr.core.SolrCore java.lang.IllegalStateException: unexpected docvalues type NONE for field 'load_date_dt' (expected=NUMERIC). Use UninvertingReader or index with docvalues.	10 years ago
Michael Peter Christen	3cbf86f295	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	10 years ago
Michael Peter Christen	23f6294a2d	removed unused import	10 years ago
reger	43c27aa550	upd to solr/lucene 5.3.1	10 years ago
reger	fd5a1dc297	upd to poi-3.13	10 years ago
sixcooler	839d710105	Merge branch 'master' of https://github.com/yacy/yacy_search_server	10 years ago
luc	0ae9297ca5	Created an html test page to check ViewImage rendering with different file formats.	10 years ago
luc	136e8f6fbd	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	10 years ago
reger	688f7b2a5c	allow/display svg images in image results previews. svg is not supported by awt but by most browsers. Image content is delivered as received (without size adjustment)	10 years ago
reger	d5330391de	remove some unused var allocation in parser	10 years ago
Michael Peter Christen	3d7dd9d3aa	follow-up to latest commit: also flush the search cache if all crawls had been terminated.	10 years ago
Michael Peter Christen	225200194a	every time a crawl is started, the user expects a different search result behaviour. This requires that the search cache is flushed for each crawl start. TODO: this should also be done if a crawl is terminated.	10 years ago

12005 Commits (bfccb8db1cf3695748fb2fa0db3b321ca94c0b42)