yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	938d8a9731	Added some JavaDoc	7 years ago
luccioman	f369679d1c	Fixed read/copy on input streams reading sometimes less than expected.	8 years ago
luccioman	bf55f1d6e5	Started support of partial parsing on large streamed resources. Thus enable getpageinfo_p API to return something in a reasonable amount of time on resources over MegaBytes size range. Support added first with the generic XML parser, for other formats regular crawler limits apply as usual.	8 years ago
luccioman	a04feac064	Ensure file input streams proper closing in both success and failures Also add when possible a warning level log message on input stream closing error instead of failing silently. This could help understanding some IO exceptions such as "too many files open".	8 years ago
luccioman	a0dfbaca6a	FileUtils : added some JavaDocs and unit test cases	8 years ago
luccioman	f0639d810c	Customized name for Threads still using the default "Thread-n" pattern. This makes threads monitoring easier to read.	8 years ago
luccioman	ee92082a3b	Updated javadocs : warning about closing stream responsibility.	8 years ago
reger	5aaa057c65	ignore empty input lines in FileUtils.getListArray() to poka joke blacklist read. equalizes behavior with getListString() improves: case were blacklist file contained a undesired empty line, not fixed by blacklist-cleaner.	9 years ago
reger	4cc38e979d	add InputStream close after reading input file (Vocabulary_p servlet)	9 years ago
Burkhard	9a18e2297b	Merge pull request #51 from JeremyRand/multiple-boost-query Fix multiple boost queries	9 years ago
reger	f0d7b93372	make use and activate autodetect charset in Vocabulary input from file + revert mistake of empty cn.lng	9 years ago
JeremyRand	58824dfa6c	Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.	9 years ago
luc	571bc55937	Refactoring : use StandardCharsets constants instead of hard-coded charset names.	9 years ago
luc	5bbb2e1730	Ensure resource is closed when reading a full file InputStream	9 years ago
reger	7d0d19cb8e	avoid File.deleteOnExit() on temp files JVM registers each file in a list regardless of already deleted and never cleans up the list during runtime. This accumulates to a considerable amount of mem during large crawls and/or long uptime. To tackle this, all temp files are now created in a subdir of java.io.tmpdir and the jvm tmpdir property is set to this subdir, which is deleted by code on shutdown. Additionally let pdfParser use this tmp subdir too.	9 years ago
reger	02e4489a23	set tmpfile.deleteOnExit by default, to make sure files are removed on shutdown.	9 years ago
luccioman	2f0f0180e2	Added a function to list files recursively.	10 years ago
Michael Peter Christen	413eeefed4	added character set detection library from http://www-archive.mozilla.org/projects/intl/chardet.html	10 years ago
Michael Peter Christen	2beb6abeb6	disabled crazy sleep loop	10 years ago
sixcooler	72561926aa	do not overwrite yacy.conf in case of an exception may be a fix for http://mantis.tokeek.de/view.php?id=180	10 years ago
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary, that a large portion of the parser and link processing classes must be adopted to carry a different type of link collection which carry a property attribute which are attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI had been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	12 years ago
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	12 years ago
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based logger tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is a add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	12 years ago
Marc Nause	75f9568472	) only install files from the RELEASE directory ) minor changes	12 years ago
Marc Nause	3bc5ee6e3d	*) added protection against CSRF in update download page (http://localhost:8090/ConfigUpdate_p.html?releaseinstall=../../test.txt&deleteRelease=Delete+Release does not work anymore)	12 years ago
Michael Peter Christen	c5f67a5d6d	fixed a problem with local search from solr results: now all results from solr are shown (again)	12 years ago
Michael Peter Christen	f8f05ecba7	- added a delete button in host browser to delete a complete subpath - removed storage of default collection name - default is now "user" - made stacking of crawl start points concurrently	12 years ago
Michael Peter Christen	b400fc7b4d	fix for file parser problem	12 years ago
Michael Peter Christen	6017691522	added an exception catch	12 years ago
Michael Peter Christen	613cf7da7f	enhancement to post argument parsing - possible fix to zero-filled parameter values	12 years ago
Michael Peter Christen	a8167e6e5b	clean-up: removed unused methods in kelondro	13 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
Michael Peter Christen	ce8d4b87d9	fixes for new eclipse 'Juno' warning 'Resource leak'.	13 years ago
Michael Peter Christen	b9d42fd9c8	using com.google.common.io.Files instead of homebrew methods	13 years ago
Michael Peter Christen	3b992e6b00	using utf8 String compression in Webstructure database	13 years ago
Michael Peter Christen	c639248c23	protection against strange answers from remote peers during search	13 years ago
Marek Otahal	f75b5e40e0	little fix in copy() Signed-off-by: Marek Otahal <markotahal@gmail.com>	13 years ago
Michael Christen	e7e429705a	- less automatic indexing after a search (needs to reset the default crawl profiles) - fix for concurrency problem in storage of serverSwitch Properties - markup update	13 years ago
orbiter	775b44017e	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8033 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	eb9c9edb01	enhanced table method (used by almost all yacy api interfaces) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8000 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	035ebfbf3b	- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill) - this may have also (good) performance side effects on other parts of YaCy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
sixcooler	9170a434ed	throwing an exception again in FileUtils.copy(reader, writer) OOMs could occour here and should not be ignored git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7858 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	fe0c08455b	more concurrency (enhancement) hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7759 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	a36fda991e	hack to increase speed of url hash computation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7751 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4bea3f9714	hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: used a ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes). The new ASCII String <-> byte[] conversion method have less computation overhead than the UTF8 conversion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	746e3c3b06	Replaced a widely-used Property Object in the httpd with HashMap<String, Object> which is not synchronized like Properties A synchronization is not needed here and applies an overhead to the httpd process which is now removed. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7745 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	10e2f588f8	- enhanced ybr ranking computation - many speed/performance hacks - added solr charding and new charding web interface - added option to switch off the yacy index when using solr - added new fail-url categories which are used to make a distinction which fail-urls to be sent to solr - refactoring/renaming of some method names to distinguish host/url hashes better - a large number of bug/npe fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	1989ebc24b	removed more warnings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7598 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	a07a1a8b1e	removed type cast warnings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7593 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	694fa3a2a5	- replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion - changed menu structure slightly git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago

1 2

74 Commits (607b39b427f76ec139df5d9d5479cf09a0d6fe4a)