reger
5aaa057c65
ignore empty input lines in FileUtils.getListArray() to poka-yoke the blacklist read.
equalizes behavior with getListString()
improves: case where the blacklist file contained an undesired empty line, not
fixed by blacklist-cleaner.
9 years ago
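The behavior described in the commit above can be sketched roughly as follows. This is only an illustration, not the actual YaCy code: the class `ListReaderSketch` and the method `readNonEmptyLines` are hypothetical names, and the real `getListArray()` may differ in signature and in how it treats whitespace-only lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ListReaderSketch {

    // Hypothetical helper: reads all lines from the source and skips empty ones,
    // so that a stray blank line never ends up as a list entry.
    public static List<String> readNonEmptyLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore empty input lines
                }
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up blacklist content with an undesired empty line in the middle.
        String content = "example.org/.*\n\nexample.net/ads/.*\n";
        System.out.println(readNonEmptyLines(new StringReader(content)));
    }
}
```

Filtering at read time keeps callers from having to special-case empty entries, which matters when an empty pattern would otherwise be interpreted as a blacklist rule.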
reger
4cc38e979d
add InputStream close after reading input file (Vocabulary_p servlet)
9 years ago
Burkhard
9a18e2297b
Merge pull request #51 from JeremyRand/multiple-boost-query
Fix multiple boost queries
9 years ago
reger
f0d7b93372
make use of and activate charset autodetection in Vocabulary input from file
+ revert mistake of empty cn.lng
9 years ago
JeremyRand
58824dfa6c
Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.
9 years ago
luc
571bc55937
Refactoring: use StandardCharsets constants instead of hard-coded charset names.
9 years ago
luc
5bbb2e1730
Ensure resource is closed when reading a full file InputStream
9 years ago
reger
7d0d19cb8e
avoid File.deleteOnExit() on temp files
The JVM registers each such file in a list, regardless of whether it has already
been deleted, and never cleans up the list during runtime.
This accumulates to a considerable amount of memory during large crawls and/or
long uptime.
To tackle this, all temp files are now created in a subdir of java.io.tmpdir
and the JVM tmpdir property is set to this subdir, which is deleted by
code on shutdown.
Additionally, let pdfParser use this tmp subdir too.
9 years ago
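The approach described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the YaCy implementation: the class `TempDirSketch`, its method names, and the `app-tmp-` and `parser-` prefixes are hypothetical. Because some JDK versions read `java.io.tmpdir` only once, the sketch also passes the directory explicitly when creating files instead of relying on the property alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class TempDirSketch {

    // Creates a private subdirectory below the current java.io.tmpdir.
    public static Path createPrivateTmpDir() {
        try {
            Path base = Paths.get(System.getProperty("java.io.tmpdir"));
            return Files.createTempDirectory(base, "app-tmp-"); // hypothetical prefix
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temp file inside the given directory without calling deleteOnExit().
    public static Path newTempFile(Path dir, String prefix, String suffix) {
        try {
            return Files.createTempFile(dir, prefix, suffix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes the directory and everything below it, children before parents.
    public static void deleteRecursively(Path dir) {
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = createPrivateTmpDir();
        System.setProperty("java.io.tmpdir", tmp.toString());
        // One shutdown hook removes the whole subdir instead of one registration per file.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> deleteRecursively(tmp)));
        Path file = newTempFile(tmp, "parser-", ".tmp");
        System.out.println("created " + file);
    }
}
```

The design trade-off is that cleanup bookkeeping stays constant in size: a single hook covers any number of temp files, whereas each `deleteOnExit()` call adds another entry that lives until the JVM exits.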
reger
02e4489a23
set tmpfile.deleteOnExit by default, to make sure files are removed on shutdown.
9 years ago
luccioman
2f0f0180e2
Added a function to list files recursively.
10 years ago
Michael Peter Christen
413eeefed4
added character set detection library from http://www-archive.mozilla.org/projects/intl/chardet.html
10 years ago
Michael Peter Christen
2beb6abeb6
disabled crazy sleep loop
10 years ago
sixcooler
72561926aa
do not overwrite yacy.conf in case of an exception
may be a fix for http://mantis.tokeek.de/view.php?id=180
10 years ago
Michael Peter Christen
5e31bad711
- the webgraph shall store all links which appear on a web page, not only
the unique links! This made it necessary to adapt a large portion of the
parser and link processing classes to carry a different type of link
collection, which carries the property attributes that are attached to
web anchors.
- introduction of a new URL class, AnchorURL
- the other URL classes, DigestURI and MultiProtocolURI, have been renamed
and refactored to fit into a new document package schema, document.id
- cleanup of net.yacy.cora.document package and refactoring
12 years ago
Roland Haeder
841a28ae76
Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage
Conflicts:
source/net/yacy/search/Switchboard.java
12 years ago
Michael Peter Christen
5878c1d599
- refactoring of log to ConcurrentLog:
jdk-based loggers tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a main performance issue. To overcome
this problem, this is an add-on to jdk logging that puts log entries on a
concurrent message queue and logs the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
12 years ago
Marc Nause
75f9568472
*) only install files from the RELEASE directory
*) minor changes
12 years ago
Marc Nause
3bc5ee6e3d
*) added protection against CSRF in update download page
(http://localhost:8090/ConfigUpdate_p.html?releaseinstall=../../test.txt&deleteRelease=Delete+Release
does not work anymore)
12 years ago
Michael Peter Christen
c5f67a5d6d
fixed a problem with local search from solr results: now all results from solr are shown (again)
12 years ago
Michael Peter Christen
f8f05ecba7
- added a delete button in host browser to delete a complete subpath
- removed storage of default collection name - default is now "user"
- made stacking of crawl start points concurrent
12 years ago
Michael Peter Christen
b400fc7b4d
fix for file parser problem
12 years ago
Michael Peter Christen
6017691522
added an exception catch
12 years ago
Michael Peter Christen
613cf7da7f
enhancement to post argument parsing - possible fix to zero-filled parameter values
12 years ago
Michael Peter Christen
a8167e6e5b
clean-up: removed unused methods in kelondro
13 years ago
orbiter
0cbda0b2b8
- replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically
- implemented some isEmpty() methods
13 years ago
Michael Peter Christen
ce8d4b87d9
fixes for new eclipse 'Juno' warning 'Resource leak'.
13 years ago
Michael Peter Christen
b9d42fd9c8
using com.google.common.io.Files instead of homebrew methods
13 years ago
Michael Peter Christen
3b992e6b00
using utf8 String compression in Webstructure database
13 years ago
Michael Peter Christen
c639248c23
protection against strange answers from remote peers during search
13 years ago
Marek Otahal
f75b5e40e0
little fix in copy()
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Michael Christen
e7e429705a
- less automatic indexing after a search (needs to reset the default crawl profiles)
- fix for concurrency problem in storage of serverSwitch Properties
- markup update
13 years ago
orbiter
775b44017e
refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8033 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
eb9c9edb01
enhanced table method (used by almost all yacy api interfaces)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8000 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
035ebfbf3b
- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
- this may have also (good) performance side effects on other parts of YaCy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
9170a434ed
throwing an exception again in FileUtils.copy(reader, writer)
OOMs could occur here and should not be ignored
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7858 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
fe0c08455b
more concurrency (enhancement) hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7759 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a36fda991e
hack to increase speed of url hash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7751 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
4bea3f9714
hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
used an ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion methods have less computation overhead than the UTF8 conversion.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
746e3c3b06
Replaced a widely-used Property Object in the httpd with HashMap<String, Object> which is not synchronized like Properties
A synchronization is not needed here and applies an overhead to the httpd process which is now removed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7745 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
10e2f588f8
- enhanced ybr ranking computation
- many speed/performance hacks
- added solr sharding and new sharding web interface
- added option to switch off the yacy index when using solr
- added new fail-url categories which are used to distinguish which fail-urls are to be sent to solr
- refactoring/renaming of some method names to distinguish host/url hashes better
- a large number of bug/npe fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1989ebc24b
removed more warnings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7598 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a07a1a8b1e
removed type cast warnings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7593 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
694fa3a2a5
- replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
- changed menu structure slightly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e1b6916423
always try to guess the size of a StringBuilder to prevent too many memory re-allocations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7572 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
3b40b98256
*) set SVN properties
*) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7567 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
cb1f49d0f2
replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
993b9bc1a8
memory/performance hacks, less synchronization, better concurrency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7544 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1110d16af9
performance hack: replaced generic row.getColBytes() call with row.getPrimaryKeyBytes() where the column is 0
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7529 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
804ae2275b
- do not delete idx and gap files if the heap is not modified
this change may have bugs in it which may cause damage to your existing data. please use with care.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7516 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
af87af0d4c
- removed synchronization in serverSwitch which should improve speed
- fixed wrong assert in network graph
- enhanced double check method in table class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7511 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago