orbiter
cb1f49d0f2
replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
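A minimal sketch of the idea in this commit, assuming a small helper class: the Charset is resolved once into a constant, so decoding no longer goes through a name-based lookup of 'UTF-8' on every call. The class and method names below are hypothetical and are not taken from the YaCy sources.

```java
import java.nio.charset.Charset;

// Hypothetical helper; names are illustrative, not YaCy's actual API.
final class Utf8Strings {
    // Resolved once at class load; later calls skip the name-based Charset lookup.
    static final Charset UTF8 = Charset.forName("UTF-8");

    private Utf8Strings() {}

    // Decodes bytes as UTF-8 using the pre-resolved Charset constant.
    static String string(byte[] bytes) {
        return new String(bytes, UTF8);
    }

    // Encodes a String as UTF-8 bytes using the same constant.
    static byte[] bytes(String s) {
        return s.getBytes(UTF8);
    }
}
```

Callers would then write Utf8Strings.string(b) where they previously wrote new String(b, "UTF-8") or new String(b).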
low012
9d366ee9d7
*) removed unused code (I assume that most of the code was really dead, but if you need any of the classes, tell me and I will put it back in.)
...
*) minor code cleanup in ViewLog
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7557 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
7138f4036b
less synchronization, better thread dump tool
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7556 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
8d14916c74
more patches for a better out-of-memory management
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7555 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
c2c5b12882
- even less memory for circle tool
...
- background thread for bookmark initialization: this uses a DNS lookup which may cause long waiting times during startup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7554 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6badc5e558
reduce size of static memory usage: use short instead of int in circle coordinates cache
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7553 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
ce0c8247fc
removed (most probably!?!) superfluous System.err output
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7552 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
799c534935
one more patch against OOM during secondary remote search
...
see also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=3202
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7551 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f8d0454c53
small bug fixes and experiments with search speed enhancement
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7549 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
993b9bc1a8
memory/performance hacks, less synchronization, better concurrency
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7544 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
65bcc60808
stupid me: revert placement of closing connection which caused unclosed connections
...
+ reuse sockets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7543 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
e3d75d6cd5
No longer storing the external header in a Header array, which saves a loop for its conversion.
...
Ensure the connection is closed if an OOM is thrown.
Ensure the resolved host is set on the request.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7542 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
42d90664f3
- fixed a memory leak in the httpc.post method (no finish)
...
- patched some more memory-saving relevant code
- some more minor bug fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7541 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
38dce547c0
better concurrency (less locking on date formatting) more logging and minor bug fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7540 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
89d337841c
more logging for OOMs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7534 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b1781d7aae
some more performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7533 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b2f147d28e
performance hack: excluded map encoding in many cases from synchronization block, especially when doing an iteration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7532 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
5e186e0122
continuing the fight against deadlocks during time formatting: better caching.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7531 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
dec24244cf
added convenience class to generate UTF StringBody objects with a default UTF8 charset.
...
Reason: if this is not used in StringBody-Class initialization, a default charset name is parsed.
This is a synchronized process and all classes using default charsets synchronize at that point
Synchronization is omitted if this class is used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7530 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1110d16af9
performance hack: replaced generic row.getColBytes() call with row.getPrimaryKeyBytes() where the column is 0
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7529 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
19b2a50578
- enhanced date formatter cache
...
- added more instances of formatter objects to different classes to make them independent of locking that may apply during synchronization of the date formatter object (date formatting is not thread-safe and must therefore be synchronized)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7528 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
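The commit above gives different classes their own formatter instances so they do not contend for one lock. A related way to get the same independence is one formatter per thread; the sketch below shows that variant, not the commit's actual code. The class name and the date pattern are assumptions chosen for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: one SimpleDateFormat per thread, so no shared lock is needed.
final class PerThreadDateFormat {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                // The pattern is an assumption; SimpleDateFormat itself is not thread-safe.
                SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
                f.setTimeZone(TimeZone.getTimeZone("UTC"));
                return f;
            }
        };

    private PerThreadDateFormat() {}

    // Formats with the calling thread's private formatter instance.
    static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```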
orbiter
48a61c39a3
speed hacks in BLOB ArrayStack:
...
- more concurrency if possible
- less threads if no concurrency necessary
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7527 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a92d80a545
performance enhancements using an alternative to an insensitive collator (a complex string compare):
...
- less synchronizations
- better speed
... in the most important and commonly used classes: http headers, url parsing and html parsing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7526 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
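The commit does not say which comparator replaced the collator. As one possible illustration, assuming plain case-insensitive ordering is sufficient for header names, the JDK's String.CASE_INSENSITIVE_ORDER compares character by character without the synchronization of a shared Collator. The class name below is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a header map keyed case-insensitively without a Collator.
final class HeaderMapSketch {
    private HeaderMapSketch() {}

    // String.CASE_INSENSITIVE_ORDER is a stateless JDK comparator.
    static Map<String, String> newHeaderMap() {
        return new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
    }
}
```

With such a map, "Content-Type" and "content-type" address the same entry.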
sixcooler
bcea497644
next try to fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=3193&start=0&sid=b98aa9a7466397602b436eb45f4a9d39
...
tested proxy, crawl, updatedownload - please do further testing!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7524 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
ad7fcb9d61
Enhanced Base64Order transformation: less overhead (transformation between StringBuilder and byte[])
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7523 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f95e50ec3d
more explanation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7522 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bb36bf841a
emergency commit (sorry sixcooler for not waiting) because without it, automatically updating peers would not be able to do the next update.
...
Please see http://forum.yacy-websuche.de/viewtopic.php?p=22059#p22059
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7521 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
8ad4e10491
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=3193&start=0&sid=b98aa9a7466397602b436eb45f4a9d39
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7520 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0ce17d823a
- fixed bug in ordering
...
- fixed ConcurrentModificationException in set join
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7519 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
dec4f36700
- fix for missing favicons in search widgets
...
- fix for bad digest/hash computation in case of interrupts to class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7518 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
804ae2275b
- do not delete idx and gap files if the heap is not modified
...
this change may have bugs in it which may cause damage to your existing data. please use with care.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7516 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e3ef4e3021
- increased default peer ping time from 2 minutes to 1 minute
...
- filtering out too old peers when reading seed lists (limit is now 240 minutes)
- added concurrent host name resolving in front of the http client because the http client uses the java built-in DNS resolver, which is not thread-safe (I have seen deadlocks in thread dumps showing that this bug in the jdk is still there)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7515 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
5e45ded8e2
- removed locks from WordReference
...
- refactoring of HeapReader/Writer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7514 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
cd19d0517e
added dns resolve to HTTPClient POST using a dns cache to prevent the not-thread-safe built-in dns cache inside apache http client from being used
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7513 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
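A rough sketch of what resolving through an own DNS cache before handing a request to the HTTP client could look like. The class and method names are hypothetical, the cache has no expiry, and this is not the YaCy implementation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a concurrent host-name cache placed in front of an HTTP client.
final class DnsCacheSketch {
    private static final ConcurrentMap<String, InetAddress> CACHE =
        new ConcurrentHashMap<String, InetAddress>();

    private DnsCacheSketch() {}

    // Returns the cached address, resolving at most once per host in the common case.
    // Returns null if the host cannot be resolved.
    static InetAddress resolve(String host) {
        String key = host.toLowerCase(Locale.ROOT);
        InetAddress cached = CACHE.get(key);
        if (cached != null) return cached;
        try {
            InetAddress resolved = InetAddress.getByName(host);
            // If another thread resolved the same host first, keep its result.
            InetAddress previous = CACHE.putIfAbsent(key, resolved);
            return previous != null ? previous : resolved;
        } catch (UnknownHostException e) {
            return null;
        }
    }
}
```

The resolved address would then be passed to the request instead of the host name, so the client library never performs its own lookup.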
orbiter
af87af0d4c
- removed synchronization in serverSwitch which should improve speed
...
- fixed wrong assert in network graph
- enhanced double check method in table class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7511 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
57e6728cb7
- removed usage of /etc/alternatives/www-browser because of problems with lynx, see:
...
http://forum.yacy-websuche.de/viewtopic.php?p=21959#p21959
please look if the browser that is linked with /etc/alternatives/www-browser can be detected and insert call again if
it can be made sure that this does not call lynx
- replaced severe warnings with just warnings in yacyClient
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7506 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
91eeaf2cff
fix in ftp client
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7505 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e717bf74ba
more logging, more care about OOMs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7503 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d84b4a072e
healing for some OOM problems
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7502 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
4aa406fb0f
added log output to find bug in url parser for short hosts
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7501 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
82f262f685
- enhanced circle drawing speed
...
- beautified 'moving dot' feature (using smaller and correctly positioned dots)
- added moving dots to DHT transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7500 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
29dc416ac6
more animations in graphics. See network and access picture.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7498 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
93b9c4fbc9
added missing file for latest commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7497 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3e380c51b6
update to browser start with linux
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7486 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6083f2f171
fix for (false) oom
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7484 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b35fda43ea
more changes to headless mode; now non-headless mode is used when:
...
- YaCy runs on Windows
- YaCy is started with the -gui option
in all other cases YaCy runs in headless mode
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7481 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6c52e31993
new methods to open a browser
...
- if YaCy is started with the option -gui, it is not in headless mode. Then the java 1.6 browse method is used if all other methods fail
- in linux, the path /etc/alternatives/www-browser is used if no firefox is installed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7480 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
5892fff51f
introduction of dht-burst modes: this can expand the number of target peers in some cases where a better heuristic is needed. The problematic cases are either when a multi-word search is made (still a hard case for our term-oriented DHT) or when a network operator wants all robinson peers to be asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the maximum number of httpc connections was also increased.
...
Please see the new comments in yacy.network.freeworld.unit for details of the new DHT selection methods.
The number of maximum peers is now not fixed to a specific number but may increase with
- the partition exponent
- the number of redundant peers
- the robinson burst percentage
- the multiword burst percentage
The maximum can then be the number of senior peers (all visible peers).
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7479 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
4588b5a291
- fixed document number limitation for crawls that restrict the number of documents per domain
...
- some restructuring of the document counting and logging structures was necessary
- better abstraction of CrawlProfiles
- added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation
- more refactoring to make the LibraryProvider cleaner
- some refactoring of the Condenser class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
64f32e8f00
*) replaced all IPs in IP filters for proxy with the proper regular expression
...
*) some cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7477 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
3e8b72be50
update to httpclient-4.1 - sorry forgot some
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7474 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
74b22dfa24
*) fixed bug which affected blacklist entries which consisted of domain _and_ path parts
...
*) minor cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7471 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
fe93caac5a
added flags and administration options to show advanced search and to show search result attributes (for each search result)
...
Administration can be done at ConfigPortal.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7466 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
431f780f41
patch for bad data in url metadata
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7464 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
5905f912c5
replaced more double types with float
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7462 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0cdfb82963
replaced more appearance of double values by float values
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7461 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
eb12e15738
moved all Double values to Float values because of
...
http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/
YaCy does not really need double-precision floating point computation anywhere, so this should not affect any feature
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7460 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
982aa689ef
* fix StringIndexOutOfBoundException in WebStructureGraph
...
* add better escaping to saveMap and loadMap
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7458 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
88773e4daa
changed the default port from 8080 to 8090
...
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
991b92f4ae
enhanced network graphic
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7446 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
a321c7673d
* adminAccountForLocalhost only for localhost
...
* yacy crawls local domains also, if no password is set (the interface is already protected)
* it's no longer required to set a password in intranet mode
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7436 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
hermens
930cb412dd
Let SHORT_MILSEC_FORMATTER make a new formatted String every millisecond
...
see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=3103
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7434 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
090c73e32e
catch an OOM in HeapReader iteration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7433 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
48463c4507
*) General private License? ;-)
...
*) minor code changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7432 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6c1b14c8e1
- more control in access tracker: count number of returned search results (not only info how much is in the index)
...
- extended query params for this
- enhanced cora
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7430 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
9f38c0023d
*) Minor changes, mainly cleaning up a little bit, no functional changes.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7428 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
54e77e6255
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7426 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
feefe17568
npe assert fix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7424 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
733903f2c9
fix for http://forum.yacy-websuche.de/viewtopic.php?p=21489#p21489
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7422 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
24e4126eee
added JSON parser code from json.org (added generics to it)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7421 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
10ae8d961b
- cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
...
- cleaned up (removed special code and documentation for 27c3)
- added remote search functions to be used within cora
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
lotus
0e54233408
UPnP: map port again if we are not reachable (e.g. when router rebooted)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7419 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
lotus
b1484299b2
same units for memory observer configuration (MiB)
...
old setting for DHT (RAM) will be lost after update
can be set on /Performance_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7418 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
387db84087
maybe found bug in non-working index dumper
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7414 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a4c9d27287
- moved some variables from Switchboard to new class AccessTracker
...
- added a limitation in access tracking to delete queries which are older than 10 minutes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7410 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
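The age limit described above can be sketched as a time-ordered queue that is pruned from the front. The class below is a hypothetical illustration and not the AccessTracker code; it assumes entries are added in non-decreasing time order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a query log that forgets entries older than 10 minutes.
final class QueryLogSketch {
    static final long MAX_AGE_MS = 10L * 60L * 1000L;

    private static final class Entry {
        final String query;
        final long timeMs;
        Entry(String query, long timeMs) { this.query = query; this.timeMs = timeMs; }
    }

    private final Deque<Entry> entries = new ArrayDeque<Entry>();

    // Records a query and drops everything that has become too old.
    synchronized void add(String query, long nowMs) {
        entries.addLast(new Entry(query, nowMs));
        prune(nowMs);
    }

    // Number of tracked queries that are at most MAX_AGE_MS old at nowMs.
    synchronized int size(long nowMs) {
        prune(nowMs);
        return entries.size();
    }

    // Oldest entries sit at the front, so pruning stops at the first young one.
    private void prune(long nowMs) {
        while (!entries.isEmpty() && nowMs - entries.peekFirst().timeMs > MAX_AGE_MS) {
            entries.removeFirst();
        }
    }
}
```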
f1ori
e4aabaa1c3
* fix negative filelength for files >2G
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7408 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
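The commit does not state the cause of the negative length. A common cause of this symptom in Java is a long file length being narrowed to int, which wraps around above 2 GiB; the sketch below demonstrates that effect under this assumption. The class and method names are hypothetical.

```java
// Hypothetical sketch: why a length above 2 GiB turns negative when stored in an int.
final class FileLengthSketch {
    static final long THREE_GIB = 3L * 1024L * 1024L * 1024L;

    private FileLengthSketch() {}

    // Narrowing keeps only the low 32 bits, so large lengths wrap to negative values.
    static int narrowed(long length) {
        return (int) length;
    }

    // Keeping the value as long end to end avoids the wrap-around.
    static String asHeaderValue(long length) {
        return Long.toString(length);
    }
}
```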
orbiter
cdfe8afe3f
fix for really bad table iteration implementation: reduction of IO
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7407 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
9eae33f886
*) Ooops...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7406 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
a001e8075c
*) minor enhancements
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7405 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
11ea966f9e
*) added SID file (Commodore 64) sound file parser
...
*) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7403 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b2ed4cfaf8
more small bugfixes and light refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7401 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3ca06d6290
patch for http://forum.yacy-websuche.de/viewtopic.php?p=21460#p21460
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7399 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
903c824c2c
- allow only scanned resources with granted status
...
- increased time-out when scanning an ip range
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7398 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
936e976c23
*) added FreeMind ( http://freemind.sourceforge.net/ ) mindmap parser
...
*) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7397 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
3d95981f7d
*) cleaning up the code a little bit
...
*) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7396 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
2a6499364d
*) minor changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7395 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
c0274bd123
*) minor changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7394 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
fe46536f6e
enhanced network scanner (less name resolving during scanning and no name resolving during search)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7392 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e753027c43
fix for http://forum.yacy-websuche.de/viewtopic.php?p=21439#p21439
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7390 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bf4ef1513e
- fix for map view
...
- remove some UNRESOLVED PATTERN
- maybe a fix for non-flushing cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7389 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6b70393d1d
- new java version 1.6
...
- replaced old gif animator by java 1.6 gif animator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7388 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e88c428008
fix to ftp loader
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7387 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
59b70a5a92
another fix to the ftp crawler: now correct directory listings according to rfc2640 (path with spaces) and better title names for such files
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7386 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
9b25a33fd9
- fixed numerous bugs
...
- better document names
- fixed problem with ftp crawling
- added automatic removal of search results from services that are not online according to the latest network scan: this does not delete the index but just does not show them. After the next network scan, when the server is available again, the results are shown again.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7385 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
7bdb13bf7f
more fixes to smb crawling: better file names
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7384 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
94c48500cc
several fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7383 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0ac7311a62
fix for token parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7382 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
58b59f9bc8
- a collection of bug fixes and some redesign of the Scanner class
...
- fixed smb crawling
- added smbget to download script generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7381 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
c288fcf634
redesigned CrawlStartScanner user interface and added more features:
...
- multiple hosts for environment scans can be given (comma-separated)
- each service (ftp, smb, http, https) for the scan can be selected
- the scan result can be accumulated or refreshed each time a network scan is made
- a scheduler was added to repeat a scan and add all found urls to the indexer automatically
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7378 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
9d2159582f
* fix system update if urls are in blacklist (for example for very general blacklists like *.de)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7375 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
56264dcc17
- added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls
...
- integrated new parser into loader processes: enrich document parser
- fixed a concurrent modification exception in kelondro iterator
- hand-over of document size from crawler to indexer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7374 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
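A minimal sketch of CamelCase splitting as it might be applied to URL path tokens, assuming a split at every lower-to-upper case transition. The class name is hypothetical and the actual MultiProtocolURI parser may use different rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a CamelCase token into indexable words.
final class CamelCaseSketch {
    private CamelCaseSketch() {}

    // Starts a new word whenever an upper-case letter follows a lower-case letter.
    static List<String> split(String token) {
        List<String> words = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (Character.isUpperCase(c) && current.length() > 0
                    && Character.isLowerCase(token.charAt(i - 1))) {
                words.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) words.add(current.toString());
        return words;
    }
}
```

Runs of capitals such as "URI" stay together under this rule, because a split only happens after a lower-case letter.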
orbiter
99a7fe87f9
- removed old intranet scanner (the generic scanner now completely subsumes the old one)
...
- added information about granted access
- enhanced servlet design
- added submit-feedback (because it is a long-running task)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7372 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
acab6801d9
added new network scanner
...
- you can scan any ip or host in the internet for services
- this replaces the intranet scanner
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7371 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
14e4fae8e9
fixes to ftp client
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7369 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a563b05b60
enhanced crawler:
...
- added a new queue 'noload' which can be filled with urls where it is already known that the content cannot be loaded. This may be because there is no parser available or the file is too big
- the noload queue is emptied with the parser process which indexes the file names only
- the 'start from file' functionality now also reads from ftp crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7368 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
c36da90261
added a very fast ftp file list generator to site crawler:
...
- when a site-crawl for ftp sites is now started, then a special directory-tree harvester gets the complete directory structure of a ftp server at once
- the harvester runs concurrently and feeds into the normal crawl queue
also in this:
- fixed the 'start from file' crawl function
- added a link detector for the html parser. The html parser can now also extract links that are not included in <a> tags.
- as a result, a crawl start is now also possible from clear text link files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7367 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
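A sketch of how links outside of <a> tags could be detected in plain text, assuming a simple regular expression over a few URL schemes. The pattern, the scheme list and the class name are assumptions for illustration and not the parser's actual rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: find URLs in clear text, e.g. in a plain link list file.
final class PlainTextLinksSketch {
    // Scheme followed by anything up to whitespace, quotes or angle brackets.
    private static final Pattern URL =
        Pattern.compile("\\b(?:https?|ftp|smb)://[^\\s<>\"']+");

    private PlainTextLinksSketch() {}

    // Returns all matches in order of appearance.
    static List<String> extract(String text) {
        List<String> links = new ArrayList<String>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }
}
```

A pattern this simple keeps trailing punctuation such as a closing period as part of the link, so a real detector would need extra trimming.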
orbiter
db99db4be9
some redesign of the search-fail-response mechanism:
...
when a search fails for a single url because the snippet cannot be generated, then the url reference is deleted from the index. This mechanism was redesigned and enhanced. The process now also writes into the table searchfl in the work tables to prepare a re-indexing mechanism.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7364 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
4915d1781a
* use local backup-file, if remote network-definition is not available
...
* resolve single point of failure in networks, managed by central network-definitions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7363 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
4e2c14efbb
fixed bugs in parser and ftp client
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7360 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d78e322e84
added a directory-structure reader to ftp client
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7359 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f0651e5f2f
added image search to yacyinteractive.html
...
as a result, the search result view switches from list format to image preview format when a search is restricted to png, gif or jpg documents
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7358 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b769cce433
- added a catch-all parser for all documents that cannot be parsed: they will contribute only their document url to the search index
...
- enhanced the pdf and torrent parser: better documents titles
- enhanced the ftp client: more time-out time
- fixed bugs in json for search results
- enhanced yacyinteractive.html: added a file type navigator and a download-script generator for search result files
Please have a look at yacyinteractive.html: this will become the hacker-download tool for 27c3!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7355 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
21e84539e8
one more fix to Domains
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7353 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e192d61972
fix for latest commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7352 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
22453b13ad
implemented local host address discovery as posted in
...
http://forum.yacy-websuche.de/viewtopic.php?p=21310#p21310
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7351 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
cc6499bf8d
- added http://blekko.com as search heuristic (like scroogle). This was easy since they deliver their search results also as rss feed
...
- renamed YaCy's search result modification keywords for RECENT, NEAR and language: to the blekko slashtag naming scheme. YaCy now supports the following blekko-like built-in slashtags:
/date
- for search results ordered by date (most recent up)
/near
- for search results where search words appear near to each other (closest up)
/language/<lang>
- for a sorting by language where the wanted language is ranked first. Example: /language/de
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7350 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
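The three slashtags listed above could be separated from the search terms roughly as follows. This is a sketch under the assumption that slashtags are whitespace-separated tokens inside the query string; the class and field names are hypothetical.

```java
// Hypothetical sketch: split a query into search terms and the /date, /near, /language/<lang> modifiers.
final class SlashtagSketch {
    final String terms;     // query words without slashtags
    final boolean byDate;   // true if /date was given
    final boolean near;     // true if /near was given
    final String language;  // value of /language/<lang>, or null

    private SlashtagSketch(String terms, boolean byDate, boolean near, String language) {
        this.terms = terms;
        this.byDate = byDate;
        this.near = near;
        this.language = language;
    }

    static SlashtagSketch parse(String query) {
        final String langPrefix = "/language/";
        StringBuilder terms = new StringBuilder();
        boolean byDate = false;
        boolean near = false;
        String language = null;
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("/date")) {
                byDate = true;
            } else if (tok.equals("/near")) {
                near = true;
            } else if (tok.startsWith(langPrefix) && tok.length() > langPrefix.length()) {
                language = tok.substring(langPrefix.length());
            } else if (tok.length() > 0) {
                // Everything else is treated as an ordinary search word.
                if (terms.length() > 0) terms.append(' ');
                terms.append(tok);
            }
        }
        return new SlashtagSketch(terms.toString(), byDate, near, language);
    }
}
```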
orbiter
a9f754c45f
removed unused CR accumulation and distribution process
...
this was never used and extended in the last years. The resulting YBR ranking criteria
is still a good idea and will be used in the future. Possible generation methods for YBR
ranking are:
- "trust-rank" using the link structure as can be discovered in a single crawl (idea from FSCONS)
- "block-rank" calculated from the local link structure
- a distributed "block-rank" using the xml API to the link structure from other peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7349 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3d945bb442
fix for ftp client: suppress bad directory listing time-out
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7348 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d4a1a1850b
removed warnings
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7347 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
9b3fae9496
*) cleaning up the code a little bit
...
*) program to interface, not implementation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7345 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
321eb012fe
removed two warnings and reverted one change
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7340 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
fd74bc388c
* fix small bug in sessionid-removal
...
* add testcase for sessionid-removal
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7333 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
eb79b952ef
*) cleaner code
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7331 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
38fdf43587
*) renamed classes according to standard Java coding conventions
...
*) String.isEmpty() was introduced in Java 1.6, but we still use Java 1.5
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7330 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
025e3f4790
*) renamed classes according to standard Java coding conventions
...
*) removed unused code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7328 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
a025b1da89
* fix bug when browsing local filesystem (e. g. repository) with yacy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7323 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
sixcooler
b87bf88ac8
using less memory on merging and rewriting blobs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7317 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
f1ori
d62e449a11
* fix FilterEngine, forgot comparison operator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7314 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
441fbc26e2
security patch for WeakPriorityBlockingQueue (produced a deadlock)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7307 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
5dcb838293
- removed thread overhead when calling dns services
...
- fixed localsearch (changed it by accident)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7306 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4c50d3428e
smaller file size for array stacks to support smaller deletion sizes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7305 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
becc463d8a
enhanced did-you-mean
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7300 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
93c535d111
fixed http://forum.yacy-websuche.de/viewtopic.php?p=21113#p21113
...
fixed a concurrent modification exception during search and a time-out problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7298 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
04932dc268
added rdf data structure for rss feeds
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7297 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
84f2953cd8
fix for rss loader / rss type recognition
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7296 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4c72885cba
added a sitemap entry parser and loader for sitemaps
...
(a recursion if a sitemap refers to another sitemap)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7295 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
445619f3ec
added a submenu ConfigHTCache_p.html to set the size of the HTCache separately from the proxy configuration.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7291 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
sixcooler
85c65475fa
small but important correction of last commit @ HTTPClient
...
(if there is a response it really should be taken to its end)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7290 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
f1ori
acd93b1b31
* add failsafe mechanism to domainlist retrieval
...
the domainlist is saved locally; if none of the given urls in network.unit.domainlist
could be retrieved, the file from the last boot is used instead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7289 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
70c95608d4
Added CORS Access header for yacysearch.rss output
...
used some of the recommendations from Copro:
http://forum.yacy-websuche.de/viewtopic.php?p=21015#p21015
Original Request:
http://forum.yacy-websuche.de/viewtopic.php?p=20829#p20829
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7288 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
f1ori
def4253555
* add option to network definition to provide a domainlist (syntax like in blacklists)
...
* crawler and search allow only urls matching one in domainlist (if list is provided)
* this may be useful to prevent dedicated networks from being "polluted"
* FilterEngine is an improved Blacklist object; Blacklist may inherit from FilterEngine in the future
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7285 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
fb92f9ae8e
added mime type image/jpeg (image/jpg is wrong, but it is left here because it does no harm and this error also exists in the configuration of web servers)
...
see also:
http://forum.yacy-websuche.de/viewtopic.php?p=21129#p21129
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7279 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
155d556568
- better memory protection
...
- more logging
- little bit of refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7278 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
f1ori
7d8de34778
* add a bit of documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7276 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
e3e3b49d52
- enhanced main release recognition
...
- yacybot user agent now includes the yacy network name (not the peer name!)
- refactoring and clean-up (mostly turned tabs into spaces)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7266 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
58e74282af
added a word counter statistic in condenser which is used by the did-you-mean to calculate best matches for given search words.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7258 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
863065abc4
added user agent logging to access tracker
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7256 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
ed4371dcf3
enhanced navigation implementation and enhanced tag cloud computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7252 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
ca738ac924
- added a tag cloud to search results (using the topics)
...
- some refactoring of score classes
- added default package for new classes add_ymark and delete_ymark
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7251 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
e4d561971e
added more score cluster options and made score cluster usage more transparent
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7248 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago