orbiter
0819e1d397
protection against OOM cases in image parser. See also bugs.yacy.net/view.php?id=54
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7942 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
52a2b3f110
try to fix bug http://bugs.yacy.net/view.php?id=26
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7941 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2cba860693
- fix for wrong entries in NOLOAD indexing queue (which caused URLs to be indexed based only on their URL without being loaded)
...
- patch for better urls to solr admin interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7938 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2842ce30d6
added synchronization in ReferenceContainer and logging for shrinking
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7937 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
cec3836e73
added reference limitation to IndexControlRWIs_p.html servlet
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7936 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
ecb4986b38
refactored stuff from last commit to ReferenceContainer
...
see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3353&p=23163#p23163
the limiting of references is disabled per default
to enable this set yacy.conf - index.maxReferences to a value of e.g. 100000
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7935 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
f7c4abfdd7
limit references per blob & term to the 100.000 youngest
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7934 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
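A minimal sketch of the limit described in the two commits above: keep only the youngest N references for a term. The `Ref` class, the `limitToYoungest` helper and the rule that a non-positive limit means "disabled" are assumptions for illustration, not YaCy's actual ReferenceContainer API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an index reference: a URL hash plus a last-modified time. */
final class Ref {
    final String urlHash;
    final long lastModified; // epoch millis; larger means younger

    Ref(String urlHash, long lastModified) {
        this.urlHash = urlHash;
        this.lastModified = lastModified;
    }
}

final class ReferenceLimiter {
    /**
     * Returns at most maxReferences entries, keeping the youngest ones.
     * Assumption: a non-positive limit means "limiting disabled".
     */
    static List<Ref> limitToYoungest(List<Ref> refs, int maxReferences) {
        if (maxReferences <= 0 || refs.size() <= maxReferences) {
            return new ArrayList<>(refs);
        }
        List<Ref> sorted = new ArrayList<>(refs);
        // Youngest first, so the head of the list is what we keep.
        sorted.sort(Comparator.comparingLong((Ref r) -> r.lastModified).reversed());
        return new ArrayList<>(sorted.subList(0, maxReferences));
    }
}
```

With a limit of 100000 this would correspond to a yacy.conf entry such as index.maxReferences=100000 (assuming the usual key=value syntax of that file).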
orbiter
28f5b79deb
added a fast mass-deletion method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7933 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a70dbce41c
added another file tool class to yacy-cora
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7932 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
49e5ca579f
added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is the default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored in a solr index, if that is also enabled.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e02bfbde56
fix for solr url
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7930 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
f1ori
41e146116a
fixes size of document in case the server doesn't give the size in the header
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7929 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
580beb12a5
reverting SVN 7863; the synchronization was needed and no synchronization causes repeated DNS lookup for the same hosts
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7928 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
44d6416e2d
ensure termination of shrink()
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7927 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
52230a6864
replaced catching of Exception with Throwable, which also catches Errors
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7926 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
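A small illustration of why the commit above matters: in Java, `Error` subclasses such as `OutOfMemoryError` are not `Exception`s, so only a `catch (Throwable ...)` clause sees them. The `CatchDemo` class and `runGuarded` method are hypothetical names used only for this sketch.

```java
final class CatchDemo {
    /** Runs the task and reports which catch clause, if any, handled the failure. */
    static String runGuarded(Runnable task) {
        try {
            task.run();
            return "ok";
        } catch (Exception e) {
            return "caught Exception";
        } catch (Throwable t) {
            // Errors such as OutOfMemoryError are not Exceptions and land here.
            return "caught Throwable";
        }
    }
}
```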
orbiter
877eaf6bcb
switched off logging of org.apache.http which was suddenly switched on by default (??)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7925 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e1a3d609aa
moved merger object from Segment to IndexCell to enable a correct shutdown sequence. This solves a bug where yacy cannot be shut down during an index merge that appears during the shutdown phase.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7924 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
2cf61a40ce
fixed a bug from 7856, where Snippet returned an error by mistake when Metadata was found
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7921 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
610b01e1c3
- added an 'add every media object linked in an html document as a new document' feature to the html parser. As a result, every image, app, video or audio file that is linked in an html file is added as a document. In effect, parsing a single html document may cause a number of documents to be inserted into the search index.
...
- some refactoring for mime type discovery
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7919 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
3da21c4266
protection against starting of a (second) yacy peer while another one is already running on the same port
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7917 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
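The commit above does not say how the check is done. One common way to detect an already running instance on the same port is to probe the port with a `ServerSocket`, sketched below. `PortGuard` and `isPortInUse` are hypothetical names, and this is not necessarily what YaCy does.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortGuard {
    /** Returns true if a listening socket could not be opened on the port. */
    static boolean isPortInUse(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return false; // bind succeeded, so nobody else is listening here
        } catch (IOException e) {
            return true; // typically a BindException: address already in use
        }
    }
}
```

A startup routine could call `isPortInUse` before opening its own listener and exit with a clear message if the port is taken.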
orbiter
b5252ef91f
added new word recommendation library in DictionaryLoader_p.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7913 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1c007188ad
bugfixes in html parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
231074bf0a
fixed a parsing bug by reverting SVN 7766
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7910 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
30a8a2f76b
*) replacing one ugly hack with an extended ugly hack ;-)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7908 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
95379ce0b1
*) should fix some problems with RSS Importer (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3253 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7907 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
24e76a7b69
*) Replaced occurrences of "Wikimedia" with "MediaWiki" where applicable. (Thanks to the folks of 0x20.be for pointing this out.)
...
*) Added description of where to place MediaWiki dump for import.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7905 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
d40a177c05
Generation Memory Strategy fine tuning
...
add some log-output in termlist_p
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7904 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
839f407fe4
Generation Memory Strategy fine tuning:
...
- some more optimism on requests of unknown values
- avoid a premature value of 0 byte available
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7903 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
3e6767d66c
limitation of reference evaluation (protection against crawler pits)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7902 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a5541751a8
- added memory computation to termlist_p.xml
...
- added option to delete terms in termlist_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7901 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
45e497a9bd
fix for term iteration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7900 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5dd2efc9a2
- bugfixes in html parser
...
- new fields in solr
- extended file viewer to debug parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c595a6a47
added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7896 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
75df87832c
refactoring/better naming of methods and classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7895 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
9f9f634de2
fix in search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7894 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
5f8a5ca32d
- not doing merge-jobs while short on Memory
...
- using configuration-values of crawling-max-filesize also for snippetfetching and loading files into Index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7893 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
965fabfb87
enhanced sorting speed (affects all DB operations)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7892 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
41a8ee4569
added iterable implementation in KeyList
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7891 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
22d69a6368
refactoring in cora: added sorting package
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7890 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
51cf697acd
refactoring: moved all score-related classes to new ranking package
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a0d5e7b6e6
added new score comparator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7888 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
169236c6d9
almost revert changes in this class of 7880 and 7882
...
since MemoryControl does handle negative value requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7887 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
4fec99115b
Implementation of strategies for controlling memory resources.
...
You can toggle between previous (standard) and new (generation) strategy at PerformanceMemory_p.html.
The generation memory strategy is implemented with the objective of running more robustly,
but at the cost of stopping some tasks (e.g. dht) early while running low on memory.
This new strategy respects the generational way a heap is organized on the most commonly used JVMs.
These changes have run fine on my 3 peers for weeks now, but as I'm human, I may fail.
Please be careful when using the generation memory strategy and report errors, naming
OS, jvm and java_args.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
63a375b801
do not look at external dtd, because this makes this reader stay forever(?) on faulty dtd-locations
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7885 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c58af6874
- added a short memory status simulation mode
...
- added a button in PerformanceMemory_p.html to set the simulated short memory status
- bugfix: added a missing lowercase in KeyList
- better concurrency in loader dispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7883 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c64faf41e2
addon to svn 7880
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7882 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
7b7a196243
ignore cookies in httpclient per default
...
disable cookiestore, because the default one caused segfaults on my peers
this does not harm use of cookies via YaCy as proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7881 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
06408a9428
since many POST-requests come as gzip they report a contentlength of -1
...
requesting memory of -1 * 3 looks useless to me
so I added some megs to it; even a correctly reported contentlength should not be harmed by this
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7880 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
411ed159f8
do some extra sleep while running low on memory
...
(1 sec. per outofmemoryCycle)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7879 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
9ab0ba41e2
using GzipDecompressingEntity from httpclient instead of our own
...
(was just fixed there in httpclient-4.1.2 and does a proper job)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7877 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
07f5954570
try better handling of corrupt blobs
...
@developer: please revert if I'm wrong
see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=3334
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7872 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
f970670a7c
- bugfix in ServerScannerList
...
- speed up of generation of scanner list avoiding forced dns lookup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7871 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
8e03b8ee8b
better integration of server list in interactive search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7870 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0a3ab7da1b
do not sort the same array concurrently
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7868 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
594d8f546a
#cccamp11 maintenance fix: anons may find up to 1000 items in interactive search (was: 100)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7866 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
eb14111200
encapsulate potentially expensive objects in TextSnippet to allow the GC to collect them asap
...
this reduces chance of OOMs at massive search & snippet-fetching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7865 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0d33cf352b
removed synchronization in DNS resolve (solves a problem when loading snippets. In the past concurrent dns requests also caused deadlocks, but that was many years ago and we will give it another try)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7863 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e3fc1efbef
performance hack and ensuring termination in serverAccessTracker. cause:
...
"Session_:53600#0_POST /yacy/hello.html HTTP/1.1" prio=10 tid=0x2322b000 nid=0x3ba7 runnable [0x03d3e000]
java.lang.Thread.State: RUNNABLE
at java.lang.Long.valueOf(Long.java:557)
at de.anomic.server.serverAccessTracker.clearTooOldAccess(serverAccessTracker.java:113)
at de.anomic.server.serverAccessTracker.cleanupAccessTracker(serverAccessTracker.java:75)
- locked <0x3bda2ae8> (a de.anomic.server.serverAccessTracker)
at de.anomic.server.serverAccessTracker.track(serverAccessTracker.java:125)
at de.anomic.server.serverSwitch.track(serverSwitch.java:542)
at de.anomic.http.server.HTTPDemon.parseRequestLine(HTTPDemon.java:641)
at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:491)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at de.anomic.server.serverCore$Session.listen(serverCore.java:757)
at de.anomic.server.serverCore$Session.run(serverCore.java:651)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7862 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
44d74f8f89
performance hacks for seed generation (because thread dumps showed multiple occurrences at these code points)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7861 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
5cd07d7f84
early freeing resources on deleting index reference if search-verification fails (aka Switchboard.cleanupJob)
...
doing same thingy on other methods of touched files as well
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7860 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
a311596881
finishing up my commits (7855-7858) which could be helpful for
...
not declaring inside loops (helps GC of some VMs)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7859 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
9170a434ed
throwing an exception again in FileUtils.copy(reader, writer)
...
OOMs could occur here and should not be ignored
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7858 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
c0caca57e3
stopping threads for fetching search results if running short on memory
...
- in most cases at least one thread stays alive for getting the results
- fewer threads should do the work with fewer resources, but much more slowly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7857 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
ce248cc8dd
fewer byte-arrays of response-content, less byte-array <-> stream conversion
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7856 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
59b767eebd
stop loading via http at a defined maximum of bytes, even if the size is unknown before loading
...
using max-file-size of type int for parsing documents
(since content is used as byte-arrays, 'integer' should be maximum)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
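A minimal sketch of the idea in the commit above: enforce a byte limit while reading, so that a response without a reported size still cannot grow past the maximum. `LimitedLoader` and `readLimited` are hypothetical names, not YaCy's loader API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class LimitedLoader {
    /**
     * Reads the stream fully, but aborts once more than maxBytes have arrived.
     * The limit is an int because the content ends up in a byte[], which is indexed by int.
     */
    static byte[] readLimited(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            // Compare against the remaining budget to avoid int overflow.
            if (n > maxBytes - total) {
                throw new IOException("content exceeds limit of " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
            total += n;
        }
        return out.toByteArray();
    }
}
```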
sixcooler
916d79111e
Runtime.maxMemory() DOES change @ runtime:
...
I wondered about getting Total-ram > Max-ram and MemoryControl.available() < 0
MemoryControl.available() < 0 causes some errors where its value is used, e.g. for the dimension of buffers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7852 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
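The commit above does not show the fix itself. One defensive approach, sketched here under that caveat, is to clamp the computed value at zero so that a negative result can never be used as a buffer size. `MemoryProbe` is a hypothetical name and not the real MemoryControl class.

```java
final class MemoryProbe {
    /** Pure helper so the arithmetic can be checked without a live JVM heap. */
    static long available(long max, long total, long free) {
        // (max - total) is the room the heap may still grow by;
        // free is the unused room inside the currently allocated heap.
        // If total exceeds max, the sum can go negative, so clamp it.
        return Math.max(0L, max - total + free);
    }

    /** Same computation against the running JVM. */
    static long available() {
        Runtime rt = Runtime.getRuntime();
        return available(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
    }
}
```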
f1ori
3a5fa73008
* revert parts of previous commit, because it breaks the trickle-feature
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7851 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
6e79675ff3
* use gzip-encoding in more cases
...
* send Expire-Header for static content
* should improve webserver-performance for slow connections
* fixes #37
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7850 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
299af4943c
added another memory protection hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7849 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1f300217f8
more protection for the cleanup thread
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7848 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d13103a0a7
changed the way the index cache is flushed: do not flush when a put was made, because that could cause many put calls to block on synchronization for a long time while the dump or a merge is performed. Instead a watchdog thread does the dump, so puts cannot block any more, which is good when a put happens during search result preparation.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7847 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b06faab9d3
do not allocate a StringBuilder object in case that there is not enough memory for that
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7846 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6a6f27eaf3
do not sort arrays again if arrays are already sorted
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7845 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3d043ce9d6
- refactoring
...
- do not start worker threads in Array class if concurrency is not used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7844 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
48b78e9ff4
disabling concurrency in new sort since that is not working yet correctly
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7843 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
62ac73a108
fixed bugs and deadlocks in core database indexing structures:
...
- added new Array class that contains an abstraction of the java Arrays class which replaces the home-brew quicksort algorithm.
- the new class is about four times slower than the old one, but it works correctly (the old one had errors)
- fixed a synchronization problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7842 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
aff875baef
smaller ping-entry @ ProfilingGraph
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7841 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1912d0cccc
changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; this is expected to matter especially during search situations.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7840 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bb8e3f8523
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7839 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
be15874be1
added request line in http which can support better debugging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7838 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
11dc653de3
added a visualization of peer pings to the performance graphic
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7837 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3a191cdf14
because newbies are scared by the memory consumption shown in the performance graph, and because of arguments about high memory consumption that stem from poor knowledge of java garbage collection techniques, the memory display has been removed from the performance graph shown on the Status.html page. The memory graph can still be seen on the Performance page, where it is just like it was.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7836 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
cominch
09bb7a390c
do not replace malformed or invalid URLs in urlproxy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7835 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
52d799e7c8
fix for solr auth
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7833 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
9eb8e9acd9
no error message about missing browser in headless environments
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7832 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d3c89b90ce
temporarily adding the old httpclient-3.1 again because the solrj classes need it. should be removed as soon as solrj supports httpclient-4
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7831 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bd99969758
fixed bad query
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7830 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
768c59740c
- replaced solrj 3.1 with solrj 3.3
...
- updated also slf4j
- added authentication for solrj
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7829 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
c7b95e8c81
*) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
...
*) Corrupt crawlProfilesPassive.heap would cause crawlProfilesActive.heap to be deleted. Don't know if this ever happened, but it will not happen anymore.
*) Cleaned up a little bit.
*) Added some comments.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7827 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6d2e252bcf
fix for:
...
java.lang.NullPointerException
at net.yacy.kelondro.index.RowCollection.<init>(RowCollection.java:97)
at net.yacy.kelondro.index.RowSet.<init>(RowSet.java:48)
at net.yacy.kelondro.rwi.ReferenceContainer.<init>(ReferenceContainer.java:58)
at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:69)
at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:43)
at net.yacy.kelondro.blob.ArrayStack.merge(ArrayStack.java:1023)
at net.yacy.kelondro.blob.ArrayStack.mergeWorker(ArrayStack.java:922)
at net.yacy.kelondro.blob.ArrayStack.mergeMount(ArrayStack.java:869)
at net.yacy.kelondro.rwi.IODispatcher$MergeJob.merge(IODispatcher.java:267)
at net.yacy.kelondro.rwi.IODispatcher$MergeJob.access$300(IODispatcher.java:239)
at net.yacy.kelondro.rwi.IODispatcher.run(IODispatcher.java:180)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7822 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
719777b2a7
replaced method to call getUsableSpace using reflection with direct call since we now use java 1.6
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7821 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2d4bb139d3
- added counting of links with noindex tag for solr index
...
- bugfixes for solr index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
892caccdca
added default configuration in ConfigurationSet in case of new values
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7814 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bda3eec0ff
added parsing of canonical link element to html parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7812 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b6f09a475d
- added an index profile editor in the /indexFederated_p.html servlet for solr indexes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7811 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b666a929e7
fixed Semaphore handling in case of interruptions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7809 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
de7a054d77
added parser for files like the new solr.key.list
...
it parses text files with the following syntax:
- all lines beginning with '##' are comments
- all non-empty lines not beginning with '#' are keyword lines
- all lines beginning with '#' and where the second character is not '#' are commented-out keyword lines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7808 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
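The three syntax rules in the commit above can be sketched as a small parser. `KeyListSketch` is a hypothetical class, not the parser that was committed. Trimming whitespace and skipping a lone '#' are assumptions the commit message does not cover, and the keywords in the usage note are invented.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyListSketch {
    final List<String> enabled = new ArrayList<>();   // active keyword lines
    final List<String> disabled = new ArrayList<>();  // commented-out keyword lines

    static KeyListSketch parse(Iterable<String> lines) {
        KeyListSketch result = new KeyListSketch();
        for (String raw : lines) {
            String line = raw.trim(); // assumption: surrounding whitespace is irrelevant
            if (line.isEmpty() || line.startsWith("##")) {
                continue; // blank line or '##' comment
            }
            if (line.startsWith("#")) {
                // Single '#': a keyword that is present but switched off.
                String key = line.substring(1).trim();
                if (!key.isEmpty()) {
                    result.disabled.add(key);
                }
            } else {
                result.enabled.add(line);
            }
        }
        return result;
    }
}
```

For example, the lines `## header`, `title` and `#author` would yield one enabled keyword (`title`) and one disabled keyword (`author`).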
f1ori
a17351dcfe
* navigation bar for filetype constraints
...
javascript interpreted backslashes from urlmask as escaping and didn't forward them to yacy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7806 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
96957375cc
* fix url proxy for relative links and chromium
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7805 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
9ebc75db4b
fix for channel authorization
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7803 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
267290a821
removed the semaphores from the cache dump process because I believe some of the semaphores may be lost somewhere, with the result that the cache is never flushed and the peer then dies from an OOM. The re-introduced synchronization may not be the best solution but should ensure that the caches are flushed.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7802 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6d9e5865ee
faster appearance of search result page (but complete search time is the same)
...
this was inspired by http://bugs.yacy.net/view.php?id=37
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7801 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f7ca84cfc0
enhanced template engine
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7800 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d8072d1866
added more info to DNS cache in /PerformanceMemory_p.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7798 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f803da8aae
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7797 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
84c9658644
added a file type navigator
...
added a protocol navigator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7795 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
31283ecd07
- added a search option to filter only specific network protocols. i.e. get only results from ftp servers. Just add '/ftp' to your search.
...
for example search for "passwd /ftp". This can also be done with /http /https and /smb
- fixed some search throttling processes that should protect your peer against search DoS or strong search load
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7794 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
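A rough sketch of how a query such as "passwd /ftp" could be split into search terms and a protocol filter. `ProtocolModifier` is a hypothetical class, and the tokenizing rules (whitespace splitting, case-insensitive match, last modifier wins) are assumptions rather than YaCy's actual query parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ProtocolModifier {
    // The four modifiers named in the commit message.
    private static final Set<String> PROTOCOLS =
            new HashSet<>(Arrays.asList("http", "https", "ftp", "smb"));

    final String terms;     // query with the modifier removed
    final String protocol;  // null when no protocol filter was given

    private ProtocolModifier(String terms, String protocol) {
        this.terms = terms;
        this.protocol = protocol;
    }

    static ProtocolModifier parse(String query) {
        List<String> words = new ArrayList<>();
        String protocol = null;
        for (String token : query.trim().split("\\s+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (lower.startsWith("/") && PROTOCOLS.contains(lower.substring(1))) {
                protocol = lower.substring(1); // assumption: the last modifier wins
            } else if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return new ProtocolModifier(String.join(" ", words), protocol);
    }
}
```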
orbiter
4b425ffdd2
fix for http://bugs.yacy.net/view.php?id=41
...
added another RSS channel "PROXY". the rss feed for peer news filters this channel if there is not an authorized access on that channel
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7792 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
7db208c992
performance hacks: more pre-allocated StringBuilder
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7790 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
87bd559c42
fixed warning
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7789 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
07e89a7ae5
added @Deprecated
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7788 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
9706fc55aa
enhanced content scraper (should discover urls much faster in case of very large plain texts)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7787 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
996f0a8764
disabled assert in Base64Order which eats away too much performance during testing with -l
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7786 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f667b9c289
enhanced identificator: using AtomicInteger for counter
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7785 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
16327d1cbe
unwrapping of call depth (one call less for UTF8.String)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7784 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f30d36b101
enhanced template engine
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7783 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
aa6c32d753
enhanced UTCDiffString
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7782 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
f87865a50b
always shutdown log, fixes zombie processes in init stop script
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7780 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
115abc8917
- more attributes for search progress bar
...
- moved cache strategy to cora package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
7bfa6bb4b6
prevent getting a yacySeed from zero-length-hash-string by chance
...
(e.g. proxy crawls were displayed as initiated by some other peer)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7776 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bce280a308
update on options for interface graphics
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7775 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
77fe69395d
added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7774 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
df1725ef43
re-enable POST over proxy, which has not worked since the update to httpcore-4.1.1
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7772 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2683162ec5
- added more options to access grid picture, web structure picture and network graphics
...
- remove test class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7770 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0c1b29f3c9
- applied many small performance hacks
...
- added a memory limitation in the zip parser and the pdf parser
- added search throttling: if too many search queries are still waiting to be computed, new requests are not accepted for some time. If after one second there is still no capacity to perform another search, the search terminates with no results. This should only happen in DoS-like situations or under heavy load on a peer, for example when it is integrated in metager.
- added a search cache deletion process that removes search requests when throttling happens
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7766 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
900dacbf97
* improve link rewriting in proxy-url
...
* only rewrites links that are in the current search domain
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7765 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
dc855d881b
* further improve proxyurl
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7762 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a7a6b392f5
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7760 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
fe0c08455b
more concurrency (enhancement) hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7759 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0e9a99cb05
another resource hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7758 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
535b6b953c
more hacks to omit superfluous string object allocation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7757 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
87082f407e
less String object creation during search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7756 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
ab5a16b957
less memory usage during ranking and a faster host navigator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7755 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1489ebeedf
one more hack to free ram for search events
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7753 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3c2b994bd6
write access/load time to solr index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7752 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a36fda991e
hack to increase speed of url hash computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7751 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
ddcc333acc
* fix negative result counts
...
results sorted out by add to RankingProcess were counted in the
sortedout-counter, but were added to neither remote_indexCount nor
local_indexCount
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7749 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
fa734bdf9f
better memory protection in search logger
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7748 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
dbea40d536
- changed snippet fetch strategy logic: do not check if the entry is in the cache. This should reduce IO load on the HTCACHE, which is a showstopper during a large number of search requests
...
- forced a possible short memory status when a search is started to flush caches that may cause search-heaps with resource contention effects
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7747 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
4bea3f9714
hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
...
used an ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion methods have less computation overhead than the UTF8 conversion.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
746e3c3b06
Replaced a widely-used Properties object in the httpd with HashMap<String, Object> which, unlike Properties, is not synchronized
...
Synchronization is not needed here and added overhead to the httpd process, which is now removed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7745 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
14e1666b21
* fix replacing regexes in url proxy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7742 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e28bd0d038
fix for some possible causes of memory leaks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7741 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
09ba6814c0
- non-blocking word hash computation with dynamic digest object generation (this was important!)
...
- (very) small performance enhancement in did-you-mean
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7740 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
10e2f588f8
- enhanced ybr ranking computation
...
- many speed/performance hacks
- added solr sharding and new sharding web interface
- added option to switch off the yacy index when using solr
- added new fail-url categories which are used to distinguish which fail-urls are sent to solr
- refactoring/renaming of some method names to distinguish host/url hashes better
- a large number of bug/npe fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bd55dcee50
- commented out experimental distributed ranking loading
...
- fewer threads for blocking threads
- disable all threads for DHT transmission for networks with zero peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7737 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
98c4d25185
fix for endless loop in FTP crawling, see http://bugs.yacy.net/view.php?id=32
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7736 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d1dbbd956a
always use a template method cache even if the template cache flag is set to false. This flag is only used to allow dynamic updates to the template files, not dynamic updates to the rewrite methods (which are not possible without recompiling). Low memory usage is guaranteed by the use of soft references, which are dropped before an OOM is thrown
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7735 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0d040ff6bb
fix for bug 0000036: no crawling of https pages
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7734 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3ed4a09368
small features, some bug fixes and performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7733 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago