Michael Peter Christen
407fdf6968
more bug fixes and performance hacks for search process
13 years ago
Michael Peter Christen
a1fe65b115
performance hacks
13 years ago
Michael Peter Christen
2fe207f813
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
5aee19daa4
added show from cache in search results (not yet finished)
13 years ago
Michael Peter Christen
5e562dcdb7
adapted vocabulary usage within the annotation/navigation feature of search
to the new SimpleVocabulary class
13 years ago
Michael Peter Christen
514700291a
moved Vocabulary to cora package (added in git 964406ad17)
13 years ago
Michael Peter Christen
0284a4d88f
more fixes for double precision of coordinates
13 years ago
Michael Peter Christen
964406ad17
added concurrency enhancement to xml parser
13 years ago
Michael Peter Christen
240045cf7c
fix for bad distance computation
13 years ago
Michael Peter Christen
e0d8643226
- performance hacks
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again) -> target is, that they
are never reached.
13 years ago
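The missing poison pill mentioned in e0d8643226 refers to a common way of terminating a queue consumer. The following is only a minimal sketch of that pattern, assuming a consumer that drains a BlockingQueue until it sees a sentinel object; the class name, the sentinel and the element type are hypothetical and not taken from the YaCy sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PoisonPillSketch {
    // Hypothetical sentinel; compared by identity so no real result can collide with it.
    static final String POISON = new String("POISON");

    // Takes items until the poison pill arrives; without the pill, take() would block forever.
    static List<String> drain(BlockingQueue<String> queue) {
        List<String> results = new ArrayList<>();
        try {
            while (true) {
                String item = queue.take();
                if (item == POISON) break;
                results.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag and stop draining
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            queue.add("result-1");
            queue.add("result-2");
            queue.add(POISON); // the producer must always send this, even on failure
        });
        producer.start();
        System.out.println(drain(queue));
        producer.join();
    }
}
```

If the producer can fail, the pill is usually added in a finally block so the consumer is always released.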
Michael Peter Christen
7a329465b3
using pre-compile pattern in blacklist; should enhance search speed
13 years ago
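The idea behind 7a329465b3 (compile a regular expression once instead of on every check) can be illustrated as follows. This is a sketch under assumptions: the class name and the example expression are made up, and the real blacklist code may be organised differently.

```java
import java.util.regex.Pattern;

public class BlacklistPatternSketch {
    // Compiled once at construction time; Pattern instances are immutable and thread-safe.
    private final Pattern pattern;

    BlacklistPatternSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Re-uses the compiled pattern; String.matches(regex) would recompile on each call.
    boolean isListed(String host) {
        return pattern.matcher(host).matches();
    }

    public static void main(String[] args) {
        BlacklistPatternSketch entry = new BlacklistPatternSketch(".*\\.example\\.org");
        System.out.println(entry.isListed("ads.example.org"));
        System.out.println(entry.isListed("example.com"));
    }
}
```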
Michael Peter Christen
6e83b02b83
- bugfix for surrogate file reader
- bugfix for location search: suppress empty search
13 years ago
Michael Peter Christen
9b4c699526
enhanced location search:
- search requests are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adapted to the results in the visible range
- added a double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
13 years ago
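The query option /radius/<lat>/<lon>/<radius> introduced in 9b4c699526 could be parsed roughly as shown here. This is a hedged sketch: the class and method names are hypothetical, the unit of the radius is not stated in the commit message, and the actual YaCy parser may behave differently.

```java
public class RadiusModifierSketch {
    // Returns {lat, lon, radius}, or null if the token does not have the expected shape.
    static double[] parse(String token) {
        // "/radius/a/b/c" splits into ["", "radius", "a", "b", "c"]
        String[] parts = token.split("/");
        if (parts.length != 5 || !parts[0].isEmpty() || !"radius".equals(parts[1])) {
            return null;
        }
        try {
            return new double[] {
                Double.parseDouble(parts[2]),
                Double.parseDouble(parts[3]),
                Double.parseDouble(parts[4])
            };
        } catch (NumberFormatException e) {
            return null; // non-numeric coordinate or radius
        }
    }

    public static void main(String[] args) {
        double[] r = parse("/radius/50.1/8.6/10");
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```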
Michael Peter Christen
834dc6b263
store more data from interface access
13 years ago
Michael Peter Christen
1f48d1528b
performance hacks
13 years ago
Michael Peter Christen
c70aaccdc9
better location to generate a guid for rss messages
13 years ago
Michael Peter Christen
10da7335ea
performance hack: use a hash cache for all hashes that are computed by a
byte array. If this hash is used in a HashMap (which is very often the
case) then this hack eliminates a lot of re-computations of the same
hash.
13 years ago
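The hash cache described in 10da7335ea can be pictured as a key object that remembers the hash of its byte array. The sketch below assumes a hypothetical wrapper class and uses the same lazy idiom as java.lang.String; it is not the YaCy implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public final class CachedHashKey {
    private final byte[] bytes;
    private int hash; // 0 means "not computed yet" (same idiom as java.lang.String)

    public CachedHashKey(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy so the cached hash cannot go stale
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = Arrays.hashCode(bytes); // computed at most once, unless the hash really is 0
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashKey && Arrays.equals(bytes, ((CachedHashKey) o).bytes);
    }

    public static void main(String[] args) {
        Map<CachedHashKey, String> map = new HashMap<>();
        map.put(new CachedHashKey(new byte[] {1, 2, 3}), "value");
        System.out.println(map.get(new CachedHashKey(new byte[] {1, 2, 3})));
    }
}
```

Every HashMap lookup calls hashCode(), so a key that is looked up repeatedly pays for the array scan only once.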
Michael Peter Christen
f8a0cf6d7c
RSSMessages do not need a concurrent hash map -> removed overhead
13 years ago
Michael Peter Christen
07ca7e4dd1
enhanced RSS parsing by ensuring that it is parsed with a buffered input
stream
13 years ago
Michael Peter Christen
7c1feefb28
introduced a default 10 second time-out in rwi normalization time
during search process to prevent endless deadlocks after a very long
running search
13 years ago
Michael Peter Christen
8d997d55b6
better logging
13 years ago
Michael Peter Christen
65d37e6a20
only ASCII needed in seed bitflags
13 years ago
Michael Peter Christen
0f82fb3628
using double instead of float for a better release ordering
13 years ago
Michael Peter Christen
43c2c6e588
better logging
13 years ago
sixcooler
56087c1f23
bump to httpclient-, httpcore-, httpmime- 4.2
13 years ago
Michael Peter Christen
71c3163f3d
- fixes to node identification
- added link to node in network list
- added marking of portal search node peers
13 years ago
Michael Peter Christen
4d3cc02168
replaced old bzip2 library with the better documented commons-compress
package from http://commons.apache.org/compress/
13 years ago
Michael Peter Christen
ad222be7f8
added node state icon in network list
13 years ago
Michael Peter Christen
3c2bec681f
added a root node flag: identifies peers with short ping time
13 years ago
Michael Peter Christen
c846e9ca14
redesign of the crawler monitor page: show crawled pages instead of
queue of urls that shall be crawled
13 years ago
Michael Peter Christen
c15fcde1c8
add-on to latest commit
13 years ago
Michael Peter Christen
cf47d94888
performance hack to parse numbers inside of substrings without actually
generating a substring. This avoids the allocation of a String object
each time a substring is parsed. Should affect CPU load during RWI
transmission.
13 years ago
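The technique named in cf47d94888, parsing a number out of a character range without allocating a substring, might look like the following. It is a minimal sketch with a hypothetical method name; it handles only decimal digits with an optional leading minus sign and does not check for overflow.

```java
public class RangeParseSketch {
    // Parses the decimal integer in s[start, end) without creating a new String.
    static int parseInt(CharSequence s, int start, int end) {
        if (start >= end) throw new NumberFormatException("empty range");
        int i = start;
        boolean negative = false;
        if (s.charAt(i) == '-') {
            negative = true;
            i++;
            if (i == end) throw new NumberFormatException("sign without digits");
        }
        int value = 0;
        for (; i < end; i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            value = value * 10 + digit; // overflow is not detected in this sketch
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        System.out.println(parseInt("abc1234def", 3, 7));
    }
}
```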
Michael Peter Christen
7e0ddbd275
added a "fromCache" flag in Response object to omit one cache.has()
check during snippet generation. This should cause less blockings
13 years ago
Michael Peter Christen
81737dcb18
removed stack trace from swf parser since we can't do anything there
13 years ago
Michael Peter Christen
7bf421b9dd
- fixed image search page navigation
- removed some deadlocks and ConcurrentModificationExceptions during
DidYouMean collection
13 years ago
Michael Peter Christen
c6a09eab0b
synchronization needed
13 years ago
Michael Peter Christen
fb94b47b1a
changed queue sizes to have less memory occupied during indexing
13 years ago
Michael Peter Christen
76157dc2c3
bugfix for http://bugs.yacy.net/view.php?id=173
13 years ago
reger
6696cb1313
bugfix: lookup of peer names returned no result for an active peer on page IndexControlRWIs_p.html -> Transfer RWI to other Peer
SeedDB.lookupByName searches for lowercase peerNames, while MapColumnIndex.getIndex uses the peer name as is in the keyset.
Changed the index init to insert lowercase peer names as key
13 years ago
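The mismatch fixed in 6696cb1313 (lowercase lookups against keys stored as-is) is avoided when both the insert and the lookup path normalise the key the same way. A sketch with hypothetical names, not the SeedDB or MapColumnIndex code:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class PeerNameIndexSketch {
    private final Map<String, String> hashByName = new HashMap<>();

    // One normalisation used for both writing and reading the index.
    private static String normalize(String peerName) {
        return peerName.toLowerCase(Locale.ROOT);
    }

    void put(String peerName, String peerHash) {
        hashByName.put(normalize(peerName), peerHash);
    }

    String lookupByName(String peerName) {
        return hashByName.get(normalize(peerName));
    }

    public static void main(String[] args) {
        PeerNameIndexSketch index = new PeerNameIndexSketch();
        index.put("MyPeer", "peer-hash-1");
        System.out.println(index.lookupByName("mypeer"));
    }
}
```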
Michael Peter Christen
c6558cba08
more classification bugs
13 years ago
Michael Peter Christen
082831b9d6
search contentdom was checked in wrong way - fixed
13 years ago
reger
ee553d971e
correct typo in scripts_txt comment
13 years ago
Michael Peter Christen
f294f2e295
bugfix to http://bugs.yacy.net/view.php?id=181
tried to make a bit less 'noise' to dns server
also included: less processes in snippet fetch to reduce load during
search on small computers
13 years ago
Michael Peter Christen
acf8d521a2
fix for http://bugs.yacy.net/view.php?id=126
13 years ago
Michael Peter Christen
bb88878b4d
the last commit was incomplete..
13 years ago
Michael Peter Christen
d320a31ae1
bugfix for http://bugs.yacy.net/view.php?id=186
13 years ago
Michael Peter Christen
fa735f4f04
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
3e1bc9477f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
6f8a2fef1f
small speed enhancement using a column factory
13 years ago
Roland 'Quix0r' Haeder
d10627d591
More sync in close() methods
Conflicts:
source/net/yacy/kelondro/logging/GuiHandler.java
source/net/yacy/kelondro/workflow/InstantBusyThread.java
13 years ago
Roland 'Quix0r' Haeder
b3ae2aa41f
With or without 'final'? At least please try it in other methods
Conflicts:
source/de/anomic/tools/tarTools.java
13 years ago
Roland 'Quix0r' Haeder
fbb946f913
Made a method static (Eclipse suggested it), removed unused import, pk=null check does now output a warning in logfile
13 years ago
Michael Peter Christen
52d307c735
prevent that the snippet fetch process removes catchall entries
13 years ago
Michael Peter Christen
7eece0256f
moved yacy.logging to defaults according to request in
http://bugs.yacy.net/view.php?id=55
13 years ago
Michael Peter Christen
89142d1e8d
removed (not all) warnings
13 years ago
Michael Peter Christen
5deebd02ea
added serialization
13 years ago
reger
b2175ea4ef
Add possibility to set custom Solr field names for the YaCy default Solr attributes.
- Changing the format of YaCy's solr.key.list while maintaining backward compatibility
Federated index config screens adjusted accordingly
- modified the Solr update request to use a 3 min Solr autocommit interval
13 years ago
Michael Peter Christen
15db703808
added missing serialization to remove all warnings
13 years ago
Michael Peter Christen
1795a7325b
made HandleSet serializable
13 years ago
Michael Peter Christen
e7e381d110
added configuration to switch off redirection following in crawler
13 years ago
Michael Peter Christen
2717c1b749
fixed bug in solr interface
13 years ago
Michael Peter Christen
f150bc218b
fixed bug in solr error document
13 years ago
Michael Peter Christen
cb54c1737b
solrj connector bugfix
13 years ago
Roland 'Quix0r' Haeder
a093ccf5eb
Now used synchronization in all close() methods to make sure all objects
are 'closed' in an ordered way
Conflicts:
source/de/anomic/http/server/ChunkedInputStream.java
source/de/anomic/http/server/ChunkedOutputStream.java
source/de/anomic/http/server/ContentLengthInputStream.java
source/net/yacy/cora/protocol/Domains.java
source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
source/net/yacy/document/content/dao/PhpBB3Dao.java
source/net/yacy/document/parser/html/AbstractTransformer.java
source/net/yacy/kelondro/blob/BEncodedHeap.java
source/net/yacy/kelondro/blob/HeapReader.java
source/net/yacy/kelondro/index/RAMIndexCluster.java
source/net/yacy/kelondro/io/ByteCountInputStream.java
source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
source/net/yacy/kelondro/table/SQLTable.java
13 years ago
Michael Peter Christen
49cab2b85f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
0d58fea210
made multiple connector default
13 years ago
Michael Peter Christen
7740c02c56
- enhanced the solr connector
- added new multiple connector (to replace singleConnector)
13 years ago
Michael Peter Christen
0cf3d36eae
more tolerance in case of corrupted file
13 years ago
Michael Peter Christen
acc6db28ff
added missing classes for solr interface
13 years ago
Michael Peter Christen
adeb33bb36
better abstraction for solr objects
13 years ago
Michael Peter Christen
8864141872
more abstraction in solr connection classes
13 years ago
Michael Peter Christen
c00efc2717
made the solr connection more generic
13 years ago
Michael Peter Christen
ea2bd43b28
patch for broken configurations
13 years ago
Michael Peter Christen
e5ca7f22b1
enhancement in circle drawing
13 years ago
Michael Peter Christen
34f4225d7e
less 'wellformed' calls without asserts
13 years ago
Marc Nause
a691023d04
*) better formatting for network QPM
*) refactoring
13 years ago
Michael Peter Christen
77f8e9fb9b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
ba6aaabc51
refactoring + parser bugfixes
13 years ago
Michael Peter Christen
2a0434efa4
Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff'
13 years ago
Michael Peter Christen
942896fe46
removed methods not supported by new solrj connector for httpclient 4
Error was:
java.lang.UnsupportedOperationException: Client was created outside of HttpSolrServer
at org.apache.solr.client.solrj.impl.HttpSolrServer.setDefaultMaxConnectionsPerHost(HttpSolrServer.java:614)
at net.yacy.cora.services.federated.solr.SolrSingleConnector.<init>(SolrSingleConnector.java:128)
at net.yacy.cora.services.federated.solr.SolrShardingConnector.<init>(SolrShardingConnector.java:55)
at net.yacy.search.Switchboard.<init>(Switchboard.java:657)
at net.yacy.yacy.startup(yacy.java:222)
at net.yacy.yacy.main(yacy.java:1018)
13 years ago
Michael Peter Christen
22e1f68c0b
solrj user authentication patch
13 years ago
Michael Peter Christen
09484955dc
added new entry class for embed tags
13 years ago
Michael Peter Christen
62f2554a01
- fixed build problems (deprecated methods using httpclient 3.1)
- removed httpclient 3.1 lib which was used by solrj (solrj now uses
httpclient 4)
13 years ago
Michael Peter Christen
a6d60fc21f
concurrency enhancement in ConfigurationSet
13 years ago
Michael Peter Christen
453010bd68
- solved problems with backpath normalization
- redesigned in/outbound link handover
- removed iframe links from inbound/outbound in solr scheme
13 years ago
Michael Peter Christen
5f5ed33ed8
patch for media search (audio, video apps)
13 years ago
Michael Peter Christen
7860c1df80
fix needed for new solrj library
13 years ago
Michael Peter Christen
0e13022147
- enhanced solr field documentation
- added xml api button to IndexFederated_p - the solr schema.xml file
can be generated by YaCy
13 years ago
Michael Peter Christen
19efbf1b0f
- apply directDocByURL to NOLOAD Queue
- choose pushing to NOLOAD as default for site crawl
13 years ago
Michael Peter Christen
659178942f
- Redesigned crawler and parser to accept embedded links from the NOLOAD
queue and not from virtual documents generated by the parser.
- The parser now generates nice description texts for NOLOAD entries
which shall make it possible to find media content using the search
index and not using the media prefetch algorithm during search (which
was costly)
- Removed the media-search prefetch process from image search
13 years ago
Michael Peter Christen
a3badd3205
changed search process for images: no more media snippet load process,
show only links from index which had been on the text search page
before. This creates a superfast search process for images!
13 years ago
reger
c1f6b4fb52
lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown)
13 years ago
Michael Peter Christen
f8cd57c92f
new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can retrieve now ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
13 years ago
Michael Peter Christen
14f67f217c
refactoring of ContentDomain: now subclass of Classification
13 years ago
Michael Peter Christen
8a08c96a82
removed dependency from logging
13 years ago
Michael Peter Christen
a1a5b015d8
refactoring: moved document Classification to cora package
13 years ago
Michael Peter Christen
33d1062c79
refactoring: the cache belongs to the crawler
13 years ago
Michael Peter Christen
4d5da75814
fix for parser problem if an <a>-tag is 'within' html tags with unclosed
tags. That prevented the <a> tags from being recognized. This is a fix
for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516
13 years ago
Michael Peter Christen
91a86f0b06
fixed to network graph testing
13 years ago
Michael Peter Christen
7b5b9baee0
added citation rank to ranking profile
13 years ago
Michael Peter Christen
046f3a7e8d
check if httpc has decompressed the release file and rename the file
from .tar.gz to .tar if that happened
13 years ago
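The check described in 046f3a7e8d (has the HTTP client already decompressed the release file?) can be approximated by looking at the gzip magic bytes 0x1f 0x8b at the start of the file. The following sketch only illustrates that idea; the method names are hypothetical, and how YaCy actually detects the condition is not stated in the commit message.

```java
public class GzipCheckSketch {
    // A gzip stream always starts with the two magic bytes 0x1f 0x8b.
    static boolean looksGzipped(byte[] head) {
        return head != null && head.length >= 2
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b;
    }

    // Drops the ".gz" suffix when the content is evidently no longer compressed.
    static String correctedName(String fileName, byte[] head) {
        if (fileName.endsWith(".tar.gz") && !looksGzipped(head)) {
            return fileName.substring(0, fileName.length() - ".gz".length());
        }
        return fileName;
    }

    public static void main(String[] args) {
        byte[] plainTarStart = {(byte) 'y', (byte) 'a'};
        System.out.println(correctedName("release.tar.gz", plainTarStart));
    }
}
```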
Michael Christen
02e4dedff2
fix to url citation collection
13 years ago
Michael Christen
e32055aa15
added stub classes for
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
attributes set which shall replace the old metadata database if it is
finished
- migration help classes stub to use old and new metadata databases
simultaneously
13 years ago
Michael Christen
ac5d124ee0
experimental implementation of a citation ranking as post-ranking
method. (ranking coefficient fixed, need to be made configurable)
13 years ago
Michael Christen
8fc86fe397
added storage of full anchor link structure:
the links between all pages are now stored. The same index structure as
used for the word index is used to make a reverse link index.
The new file(s) in SEGMENT/default/citation.index.*.blob store the
citation index. This will be used to create much more detailed link
structures for the YaCy apis and to create a better ranking. A ranking
using the citation.index should provide better results especially for
portal indexes and intranets.
13 years ago
Lotus
0b3f39136e
allow custom ppm lower than minimum button on /Crawler_p.html
fixes http://bugs.yacy.net/view.php?id=166
13 years ago
Michael Peter Christen
532c7cf827
added physics experiment to the graph plotter. not active by default
13 years ago
Michael Peter Christen
aba9b1bfa0
better names for elements of a linked graph
13 years ago
Michael Peter Christen
2fc8ecee36
ConcurrentLinkedQueue has a VERY long return time on the .size() method.
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html
and the following test program:
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {
    // adds c elements and calls size() before every add; returns the elapsed milliseconds
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }
    public static void main(String[] args) {
        int c = 1;
        // 16 doublings; the original bound of 100 would overflow the int counter
        for (int i = 0; i < 16; i++) {
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);
            System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1
                    + ", LinkedBlockingQueue = " + t2
                    + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}
13 years ago
Michael Peter Christen
8aba045ba1
if a new pop-up page is set in config portal, then this page applies
also to the default page configuration for the httpd if no path is
given.
13 years ago
Michael Peter Christen
8c06925984
animation of the web structure picture
13 years ago
Michael Peter Christen
898fa7c3f3
use tld heuristic to check if a domain is local or global
13 years ago
Michael Peter Christen
213c8d97f2
use less processes in process pool
13 years ago
Michael Peter Christen
c639248c23
protection against strange answers from remote peers during search
13 years ago
Michael Peter Christen
36e4d82b27
changed ranking
13 years ago
Michael Peter Christen
096c17e7cd
added test code
13 years ago
Michael Peter Christen
665626a51b
catch OOM errors during scanning
13 years ago
Michael Peter Christen
1cd711d005
added classes for citation references (for new citation ranking)
13 years ago
Michael Peter Christen
33a405dab8
ipv6 bugfix
13 years ago
Michael Peter Christen
c6c61be3f0
fix for http://bugs.yacy.net/view.php?id=148
13 years ago
Michael Peter Christen
e0f1e7d904
added new citation reference data structure that shall be used for a
citation ranking
13 years ago
Michael Peter Christen
e18a4f6b74
more tolerant merge iterator
13 years ago
Michael Peter Christen
e101c2e0e2
added changes from copperdust (submitted by email):
1. Improved and fixed language detection:
1.1 Identificator.java - recognition fix (improved)
1.2 DCEntry.java - fix (changed detection order due to detection from
tld in many cases is incorrect)
1.3 MultiProtocolURI.java - fixed and enhanced language from tld
detection (all currently used top-level domains; ccTLD added but not
tested).
2. Ukrainian language update.
3. Main Slavic languages langstats (tested and works fine).
13 years ago
Michael Peter Christen
8d63a5887c
bugfixes
13 years ago
Michael Peter Christen
9ad1d8dde2
complete redesign of crawl queue monitoring: do not look at a
ready-prepared crawl list but at the stacks of the domains that are
stored for balanced crawling. This affects also the balancer since that
does not need to prepare the pre-selected crawl list for monitoring. As
an effect:
- it is no longer possible to see the correct order of next to-be-crawled
links, since that depends on the actual state of the balancer stack the
next time another url is requested for loading
- the balancer works better since the next url can be selected according
to the current situation and not according to a pre-selected order.
13 years ago
Michael Peter Christen
7e4e3fe5b6
free some memory after parsing html
13 years ago
Michael Peter Christen
4540174fe0
memory hacks
13 years ago
Michael Peter Christen
b4409cc803
small redesign of blob column index and usage
13 years ago
Michael Peter Christen
d5c1f2746e
performance hack
13 years ago
Michael Peter Christen
803963aebd
performance hack: better space grow in CharBuffer (speeds up html
parser)
13 years ago
Michael Peter Christen
8b0920b0b5
tried to fix the ipv6 problem as reported in bug
but this did not solve all problems because a bug in the apache http
client prevented that it worked. Thread dump:
Caused by: java.lang.NumberFormatException: For input string: "1450:400c:c01:0:0:0:69"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.http.client.utils.URIUtils.extractHost(URIUtils.java:310)
at org.apache.http.impl.client.AbstractHttpClient.determineTarget(AbstractHttpClient.java:764)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at net.yacy.cora.protocol.http.HTTPClient.execute(HTTPClient.java:597)
at net.yacy.cora.protocol.http.HTTPClient.getContentBytes(HTTPClient.java:558)
at net.yacy.cora.protocol.http.HTTPClient.GETbytes(HTTPClient.java:341)
at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:131)
at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:74)
at net.yacy.repository.LoaderDispatcher.loadInternal(LoaderDispatcher.java:274)
at net.yacy.repository.LoaderDispatcher.load(LoaderDispatcher.java:164)
at net.yacy.repository.LoaderDispatcher.load(LoaderDispatcher.java:150)
at net.yacy.repository.LoaderDispatcher.loadDocument(LoaderDispatcher.java:355)
at getpageinfo_p.respond(getpageinfo_p.java:97)
13 years ago
Michael Peter Christen
e2f8f263e8
changed storage of search words: keep order
13 years ago
Michael Peter Christen
ed39ef2890
changed generation of protocol information
13 years ago
Michael Peter Christen
0b67a0a5d8
added a column index for tables in blob files. This is heavily used
during receiving of DHT submissions and when answering remote search
requests. Both events together may have caused IO-deadlocking and this
commit shall fix that.
13 years ago
Michael Peter Christen
2e5cd6a1b2
fixed parser extension deny list generation and usage
13 years ago
Michael Peter Christen
8bee1472c9
there is no noindex, only nofollow in links
13 years ago
Michael Peter Christen
3cd6dcd352
do not add new solr fields as activated fields
13 years ago
Michael Peter Christen
e3bb73c3d6
serialized some database access methods
13 years ago
Michael Peter Christen
7e728867e5
added a synchronization around iterations to prevent IO-deadlocking
during concurrent remote search requests
13 years ago
Michael Peter Christen
355ecf330f
reduced target file size to 64mb
13 years ago
Michael Peter Christen
10ae6d94a1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
2ea585d616
fix for host navigator
13 years ago
Michael Peter Christen
2f6dde92e2
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
c560a582ac
fix for single-word vocabulary lines
13 years ago
Michael Peter Christen
4c5edab1ec
added option to have exception search result windows
13 years ago
Michael Peter Christen
046d7de95b
Merge remote branch 'reger/master'
13 years ago
reger
a95f645a61
Bugfix class repository.Loaddispatcher fixed download file limit of 10000
line 355: final Response response = this.load(request, cachePolicy, 10000, true);
13 years ago
Michael Peter Christen
ef78f22ee1
performance hack
13 years ago
Michael Peter Christen
41536eb4a2
performance hack
13 years ago
Michael Peter Christen
f91487fc50
added delete-button for host navigation
13 years ago
Michael Peter Christen
e8d24fd802
author navigator can be switched off
13 years ago
Michael Peter Christen
558ab7bd4e
made the protocol navigator reversible
13 years ago
Michael Peter Christen
96cb75f1d4
made the filetype navigator be able to deselect the search constraint
13 years ago
Michael Peter Christen
1f4f60654a
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/document/parser/pdfParser.java
13 years ago
reger
32104360ce
PDFParser - return at least first 3 pages of PDF
fix for pdf parsing without returning parsed text due to interruption by
time out.
13 years ago
Michael Peter Christen
ef5192f8c9
using the generic document parser for crawl starts instead of the html
parser. This makes it possible that every type of document can be a
crawl start point, not only text documents or html documents. Tested
this with a pdf document.
13 years ago
Michael Peter Christen
a02fdf8625
better error messages
13 years ago
Michael Peter Christen
eadb58dd87
small enhancements in pdf parser
13 years ago
Michael Peter Christen
c6ba44468e
timeout = 5000 instead 3000
13 years ago
reger
b616de5973
PDFParser - return at least first 3 pages of PDF
fix for pdf parsing without returning parsed text due to interruption by time out.
13 years ago
Lotus
c73af39e54
refactoring of tray icon class,
now uses Java 6 methods natively
13 years ago
Michael Peter Christen
4eff0e26f1
npe bugfix
13 years ago
low012
8776b84c10
*) small fix to make password change function of reconfigureYACY.sh work
again
13 years ago
Michael Peter Christen
1a0b6b3913
get more navigation details to search results
13 years ago
Michael Peter Christen
7f9b6b7a0c
added switches to ConfigParser to accept/deny documents by their
extension
13 years ago
Michael Peter Christen
4901cee3cc
suppress auto-tagged subject entries when sending out or receiving
metadata from other peers
13 years ago
Michael Peter Christen
83009d86f7
added the vocabulary navigator. It can be very simply tested by
switching on the locale dictionaries.
13 years ago
sixcooler
985b78cf89
correct 'avaiable()' to use max of young / eden
13 years ago
sixcooler
4da8746275
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
sixcooler
c9aaa9e00a
respect non-reserved Memory in GenerationMemoryStrategy
and enable it again
13 years ago
Michael Peter Christen
37f2d1b3e9
replaced Thread initialization with ExecutorService pool for delete
method. This is much faster and produces less blocking when using the
Compressor class which is used by the HTCache. I.e. picture search is
much faster now.
13 years ago
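Commit 37f2d1b3e9 replaces one-thread-per-call with a shared ExecutorService for delete operations. A minimal sketch of that approach, assuming a hypothetical in-memory store in place of the Compressor class and an arbitrarily chosen pool size of two:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledDeleteSketch {
    // Shared pool: worker threads are re-used instead of being created per delete call.
    // Daemon threads are an assumption here so that the pool never blocks JVM shutdown.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "delete-worker");
        t.setDaemon(true);
        return t;
    });

    // Queues the delete and returns immediately; the Future reports whether the key existed.
    static Future<Boolean> deleteAsync(ConcurrentMap<String, byte[]> store, String key) {
        return POOL.submit(() -> store.remove(key) != null);
    }

    public static void main(String[] args) throws Exception {
        ConcurrentMap<String, byte[]> store = new ConcurrentHashMap<>();
        store.put("cached-page", new byte[] {1, 2, 3});
        System.out.println(deleteAsync(store, "cached-page").get());
    }
}
```

The caller does not have to wait on the Future; it is returned only so that a result can be observed when needed.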
Michael Peter Christen
a58dc4a91f
added autotagging to document condenser:
- tags that are automatically generated now enrich the dc:subject
- auto-generated tags have a '$' at the beginning of the tag
- auto-generated tags lead the tag name with a vocabulary name
each tag has the form
$<vocabulary-name>:<tag-printname-space-replaced-by-'_'>
13 years ago
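The tag form $<vocabulary-name>:<tag-printname-space-replaced-by-'_'> given in a58dc4a91f can be produced by a one-line helper. The method and class names below are hypothetical and the sketch only mirrors the format stated in the commit message:

```java
public class AutoTagSketch {
    // Builds "$<vocabulary-name>:<print name with spaces replaced by '_'>".
    static String autoTag(String vocabularyName, String printName) {
        return "$" + vocabularyName + ":" + printName.replace(' ', '_');
    }

    public static void main(String[] args) {
        System.out.println(autoTag("geo", "New York"));
    }
}
```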
Michael Peter Christen
0d6176804b
emergency disabling of GenerationMemoryStrategy because of non-working
available-method
13 years ago
Lotus
411aab02e3
Windows installer now detects reliably whether YaCy runs. A file lock on
the yacy.running file has been implemented.
13 years ago
Michael Peter Christen
87f0210480
enriched log output to find NPE in HeapReader
13 years ago
Michael Peter Christen
987b412491
updated solr scheme: generic declaration of solr schemes
13 years ago
Michael Peter Christen
254adea51c
small fixes
13 years ago
Michael Peter Christen
49be60a7c8
WorkflowProcess is forced to make small pauses if shortMemoryStatus is
reached.
13 years ago
Michael Peter Christen
b7bb84c0bb
set a limit to CharBuffer object size to fight against bad/too large
content
13 years ago
Michael Peter Christen
c602eaaf46
enhanced search process
13 years ago
Michael Peter Christen
087f97d4c0
less noise if a browser cannot be opened
13 years ago
Michael Christen
eff966f396
fix for search process (it was aborted too early during remote search)
13 years ago
Michael Christen
e6d51363ee
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Marek Otahal
a231d0eeb9
Run from Java the whole app YACY
start for java webStart
allow for better integration with IDE
Conflicts:
source/net/yacy/gui/framework/Browser.java
13 years ago
Marek Otahal
72adbeae90
!Important: move from Hashtable to HashMap
Hashtable is an obsolete collection v1, now since v2 offers HashMap with same or better
functionality. Please review, almost all code was already moved, so only a few changes. That is not the issue,
but I found notices that some (ugly big) helper classes had to be created in past
to compensate missing Hashtable's functionality. I'd like input if we can remove some of them.
look for //FIX: if these commits
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Marek Otahal
f40efb39af
Blacklist loadList() remove duplicates by using Set
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
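Removing duplicates with a Set, as in f40efb39af, takes only a few lines. The sketch below assumes the list entries are plain strings and uses a LinkedHashSet to keep the file order; whether the real loadList() preserves order, and the names used here, are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BlacklistLoadSketch {
    // Keeps the first occurrence of each non-empty line and preserves the original order.
    static List<String> loadUnique(List<String> lines) {
        Set<String> unique = new LinkedHashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed); // add() ignores entries that are already present
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        System.out.println(loadUnique(Arrays.asList("a.example/.*", "b.example/.*", "a.example/.*")));
    }
}
```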
Marek Otahal
f75b5e40e0
little fix in copy()
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Marek Otahal
1dc5d9f0f3
make ConnectionInfo comparable and sort list of connections in Connections_p
ConnectionInfo compare by initTime
Connections_p implement wish to sort connections, descending
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
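Commit 1dc5d9f0f3 makes ConnectionInfo comparable by initTime and sorts the connection list in descending order. A sketch of that idea with a hypothetical stand-in class; the field names other than initTime are invented for illustration:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class ConnectionInfoSketch implements Comparable<ConnectionInfoSketch> {
    final String target;  // hypothetical field, for display only
    final long initTime;  // creation time of the connection in milliseconds

    ConnectionInfoSketch(String target, long initTime) {
        this.target = target;
        this.initTime = initTime;
    }

    // Natural order: older connections first.
    @Override
    public int compareTo(ConnectionInfoSketch other) {
        return Long.compare(this.initTime, other.initTime);
    }

    // Descending order (newest first), as wanted for the connections page.
    static List<ConnectionInfoSketch> newestFirst(Collection<ConnectionInfoSketch> connections) {
        List<ConnectionInfoSketch> sorted = new ArrayList<>(connections);
        sorted.sort(Collections.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        List<ConnectionInfoSketch> all = new ArrayList<>();
        all.add(new ConnectionInfoSketch("peer-a", 10L));
        all.add(new ConnectionInfoSketch("peer-b", 30L));
        all.add(new ConnectionInfoSketch("peer-c", 20L));
        for (ConnectionInfoSketch c : newestFirst(all)) {
            System.out.println(c.target + " " + c.initTime);
        }
    }
}
```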
Michael Christen
fa8da7f89d
vocabularies are now also used as source for a did-you-mean computation
13 years ago
Michael Christen
eaec14ecc4
Dictionaries from words caches can now be used as autotagging vocabulary
13 years ago
Michael Peter Christen
91940fdf56
redesign of WordCache to be prepared to hold multiple
...
independent dictionaries. Such dictionaries can then be also used as
simplified vocabularies.
13 years ago
Michael Christen
bd40a10230
added autotagging stub .. only reading and parsing of vocabularies at
this time
13 years ago
Michael Peter Christen
2ee8cbeb2c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/search/Switchboard.java
13 years ago
Michael Peter Christen
992dbdf4bb
added noload statistic to servlets
13 years ago
Michael Christen
eebc02f5c1
fix
13 years ago
Michael Christen
216a287a85
Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
Conflicts:
source/de/anomic/crawler/CrawlQueues.java
13 years ago
stbrumm
d18095dc48
Patch for Issue 0000102
and fixes to Patch (private peer status is a property of a peer, not a
status)
13 years ago
stbrumm
9f1b1b4604
Type for Robinson-Mode/Private Peer added
13 years ago
Michael Christen
20962a4ed7
added metadata node stub for metadata from blobs
13 years ago
Michael Christen
575dbbaa93
enhancements in Blob retrieval: try to use less CPU resources by testing
a blob first that most certainly has wanted entries.
13 years ago
Michael Christen
585a8f3c44
fixed a bug in search sequence (caused empty results)
13 years ago
Michael Christen
361146dd7a
better error handling for file loader
13 years ago
Roland 'Quix0r' Haeder
6d4e08ed06
Rewrote filesize() to (hopefully) avoid a NPE, rewrote Blacklist class to concurrent classes to avoid a CME
13 years ago
Roland 'Quix0r' Haeder
fa08ed5ae5
Fixed a lot CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check
13 years ago
Roland Haeder
319fd1f4aa
A concurrent access can happen on the blacklist (with latest introduced blacklist check in media snippet computation)
13 years ago
Roland 'Quix0r' Haeder
a3083d13bf
Blacklist checks are now always turned on, in media searches (e.g. image search) images matching blacklist entries are no longer shown to the user
13 years ago
Michael Christen
52184a1170
fix for search process
13 years ago
Michael Christen
85bd4cc8bc
better lookup for peer names
13 years ago
Michael Christen
20e3084bd4
redesign of finding of peers by ip: more lightweight method to read the
seed databases
13 years ago
Michael Christen
0797b0de99
new handling of remote search processes: looking for seeds will now not
block the whole search process any more. A deadlock with a DHT selection
process may have been the cause for interface lockings in the past.
13 years ago
Michael Christen
ee9aae5cc0
more about CreativeCommons license vocabulary
13 years ago
Michael Christen
ecd74fe34f
less dramatic upnp failures
13 years ago
Michael Christen
c75e1a3125
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Christen
13f5b5f80d
the component part in the YaCy Metadata is filled using the Dublin Core
...
vocabulary
13 years ago
Michael Peter Christen
8d2cbfb685
more vocabularies and more semantics for lod data structures
13 years ago
Michael Christen
9cd36b4c44
added vocabulary for geolocalization as used in georss
13 years ago
Michael Christen
9e5894c784
Removed handling of components objects for URIMetadataRows.
...
This is a preparation to replace this rows with nodes from the node
store.
13 years ago
Michael Christen
66ab51f89d
added rdf vocabulary
13 years ago
Michael Christen
c04bfaa51b
refactoring
13 years ago
Michael Peter Christen
136b514f52
added a Triple Store based on Nodes that fit to the new storage classes.
...
Added also a first Vocabulary for the node store - Dublin Core.
13 years ago
Michael Peter Christen
613ab6a69d
added BEncodedHeapBag and BEncodedHeapShard which are storage container
...
for a new metadata store. An abstraction of the content for this storage
is defined with MapStore. A MapStore is an abstraction of a RDF Node
store.
13 years ago
Michael Christen
6fecd0db88
one more performance hack to prevent costly md5 computation
13 years ago
Michael Christen
e13441b069
better digest pool size (smaller by default but unlimited)
13 years ago
Michael Christen
1f4afb4dc0
performance hacks
13 years ago
Michael Christen
675d557e88
removed debug logging
13 years ago
Michael Christen
e9dc99fe15
added rules to set specific RWIs as private RWIs which are not
...
transmitted to remote peers. This will be used for private index copies
and phonetic indexes.
13 years ago
Michael Peter Christen
4243ace863
added phonetic classes
13 years ago
Michael Peter Christen
0bcef2d156
added feature as requested in
...
http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461
The search can now be configured with a non-display host list.
the search will always exclude the given list of hosts unless they are
requested directly using the host navigation
13 years ago
Michael Christen
204c29f010
small bugfixes for search result display and cache display
13 years ago
Michael Christen
17f962fceb
translator updates:
...
- config string for chinese
- do not copy the language file to DATA/LOCALE any more (and do not use
them there, this is really confusing for new translators)
13 years ago
Michael Christen
078fcde0dd
bad initialization
13 years ago
Michael Christen
14e45e90fd
patch for a bug that I don't understand by now.
13 years ago
Michael Christen
3eccdca63c
protection against too long running snippet fetch processes
13 years ago
Michael Christen
86b3385847
fixed a deadlock during secondary remote search
13 years ago
Michael Christen
c715d19c09
fixes for dependency on svn
13 years ago
Michael Christen
404758698a
less io operations
13 years ago
Michael Christen
0bc5d76bee
ups
13 years ago
Michael Christen
044f83feed
added some pauses into the search process which shall produce
...
better-ranked search results. without those pauses the result page will
only contain links from the peer that answers first which is not a good
average picture of all the peers that provided results
13 years ago
Michael Christen
943b670738
less terrible warning if uPnP fails
13 years ago
sixcooler
448656087a
probable fix for http://bugs.yacy.net/view.php?id=94
...
(don't know how to force this exception)
13 years ago
Michael Christen
f14faf503b
better ranking because we wait a very little time during the search
...
process more to get better remote search results into the ranking priority
stack
13 years ago
Michael Christen
762e0ecfb6
fixed localization dictionaries, see
...
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3418&view=next
13 years ago
Michael Christen
d35bdc2df6
removed npe
13 years ago
Michael Christen
e7e429705a
- less automatic indexing after a search (needs to reset the default
...
crawl profiles)
- fix for concurrency problem in storage of serverSwitch Properties
- markup update
13 years ago
Michael Christen
9cd469e6d6
added pull request from als plus an NPE fix
13 years ago
admin
484c4ad339
Merge branch 'master' of git://github.com/f1ori/yacy
13 years ago
orbiter
402e9d71ef
changed ordering of release files: main criteria is not the svn any more; releases are now ordered by
...
- release number
- date
- svn number
additionally there is a new option to remove the svn number completely
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8135 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
admin
56ce8488e4
Merge branch 'master' of git://github.com/f1ori/yacy
13 years ago
orbiter
4b8ff84705
- search bugfixes (page counter and number of results per page; recognition of new search)
...
- experiments to speed-up the network image production (commented out)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8130 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
aeeae75b8a
the timeout of httpclient is not absolute, but till a connection is
...
established or between bytes sent
trying this to reduce count of client-connections to /yacy/search.html
of other peers
13 years ago
hermens
2ac272cfbf
Fix for PeerSelection.seedsByAge() for big networks (>1000 Peers)
...
To get the most(least) recent peers search those with highest(lowest) LastSeen instead of the first by peerhash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8129 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
83335c3b09
fix for http://bugs.yacy.net/view.php?id=78
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8127 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
Al Sutton
39898cb94a
Added try/finally protection to ensure streams are closed. Added initial size guess for the CharBuffer
13 years ago
Al Sutton
4c67a964a1
Added try/finally protection to ensure streams are closed. Added initial size guess for the CharBuffer
13 years ago
Al Sutton
3f9b9f953f
Added close() to ensure buffer close actions are invoked
13 years ago
Al Sutton
d73c84f9a0
Allow initial buffer size definition in TransformWriter, and use available() method to set it in htmlParser. In this situation a ByteArrayInputStream is used so the available() method gives a good size estimation and avoids the buffer needing to be continually grown
13 years ago
Al Sutton
f02ea27b31
Added missing closure of ByteArrayInputStream
13 years ago
orbiter
0796b54601
- some speed hacks for network image
...
- panic patch for 'AD' hashes until it is clear where the problem comes from
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8126 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
f9216e388c
- faster ping to clean up old peers faster
...
- clean up more news
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8125 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
35a9e8f307
- fixed network graphic
...
- debugged evaluation tables
- changed cache settings in template engine
- some speed hacks
- changed int angles for peer positions in network graphic to double angles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8124 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
Al Sutton
8993cac4d8
Initial performance improvements
13 years ago
orbiter
d9c066227a
fix for npe
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8122 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
8895d8c1cd
removed unnecessary log entries
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8117 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
550c881d80
remove more news (all older than one day) because they can be a performance problem if we have too many peers sending news
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8112 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
ebd840ebf6
- enhanced description on search front page
...
- fixed language and heuristic modifier
- added hint to crawl start that we can do also ftp and smb crawls
- added a protocol extension to remote crawls to transport all search modifiers to remote peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8108 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e22f8497c9
- tested the ARC methods
...
- removed strict authentication (if password is empty; this was buggy and not useful; can be switched on if necessary globally and not for each interface method)
- increased speed of CrawlResults page (no dns lookup any more)
- increased speed of favicon display (removed dns lookup)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8104 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
bc5df0eef5
updated ranking tables (fresh computation)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8103 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5a55397f99
some last-minute performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c9216d5adf
fixed secondary remote search (the process that finds distributed join situations)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8098 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
64fd20b857
new default ranking profile
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8097 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0cf9ebc3b0
speed enhancements when parsing RWI rows (makes search slightly faster)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8096 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c9a0dbd25a
added a security check
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8094 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
ee8b1d4de1
fixed unresolved pattern and unwanted local/global switch when using votes on search results
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8093 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c584db991f
creating a bookmark from the search results now works again .. with new YMarks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8092 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1120f0c93c
update to network graphics: slightly less crawling activity, slightly stronger color for query activity
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8089 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
6cd27473f5
- better default values for caching and cache usage
...
- set new caching and verification behavior according to use case automatically
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8087 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
709013385a
fix for language fix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8086 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1019c36dad
bug fixes and speed enhancements for search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8085 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
507c9d478d
much better timing when searching globally; less blocking; more results earlier!
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8084 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
8e0b2c5832
fixed cluster search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8083 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c0c6e9e7a5
fix for bad language encoding
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8082 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
564374d1fe
- included YMarks in addition to old bookmarks in yacysearchitem.html; don't get confused by the old bookmark dialog, the ymark is automatically added silently beforehand.
...
- reworked bookmark creation on crawlstart
- many smaller adjustments to ymarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8072 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
05f34a3fa7
added a full, complete, database insert, update and delete API for the tables.
...
Please see this example:
list all database tables:
http://localhost:8090/api/table_p.xml
now create a new table and insert some values into 'mytable'
http://localhost:8090/api/table_p.xml?table=mytable&pk=&commitrow=&col_termin=Release%20Machen&col_datum=24.11.2011&col_status=ongoing
list the table content:
http://localhost:8090/api/table_p.xml?table=mytable&pk=
update the table and change a single value inside. You must refer to the row using a primary key 'pk'
http://localhost:8090/api/table_p.xml?table=mytable&pk=000000000001&commitrow=&col_datum=29.11.2011
you can also select rows using a search operator
http://localhost:8090/api/table_p.xml?table=mytable&pk=&count=10&search=
now let's delete the row:
http://localhost:8090/api/table_p.xml?table=mytable&pk=&deleterows=pk_000000000001
and we can also delete the complete table:
http://localhost:8090/api/table_p.xml?table=mytable&deletetable=
You can use this to administrate the robots, bookmarks and API steering using an outside application!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8071 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
lotus
3cc93325f0
temporarily remove compare search from tray
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8070 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
804e48888b
smaller bug fixes for search behavior; should produce less unnecessary removals and an exact number of results as shown in counter
...
should also be a little bit faster
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8057 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
84c3fc9d97
local/global fixes in search, better abstraction
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8054 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
06352b8d6b
more logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8047 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
017a01714d
- enhanced logging in robots.txt parser for remote debugging
...
- robots.txt is now more robust against database operations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8043 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
3a15e58e28
- increased stability when opening the robots table
...
- increased stability when deleting tables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8034 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
775b44017e
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8033 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e914a30099
fix for npe
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8032 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
78ce3b13be
typo
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8027 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
85d6bf4ac4
fixed urls to media content during indexing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8021 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0d858d48ec
replaced String with StringBuilder in suggestion process
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8020 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
3a807e10cf
- added a cache for active crawl profiles to the crawl switchboard
...
- moved the domain cache for domain counter from the crawl switchboard to the crawl profiles. the crawl domain counter is now therefore relative for each crawl start, not for the whole crawler.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8018 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
37e35f2741
normalization of url using urlencoding/decoding
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8017 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e58438c01c
- added a new retry connector for solr (for cases where solr responses are slow)
...
- added a new exist property into the metadataRepository which includes solr entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8016 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
d8d9735b4f
stability bugfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8012 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c31564ef08
stability bugfixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8011 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
f121f4bb45
fix for link in Supporter and Suftipps page
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8010 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
94eab08794
- updated opensearchdescription text and icon
...
- removed automatic setting of maxitems during search (can be set now elsewhere)
- updated RSSMessage.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8009 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
279482a76d
fix for npe
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8007 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
1b86d06d1e
fix for http://bugs.yacy.net/view.php?id=62
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8004 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
eb9c9edb01
enhanced table method (used by almost all yacy api interfaces)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8000 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
4ad9fc2bff
new snippet strategy for search hits in metadata: show beginning of text instead of hit position
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7999 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a9838f8b99
fix for http://bugs.yacy.net/view.php?id=59
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7997 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
hermens
d3df03838a
make sure myself-target is always inserted at its appropriate position
...
this was previously omitted if the own peer should have been the first target
or the peer was the last peer before the rotation to AAAAAAAAAAAA
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7996 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
hermens
c3e7efa846
added sender side prevention of rwi flooding as mentioned in SVN 7993
...
saves memory and speeds up enqueueContainers by limiting the size of transfer.Chunk
saves network bandwidth by not transmitting RWIs that would get discarded at the target anyway
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7995 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
5af9598bd1
enhanced exported row parsing during row import
...
this affects the search and dht receive speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7994 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
7598a9e26b
fix for thread dump
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7992 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
8eef8722d1
update to ThreadDump analysis: freerunner and thread state recognition
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7990 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1df43b137d
another performance hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7989 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
7df0643f0e
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7988 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a7df70221e
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1b45e33f04
added robots tag parser to solr scheme
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7986 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
cf4fd525ee
added directDocByURL attribute in crawl profile
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
813f297a95
another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7983 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
035ebfbf3b
- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
...
- this may have also (good) performance side effects on other parts of YaCy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b250e6466d
implemented crawl restrictions for IP pattern and country lists
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7980 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
57d5529a01
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7977 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
47a8c69745
added a new feature to MultiProtocolURIs to get the locale for each url:
...
This is done using a new library InetAddressLocator.jar which is NOT added by default to YaCy because it is very old and with that library we will never get a debian package. However, some people want that functionality and it can be made available if the library is taken from http://javainetlocator.sourceforge.net/ and placed into the /lib directory where it will be found using reflection.
The new feature will be used to extend the crawler steering.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7975 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2c3161b4ac
refactoring:
...
RankingProcess -> RWIProcess
ResultFetcher -> SnippetProcess
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d2ea250d99
refactoring:
...
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
277b454a62
*) added comments
...
*) minor refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7971 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6b22865dbc
- removed some warnings
...
- removed a dead update location
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0c6d95e57b
- more tolerance against failure of table opening
...
- more connections for solrj
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7968 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6b02b696b0
- add number of search results to end of rss and json output to reflect latest status of retrieval
...
- distinguish search access with different verify state in access of search cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7965 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
ce2a76d603
performance hack for search process
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7961 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
aaf7a0feaa
yet another cache strategy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7959 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
8a428d3e77
ensure termination of pdf parser to avoid deadlocking of other processes during search result preparation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7958 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2c4a672fe2
bugfixes and performance hacks for table index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7957 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
dad5b586a4
added a concurrent warming-up of Table data structures. that should speed-up the start-up process but may also cause stronger CPU load at that time.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7956 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
734059d33e
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7955 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
23e81b28b2
synchronization enhancements
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7954 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
dd4635e323
patches
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7953 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
85a5487d6d
YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7943 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0819e1d397
protection against OOM cases in image parser. See also bugs.yacy.net/view.php?id=54
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7942 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2cba860693
- fix for wrong entries in NOLOAD indexing queue (which caused urls to be indexed based only on their url without being loaded)
...
- patch for better urls to solr admin interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7938 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2842ce30d6
added synchronization in ReferenceContainer and logging for shrinking
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7937 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
cec3836e73
added reference limitation to IndexControlRWIs_p.html servlet
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7936 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
ecb4986b38
refactored stuff from last commit to ReferenceContainer
...
see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3353&p=23163#p23163
the limiting of references is disabled per default
to enable this set yacy.conf - index.maxReferences to a value of e.g. 100000
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7935 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
f7c4abfdd7
limit references per blob & term to the 100.000 youngest
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7934 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
28f5b79deb
added a fast mass-deletion method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7933 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a70dbce41c
added another file tool class to yacy-cora
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7932 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
49e5ca579f
added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e02bfbde56
fix for solr url
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7930 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
580beb12a5
reverting SVN 7863; the synchronization was needed and no synchronization causes repeated DNS lookup for the same hosts
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7928 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
44d6416e2d
ensure termination of shrink()
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7927 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
52230a6864
replaced catching of Exception with Throwable, which catches also Errors
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7926 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
877eaf6bcb
switched off logging of org.apache.http which was suddenly switched on by default (??)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7925 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e1a3d609aa
moved merger object from Segment to IndexCell to enable a correct shutdown sequence. This solves a bug where yacy cannot be shut down during an index merge that appears during the shutdown phase.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7924 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
610b01e1c3
- added an 'add every media object linked in a html document as a new document' option to the html parser. This causes every image, app, video or audio file that is linked in a html file to be added as a document. In fact that means that parsing a single html document may cause a number of documents to be inserted into the search index.
...
- some refactoring for mime type discovery
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7919 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3da21c4266
protection against starting of a (second) yacy peer while another one is already running on the same port
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7917 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b5252ef91f
added new word recommendation library in DictionaryLoader_p.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7913 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1c007188ad
bugfixes in html parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
231074bf0a
fixed a parsing bug by reverting SVN 7766
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7910 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
30a8a2f76b
*) replacing one ugly hack with an extended ugly hack ;-)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7908 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
95379ce0b1
*) should fix some problems with RSS Importer (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3253 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7907 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
24e76a7b69
*) Replaced occurrences of "Wikimedia" with "MediaWiki" where applicable. (Thanks to the folks of 0x20.be for pointing this out.)
...
*) Added description of where to place MediaWiki dump for import.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7905 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
d40a177c05
Generation Memory Strategy fine tuning
...
add some log-output in termlist_p
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7904 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
839f407fe4
Generation Memory Strategy fine tuning:
...
- some more optimism on requests of unknown values
- avoid a premature value of 0 byte available
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7903 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a5541751a8
- added memory computation to termlist_p.xml
...
- added option to delete terms in termlist_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7901 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
45e497a9bd
fix for term iteration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7900 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
5dd2efc9a2
- bugfixes in html parser
...
- new fields in solr
- extended file viewer to debug parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2c595a6a47
added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7896 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
75df87832c
refactoring/better naming of methods and classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7895 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
5f8a5ca32d
- not doing merge-jobs while short on Memory
...
- using configuration-values of crawling-max-filesize also for snippetfetching and loading files into Index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7893 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
965fabfb87
enhanced sorting speed (affects all DB operations)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7892 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
41a8ee4569
added iterable implementation in KeyList
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7891 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
22d69a6368
refactoring in cora: added sorting package
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7890 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
51cf697acd
refactoring: moved all score-related classes to new ranking package
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a0d5e7b6e6
added new score comparator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7888 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
4fec99115b
Implementation of strategies for controlling memory resources.
...
You can toggle between the previous (standard) and the new (generation) strategy at PerformanceMemory_p.html.
The generation memory strategy is implemented with the objective of running more robustly,
at the cost of stopping some tasks (e.g. DHT) early while running low on memory.
This new strategy respects the generational way a heap is organized on most commonly used JVMs.
These changes have run fine on my 3 peers for weeks now, but as I'm human, I may fail.
Please be careful when using the generation memory strategy and report errors, naming
OS, JVM and java_args.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
63a375b801
do not look at external DTDs, because this makes the reader hang forever(?) on faulty DTD locations
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7885 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2c58af6874
- added a short memory status simulation mode
...
- added a button in PerformanceMemory_p.html to set the simulated short memory status
- bugfix: added a missing lowercase in KeyList
- better concurrency in loader dispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7883 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
c64faf41e2
addon to svn 7880
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7882 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
7b7a196243
ignore cookies in httpclient per default
...
disable cookiestore, because the default one caused segfaults on my peers
this does not harm use of cookies via YaCy as proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7881 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
411ed159f8
do some extra sleep while running low on memory
...
(1 sec. per outofmemoryCycle)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7879 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
9ab0ba41e2
using GzipDecompressingEntity from httpclient instead of our own
...
(was just fixed there in httpclient-4.1.2 and does a proper job)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7877 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
07f5954570
try better handling of corrupt blobs
...
@developer: please revert if I'm wrong
see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=3334
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7872 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f970670a7c
- bugfix in ServerScannerList
...
- speed up of generation of scanner list avoiding forced dns lookup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7871 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
8e03b8ee8b
better integration of server list in interactive search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7870 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0a3ab7da1b
do not sort the same array concurrently
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7868 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
eb14111200
encapsulate potentially expensive objects in TextSnippet to allow GC to collect them asap
...
this reduces chance of OOMs at massive search & snippet-fetching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7865 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0d33cf352b
removed synchronization in DNS resolve (this solves a problem when loading snippets; in the past, concurrent DNS requests also caused deadlocks, but that was many years ago and we will give it another try)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7863 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
44d74f8f89
performance hacks for seed generation (because thread dumps showed multiple occurrences at these code points)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7861 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
5cd07d7f84
early freeing resources on deleting index reference if search-verification fails (aka Switchboard.cleanupJob)
...
doing same thingy on other methods of touched files as well
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7860 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
a311596881
finishing up my commits (7855-7858) which could be helpful for
...
not declaring inside loops (helps GC of some VMs)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7859 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
9170a434ed
throwing an exception again in FileUtils.copy(reader, writer)
...
OOMs could occur here and should not be ignored
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7858 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
ce248cc8dd
fewer byte-arrays of response-content, fewer byte-array <-> stream conversions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7856 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
59b767eebd
stop loading via http at a defined maximum of bytes, even if the size is unknown before loading
...
using max-file-size of type int for parsing documents
(since content is used as byte-arrays, 'integer' should be maximum)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
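Note on commit 59b767eebd: the idea of stopping a download at a byte limit when the size is not known in advance can be sketched as below. This is only an illustration under assumptions: the class `LimitedReadSketch` and the method `readAtMost` are hypothetical names, and whether YaCy aborts with an exception or truncates the content is not stated in the commit message. The limit is an `int` because the content ends up in a byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch, not YaCy's actual loader code. */
final class LimitedReadSketch {

    /**
     * Reads the whole stream, but gives up as soon as more than maxBytes
     * have arrived. No Content-Length is needed for this check.
     * Throwing an IOException is an assumption of this sketch.
     */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            // compare without adding, so the check cannot overflow an int
            if (n > maxBytes - out.size()) {
                throw new IOException("content exceeds limit of " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

A caller would pass the configured maximum file size as `maxBytes` and treat the exception as "document too large".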
sixcooler
916d79111e
Runtime.maxMemory() DOES change @ runtime:
...
I was puzzled to see Total-ram > Max-ram and MemoryControl.available() < 0.
MemoryControl.available() < 0 causes errors where its value is used, e.g. for the dimension of buffers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7852 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
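Note on commit 916d79111e: a minimal sketch of an "available memory" computation that cannot go negative. It assumes the common formula max - total + free and reads `Runtime.maxMemory()` on every call instead of caching it. The class `MemorySketch` is hypothetical, and the actual fix in YaCy's `MemoryControl` may differ.

```java
/** Hypothetical sketch, not YaCy's MemoryControl. */
final class MemorySketch {

    /**
     * Bytes that can presumably still be allocated. The result is clamped
     * to zero so that a caller never sizes a buffer with a negative value,
     * even if total temporarily exceeds max.
     */
    static long available(long max, long total, long free) {
        return Math.max(0L, max - total + free);
    }

    /** Same computation with fresh values from the running JVM. */
    static long available() {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is read on each call rather than cached at start-up
        return available(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
    }
}
```

With the three-argument form the clamping is easy to check in isolation: a total above max yields 0 instead of a negative number.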
orbiter
299af4943c
added another memory protection hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7849 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1f300217f8
more protection for the cleanup thread
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7848 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d13103a0a7
changed the way the index cache is flushed: do not flush when a put is made, because that could cause many put calls to block on synchronization for a long time while a dump or a merge is performed. Instead, a watchdog thread does the dump, so puts can no longer block, which is good when a put happens during search result preparation.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7847 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b06faab9d3
do not allocate a StringBuilder object in case that there is not enough memory for that
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7846 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6a6f27eaf3
do not sort arrays again if arrays are already sorted
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7845 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3d043ce9d6
- refactoring
...
- do not start worker threads in Array class if concurrency is not used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7844 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
48b78e9ff4
disabling concurrency in new sort since it is not yet working correctly
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7843 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
62ac73a108
fixed bugs and deadlocks in core database indexing structures:
...
- added new Array class that contains an abstraction of the java Arrays class which replaces the home-brew quicksort algorithm.
- the new class is about four times slower than the old one, but it works correctly (the old one had errors)
- fixed a synchronization problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7842 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1912d0cccc
changed handling of RowSet element retrieval: until today all elements were copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface of the Row class so that the caller must state whether a copy is required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; the reduction is expected to show especially during search situations.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7840 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bb8e3f8523
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7839 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
11dc653de3
added a visualization of peer pings to the performance graphic
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7837 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3a191cdf14
because newbies are scared by the memory consumption shown in the performance graph, and because of arguments about high memory consumption that stem from poor knowledge of java garbage collection techniques, the memory display has been removed from the performance graph shown on the Status.html page. The memory graph can still be seen on the Performance page, where it is unchanged.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7836 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
52d799e7c8
fix for solr auth
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7833 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
9eb8e9acd9
no error message about missing browser in headless environments
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7832 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d3c89b90ce
temporarily adding the old httpclient-3.1 again because the solrj classes need it. should be removed as soon as solrj supports httpclient-4
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7831 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bd99969758
fixed bad query
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7830 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
768c59740c
- replaced solrj 3.1 with solrj 3.3
...
- updated also slf4j
- added authentication for solrj
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7829 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
c7b95e8c81
*) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
...
*) Corrupt crawlProfilesPassive.heap would cause crawlProfilesActive.heap to be deleted. Don't know if this ever happened, but it will not happen anymore.
*) Cleaned up a little bit.
*) Added some comments.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7827 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6d2e252bcf
fix for:
...
java.lang.NullPointerException
at net.yacy.kelondro.index.RowCollection.<init>(RowCollection.java:97)
at net.yacy.kelondro.index.RowSet.<init>(RowSet.java:48)
at net.yacy.kelondro.rwi.ReferenceContainer.<init>(ReferenceContainer.java:58)
at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:69)
at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:43)
at net.yacy.kelondro.blob.ArrayStack.merge(ArrayStack.java:1023)
at net.yacy.kelondro.blob.ArrayStack.mergeWorker(ArrayStack.java:922)
at net.yacy.kelondro.blob.ArrayStack.mergeMount(ArrayStack.java:869)
at net.yacy.kelondro.rwi.IODispatcher$MergeJob.merge(IODispatcher.java:267)
at net.yacy.kelondro.rwi.IODispatcher$MergeJob.access$300(IODispatcher.java:239)
at net.yacy.kelondro.rwi.IODispatcher.run(IODispatcher.java:180)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7822 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2d4bb139d3
- added counting of links with noindex tag for solr index
...
- bugfixes for solr index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
892caccdca
added default configuration in ConfigurationSet in case of new values
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7814 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bda3eec0ff
added parsing of canonical link element to html parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7812 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b6f09a475d
- added an index profile editor in the /indexFederated_p.html servlet for solr indexes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7811 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b666a929e7
fixed Semaphore handling in case of interruptions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7809 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
de7a054d77
added parser for such files like the new solr.key.list
...
it parses text files with the following syntax:
- all lines beginning with '##' are comments
- all non-empty lines not beginning with '#' are keyword lines
- all lines beginning with '#' and where the second character is not '#' are commented-out keyword lines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7808 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
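Note on commit de7a054d77: the three syntax rules listed in that message can be sketched as follows. The class `KeyListSketch` and its field names are hypothetical and are not the parser added by the commit. Trimming whitespace and ignoring a lone `#` are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the described key-list syntax, not YaCy's class. */
final class KeyListSketch {
    final List<String> enabled = new ArrayList<>();   // active keyword lines
    final List<String> disabled = new ArrayList<>();  // commented-out keyword lines

    static KeyListSketch parse(String text) {
        KeyListSketch result = new KeyListSketch();
        for (String raw : text.split("\\R")) {        // split on any line break
            String line = raw.trim();
            if (line.isEmpty()) continue;             // blank lines carry nothing
            if (line.startsWith("##")) continue;      // '##' starts a comment
            if (line.startsWith("#")) {
                // single '#': a keyword that is currently switched off
                String key = line.substring(1).trim();
                if (!key.isEmpty()) result.disabled.add(key);
            } else {
                result.enabled.add(line);             // plain keyword line
            }
        }
        return result;
    }
}
```

Keeping the commented-out keywords in a separate list, instead of discarding them, lets an editor offer them for re-activation. That use is a guess at the motivation and is not stated in the commit message.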
orbiter
267290a821
removed the semaphores from the cache dump process because I believe some of the semaphores may be lost somewhere, with the result that the cache is never flushed and the peer then dies from an OOM. The re-introduced synchronization may not be the best solution but should ensure that the caches are flushed.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7802 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d8072d1866
added more info to DNS cache in /PerformanceMemory_p.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7798 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f803da8aae
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7797 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
84c9658644
added a file type navigator
...
added a protocol navigator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7795 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
31283ecd07
- added a search option to filter only specific network protocols. i.e. get only results from ftp servers. Just add '/ftp' to your search.
...
for example search for "passwd /ftp". This can also be done with /http /https and /smb
- fixed some search throttling processes that should protect your peer against search DoS or strong search load
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7794 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
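Note on commit 31283ecd07: a sketch of how a protocol modifier such as `/ftp` could be separated from the search terms. The class `ProtocolModifierSketch` is hypothetical; the four protocol names come from the commit message, while the tokenizing rules (whitespace split, last modifier wins, lower-case only) are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not YaCy's query parser. */
final class ProtocolModifierSketch {
    static final List<String> PROTOCOLS = Arrays.asList("http", "https", "ftp", "smb");

    final String terms;     // the query with the modifier removed
    final String protocol;  // null when the query has no protocol modifier

    private ProtocolModifierSketch(String terms, String protocol) {
        this.terms = terms;
        this.protocol = protocol;
    }

    static ProtocolModifierSketch parse(String query) {
        List<String> words = new ArrayList<>();
        String protocol = null;
        for (String token : query.trim().split("\\s+")) {
            if (token.startsWith("/") && PROTOCOLS.contains(token.substring(1))) {
                protocol = token.substring(1);   // e.g. "/ftp" selects "ftp"
            } else if (!token.isEmpty()) {
                words.add(token);                // ordinary search term
            }
        }
        return new ProtocolModifierSketch(String.join(" ", words), protocol);
    }
}
```

For the example from the commit message, `parse("passwd /ftp")` yields the terms `passwd` and the protocol `ftp`.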
orbiter
7db208c992
performance hacks: more pre-allocated StringBuilder
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7790 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
07e89a7ae5
added @Deprecated
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7788 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
9706fc55aa
enhanced content scraper (should discover urls much faster in case of very large plain texts)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7787 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
996f0a8764
disabled assert in Base64Order which eats away too much performance during testing with -l
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7786 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f667b9c289
enhanced identificator: using AtomicInteger for counter
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7785 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
16327d1cbe
unwrapping of call depth (one call less for UTF8.String)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7784 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f30d36b101
enhanced template engine
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7783 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
aa6c32d753
enhanced UTCDiffString
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7782 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
f87865a50b
always shutdown log, fixes zombie processes in init stop script
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7780 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
115abc8917
- more attributes for search progress bar
...
- moved cache strategy to cora package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
77fe69395d
added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7774 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
df1725ef43
re-enable POST over proxy, which didn't work since update to httpcore-4.1.1
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7772 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2683162ec5
- added more options to access grid picture, web structure picture and network graphics
...
- remove test class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7770 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0c1b29f3c9
- applied many small performance hacks
...
- added a memory limitation in the zip parser and the pdf parser
- added search throttling: if too many search queries are still waiting to be computed, new requests are not accepted for some time. if after one second there is still no room to perform another search, the search terminates with no results. this should only happen in DoS-like situations or under strong load on a peer, e.g. if it is integrated in metager.
- added a search cache deletion process that removes search requests in case that throttling happens
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7766 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
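Note on commit 0c1b29f3c9: the throttling behaviour described there (wait a bounded time for a free slot, otherwise return no results) can be sketched with a counting semaphore. This is an assumption about one possible mechanism, not the commit's implementation; `SearchThrottleSketch`, the slot count and the wait time are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not YaCy's search throttling code. */
final class SearchThrottleSketch {
    private final Semaphore slots;   // one permit per concurrently running search
    private final long waitMillis;   // how long a new request may wait for a slot

    SearchThrottleSketch(int maxConcurrent, long waitMillis) {
        this.slots = new Semaphore(maxConcurrent);
        this.waitMillis = waitMillis;
    }

    /** Runs the query if a slot frees up in time, else returns no results. */
    List<String> search(Callable<List<String>> query) throws Exception {
        if (!slots.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
            return Collections.emptyList();   // overloaded: terminate with no results
        }
        try {
            return query.call();
        } finally {
            slots.release();                  // always free the slot again
        }
    }
}
```

With `new SearchThrottleSketch(maxConcurrent, 1000)` the wait matches the one second mentioned in the commit message. The right value for `maxConcurrent` depends on the peer and is left open here.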
orbiter
fe0c08455b
more concurrency (enhancement) hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7759 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
87082f407e
less String object creation during search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7756 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3c2b994bd6
write access/load time to solr index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7752 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a36fda991e
hack to increase speed of url hash computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7751 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
dbea40d536
- changed snippet fetch strategy logic: do not check if entry is in cache. This should reduce IO load on the HTCACHE which is a showstopper during large number of search requests
...
- forced a possible short memory status when a search is started to flush caches that may cause search-heaps with resource contention effects
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7747 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
4bea3f9714
hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
...
used an ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion methods have less computation overhead than the UTF8 conversion.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
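Note on commit 4bea3f9714: for strings known to be pure ASCII, such as base64 hashes, a conversion can copy char by char without going through a charset encoder. The sketch below illustrates that idea only; `AsciiSketch`, `encode` and `decode` are hypothetical names, and the range checks are an assumption that the real methods may omit for speed.

```java
/** Hypothetical sketch, not YaCy's ASCII helper class. */
final class AsciiSketch {

    /** Narrows each char of a pure-ASCII String to one byte. */
    static byte[] encode(String s) {
        byte[] b = new byte[s.length()];
        for (int i = 0; i < b.length; i++) {
            char c = s.charAt(i);
            if (c > 127) throw new IllegalArgumentException("non-ASCII char at index " + i);
            b[i] = (byte) c;
        }
        return b;
    }

    /** Widens each ASCII byte back to one char. */
    static String decode(byte[] b) {
        char[] c = new char[b.length];
        for (int i = 0; i < b.length; i++) {
            if (b[i] < 0) throw new IllegalArgumentException("non-ASCII byte at index " + i);
            c[i] = (char) b[i];
        }
        return new String(c);
    }
}
```

For ASCII input the result of `encode` is identical to what a UTF-8 or US-ASCII encoder would produce, so the two paths are interchangeable for hash strings.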
orbiter
746e3c3b06
Replaced a widely-used Property Object in the httpd with HashMap<String, Object> which is not synchronized like Properties
...
A synchronization is not needed here and applies an overhead to the httpd process which is now removed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7745 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e28bd0d038
fix for some possible causes of memory leaks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7741 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
09ba6814c0
- non-blocking word hash computation with dynamic digest object generation (this was important!)
...
- (very) small performance enhancement in did-you-mean
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7740 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
10e2f588f8
- enhanced ybr ranking computation
...
- many speed/performance hacks
- added solr sharding and new sharding web interface
- added option to switch off the yacy index when using solr
- added new fail-url categories which are used to make a distinction which fail-urls to be sent to solr
- refactoring/renaming of some method names to distinguish host/url hashes better
- a large number of bug/npe fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bd55dcee50
- commented out experimental distributed ranking loading
...
- less threads for blocking threads
- disable all threads for DHT transmission for networks with zero peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7737 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
98c4d25185
fix for endless loop in FTP crawling, see http://bugs.yacy.net/view.php?id=32
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7736 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3ed4a09368
small features, some bug fixes and performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7733 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b45701d20f
this is a re-implementation of the YaCy Block Rank feature
...
This time it works like this:
- each peer provides its ranking information using the yacy/idx.json servlet
- peers with more than 1 GB ram will load this information from all other peers, combine that into one ranking table and store it locally. This happens during the start-up of the peer concurrently. The new generated file with the ranking information is at DATA/INDEX/<network>/QUEUES/hostIndex.blob
- this index is then computed to generate a new fresh ranking table. Peers which can calculate their own ranking table will do that every start-up to get latest feature updates until the feature is stable
- I computed new ranking tables as part of the distribution and committed them here as well
- the YBR feature must be enabled manually by setting the YBR value in the ranking servlet to level 15. A default configuration for that is also in the commit, but it does not affect your current installation, only fresh peers
- a recursive block rank refinement is implemented but disabled at this point. it needs more testing
Please play around with the ranking settings and see if this helped to make search results better.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7729 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago