Michael Peter Christen
461a0ce052
removed warnings
13 years ago
Michael Peter Christen
62ae9bbfda
allow more POIs, get more at once
13 years ago
Michael Peter Christen
407fdf6968
more bug fixes and performance hacks for search process
13 years ago
Michael Peter Christen
a1fe65b115
performance hacks
13 years ago
Michael Peter Christen
2fe207f813
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
5aee19daa4
added show from cache in search results (not yet finished)
13 years ago
Michael Peter Christen
5e562dcdb7
adapted vocabulary usage within annotation/navigation feature of search
...
to new SimpleVocabulary class
13 years ago
Michael Peter Christen
514700291a
moved Vocabulary to cora package (added in git
...
964406ad17
)
13 years ago
Michael Peter Christen
0284a4d88f
more fixes for double precision of coordinates
13 years ago
Michael Peter Christen
964406ad17
added concurrency enhancement to xml parser
13 years ago
Michael Peter Christen
240045cf7c
fix for bad distance computation
13 years ago
Michael Peter Christen
e0d8643226
- performance hacks
...
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again) -> target is, that they
are never reached.
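
The poison-pill termination mentioned above can be sketched roughly as follows. This is a minimal illustration, not the YaCy search code: the class name PoisonPillDemo, the POISON marker, and the consumeAll helper are all hypothetical. The consumer blocks on the queue and stops only when it sees the marker object, so a producer that forgets to enqueue the marker leaves the consumer waiting forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; not taken from the YaCy sources.
final class PoisonPillDemo {
    // Unique marker object; compared by identity, never by content.
    static final String POISON = new String("POISON");

    /** Drains the queue until the poison pill arrives and returns the items seen before it. */
    static List<String> consumeAll(BlockingQueue<String> queue) {
        List<String> seen = new ArrayList<>();
        try {
            while (true) {
                String item = queue.take();   // blocks until an item is available
                if (item == POISON) break;    // producer signalled "no more results"
                seen.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag and stop
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            queue.add("result-1");
            queue.add("result-2");
            queue.add(POISON); // without this line the consumer would never return
        });
        producer.start();
        System.out.println(consumeAll(queue));
        producer.join();
    }
}
```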
13 years ago
Michael Peter Christen
7a329465b3
using pre-compiled patterns in blacklist; should enhance search speed
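
A rough sketch of the idea, assuming the blacklist entries are regular expressions: each expression is compiled once when the list is loaded, and every lookup reuses the compiled java.util.regex.Pattern instead of compiling the expression again. The class PrecompiledBlacklist and its methods are hypothetical and are not the YaCy Blacklist class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical class; it is not the YaCy Blacklist implementation.
final class PrecompiledBlacklist {
    private final List<Pattern> patterns = new ArrayList<>();

    PrecompiledBlacklist(String... regexes) {
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex)); // compiled once, at load time
        }
    }

    /** Returns true if the whole path matches at least one blacklist pattern. */
    boolean isListed(String path) {
        for (Pattern p : patterns) {
            if (p.matcher(path).matches()) return true; // no compilation per lookup
        }
        return false;
    }

    public static void main(String[] args) {
        PrecompiledBlacklist blacklist = new PrecompiledBlacklist(".*/ads/.*", ".*\\.tracker\\..*");
        System.out.println(blacklist.isListed("example.org/ads/banner.gif"));
        System.out.println(blacklist.isListed("example.org/index.html"));
    }
}
```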
13 years ago
Michael Peter Christen
6e83b02b83
- bugfix for surrogate file reader
...
- bugfix for location search: suppress empty search
13 years ago
Michael Peter Christen
9b4c699526
enhanced location search:
...
- search requests are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adapted to the results in the visible range
- added a double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
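
One way such a query option could be picked out of a search string is sketched below. Only the /radius/<lat>/<lon>/<radius> shape comes from the commit message; the class RadiusModifier, the plain decimal number syntax, and the choice to return null when the option is absent are assumptions made for illustration. The unit of the radius is not stated in the commit, so the value is kept as a bare number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; names and number syntax are assumptions, not YaCy code.
final class RadiusModifier {
    private static final String NUM = "(-?\\d+(?:\\.\\d+)?)";
    private static final Pattern MODIFIER =
            Pattern.compile("/radius/" + NUM + "/" + NUM + "/" + NUM);

    final double lat;
    final double lon;
    final double radius; // unit left open; the commit message does not name one

    private RadiusModifier(double lat, double lon, double radius) {
        this.lat = lat;
        this.lon = lon;
        this.radius = radius;
    }

    /** Returns the first radius option found in the query, or null if there is none. */
    static RadiusModifier parse(String query) {
        Matcher m = MODIFIER.matcher(query);
        if (!m.find()) return null;
        return new RadiusModifier(
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2)),
                Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        RadiusModifier r = parse("museum /radius/50.11/8.68/2.5");
        System.out.println(r.lat + " " + r.lon + " " + r.radius);
    }
}
```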
13 years ago
Michael Peter Christen
834dc6b263
store more data from interface access
13 years ago
Michael Peter Christen
1f48d1528b
performance hacks
13 years ago
Michael Peter Christen
c70aaccdc9
better location to generate a guid for rss messages
13 years ago
Michael Peter Christen
10da7335ea
performance hack: use a hash cache for all hashes that are computed by a
...
byte array. If this hash is used in a HashMap (which is very often the
case) then this hack eliminates a lot of re-computations of the same
hash.
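
The caching idea can be sketched as a small key wrapper around a byte array. The class CachedHashKey is hypothetical and only illustrates the technique; it is not the class changed in this commit. The hash is computed on the first hashCode() call and stored, so repeated HashMap operations on the same key object do not walk the array again. A stored value of 0 means "not computed yet", the same convention java.lang.String uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper; names are illustrative and not taken from the YaCy sources.
final class CachedHashKey {
    private final byte[] bytes;
    private int hash; // 0 means "not computed yet"

    CachedHashKey(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy, so the cached hash cannot go stale
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = Arrays.hashCode(bytes); // walks the array only on the first call
            hash = h;                   // (recomputed each time if the real hash is 0)
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashKey && Arrays.equals(bytes, ((CachedHashKey) o).bytes);
    }

    public static void main(String[] args) {
        Map<CachedHashKey, String> index = new HashMap<>();
        index.put(new CachedHashKey("abc".getBytes(StandardCharsets.UTF_8)), "value");
        System.out.println(index.get(new CachedHashKey("abc".getBytes(StandardCharsets.UTF_8))));
    }
}
```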
13 years ago
Michael Peter Christen
f8a0cf6d7c
RSSMessages do not need a concurrent hash map -> removed overhead
13 years ago
Michael Peter Christen
07ca7e4dd1
enhanced RSS parsing by ensuring that it is parsed with a buffered input
...
stream
13 years ago
Michael Peter Christen
7c1feefb28
introduced a default 10 second time-out in rwi normalization time
...
during search process to prevent endless deadlocks after a very long
running search
13 years ago
Michael Peter Christen
8d997d55b6
better logging
13 years ago
Michael Peter Christen
65d37e6a20
only ASCII needed in seed bitflags
13 years ago
Michael Peter Christen
0f82fb3628
using double instead of float for a better release ordering
13 years ago
Michael Peter Christen
43c2c6e588
better logging
13 years ago
sixcooler
56087c1f23
bump to httpclient- httpcore-, httpmime- 4.2
13 years ago
Michael Peter Christen
71c3163f3d
- fixes to node identification
...
- added link to node in network list
- added marking of portal search node peers
13 years ago
Michael Peter Christen
4d3cc02168
replaced old bzip2 library with better documented commons-compress
...
package from http://commons.apache.org/compress/
13 years ago
Michael Peter Christen
ad222be7f8
added node state icon in network list
13 years ago
Michael Peter Christen
3c2bec681f
added a root node flag: identifies peers with short ping time
13 years ago
Michael Peter Christen
c846e9ca14
redesign of the crawler monitor page: show crawled pages instead of
...
queue of urls that shall be crawled
13 years ago
Michael Peter Christen
c15fcde1c8
add-on to latest commit
13 years ago
Michael Peter Christen
cf47d94888
performance hack to parse numbers inside of substrings without actually
...
generating a substring. This avoids the allocation of a String object
each time a substring is parsed. Should affect CPU load during RWI
transmission.
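
A minimal sketch of the technique, assuming decimal integers: the digits are read directly from the enclosing string by index, so no intermediate String is allocated. The class SubstringNumbers and its parseInt method are hypothetical and are not the helper added in this commit; overflow is not checked.

```java
// Hypothetical helper; the class and method names are illustrative, not YaCy's API.
final class SubstringNumbers {

    /** Parses the decimal integer in s[start, end) without creating a substring. */
    static int parseInt(CharSequence s, int start, int end) {
        if (start >= end) throw new NumberFormatException("empty range");
        int i = start;
        boolean negative = s.charAt(i) == '-';
        if (negative || s.charAt(i) == '+') i++;
        if (i == end) throw new NumberFormatException("sign without digits");
        int result = 0;
        for (; i < end; i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit: " + s.charAt(i));
            }
            result = result * 10 + digit; // no overflow check in this sketch
        }
        return negative ? -result : result;
    }

    public static void main(String[] args) {
        String line = "count=1234;size=-56";
        System.out.println(parseInt(line, 6, 10));  // reads the characters "1234"
        System.out.println(parseInt(line, 16, 19)); // reads the characters "-56"
    }
}
```

Since Java 9 the standard library offers Integer.parseInt(CharSequence s, int beginIndex, int endIndex, int radix) for the same purpose; it was not available when this commit was written.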
13 years ago
Michael Peter Christen
7e0ddbd275
added a "fromCache" flag in Response object to omit one cache.has()
...
check during snippet generation. This should cause less blockings
13 years ago
Michael Peter Christen
81737dcb18
removed stack trace from swf parser since we can't do anything there
13 years ago
Michael Peter Christen
7bf421b9dd
- fixed image search page navigation
...
- removed some deadlocks and ConcurrentModificationExceptions during
DidYouMean collection
13 years ago
Michael Peter Christen
c6a09eab0b
synchronization needed
13 years ago
Michael Peter Christen
fb94b47b1a
changed queue sizes to have less memory occupied during indexing
13 years ago
Michael Peter Christen
76157dc2c3
bugfix for http://bugs.yacy.net/view.php?id=173
13 years ago
reger
6696cb1313
bugfix: lookup of peernames returned no result for active peer in page IndexControlRWIs_p.html -> Transfer RWI to other Peer
...
SeedDB.lookupByName searches for lowercase peerNames, while MapColumnIndex.getIndex uses peername as is in the keyset.
Changed the index init to insert lowercase peer names as key
13 years ago
Michael Peter Christen
c6558cba08
more classification bugs
13 years ago
Michael Peter Christen
082831b9d6
search contentdom was checked in wrong way - fixed
13 years ago
reger
ee553d971e
correct typo in scripts_txt comment
13 years ago
Michael Peter Christen
f294f2e295
bugfix to http://bugs.yacy.net/view.php?id=181
...
tried to make a bit less 'noise' to dns server
also included: less processes in snippet fetch to reduce load during
search on small computers
13 years ago
Michael Peter Christen
acf8d521a2
fix for http://bugs.yacy.net/view.php?id=126
13 years ago
Michael Peter Christen
bb88878b4d
the last commit was incomplete..
13 years ago
Michael Peter Christen
d320a31ae1
bugfix for http://bugs.yacy.net/view.php?id=186
13 years ago
Michael Peter Christen
fa735f4f04
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
3e1bc9477f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
6f8a2fef1f
small speed enhancement using a column factory
13 years ago
Roland 'Quix0r' Haeder
d10627d591
More sync in close() methods
...
Conflicts:
source/net/yacy/kelondro/logging/GuiHandler.java
source/net/yacy/kelondro/workflow/InstantBusyThread.java
13 years ago
Roland 'Quix0r' Haeder
b3ae2aa41f
With or without 'final'? At least please try it in other methods
...
Conflicts:
source/de/anomic/tools/tarTools.java
13 years ago
Roland 'Quix0r' Haeder
fbb946f913
Made a method static (Eclipse suggested it), removed unused import, pk=null check does now output a warning in logfile
13 years ago
Michael Peter Christen
52d307c735
prevent that the snippet fetch process removes catchall entries
13 years ago
Michael Peter Christen
7eece0256f
moved yacy.logging to defaults according to request in
...
http://bugs.yacy.net/view.php?id=55
13 years ago
Michael Peter Christen
89142d1e8d
removed (not all) warnings
13 years ago
Michael Peter Christen
5deebd02ea
added serialization
13 years ago
reger
b2175ea4ef
Add possibility to set custom Solr field names for the YaCy default Solr attributes.
...
- Changing the format of YaCy's solr.key.list while maintaining backward compatibility
Federated index config screens adjusted accordingly
- modified the Solr update request to use a 3 min Solr autocommit interval
13 years ago
Michael Peter Christen
15db703808
added missing serialization to remove all warnings
13 years ago
Michael Peter Christen
1795a7325b
made HandleSet serializable
13 years ago
Michael Peter Christen
e7e381d110
added configuration to switch off redirection following in crawler
13 years ago
Michael Peter Christen
2717c1b749
fixed bug in solr interface
13 years ago
Michael Peter Christen
f150bc218b
fixed bug in solr error document
13 years ago
Michael Peter Christen
cb54c1737b
solrj connector bugfix
13 years ago
Roland 'Quix0r' Haeder
a093ccf5eb
Now used synchronization in all close() methods to make sure all objects
...
are 'closed' in an ordered way
Conflicts:
source/de/anomic/http/server/ChunkedInputStream.java
source/de/anomic/http/server/ChunkedOutputStream.java
source/de/anomic/http/server/ContentLengthInputStream.java
source/net/yacy/cora/protocol/Domains.java
source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
source/net/yacy/document/content/dao/PhpBB3Dao.java
source/net/yacy/document/parser/html/AbstractTransformer.java
source/net/yacy/kelondro/blob/BEncodedHeap.java
source/net/yacy/kelondro/blob/HeapReader.java
source/net/yacy/kelondro/index/RAMIndexCluster.java
source/net/yacy/kelondro/io/ByteCountInputStream.java
source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
source/net/yacy/kelondro/table/SQLTable.java
13 years ago
Michael Peter Christen
49cab2b85f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
0d58fea210
made multiple connector default
13 years ago
Michael Peter Christen
7740c02c56
- enhanced the solr connector
...
- added new multiple connector (to replace singleConnector)
13 years ago
Michael Peter Christen
0cf3d36eae
more tolerance in case of corrupted file
13 years ago
Michael Peter Christen
acc6db28ff
added missing classes for solr interface
13 years ago
Michael Peter Christen
adeb33bb36
better abstraction for solr objects
13 years ago
Michael Peter Christen
8864141872
more abstraction in solr connection classes
13 years ago
Michael Peter Christen
c00efc2717
made the solr connection more generic
13 years ago
Michael Peter Christen
ea2bd43b28
patch for broken configurations
13 years ago
Michael Peter Christen
e5ca7f22b1
enhancement in circle drawing
13 years ago
Michael Peter Christen
34f4225d7e
less 'wellformed' calls without asserts
13 years ago
Marc Nause
a691023d04
*) better formatting for network QPM
...
*) refactoring
13 years ago
Michael Peter Christen
77f8e9fb9b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
ba6aaabc51
refactoring + parser bugfixes
13 years ago
Michael Peter Christen
2a0434efa4
Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff'
13 years ago
Michael Peter Christen
942896fe46
removed methods not supported by new solrj connector for httpclient 4
...
Error was:
java.lang.UnsupportedOperationException: Client was created outside of
HttpSolrServer
at
org.apache.solr.client.solrj.impl.HttpSolrServer.setDefaultMaxConnectionsPerHost(HttpSolrServer.java:614)
at
net.yacy.cora.services.federated.solr.SolrSingleConnector.<init>(SolrSingleConnector.java:128)
at
net.yacy.cora.services.federated.solr.SolrShardingConnector.<init>(SolrShardingConnector.java:55)
at net.yacy.search.Switchboard.<init>(Switchboard.java:657)
at net.yacy.yacy.startup(yacy.java:222)
at net.yacy.yacy.main(yacy.java:1018)
13 years ago
Michael Peter Christen
22e1f68c0b
solrj user authentication patch
13 years ago
Michael Peter Christen
09484955dc
added new entry class for embed tags
13 years ago
Michael Peter Christen
62f2554a01
- fixed build problems (deprecated methods using httpclient 3.1)
...
- removed httpclient 3.1 lib which was used by solrj (solrj now uses
httpclient 4)
13 years ago
Michael Peter Christen
a6d60fc21f
concurrency enhancement in ConfigurationSet
13 years ago
Michael Peter Christen
453010bd68
- solved problems with backpath normalization
...
- redesigned in/outbound link handover
- removed iframe links from inbound/outbound in solr scheme
13 years ago
Michael Peter Christen
5f5ed33ed8
patch for media search (audio, video apps)
13 years ago
Michael Peter Christen
7860c1df80
fix needed for new solrj library
13 years ago
Michael Peter Christen
0e13022147
- enhanced solr field documentation
...
- added xml api button to IndexFederated_p - the solr schema.xml file
can be generated by YaCy
13 years ago
Michael Peter Christen
19efbf1b0f
- apply directDocByURL to NOLOAD Queue
...
- choose pushing to NOLOAD as default for site crawl
13 years ago
Michael Peter Christen
659178942f
- Redesigned crawler and parser to accept embedded links from the NOLOAD
...
queue and not from virtual documents generated by the parser.
- The parser now generates nice description texts for NOLOAD entries
which shall make it possible to find media content using the search
index and not using the media prefetch algorithm during search (which
was costly)
- Removed the media-search prefetch process from image search
13 years ago
Michael Peter Christen
a3badd3205
changed search process for images: no more media snippet load process,
...
show only links from index which had been on the text search page
before. This creates a superfast search process for images!
13 years ago
reger
c1f6b4fb52
lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown)
13 years ago
Michael Peter Christen
f8cd57c92f
new indexing strategy: ALL links that appear anywhere are indexed, not
...
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can retrieve now ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
13 years ago
Michael Peter Christen
14f67f217c
refactoring of ContentDomain: now subclass of Classification
13 years ago
Michael Peter Christen
8a08c96a82
removed dependency from logging
13 years ago
Michael Peter Christen
a1a5b015d8
refactoring: moved document Classification to cora package
13 years ago
Michael Peter Christen
33d1062c79
refactoring: the cache belongs to the crawler
13 years ago
Michael Peter Christen
4d5da75814
fix for parser problem if a <a>-tag is 'within' html tags with unclosed
...
tags. That prevented the <a> tags from being recognized. This is a fix
for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516
13 years ago
Michael Peter Christen
91a86f0b06
fixed to network graph testing
13 years ago
Michael Peter Christen
7b5b9baee0
added citation rank to ranking profile
13 years ago
Michael Peter Christen
046f3a7e8d
check if httpc has decompressed the release file and rename the file
...
from .tar.gz to .tar if that happened
13 years ago
Michael Christen
02e4dedff2
fix to url citation collection
13 years ago
Michael Christen
e32055aa15
added stub classes for
...
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
attributes set which shall replace the old metadata database if it is
finished
- migration help classes stub to use old and new metadata databases
simultaneously
13 years ago
Michael Christen
ac5d124ee0
experimental implementation of a citation ranking as post-ranking
...
method. (ranking coefficient fixed, need to be made configurable)
13 years ago
Michael Christen
8fc86fe397
added storage of full anchor link structure:
...
the links between all pages are now stored. The same index structure as
used for the word index is used to make a reverse link index.
The new file(s) in SEGMENT/default/citation.index.*.blob store the
citation index. This will be used to create much more detailed link
structures for the YaCy apis and to create a better ranking. A ranking
using the citation.index should provide better results especially for
portal indexes and intranets.
13 years ago
Lotus
0b3f39136e
allow custom ppm lower than minimum button on /Crawler_p.html
...
fixes http://bugs.yacy.net/view.php?id=166
13 years ago
Michael Peter Christen
532c7cf827
added physics experiment to the graph plotter. not active by default
13 years ago
Michael Peter Christen
aba9b1bfa0
better names for elements of a linked graph
13 years ago
Michael Peter Christen
2fc8ecee36
ConcurrentLinkedQueue has a VERY long return time on the .size() method.
...
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html
and the following test program:
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {
    // adds c elements, calling size() before every add, and returns the elapsed milliseconds
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }
    public static void main(String[] args) {
        int c = 1;
        // c doubles on every round; stop the run manually once the trend is visible
        for (int i = 0; i < 100; i++) {
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);
            System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1
                    + ", LinkedBlockingQueue = " + t2 + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}
13 years ago
Michael Peter Christen
8aba045ba1
if a new pop-up page is set in config portal, then this page applies
...
also to the default page configuration for the httpd if no path is
given.
13 years ago
Michael Peter Christen
8c06925984
animation of the web structure picture
13 years ago
Michael Peter Christen
898fa7c3f3
use tld heuristic to check if a domain is local or global
13 years ago
Michael Peter Christen
213c8d97f2
use fewer processes in process pool
13 years ago
Michael Peter Christen
c639248c23
protection against strange answers from remote peers during search
13 years ago
Michael Peter Christen
36e4d82b27
changed ranking
13 years ago
Michael Peter Christen
096c17e7cd
added test code
13 years ago
Michael Peter Christen
665626a51b
catch OOM errors during scanning
13 years ago
Michael Peter Christen
1cd711d005
added classes for citation references (for new citation ranking)
13 years ago
Michael Peter Christen
33a405dab8
ipv6 bugfix
13 years ago
Michael Peter Christen
c6c61be3f0
fix for http://bugs.yacy.net/view.php?id=148
13 years ago
Michael Peter Christen
e0f1e7d904
added new citation reference data structure that shall be used for a
...
citation ranking
13 years ago
Michael Peter Christen
e18a4f6b74
more tolerant merge iterator
13 years ago
Michael Peter Christen
e101c2e0e2
added changes from copperdust (submitted by email):
...
1. Improved and fixed language detection:
1.1 Identificator.java - recognition fix (improved)
1.2 DCEntry.java - fix (changed detection order due to detection from
tld in many cases is incorrect)
1.3 MultiProtocolURI.java - fixed and enhanced language from tld
detection (all currently used top-level domains; ccTLD added but not
tested).
2. Ukrainian language update.
3. Main Slavic languages langstats (tested and works fine).
13 years ago
Michael Peter Christen
8d63a5887c
bugfixes
13 years ago
Michael Peter Christen
9ad1d8dde2
complete redesign of crawl queue monitoring: do not look at a
...
ready-prepared crawl list but at the stacks of the domains that are
stored for balanced crawling. This affects also the balancer since that
does not need to prepare the pre-selected crawl list for monitoring. As
an effect:
- it is no longer possible to see the correct order of next to-be-crawled
links, since that depends on the actual state of the balancer stack the
next time another url is requested for loading
- the balancer works better since the next url can be selected according
to the current situation and not according to a pre-selected order.
13 years ago
Michael Peter Christen
7e4e3fe5b6
free some memory after parsing html
13 years ago
Michael Peter Christen
4540174fe0
memory hacks
13 years ago
Michael Peter Christen
b4409cc803
small redesign of blob column index and usage
13 years ago
Michael Peter Christen
d5c1f2746e
performance hack
13 years ago
Michael Peter Christen
803963aebd
performance hack: better space grow in CharBuffer (speeds up html
...
parser)
13 years ago
Michael Peter Christen
8b0920b0b5
tried to fix the ipv6 problem as reported in bug
...
but this did not solve all problems because a bug in the apache http
client prevented that it worked. Thread dump:
Caused by: java.lang.NumberFormatException: For input string:
"1450:400c:c01:0:0:0:69"
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.http.client.utils.URIUtils.extractHost(URIUtils.java:310)
at
org.apache.http.impl.client.AbstractHttpClient.determineTarget(AbstractHttpClient.java:764)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at net.yacy.cora.protocol.http.HTTPClient.execute(HTTPClient.java:597)
at
net.yacy.cora.protocol.http.HTTPClient.getContentBytes(HTTPClient.java:558)
at net.yacy.cora.protocol.http.HTTPClient.GETbytes(HTTPClient.java:341)
at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:131)
at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:74)
at
net.yacy.repository.LoaderDispatcher.loadInternal(LoaderDispatcher.java:274)
at net.yacy.repository.LoaderDispatcher.load(LoaderDispatcher.java:164)
at net.yacy.repository.LoaderDispatcher.load(LoaderDispatcher.java:150)
at
net.yacy.repository.LoaderDispatcher.loadDocument(LoaderDispatcher.java:355)
at getpageinfo_p.respond(getpageinfo_p.java:97)
13 years ago
Michael Peter Christen
e2f8f263e8
changed storage of search words: keep order
13 years ago
Michael Peter Christen
ed39ef2890
changed generation of protocol information
13 years ago
Michael Peter Christen
0b67a0a5d8
added a column index for tables in blob files. This is heavily used
...
during receiving of DHT submissions and when answering remote search
requests. Both events together may have caused IO-deadlocking and this
commit shall fix that.
13 years ago
Michael Peter Christen
2e5cd6a1b2
fixed parser extension deny list generation and usage
13 years ago
Michael Peter Christen
8bee1472c9
there is no noindex, only nofollow in links
13 years ago
Michael Peter Christen
3cd6dcd352
do not add new solr fields as activated fields
13 years ago
Michael Peter Christen
e3bb73c3d6
serialized some database access methods
13 years ago
Michael Peter Christen
7e728867e5
added a synchronization around iterations to prevent IO-deadlocking
...
during concurrent remote search requests
13 years ago
Michael Peter Christen
355ecf330f
reduced target file size to 64mb
13 years ago
Michael Peter Christen
10ae6d94a1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
2ea585d616
fix for host navigator
13 years ago
Michael Peter Christen
2f6dde92e2
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
c560a582ac
fix for single-word vocabulary lines
13 years ago
Michael Peter Christen
4c5edab1ec
added option to have exception search result windows
13 years ago
Michael Peter Christen
046d7de95b
Merge remote branch 'reger/master'
13 years ago
reger
a95f645a61
Bugfix class repository.LoaderDispatcher fixed download file limit of 10000
...
line 355: final Response response = this.load(request, cachePolicy, 10000, true);
13 years ago
Michael Peter Christen
ef78f22ee1
performance hack
13 years ago
Michael Peter Christen
41536eb4a2
performance hack
13 years ago
Michael Peter Christen
f91487fc50
added delete-button for host navigation
13 years ago
Michael Peter Christen
e8d24fd802
author navigator can be switched off
13 years ago
Michael Peter Christen
558ab7bd4e
made the protocol navigator reversible
13 years ago
Michael Peter Christen
96cb75f1d4
made the filetype navigator be able to deselect the search constraint
13 years ago
Michael Peter Christen
1f4f60654a
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
...
Conflicts:
source/net/yacy/document/parser/pdfParser.java
13 years ago
reger
32104360ce
PDFParser - return at least first 3 pages of PDF
...
fix for pdf parsing without returning parsed text due to interruption by
time out.
13 years ago
Michael Peter Christen
ef5192f8c9
using the generic document parser for crawl starts instead of the html
...
parser. This makes it possible that every type of document can be a
crawl start point, not only text documents or html documents. Tested
this with a pdf document.
13 years ago
Michael Peter Christen
a02fdf8625
better error messages
13 years ago
Michael Peter Christen
eadb58dd87
small enhancements in pdf parser
13 years ago
Michael Peter Christen
c6ba44468e
timeout = 5000 instead 3000
13 years ago
reger
b616de5973
PDFParser - return at least first 3 pages of PDF
...
fix for pdf parsing without returning parsed text due to interruption by time out.
13 years ago
Lotus
c73af39e54
refactoring of tray icon class,
...
now uses Java 6 methods natively
13 years ago
Michael Peter Christen
4eff0e26f1
npe bugfix
13 years ago
low012
8776b84c10
*) small fix to make password change function of reconfigureYACY.sh work
...
again
13 years ago
Michael Peter Christen
1a0b6b3913
get more navigation details to search results
13 years ago
Michael Peter Christen
7f9b6b7a0c
added switches to ConfigParser to accept/deny documents by their
...
extension
13 years ago
Michael Peter Christen
4901cee3cc
suppress auto-tagged subject entries when sending out or receiving
...
metadata from other peers
13 years ago
Michael Peter Christen
83009d86f7
added the vocabulary navigator. It can be very simply tested by
...
switching on the locale dictionaries.
13 years ago
sixcooler
985b78cf89
correct 'available()' to use max of young / eden
13 years ago
sixcooler
4da8746275
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
sixcooler
c9aaa9e00a
respect non-reserved Memory in GenerationMemoryStrategy
...
and enable it again
13 years ago
Michael Peter Christen
37f2d1b3e9
replaced Thread initialization with ExecutorService pool for delete
...
method. This is much faster and produces less blocking when using the
Compressor class which is used by the HTCache. I.e. picture search is
much faster now.
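
The change described above can be sketched as follows, assuming a shared fixed-size pool: delete jobs are handed to worker threads that already exist instead of constructing a new Thread for every call. The class PooledDeleter, the pool size of 4, and the use of a Set as a stand-in store are assumptions for illustration; this is not the Compressor or HTCache code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the YaCy Compressor implementation.
final class PooledDeleter {
    // One shared pool; daemon threads so the pool never keeps the JVM alive.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "delete-worker");
        t.setDaemon(true);
        return t;
    });

    /** Queues one delete and returns immediately; the store must be thread-safe. */
    static Future<Boolean> deleteAsync(Set<String> store, String key) {
        Callable<Boolean> task = () -> store.remove(key);
        return POOL.submit(task);
    }

    /** Queues all deletes, waits for them, and returns how many keys were removed. */
    static int deleteAll(Set<String> store, List<String> keys) {
        List<Future<Boolean>> pending = new ArrayList<>();
        for (String key : keys) pending.add(deleteAsync(store, key));
        int deleted = 0;
        for (Future<Boolean> f : pending) {
            try {
                if (f.get()) deleted++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } catch (ExecutionException e) {
                // a failed delete is simply not counted in this sketch
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        Set<String> store = ConcurrentHashMap.newKeySet();
        store.addAll(Arrays.asList("a", "b", "c"));
        System.out.println(deleteAll(store, Arrays.asList("a", "c", "x")) + " deleted, left: " + store);
    }
}
```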
13 years ago
Michael Peter Christen
a58dc4a91f
added autotagging to document condenser:
...
- tags that are automatically generated now enrich the dc:subject
- auto-generated tags have a '$' at the beginning of the tag
- auto-generated tags lead the tag name with a vocabulary name
each tag has the form
$<vocabulary-name>:<tag-printname-space-replaced-by-'_'>
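
The tag form given above can be reproduced with a few lines. The class AutoTag and its method names are hypothetical; only the '$' prefix, the ':' separator, and the space-to-underscore replacement come from the commit message.

```java
// Hypothetical helper; only the tag format itself is taken from the commit message.
final class AutoTag {

    /** Builds $<vocabulary-name>:<print name with spaces replaced by '_'>. */
    static String format(String vocabularyName, String printName) {
        return "$" + vocabularyName + ":" + printName.replace(' ', '_');
    }

    /** Auto-generated tags are recognizable by the leading '$'. */
    static boolean isAutoTag(String tag) {
        return tag.startsWith("$");
    }

    public static void main(String[] args) {
        String tag = format("geo", "New York");
        System.out.println(tag + " auto=" + isAutoTag(tag));
    }
}
```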
13 years ago
Michael Peter Christen
0d6176804b
emergency disabling of GenerationMemoryStrategy because of non-working
...
available-method
13 years ago
Lotus
411aab02e3
Windows installer now detects reliably whether YaCy runs. A file lock on
...
the yacy.running file has been implemented.
13 years ago
Michael Peter Christen
87f0210480
enriched log output to find NPE in HeapReader
13 years ago
Michael Peter Christen
987b412491
updated solr scheme: generic declaration of solr schemes
13 years ago
Michael Peter Christen
254adea51c
small fixes
13 years ago
Michael Peter Christen
49be60a7c8
WorkflowProcess is forced to make small pauses if shortMemoryStatus is
...
reached.
13 years ago
Michael Peter Christen
b7bb84c0bb
set a limit to CharBuffer object size to fight against bad/too large
...
content
13 years ago
Michael Peter Christen
c602eaaf46
enhanced search process
13 years ago
Michael Peter Christen
087f97d4c0
less noise if a browser cannot be opened
13 years ago
Michael Christen
eff966f396
fix for search process (it was aborted too early during remote search)
13 years ago
Michael Christen
e6d51363ee
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Marek Otahal
a231d0eeb9
Run from Java the whole app YACY
...
start for java webStart
allow for better integration with IDE
Conflicts:
source/net/yacy/gui/framework/Browser.java
13 years ago
Marek Otahal
72adbeae90
!Important: move from Hashtable to HashMap
...
Hashtable is an obsolete collection v1, now since v2 offers HashMap with same or better
functionality. Please review, almost all code was already moved, so only a few changes. That is not the issue,
but I found notices that some (ugly big) helper classes had to be created in past
to compensate missing Hashtable's functionality. I'd like input if we can remove some of them.
look for //FIX: in these commits
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Marek Otahal
f40efb39af
Blacklist loadList() remove duplicates by using Set
...
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Marek Otahal
f75b5e40e0
little fix in copy()
...
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Marek Otahal
1dc5d9f0f3
make ConnectionInfo comparable and sort list of connections in Connections_p
...
ConnectionInfo compare by initTime
Connections_p implement wish to sort connections, descending
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Michael Christen
fa8da7f89d
vocabularies are now also used as source for a did-you-mean computation
13 years ago
Michael Christen
eaec14ecc4
Dictionaries from words caches can now be used as autotagging vocabulary
13 years ago
Michael Peter Christen
91940fdf56
redesign of WordCache to be prepared to hold multiple
...
independent dictionaries. Such dictionaries can then be also used as
simplified vocabularies.
13 years ago
Michael Christen
bd40a10230
added autotagging stub .. only reading and parsing of vocabularies at
...
this time
13 years ago
Michael Peter Christen
2ee8cbeb2c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
...
Conflicts:
source/net/yacy/search/Switchboard.java
13 years ago
Michael Peter Christen
992dbdf4bb
added noload statistic to servlets
13 years ago
Michael Christen
eebc02f5c1
fix
13 years ago
Michael Christen
216a287a85
Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
...
Conflicts:
source/de/anomic/crawler/CrawlQueues.java
13 years ago
stbrumm
d18095dc48
Patch for Issue 0000102
...
and fixes to Patch (private peer status is a property of a peer, not a
status)
13 years ago
stbrumm
9f1b1b4604
Type for Robinson-Mode/Private Peer added
13 years ago