Michael Peter Christen
4716546ef5
- reduced memory usage in index transmission using a transformation of
...
Node to Row objects
- removed peerDeparture in solr remote search in case that peer does not
answer (this may be normal because it is allowed to switch this off)
12 years ago
Michael Peter Christen
06b0081fdc
fix for NPE during host navigation computation
12 years ago
orbiter
acb9f04e80
removed unused classes
12 years ago
Michael Peter Christen
755f5e76cf
removed strange assert statements and simplified code in metadata
...
transformation
12 years ago
orbiter
ee01c12e56
fixes for putDocument and putMetadata
12 years ago
Michael Peter Christen
f9fc5cfaba
better check for bad urls in url transmission
12 years ago
Michael Peter Christen
40c0856489
refactoring
12 years ago
Michael Peter Christen
9bece5ac5f
enhanced snippet fetch - removed a bug that caused documents to be
...
parsed even if a solr text was available
12 years ago
Michael Peter Christen
395b78a0d8
using the solr search index to concurrently search within solr and the
...
rwis during local search requests.
12 years ago
Michael Peter Christen
e5ef840f40
- renamed DoubleSolrConnector to MirrorSolrConnector and added a
...
hit/miss/document cache to the MirrorSolrConnector.
- more abstraction to SolrDocument in Connector interface
- bugfixes in Solr field reader
12 years ago
Michael Peter Christen
94a334f128
another fix to the Solr metadata reading process and to the shutdown
...
process
12 years ago
Michael Peter Christen
b51df6c7e8
- added coordinate storage in solr schema
...
- fixed shutdown process
- fixed some solr-to-metadata reading
- added a large number of metadata attributes in ViewFile.html
12 years ago
Michael Peter Christen
f9c0e6e950
- Implemented and integrated the URIMetadataNode object which is a
...
metadata representation from the solr index. This shall replace metadata
from the built-in database in the future.
- added the Solr-driven metadata into the search index of YaCy which
makes it now possible to run YaCy without the old metadata index. This
is a major stept forward to a full migration to Solr.
12 years ago
Michael Peter Christen
dcc72799c4
better abstraction for result writers using controlled vocabularies and
...
URIRefs
12 years ago
Michael Peter Christen
a12f693ec9
added two response writer for embedded solr interface:
...
a rss/opensearch writer and an enhanced solr xml writer.
The enhanced solr writer has less configuration overhead than the
original writer and should by slightly faster. The rss/opensearch writer
is at this time slightly incomplete compared with the already existing
rss search result form YaCy and also snippets are missing at this time.
To test the new interface, open for example:
http://localhost:8090/solr/select?wt=rss&q=olympia
The wt-code for the new result writers are=
wt=rss for opensearch
wt=exml for the enhanced solr xml writer.
Additionally, the SRU search parameters had been added to the solr
interface which can now also be used for a normal solr/xml search.
12 years ago
sixcooler
f32aa9a49c
prevent merge of blobs that can't be handled in memory
12 years ago
Michael Peter Christen
1687737771
Abstraction of HandleMap and HandleSet
12 years ago
Michael Peter Christen
e432bb9cd9
better calculation of possible saving in HeapReader index data structure
12 years ago
Michael Peter Christen
9549984c65
documentation/comments
12 years ago
Michael Peter Christen
826967513b
changed options in IndexFederated_p to switch on/off parts of the index
...
individually. The settings are experimental and the values of the
settings will be overwritten when an index migration from urldb to solr
starts.
12 years ago
orbiter
69e743d9e3
- more abstraction for the RWI index as preparation for solr integration
...
- added options in search index to switch parts of the index on or off
12 years ago
Michael Peter Christen
f0a079ac9f
allow larger log entries
13 years ago
Michael Peter Christen
784a4abb18
enhancement in internal data organization which should generate less
...
synchronizations in database access
13 years ago
Michael Peter Christen
f78ce93a80
collection of speed and memory saving hacks
13 years ago
orbiter
a196f24f60
prevent enqueueing of non-loggeable logging entries
13 years ago
orbiter
482afed07c
reduced logging overhead (a bit)
13 years ago
orbiter
e76159040b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
orbiter
bbfa497a3c
replaced more size() > 0 by !isEmpty()
13 years ago
Michael Peter Christen
83da68c4c1
fixed a memory leak inside the logger which appeared if the log was
...
writter faster that the logger is able to print this out to its out
stream. A very large collection of unwritten log outputs had been seen
during strong crawling. The new ArrayBlockingQueue is limited to prevent
this case.
13 years ago
orbiter
0cbda0b2b8
- replaced all length() == 0 and size() == 0 with isEmpty()
...
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
13 years ago
Michael Peter Christen
1addbc792c
use less memory for md5 cache
13 years ago
Michael Peter Christen
f32de94723
more logging
13 years ago
Michael Peter Christen
8efc1c1078
- fixed a memory leak (or bad usage) during parsing/snippet fetch
...
- more logging for errors
13 years ago
Michael Peter Christen
b0c408788b
made class methods static where possible
13 years ago
Michael Peter Christen
5bd3c90907
- removed unnecessary semicolons
...
- added default case for switch
13 years ago
Michael Peter Christen
132afaf687
removed unaccessible code
13 years ago
Michael Peter Christen
7c1ba99755
removed more unused method parameters
13 years ago
Michael Peter Christen
83701a1b4c
removed unused ImageReference package
13 years ago
Michael Peter Christen
0301aba1e9
removed unused method parameters
13 years ago
Michael Peter Christen
d3964253ae
- added @SuppressWarnings to unused servlet method parameters
...
- removed unnecessary casts
- removed unnecessary throw statements
13 years ago
Michael Peter Christen
ea10766bfd
cleaned unnecessary nested code
13 years ago
Michael Peter Christen
1481037820
replaced non-generic array with collection
13 years ago
Michael Peter Christen
613b45f604
- better data structures in secondary search
...
- fixed a big memory leak in secondary search
13 years ago
Michael Peter Christen
8a82609360
- smaller caches to save memory
...
- close cloneable iterators to free memory
13 years ago
Michael Peter Christen
ce8d4b87d9
fixes for new eclipse 'Juno' warning 'Resource leak'.
13 years ago
Michael Peter Christen
0c345d1559
giving threads name so its easier to see whats happening during
...
debugging and within a thread dump
13 years ago
Michael Peter Christen
b9d42fd9c8
using com.google.common.io.Files instead of homebrew methods
13 years ago
Michael Peter Christen
de3ef8ad73
removed unimportant warnings
13 years ago
Michael Peter Christen
9264d8b4af
removed old navigation practice using subject tags in favor of
...
triplestore-tags
13 years ago
Michael Peter Christen
61bb52d55c
- using http://purl.org/dc/terms/references to refer from an
...
auto-annotated document to a 'pseudo-linked' document which has an url
created with an object-prefix as defined in the vocabulary file
13 years ago
Michael Peter Christen
8b53771db2
changed behavior of navigation processing:
...
- vocabulary annotation is not done any more into the metadata of urldb
- vocabularies are written into the jena triplestore using a rdf
vocabulary
- vocabularies for rdf tripel must be updated; refactoring done
- with the new navigation tags in the triplestore a faster
pre-urldb-lookup is possible: navigation is processed now within the RWI
during pre-ranking retrieval
- added also a Owl vocabulary stub to add the plain-text url to the
triplestore using the owl:sameas predicate
13 years ago
Michael Peter Christen
bef823c247
close the reader if finished
13 years ago
cominch
9cbfc1a1c0
augmentedProxy, which forwards every proxy request to a
...
rewrite engine to customize existing webpages. originally implemented by
Florian Richter.
Conflicts:
source/de/anomic/http/server/HTTPDProxyHandler.java
13 years ago
Michael Peter Christen
3b992e6b00
using utf8 String compression in Webstructure database
13 years ago
Michael Peter Christen
2280a7b276
- changed initialization order to prefer allocation of memory for table
...
files first
- bugfixes in memory amount calculation
13 years ago
Michael Peter Christen
0746308bc2
only the metadata tables shall be able to use the tail cache
13 years ago
Michael Peter Christen
7ec9bef0c3
fix for OOM
13 years ago
Michael Peter Christen
41c02cb10e
- less restrictions for usage of Table RAM copy
...
- new limit to use the table copy (instead of flag): 400MB available. If
less is available, then a copy is never used. If more is available, then
it can be used if there is a remaining space of at least 200MB
- flush caches more often: flush the Digest cache
13 years ago
Michael Peter Christen
b8f56a9803
npe bugfix
13 years ago
Michael Peter Christen
ba10caf89a
lazy initialization of database tables
13 years ago
Michael Peter Christen
701b9a28a0
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
...
Conflicts:
htroot/PerformanceMemory_p.java
13 years ago
Michael Peter Christen
10c9c17d51
fixed handlemap spread factor and null iterator handling
13 years ago
Michael Peter Christen
b0095c8d3c
flush the compressor cache when a cleanup is done
13 years ago
Michael Peter Christen
96e9d77270
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
...
Conflicts:
source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java
13 years ago
Michael Peter Christen
00f2df1120
a variety of possible memory leak fixes
13 years ago
Michael Peter Christen
3dd8376825
added automatic cleaning of cache if metadata and file database size is
...
not equal. It might happen that these data is different because one of
that caches is cleaned after a while or when it is too big. The metadata
is then not cleaned, but now wiped after a checkup process at every
application start. This should cause a bit less memory usage.
13 years ago
Michael Peter Christen
6bb07afcc3
accept also files with other file prefix; used to read 'foreign' cache
...
files
13 years ago
Michael Peter Christen
461a0ce052
removed warnings
13 years ago
Michael Peter Christen
407fdf6968
more bug fixes and performance hacks for search process
13 years ago
Michael Peter Christen
a1fe65b115
performance hacks
13 years ago
Michael Peter Christen
e0d8643226
- performance hacks
...
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again) -> target is, that they
are never reached.
13 years ago
Michael Peter Christen
9b4c699526
ehanced location search:
...
- search request are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adopted to the results in the visible range
- added a double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
13 years ago
Michael Peter Christen
1f48d1528b
performance hacks
13 years ago
Michael Peter Christen
10da7335ea
performance hack: use a hash cache for all hashes that are computed by a
...
byte array. If this hash is used in a HashMap (which is very often the
case) then this hack eliminates a lot of re-computations of the same
hash.
13 years ago
Michael Peter Christen
7c1feefb28
introduced a default 10 second time-out in rwi normalization time
...
uring search process to prevent endless deadlocks after a very long
running search
13 years ago
Michael Peter Christen
8d997d55b6
better logging
13 years ago
Michael Peter Christen
43c2c6e588
better logging
13 years ago
Michael Peter Christen
c15fcde1c8
add-on to latest commit
13 years ago
Michael Peter Christen
cf47d94888
performance hack to parse numbers inside of substrings without actually
...
generating a substring. This avoids the allocation of a String object
ech time a substring is parsed. Should affect CPU load during RWI
transmission.
13 years ago
Michael Peter Christen
7e0ddbd275
added a "fromCache" flag in Response object to omit one cache.has()
...
check during snippet generation. This should cause less blockings
13 years ago
Michael Peter Christen
c6a09eab0b
synchronization needed
13 years ago
reger
6696cb1313
bugfix: lookup of peernames no result for active peer in page IndexControlRWIs_p.html -> Transfer RWI to other Peer
...
SeedDB.lookupByName searche for lowercase peerNames, while MapColumnIndex.getIndex uses peername as is in the keyset.
Changed the index init to insert lowercase peer names as key
13 years ago
Michael Peter Christen
f294f2e295
bugfix to http://bugs.yacy.net/view.php?id=181
...
tried to make a bit less 'noise' to dns server
also included: less processes in snippet fetch to reduce load during
search on small computers
13 years ago
Michael Peter Christen
acf8d521a2
fix for http://bugs.yacy.net/view.php?id=126
13 years ago
Michael Peter Christen
fa735f4f04
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
3e1bc9477f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen
6f8a2fef1f
small speed enhancement using a column factory
13 years ago
Roland 'Quix0r' Haeder
d10627d591
More sync in close() methods
...
Conflicts:
source/net/yacy/kelondro/logging/GuiHandler.java
source/net/yacy/kelondro/workflow/InstantBusyThread.java
13 years ago
Roland 'Quix0r' Haeder
fbb946f913
Made a method static (Eclipse suggested it), removed unused import, pk=null check does now output a warning in logfile
13 years ago
Michael Peter Christen
89142d1e8d
removed (not all) warnings
13 years ago
Michael Peter Christen
15db703808
added missing serialization to remove all warnings
13 years ago
Michael Peter Christen
1795a7325b
made HandleSet serializable
13 years ago
Roland 'Quix0r' Haeder
a093ccf5eb
Now used synchronization in all close() methods to make sure all objects
...
are 'closed' in an ordered way
Conflicts:
source/de/anomic/http/server/ChunkedInputStream.java
source/de/anomic/http/server/ChunkedOutputStream.java
source/de/anomic/http/server/ContentLengthInputStream.java
source/net/yacy/cora/protocol/Domains.java
source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
source/net/yacy/document/content/dao/PhpBB3Dao.java
source/net/yacy/document/parser/html/AbstractTransformer.java
source/net/yacy/kelondro/blob/BEncodedHeap.java
source/net/yacy/kelondro/blob/HeapReader.java
source/net/yacy/kelondro/index/RAMIndexCluster.java
source/net/yacy/kelondro/io/ByteCountInputStream.java
source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
source/net/yacy/kelondro/table/SQLTable.java
13 years ago
Michael Peter Christen
0cf3d36eae
more tolerance in case of corrupted file
13 years ago
Michael Peter Christen
34f4225d7e
less 'wellformed' calls without asserts
13 years ago
Michael Peter Christen
ba6aaabc51
refactoring + parser bugfixes
13 years ago
Michael Christen
e32055aa15
added stub classes for
...
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
attributes set which shall replace the old metadata database if it is
finished
- migration help classes stub to use old and new metadata databases
simultanously
13 years ago
Michael Peter Christen
2fc8ecee36
ConcurrentLinkedQueue has a VERY long return time on the .size() method.
...
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html
and the following test programm:
public class QueueLengthTimeTest {
public static long countTest(Queue<Integer> q, int c) {
long t = System.currentTimeMillis();
for (int i = 0; i < c; i++) {
q.add(q.size());
}
return System.currentTimeMillis() - t;
}
public static void main(String[] args) {
int c = 1;
for (int i = 0; i < 100; i++) {
Runtime.getRuntime().gc();
long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
Runtime.getRuntime().gc();
long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
Runtime.getRuntime().gc();
long t3 = countTest(new ConcurrentLinkedQueue<Integer>(),
c);
System.out.println("count = " + c + ": ArrayBlockingQueue =
" + t1 + ", LinkedBlockingQueue = " + t2 + ", ConcurrentLinkedQueue = "
+ t3);
c = c * 2;
}
}
}
13 years ago
Michael Peter Christen
213c8d97f2
use less proccesses in process pool
13 years ago
Michael Peter Christen
c639248c23
protection against strange answers from remote peers during search
13 years ago
Michael Peter Christen
1cd711d005
added classes for citation references (for new citation ranking)
13 years ago
Michael Peter Christen
e0f1e7d904
added new citation reference data structure that shall be used for a
...
citation ranking
13 years ago
Michael Peter Christen
e18a4f6b74
more tolerant merge iterator
13 years ago
Michael Peter Christen
7e4e3fe5b6
free some memory after parsing html
13 years ago
Michael Peter Christen
4540174fe0
memory hacks
13 years ago
Michael Peter Christen
b4409cc803
small redesign of blob column index and usage
13 years ago
Michael Peter Christen
d5c1f2746e
performance hack
13 years ago
Michael Peter Christen
803963aebd
performance hack: better space grow in CharBuffer (speeds up html
...
parser)
13 years ago
Michael Peter Christen
e2f8f263e8
changed storage of search words: keep order
13 years ago
Michael Peter Christen
0b67a0a5d8
added a column index for tables in blob files. This is heavily used
...
during receiving of DHT submissions and when answering remote search
requests. Both events together may have caused IO-deadlocking and this
commit shall fix that.
13 years ago
Michael Peter Christen
e3bb73c3d6
serialized some database access methods
13 years ago
Michael Peter Christen
2ea585d616
fix for host navigator
13 years ago
Michael Peter Christen
ef78f22ee1
performance hack
13 years ago
Michael Peter Christen
a02fdf8625
better error messages
13 years ago
Michael Peter Christen
c6ba44468e
timeout = 5000 instead 3000
13 years ago
low012
8776b84c10
*) small fix to make password change function of reconfigureYACY.sh work
...
again
13 years ago
Michael Peter Christen
4901cee3cc
suppress auto-tagged subject entries when sending out or receiving
...
metadata from other peers
13 years ago
sixcooler
985b78cf89
correct 'avaiable()' to use max of young / eden
13 years ago
sixcooler
4da8746275
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
sixcooler
c9aaa9e00a
respect non-reserved Memory in GenerationMemoryStrategy
...
and enable it again
13 years ago
Michael Peter Christen
37f2d1b3e9
replaced Thread initialization with ExecutorService pool for delete
...
method. This is much faster and produces less blocking when using the
Compressor class which is used by the HTCache. I.e. picture search is
much faster now.
13 years ago
Michael Peter Christen
0d6176804b
emergency disabling of GenerationMemoryStrategy because of non-working
...
available-method
13 years ago
Michael Peter Christen
87f0210480
enriched log output to find NPE in HeapReader
13 years ago
Michael Peter Christen
254adea51c
small fixes
13 years ago
Michael Peter Christen
49be60a7c8
WorkflowProcess is forced to make small pauses if shortMemoryStatus is
...
reached.
13 years ago
Michael Peter Christen
b7bb84c0bb
set a limit to CharBuffer object size to fight against bad/too large
...
content
13 years ago
Marek Otahal
72adbeae90
!Important: move from Hashtable to HashMap
...
Hashtable is an obsolete collection v1, now since v2 offers HashMap with same or better
functionality. Please review, almost all code was already moved, so only a few changes. That is not the issue,
but I found notices that some (ugly big) helper classes had to be created in past
to compensate missing Hashtable's functionality. I'd like input if we can remove some of them.
look for //FIX: if these commits
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Marek Otahal
f75b5e40e0
little fix in copy()
...
Signed-off-by: Marek Otahal <markotahal@gmail.com>
13 years ago
Michael Christen
216a287a85
Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
...
Conflicts:
source/de/anomic/crawler/CrawlQueues.java
13 years ago
Michael Christen
20962a4ed7
added metadata node stub for metadata from blobs
13 years ago
Michael Christen
575dbbaa93
enhancements in Blob retrieval: try to use less CPU resources by testing
...
a blog first that most certainly has wanted entries.
13 years ago
Roland 'Quix0r' Haeder
6d4e08ed06
Rewrote filesize() to (hopefully) avoid a NPE, rewrote Blacklist class to concurrent classes to avoid a CME
13 years ago
Roland 'Quix0r' Haeder
fa08ed5ae5
Fixed a lot CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check
13 years ago
Michael Christen
9e5894c784
Removed handling of components objects for URIMetadataRows.
...
This is a preparation to replace this rows with nodes from the node
store.
13 years ago
Michael Christen
c04bfaa51b
refactoring
13 years ago
Michael Peter Christen
613ab6a69d
added BEncodedHeapBag and BEncodedHeapShard which are storage container
...
for a new metadata store. An abstraction of the content for this storage
is defined with MapStore. A MapStore is an abstraction of a RDF Node
store.
13 years ago
Michael Christen
6fecd0db88
one more performance hack to prevent costly md5 computation
13 years ago
Michael Christen
e13441b069
better digest pool size (smaller by default but unlimited)
13 years ago
Michael Christen
1f4afb4dc0
performance hacks
13 years ago
Michael Christen
e9dc99fe15
added rules to set specific RWIs as private RWIs which are not
...
transmitted to remote peers. This will be used for private index copies
and phonetic indexes.
13 years ago
Michael Peter Christen
0bcef2d156
added feature as requested in
...
http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461
The search can now be configured with a non-display host list.
the search will always exlude the given list of host unless they are
requested directly using the host navigation
13 years ago
Michael Christen
204c29f010
small bugfixes for search result display and cache display
13 years ago
Michael Christen
078fcde0dd
bad initialization
13 years ago
Michael Christen
14e45e90fd
patch for a bug that I don't understand by now.
13 years ago
Michael Christen
86b3385847
fixed a deadlock during secondary remote search
13 years ago
Michael Christen
404758698a
less io operations
13 years ago
Michael Christen
044f83feed
added some pauses into the search process which shall produce
...
better-ranked search results. without that pauses the result page will
only contain links from the peer that answers first which is not a good
average picture of all the peers that provided results
13 years ago
sixcooler
448656087a
probably fix for http://bugs.yacy.net/view.php?id=94
...
(don't know how to force this exception)
13 years ago
Michael Christen
d35bdc2df6
removed npe
13 years ago
Michael Christen
e7e429705a
- less automatic indexing after a search (needs to reset the default
...
crawl profiles)
- fix for concurrency problem in storage of serverSwitch Properties
- markup update
13 years ago