Michael Peter Christen
0716a24737
added more / all new crawl profile fields into crawl profile editor
12 years ago
Michael Peter Christen
4a14122ba7
if a crawl profile has a collection assigned, use the
collection to show a name in the web interface. This should prevent
overly long names from making the interface unusable.
12 years ago
Michael Peter Christen
0fe8be7981
enhanced data structures for balancer and latency computation, which
should produce a somewhat better prediction of forced waiting times.
12 years ago
Michael Peter Christen
ac9540dfb6
removed options for stopwords which are not used
12 years ago
Michael Peter Christen
ce3fed8882
added the Google Search Appliance (GSA) api interface to the main menu.
See:
https://developers.google.com/search-appliance/documentation/68/xml_reference#request_overview
12 years ago
Michael Peter Christen
b2ffd49817
less latency
12 years ago
Michael Peter Christen
0833937c1c
better balancing and duetime-computation, also for no-delay intranet
hosts
12 years ago
Michael Peter Christen
c326aa8f67
disabled writing new entries to crawl stacks to prevent a domain
with many documents from blocking the refreshing of the crawl queue
12 years ago
Michael Peter Christen
6905182d41
- fix for number of words log message
- adding meta:refresh also to crawler stack
12 years ago
Michael Peter Christen
c25d7bcb80
- added concurrency for robots.txt loading
- changed data model for domain counter
12 years ago
Michael Peter Christen
a94c537afc
fixed getSize() which can use the cache size while the crawl is running
12 years ago
Michael Peter Christen
96912c9471
enhancement to solr caching: consider the case where, during a get(), the
document is not in solr but the cache indicates that a commit is needed
to get the document.
12 years ago
Michael Peter Christen
a87811bc38
more auto-commit calls when a search interface is opened, but not when a
search is done there to prevent blocking during search-time.
12 years ago
Michael Peter Christen
3d3d654e88
if a network configuration is chosen which does not allow DHT and no
P2P communication (is in robinson mode), then some menu entries which
have no use in this mode are disabled.
12 years ago
Michael Peter Christen
2d9e577ad0
replaced the custom robots.txt loader by the standard http loader
12 years ago
Michael Peter Christen
799d71bc67
enhanced solr caching:
- increased cache size which is needed for longer solr commit time
- speed hacks on cache write code
12 years ago
Michael Peter Christen
a33e2742cb
- removed unnecessary synchronized and deadlock in crawler
- removed problem with monitoring object on Balancer.wait
- added missing user agent settings
12 years ago
orbiter
8952153ecf
update to Balancer algorithm:
- create a load list from the current list of known hosts
- do not create this list for each Balancer.pop access
- create the list from those hosts which have a zero waiting time
- select the 1/3 of that list which has the most urls waiting
- get hosts from the waiting list in random order
- fixes for some delta-time computations
- always load all urls from hosts which have never been loaded before
12 years ago
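The selection steps listed in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Balancer code: the class `LoadListSketch`, the `HostState` record of per-host state, and the method `buildLoadList` are hypothetical names, and the "at least one host" rounding is an assumption. The delta-time fixes and the special handling of never-loaded hosts are not covered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class LoadListSketch {

    /** Hypothetical per-host state; not YaCy's actual data model. */
    public static final class HostState {
        final String host;
        final long waitingMillis; // remaining forced delay, 0 = ready now
        final int queuedUrls;     // number of urls waiting for this host

        public HostState(String host, long waitingMillis, int queuedUrls) {
            this.host = host;
            this.waitingMillis = waitingMillis;
            this.queuedUrls = queuedUrls;
        }
    }

    /** Builds a load list once, so it does not have to be rebuilt on every pop. */
    public static List<String> buildLoadList(List<HostState> knownHosts, Random random) {
        // keep only hosts that can be loaded without waiting
        List<HostState> ready = new ArrayList<>();
        for (HostState h : knownHosts) {
            if (h.waitingMillis <= 0) ready.add(h);
        }
        // hosts with the most waiting urls come first
        ready.sort(Comparator.comparingInt((HostState h) -> h.queuedUrls).reversed());
        // take one third of the ready hosts (assumption: at least one if any is ready)
        int take = ready.isEmpty() ? 0 : Math.max(1, ready.size() / 3);
        List<String> loadList = new ArrayList<>();
        for (HostState h : ready.subList(0, take)) loadList.add(h.host);
        // hand the selected hosts out in random order
        Collections.shuffle(loadList, random);
        return loadList;
    }
}
```

A caller would build the list once, pop hosts from it until it is empty, and only then rebuild it.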
orbiter
354f0d9acd
moved static method from ClusteredScoreMap to MapDataMining because it
was not used in the ClusteredScoreMap class but only in MapDataMining
12 years ago
Michael Peter Christen
8e1248ffe3
force a commit in advance of a search for the administrator to get most
recent results even if commit time is high and an indexing is ongoing.
12 years ago
Michael Peter Christen
3b48c78190
added an option to force a commit to solr.
may be used by a search front-end in case that the commitWithinMs time
is too short to get recently indexed documents.
12 years ago
sixcooler
2d972f289a
raise commitWithinMs to the default value from SwitchBoard
(results in lower hd-io)
no dots in memory-graph (there are too many of them)
12 years ago
orbiter
8fde1dd3b6
another performance and memory hack to graphics: this makes it possible
to produce a 100-Megapixel png network graphic image on my 6 year old
laptop in standard configuration in 10 seconds.
12 years ago
Michael Peter Christen
1baf498d59
- show more lines in online log
- reverse order is default now
12 years ago
Michael Peter Christen
55bdafbaf1
more image processing hacks
12 years ago
Michael Peter Christen
f2d0418218
because the new PngEncoder had a problem with the PixelGrabber which is
caused by a JRE bug, the PixelGrabber had to be circumvented using a
custom frame buffer which can be read without a PixelGrabber. This resulted
in a much faster and much less memory-consuming transformation. YaCy images
are now generated really fast!
12 years ago
Michael Peter Christen
d5d64019e5
- added a method for the RasterPlotter to draw arrow endings to lines
- replaced the dot in the NetworkGraph with arrows
- enhanced the image drawing speed using pre-computed color values
- added more attention for OOM cases during very large image painting
12 years ago
Michael Peter Christen
342543a6c4
fix for host browser
12 years ago
Michael Peter Christen
85ca07b90e
when a new crawl is started, an equal crawl, if still running, is
terminated and the corresponding crawl profile is deleted (this also
clears the crawl queue entries for that crawl profile)
12 years ago
Michael Peter Christen
906e51214a
the web structure image shows the pivot dot in a different color
12 years ago
Michael Peter Christen
b3ffcde0c7
- prepared PngEncoder for concurrency: PixelGrabber.grabPixels is the
main time-consuming process. This shall be done concurrently.
- added concurrent processes to call the PixelGrabber and a framework to
do that (queues)
It is now possible to create 4k images (3840x2160), e.g. with the Network
Graphics servlet
12 years ago
Michael Peter Christen
e9c6f4ce2e
- new order of data computation: first compute the size of
compressed deflater output, then assign an exact-sized byte[] which
makes resizing afterwards superfluous
- after all enhancements all class objects were removed; result is just
one short static method
- made objects final where possible
12 years ago
orbiter
c6a1b21399
added a 9-year-old png encoder from David Eisenberg which I rewrote
quite a bit to remove all code that handles transparency. With this
highly specialized png writer it is possible to write png images much
faster than with the JRE built-in png writer.
In a second step it may be possible to add concurrency to increase
computation speed further.
12 years ago
orbiter
276dd6452b
removed warnings
12 years ago
orbiter
59bf4677b6
added option to view the complete directory structure in host browser
12 years ago
Michael Peter Christen
b991685782
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
Michael Peter Christen
7602fce0b9
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
ea11a1efea
fix for highlighting in gsa search
12 years ago
Michael Peter Christen
9eaede50e7
enhanced web structure images
12 years ago
Michael Peter Christen
b7ac1da6a3
gsa results shall have only one title in metadata and that should be the
visible title in the <title>-tag
12 years ago
sixcooler
206e7bcf94
whitelist yacyportalsearch aka search.yacy.net
12 years ago
Michael Peter Christen
ae6feb5610
showing the web structure graph as animation in the crawl monitor
12 years ago
reger
87aab9aa7c
- fix: with augmented parsing = on, metadata (like title) was missing in the index because it was overwritten when multiple result docs with the same url were added from the augmentparser
- fix Document.addsubdocuments: sections might be initialized via Arrays.asList, which does not provide the .addAll method that is used
see e.g. http://kamleshkr.wordpress.com/2010/02/17/inside-java-arrays-aslistt-a/
12 years ago
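The second fix in this commit concerns a general Java behaviour that can be reproduced in isolation: a list returned by `Arrays.asList` is fixed-size, so `addAll` on it throws `UnsupportedOperationException`, while a copy in an `ArrayList` can grow. The sketch below is not taken from YaCy; the class `AsListSketch` and the method `addAllFails` are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AsListSketch {

    /** Returns true if appending elements to the given list is rejected. */
    public static boolean addAllFails(List<String> sections) {
        try {
            sections.addAll(Arrays.asList("c", "d"));
            return false;
        } catch (UnsupportedOperationException e) {
            // fixed-size lists such as the one from Arrays.asList end up here
            return true;
        }
    }

    public static void main(String[] args) {
        // fixed-size view backed by an array: cannot grow
        List<String> fixedSize = Arrays.asList("a", "b");
        // independent, growable copy of the same elements
        List<String> growable = new ArrayList<>(Arrays.asList("a", "b"));

        System.out.println(addAllFails(fixedSize)); // prints "true"
        System.out.println(addAllFails(growable));  // prints "false"
    }
}
```

Wrapping the result of `Arrays.asList` in `new ArrayList<>(...)` is one common way to avoid the exception when the list has to grow later.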
Michael Peter Christen
39317a6c66
enhanced webstructure image: introduced
- multiple hosts can be listed (comma-separated) as host argument
- new 'bf' attribute (branch factor): the maximum number of edges per
node
- the bf-value is computed automatically
- ordering of nodes when the graphic is drawn: mostly the drawing ends
with a limitation, e.g. the number of nodes. When this happens, it should be
ensured that more 'interesting' nodes are painted in advance. This is
now done by sorting all nodes by the number of links they have in the
distant sub-graph.
12 years ago
sixcooler
47ae7e322e
smaller dhtDispatcher.cloudSize
@Orbiter: we talked about this some time ago - please revert if I'm wrong
12 years ago
sixcooler
57ddd63888
do not hold an expensive cache of references for DHT-out, but load them
on demand
see: http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4530
12 years ago
reger
1dc6482feb
format crawler timeout output string in seconds (was days)
12 years ago
Michael Peter Christen
ef937af35d
more custom field usage in gsa search result
12 years ago
Michael Peter Christen
ea27d2e5f6
fixed more getSolrFieldName usages
12 years ago
Michael Peter Christen
ce0e5b1e17
- more refactoring / private methods
- fix for usage of custom solr field names
12 years ago