bhoerdzn
c27fabc88e
fixed wrong parameter check
11 years ago
bhoerdzn
2214bf5396
Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation.
11 years ago
bhoerdzn
405878182f
Use list template for all other option lists. Fixed some template expressions.
11 years ago
bhoerdzn
8e74098cd4
Use list template for "reloadIfOlderNumber".
11 years ago
bhoerdzn
52bad7b908
Dynamic toggling of form fields, based on passed-in and selected values. This will also cut down the post string by disabling fields that are not needed.
11 years ago
bhoerdzn
45cf553bc3
try to guess default crawling mode, if none set
11 years ago
bhoerdzn
b4f0c822f2
assign strings before checking contents
11 years ago
bhoerdzn
499abe8f91
set default values for string parameters
11 years ago
Jens Bertram
85316b3ac6
Merge branch 'master' into crawlexpert-post
11 years ago
bhoerdzn
42ea56eaad
made crawStartExpert_p aware of post variables; extended template where needed
11 years ago
orbiter
ba3c173077
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger
fd119deb00
fix NPE on modified since check ( Response.requestHeader allowed to be null)
11 years ago
orbiter
a3b5d84c81
Merge remote-tracking branch 'origin/master'
Conflicts:
.classpath
11 years ago
orbiter
adfae074cf
added classpath for debugging
11 years ago
Michael Peter Christen
b28d43decc
added two more fields source_cr_host_norm_i,target_cr_host_norm_i in
webgraph and an addition to postprocessing to copy all cr ranking
attributes to the link edges associated to the postprocessing documents
11 years ago
Michael Peter Christen
a52f3a597e
fix for canonical-from-http-header feature
11 years ago
Michael Peter Christen
2dd7c5be44
added parsing of http-canonical tags (untested, could not find an
example page)
11 years ago
Michael Peter Christen
4476dea5ba
do not fail if a wrong boost key is used; instead, print only a warning
See also: http://bugs.yacy.net/view.php?id=293
11 years ago
Michael Peter Christen
3bf0104199
fix for crawl domain counter limitation (limit was reached too early)
11 years ago
Michael Peter Christen
82bfd9e00a
- crawl profiles shall be deleted from active and passive stacks if they
are deleted to terminate the crawl because otherwise the crawl will go
on after the load-from-passive stack policy.
- better check if a crawl is terminated using the loader queue.
11 years ago
Michael Peter Christen
1b3d26dd23
hack to remove most of the warning: deprecated messages (but not all,
one is left)
11 years ago
Michael Peter Christen
a496313248
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
sixcooler
3c48fc65fd
reverted RemoteInstance to deprecated methods of httpClient-4.2
this should work with current remote-Solr-Instances
11 years ago
Michael Peter Christen
91a875dff5
self-healing of mistakenly deactivated crawl profiles. This fixes a bug
which can happen in rare cases when a crawl start and a cleanup process
happen at the same time.
11 years ago
Michael Peter Christen
095053a9b4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
sixcooler
0cae420d8e
some dns-timing changes:
since httpclient uses the domain cache, it is useful not to clean the
domain cache while crawling is running (domains are filled into this
cache)
On huge crawl starts (e.g. from file) my DNS could not keep up with the
high rates, so I reduced the rate and allowed a longer timeout
11 years ago
sixcooler
15b1bb2513
bump to httpClient-4.3
11 years ago
Michael Peter Christen
4f83d5f18c
added the new field harvestkey_s to the collection index and the
webgraph index which is temporarily filled with the crawl profile key.
This is used to select a set of documents for post-processing as soon as
a crawl is finished. Now the postprocessing for a specific crawl is
started when that specific crawl is finished and not at the end of all
post-processing steps.
11 years ago
orbiter
14442efa6d
when profiles are cleaned, there shall be first a callback showing which
profiles are cleaned. This shall enable a profile-termination-driven
postprocessing. To do this, index writings must carry the profile key
which will be implemented in another (next) step.
11 years ago
orbiter
0013d0d0bb
removed superfluous class
11 years ago
orbiter
f90d5296cb
Added new data structure to be used by the balancer (not used yet).
These data structures will enable the balancer to store the crawl queue
into individual queues, one each for a single host.
11 years ago
orbiter
0e8d752462
refactoring
11 years ago
orbiter
8ac2e8c8c9
added location navigator, which makes the image for the map search
visible whenever a location is available in the search result.
To activate this, the search.navigation property in yacy.conf must be
modified to the new default values.
11 years ago
orbiter
d86d2be5c3
automatically removed Places autotagging if no location library is
wanted
11 years ago
orbiter
214a087cdf
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
96ed0c980e
- added hosthash to all documents (also fail documents which is needed
there for deletion), this fixes a problem for the deletion of old
documents for new crawl starts
- added clickdepth and citation computation for fail documents
11 years ago
Michael Peter Christen
179ad281f9
close include byte buffer after usage
11 years ago
reger
6b9a624808
remove double declaration of TLD_any_zone_filter
11 years ago
orbiter
d2effd21db
fix for npe during location search
11 years ago
orbiter
828603e4f1
fix for 100%CPU problem in error cache cleaning process
11 years ago
orbiter
c64b51134e
hack to add all tokens from the url to text_t. This was working for the
RWI index (and still is working) but not for solr-only search indexes.
Maybe we should find a solution using a separate search field instead.
11 years ago
orbiter
6e8377b8ad
do not check all words with synonym library if the library is empty
11 years ago
orbiter
70ba74b23a
disabled ipv4 preference to enable ipv6-only networks like freifunk
11 years ago
orbiter
f3be1930cb
CPU problem when pushing to the error cache; wrong class,
ConcurrentHashMap needed for concurrency
11 years ago
Michael Peter Christen
e40671ddb7
better and consistent deletions for error urls
11 years ago
Michael Peter Christen
2602be8d1e
- removed ZURL data structure; removed also the ZURL data file
- replaced load failure logging by information which is stored in Solr
- fixed a bug with crawling of feeds: added must-match pattern
application to feed urls to filter out urls that are not in a
wanted domain
- delegatedURLs, which also used ZURLs, are now temporary objects in
memory
11 years ago
Michael Peter Christen
31920385f7
set anchor rel attribute of all links to "nofollow" if the html meta
contains a robots:nofollow or if the http header contains a
"X-Robots-Tag: nofollow"
11 years ago
Michael Peter Christen
57e00baf26
fix for parsing of image links inside of anchor links (image-links)
11 years ago
Michael Peter Christen
61c5e40687
- replaced the properties object in AnchorURL with distinct variables
for anchor attributes.
- as a result, large portions of the parser code had to be adapted
as well
- added a counter target_order_i for anchor links in webgraph
computation
11 years ago
Michael Peter Christen
3ea9bb4427
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago